Rahul Pathak Opening Session | AWS Startup Showcase S2 E2


 

>>Hello, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase, season two, episode two. The theme is data as code, the future of analytics. I'm John Furrier, your host. We have a great lineup for you today: fast-growing startups, great companies, founders, and stories around data as code. And we're going to kick it off here with our opening keynote with Rahul Pathak, VP of Analytics at AWS and a CUBE alumni. Rahul, thank you for coming on and being the opening keynote for this awesome event. >>It's great to see you, and it's great to be part of this event. I'm excited to help showcase some of the great innovation that startups are doing on top of AWS. >>We last spoke at AWS re:Invent, and a lot's happened since then, with serverless at the center of the action, and all these startups doing great stuff: Rockset, Dremio, Cribl, Ahana, Imply, and more. Data as code has a lot of traction, so there's still a lot of momentum going on in the marketplace. Pretty exciting. >>It's awesome. There's so much innovation happening, and the wonderful part of working with data is that the demand for services and products that help customers drive insight from data is just skyrocketing, with no sign of slowing down. So it's a great time to be in the data business. >>It's interesting to see the theme of the show getting traction, because you start to see data being treated almost like how developers write software: taking things out of branches, working on them, putting them back in. Machine learning is getting iterated on, and you're seeing more models being trained differently, with better insights and actions, all working like code. This is a whole other way people are reinventing their businesses. It's been a big, huge wave. What's your reaction to that?
>>I think it's spot on. The idea of data as code, bringing some of the repeatability of processes from software development into how people build data applications, is absolutely fundamental, and especially so in machine learning, where you need to think about the explainability of a model and what version of the world it was trained on. When you build a better model, you need to be able to explain and reproduce it. So I think your insights are spot on, and these ideas are showing up in all stages of the data workflow, from ingestion to analytics to machine learning. >>This next wave is about modernization and going to the next level with cloud scale. Thank you so much for coming on and being the keynote presenter here for this great event. I'll let you take it away: reinventing businesses with AWS analytics. Rahul, take it away. >>Okay, perfect. Well, folks, we're going to talk about reinventing your business with data. If you think about it, the first wave of reinvention was really driven by the cloud, as customers transformed how they thought about technology, and that's well on its way, although if you stop and think about it, I think we're only about five to 10% of the way done in terms of IT spend being on the cloud. So there's lots of work to do there. But we're seeing another wave of reinvention, which is companies reinventing their businesses with data, really using data to transform what they're doing, to look for new opportunities, and to find ways to operate more efficiently. The past couple of years of the pandemic have only accelerated that trend. What we're seeing is really the survival of the most informed: folks with the best data are able to react more quickly to what's happening.
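The reproducibility idea above (knowing exactly what "version of the world" a model was trained on) can be sketched in a few lines. This is an illustrative sketch only: the manifest fields and helper names are invented for the example and are not any AWS or SageMaker API.

```python
import hashlib
import json

def dataset_fingerprint(records):
    """Hash a dataset's canonical JSON form so a training run can be
    tied to the exact data it saw."""
    canonical = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def training_manifest(records, params, code_version):
    """Record everything needed to reproduce a training run."""
    return {
        "data_sha256": dataset_fingerprint(records),
        "params": params,
        "code_version": code_version,
    }

records = [{"user": 1, "clicks": 3}, {"user": 2, "clicks": 7}]
manifest = training_manifest(records, {"lr": 0.1, "epochs": 5}, "v1.2.0")

# The same data and params always produce the same fingerprint, so a
# later run can verify it is looking at the same version of the world.
assert manifest["data_sha256"] == dataset_fingerprint(records)
```

Storing a manifest like this next to each trained model is one simple way to make model builds as repeatable as software builds.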
>>We've seen customers being able to scale up if they're in, say, the delivery business, or scale down if they were in the travel business at the beginning of all of this, and then using data to find new opportunities and new ways to serve customers. So it's really foundational, and we're seeing it across the board. It's great to see the innovation that's happening to help customers make sense of all of this. Our customers are really looking at ways to put data to work: making better decisions, finding new efficiencies, and finding new opportunities to succeed and scale. When it comes to good examples of this, FINRA is a great one. You may not have heard of them, but they're the U.S. equities regulator; they keep track of all trading that happens in equities, looking at about 250 billion records per day. >>They run on Amazon EMR, which is our Spark and Hadoop service, processing 20 terabytes of data across tens of thousands of nodes, looking for fraud and bad actors in the market. It's been a huge transformation journey for FINRA over the years, a customer I've gotten to work with personally since 2013, and it's been amazing to see their journey. Pinterest is another great customer I'm sure everyone's familiar with. They're about visual search and discovery and commerce, and they've been able to scale their daily log searches by a factor of three X or more while driving down their costs, using the Amazon OpenSearch Service. Really, what we're trying to do at AWS is give our customers the most comprehensive set of services for the end-to-end journey around data, from ingestion to analytics and machine learning, and we want to provide a comprehensive set of capabilities for ingestion, cataloging, analytics, and then machine learning.
And all of these are things that our partners, and the startups that run on us, have available to build on as they deliver value for their customers. >>The way we think about this is that we want customers to be able to modernize what they're doing and their infrastructure, and we provide services for that. It's about unifying data wherever it lives, connecting it so that customers can build a complete picture of their customers and business. And then it's about innovation: really using machine learning to bring all of this unified data to bear on driving new innovation and new opportunities for customers. What we're trying to do at AWS is provide a scalable and secure cloud platform that customers and partners can build on. Unifying is about connecting data, and it's also about providing well-governed access to data. One of the big trends we see is customers looking to make self-service data available to their teams, and the key to that is good foundational governance: once you can define good access controls, you're more comfortable setting data free. The other part of it is that data lakes play a huge role, because you need to be able to think about structured and unstructured data. In fact, about 80% of the data being generated today is unstructured, and you want to be able to connect data that's in data lakes with data that's in purpose-built data stores, whether that's databases on AWS, databases outside, SaaS products, as well as things like data warehouses and machine learning systems. Really, connecting data is key. And then innovation: how can we bring new technologies like AI and machine learning to bear and reimagine our processes? AI is also key to unlocking a lot of the value that's in unstructured data.
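The governance idea above (define good access controls first, then set data free for self-service) can be sketched as a toy tag-based policy check. The tags, users, and `can_read` helper here are hypothetical stand-ins for illustration, not Lake Formation's actual API.

```python
# Hypothetical in-memory model of tag-based access control: datasets
# carry tags, analysts are granted tags, and a read is allowed only
# when the analyst's grants cover every tag on the dataset.
DATASET_TAGS = {
    "sales_2024": {"domain:sales"},
    "pii_customers": {"domain:sales", "sensitivity:pii"},
}

USER_GRANTS = {
    "analyst_a": {"domain:sales"},                     # sales data only
    "analyst_b": {"domain:sales", "sensitivity:pii"},  # sales plus PII
}

def can_read(user, dataset):
    """Allow access only if the user's grants cover the dataset's tags."""
    return DATASET_TAGS[dataset] <= USER_GRANTS.get(user, set())

assert can_read("analyst_a", "sales_2024")
assert not can_read("analyst_a", "pii_customers")  # missing the PII grant
assert can_read("analyst_b", "pii_customers")
```

Once a check like this sits in front of every read, adding a new analyst is a grant change rather than a data copy, which is what makes self-service safe.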
If you can figure out what's in an image, or the sentiment of audio, and do that in real time, that lets you personalize and dynamically tailor experiences, all of which are super important to getting an edge in the modern marketplace. So at AWS, when we think about connecting the dots across sources of data, allowing customers to use data lakes, databases, analytics, and machine learning, we want to provide a common catalog and governance, and then use these to help drive new experiences for customers in their apps and their devices. In an ideal world, this creates a closed loop: you create a new experience, you observe how customers interact with it, and that generates more data, which is a data source that feeds back into the system. >>On AWS, thinking about a modern data strategy, really at the core is a data lake built on S3, and I'll talk more about that in a second. Then you've got services like Athena, Glue, and Lake Formation for managing that data, cataloging it, and querying it in place. And then you have the ability to use the right tool for the right job. We're big believers in purpose-built services for data, because that's how you avoid compromising on performance, functionality, or scale. And then, as I mentioned, unification and interconnecting all of that data: if you need to move data between these systems, there are well-trodden pathways that allow you to do that, and features built into the services that enable it. Some of the core ideas that guide the work we do: scalable data lakes are key, and this is really about providing arbitrarily scalable, high-throughput systems. It's about open-format data for future-proofing. Then we talk about purpose-built systems for the best possible functionality, performance, and cost.
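The "right tool for the right job" point can be illustrated with one dataset served by two purpose-built structures, each cheap for a different access pattern. The names and shapes here are illustrative only, not any specific AWS service.

```python
import bisect

# One copy of "lake" data, two purpose-built projections derived from it.
lake = [{"id": 3, "price": 30}, {"id": 1, "price": 10}, {"id": 2, "price": 20}]

# "Key-value store": optimized for point lookups by id.
kv_index = {row["id"]: row for row in lake}

# "Warehouse-style" sorted projection: optimized for range queries on price.
by_price = sorted(row["price"] for row in lake)

def point_lookup(row_id):
    # Constant-time lookup instead of scanning the lake.
    return kv_index[row_id]

def count_in_price_range(lo, hi):
    # Binary search over the sorted projection instead of a full scan.
    return bisect.bisect_right(by_price, hi) - bisect.bisect_left(by_price, lo)

assert point_lookup(2)["price"] == 20
assert count_in_price_range(15, 35) == 2  # prices 20 and 30
```

Each projection is fast for one workload and poor for the other, which is the trade-off purpose-built engines make at scale.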
And then from a serverless perspective, this has been another big trend for us. We announced a bunch of serverless services at re:Invent, and the goal there is to take away the need to manage infrastructure from customers so they can really focus on driving differentiated business value. Then integrated governance, and then machine learning pervasively, not just as an end product for data scientists, but machine learning built into data warehouses, visualization, and databases. >>So, scalable data lakes: Amazon S3 is really the foundation for this, one of our original services and really the backbone of so much of what we do, with unmatched durability, availability, and scale. There's a huge portfolio of analytics services, both that we offer and that our partners and customers offer, at really arbitrary scale. We've got individual customers with estates in the exabyte range, and many in the hundreds of petabytes, and that's just growing. As I mentioned, we see roughly a 10X increase in data volume every five years, so that's an exponential increase in data volumes. From a purpose-built perspective, it's the right tool for the right job: Redshift for data warehousing, Athena for querying all your data, EMR for managed Spark and Hadoop, OpenSearch for log analytics and search, and then Kinesis and Amazon MSK for Kafka and streaming. And that's been another big trend: real-time data has been exploding, and customers wanting to make sense of that data in real time is another big deal. >>Here are some examples of how we're able to achieve differentiated performance in purpose-built systems.
With Redshift, using managed storage and its latest instance types, there's three X better price performance out there, available to all our customers and partners. In EMR, with things like Spark, we're able to deliver two X the performance of open source with a hundred percent compatibility, and almost three X for Presto. With Graviton2, our new silicon chips on AWS, there's about 10 to 12% better price performance and 20% lower costs, all compatible with open source, so you can drop your jobs in and have them run faster and cheaper. That translates to customer benefits and better margins for partners. From a serverless perspective, this is about simplifying operations, reducing total cost of ownership, and freeing customers from the need to think about capacity management. At re:Invent, we announced serverless Redshift, EMR Serverless, and serverless Kinesis and Kafka, and these are all game changers in terms of freeing our customers and partners from having to think about infrastructure and allowing them to focus on data. >>When it comes to serverless options in analytics, we've really got a very full and complete set, whether that's around data warehousing, big data processing, streaming, cataloging, governance, or visualization. We want all of our customers to have the option to run things serverless, and if they have specialized needs, instances are available as well. So we're really providing a comprehensive deployment model based on the customer's use cases. From a governance perspective, Lake Formation is about easy building and management of data lakes, and this is what enables data sharing and self-service. With Lake Formation you get very granular access controls, so row-level security, simple data sharing, and you can tag data.
So you can tag a group of analysts and the data, and say those analysts only have access to the data that's been tagged with those tags. That allows you to very scalably provide different secure views onto the same data without having to make multiple copies, another big win for customers and partners. We also support transactions on data lakes. >>That means updates and deletes, and time travel. John talked about data as code, and with time travel you can query different versions of your data, so that's a big enabler for those types of strategies. And with Glue, you're able to connect data in multiple places, whether that's accessing data on premises, in other SaaS providers or clouds, as well as data that's on AWS, and all of this is serverless and interconnected. Really, it's about plugging all of your data into the AWS ecosystem and into our partner ecosystem, and these APIs are all available for integration as well. Then from an ML perspective, what we're really trying to do is bring machine learning closer to data. So with our databases, warehouses, lakes, and BI tools, we've infused machine learning throughout, powered by the state-of-the-art machine learning that we offer through SageMaker. >>You've got ML in Aurora, and in Neptune for graphs. You can train machine learning models from SQL directly from Redshift and Athena, and use in-place inference. QuickSight has built-in forecasting and built-in natural-language querying, all powered by machine learning, and the same with anomaly detection. The idea is: how can our systems get smarter and surface the right insights for our customers, so they don't always have to rely on smart people asking the right questions? Really, it's about bringing data back together and making it available for innovation. Thank you very much; I appreciate your attention.
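The time-travel idea (querying an older version of the data) can be sketched with a toy append-only table. This is a conceptual stand-in to show the mechanics, not how governed tables on a data lake are actually implemented.

```python
import copy

class VersionedTable:
    """Toy append-only table: every write creates a new snapshot, and
    queries can target any historical version ("time travel")."""

    def __init__(self):
        self.snapshots = [[]]  # version 0 is the empty table

    def write(self, rows):
        nxt = copy.deepcopy(self.snapshots[-1])
        nxt.extend(rows)
        self.snapshots.append(nxt)
        return len(self.snapshots) - 1  # new version number

    def query(self, version=None):
        if version is None:
            version = len(self.snapshots) - 1  # default: latest view
        return self.snapshots[version]

t = VersionedTable()
v1 = t.write([{"id": 1, "amount": 10}])
v2 = t.write([{"id": 2, "amount": 20}])
assert len(t.query(v1)) == 1  # the table as it looked at version 1
assert len(t.query(v2)) == 2  # the current view
```

Pinning an analysis or a model-training run to a specific version number is exactly the "what version of the world was this trained on" question from earlier in the conversation.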
>>Okay, well done: reinventing the business with AWS analytics. Rahul, that was great, thanks for walking through that. I have to ask you some questions on the end-to-end view of the data; that seems to be a theme, with serverless in there and ML integration, but then you also mentioned picking the right tool for the job. So you've got all these things moving. Simplify it for me: from a business standpoint, how do customers modernize? What are the steps clients are taking with analytics, what's the best practice, and what's the high-order bit here? >>The basic hierarchy is this: historically, legacy systems are rigid and inflexible, and they weren't designed for the scale of modern data or the variety of it. So what customers are finding is that as they move to the cloud, they're moving from legacy systems with punitive licensing into more flexible, modern systems, and that allows them to think about building a decoupled, scalable, future-proof architecture. You've got the ability to combine data lakes and databases and data warehouses and connect them using common APIs and common data protection, and that sets you up to deal with arbitrary scale and arbitrary data types. It allows you to evolve as the future changes, since it makes it easy to add a new type of engine as we invent a better one a few years from now. Then, once you've got your data in the cloud and interconnected in this way, you can build complete pictures of what's going on: you can understand all your touch points with customers, you can understand your complete supply chain, and once you can build that complete picture of your business, you can start to use analytics and machine learning to find new opportunities.
So: think about modernizing, moving to the cloud, setting up for the future, connecting data end to end, and then figuring out how to use that to your advantage. >>As you mentioned, a modern data strategy gives you the best of both worlds. You touched on this briefly, and I want to get a little more insight from you: you mentioned open formats. One of the themes that's come out of some of the interviews with the companies we're hearing from today is open source and the role open is playing. How do you see that integrating in? Because again, this is just like software, right? Open source software, open source data; it seems to be a trend. What does open look like to you, and how do you see it progressing? >>It's a great question. Open operates on multiple dimensions, John, as you point out. There are open data formats, things like JSON and Parquet for analytics. These allow multiple engines to operate on the same data, and that creates option value for customers: if your data is in an open format, you can use it with multiple technologies, and it's future-proofed; you don't have to migrate your data if you're thinking about using a different technology. So that's one piece. Then there's open source software, which is also a big enabler of innovation for customers. You've got things like Spark and Presto, which are popular, and I know some of the startups we're talking about as part of the showcase use these technologies. This allows the whole world to contribute to innovating in these engines and moving them forward together, and we're big believers in that: we've got open source services, we contribute to open source, and we support open source projects. That's another big part of what we do. And then there are open APIs, things like SQL or Python, common ways of interacting with data that are broadly adopted.
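The option value of open formats can be shown with a tiny example: records serialized once in a line-oriented open format (JSON Lines here, standing in for formats like Parquet), then read by two independent "engines" with no migration. The engines are toy functions for illustration only.

```python
import json

# Records serialized once in an open, line-oriented JSON format. Any
# engine that understands the format can read the same bytes.
rows = [{"city": "nyc", "sales": 100}, {"city": "sfo", "sales": 250}]
open_format = "\n".join(json.dumps(r, sort_keys=True) for r in rows)

def engine_a_total(blob):
    """One "engine": sums sales across all rows."""
    return sum(json.loads(line)["sales"] for line in blob.splitlines())

def engine_b_filter(blob, city):
    """A different "engine": filters rows, reading the very same bytes."""
    parsed = [json.loads(line) for line in blob.splitlines()]
    return [r for r in parsed if r["city"] == city]

assert engine_a_total(open_format) == 350
assert engine_b_filter(open_format, "sfo")[0]["sales"] == 250
```

Because neither engine owns the bytes, swapping one engine for a better one later requires no data migration, which is the future-proofing argument made above.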
These, again, create standardization; they make it easier for customers to interoperate and be flexible. So open is really present all the way through, and it's a big part, I think, of the present and the future. >>Yeah, it's going to be fun to watch and see how that grows; there seems to be a lot of traction there. I want to ask you about the other comment I thought was cool. You had the architectural slides out there: one was data lakes built on S3, with a theme of Glue and Lake Formation kind of around S3, and then the constellation of Kinesis, SageMaker, and other things around it, and you said pick the right tool for the right job. Then you had the other slide with analytics at the center, and Redshift and all the other services around it, around serverless. So one was more about the data lake, with Athena, Glue, and Lake Formation, and the other was about serverless. Explain that a little more for me, because I'm trying to understand where that fits. I get the data lake piece: Athena, Glue, and Lake Formation enable it, and then you can pick and choose what you need. On the serverless side, what does analytics at the center mean? >>The idea there is that if you zoom into the analytics use case, everything we offer has a serverless option for our customers. So you could look at the bucket of analytics across things like Redshift or EMR or Athena or Glue and Lake Formation: you have the option to use instances or containers, but also to not worry about infrastructure at all and just think declaratively about the data you want to work with. >>Oh, so basically you're saying analytics is going serverless everywhere. Talking about volumes, you mentioned 10X volumes. What other stats can you share in terms of volumes?
What are people seeing in terms of velocity? I've seen that data warehouses can't move as fast as what we're seeing in the cloud with some of your customers and how they're using data. Do you have any other insights into those volume and velocity numbers? >>Yeah, from a stats perspective, take Redshift, for example: customers are processing, so reading and writing, multiple exabytes of data across Redshift. One of the things we've seen as time has progressed, as data volumes have gone up and data types have exploded, is that data warehouses have gotten more flexible. We've added things like the ability to put semi-structured data and arbitrary nested data into Redshift. We've also seen the seamless integration of data warehouses and data lakes: Redshift was actually one of the first to enable straightforward querying of data that's sitting in a lake, as well as data that's managed on a stream. Those trends will continue. I think you'll keep seeing this need to query data wherever it lives, and to allow lakes, warehouses, and purpose-built stores to interconnect. >>One of the things I liked about your presentation was the theme of modernize, unify, innovate. We've been covering a lot of companies that have been, I won't say stumbling, but getting to the future, some faster than others, and they all seem to get stuck in the same spot: the silos. Breaking down the silos, getting into the data lakes, and blending in the purpose-built data stores. They get stuck there because they're so used to silos and their teams, and that's holding back the machine learning side of it, because machine learning can't do its job if it doesn't have access to all the data.
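Rahul's point about loading semi-structured, nested data into a warehouse usually involves flattening it into columns. Here is a minimal sketch; the dotted column-naming convention is an assumption for the example, not how Redshift's nested-data support actually names things.

```python
def flatten(record, prefix=""):
    """Flatten a nested, semi-structured record into flat column names,
    the kind of shape a warehouse table expects."""
    out = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, joining path segments with dots.
            out.update(flatten(value, prefix=name + "."))
        else:
            out[name] = value
    return out

event = {"order_id": 7, "customer": {"id": 42, "region": "emea"}, "total": 99.5}
row = flatten(event)
assert row == {"order_id": 7, "customer.id": 42,
               "customer.region": "emea", "total": 99.5}
```

The same idea, applied at ingestion time, is what lets arbitrarily nested JSON land in tabular storage without hand-written schemas for every event shape.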
And that's where we're seeing machine learning become this new iterative model, where the models are coming in faster. So silo busting is an issue. What's your take on this part of the equation? >>There are a few things at play here. You're absolutely right: the transition from siloed data to interconnected data isn't always straightforward, and it operates on a number of levels. You want to have the right technology, so we enable things like queries that can span multiple stores. You want good governance that can connect across multiple stores. Then you need to be able to get data in and out of these things, and Glue plays that role. So there's that interconnection on the technical side. But the other piece is organizational: how do you organize, how do you define who owns data and when they share it? One of the keys to enabling that sharing is to think through the processes that need to be put in place and to create the right incentives in your company for data sharing. And then the foundational piece is good guardrails. It can be scary to open data up, and the key is to put good governance in place, where you can ensure that data can be shared and distributed while remaining protected and adhering to the privacy, compliance, and security regulations you have for it. Once you can assert that level of protection, you can set that data free, and that's when customers really start to see the benefits of connecting it all together. >>Right. And we have a batch of startups here on this episode doing a lot of different things. Some have new lakes forming, like observability lakes; you have SQL innovation on the front end; data-tiering innovation at the data tier; just a ton of innovation around this new data as code.
As an executive at AWS, you're enabling all this. Where's the action going? Where are the white spaces? Where are the opportunities as this architecture continues to grow and get traction, given the relevance of machine learning and AI, with apps now embedding data in there as code? Where are the opportunities for these startups, and how can they continue to grow? >>The opportunity is amazing, John. We talked a little bit about this at the beginning, but there is no slowdown in sight for the volume of data we're generating: pretty much everything we have, whether it's a watch or a phone or the systems we interact with, is generating data. We talk a lot about the things that will stay the same over time: data volumes will continue to go up, customers are going to want to keep analyzing that data to make sense of it, they're going to want to do it faster and more cheaply than they could yesterday, and they're going to want to make decisions and innovate in shorter cycles and run more experiments than they were able to before. >>And they're always going to want this data to be secure and well-protected. So as long as we, and the startups we work with, can continue to push on making these things better, investments in these areas will just pay off: can I deal with more data, can I deal with it more cheaply, can I make it easier to get insight, and can I maintain a super high bar in security? The demand side of this equation is in a great place, given what we're seeing in terms of the data and the architectures that are forming. >>I also love your comment about ML integration being the last leg of the journey, because that ML enablement solves a lot of problems.
People can see benefits from good machine learning, and AI is creating opportunities. You also mentioned the end-to-end security piece; data and security are going hand in hand these days, and not just the governance and compliance stuff, we're talking real security. So machine learning integration kind of connects all of this. What does it all mean for customers? >>For customers, it means that by enabling themselves to use machine learning to make sense of data, they're able to find patterns that represent new opportunities more quickly than ever before, and they're able to do it dynamically. In a prior version of the world, we'd have relatively rigid systems, and then we'd have to improve them. With machine learning, this can be dynamic and near real time, and you can customize it. So it represents an opportunity to deepen relationships with customers, create more value, and find more efficiency in how businesses are run. And your ideas around data as code really come into play here, because machine learning needs to be repeatable and explainable, and that means versioning and keeping track of everything you've done from a code, data, and training perspective. >>And the data sets keep updating the machine learning: you've got data sets growing, and they become code modules that can be reused and interrogated. Security is a big theme with data; it's seen as one of the top use cases, certainly in this day and age, with a lot of breaches and hacks coming in and being defended. It brings up data-as-code security as a good proxy for where this is going. What's your take and your reaction to that? >>On security, we can never invest enough.
One of the things that guides us at AWS is that security, availability, and durability are jobs one, two, and three, and it operates at multiple levels. You need to protect data at rest with encryption, good key management, and good practices there. You need to protect data on the wire. You need to have a good sense of what data is allowed to be seen by whom. And then you need to keep track of who did what, and be able to verify and come back and prove that only the things that were allowed to happen actually happened. You can then use machine learning on top of all of this apparatus to detect things that are happening that shouldn't be happening, in near real time, so you can put a stop to them. So I don't think any of us can ever invest enough in securing and protecting our data and our systems. It's really fundamental to earning customer trust, and it's just good business, so it's absolutely crucial. We think about it all the time and are always looking for ways to raise the bar. >>Well, I really appreciate you taking the time to give the keynote. A final word here for the folks watching: a lot of the startups presenting today are doing well business-wise, they're being used by large enterprises, and customers are implementing more and more of these hot startups' products because they're relevant. What's your advice to the customer out there as they go on this journey, this new data as code, this new future of analytics? What's your recommendation? >>For customers who are out there, I recommend you take a look at what the startups on AWS are building. There's tremendous innovation and energy, and there's really great technology being built on top of a rock-solid platform.
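The near-real-time detection Rahul describes (flagging activity that shouldn't be happening) can be sketched with a simple statistical check over an access log. Real systems use far richer models, so treat this as a minimal stand-in for the idea.

```python
import statistics

def anomalous(history, new_value, threshold=3.0):
    """Flag a new observation if it sits more than `threshold` standard
    deviations from the historical mean: a minimal stand-in for the
    kind of near-real-time detection described above."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # Perfectly uniform history: any deviation at all is suspicious.
        return new_value != mean
    return abs(new_value - mean) / stdev > threshold

# Historical count of records read per hour by one principal.
history = [100, 110, 95, 105, 98, 102, 99, 101]
assert not anomalous(history, 104)  # normal access pattern
assert anomalous(history, 5000)     # sudden bulk read: flag it
```

Feeding each audit-log event through a check like this is how the "detect it, then put a stop to it" loop closes in near real time.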
So I encourage customers thinking about it to lean forward, to think about new technology and embrace it: move to the cloud, modernize, build a single picture of your data, and figure out how to innovate and win. >>Well, thanks for coming on. Appreciate your keynote, thanks for the insight, and thanks for the conversation. Let's hand it off to the show. Let the show begin. >>Thank you, John. A pleasure, as always.

Published Date : Apr 5 2022


SENTIMENT ANALYSIS :

ENTITIES

| Entity | Category | Confidence |
| --- | --- | --- |
| Rahul Pathak | PERSON | 0.99+ |
| John | PERSON | 0.99+ |
| 20 terabytes | QUANTITY | 0.99+ |
| AWS | ORGANIZATION | 0.99+ |
| 2013 | DATE | 0.99+ |
| 20% | QUANTITY | 0.99+ |
| yesterday | DATE | 0.99+ |
| two | QUANTITY | 0.99+ |
| S3 | TITLE | 0.99+ |
| Python | TITLE | 0.99+ |
| FINRA | ORGANIZATION | 0.99+ |
| 10 X | QUANTITY | 0.99+ |
| Amazon | ORGANIZATION | 0.99+ |
| hundred percent | QUANTITY | 0.99+ |
| SQL | TITLE | 0.98+ |
| both | QUANTITY | 0.98+ |
| One | QUANTITY | 0.98+ |
| 80 minutes | QUANTITY | 0.98+ |
| each shift | QUANTITY | 0.98+ |
| one piece | QUANTITY | 0.98+ |
| about 80% | QUANTITY | 0.98+ |
| Neptune | LOCATION | 0.98+ |
| one | QUANTITY | 0.98+ |
| Pinterest | ORGANIZATION | 0.98+ |
| today | DATE | 0.97+ |
| QuickSight | ORGANIZATION | 0.97+ |
| three | QUANTITY | 0.97+ |
| Redshift | TITLE | 0.97+ |
| wave of reinvention | EVENT | 0.97+ |
| first | EVENT | 0.96+ |
| hundreds of petabytes | QUANTITY | 0.96+ |
| HANA | TITLE | 0.96+ |
| first | QUANTITY | 0.95+ |
| both worlds | QUANTITY | 0.95+ |
| Aurora | LOCATION | 0.94+ |
| Amex | ORGANIZATION | 0.94+ |
| SAS | ORGANIZATION | 0.94+ |
| pandemic | EVENT | 0.94+ |
| 12% | QUANTITY | 0.93+ |
| about 10 | QUANTITY | 0.93+ |
| past couple of years | DATE | 0.92+ |
| Kafka | TITLE | 0.92+ |
| Kinesis | ORGANIZATION | 0.92+ |
| Liccardo | TITLE | 0.91+ |
| EMR | TITLE | 0.91+ |
| about five | QUANTITY | 0.89+ |
| tens of thousands of nodes | QUANTITY | 0.88+ |
| Kinesis | TITLE | 0.88+ |
| 10% | QUANTITY | 0.87+ |
| three X | QUANTITY | 0.86+ |
| Athena | ORGANIZATION | 0.86+ |
| about 250 billion records per | QUANTITY | 0.85+ |
| U S | ORGANIZATION | 0.85+ |
| CAFCA | ORGANIZATION | 0.84+ |
| Silicon | ORGANIZATION | 0.83+ |
| every five years | QUANTITY | 0.82+ |
| Season two | QUANTITY | 0.82+ |
| Athena | OTHER | 0.78+ |
| single picture | QUANTITY | 0.74+ |

Rahul Pathak, AWS | AWS re:Invent 2021


 

>>Hey, welcome back everyone. We're live here in the Cube in Las Vegas at AWS re:Invent 2021. I'm John Furrier, your host. We're in person this year; it's a hybrid event, online as well. Great action going on. Rahul Pathak, Vice President of AWS analytics, is here. Rahul, it's great to see you. Thanks for coming on. >>It's great to be here, John. Thanks for having me again. >>Um, so you've got a really awesome job. You've got serverless, you've got analytics. You're in the middle of all the action for AWS. What's the big news? What are you guys announcing? What's going on? >>Yeah, well, it's been an awesome re:Invent for us. Uh, we've had a number of serverless analytics launches. So Redshift, our petabyte-scale data warehouse, EMR for open source analytics. Uh, and then we've also had, uh, managed streaming for Kafka go serverless, and then on-demand for Kinesis. And then a couple of other big ones. We've got row and cell-based security for AWS Lake Formation, so you can get really fine-grained controls over your data lakes, and then ACID transactions. You can actually have inserts, updates and deletes on data lakes, which is a big step forward. >>Uh, so Swami was on stage in the keynote; he's actually finishing up now. But even last night I saw him in the hallway. We were talking as much about AI. Of course, he's got the AI title, but AI is the outcome. It's the application of all the data, and this is a new architecture. He said on stage just now, like, hey, it's not about the old databases from the nineties, right? There are multiple data stores now available, and the unification is the big trend. And he said something interesting: governance can be an advantage, not an inhibitor. This is kind of this new horizontally scalable, um, kind of idea that enables the vertical specialization around machine learning to be effective. It's not a new architecture, but it's now becoming more popular. People are realizing it.
So share your thoughts on this whole, not a shift, but an acceleration of horizontally scalable and vertically integrated. >>Yeah, no, I think the way Swami put it is exactly right. What you want is the right tool for the right job, and you want to be able to deliver that to customers, so you're not compromising on performance or functionality at scale, but then you want all of these to be interconnected. So they're well-integrated, you can stay in your favorite interface and take advantage of other technologies. So you can have things like Redshift integrated with SageMaker, so you get analytics and machine learning. And then Swami's absolutely right: governance is actually an enabler of velocity. Once you've got the right guardrails in place, you can actually set people free, because they can innovate. You don't have to be in the way, but you know that your data is protected and it's being used in the way that you expect, by the people that you are allowing to use that data. And so it becomes a very powerful way for customers to set data free. And then, because things are elastic and serverless, uh, you can really just match capacity with demand. And so as you see spikes in usage, the system can scale out; as those dwindle, it can scale back down, and it just becomes a very efficient way for customers to operate with data at scale. >>Every year at re:Invent there's kind of a pinch-me moment. It's like, wow, that's really good technology. Oh my God, it's getting easier and easier. As infrastructure as code becomes more programmable, it's becoming easier: more Lambda, more serverless action. Uh, you've got new offerings. How are customers benefiting, for instance, from the three new offerings that you guys announced here? What specifically is the value proposition that you guys are putting out there?
>>Yeah, so, um, you know, as we've tried to do with AWS over the years, customers get to focus on the things that really differentiate them and differentiate their businesses. So with Redshift Serverless, for example, we take away all of the work that's needed to manage clusters, provision them, scale them, optimize them. Uh, that's all been automated and made invisible, freeing customers to think about data: what they want to do with it, what insights they can derive from it. And they know they're getting the most efficient infrastructure possible to make that a reality for them, with high performance and low cost. So, uh, better results, more ability to focus on what differentiates their business, and a lower cost structure over time. >>Yeah. I had the Accenture guys on, it's interesting. "Cloud continuum" is their word for what Adam was saying, that cloud's everywhere. And they're saying it's faster to match what you want to do with the outcomes, that capabilities and outcomes are kind of merging together, where it's easy to say, this is what we want to do, and here's the outcome that supports it. With that, what are some of the key trends on those outcomes that you see with data analytics that are most popular right now? And kind of where's that, where's that going? >>Yeah. I mean, I think what we've seen is that data's just becoming more and more critical and top of mind for customers, and, uh, you know, the pandemic has also accelerated that. We found that customers are really looking to data and analytics and machine learning to find new opportunities. How can they, uh, really expand their business, take advantage of what's happening? And then the other part is, how can they find efficiencies? And so, um, really everything that we're trying to do, we're trying to connect to business outcomes for customers. How can you deepen your relationship with your customers?
How can you create new customer experiences, and how can you do that more efficiently, uh, with more agility, and take advantage of, uh, the ability to be flexible in what is a very unpredictable world, as we've seen. >>I noticed a lot of purpose-built discussion going on in the keynote with Swami as well. How are you creating this next layer of what I call purpose-built, platform-like features? I mean, tools are great. You see a lot of tools in the data market, but if your tool is a hammer, everything looks like a nail. We see people buy too many tools when what you ultimately have is a platform. This seems to be a new trend, this connect phenomenon, where you've got platform capabilities that people can build on top of, because there's a huge ecosystem of data tools out there that you guys have as partners that want to snap together. So the trend is things are starting to snap together: fewer primitives, less roll-your-own, which you can still do, but there are now easier ways. Take me through that. Explain that, unpack that phenomenon. Rolling your own has been the way until now; here's some prefabricated software, go. >>Yeah. Um, it's a great observation, and you're absolutely right. I mean, I think there are some customers that want to roll their own, and they'll start with instances, they'll install software, they'll write their own code, build their own bespoke systems. And, uh, we provide what those customers need to do that. But I think increasingly you're starting to see these higher-level abstractions that take away all of that detail, as Adam put it, and allow customers to compose them. And we think it's important when you do that, uh, to be modular. So customers don't have to take these big-bang, all-or-nothing approaches. You can pick what's appropriate, uh, but you're never at a dead end. You can always evolve and scale as you need to.
And then you want to bring these ideas of unified governance and cohesive interfaces across, so that customers find it easy to adopt the next thing. And so you can start off, say, with batch analytics, you can expand into real time, you can bring in machine learning and predictive capabilities, you can add natural language, and it's a big ecosystem of managed services as well as third parties and partners. >>And what's interesting, I want to get your thoughts while I've got you here, because I think this is such an important trend and historic moment in time. Jerry Chen, one of the smartest VCs that we know, from Greylock, coined "castles in the cloud," which kind of came out of a Cube conversation here years ago, where we saw the movement that someone's going to build real value on AWS, not just an app. And you see the rise of the Snowflakes and Databricks and other companies. And he was pointing out that you can get a very narrow wedge, get a position with these platforms, build on top of them, and then build value. And I think that's, uh, the number one question people ask me. It's like, okay, how do I build value on top of these analytics packages? So if I'm a startup or I'm a big company, I also want to leverage these high-level abstractions and build on top of them. How do you talk about that? How do you explain that? Because that's what people kind of want to know: okay, is it enabling me, or do I have to fend for myself later? This is kind of, it comes up a lot. >>That's a great question. And, um, you know, if you saw, uh, Goldman's announcement this week, which is about building their cloud on top of AWS, it's a great example of using our capabilities in terms of infrastructure and analytics and machine learning to really allow them to take what's value-added about Goldman and their position in financial markets, to build something value-add and create a ton of value for Goldman, uh, by leveraging the things that we offer.
And to us, that's an ideal outcome, because it's a win-win for us and Goldman, but it's also a win for Goldman and their customers. >>That's what we call the supercloud; that's the opportunity. So are there a lot of Goldman-like opportunities out there? Is it just these unicorns, these giants? I mean, that's Goldman Sachs, they're huge. Is this open to everybody? >>Absolutely. I mean, that's been one of the, uh, you know, one of the core ideas behind AWS: we wanted to give anybody, any developer, access to the same technology that the world's largest corporations had. And, uh, that's what you have today. The things that Goldman uses to build that cloud are available to anybody, and you can start for a few pennies and scale up, uh, you know, into the petabytes and beyond. >>When I was talking to Adam Selipsky, when I met with him prior to re:Invent, I noticed that he definitely had an affinity towards data. Obviously he's Amazonian, but he spent time at Tableau running that company, so you see that kind of mindset of the data advantage. So I have to ask you, because it's something that I've been talking about for a while and I'm waiting for it to emerge, but I'm not sure it's going to happen yet. What infrastructure as code was for DevOps and then DevSecOps, there's almost a DataOps developing, where data as code, or programmable data, if I can connect the dots of what Swami's saying and what you're doing, is like a new horizontal layer of freely available data with some governance built in. Right? So data's being baked into everything. Data is an ingredient, not a query to some database; it's got to be baked into the apps. That's data as code, right? So it's almost a data DevOps kind of vibe. >>Yeah, no, you're absolutely right. And you know, you've seen it with things like MLOps and so on. It's all a special case of DevOps.
But what you're really trying to do is to get programmatic and systematic about how you deal with data. And it's not just data that you have; it's also publicly available data sets, and it's customers sharing with each other. So, building the ecosystem around data, we've got things like our Open Data program, where we've got publicly hosted data sets, or things like AWS Data Exchange, where customers can actually monetize data. So it's not just data as code, but now data as a monetizable asset. So it's a really exciting time to be in the data business. >>Yeah, and I think it says so much, too. So I've got to ask you while I've got you here, since you're an expert. Um, okay, here's my problem. I have a lot of data. I'm nervous about it. I want to secure it. But if I secure it, I'm not making it available, and I want to feed the machine learning. How do I create an architecture where I can make it freely available, but yet maintain the control and the comfort that this is going to be secure? So what products do I buy? >>Yeah. So, uh, you know, a great place to start is S3. Um, you know, it's one of the best places for data lakes, for all the reasons we talked about: durability, scale, cost. You can then use Lake Formation to really protect and govern that data, so you can decide who's allowed to see it and what they're allowed to see, and you don't have to create multiple copies. So you can define that, you know, this group of partners can see A, B and C; this group can see D, E and F, and the system enforces that. And you have a central point of control where you can monitor what's happening, and if you want to change your mind, you can do that instantly, and all access can be locked down. You've got a variety of encryption capabilities with things like KMS. And so you can really lock down your data, but yet keep it open to the parties that you want, and give them specifically the access that you want to give them.
And then once you've done that, they're free to use that data, according to the rules that you defined, with the analytics tools that we offer, to go drive value and create insight. >>That's Lake Formation, and then you've got Athena querying. Yes, all kinds of tooling on top of it. >>That's right. You can have, uh, Athena query your data in S3, with Lake Formation protecting it. And then SageMaker is integrated with Athena, so you can pull that data into SageMaker for machine learning, interrogate that data using natural language with things like QuickSight Q, like we demoed. So just a ton of power, without having to really think too deeply about, uh, developing expert skill sets in this. >>So the next question I want to ask you, and that first part was a great description, thank you very much: now 5G and the edge are here, Outposts. How does analytics play as edge becomes more pervasive in the architecture? >>Yeah, it's going to be a key part of this ecosystem, and it's really a continuum. So, uh, you know, we find customers are collecting data at the edge. They might be making local ML or inference-type decisions on edge devices or, you know, automobiles, for example. Uh, but typically that data at some point will come back into the cloud, into S3, will be used to do heavy-duty training, and then those models get pushed back out to the edge. And then some of the things that we've done in Athena, for example, with federated query: as long as you have a network path and you can understand what the data format or the database is, you can actually run a query on that data. So you can run real-time queries on data wherever it lives, whether it's on an edge device, on an Outpost, in a Local Zone or in your cloud region, and combine all of that together in one place. >>Yeah. And I think having data copies everywhere is a big deal.
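Rahul's description of the Lake Formation model, one copy of the data with per-group grants ("this group of partners can see A, B and C; this group can see D, E and F") enforced at a central point, can be sketched as a toy access check. This is only an illustration of the idea, not the Lake Formation API; the group names, column names and data here are hypothetical.

```python
# Toy sketch of group-based column grants in the style Rahul describes:
# a single copy of the data, with per-group column permissions enforced
# at read time from one central policy. All names are hypothetical.

GRANTS = {
    "partners_group_1": {"a", "b", "c"},   # may see columns a, b, c
    "partners_group_2": {"d", "e", "f"},   # may see columns d, e, f
}

TABLE = [
    {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6},
    {"a": 7, "b": 8, "c": 9, "d": 10, "e": 11, "f": 12},
]

def read_table(group: str) -> list:
    """Return only the columns the group was granted, from the single copy."""
    allowed = GRANTS.get(group, set())   # unknown groups get nothing
    return [{col: row[col] for col in row if col in allowed} for row in TABLE]

print(read_table("partners_group_1"))  # rows with columns a, b, c only
print(read_table("partners_group_2"))  # rows with columns d, e, f only
```

Because the policy lives in one place (`GRANTS`), "changing your mind instantly" is a one-line edit rather than re-copying or re-exporting data, which is the point Rahul is making about central governance.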
I've got to ask you, now that we're here at re:Invent, what's your take? We're back in person; last year was all virtual. Finally. Not 60,000 people like a couple of years ago, but still 27,000 people here, all lining up for the sessions, all having a great time. Um, all good. What's the most important story from your area that people should pay attention to? What's the headline, what's the top news? >>Yeah, so I think first off, it is awesome to be back in person. It's just so fun to see customers, and, I mean, we've been meeting here over the years, and there's so much energy in person. It's been really nice. Uh, you know, I think from an analytics perspective, there's just been a ton of innovation. The core idea for us is we want to make it easy for customers to use the right tool for the right job, to get insight from all of their data as cost-effectively as possible. And I think, uh, you know, if customers walk away thinking it's now easier than ever to take advantage of everything that AWS has to offer, to make sense of all the data they're generating and use it to drive business value, then I think we'll have done our jobs right. >>What's the coolest thing that you're seeing here? Is it the serverless innovation? Is it, um, the new abstraction layer with high-level data services, in your mind? What's the coolest thing? >>It's hard to pick the coolest; it's like picking candies. I mean, I think the, uh, you know, the continued innovation in terms of, uh, performance and functionality in each of our services is a big deal. I think serverless is a game changer for customers. Uh, and then I think really the infusion of machine learning throughout all of these systems, so things like Redshift ML, Athena ML, QuickSight Q, just really enabling new experiences for customers, uh, in a way that's easier than it ever has been.
And I think that's a big deal, and I'm really excited to see what customers do with it. >>Yeah. And on the performance side, to me the coolest thing I'm seeing is the Graviton3 progression, the custom stacks with all this ease of use. It's just going to be a real performance advantage, and the costs are getting lowered. So I think the EC2 instances around compute are phenomenal. >>Absolutely. I mean, I think the hardware and silicon innovation is huge, and it's not just performance; it's also the energy efficiency. It's a big deal for the future, really. >>We're at an inflection point where these modern applications are being built. And in my history, I'm old, my birthday is today, I'm in my fifties. So I remember back in the eighties: every major inflection point, when there was a shift in how things were developed, from mainframe to client-server, PC, internetworking, you name it, every time the apps changed, the app owners and app developers all went to the best platform and processing. And so I think, you know, that idea of system software and applications being bundled together, um, is a losing formula. I think you've got to have that decoupling at large scale; we're seeing that with cloud. And I think now, if I'm an app developer, whether I'm a large ISV in your ecosystem, or an APN partner, or a startup, I'm going to go where my software runs the best, period, and where I can create value. That's right, I get distribution, I create value, and it runs fast. I mean, it's pretty simple. So I think the ecosystem is going to be a big action area for the next couple of years. >>Absolutely right. And I mean, the ecosystem's huge, and, um, we're also grateful to have all these partners here. It's a huge deal for us, and I think it really matters for customers. >>What's on your roadmap this year? What have you got going on?
What can you share, a little bit of trajectory, without, uh, breaking the rules of Amazonian confidentiality? Um, what's the focus for the year? What's next? >>Well, you know, we're always talking to customers, and, uh, I think we're going to make things better, faster, cheaper, easier to use. And, um, I think you've seen some of the things that we're doing with integration; you'll see more of that. And, uh, really the goal is, how can customers get value as quickly as possible, at as low a cost as possible? That's what we're aiming at, for the long term. >>Yeah. We always say, every time we see each other, data is at the center of the value proposition. I've been saying that for 10 years now; it's actually the value proposition powering AI. And because of it, you're seeing the rise of superclouds; the superclouds are emerging. I think you guys are the underpinnings of these emerging superclouds, and so it's a huge trend; the Goldman Sachs thing is the validation. So again, the more data, the better. Cool things happening. >>It is, it's just everywhere. And the, uh, the diversity of use cases is amazing. I mean, I think from, you know, the Australian swimming team, to, uh, Formula One, to NASDAQ, it's just incredible to see what our customers do. >>Great stuff. Good to see you. Thanks for coming on the Cube. >>Pleasure to be here as always, John. Great to see you. Thank you. >>Thanks for sharing. Data is the key to success; data is the value proposition. You've seen the rise of superclouds because of the data advantage. If you can expose it, protect it and govern it, it unleashes creativity and opportunities for entrepreneurs and businesses. Of course, you've got to have the scale and the price performance. This is the Cube's coverage. You're watching the leader in worldwide tech coverage, here in person for AWS re:Invent 2021. I'm John Furrier.
Thanks for watching.

Published Date : Dec 1 2021


SENTIMENT ANALYSIS :

ENTITIES

| Entity | Category | Confidence |
| --- | --- | --- |
| David | PERSON | 0.99+ |
| Goldman | ORGANIZATION | 0.99+ |
| Rahul Pathak | PERSON | 0.99+ |
| AWS | ORGANIZATION | 0.99+ |
| John | PERSON | 0.99+ |
| Goldman Sachs | ORGANIZATION | 0.99+ |
| Adam | PERSON | 0.99+ |
| Jerry chin | PERSON | 0.99+ |
| NASDAQ | ORGANIZATION | 0.99+ |
| Athena | LOCATION | 0.99+ |
| Jeffrey | PERSON | 0.99+ |
| 2021 | DATE | 0.99+ |
| 60,000 people | QUANTITY | 0.99+ |
| 27,000 people | QUANTITY | 0.99+ |
| 10 years | QUANTITY | 0.99+ |
| last year | DATE | 0.99+ |
| John ferry | PERSON | 0.99+ |
| today | DATE | 0.99+ |
| three | QUANTITY | 0.98+ |
| Kafka | TITLE | 0.98+ |
| Swami | PERSON | 0.98+ |
| one | QUANTITY | 0.98+ |
| first | QUANTITY | 0.98+ |
| ADF analytics | ORGANIZATION | 0.97+ |
| Amazonia | ORGANIZATION | 0.97+ |
| eighties | DATE | 0.97+ |
| Pixar | ORGANIZATION | 0.97+ |
| fifties | QUANTITY | 0.97+ |
| each | QUANTITY | 0.97+ |
| three new offerings | QUANTITY | 0.96+ |
| Redshift | TITLE | 0.96+ |
| first part | QUANTITY | 0.96+ |
| this year | DATE | 0.96+ |
| last night | DATE | 0.95+ |
| Lipski | PERSON | 0.94+ |
| couple of years ago | DATE | 0.94+ |
| next couple of years | DATE | 0.94+ |
| Sage | ORGANIZATION | 0.93+ |
| Goldmans | ORGANIZATION | 0.92+ |
| pandemic | EVENT | 0.92+ |
| this week | DATE | 0.92+ |
| Databricks | ORGANIZATION | 0.9+ |
| one place | QUANTITY | 0.9+ |
| mark | PERSON | 0.88+ |
| Supercloud | ORGANIZATION | 0.87+ |
| Tableau | ORGANIZATION | 0.84+ |
| S3 | TITLE | 0.84+ |
| 5g | QUANTITY | 0.8+ |
| Athena ML | ORGANIZATION | 0.78+ |
| Athena | ORGANIZATION | 0.78+ |
| Raiders | ORGANIZATION | 0.77+ |
| years ago | DATE | 0.75+ |
| nineties | DATE | 0.74+ |
| ML | ORGANIZATION | 0.73+ |
| Adams | PERSON | 0.73+ |
| two instances | QUANTITY | 0.72+ |
| Lambda | TITLE | 0.71+ |
| Thena | ORGANIZATION | 0.71+ |
| SageMaker | TITLE | 0.66+ |
| Vegas | LOCATION | 0.66+ |
| Invent | EVENT | 0.64+ |
| Vice | PERSON | 0.63+ |
| graviton three | OTHER | 0.62+ |
| Australia | LOCATION | 0.59+ |
| Las | ORGANIZATION | 0.59+ |
| Kinesis | TITLE | 0.57+ |
| Amazonian | ORGANIZATION | 0.56+ |

Rahul Pathak, AWS | AWS re:Invent 2020


 

>>From around the globe, it's the Cube, with digital coverage of AWS re:Invent 2020, sponsored by Intel and AWS. Welcome back to the Cube's ongoing coverage of AWS re:Invent. The Cube has gone virtual, along with most events these days, and continues to bring you our digital coverage of re:Invent. With me is Rahul Pathak, who is the vice president of analytics at AWS. Rahul, it's great to see you again. Welcome, and thanks for joining the program. >>Great to see you too, and always a pleasure. Thanks for having me on. >>You're very welcome. Before we get into your leadership discussion, I want to talk about some of the things that AWS has announced in the early parts of re:Invent. I want to start with Glue Elastic Views, a very notable announcement allowing people to, you know, essentially share data across different data stores. Maybe tell us a little bit more about Glue Elastic Views, kind of where the name came from and what the implication is. >>Uh, sure. So, yeah, we're really excited about Glue Elastic Views, and, you know, as you mentioned, the idea is to make it easy for customers to combine and use data from a variety of different sources and pull it together into one or many targets. And the reason for it is that, you know, we're really seeing customers adopt what we're calling a lake house architecture, which is, uh, at its core, a data lake for making sense of data and integrating it across different silos, uh, typically integrated with a data warehouse, and not just that, but also a range of other purpose-built stores, like Aurora for relational workloads or DynamoDB for non-relational ones. And while customers typically get a lot of benefit from using purpose-built stores, because you get the best possible functionality, performance and scale for a given use case, you often want to combine data across them to get a holistic view of what's happening in your business or with your customers.
And before Glue Elastic Views, customers would have to either use ETL or data integration software, or they'd have to write custom code that could be complex to manage and could be error-prone and tough to change. And so, with Elastic Views, you can now use SQL to define a view across multiple data sources, pick one or many targets, and then the system will actually monitor the sources for changes and propagate them into the targets in near real time. And it manages the end-to-end pipeline and can notify operators if anything changes. And so the, you know, the components of the name are pretty straightforward. Glue is our serverless ETL and data integration service, and Glue Elastic Views is about data integration. They're views because you can define these virtual tables using SQL, and then elastic because it's serverless and will scale up and down to deal with the propagation of changes. So we're really excited about it, and customers are as well. >>Okay, great. So my understanding is I'm going to be able to take what's called, in the parlance, a materialized view, which in my layperson's terms means I'm going to run a query on the database and take that subset, and then I'm going to be able to copy that and move it to another data store. And then you're going to automatically keep track of the changes and keep everything up to date. Is that right? >>Yes, that's exactly right. So you can imagine you had a product catalog, for example, that's being updated in DynamoDB, and you can create a view that will move that to Amazon Elasticsearch Service. You can search through a current version of your catalog, and we will monitor your DynamoDB tables for any changes and make sure those are all propagated in near real time. And all of that is taken care of for our customers as soon as they've defined the view, and the data is kept in sync as long as the view is in effect.
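The catalog example Rahul gives, a source table whose inserts, updates and deletes flow automatically into a searchable copy, can be sketched as a toy in-memory version. This is an illustration of the keep-in-sync idea only; the variable names are hypothetical stand-ins, and the real service watches DynamoDB and writes to Amazon Elasticsearch Service for you rather than requiring application code like this.

```python
# Toy sketch of a "view" kept in sync with its source, in the spirit of
# what Rahul describes: every change to the source table is propagated
# to the target, so the target always reflects the source's current state.

catalog = {}          # stands in for the DynamoDB source table
search_index = {}     # stands in for the Elasticsearch target

def apply_change(key, value):
    """Write to the source, then propagate the same change to the target."""
    if value is None:
        catalog.pop(key, None)
        search_index.pop(key, None)   # deletes propagate too
    else:
        catalog[key] = value
        search_index[key] = value     # inserts and updates propagate

apply_change("sku-1", {"name": "widget", "price": 9.99})
apply_change("sku-2", {"name": "gadget", "price": 19.99})
apply_change("sku-1", {"name": "widget", "price": 7.99})  # price update
apply_change("sku-2", None)                                # delete

print(search_index)  # matches the current state of the catalog
```

The design point is that the application only ever writes to the source; the propagation to the target is the system's job, which is what removes the hand-written pipeline.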
>>I can see this being really valuable for a person who's building what I like to think of as data services or data products that are going to help me, you know, monetize my business. Maybe it's as simple as a dashboard, but maybe it's actually a product, you know, it might be some content that I want to develop. And I've got transaction systems, I've got unstructured data, maybe in a NoSQL database, and I want to actually combine those, build new products, and I want to do that quickly. So take me through what I would have to do. You sort of alluded to it with, you know, a lot of ETL, but take me through in a little bit more detail how I would do that, you know, before this innovation. And maybe you could give us a sense as to what the possibilities are with Glue Elastic Views.
And so you can define that view and sequel. The view will look across multiple sources, and then you pick your destination and then glue. Elastic views essentially monitors both the source for changes as well as the source and the destination for any any issues like, for example, did the schema changed. The shape of the data change is something briefly unavailable, and it can monitor. All of that can handle any errors, but it can recover from automatically. Or if it can't say someone dropped an important table in the source. That was part of your view. You can actually get alerted and notified to take some action to prevent bad data from getting through your system or to prevent your pipeline from breaking without your knowledge and then the final pieces, the elasticity of it. It will automatically deal with adding more resource is if, for example, say you had a spiky day, Um, in the markets, maybe you're building a financial services application and you needed to add more resource is to process those changes into your targets more quickly. The system would handle that for you. And then, if you're monetizing data services on the back end, you've got a range of options for folks subscribing to those targets. So we've got capabilities like our, uh, Amazon data exchange, where people can exchange and monetize data set. So it allows this and to end flow in a much more straightforward way. It was possible before >>awesome. So a lot of automation, especially if something goes wrong. So something goes wrong. You can automatically recover. And if for whatever reason, you can't what happens? You quite ask the system and and let the operator No. Hey, there's an issue. You gotta go fix it. How does that work? >>Yes, exactly. Right. So if we can recover, say, for example, you can you know that for a short period of time, you can't read the target database. The system will keep trying until it can get through. But say someone dropped a column from your source. 
That was a key part of your ultimate view and destination: you just can't proceed at that point. So the pipeline stops, and then we notify using APIs or an SNS alert so that programmatic action can be taken. So this effectively provides a really great way to enforce the integrity of data that's moving between the sources and the targets. >>All right, make it kindergarten-proof. So let's talk about another innovation. You guys announced QuickSight Q, kind of speaking to the machine in natural language. Give us some more detail there. What is QuickSight Q, and how do I interact with it? What kind of questions can I ask it? >>So QuickSight Q is essentially a deep-learning-based semantic model of your data that allows you to ask natural language questions in your dashboard. So you'll get a search bar in your QuickSight dashboard, and QuickSight is our serverless BI service that makes it really easy to provide rich dashboards to whoever needs them in the organization. And what Q does is it automatically develops relationships between the entities in your data, and it's able to actually reason about the questions you ask. So unlike earlier natural language systems, where you have to predefine your models and predefine all the calculations that you might ask the system to do on your behalf, Q can actually figure it out. So you can say, "Show me the top five categories for sales in California," and it'll look in your data and figure out what that is, and it will present you with how it parsed that question and then, in seconds, pop up a dashboard of what you asked, and actually automatically try and pick a chart or visualization for that data that makes sense. And you can then start to refine it further and say, "How does this compare to what happened in New York?" And it'll be able to figure out that you're trying to overlay those two data sets, and it'll add them.
And unlike other systems, it doesn't need to have all of those things predefined. It's able to reason about it because it's building a model of what your data means on the fly, and we've pre-trained it across a variety of different domains, so you can ask a question about sales, or HR, or any of that. And another great part of Q is that when it presents to you what it's parsed, you're actually able to correct it if it needs it and provide feedback to the system. So, for example, if it got something slightly off, you could actually select from a drop-down, and then it will remember your selection for the next time, and it will get better as you use it. >>I saw a demo in Swami's keynote on December 8 where you were basically able to ask QuickSight Q the same question but in different ways, you know, like "compare California and New York," and then the data comes up, or "give me the top five," and then for California and New York, the same exact data. So is that how I can check and see if the answer that I'm getting back is correct: ask different questions? I don't have to know the schema, is what you're saying. As the user, I can triangulate from different angles and then look and see if that's correct. Is that how you verify, or are there other ways? >>So that's one way to verify. You could definitely ask the same question a couple of different ways and ensure you're seeing the same results. Another option would be to, you know, potentially click and drill and filter down into that data through the dashboard. And then the other step would be at data ingestion time: typically, data pipelines will have some quality controls. But when you're interacting with Q, I think the ability to ask the question multiple ways and make sure that you're getting the same result is a perfectly reasonable way to validate.
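The parse-then-aggregate flow Rahul describes can be illustrated with a toy. Q itself uses a learned semantic model of the data; this regex "parser" and the sample table are invented purely to show the shape of the flow, question in, aggregation out.

```python
import re
from collections import defaultdict

# Toy stand-in for the kind of question Q answers. The grammar,
# column names, and data are all made up for this illustration.

rows = [
    {"category": "toys",  "state": "CA", "sales": 120},
    {"category": "books", "state": "CA", "sales": 80},
    {"category": "toys",  "state": "NY", "sales": 50},
    {"category": "games", "state": "CA", "sales": 200},
]

def answer(question, data):
    """Parse 'top N <group> for <metric> in <state>' and run it."""
    m = re.match(r"top (\d+) (\w+) for (\w+) in (\w+)", question.lower())
    n, group, metric, state = (int(m.group(1)), m.group(2),
                               m.group(3), m.group(4).upper())
    totals = defaultdict(int)
    for row in data:
        if row["state"] == state:
            totals[row[group]] += row[metric]
    return sorted(totals.items(), key=lambda kv: -kv[1])[:n]

result = answer("Top 2 category for sales in CA", rows)
```

Asking the same thing a different way ("top 2 category for sales in ca") returns the same result, which is the cross-checking trick discussed above.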
>>You know what I like about that answer you just gave, and I wonder if I could get your opinion on this, because you've been in this business for a while and you work with a lot of customers: if you think about our operational systems, you know, things like sales or ERP systems, we've contextualized them. In other words, the business lines have injected context into the system. I mean, they kind of own it, if you will. They "own" the data, and I put that in quotes, but they do; they feel like they're responsible for it. There's not this constant argument, because it's their data. It seems to me that if you look back at the last 10 years, a lot of the data architecture has been sort of genericized. In other words, the experts, whether it's the data engineer or the quality engineer, don't really have the business context. But the example that you just gave, of the drill-down to verify that the answer is correct: it seems to me, just in listening again to Swami's keynote the other day, that you're really trying to put data in the hands of business users who have the context and the domain knowledge. And that seems to me to be a change in mindset that we're going to see evolve over the next decade. I wonder if you could give me your thoughts on that change in the data architecture, the data mindset. >>Dave, I think you're absolutely right. I mean, we see this across all the customers that we speak with: there's an increasing desire to get data broadly distributed into the hands of the organization in a well-governed and controlled way. But customers want to give data to the folks that know what it means and know how they can take action on it to do something for the business, whether that's finding a new opportunity or looking for efficiencies.
And I think, you know, we're seeing that increasingly, especially given the unpredictability that we've all gone through in 2020. Customers are realizing that they need to get a lot more agile, and they need a lot more data about their business and their customers, because you've got to find ways to adapt quickly. And you know, that's not going to change anytime in the future. >>And I've said many times on theCUBE, the technology industry used to be all about the products, and in the last decade it was really platforms, whether it's SaaS platforms or AWS cloud platforms. And it seems like innovation in the coming years, in many respects, is going to come from the ecosystem and the ability to share data; we've had some examples today. But you hit on, you know, one of the key challenges, of course: security and governance. Can you automate that, if you will, and protect the users from doing things that violate data access policies or corporate edicts for governance and compliance? How are you handling that challenge? >>That's a great question, and it's something that I really emphasized in my leadership session. But, you know, the notion of what customers are doing and what we're seeing is the lake house architecture concept. So you've got a data lake, purpose-built stores, and customers are looking for easy data movement across those, and so we have things like Glue Elastic Views or some of the other Glue features we announced. But they're also looking for unified governance, and that's why we built AWS Lake Formation. And the idea here is that it can quickly discover and catalog customer data assets and then allow customers to define granular access policies centrally around that data. And once you have defined that, it sets customers free to give broader access to the data, because they've put the guardrails in place, they've put the protections in place.
So, you know, you can tag columns as being private so nobody can see them, and we announced a couple of new capabilities where you can provide row-based control, so only a certain set of users can see certain rows in the data, whereas a different set of users might only be able to see, you know, a different set. And so, by creating this fine-grained but unified governance model, this actually sets customers free to give broader access to the data, because they know that their policies and compliance requirements are being met, and it gets them out of the way of the analyst or someone who can actually use the data to drive some value for the business. >>Right, they can really focus on driving value. And I always talk about monetization; however, monetization can be a generic term for it. It could be saving lives, the mission of the business or the organization. I meant to ask you about Q: customers can embed it, it looks like, into their own apps? >>Yes, absolutely. So one of QuickSight's key strengths is its embeddability, and then it's also serverless, so you can embed it at a really massive scale. And so we see customers, for example, like Blackboard, that's embedding QuickSight dashboards into information it's providing to thousands of educators, to provide data on the effectiveness of online learning, for example. And you could embed Q into that capability. So it's a really cool way to give a broad set of people the ability to ask questions of data without requiring them to be fluent in things like SQL. >>If I can ask you a question: we've talked a little bit about data movement. I think at last year's re:Invent you guys announced RA3; I think it made general availability this year. And I remember Andy speaking about it, talking about, you know, the importance of having big enough pipes when you're moving data around. Of course, you're doing tiering.
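The column- and row-level guardrails Rahul describes a moment earlier can be sketched as a tiny central policy filter. The roles, column tags, and table here are invented for illustration; Lake Formation's real policy model is far richer than this.

```python
# A miniature version of the guardrails idea: one central policy hides
# tagged-private columns and restricts which rows each role may see.

POLICY = {
    "analyst": {"deny_columns": {"ssn"},
                "row_filter": lambda r: r["region"] == "west"},
    "auditor": {"deny_columns": set(),
                "row_filter": lambda r: True},
}

def query(role, table):
    """Return only the rows and columns the role is allowed to see."""
    rules = POLICY[role]
    return [
        {k: v for k, v in row.items() if k not in rules["deny_columns"]}
        for row in table if rules["row_filter"](row)
    ]

people = [
    {"name": "Ana", "region": "west", "ssn": "123-45-6789"},
    {"name": "Bo",  "region": "east", "ssn": "987-65-4321"},
]
analyst_view = query("analyst", people)  # one row, no ssn column
auditor_view = query("auditor", people)  # everything
```

Because every access path goes through one `query` gate, broader access can be granted without per-consumer policy work, which is the "guardrails set customers free" point above.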
You also announced AQUA, the Advanced Query Accelerator, which kind of reduces data movement by bringing the compute to the data, I guess, is how I would think about that. But then we're talking about Glue Elastic Views, where you're copying and moving data. How are you ensuring, you know, that you're maintaining maximum performance for your customers? I mean, I know it's an architectural question, but as an analytics professional, you have to be comfortable that that infrastructure is there. So what's AWS's general philosophy in that regard? >>So there's a few ways that we think about this, and you're absolutely right. I think data volumes are going up, and we're seeing customers going from terabytes to petabytes, and even people heading into the exabyte range. There's really a need to deliver performance at scale. And you know, the reality of customer architectures is that customers will use purpose-built systems for different best-in-class use cases, and if you're trying to do a one-size-fits-all thing, you're inevitably going to end up compromising somewhere. And so the reality is that customers will have more data, they're going to want to get it to more people, and they're going to want their analytics to be fast and cost-effective. And so we look at strategies to enable all of this. So, for example, Glue Elastic Views: it's about moving data, but it's about moving data efficiently. What we do is we allow customers to define a view that represents the subset of their data they care about, and then we only look to move changes, as efficiently as possible. So you're reducing the amount of data that needs to get moved and making sure it's focused on the essential. Similarly, with AQUA, what we've done, as you mentioned, is we've taken the compute down to the storage layer, and we're using our Nitro chips to help with things like compression and encryption. And then we have FPGAs in line to allow filtering and aggregation operations. So again, you're trying to quickly and effectively get through as much data as you can, so that you're only sending back what's relevant to the query that's being processed. And that again leads to more performance: if you can avoid reading a byte, you're going to speed up your queries, and that's what AQUA is trying to do. It's trying to push those operations down, so that you're really reducing data as close to its origin as possible and focusing on what's essential. And that's what we're applying across our analytics portfolio. I would say one other piece we're focused on with performance is really about innovating across the stack. So you mentioned network performance: we've got 100 gigabits per second of throughput now with the newest instances, and then with things like Graviton2 we're able to drive better price performance for customers for general-purpose workloads. So it's really innovating at all layers. >>It's amazing to watch. I mean, you guys have an incredible engineering challenge as you've built this hyper-distributed system that's now, of course, going to the edge. I want to come back to something you mentioned, and I want to hit on your leadership session as well. But you mentioned the one-size-fits-all system, and I've asked Andy Jassy about this, and I've had discussions with many folks about it: of course, as you mentioned, with one size fits all you're going to have to make tradeoffs. The flip side of that is, okay, it's simple; you know, it's the Swiss Army knife of databases, for example. But your philosophy at Amazon is you want to have fine-grained access to the primitives, in case the market changes and you want to be able to move quickly. So that puts more pressure on you to then simplify. You're not going to build this big hairball abstraction layer. That's not what you're going to do.
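The pushdown approach Rahul outlines, doing the filter and aggregation where the data lives so only results travel back, can be contrasted with the naive plan in a toy. Nothing here resembles AQUA's real Nitro/FPGA path; it only illustrates why avoiding reading and moving bytes speeds up a query.

```python
# Contrast shipping every row to the query engine with pushing the
# filter/aggregate down to storage and shipping back one number.

storage = ([{"region": "west", "amount": i} for i in range(1000)] +
           [{"region": "east", "amount": i} for i in range(1000)])

def scan_all(rows):
    """Naive plan: ship every row, then filter and sum centrally."""
    shipped = list(rows)  # 2000 rows cross the wire
    total = sum(r["amount"] for r in shipped if r["region"] == "west")
    return total, len(shipped)

def pushdown(rows):
    """Pushed-down plan: filter and sum at storage, ship one value."""
    total = sum(r["amount"] for r in rows if r["region"] == "west")
    return total, 1  # only the aggregate crosses the wire

naive_total, naive_shipped = scan_all(storage)
fast_total, fast_shipped = pushdown(storage)
```

Both plans return the same answer; the pushed-down plan just moves three orders of magnitude less data, which is the whole point.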
You know, I think about layers and layers of paint; I live in a very old house. So that's not your approach. It puts greater pressure on you to constantly listen to your customers, and they're always saying: hey, I want to simplify, simplify, simplify. We certainly heard that again in Swami's presentation the other day, all about minimizing complexity. So that really is your tradeoff. It puts pressure on Amazon engineering to continue to raise the bar on simplification. Is that a fair statement? >>Yeah, I think so. I mean, any time we can do work so our customers don't have to, that's a win for both of us, because I think we're delivering more value, and it makes it easier for our customers to get value from their data. We absolutely believe in using the right tool for the right job. And you know, you talked about an old house: you're not going to build or renovate a house with a Swiss Army knife. It's just the wrong tool. It might work for small projects, but you're going to need something more specialized to handle the things that matter. And that's really what we see with that set of capabilities. So we want to provide customers with the best of both worlds: we want to give them purpose-built tools so they don't have to compromise on performance, or scale, or functionality, and then we want to make it easy to use these together, whether it's about data movement or things like federated queries, where you can reach into each of them through a single query and through a unified governance model. So it's all about stitching those together. >>Yeah, so far you've been on the right side of history. I think it serves you well, and your customers as well. I want to come back to your leadership session. What else can you tell us about, you know, what you covered there?
>>So we've actually had a bunch of innovations on the analytics stack. Some of the highlights are in EMR, which is our managed Spark and Hadoop service: we've been able to achieve 1.7x better performance than open source with our Spark runtime, so we've invested heavily in performance. And now EMR is also available for customers who are running in a containerized environment, so we announced EMR on EKS, and then an integrated development environment, a studio for EMR, called EMR Studio. So we're making it easier both for people at the infrastructure layer to run EMR in their EKS environments and make it available within their organizations, but also simplifying life for data analysts and folks working with data, so they can operate in that studio and not have to mess with the details of the clusters underneath. And then a bunch of innovation in Redshift. We talked about AQUA already, but we also announced data sharing for Redshift. This makes it easy for Redshift clusters to share data with other clusters without putting any load on the central producer cluster. And this also speaks to the theme of simplifying getting data from point A to point B: you can have central producer environments publishing data, which represents the source of truth, into other departments within the organization, and they can query the data and use it. It's always up to date, but it doesn't put any load on the producers. And that enables these really powerful data sharing and downstream data monetization capabilities, like you've mentioned. In addition, as Swami mentioned in his keynote, there's Redshift ML, so you can now essentially train and run models that were built in SageMaker and optimized, from within your Redshift clusters. And then we've also automated all of the performance tuning that's possible in Redshift.
So we've really invested heavily in price performance, and now we've automated all of the things that make Redshift the best-in-class data warehouse service from a price-performance perspective, up to three times better than others. Customers can just set Redshift to auto, and it'll handle workload management, data compression, and data distribution, making all of that performance easier to access. And then the other big one was in Lake Formation. We announced three new capabilities. One is transactions, enabling consistent ACID transactions on data lakes, so you can do things like inserts, updates, and deletes. We announced row-based filtering for fine-grained access control in that unified governance model. And then automated storage optimization for data lakes: customers are dealing with unoptimized small files that are coming off streaming systems, for example, and Lake Formation can auto-compact those under the covers, and you can get up to a 78x performance boost. It's been a busy year for analytics. >>I'll say! Great job. Thanks so much for coming back on theCUBE and sharing the innovations, and great to see you again. And good luck in the coming year. >>Well, thank you very much. Great to be here. Great to see you, and I hope we get to see each other in person again soon. >>I hope so. All right, and thank you for watching, everybody. This is Dave Vellante for theCUBE. We'll be right back right after this short break.

Published Date : Dec 10 2020



Rahul Pathak & Shawn Bice, AWS | AWS re:Invent 2018


 

(futuristic electronic music) >> Live from Las Vegas, it's theCUBE, covering AWS re:Invent 2018. Brought to you by Amazon Web Services, Intel, and their ecosystem partners. >> Hey, welcome back everyone. Live here in Las Vegas with AWS, Amazon Web Services, for re:Invent 2018 CUBE coverage. Two sets, wall-to-wall coverage, here on the ground floor. I'm here with Dave Vellante. Dave, six years we've been coming to re:Invent, every year except for the first year. What a progression. We've got great news. Always raising the bar, as they say at Amazon. This year, big announcements. One of them is blockchain, really kind of laying out the early formation of how they're going to roll out and think about blockchain. We're here to talk about that with Rahul Pathak, who's the GM of analytics, data lakes, and blockchain, managing that, and Shawn Bice, who's the vice president of non-relational databases. Guys, welcome to theCUBE. >> Thank you. >> Thank you, it's great to be here. >> I wish my voice was a little bit stronger. I love this segment. You know, we've been following blockchain, one of the big events in the industry. If you separate out the whole token ICO scam situation, token economics is actually a great business model opportunity. Blockchain is an infrastructure, a decentralized infrastructure; that's great. But it's early, day one really for you guys, in a literal sense. How are you guys doing blockchain? Take a minute to explain the announcement, because there are use cases, low-hanging use cases, that look a lot like IoT and supply chain, that people are interested in. So take a minute to explain the announcements and what they mean. >> Absolutely. So when we began looking at blockchain and blockchain use cases, we really realized there are two things that customers are trying to do. One case is really to keep an immutable record of transactions, in a scenario where centralized trust is okay.
And for that we have Amazon QLDB, which is an immutable, cryptographically verifiable ledger. And then in scenarios where customers really want decentralized trust and smart contracts, that's where blockchain frameworks like Hyperledger Fabric and Ethereum play a role. But they're just super complicated to use, and that's why we built Managed Blockchain: to make it easy to stand up, scale, and monitor these networks, so customers can focus on building applications. And in terms of use cases on the decentralized side, it's really quite diverse. I mean, we've got a customer, Guardian Life Insurance; they're looking at Managed Blockchain because they have this distributed network of partners, providers, patients, and customers, and they want to provide decentralized, verifiable records of what's taking place. It's just a broad set of use cases. >> And then we saw in the video this morning, I think it was Indonesian farmers, right? Wasn't that before the keynote? Did you see that? It was good. >> I missed that one. >> Yeah, so they don't have bank accounts. >> Oh, got it. >> And they've got a reward system, so they're using the blockchain to reward farmers to participate. >> So a lot of people ask the question: why do I need blockchain? Why don't I just put it in a database? Which is true, by the way, because latency's an issue. (chuckles) Certainly, you might want to avoid blockchain in the short term, until that gets fixed; assume that everything will get fixed over time. But what are some of the use cases where blockchain actually is relevant? Can you be specific? Because that's really where people are starting to set their selection criteria: look, I'll still use a database; I'm going to have all kinds of tokens and models around it, but in a database. Where is blockchain specifically resonating right now? >> I'll take a shot at this, or we can do it together, but when you think of QLDB, it's not that customers are asking us for a ledger database.
What they were really saying is: hey, we'd like to have this complete, immutable, cryptographically verifiable trail of data. And it wasn't necessarily a blockchain conversation, wasn't necessarily a database conversation; it was: I really would like to have this complete, cryptographically verifiable trail of data. And it turns out, as you look at the use cases, in particular the centralized trust scenario, QLDB does exactly that. It's not about decentralized trust; it's really about simply being able to have a database where, when you write a transaction to the database, you can't change it. You know, with a typical database people are like: well, hey, wait a second, what does immutable really mean? And once you get people to understand that once that transaction is written to the journal it cannot be changed at all, then all of a sudden there's that breakthrough moment of it being immutable and having that cryptographic trail. >> And the advantage relative to a distributed blockchain is performance, scale, and all the challenges that people always cite. >> Yeah, exactly. With QLDB, you'll find it's going to be two to three times faster, because you're not doing that distributed consensus. >> How about data lakes? Let's talk about data lakes. What problem were you guys trying to solve with data lakes? There's a lot of them, but... (chuckles) >> That's a great question. So, essentially, it's been hard for customers to set up data lakes, because you have to figure out where to get data from, you have to land it in S3, you've got to secure it, you've then got to secure every analytic service that you've got, and you might have to clean your data. So with Lake Formation, what we're trying to do is make it super easy to set up data lakes. So we have blueprints for common databases and data sources.
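The immutable, verifiable journal Shawn describes earlier can be illustrated with a toy hash chain. This shows only the underlying principle, not QLDB's actual journal or digest format: each entry commits to its predecessor's hash, so any rewrite of history is caught on verification.

```python
import hashlib
import json

# Toy hash-chained journal: append-only writes plus a verify pass
# that detects any after-the-fact modification of an entry.

def entry_hash(prev_hash, payload):
    data = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(data.encode()).hexdigest()

def append(journal, payload):
    prev = journal[-1]["hash"] if journal else "genesis"
    journal.append({"payload": payload, "hash": entry_hash(prev, payload)})

def verify(journal):
    prev = "genesis"
    for entry in journal:
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False  # this entry (or one before it) was altered
        prev = entry["hash"]
    return True

journal = []
append(journal, {"txn": 1, "amount": 100})
append(journal, {"txn": 2, "amount": -40})
ok_before = verify(journal)               # chain intact
journal[0]["payload"]["amount"] = 999     # attempt to rewrite history
ok_after = verify(journal)                # tampering is detected
```

This is the "breakthrough moment" in the answer above: the write path is an ordinary append, but once written, an entry can't be changed without breaking every later link in the chain.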
We bring that data into an S3 data lake and we've created a central catalog for that data where customers can define granular access policies at the table, the column, and the row level. We've also got ML-based data cleansing and data deduplication. And so now customers can just use Lake Formation, set up data lakes, curate their data, protect it in a single place, and have those policies enforced across all of the analytic services that they might use. >> So does it help solve the data swamp problem, get more value out of the data lake? And if so, how? >> Absolutely, so the way it does that is by automatically cataloging all data that comes in. So we can recognize what the data is and then we allow customers to add business metadata to that so they can tag this as customer data, or PII data, or this is my table of sales history. And that then becomes searchable. So we automatically generate a catalog as data comes in and that addresses the 'what do I have in my data lake?' problem. >> Okay, so-- >> Go ahead. >> So, Rahul, you're the general manager. Shawn, what's your job, what do you do? >> So our team builds all the non-relational databases at Amazon. So DynamoDB, Neptune, ElastiCache, Timestream, which you'll hear about today, QLDB, et cetera. So all those things-- >> Beanstalk too, Elastic Beanstalk? >> No, we do not build Beanstalk. >> Okay, we're a customer of DynamoDB, by the way. >> Great! >> We're happy customers. >> That's great! >> And we use ElastiCache, right? >> Yup, the elastic >> There you go! >> surge still has it. >> So-- >> Haven't used Neptune yet. >> What are the biggest problems that you guys are trying to raise the bar on? What's the key focus as you get these new worlds and use cases coming together? These are new use cases. How are you guys evaluating it? How are you guys raising the bar? >> You know, that's a really good question you ask.
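The catalog-and-tag idea Rahul describes — register datasets as they land, attach business metadata like "customer data" or "PII," and make the catalog searchable — can be sketched in a few lines. This is a concept sketch only, not the Lake Formation or Glue API (the table names and tags are made up):

```python
# Toy version of "catalog as data comes in, tag it, make it searchable".
catalog = {}

def register_table(name, columns, tags):
    """Catalog a dataset as it lands in the lake, with business metadata tags."""
    catalog[name] = {"columns": columns, "tags": set(tags)}

def search(tag):
    """Answer the 'what do I have in my data lake?' question by tag."""
    return sorted(name for name, meta in catalog.items() if tag in meta["tags"])

register_table("sales_history", ["order_id", "amount", "ts"], ["sales"])
register_table("customers", ["customer_id", "email"], ["customer-data", "pii"])
print(search("pii"))    # ['customers']
```

The real service adds the enforcement piece on top: the same catalog entry carries the table-, column-, and row-level access policies that every analytic service then honors.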
What I've found in my experience is developers that have been building apps for a long time, most people are familiar with relational databases. For years we've been building apps in that context, but when you kind of look at how people are building apps today, it's very different than how they did in the past. Today developers do what they do best. They take an application, a big application, break it down into smaller parts, and they pick the right tool for the right job. >> I think the game developer market is going to be a canary in the coal mine for developers, and it's a good spot for data formation in these kinds of unstructured, non-relational scenarios. Okay, now all this engagement data, could be first person shooter, whatever it is, just throw it, I need to throw it somewhere, and I'll get to it and let it be ready to be worked on by analytics. >> Well, yeah, if you think about that gamer scenario, think about if you and I are building a game, who knows if there's going to be one user, ten players, or 10 million, or 100 million. And if we had 100 million, it's all about the performance being steady. At 100 million or ten. >> You need a fleet of servers. (John laughing) >> And a fleet of servers! >> Have you guys played Fortnite? Or do you have kids that play? >> I look over my kid's shoulder. I might play it. >> I've played, but-- >> They run all their analytics on us. They've got about 14 petabytes in S3, using S3 as their data lake, with EMR and Athena for analytics. >> We got a season-- >> I mean, think about that F1 example in the keynotes today. Great example of insights. We apply that kind of concept to Fortnite, by the way, Fortnite has theCUBE in there. It's always a popular term. We noticed that, the hashtag, #wherestheCUBEtoday. (Rahul chuckling) I couldn't resist. But the analytics you could get out of all that data, every interaction, all that gesture data. I mean, what are some of the things they're doing?
Can you share how they're using the new tech to scale up and get these insights? >> Yeah, absolutely. So they're doing a bunch of things. I mean, one is just the health of the systems when you've got hundreds of millions of players. You need to know if you're up and it's working. The second is around engagement. What games, what collection of people work well together. And then it's what incentives they create in the game, what power ups people buy that lead to continued engagement, 'cause that defines success over the long term. What gets people coming back? And then they have an offline analytics process where they're looking at reporting, and history, and telemetry, so it's very comprehensive. So you're exactly right about gaming and analytics being a huge consumer of databases. >> Now, Shawn, didn't you guys have hard news today on DynamoDB, or? >> Yeah, today we announced DynamoDB On-Demand, so customers that basically have workloads that could spike up and then all of a sudden drop off, a lot of these customers basically don't even want to think about capacity planning. They don't want to guess. They just want to basically pay only for what they're using. So we announced DynamoDB On-Demand. The developer experience is simple. You create a table and you put your read/write capacity in the on-demand mode, and you literally only pay for the requests that your workload puts through the system. >> It's a great service actually. Again, making life easier for customers. Lower the bill, manage capacity, make things go better, faster, enables value. >> It's all about improving the customer experience. >> Alright, guys, I really appreciate you coming in. I'm really interested in following what you guys do in the future. I'm sure a lot of people watching will be as well, as analytics and AI become a real part of, as you guys move the stack and create that API model for, what you did for infrastructure, for apps. A total game changer, we believe.
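The On-Demand developer experience Shawn describes really does reduce to one flag at table creation: billing mode `PAY_PER_REQUEST` instead of a provisioned read/write capacity block. A sketch of the request (the table and key names here are made up; the boto3 call is real but shown commented out since it needs AWS credentials):

```python
def on_demand_table_spec(table_name):
    """Build a DynamoDB CreateTable request using On-Demand billing:
    no capacity planning, you pay per request your workload makes."""
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": "player_id", "KeyType": "HASH"}],
        "AttributeDefinitions": [
            {"AttributeName": "player_id", "AttributeType": "S"}
        ],
        # On-Demand mode: no ProvisionedThroughput block at all
        "BillingMode": "PAY_PER_REQUEST",
    }

spec = on_demand_table_spec("game-events")
# With credentials configured, the actual call would be:
# import boto3
# boto3.client("dynamodb").create_table(**spec)
print(spec["BillingMode"])    # PAY_PER_REQUEST
```

That's the whole spiky-workload story: the table scales with request volume, and the bill follows the requests rather than a guessed capacity.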
We're interested in following you guys, I'm sure others are. Where are you going to be this year? What's your focus? Where can people find out more besides going to the Amazon site? Are there certain events you're going to be at? How do people get more information and what are the plans? >> There's actually some sessions on Lake Formation and blockchain that we're doing here. We'll have a continuous stream of summits, so as the AWS Summit calendar for 2019 gets published, that's a great place to go for more information. And then just engage with us either on social media or through the web and we'll be happy to follow up. >> Alright, well, we'll do a good job on amplifying. A lot of people are interested, certainly blockchain, super hot. But people want better, stronger, more stable, but they want the decentralized immutable database model. >> Cryptographically verifiable! >> And see as everyone knows. >> Scalable! >> Anyone who wants to keep those, they talk about CUBE coins but I haven't said CUBE coin once on this episode. Wait for those tokens to be released soon. More coverage after this short break, stay with us. I'm John Furrier, and Dave Vellante, we'll be right back. (futuristic buzzing) (futuristic electronic music)

Published Date : Nov 29 2018

Rahul Pathak, AWS | Inforum DC 2018


 

>> Live, from Washington, D.C., it's theCUBE! Covering Inforum DC 2018. Brought to you by Infor. >> Well, welcome back. We are here on theCUBE. Thanks for joining us here as we continue our coverage here at Inforum 18. We're in Washington D.C., at the Walter Washington Convention Center. I'm John Walls, with Dave Vellante and we're joined now by Rahul Pathak, who is the G.M. at Amazon Athena and Amazon EMR. >> Hey there. Rahul, nice to see you, sir. >> Nice to see you as well. Thanks for having me. >> Thank you for being with us, um, now you spoke earlier, at the executive forum, and, um, wanted to talk to you about the title of the presentation. It was Datalinks and Analytics: the Coming Wave of Brilliance. Alright, so tell me about the title, but more about the talk, too. >> Sure. Uh, so the talk was really about a set of components and a set of transdriving data lake adoption and then how we partner with Infor to allow Infor to provide a data lake that's customized for their vertical lines of business to their customers. And I think part of the notion is that we're coming from a world where customers had to decide what data they could keep, because their systems were expensive. Now, moving to a world of data lakes where storage and analytics is a much lower cost and so customers don't have to make decisions about what data to throw away. They can keep it all and then decide what's valuable later. So we believe we're in this transition, an inflection point where you'll see a lot more insights possible, with a lot of novel types of analytics, much more so than we could do, uh, to this point. >> That's the brilliance. That's the brilliance of it. >> Right. >> Right? Opportunity to leverage... >> To do more. >> Like, that you never could before. >> Exactly. >> I'm sorry, Dave. >> No, no. That's okay. So, if you think about the phases of so called 'big data,' you know, the.... We went from, sort of, EDW to cheaper... >> (laughs) Sure. 
Data warehouses that were distributed, right? And this guy always joked that the ROI of Hadoop was reduction of investment, and that's what it became. And as a result, a lot of the so-called data lakes just became stagnant, and so then you had a whole slew of companies that emerged trying to, sort of, clean up the swamp, so to speak. Um, you guys provide services and tools, so you're like "Okay guys, here it is. We're going to make it easier for you." One of the challenges that Hadoop and big data generally had was the complexity, and so, what we noticed was the cloud guys--not just AWS, but in particular AWS really started to bring in tooling that simplified the effort around big data. >> Right. >> So fast-forward to today, and now we're at the point of trying to get insights-- data's plentiful, insights aren't. Um, bring us up to speed on Amazon's big data strategy, the status, what customers are doing. Where are we at in those waves? >> Uh, it's a big question, but yeah, absolutely. So... >> It's a John Furrier question. (laughter) So what we're seeing is this transition from sort of classic EDW to S3-based data lakes. S3's our Amazon storage service, and it's really been foundational for customers. And what customers are doing is they're bringing their data to S3 in open data formats. EDWs still have a role to play. And then we offer services that make it easy to catalog and transform the data in S3, as well as the data in customer databases and data warehouses, and then make that available for systems to drive insight. And, when I talk about that, what I mean is, we have the classic reporting and visualization use cases, but increasingly we're seeing a lot more real time event processing, and so we have services like Kinesis Analytics that make it easy to run real time queries on data as it's moving. And then we're seeing the integration of machine learning into the stacks.
Once you've got data in S3, it's available to all of these different analytic services simultaneously, and so now you're able to run your reporting, your real time processing, but also now use machine learning to make predictive analytics and decisions. And then I would say a fourth piece of this is there's really been, with machine learning and deep learning and embedding them in developer services, there's now been a way to get at data that was historically opaque. So, if you had an audio recording of a social support call, you can now put it through a service that will actually transcribe it, tell you the sentiment in the call and that becomes data that you can then track and measure and report against. So, there's been this real explosion in capability and flexibility. And what we've tried to do at AWS is provide managed services to customers, so that they can assemble sophisticated applications out of building blocks that make each of these components easier, and, that focus on being best of breed in their particular use case. >> And you're responsible for EMR, correct? >> Uh, so I own a few of these, EMR, Athena and Glue. And, uh, really these are... EMR's open source Spark and Hadoop, um, with customized clusters that operate directly against S3 data lakes, so no need to load in HDFS, so you avoid that staleness point that you mentioned. And then, Athena is serverless SQL on S3, so you can let any analyst log in, just get a SQL prompt and run a query. And then Glue is for cataloging the data in your data lake and databases, and for running transformations to get data from raw form into an efficient form for querying, typically. >> So, EMR is really the first service, if I recall, right? The sort of first big data service-- >> That's right. >> -that you offered, right? And, as you say, you really begin to simplify for customers, because the Hadoop complexity was just unwieldy, and the momentum is still there with EMR? Are people looking for alternatives?
Sounds like it's still a linchpin of the strategy? >> No, absolutely. I mean, I think what we've seen is, um, customers bring data to S3, they will then use a service, like Redshift, for petabyte scale data warehousing, they'll use EMR for really arbitrary analytics, using opensource technologies, and then they'll use Athena for broad data lake query and access. So these things are all very much complimentary, uh, to each other. >> How do you define, just the concept of data lakes, uh, versus other approaches to clients? And trying to explain to them, you know, the value and the use for them, uh, I guess ultimately how they can best leverage it for their purposes? How do you walk them through that? >> Yeah, absolutely. So, there's, um. You know, that starts from the principles around how data is changing. So before we used to have, typically, tabular data coming out of ERP systems, or CRM systems, going into data warehouses. Now we're seeing a lot more variety of data. So, you might have tweets, you might have JSON events, you might have log events, real time data. And these don't fit traditional... well into the traditional relational tabular model, ah, so what data lakes allow you to do is, you can actually keep both types of the data. You can keep your tabular data indirectly in your data lake and you can bring in these new types of data, the semi-structured or the unstructured data sets. And they can all live in the data lake. And the key is to catalog that all so you know what you have and then figure out how to get that catalog visible to the analytic layer. And so the value becomes you can actually now keep all your data. You don't have to make decisions about it a priori about what's going to be valuable or what format it's going to be useful in. And you don't have to throw away data, because it's expensive to store it in traditional systems. 
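The "broad data lake query and access" role Rahul gives Athena — any analyst gets a serverless SQL prompt over files sitting in S3 — looks roughly like this through boto3. The database, table, and result-bucket names here are hypothetical, and the live call is shown commented out since it needs AWS credentials:

```python
def athena_query_params(sql, database, output_s3):
    """Parameters for Athena's StartQueryExecution: serverless SQL over S3.
    No cluster to stand up; results land in the given S3 location."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

params = athena_query_params(
    "SELECT status, count(*) FROM web_logs GROUP BY status",  # hypothetical table
    database="lake_db",                                       # hypothetical database
    output_s3="s3://my-athena-results/",                      # hypothetical bucket
)
# With credentials configured, the actual call would be:
# import boto3
# boto3.client("athena").start_query_execution(**params)
print(params["QueryExecutionContext"]["Database"])    # lake_db
```

Because the data stays in open formats in S3, the same files are simultaneously available to Redshift, EMR, and Athena, which is the complementarity described above.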
And this gives you the ability then to replay the past when you develop better ideas in the future about how to leverage that data. Ah, so there's a benefit to being able to store everything. And then I would say the third big benefit is around um, by placing data in data lakes in open data formats, whether that's CSV or JSON or more efficient formats, that allows customers to take advantage of best of breed analytics technology at any point in time without having to replatform their data. So you get this technical agility that's really powerful for customers, because capabilities evolve over time, constantly, and so, being in a position to take advantage of them easily is a real competitive advantage for customers. >> I want to get to Infor, but this is so much fun, I have some other questions, because Amazon's such a force in this space. Um, when you think about things like Redshift, S3, Kinesis, DynamoDB...we're a customer, these are all tools we're using. Aurora. Um, the data pipeline starts to get very complex, and the great thing about AWS is I get, you know, API access to each of those and primitive access. The drawback is, it starts to get complicated, my data pipeline gets elongated and I'm not sure whether I should run it on this service or that service until I get my bill at the end of the month. So, are there things you're doing to help... First of all, is that a valid concern of customers and what are you doing to help customers in that regard? >> Yeah, so, we do provide a lot of capability and I think our core idea is to provide the best tool for the job, with APIs to access them and combine them and compose them. So, what we're trying to do to help simplify this is A) build in more prescriptive guidance into our services about look, if you're trying to do x, here's the right way to do x, at least the right way to start with x and then we can evolve and adapt.
Uh, we're also working hard with things like blogs and solution templates and CloudFormation templates to automatically stand up environments, and then, the third piece is we're trying to bring in automation and machine learning to simplify the creation of these data pipelines. So, Glue for example. When you put data in S3, it will actually crawl it on your behalf and infer its structure and store that structure in a catalog and then once you've got a source table, and a destination table, you can point those out and Glue will then automatically generate a pipeline for you to go from A to B, that you can then edit or store in version control. So we're trying to make these capabilities easier to access and provide more guidance, so that you can actually get up and running more quickly, without giving up the power that comes from having the granular access. >> That's a great answer. Because the granularity's critical, because it allows you, as the market changes, it allows you... >> To adapt. To move fast, right? And so you don't want to give that up, but at the same time, you're bringing in complexity and you just, I think, answered it well, in terms of how you're trying to simplify that. The strategy's obviously worked very well. Okay, let's talk about Infor now. Here's a big ISV partner. They've got the engineering resources to deal with all this stuff, and they really seem to have taken advantage of it. We were talking earlier, that, I don't know if you heard Charles's keynote this morning, but he said, when we were an on prem software company, we didn't manage customer servers for them. Back then, the server was the server, uh, software companies didn't care about the server infrastructure. Today it's different. It's like the cloud is giving Infor strategic advantage. The flywheel effect that you guys talk about spins off innovation that they can exploit in new ways.
So talk about your relationship with Infor, and kind of the history of where it's come and where it's going. >> Sure. So, Infor's a great partner. We've been a partner for over four years, they're one of our first all-in partners, and we have a great working relationship with them. They're sophisticated. They understand our services well. And we collaborate on identifying ways that we can make our services better for their use cases. And what they've been able to do is take all of the years of industry and domain expertise that they've gained over time in their vertical segments, and with their customers, and bring that to bear by using the components that we provide in the cloud. So all these services that I mentioned, the global footprint, the security capabilities, the, um, all of the various compliance certifications that we offer act as accelerators for what Infor's trying to do, and then they're able to leverage their intellectual property and their relationships and experience they've built up over time to get this global footprint that they can deploy for their customers, that gets better over time as we add new capabilities, they can build that into the Infor platform, and then that rolls out to all of their customers much more quickly than it could before. >> And they seem to be really driving hard, I have not heard an enterprise software company talk so much about data, and how they're exploiting data, the way that I've heard Infor talk about it. So, data's obviously key, it's the lifeblood-- people say it's the new oil--I'm not sure that's the best analogy. I can only put oil in my house or my car, I can't put it in both. Data--I can do so many things with it, so, um... >> I suspect that analogy will evolve. >> I think it should. >> I'm already thinking about it now. >> You heard it here first in the Cube. >> You keep going, I'll come up with something >> Don't use that anymore. >> Scratch the oil. 
>> Okay, so, your perspectives on Infor, it's sort of use of data and what Amazon's role is in terms of facilitating that. >> So what we're providing is a platform, a set of services with powerful building blocks, that Infor can then combine into their applications that match the needs of their customers. And so what we're looking to do is give them a broad set of capabilities, that they can build into their offerings. So, CloudSuite is built entirely on us, and then Infor OS is a shared set of services and part of that is their data lake, which uses a number of our analytic services underneath. And so, what Infor's able to do for their customers is break down data silos within their customer organizations and provide a common way to think about data and machine learning and IoT applications across data in the data lake. And we view our role as really a supporting partner for them in providing a set of capabilities that they can then use to scale and grow and deploy their applications. >> I want to ask you about--I mean, security-- I've always been comfortable with cloud security, maybe I'm naive--but compliance is something that's interesting and something you said before... I think you said cataloging Glue allows you to essentially keep all the data, right? And my concern about that is, from a governance perspective, the legal counsel might say, "Well, I don't "want to keep all my data, if it's work in process, "I want to get rid of it "or if there's a smoking gun in there, "I want to get rid of it as soon as I can." Keep data as long as possible but no longer, to sort of paraphrase Einstein. So, what do you say to that? Do you have customers in the legal office that say, "Hey, we don't want to keep data forever, "and how can you help?" >> Yeah, so, just to refine the point on Glue. What Glue does is it gives you essentially a catalog, which is a map of all your data. 
Whether you choose to keep that data or not keep that data, that's a function of the application. So, absolutely. >> Sure. Right. We have customers that say, "Look, here are my data sets, for whether it's new regulations, or I just don't want this set of data to exist anymore, or this customer's no longer with us and we need to delete that," we provide all of those capabilities. So, our goal is to really give customers the set of features, functionality, and compliance certifications they need to express the enterprise security policies that they have, and ensure that they're complying with them. And, so, then if you have data sets that need to be deleted, we provide capabilities to do that. And then the other side of that is you want the audit capabilities, so we actually log every API access in the environment in a service called CloudTrail and then you can actually verify by going back and looking at CloudTrail that only the things that you wanted to have happen, actually did happen. >> So, you seem very relaxed. I have to ask you what life is like at Amazon, because when I was down at AWS's D.C. offices, and you walk in there, and there's this huge-- I don't know if you've seen it-- there's this giant graph of the services launched and announced, from 2006, when EC2 first came out, til today. And it's just this ridiculous set of services. I mean the line, the graph is amazing. So you're moving at this super, hyper pace. What's life like at AWS? >> You know, I've been there almost seven years. I love it. It's been fantastic. I was an entrepreneur and came out of startups before AWS, and when I joined, I found an environment where you can continue to be entrepreneurial and active on behalf of your customers, but you have the ability to have impact at a global scale. So it's been super fun. The pace is fast, but exhilarating. We're working on things we're excited about, and we're working on things that we believe matter, and make a difference to our customers.
So, it's been really fun. >> Well, so you got--I mean, you're right at the heart of what I like to call the innovation sandwich. You've got data, tons of data, obviously, in the cloud. You're a leader and increasingly becoming sophisticated in machine intelligence. So you've got data, machine intelligence, or AI, applied to that data, and you've got cloud for scale, cloud for economics, cloud for innovation, you're able to attract startups--that's probably how you found AWS to begin with, right? >> That's right. >> All the startups, including ours, we want to be on AWS. That's where the developers want to be. And so, again, it's an overused word, but that flywheel of innovation occurs. And that to us is the innovation sandwich, it's not Moore's Law anymore, right? For decades this industry marched to the cadence of Moore's Law. Now it's a much more multi-dimensional matrix and it's exciting and sometimes scary. >> Yeah. No, I think you touched on a lot of great points. It's really fun. I mean, I think, for us, the core is, we want to put things together the customers want. We want to make them broadly available. We want to partner with our customers to understand what's working and what's not. We want to pass on efficiencies when we can and then that helps us speed up the cycle of learning. >> Well, Rahul, I actually was going to say, I think he's so relaxed because he's on theCUBE. >> Ah, could be. >> Right, that's it. We just like to do that with people. >> No, you're fantastic. >> Thanks for being with us. >> It's a pleasure. >> We appreciate the insights, and we certainly wish you well with the rest of the show here. >> Excellent. Thank you very much, it was great to be here. >> Thank you, sir. >> You're welcome. >> You're watching theCUBE. We are live here in Washington, D.C. at Inforum 18. (techno music)

Published Date : Sep 25 2018
