Rahul Pathak, AWS | Inforum DC 2018

>> Live, from Washington, D.C., it's theCUBE! Covering Inforum DC 2018. Brought to you by Infor. >> Well, welcome back. We are here on theCUBE. Thanks for joining us as we continue our coverage here at Inforum 18. We're in Washington D.C., at the Walter Washington Convention Center. I'm John Walls, with Dave Vellante, and we're joined now by Rahul Pathak, who is the G.M. of Amazon Athena and Amazon EMR. >> Hey there. Rahul, nice to see you, sir. >> Nice to see you as well. Thanks for having me. >> Thank you for being with us. Now, you spoke earlier at the executive forum, and I wanted to talk to you about the title of the presentation. It was Data Lakes and Analytics: the Coming Wave of Brilliance. Alright, so tell me about the title, but more about the talk, too. >> Sure. So the talk was really about a set of components and a set of trends driving data lake adoption, and then how we partner with Infor to allow Infor to provide a data lake that's customized for their vertical lines of business to their customers. And I think part of the notion is that we're coming from a world where customers had to decide what data they could keep, because their systems were expensive. Now we're moving to a world of data lakes where storage and analytics are much lower cost, and so customers don't have to make decisions about what data to throw away. They can keep it all and then decide what's valuable later. So we believe we're in this transition, an inflection point where you'll see a lot more insights possible, with a lot of novel types of analytics, much more so than we could do to this point. >> That's the brilliance. That's the brilliance of it. >> Right. >> Right? Opportunity to leverage... >> To do more. >> Like, that you never could before. >> Exactly. >> I'm sorry, Dave. >> No, no. That's okay. So, if you think about the phases of so-called 'big data,' you know, the... We went from, sort of, EDW to cheaper... >> (laughs) Sure.
>> Data warehouses that were distributed, right? And this guy always joked that the ROI of Hadoop was reduction of investment, and that's what it became. And as a result, a lot of the so-called data lakes just became stagnant, and so then you had a whole slew of companies that emerged trying to, sort of, clean up the swamp, so to speak. You guys provide services and tools, so you're like "Okay guys, here it is. We're going to make it easier for you." One of the challenges that Hadoop and big data generally had was the complexity, and so, what we noticed was the cloud guys--not just AWS, but in particular AWS--really started to bring in tooling that simplified the effort around big data. >> Right. >> So fast-forward to today, and now we're at the point of trying to get insights--data's plentiful, insights aren't. Bring us up to speed on Amazon's big data strategy, the status, what customers are doing. Where are we at in those waves? >> Uh, it's a big question, but yeah, absolutely. So... >> It's a John Furrier question. (laughter) >> So what we're seeing is this transition from sort of classic EDW to S3-based data lakes. S3's our Amazon storage service, and it's really been foundational for customers. And what customers are doing is they're bringing their data to S3 in open data formats. EDWs still have a role to play. And then we offer services that make it easy to catalog and transform the data in S3, as well as the data in customer databases and data warehouses, and then make that available for systems to drive insight. And, when I talk about that, what I mean is, we have the classic reporting and visualization use cases, but increasingly we're seeing a lot more real-time event processing, and so we have services like Kinesis Analytics that make it easy to run real-time queries on data as it's moving. And then we're seeing the integration of machine learning into the stacks.
Once you've got data in S3, it's available to all of these different analytic services simultaneously, and so now you're able to run your reporting, your real-time processing, but also now use machine learning to make predictive analytics and decisions. And then I would say a fourth piece of this is, with machine learning and deep learning and embedding them in developer services, there's now a way to get at data that was historically opaque. So, if you had an audio recording of a social support call, you can now put it through a service that will actually transcribe it, tell you the sentiment in the call, and that becomes data that you can then track and measure and report against. So, there's been this real explosion in capability and flexibility. And what we've tried to do at AWS is provide managed services to customers, so that they can assemble sophisticated applications out of building blocks that make each of these components easier, and that focus on being best of breed in their particular use case. >> And you're responsible for EMR, correct? >> So I own a few of these: EMR, Athena and Glue. And really these are... EMR's open source Spark and Hadoop, with customized clusters that operate directly against S3 data lakes, so no need to load into HDFS, so you avoid that staleness point that you mentioned. And then, Athena is serverless SQL on S3, so you can let any analyst log in, just get a SQL prompt and run a query. And then Glue is for cataloging the data in your data lake and databases, and for running transformations to get data from raw form into an efficient form for querying, typically. >> So, EMR is really the first service, if I recall, right? The sort of first big data service-- >> That's right. >> --that you offered, right? And, as you say, you really began to simplify for customers, because the Hadoop complexity was just unwieldy, and the momentum is still there with EMR? Are people looking for alternatives?
Sounds like it's still a linchpin of the strategy? >> No, absolutely. I mean, I think what we've seen is, customers bring data to S3, they will then use a service like Redshift for petabyte-scale data warehousing, they'll use EMR for really arbitrary analytics using open source technologies, and then they'll use Athena for broad data lake query and access. So these things are all very much complementary to each other. >> How do you define just the concept of data lakes, versus other approaches, to clients? And trying to explain to them, you know, the value and the use for them, I guess ultimately how they can best leverage it for their purposes? How do you walk them through that? >> Yeah, absolutely. So, you know, that starts from the principles around how data is changing. So before, we used to have, typically, tabular data coming out of ERP systems or CRM systems, going into data warehouses. Now we're seeing a lot more variety of data. So, you might have tweets, you might have JSON events, you might have log events, real-time data. And these don't fit well into the traditional relational tabular model, so what data lakes allow you to do is, you can actually keep both types of data. You can keep your tabular data directly in your data lake and you can bring in these new types of data, the semi-structured or the unstructured data sets. And they can all live in the data lake. And the key is to catalog that all so you know what you have, and then figure out how to get that catalog visible to the analytic layer. And so the value becomes, you can actually now keep all your data. You don't have to make decisions a priori about what's going to be valuable or what format it's going to be useful in. And you don't have to throw away data because it's expensive to store it in traditional systems.
And this gives you the ability then to replay the past when you develop better ideas in the future about how to leverage that data. So there's a benefit to being able to store everything. And then I would say the third big benefit is that placing data in data lakes in open data formats, whether that's CSV or JSON or more efficient formats, allows customers to take advantage of best-of-breed analytics technology at any point in time without having to replatform their data. So you get this technical agility that's really powerful for customers, because capabilities evolve over time, constantly, and so being in a position to take advantage of them easily is a real competitive advantage for customers. >> I want to get to Infor, but this is so much fun, I have some other questions, because Amazon's such a force in this space. When you think about things like Redshift, S3, Kinesis, DynamoDB... we're a customer, these are all tools we're using. Aurora. The data pipeline starts to get very complex, and the great thing about AWS is I get, you know, API access to each of those and primitive access. The drawback is, it starts to get complicated, my data pipeline gets elongated, and I'm not sure whether I should run it on this service or that service until I get my bill at the end of the month. So, are there things you're doing to help... First of all, is that a valid concern of customers, and what are you doing to help customers in that regard? >> Yeah, so, we do provide a lot of capability, and I think our core idea is to provide the best tool for the job, with APIs to access them and combine them and compose them. So, what we're trying to do to help simplify this is, A) build more prescriptive guidance into our services about, look, if you're trying to do x, here's the right way to do x, at least the right way to start with x, and then we can evolve and adapt.
We're also working hard with things like blogs and solution templates and CloudFormation templates to automatically stand up environments, and then, the third piece is we're trying to bring in automation and machine learning to simplify the creation of these data pipelines. So, Glue for example. When you put data in S3, it will actually crawl it on your behalf and infer its structure and store that structure in a catalog, and then once you've got a source table and a destination table, you can point those out and Glue will then automatically generate a pipeline for you to go from A to B, that you can then edit or store in version control. So we're trying to make these capabilities easier to access and provide more guidance, so that you can actually get up and running more quickly, without giving up the power that comes from having the granular access. >> That's a great answer. Because the granularity's critical, because it allows you, as the market changes, it allows you... >> To adapt. >> To move fast, right? And so you don't want to give that up, but at the same time, you're bringing in complexity, and you just, I think, answered it well, in terms of how you're trying to simplify that. The strategy's obviously worked very well. Okay, let's talk about Infor now. Here's a big ISV partner. They've got the engineering resources to deal with all this stuff, and they really seem to have taken advantage of it. We were talking earlier, I don't know if you heard Charles's keynote this morning, but he said, when we were an on-prem software company, we didn't manage customer servers for them. Back then, the server was the server, software companies didn't care about the server infrastructure. Today it's different. It's like the cloud is giving Infor strategic advantage. The flywheel effect that you guys talk about spins off innovation that they can exploit in new ways.
So talk about your relationship with Infor, and kind of the history of where it's come from and where it's going. >> Sure. So, Infor's a great partner. We've been partners for over four years, they're one of our first all-in partners, and we have a great working relationship with them. They're sophisticated. They understand our services well. And we collaborate on identifying ways that we can make our services better for their use cases. And what they've been able to do is take all of the years of industry and domain expertise that they've gained over time in their vertical segments, and with their customers, and bring that to bear by using the components that we provide in the cloud. So all these services that I mentioned, the global footprint, the security capabilities, all of the various compliance certifications that we offer act as accelerators for what Infor's trying to do, and then they're able to leverage their intellectual property and their relationships and experience they've built up over time to get this global footprint that they can deploy for their customers, that gets better over time. As we add new capabilities, they can build that into the Infor platform, and then that rolls out to all of their customers much more quickly than it could before. >> And they seem to be really driving hard. I have not heard an enterprise software company talk so much about data, and how they're exploiting data, the way that I've heard Infor talk about it. So, data's obviously key, it's the lifeblood--people say it's the new oil--I'm not sure that's the best analogy. I can only put oil in my house or my car, I can't put it in both. Data--I can do so many things with it, so... >> I suspect that analogy will evolve. >> I think it should. >> I'm already thinking about it now. >> You heard it here first on theCUBE. >> You keep going, I'll come up with something. >> Don't use that anymore. >> Scratch the oil.
>> Okay, so, your perspectives on Infor, its use of data and what Amazon's role is in terms of facilitating that. >> So what we're providing is a platform, a set of services with powerful building blocks, that Infor can then combine into their applications that match the needs of their customers. And so what we're looking to do is give them a broad set of capabilities that they can build into their offerings. So, CloudSuite is built entirely on us, and then Infor OS is a shared set of services, and part of that is their data lake, which uses a number of our analytic services underneath. And so, what Infor's able to do for their customers is break down data silos within their customer organizations and provide a common way to think about data and machine learning and IoT applications across data in the data lake. And we view our role as really a supporting partner for them in providing a set of capabilities that they can then use to scale and grow and deploy their applications. >> I want to ask you about--I mean, security--I've always been comfortable with cloud security, maybe I'm naive--but compliance is something that's interesting, and something you said before... I think you said cataloging with Glue allows you to essentially keep all the data, right? And my concern about that is, from a governance perspective, the legal counsel might say, "Well, I don't want to keep all my data. If it's work in process, I want to get rid of it, or if there's a smoking gun in there, I want to get rid of it as soon as I can." Keep data as long as possible but no longer, to sort of paraphrase Einstein. So, what do you say to that? Do you have customers in the legal office that say, "Hey, we don't want to keep data forever," and how can you help? >> Yeah, so, just to refine the point on Glue. What Glue does is it gives you essentially a catalog, which is a map of all your data.
Whether you choose to keep that data or not keep that data, that's a function of the application. So, absolutely. >> Sure. Right. >> We have customers that say, "Look, here are my data sets, and whether it's new regulations, or I just don't want this set of data to exist anymore, or this customer's no longer with us, we need to delete that." We provide all of those capabilities. So, our goal is to really give customers the set of features, functionality, and compliance certifications they need to express the enterprise security policies that they have, and ensure that they're complying with them. And so, if you have data sets that need to be deleted, we provide capabilities to do that. And then the other side of that is you want the audit capabilities, so we actually log every API access in the environment in a service called CloudTrail, and then you can actually verify, by going back and looking at CloudTrail, that only the things that you wanted to have happen actually did happen. >> So, you seem very relaxed. I have to ask you what life is like at Amazon, because when I was down at AWS's D.C. offices, you walk in there, and there's this huge--I don't know if you've seen it--there's this giant graph of the services launched and announced, from 2006, when EC2 first came out, till today. And it's just this ridiculous set of services. I mean the line, the graph is amazing. So you're moving at this super, hyper pace. What's life like at AWS? >> You know, I've been there almost seven years. I love it. It's been fantastic. I was an entrepreneur and came out of startups before AWS, and when I joined, I found an environment where you can continue to be entrepreneurial and active on behalf of your customers, but you have the ability to have impact at a global scale. So it's been super fun. The pace is fast, but exhilarating. We're working on things we're excited about, and we're working on things that we believe matter, and make a difference to our customers.
So, it's been really fun. >> Well, so you got--I mean, you're right at the heart of what I like to call the innovation sandwich. You've got data, tons of data, obviously, in the cloud. You're a leader and increasingly becoming sophisticated in machine intelligence. So you've got data, machine intelligence, or AI, applied to that data, and you've got cloud for scale, cloud for economics, cloud for innovation, you're able to attract startups--that's probably how you found AWS to begin with, right? >> That's right. >> All the startups, including ours, we want to be on AWS. That's where the developers want to be. And so, again, it's an overused word, but that flywheel of innovation occurs. And that to us is the innovation sandwich, it's not Moore's Law anymore, right? For decades this industry marched to the cadence of Moore's Law. Now it's a much more multi-dimensional matrix and it's exciting and sometimes scary. >> Yeah. No, I think you touched on a lot of great points. It's really fun. I mean, I think, for us, the core is, we want to put things together the customers want. We want to make them broadly available. We want to partner with our customers to understand what's working and what's not. We want to pass on efficiencies when we can and then that helps us speed up the cycle of learning. >> Well, Rahul, I actually was going to say, I think he's so relaxed because he's on theCUBE. >> Ah, could be. >> Right, that's it. We just like to do that with people. >> No, you're fantastic. >> Thanks for being with us. >> It's a pleasure. >> We appreciate the insights, and we certainly wish you well with the rest of the show here. >> Excellent. Thank you very much, it was great to be here. >> Thank you, sir. >> You're welcome. >> You're watching theCUBE. We are live here in Washington, D.C. at Inforum 18. (techno music)
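
Editor's illustration: in the interview, Rahul describes Glue crawling data in S3, inferring its structure, and storing that structure in a catalog. The Python sketch below shows that idea in miniature, under loud assumptions: it is not AWS Glue's API or its actual inference algorithm. The function names `infer_type` and `crawl`, the type names, the widening rule, and the sample events are all hypothetical, and the "catalog" is just an in-memory dictionary.

```python
import json


def infer_type(value):
    """Map a decoded JSON value to a coarse, made-up column type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "struct"  # lists and nested objects, lumped together for brevity


def crawl(json_lines):
    """Infer a column -> type mapping from JSON-lines records.

    Assumes every non-blank line decodes to a JSON object.
    """
    schema = {}
    for line in json_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for column, value in record.items():
            seen = infer_type(value)
            previous = schema.get(column)
            if previous is None or previous == "null":
                schema[column] = seen
            elif seen not in (previous, "null"):
                # Conflicting types: widen integers to doubles,
                # otherwise fall back to string (a hypothetical rule).
                if {seen, previous} == {"bigint", "double"}:
                    schema[column] = "double"
                else:
                    schema[column] = "string"
    return schema


# Hypothetical raw events, standing in for objects landed in a data lake.
raw_events = [
    '{"user": "a1", "clicks": 3, "premium": false}',
    '{"user": "b2", "clicks": 4.5, "country": "DE"}',
]

# The "catalog" here is only a dict of table name -> inferred schema.
catalog = {"events": crawl(raw_events)}
print(catalog["events"])
```

The point of the sketch is the one made in the conversation: records with different shapes can sit side by side, and the structure is worked out after the fact and recorded in a catalog that query tools can consult, rather than being fixed before the data is stored.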

Published Date : Sep 25 2018

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Rahul Pathak	PERSON	0.99+
Rahul	PERSON	0.99+
AWS	ORGANIZATION	0.99+
John Walls	PERSON	0.99+
Charles	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
2006	DATE	0.99+
John Furrier	PERSON	0.99+
Dave	PERSON	0.99+
Washington, D.C.	LOCATION	0.99+
Washington D.C.	LOCATION	0.99+
Einstein	PERSON	0.99+
Today	DATE	0.99+
Infor	ORGANIZATION	0.99+
D.C.	LOCATION	0.99+
third piece	QUANTITY	0.99+
first service	QUANTITY	0.99+
both	QUANTITY	0.99+
S3	TITLE	0.99+
fourth piece	QUANTITY	0.99+
Amazon Athena	ORGANIZATION	0.98+
Athena	TITLE	0.98+
CloudSuite	TITLE	0.98+
over four years	QUANTITY	0.98+
Walter Washington Convention Center	LOCATION	0.98+
Moore's Law	TITLE	0.98+
first	QUANTITY	0.98+
one	QUANTITY	0.97+
EMR	TITLE	0.97+
CloudTrail	TITLE	0.96+
today	DATE	0.96+
Datalinks and Analytics: the Coming Wave of Brilliance	TITLE	0.95+
Glue	ORGANIZATION	0.95+
Redshift	TITLE	0.94+
Infor	TITLE	0.94+
First	QUANTITY	0.94+
this morning	DATE	0.94+
almost seven years	QUANTITY	0.94+
each	QUANTITY	0.91+
prem	ORGANIZATION	0.91+
Amazon EMR	ORGANIZATION	0.9+
DC	LOCATION	0.87+
EDW	TITLE	0.86+
Spark	TITLE	0.85+
both types	QUANTITY	0.84+
JSON	TITLE	0.83+
EC2	TITLE	0.82+
EMR	ORGANIZATION	0.82+
NS3	TITLE	0.82+
Athena	ORGANIZATION	0.81+
Hadoop	TITLE	0.8+
2018	DATE	0.78+
Kinesis Analytics	ORGANIZATION	0.77+
2018	EVENT	0.76+
