Ian Colle, AWS | SuperComputing 22

(lively music) >> Good morning. Welcome back to theCUBE's coverage at Supercomputing Conference 2022, live here in Dallas. I'm Dave Nicholson with my co-host Paul Gillin. So far so good, Paul? It's been a fascinating morning. Three days in, and a fascinating guest, Ian from AWS. Welcome. >> Thanks, Dave. >> What are we going to talk about? Batch computing, HPC. >> We've got a lot, let's get started. Let's dive right in. >> Yeah, we've got a lot to talk about. I mean, the first thing is we recently announced our batch support for EKS. EKS is our managed Kubernetes offering at AWS. And batch computing is still a large portion of HPC workloads. While the interactive component is growing, the vast majority of systems are just kind of fire and forget, and we want to run thousands and thousands of nodes in parallel. We want to scale out those workloads. And what's unique about our AWS Batch offering is that we can dynamically scale based upon the queue depth. So customers can go from seemingly nothing up to thousands of nodes, and while they're executing their work, they're only paying for the instances while they're working. Then, as the queue depth starts to drop and the number of jobs waiting in the queue starts to drop, we start to dynamically scale down those resources. And so it's extremely powerful. We see lots of distributed machine learning, autonomous vehicle simulation, and traditional HPC workloads taking advantage of AWS Batch. >> So when you have a Kubernetes cluster, does it have to be located in the same region as the HPC cluster that's going to be doing the batch processing, or does the nature of batch processing mean, in theory, you can move something from here to somewhere relatively far away to do the batch processing? How does that work? 'Cause look, we're walking around here and people are talking about lengths of cables in order to improve performance.
So what does that look like when you peel back the cover and you look at it physically, not just logically? AWS is everywhere, but physically, what does that look like? >> Oh, physically, for us, it depends on what the customer's looking for. We have workflows that are entirely within a single region. So they could have a portion of, say, the traditional HPC workflow within that region as well as the batch, and they're saving off the results, say, to a shared file system like our Amazon FSx for Lustre, or maybe aging that back to S3 object storage for a lower-cost storage solution. Or you can have customers that have a kind of multi-region orchestration layer, where they say, "You know what? I've got a portion of my workflow that occurs over on the other side of the country, and I replicate my data between the East Coast and the West Coast just based upon business needs. And I want to have that available to customers over there. So I'll do a portion of it on the East Coast and a portion of it on the West Coast." Or you can think of that even globally. It really depends upon the customer's architecture. >> So is the intersection of Kubernetes with HPC relatively new? I know you're announcing it. >> It really is. I think we've seen a growing perspective. I mean, Kubernetes has been kind of eating everything in the enterprise space for a long time, right? And now a lot of CIOs in the industrial space are saying, "Why am I using one orchestration layer to manage my HPC infrastructure and another one to manage my enterprise infrastructure?" So there's a growing appreciation that, you know what, why don't we just consolidate on one? And that's where we've seen a growth of Kubernetes infrastructure and our own managed Kubernetes, EKS, on AWS. >> Last month you announced general availability of Trainium, a chip that's optimized for AI training.
Talk about what's special about that chip, or what is customized to the training workloads. >> Yeah, what's unique about Trainium is you'll see 40% better price performance than any other GPU available in the AWS cloud. So we've really geared it to be the most price-performant option for our customers. And that's what we like about the silicon team that came to us as part of the Annapurna acquisition: it has really enabled us to have this differentiation and to innovate not just at the software level but across the entire stack. That Annapurna Labs team develops our network cards, they develop our Arm processors, they developed this Trainium chip. So that silicon innovation has become a core part of our differentiation from other vendors. And what Trainium allows you to do is perform similar workloads, just at a lower price. >> And you also have a chip several years older, called Inferentia- >> Um-hmm. >> Which is for inferencing. What is the difference? I mean, when would a customer use one versus the other? How would you move the workload? >> What we've seen is that, for the inference portion of their workload, customers have traditionally looked for a class of machine that is not as accelerated or as heavy as you would need Trainium for. So when they do the training, they want the really beefy machines that can grind through a lot of data. But when you're doing the inference, it's a little lighter weight. It's a different class of machine. That's why we've got those two different product lines, with Inferentia there to support the inference portions of their workflow and Trainium to be that heavy-duty training work. >> And then you advise them on how to migrate their workloads from one to the other? Once the model is trained, would they switch to an Inferentia-based instance? >> Definitely, definitely.
We help them work through what the design of that workflow looks like. Some customers are very comfortable doing self-service and just building it on their own. Other customers look for more of a professional services engagement, to say, "Hey, can you come in and help me work through how I might modify my workflow to take full advantage of these resources?" >> The HPC world has been somewhat slower than commercial computing to migrate to the cloud because- >> You're very polite. (panelists all laughing) >> Latency issues, they want to control the workload, they want to, I mean, there are even issues with moving large amounts of data back and forth. What do you say to them? I mean, what's the argument for ditching the on-prem supercomputer and going all-in on AWS? >> Well, I mean, to be fair, I started at AWS five years ago. And I can tell you, when I showed up at Supercomputing, even though I'd been part of this community for many years, they said, "What is AWS doing at Supercomputing? Wait, it's Amazon Web Services. You care about the web; can you actually handle supercomputing workloads?" Now, the thing that very few people appreciated is that yes, we could. Even at that time, in 2017, we had customers that were performing HPC workloads. That being said, there were some real limitations on what we could perform. And over those past five years, as we've grown as a company, we've started to really eliminate those frictions for customers migrating their HPC workloads to the AWS cloud. When I started in 2017, we didn't have our Elastic Fabric Adapter, our low-latency interconnect. So customers were stuck with standard TCP/IP, and for their highly demanding Open MPI workloads, we just didn't have the latencies to support them. So the jobs didn't run as efficiently as they could.
We didn't have Amazon FSx for Lustre, our managed Lustre offering, a high-performance, POSIX-compliant file system, and a high-performance file system is key to a large portion of HPC workloads. We had about 25 gigs of networking when I started. Now, with our accelerated instances, we've got 400 gigs of networking. So we've really continued to grow across that spectrum and to eliminate a lot of those frictions to adoption. I mean, one of the key ones: we had an open source toolkit, jointly developed by Intel and AWS, called CfnCluster, that customers were using to instantiate their clusters. And now we've migrated that all the way to a fully functional, supported service at AWS called AWS ParallelCluster. So over those past five years we have had to develop, we've had to grow, we've had to earn the trust of these customers and say, come run your workloads on us and we will demonstrate that we can meet your demanding requirements. At the same time, there's been, I'd say, more of a cultural acceptance. People have gone from, again, five years ago, "What are you doing walking around the show?" to saying, "Okay, I'm not sure I get it. I need to look at it. Okay, now, oh, it needs to be a part of my architecture." But then come the standard questions: "Is it secure? Is it price performant? How does it compare to my on-prem?" And really, culturally, a lot of it is just getting IT administrators used to the idea that we're not eliminating a whole field, right? We're just upskilling the people that used to rack and stack actual hardware, so that now they're learning AWS services and how to operate within that environment. And it's still key to have those people who are really supporting these infrastructures.
And so I'd say it's a little bit of a combination: a cultural shift over the past five years, to see that cloud is a super important part of HPC workloads, and us meeting the market segment where we needed to, by innovating at both the hardware level and the software level, which we're going to continue to do. >> You do have an on-prem story, though. I mean, you have Outposts. We don't hear a lot of talk about Outposts lately, but these innovations, like Inferentia, like Trainium, like the networking innovation you're talking about, are these going to make their way into Outposts as well? Will that essentially become this supercomputing solution for customers who want to stay on-prem? >> Well, we'll see what the future holds, but we believe that, as you noted, we've got the hardware, we've got the network, we've got the storage. All of those put together give you a high-performance computer, right? And whether you want it to be redundant in your local data center or you want it to be accessible via APIs from the AWS cloud, we want to provide that service to you. >> So to be clear, that's not available now, but that is something that could be made available? >> Outposts are available right now, with the services that you need. >> All these capabilities? >> Often a move to cloud, an impetus behind it, comes from the highest levels in an organization. They're looking at the difference between OpEx and CapEx. CapEx for a large HPC environment can be very, very, very high. Are these HPC clusters consumed as an operational expense? Are you essentially renting time? And then a fundamental question: are these multi-tenant environments? Or when you're referring to batches being run in HPC, are these dedicated HPC environments for customers who are running batches against them? When you think about batches, there are times when batches are being run and there are times when they're not being run.
So that would sort of conjure, in the imagination, multi-tenancy. What does that look like? >> Definitely, and that's been, let me start with your second part first- >> Yeah. >> That's been a core area within AWS. We do not say, okay, we're going to carve out this supercomputer and then allocate it to you. We are going to dynamically allocate multi-tenant resources to you to perform the workloads you need. And especially with the batch environment, we're going to spin up containers on those, and then as the workloads complete, we're going to turn those resources over to where they can be utilized by other customers. That's where the batch computing component is really powerful, because, as you say, you're releasing resources from workloads that you're done with. I can use those for another portion of the workflow, or for other work. >> Okay, so it makes a huge difference, yeah. >> You mentioned that five years ago people couldn't quite believe that AWS was at this conference. Now you've got a booth right out in the center of the action. What kind of questions are you getting? What are people telling you? >> Well, I love being on the show floor. This is my favorite part: talking to customers and hearing, one, what do they love, what do they want more of? Two, what do they wish we were doing that we're not currently doing? And three, what are the friction points that still exist? How can I make their lives easier? And what we're hearing is, "Can you help me migrate my workloads to the cloud? Can you give me the information that I need, both on price for performance and on an operational support model, and really help me be an internal advocate within my environment to explain how my resources can be operated proficiently within the AWS cloud?" And a lot of times it's, let's just take your applications, or a subset of your applications, and benchmark them.
And really, at AWS, one of the key things is that we are a data-driven environment. So when you take that data, you can help a customer say, "Let's not look at hypothetical, synthetic benchmarks. Let's take the actual LS-DYNA code that you're running, perhaps. Let's take the OpenFOAM code that you're currently running in your on-premises workloads, run it on the AWS cloud, and see how it performs." Then we can take that back to your decision makers and say, okay, here's the price for performance on AWS, here's what we're currently doing on-premises, how do we think about that? And that also ties into your earlier question about CapEx versus OpEx. We have models where you can actually capitalize a longer-term purchase at AWS. So it doesn't have to be OpEx, depending upon the accounting models you want to use, but a majority of customers do stay with that OpEx model, and they like the flexibility of saying, "Okay, spend as you go." We need to have true-ups, and make sure that they have insight into what they're doing. I think one of the boogeymen is, oh, I'm going to spend all my money and I'm not going to know what's available. So we want to provide the cost visibility and the cost controls, so that as an HPC administrator you feel you have insight into what your customers are doing and control over that. And once you take away some of those fears and give them the information that they need, what you start to see, too, is: you know what, we really didn't have a lot of that cost visibility and control with our on-premises hardware. We've had some customers tell us, we had one portion of the workload where this work center was spending thousands of dollars a day. And we went back to them and said, "Hey, we started to show this, what you were spending on-premises." They went, "Oh, I didn't realize that."
And so I think that's part of a cultural thing: in HPC, the question was, well, on-premises is free, so how do you compete with free? And so we need to really change that culturally, so that people see there is no free lunch. You're paying for the resources whether it's on-premises or in the cloud. >> Data scientists don't worry about budgets. >> Wait, on-premises is free? Paul mentioned something that reminded me. You said you were here in 2017 and people said, AWS, web, what are you even doing here? Now, in 2022, you're talking in terms of migrating to cloud. Paul mentioned Outposts. Let's say that a customer says, "Hey, I'd like you to put in a thousand-node cluster in this data center that I happen to own, but from my perspective, I want to interact with it just like it's in your data center." In other words, the location doesn't matter. My experience is identical to interacting with AWS in an AWS data center, or in a colo that works with AWS, but instead it's my physical data center. When we're tracking the percentage of IT that is on-prem versus off-prem, what is that? Is what I just described cloud? And in five years, are you no longer going to be talking about migrating to cloud because people go, "What do you mean, migrating to cloud? What are you even talking about? What difference does it make?" It's either something that AWS is offering or it's something that someone else is offering. Do you think we'll be at that point in five years? In this world of virtualization and abstraction, and you talked about Kubernetes, we should be there already, thinking in terms of: it doesn't matter, as long as it meets latency and sovereignty requirements. So, your prediction, we're all about insights and supercomputing- >> My prediction- >> In five years, will you still be talking about migrating to cloud, or will that be something from the past? >> In five years, I still think there will be a component.
I think the majority assumption will be that things are cloud-native, that you start in the cloud, and that perhaps an aspect of that will be interacting with some sort of edge device or some sort of on-premises device. And we hear more and more customers saying, "Okay, I can see the future, I can see that I'm shrinking my footprint." And you can see them still saying, "I'm not sure how small that beachhead will be, but right now I want to at least say that I'm going to operate in that hybrid environment." So I'd say, again, given the pace of this community, in five years we're still going to be talking about migrations, but I'd say the vast majority will be a cloud-native, cloud-first environment. And how do you classify that Outpost sitting in someone's data center? I'll leave that up to the analysts, but I think it would probably come down as cloud spend. >> Great place to end. Ian, you and I now officially have a bet. In five years we're going to come back. My contention is, no, we're not going to be talking about it anymore. >> Okay. >> And kids in college are going to be like, "What do you mean, cloud? It's all IT, it's all IT." And they won't remember this whole phase of moving to cloud and back and forth. With that, join us in five years to see the result of this mega-bet between Ian and Dave. I'm Dave Nicholson with theCUBE, here at Supercomputing Conference 2022, day three of our coverage, with my co-host Paul Gillin. Thanks again for joining us. Stay tuned; after this short break, we'll be back with more action. (lively music)
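Ian describes AWS Batch scaling compute up from nothing as jobs pile into the queue and back down as the queue drains. The Python sketch below illustrates only that general idea of sizing a fleet from queue depth; it is not AWS Batch's actual scaling algorithm, and the function `desired_capacity` and its parameters `jobs_per_instance`, `min_instances`, and `max_instances` are hypothetical names chosen for illustration.

```python
import math


def desired_capacity(queued_jobs: int, jobs_per_instance: int,
                     max_instances: int, min_instances: int = 0) -> int:
    """Return how many instances to run for the current queue depth.

    Hypothetical rule (not AWS Batch's real policy): run enough instances
    to start every queued job at once, clamped to [min_instances, max_instances].
    """
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    needed = math.ceil(queued_jobs / jobs_per_instance)
    return max(min_instances, min(needed, max_instances))


# Illustrative queue depths over time: a burst of work arrives, then drains.
queue_depths = [0, 500, 4000, 2500, 300, 0]
plan = [desired_capacity(q, jobs_per_instance=4, max_instances=800)
        for q in queue_depths]
print(plan)  # [0, 125, 800, 625, 75, 0]
```

With `min_instances=0`, the fleet falls back to zero once the queue is empty, which mirrors the "only paying for the instances while they're working" point in the interview. A production scheduler would likely also weigh instance types, per-job resource requests, and scale-down delays.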

Published Date : Nov 17 2022


ENTITIES

Entity	Category	Confidence
Ian	PERSON	0.99+
Paul	PERSON	0.99+
Dave Nicholson	PERSON	0.99+
Paul Gillin	PERSON	0.99+
Dave	PERSON	0.99+
AWS	ORGANIZATION	0.99+
400 gigs	QUANTITY	0.99+
2017	DATE	0.99+
Ian Colle	PERSON	0.99+
thousands	QUANTITY	0.99+
Dallas	LOCATION	0.99+
40%	QUANTITY	0.99+
Amazon Web Services	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
2022	DATE	0.99+
Annapurna	ORGANIZATION	0.99+
second part	QUANTITY	0.99+
five years	QUANTITY	0.99+
Last month	DATE	0.99+
Intel	ORGANIZATION	0.99+
five years ago	DATE	0.98+
five	QUANTITY	0.98+
Two	QUANTITY	0.98+
Supercomputing	ORGANIZATION	0.98+
Lustre	ORGANIZATION	0.97+
Annapurna Labs	ORGANIZATION	0.97+
Trainium	ORGANIZATION	0.97+
five years	QUANTITY	0.96+
one	QUANTITY	0.96+
OpEx	TITLE	0.96+
both	QUANTITY	0.96+
first thing	QUANTITY	0.96+
Supercomputing Conference	EVENT	0.96+
first	QUANTITY	0.96+
West Coast	LOCATION	0.96+
thousands of dollars a day	QUANTITY	0.96+
Supercomputing Conference 2022	EVENT	0.95+
CapEx	TITLE	0.94+
three	QUANTITY	0.94+
theCUBE	ORGANIZATION	0.92+
East Coast	LOCATION	0.91+
single region	QUANTITY	0.91+
years	QUANTITY	0.91+
thousands of nodes	QUANTITY	0.88+
Parallel Cluster	TITLE	0.87+
about 25 gigs	QUANTITY	0.87+

Mai-Lan Tomsen Bukovec & Wayne Duso, AWS | AWS re:Invent 2021

>>Hi, everybody. Welcome back to theCUBE's coverage of AWS re:Invent 2021. You're watching theCUBE, and I'm really excited. We're going to go outside the storage box, as I like to say, with Mai-Lan Tomsen Bukovec, who's the vice president of block and object storage, and Wayne Duso, who's VP of storage, edge, and data governance. Guys, great to see you again. We saw you at Storage Day, the 15-year anniversary of AWS, of course, the first product service ever. So awesome to be here, isn't it? Wow. >>So much energy in the room. It's so great to see customers learning from each other, learning from AWS, learning from the things that you're observing as well. >>A lot of companies decided not to do physical events. I think you guys are on the right side of history. I'll tell you, you weren't exactly positive how many people were going to show up. Everybody showed. I mean, it's a packed house here, so... >>Number 10. Yeah. >>All right. So let's get right into it. News of the week. >>So much to say. Wayne, do you want to kick this off? >>We had a great set of announcements that Mai-Lan talked about yesterday in her talk, and a couple of them are in the file space, specifically a new member of the FSx family. If you remember, Amazon FSx is for customers who want to run fully managed versions of third-party and open source file systems on AWS. And so yesterday we announced a new member: FSx for OpenZFS. >>Okay, cool. And there's more. >>Well, there's more. I mean, one of the great things about the new managed file service world and OpenZFS is that it's powered by Graviton. >>It is powered by Graviton and all of the capabilities that AWS brings in terms of networking, storage, and compute to our customers. >>So this is really important. I want the audience to understand this. I've talked on theCUBE about how a large proportion, let's call it
30% of the CPU cycles are kind of wasted on things like offloads, and we could be much more efficient. So Graviton: much more efficient, lower power, better price performance, lower cost. Amazon is now on a new curve; cycles are faster for processors, and you can take advantage of that in storage. Storage uses compute. >>That's right. In fact, you had that big launch as well for Lustre, with Graviton. >>We did, in fact. So with the announcement of OpenZFS, we also announced the next-gen Lustre offering. And both of these offerings provide a 5x improvement in performance. For example, now with Lustre, customers can drive up to one terabyte per second of throughput, which is simply amazing. And with OpenZFS, right out of the box at GA, a million IOPS at sub-millisecond latencies, taking advantage of Graviton, taking advantage of our storage and networking capabilities. >>Well, I guess it's for HPC workloads, but what's the difference these days between HPC, big data, data-intensive, a lot of AI stuff? >>Right. There's a lot of intersection between all of those different types of workloads, as you said, and, you know, it all depends; it all matters. And this is the reason why having the suite of capabilities, the members of the family, if you would, is so important to our customers. >>We've talked a lot about how you really can't think about storage as traditional storage anymore. And certainly your world's not a box. It's really a data platform, but maybe you could give us your point of view on that. >>Yeah, I think, you know, if we take a step back and think about how AWS does storage, we think along multiple dimensions. We have the dimension that Wayne's talking about, where you bring together the power of compute and storage for these managed file services that are so popular. You and I talked about NetApp ONTAP.
We went into some detail on that with you as well, and that's been enormously popular. That whole dimension of these managed file services is all about where the customer is today and how we can help them get to the cloud. But then you think about the other things that we're also imagining and re-imagining: how customers want to grow those applications and scale them. A great example here at re:Invent: let's just take the concept of archive. >>So many people, when they think about archive, think about taking that piece of data and putting it away on tape, putting it away in a closet somewhere, never pulling it out. We don't think about archive like that. Archive just happens to be data that you aren't using at the moment, but when you need it, you need it right away. And that's why we built a new storage class that we launched just yesterday, Dave. It's called Glacier Instant Retrieval. It has retrieval in milliseconds, just like an S3 storage class, and it has the same pricing of four-tenths of a cent as Glacier archive. >>So what's interesting: at the analyst event today, Adam got a question, and somebody was poking at him. You know, analysts can be snarky sometimes about price declines and so forth. And he said, you know, one of the things that doesn't always show up is that we don't always get credit for lowering prices, but we might lower costs. And archive and deep archive are an example of that. Maybe you could explain that point of view. >>Yeah. The way we look at it is that our customers, when they talk to us about the cost of storage, talk to us about the total cost of the storage, and it's not just storing the data, it's retrieving it and using it. So we have done an amazing amount across the whole portfolio around reducing costs. We have Glacier Instant Retrieval, which is 68% cheaper than Standard-Infrequent Access. That's a big cost reduction.
We have EBS Snapshots Archive, which we introduced yesterday: 75% cheaper to archive a snapshot. These are the types of things that just transform the total cost. And in some cases, we just eliminate costs. In the Glacier storage class, all bulk retrievals of data, five to 12 hours, are now free of charge. You don't even have to think about it. We didn't even reduce it; we just eliminated the cost of that data retrieval. >>And additive to what Mai-Lan said around archiving: if you look at what we've done throughout the entire year, an interesting statistic that was brought up yesterday is that over the course of 2021, between our respective teams, we've launched over 105 capabilities for our customers. On the file side, for instance, for EFS, we launched One Zone, which reduced customer costs by 47%. You can now achieve costs of roughly 4.3 cents per gigabyte-month on EFS. On FSx, we've reduced costs by up to 92% on Lustre and FSx for Windows, and with the introduction of ONTAP and OpenZFS, we continue that forward, including customers' ability to compress and dedupe against those costs. So they end up seeing considerable savings, even over what our standard low prices are. >>100-plus, what can I call them, releases? How can you categorize those? Are they features? Do they fall into... >>They range from major services, like what we've launched with OpenZFS, to major features, and really 95 of those were launched before re:Invent. So really what you have between the different teams that work in storage is this relentless drive to improve all the storage platforms. And we do it all across the course of the year. In some cases, the benefit shows up at no cost at all to a customer. >>How did this, it seems like you're on an accelerated pace: S3, EBS, and then like hundreds of services.
I guess the question is, how come it took so long, and how is it accelerating now? Is it just that there was so much focus on compute before, you had to get that in place, but now it's just rapidly accelerating? >>I'll tell you, Dave, we took the time to count this year. So we came to you with this number of 106, but that acceleration has been in place for many years. We just didn't take the time to count. Correct. So this has been happening for years and years. Wayne and I have been with AWS for a long time now, for 10-plus years. And really, the velocity that we're talking about right now has been happening every single year, which is where you have storage today. And I've got to tell you, innovation is in our DNA and we are not going to stop now. >>So 10 years. Okay. So it was really, the first five years was kind of slow, and then... >>I don't think that's true at all. You know, if you look at the services that we have, we have the most complete portfolio of any cloud provider when it comes to storage and data. Over the years, we've added to the foundation, which is S3, and the foundation, which is EBS. We've come out with a number of storage services in the file space. Now you have an entire suite of persistent data stores within AWS, and the teams behind those are able to accelerate that pace. Just to give you an example, when I joined 10 years ago, AWS launched within that year roughly 128 services or features. Our teams together this year have launched almost that many just in this space. So AWS continues to accelerate, the storage teams continue to accelerate, and, as Mai-Lan said, we just started counting. >>The thing is, if you think about those first five years, that was laying the baseline: to launch S3, to launch EBS, to get that foundation in place, get lifecycle policies in place.
But really, I think you're just going to see an even faster acceleration; that number's going up. >>No, that's what I'm saying. It does appear that way. And you had to build a team and put teams in place, so that's part of the equation. But again, I come back to, I don't even think of it as storage anymore. It's data. The data lake is here to stay. You might not like the term. We always use the joke about a data ocean, but the data lake is here to stay. 200,000 data lakes now, we heard Adam talk about this morning. I think it was Adam. No, it was Swami. 200,000 data lakes in your customer base now. And people are adding value to that data in new ways, injecting machine intelligence; you know, SageMaker is a big piece of that, tying it in. I know a lot of customers are using Glue as a catalog, and I'm like, wow, is Glue a catalog? I mean, it's just so flexible. So what are you seeing customers do with that base of data now, and how are they driving new business value? Because I've said the last decade-plus has been about IT transformation, and now we're seeing business transformation. Maybe you could talk about that a little bit. >>Well, the base of every data lake is going to be S3. S3 has over 200 trillion objects now, Dave. And if you think about that, if you took every person on the planet, each of those people would have 26,000 S3 objects. It's gotten that big. And you know, if you think about the base of data with 200 trillion-plus objects, really the opportunity for innovation is limitless. A great example of that is it's not just business value. It's really the new customer experiences that our customers are inventing. The NFL, you know, they have that application called Digital Athlete, where they started off with 10,000 labeled images, and they're up to 20,000 labeled images now.
And they're using those to drive machine learning models that help predict and support the players on the field when they start to see things unfold that might cause injury. That is a brand-new experience, and it's only possible with vast amounts of data. >> Adding to what Mai-Lan said, you talk about business transformation: we are in the age of data. We represent storage services, but what we really represent is where our customers hold one of their most valuable assets, which is their data. And that set of data is only growing. And the ability to use that data, to leverage that data for value, whether it's ML training or analytics, is only accelerating. This is the feedback we get from our customers, and this is where these features and new capabilities come from. So that's what's really accelerating our pace. >> Guys, I wish we had more time. I'll have to have you back, because we're on a tight clock here, but it's so great to see you both, especially live. I hope we get to do more of this in 2022. I'm an optimist. Okay. And keep it right there, everybody. This is Dave Vellante for theCUBE, your leader in live tech coverage. Right back.
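The "26,000 S3 objects per person" comparison in this segment is simple division. As a quick sanity check, the sketch below divides 200 trillion objects by a world population of about 7.8 billion; that population figure is our assumption and is not stated in the interview.

```python
# Sanity check of the "26,000 S3 objects per person" comparison.
# Assumption (not from the interview): world population of about 7.8 billion in 2021.
S3_OBJECTS = 200e12        # "over 200 trillion objects", so this is a lower bound
WORLD_POPULATION = 7.8e9   # assumed

per_person = S3_OBJECTS / WORLD_POPULATION
print(f"{per_person:,.0f} objects per person")  # prints: 25,641 objects per person
```

Because S3 holds "over" 200 trillion objects, the true per-person figure is somewhat higher than this lower bound, which is consistent with the roughly 26,000 quoted.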

Published Date : Dec 2 2021

SUMMARY :

Great to see you again, we saw you at storage day, the 15 year anniversary of AWS, So much energy in the room. I think you guys are on the right side of history. Uh, news of the week. And if you remember that the FSA, And there's more, Well, there's more, I mean, one of the great things about the new match file service world and CFS is it's powered It is taught by Gravatar and all of the capabilities that AWS brings a new curve, uh, cycles are faster for processors, and you can take advantage of that in storage In fact, you have that big launch as well for luster, with gravity. And both of these offerings, You to just, there's a lot of intersection between all of those different types of workloads they have, as you said, but maybe you could give us your point of view on that. Uh, we went into some detail on that with you as well, and that's been enormously popular. that you just aren't using at the moment, but when you need it, you need it right away. And he said, you know, one of the, one of the things that's not always shown up and we don't always get credit for And so the glacier storage class, the entire year, you know, a interesting statistic that was brought up yesterday is over the course And so really what you have between the different there was so much focus on compute before you had to get that in place, or, but now it's just And so we came to you And then I don't think that try, you know, if you, And if you think about those first five years, that was laying the baseline to launch us three, And so that's, you know, part of the equation. And you know, a great example for that is it's not just business value. And the ability to use that data, to leverage that data for value, whether it's ML training, I'd have to have you back because we're on a tight clock here,

ENTITIES

Entity	Category	Confidence
Dave Volante	PERSON	0.99+
Dave	PERSON	0.99+
Wayne	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Adam	PERSON	0.99+
2022	DATE	0.99+
30%	QUANTITY	0.99+
10 plus years	QUANTITY	0.99+
75%	QUANTITY	0.99+
47%	QUANTITY	0.99+
68%	QUANTITY	0.99+
10 years	QUANTITY	0.99+
Wayne Duso	PERSON	0.99+
yesterday	DATE	0.99+
2021	DATE	0.99+
95	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
five	QUANTITY	0.99+
Yasmin	PERSON	0.99+
200,000 data lakes	QUANTITY	0.99+
10,000 labeled images	QUANTITY	0.99+
12 hours	QUANTITY	0.99+
first five years	QUANTITY	0.99+
FSX	TITLE	0.99+
10 years ago	DATE	0.98+
over 200 trillion objects	QUANTITY	0.98+
today	DATE	0.98+
each	QUANTITY	0.98+
this year	DATE	0.98+
one	QUANTITY	0.98+
three	QUANTITY	0.97+
both	QUANTITY	0.97+
S3	COMMERCIAL_ITEM	0.97+
up to 20,000 labeled images	QUANTITY	0.97+
eight	QUANTITY	0.96+
one zone	QUANTITY	0.96+
five X	QUANTITY	0.96+
NetApp	TITLE	0.95+
200 trillion plus objects	QUANTITY	0.95+
last decade	DATE	0.95+
a hundred and twenty, a hundred and twenty eight services	QUANTITY	0.95+
this morning	DATE	0.94+
EBS	ORGANIZATION	0.94+
over 105 capabilities	QUANTITY	0.94+
ONTAP	TITLE	0.93+
4.30 cents	QUANTITY	0.93+
100 plus	QUANTITY	0.92+
Swami	PERSON	0.92+
up to 92%	QUANTITY	0.91+
NFL	ORGANIZATION	0.9+
CFS	TITLE	0.9+
Milan	PERSON	0.89+
15 year anniversary	QUANTITY	0.88+
single year	QUANTITY	0.87+
SageMaker	ORGANIZATION	0.87+
four tenths of a cent	QUANTITY	0.87+
Gravatar	ORGANIZATION	0.86+
Invent	EVENT	0.85+
hundreds of services	QUANTITY	0.84+
a million	QUANTITY	0.84+
windows	TITLE	0.82+
Mai Lan Tomsen Bukovec	PERSON	0.81+

Ed Naim & Anthony Lye | AWS Storage Day 2021

(upbeat music) >> Welcome back to AWS Storage Day. This is theCUBE's continuous coverage. My name is Dave Vellante, and we're going to talk about file storage. 80% of the world's data is in unstructured storage, and most of that is in file format. Devs want infrastructure as code. They want to be able to provision and manage storage through an API, and they want that cloud agility. They want to be able to scale up, scale down, and pay by the drink. And the big news of Storage Day was really the partnership, the deep partnership, between AWS and NetApp. With me to talk about that are Ed Naim, who's the general manager of Amazon FSx, and Anthony Lye, executive vice president and GM of public cloud at NetApp. Two CUBE alums. Great to see you guys again. Thanks for coming on. >> Thanks for having us. >> So Ed, let me start with you. You launched FSx in 2018 at re:Invent. How is it being used today? >> Well, we've talked about FSx on theCUBE before, Dave, but let me start by recapping: FSx makes it easy to launch and run fully managed, feature-rich, high-performance file storage in the cloud. We built FSx from the ground up to have the reliability and the scalability you were talking about, and the simplicity to support a really wide range of workloads and applications. With FSx, customers choose the file system that powers their file storage, with full access to the file system's feature sets, performance profiles, and data management capabilities. Since re:Invent 2018, when we launched the service, we've offered two file system choices for customers. The first was Windows File Server, which is storage built on top of Windows Server, designed as a really simple solution for Windows applications that require shared storage. And then Lustre, which is an open source file system that's the world's most popular high-performance file system. And the Amazon FSx model has really resonated strongly with customers for a few reasons.
First, for customers who currently manage network-attached storage, or NAS, on premises, it's such an easy path to move their applications and their application data to the cloud. FSx works and feels like the NAS appliances that they're used to, but added to all of that are the benefits of a fully managed cloud service. Second, for builders developing modern new apps, it helps them deliver fast, consistent experiences for Windows and Linux in a simple and agile way. And third, for research scientists, its storage performance and its capabilities for dealing with data at scale really make it a no-brainer storage solution. As a result, the service is being used for a pretty wide spectrum of applications and workloads across industries. I'll give you a couple of examples. There's a class of what we call common enterprise IT use cases: think of things like end-user file shares, corporate IT applications, content management systems, and highly available database deployments. Then there's a variety of common line-of-business and vertical workloads running on FSx as well. In financial services, there's a lot of modeling and analytics workloads; in life sciences, a lot of genomics analysis; in media and entertainment, rendering, transcoding, and visual effects; in automotive, a lot of electronic control unit simulations and object detection; in semiconductor, a lot of EDA, electronic design automation; and in oil and gas, seismic data processing is a pretty common workload on FSx. And then there's a class of really ultra-high-performance workloads running on FSx as well. Think of things like big data analytics, where SAS Grid is a common application, a lot of machine learning model training, and then a lot of what people would consider traditional or classic high-performance computing, or HPC. >> Great. Thank you for that. Just a quick follow-up if I may, and then I want to bring Anthony into the conversation.
So why NetApp? This is not a Barney deal. This was not elbow grease going into a Barney deal, where, you know, I love you, you love me, we do a press release. So why NetApp? Why ONTAP? Why now? (momentary silence) Ed, that was to you. >> Was that a question for Anthony? >> No, for you, Ed. And then I want to bring Anthony in. >> Oh, sure. Sorry. Okay. Yeah, Dave, it really stemmed from both companies realizing that a combined offering would be highly valuable to and impactful for customers. In reality, Amazon and NetApp started collaborating on the service about two years ago. And we had a joint vision: we wanted to provide AWS customers with the full power of ONTAP, the complete ONTAP with every capability and with ONTAP's full performance, but fully managed and offered as a full-blown AWS-native service. What that means is that customers get all of ONTAP's benefits along with the simplicity, the agility, the scalability, the security, and the reliability of an AWS service. >> Great. Thank you. So Anthony, I have watched NetApp reinvent itself. It started in workstations, I saw you go into the enterprise, I saw you lean into virtualization, and you told me at least two years ago, it might've been three, "Dave, we are going all in on the cloud. We're going to lead this next chapter." So I want you to bring in your perspective. You're reinventing NetApp yet again. What are your thoughts? >> Well, NetApp and AWS have had a very long relationship. I think it probably dates back about nine years now. And what we really wanted to do at NetApp was give the most important constituent of all an experience that helped them progress their business. So with ONTAP, the industry's leading shared storage platform, we wanted to make sure that in AWS it was as good as it was on premises. We love the idea of giving customers this wonderful concept of symmetry.
You know, ONTAP runs the biggest applications in the largest enterprises on the planet. And we wanted to give not just those customers an opportunity to embrace the Amazon cloud; we also wanted to extend the capabilities of ONTAP through FSx to a new customer audience: maybe smaller companies that didn't really purchase on-premises infrastructure, people that were born in the cloud. And of course, this gives us a great opportunity to present a fully managed ONTAP within the FSx platform to a lot of non-NetApp customers, to our competitors' customers, Dave, because frankly, our competitors haven't done what we've done. I think we are the beneficiaries of it, and we're in turn passing that innovation, that transformation, on to the customers and the partners. >> You know, one key aspect here is that it's a managed service. I don't think that can be overstated. And the other is the cloud-nativeness of this, Anthony. A marketplace offering is great, but there's some serious engineering going on here. So Ed, maybe start with the perspective of a managed service. I mean, what does that mean? The whole ball of wax? >> Yeah. What it means to a customer is they go into the AWS console, or they go to the AWS SDK or the AWS CLI, and they are easily able to provision a file system, and it automatically gets built for them. There's nothing that they need to do at that point: they get an endpoint that they can access the file system from, and that's it. We handle patching, we handle all of the provisioning, and we handle any hardware replacements that might need to happen along the way. Everything is fully managed. So the customer can really focus not on managing their file system, but on doing all of the other things that they want to do and need to do.
>> So Anthony, in a way you're disrupting yourself, which is kind of what you told me a couple of years ago. You weren't afraid to do that: if we don't do it, somebody else is going to do it. You're used to the old days: you sell a box and you say, "We'll see you next time," you know, in three or four years. So from your customers' standpoint, what's their reaction to this notion of a managed service, and what does it mean to NetApp? >> Well, I think the most important thing it does is give them investment protection. The wonderful thing about what we've built with Amazon in FSx is that it's a complete ONTAP. So an ONTAP cluster on premises can immediately see and connect to an ONTAP environment under FSx. We can then establish various kinds of connectivity. We can use SnapMirror technologies for disaster recovery. We can use efficient data transfer for things like dev/test and backup. Of course, the wonderful thing that we've done, above and beyond what anybody else has done, is make sure that the primary application itself, one that was built on NAS in an on-premises environment, an SAP or Oracle, et cetera, as Ed said, can be moved over, and that customers have the confidence to run it with no changes in an Amazon environment. So what we've really done, I think, for customers, both NetApp customers and non-NetApp customers, is give them an enterprise-grade shared storage platform that's as good in the Amazon cloud as it was in an on-premises data center. And that's something that's very unique to us. >> Can we talk a little bit more about those use cases, both of you? What are you seeing as some of the more interesting ones that you can share? Ed, maybe you can start. >> Yeah, happy to.
The customer discussions we've been in have really highlighted four use cases that customers are telling us they'll use the service for. So maybe I'll cover two, and Anthony can cover the other two. The first is application migrations. Customers are increasingly looking to move their applications to AWS, and a lot of those applications work with file storage today. We're talking about applications like SAP, relational databases like SQL Server and Oracle, and vertical applications like Epic in the healthcare space. As another example, lots of media and entertainment rendering, transcoding, and visual effects workflows require Windows, Linux, and macOS access to the same set of data. What application administrators really want is the easy button. They want fully featured file storage that has the same capabilities and the same performance that their applications are used to, that has extremely high availability and durability, and that can easily enable them to meet compliance and security needs with a robust set of data protection and security capabilities. To give you an example, Accenture has told us that a key obstacle their clients face when migrating to the cloud is potentially having to re-architect their applications to adopt new technologies. And they expect that Amazon FSx for NetApp ONTAP will significantly accelerate their customers' migrations to the cloud. The second one is storage migrations. Storage admins are increasingly looking to extend their on-premises storage to the cloud. They want to do that because they want to be more agile and responsive to growing data sets and growing workload needs. They want elastic capacity. They want the ability to spin up and spin down. They want easy disaster recovery across geographically isolated regions. They want the ability to change performance levels at any time.
So all of this goodness that they get from the cloud is what they want. And more and more of them are also looking to make their company's data accessible to cloud services for analytics and processing: services like ECS and EKS, WorkSpaces and AppStream, VMware Cloud and SageMaker, and orchestration services like ParallelCluster and AWS Batch. But while they want all these cloud benefits, at the same time they have established data management workflows, and they've built processes and automation leveraging the APIs and capabilities of on-prem NAS appliances. It's really tough for them to just start from scratch with that stuff. So this offering provides them the best of both worlds: they get the benefits of the cloud with the NAS data management capabilities that they're used to. >> Right. >> Ed: So Anthony, do you want to talk about the other two? >> Well, first and foremost, you heard from Ed earlier about the FSx construct and how successful it's been. And one of the real reasons it's been so successful is that it takes advantage of all of the latest storage, compute, and networking technologies. What's great is that all of that's hidden from the user. What FSx does is deliver a service. And what that means for an ONTAP customer is you're going to have ONTAP with an SLA and an SLM. You're going to have hundreds of thousands of IOPS available to you, and sub-millisecond latencies. What's also really important is that the design for FSx for NetApp ONTAP was really to provide consistency on the NetApp API and to provide full access to ONTAP from the Amazon console, the Amazon SDK, or the Amazon CLI. So in this case, you've got the wonderful benefit of 29 years of NetApp innovation combined with all the innovation of AWS, all presented consistently to a customer.
What Ed said, which I'm particularly excited about, is that customers will see this just as they see any other AWS service. So if they want to use ONTAP in combination with some incremental compute resources, maybe with their own encryption keys, maybe with directory services, or with other services like SageMaker, all of those things are immediately exposed to Amazon FSx for NetApp ONTAP. We also do some really intelligent things in the storage layer. For example, we do intelligent tiering, so the customer is constantly getting the best TCO. What that means is we're using Amazon S3 storage as a tier, so that we can move cold data off of the primary file system to give the customer the optimal capacity and the optimal throughput while maintaining the integrity of the file system. It's the same with backup, and the same with disaster recovery, whether we're operating in a hybrid AWS cloud, in an AWS region, or across regions. >> Well, thank you. I think this announcement is a big deal for a number of reasons. First of all, it's the largest market. Like you said, you're the gold standard. I'll give you that, Anthony, because you guys earned it. So it's a large market, but previously you always had to make trade-offs. Either I could do file in the cloud but didn't get the rich functionality that NetApp's mature stack brings, or you could have wrapped your stack in a Kubernetes container, thrown it into the cloud, and hosted it there. But now it's a managed service, and presumably underneath, you're taking advantage of the cloud. As I say, my inference is that there's some serious engineering going on here. You're taking advantage of some of the cloud-native capabilities.
Yeah, maybe it's the different EC2 instance types, but also being able to bring in... We're entering a new data era with machine intelligence and other capabilities that we really didn't have access to last decade. So I want to close by giving you guys the last word. Maybe each of you could give me your thoughts on how you see this partnership in the future, particularly from a customer standpoint. Ed, maybe you could start, and then Anthony, you can bring us home. >> Yeah, well, Anthony and I and our teams have gotten to know each other really well in ideating around what this experience would be and then building the product. And we have this common vision that it is something that's really going to move the needle for customers: providing the full ONTAP experience with the power of a native AWS service. So we're really excited. We're in this for the long haul together. We've partnered on everything from engineering to product management to support, the full thing. This is a co-owned effort, a joint effort backed by both companies. And we have, I think, a pretty remarkable product on day one, one that I think is going to delight customers. And we have a really rich roadmap that we're going to be building together over the years. So I'm excited about getting this into customers' hands. >> Great, thank you. Anthony, bring us home. >> Well, it's one of those rare chances where you get to do something with Amazon that no one's ever done. We're sitting on the inside, we are a peer of theirs, and we're able to develop at very high speed in combination with them and release continuously to the customer base. So what you're going to see here is rapid innovation. You're going to see a whole host of new services: services that NetApp develops, services that Amazon develops.
And then the whole ecosystem is going to have access to this, whether they're historically built on the NetApp APIs or increasingly built on the AWS APIs. I think you're going to see orchestrations. I think you're going to see the capabilities expand the overall opportunity for AWS to bring enterprise applications over. For me personally, Dave, I've demonstrated yet again to the NetApp customer base how much we care about them and their future. Selfishly, I'm looking forward to telling the story to my competitors' customer base, because they haven't done it. So I think we've been bold. I think we've been committed. As you said, three and a half years ago I promised you that we were going to do everything we possibly could. People always ask, what's the real benefit of this? At the end of the day, customers and partners will be the real winners. This innovation, this as-a-service model, I think is going to expand our market and allow our customers to do more with Amazon than they could before. It's one of those rare cases, Dave, where I think one plus one equals about seven, really. >> I love the vision, and I'm excited to see the execution. Ed and Anthony, thanks so much for coming back on theCUBE. Congratulations on getting to this point, and good luck. >> Anthony and Ed: Thank you. >> All right. And thank you for watching, everybody. This is Dave Vellante for theCUBE's continuous coverage of AWS Storage Day. Keep it right there. (upbeat music)
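Anthony describes intelligent tiering as moving cold data off the primary file system onto S3-backed storage. The sketch below is a toy, in-memory illustration of that idea only: a file that has not been accessed for longer than a cooling period is reassigned from a "primary" tier to a "capacity" tier. The class, function, and parameter names and the 31-day cooling period are hypothetical choices for illustration; this is not the FSx for NetApp ONTAP API, and it makes no claim about how ONTAP actually decides what to tier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    size_gb: float
    last_access: datetime
    tier: str = "primary"  # "primary" (fast tier) or "capacity" (object-storage-backed tier)


def tier_cold_files(files, now, cooling_period=timedelta(days=31)):
    """Move files idle longer than `cooling_period` to the capacity tier.

    Hypothetical policy for illustration. Returns the total size in GB moved.
    """
    moved_gb = 0.0
    for f in files:
        if f.tier == "primary" and now - f.last_access > cooling_period:
            f.tier = "capacity"
            moved_gb += f.size_gb
    return moved_gb


now = datetime(2021, 9, 2)
files = [
    FileRecord("/vol1/archive/q1_results.csv", 40.0, datetime(2021, 5, 1)),
    FileRecord("/vol1/db/active.dat", 120.0, datetime(2021, 9, 1)),
]
moved = tier_cold_files(files, now)
print(moved, [f.tier for f in files])  # 40.0 ['capacity', 'primary']
```

Keying the decision on last-access time keeps the hot working set on the fast tier while idle data accumulates on cheaper storage; a real system would also have to decide when to bring cold data back, which this sketch deliberately leaves out.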

Published Date : Sep 2 2021

SUMMARY :

And the big news of storage So Ed, let me start with you. And the Amazon FSX model has into the conversation. I want to bring Anthony in. and NetApp on the service And so, I want you to in the largest enterprises on the planet. And the other is that the cloud all of the provisioning, You're not afraid to do that that the actual primary of the more interesting ones and maybe Anthony can cover the other two. of IOPS available to you and First of all, it's the largest market. really move the needle for Great, thank you. the story to my competitors, for coming back in the Cube. This is Dave Vellante for the

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Anthony	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Anthony Lye	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Ed	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Ed Naim	PERSON	0.99+
two	QUANTITY	0.99+
NetApp	ORGANIZATION	0.99+
29 years	QUANTITY	0.99+
FSX	TITLE	0.99+
Barney	ORGANIZATION	0.99+
ONTAP	TITLE	0.99+
one	QUANTITY	0.99+
both	QUANTITY	0.99+
three	QUANTITY	0.99+
80%	QUANTITY	0.99+
both companies	QUANTITY	0.99+
NetApp	TITLE	0.99+
four years	QUANTITY	0.99+
Linux	TITLE	0.99+
Windows	TITLE	0.99+
MSX	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
first	QUANTITY	0.99+

Bill Peterson, MapR - Spark Summit East 2017 - #SparkSummit - #theCUBE

>> Narrator: Live from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody. This is theCUBE, the leader in live tech coverage. We're here in snowy Boston. This is Spark Summit. Spark Summit does an East Coast version and a West Coast version, and they've got one in Europe this year. theCUBE has been the live broadcast partner for Databricks. Our friend Bill Peterson is here. He's the head of partner marketing at MapR. Bill, good to see you again. >> Thank you, thanks for having me. >> So how's the show going for you? >> It's great. >> Give us the vibe. We're kind of windin' down day two. >> It is. The show's been great. We've got a lot of traffic coming by, and a lot of deep technical questions, which is-- >> Dave: Hardcore at the show-- >> It is, it is. I spend a lot of time there smiling and going, "Yeah, talk to him." (laughs) But it's great. We're getting those deep technical questions. We actually just got one on Lustre, which I had to think about for a minute: oh, HPC. It was way back in there. >> Dave: You know, Cray's on the floor. >> Oh, yeah, that's true. But a lot of our customers have come by as well: UnitedHealth Group, Wells Fargo, AMEX. It's great to see them and talk to them, and they've got some deep technical questions for us too. So it's moving the needle with existing customers but also new business, which is great. >> So I got to ask a basic question. What is MapR? MapR started in the early days as a Hadoop distro vendor, one of the big three. When somebody says to you, "What is MapR?", what do you say? >> My answer today is that MapR is an enterprise software company that delivers a converged data platform. That converged data platform consists of a file system, a NoSQL database, a Hadoop distribution, a Spark distribution, and a set of data management tools.
And as a customer of MapR, you get all of those. You can turn 'em all on if you'd like, or you can just turn on the file system, for example, if you only want to use the file system for storage. But the enterprise software piece of that is all the hardening we do behind the scenes on things like snapshots, mirroring, data governance, multi-tenancy, ease of use, and performance, all of that baked into the solution, or the platform, as we're calling it now. So as you're kind of alluding to, a year ago we got out of that business of saying, okay, lead 100% with Hadoop, and then, while we have your attention, or if we don't, hey wait, we've got all this other stuff in the basket we want to show you. We went with the platform play and said we're going to include everything, and it's all there, and the baseline underneath is the hardening of it: the file system, the database, and the streaming product, which I didn't mention, which is kind of the core, and everything plays off of there. And that honestly has been really well received. And it just, I feel, makes it so much easier, because-- It happened here: we get the question, okay, how are you different from Cloudera or Hortonworks? Some of it here, given the nature of the attendees, is very technical, but there have been a couple of business users that I've talked to. And when I talk about us as an enterprise software company delivering a plethora of solutions versus just Hadoop, you can see the light going on in people's eyes sometimes. And I got it earlier today: "I had no idea you had a file system," which just drives me insane, because the file system is pretty cool, right? >> Well, you guys were early in investing in that file system and recovery capabilities and all the-- >> Two years in stealth writing it. >> Nasty, gnarly, hard stuff that was kind of pooh-poohed early on. >> Yeah, yeah. >> MapR was never patient about waiting for the open source community to just figure it out and catch up.
You always just said, all right, we're going to solve this problem and go sell. >> And I'm glad you said that. I want to be clear: we're not giving up on open source or anything, right? Open source is still a big piece. 50% of our engineers' time is spent working on open source projects. That's still super important to us. And then back in November-ish last year, we announced the MapR Ecosystem Packs, which are our effort to help customers that are using open source components stay current, 'cause that's a pain in the butt. This is a set of packages that have a whole bunch of components. We lead with Spark and Drill, and that was by customer request: they were having a hard time keeping current with Spark and Drill. So the packs allow them to come up to the current level within the converged data platform for all of their open source components. That's something we're going to do at the dot level, so I think we're at 2.1 or 2 now. The dot releases will bring you up on everything, and then the big ones, like the 3.0s and the 4.0s, will bring Spark and Drill current. And so we're going to kind of leapfrog those. So that's still a really important part of our business, and we don't want to forget that part, but what we're trying to do here, via the platform, is deliver all of that in one entity, right? >> So the converged data platform is relevant, presumably, because of the history of Hadoop: you've got all these different components, and you've got to cobble 'em together, with different interfaces and different environments. You're trying to unify that, and you have unified that, right? >> Yeah, yeah. >> So what is your customer feedback with regard to the converged data platform? >> Yeah, so it's a great question, because for existing customers it was like, ah, thank you. It was one of those, right, because we're listening. Actually, again, I'm glad you said that.
This week, in addition to Spark Summit, we're doing our yearly customer advisory board. Like a lot of vendors, we've got a customer advisory board of 30-plus companies that we bring in, and we sit down with them for a couple of days and they give us feedback on what we should and shouldn't be doing, directionally and all that, which is super important. And that's where a lot of this converged data platform came from: the need for... There was just too much; it was kind of confusing. I'll give the example of streams, right? We came out with our streaming product last year, and customers said, okay, I'm using Hadoop, I'm using your file system, I'm using NoSQL, now you're adding streams. This is great, but now, like with MEP, the Ecosystem Packs, I have to keep everything current. You've got to make it easier for me; you've got to make my life easier. So for existing customers, it's: I can stay current, I like this model, I can turn on and off what I want, when I want. It's a great model for them, for existing business. For new business, it gets us out of that Hadoop-only mode, right? I kind of jokingly call us Hadoop plus plus plus plus. We keep adding solutions to a single, cohesive data platform that we keep updated. And as I mentioned, talking to new customers or new prospects here, our potential new business, when I describe the model you can just see the light going on, and they realize, wow, there's a lot more to this than I had imagined. I got it earlier today: "I thought you guys only did Hadoop." Which is a little infuriating as a marketer, but I think from a mechanism, delivery, message, and story point of view, it's really helped. >> More CUBE time will help get this out there. (laughs) >> Well played, well played. >> It's good to have you back on. Okay, so Spark comes along a couple years ago and it was like, ah, what's going to happen to Hadoop? So you guys embraced Spark.
Talk more specifically about Spark, where it fits in your platform and the ecosystem generally. >> Spark, Hadoop, others as an entity to bring data into the converged data platform, that's one way to think about it. Way oversimplified, obviously, but that's a really great way, I think, to think about it, if we're going to provide this platform that anybody can query on, that you can run analytics against. We talk a lot now about converged applications. So taking historical data and taking operational data, streaming data being a great example. Putting those together, and you could use the Data Lake example if you want, that's fine. But putting them into a converged application in the middle where they overlap, kind of a typical Venn diagram, and that middle part where they overlap is the converged application. What's feeding that? Well, Spark could be feeding that, Hadoop could be feeding that. Just yesterday we announced a Docker for containers, and that could be feeding into the converged data platform as well. So we look at all of these things as an opportunity for us to manage data and to make data accessible at the enterprise level. And then that enterprise level goes back to what I was talkin' about before: it's got to have all of those things, like multi-tenancy and snapshots and mirroring and data governance, security, et cetera. But Spark is a big component of that. All of the customers who came by here that I mentioned earlier, which are some really good names for us, are using Spark to drive data into the converged data platform. So we look at it as: we can help them build new applications within the converged data platform with that data. So whether it's Spark data, Hadoop data, container data, we don't really care.
>> So along those lines, if the focus of intense interest right now is on Spark, and Spark says oh, we work with all these databases, data stores, file systems, if you approach a customer who's Spark first, what's the message relative to all the other data stores that they can get to through, without getting too techy, their API? >> Sure, sure. I think as you know, George, we support a whole bunch of APIs. So I guess for us it's the breadth. >> But I'm thinking of Spark in particular. If someone says specifically, I want to run Databricks, but I need something underneath it to capture the data and to manage it. >> Well, I think that's the beauty of our file system there. As I mentioned, if you think about it from an architectural point of view, our file system is along the bottom, or it could be our database or our streaming product, but in this instance-- >> George: That's what I'm getting at too, all three. >> Picture that as the bottom layer, as your storage-- I shouldn't say storage layer, but as the bottom layer. 'Cause it's not just storage, it's more than storage. The middle layer is maybe some of your open source tools and the like, and then above that is what I call your data delivery mechanisms. Which would be Spark, for example, in one bucket. Another bucket could be Hadoop, and another bucket could be these microservices we're talking about. Let me draw the picture another way using a partner, SAP. One of the things we've had some success with at SAP is SAP HANA sitting up here. SAP would love to have you put all your data in HANA. It's probably not going to happen. >> George: Yeah, good luck. >> Yeah, good luck, right? But what if you, hey customer, what if you put zero to two years' worth of data, historical data, in HANA? Okay, maybe the customer starts nodding their head like you just did. Hey customer, what if you put two to five years' worth of data in Business Warehouse? Guess what, you already own that.
You've been an SAP customer for a while, you already have it. Okay, the customer's now really nodding their head. You got their attention. To your original question, whether it's Spark or whatever, five-plus years, put it in MapR. >> Oh, and then like HANA Vora could do the query. >> Drill can query across all of them. >> Oh, right, including the Business Warehouse, okay. >> So we're running in the file system. That, to me, and we do this obviously with our joint SAP MapR customers, that to me is kind of a really cool vision. And to your original question, if that was Spark at the top feeding it rather than SAP, sure, right? Why not? >> What can you share with us, Bill, about business metrics around MapR? However you choose to share it, head count, want to give us gross margins by product, that's great, but-- (laughs) >> Would you like revenues too, Dave? >> We know they're very high because you're a software company, so that's actually a bad question. I've already profit-- (laughs) >> You don't have to give us top-line revenues-- >> So what are you guys saying publicly about the company, its growth? >> That's fair. >> Give us the latest. >> Fantastic, number one. Hiring like crazy, we're well north of 500 people now. You want to hear a funny story? Yesterday I was texting in the booth with a candidate for my team, back and forth on salary. Did the salary negotiation by text right there in the booth and closed her, she starts on the 27th, so. >> Dave: Congratulations. >> I'm very excited about that. So moving along on that. Seven, 800 plus customers, as we talk about... We just finished our fiscal year on January 31st, so we're on a February 1 fiscal year. And we always do a momentum press release, which will be coming out soon. Hiring, again, like crazy, as I mentioned, and the executive staff is all filled in and built to scale, which we're really excited about.
We talk a lot about the kind of uptake of-- it used to be of the file system, Hadoop, et cetera on its own, but now in the momentum release we'll be doing, we'll talk about the converged data platform and the uplift we've seen from that. So we obviously can't talk revenue numbers and the like, but everything... David, I got to tell you, we've been doin' this a long time, and all of that is just moving in the right direction. And then the other example I'll give you is from my world, the partner world. Last year I rebranded our partner program as the converged partner program. We're going with this whole converged thing, right? And we established three levels: elite, preferred, and affiliate. There are revenue requirements at each level, and there's resell and influence revenue, and we have MDF funds, not only from the big guys coming to us, but we're paying out MDF funds now to select partners as well. So all of this stuff I always talk about as the maturity of the company, right? We're maturing in our messaging, we're maturing in the level of people who are joining, and we're maturing in the customers and the deals, the deal sizes and volumes that we're seeing. It's all movin' in the right direction. >> Dave: Great, awesome, congratulations. >> Bill: Thank you, yeah, I'm excited. >> Can you talk about number of customers or number of employees relative to last year? >> Oh boy. Honestly, George, I don't know off the top of my head. I apologize, I don't know the metric, but I know it's north of 500 employees today, and it's like seven, 800 customers. >> Okay, okay. >> Yeah, yeah. >> And a little bit more on this partner, elite, preferred, and affiliate. >> Affiliate, yeah. >> What did you call it, the converged partners program? >> Converged-- Yeah, yeah. >> What are some of the details of that? >> Sure. So the elites are invite-only, and those are some of the bigger ones.
So for us, we're-- >> Dave: Like, some examples. >> Cisco, SAP, AWS, others, but those are some of the big ones. And there we're looking at things like resell and influence revenue. That's what I track in my... I always jokingly say at MapR, even though we're kind of a big startup now, I always jokingly say at MapR you have three jobs. You have the job you were hired for, you have your Thursday night job, and you have your Sunday night job. (Dave and George laugh) In the job that I was hired for, partner marketing, I track influence and resell revenue. So at the elite level, we're doing both. Like, Cisco resells us, so with this S-Series, we're in their SKU, and their sales reps can go sell an S-Series for big data workloads or analytical workloads, MapR on it, off you go. Our job then is cashing checks, which I like. That's a good job to have in this business. At the preferred level it's kind of that next tier of big players whose revenue hasn't yet moved them into the elite level. Partners in there are the MicroStrategies of the world, we're doing a lot with them, Tableau, Talend, a lot of the BI vendors. And then the affiliates are the smaller guys who maybe we'll do one piece of a campaign with during the year. So I'll give you an example, Attunity, you guys know those guys, right here? >> Sure. >> Yeah, yeah. >> Last year we were doing a campaign on DWO, data warehouse offload. We wanted to bring them in, but this was a MapR campaign running for a quarter, and we're typical, like a lot of companies: we run four campaigns a year, and then my partner and field stuff kind of opts into that and we run stuff to support it. And then corporate marketing does something. Pretty traditional. But what I try to do is pull these partners into those campaigns. So we did a webinar with Attunity as part of that campaign.
So at the affiliate level, the lower level, we're not doing a full go-to-market like we would with the elites at the top, but they're being brought into our campaigns, and then obviously we hope on the other side they're going to pull us in as well. >> Great, last question. What should we pay attention to, what's comin' up? >> Yeah, so-- >> Let's see, we got some events, we got Strata coming up, you'll be out our way, or we'll be out MapR way. >> As my Twitter handle says, seat 11A. That's where I am. (laughs) Yeah, I mean the Docker announcement we're really excited about, and microservices. You'll see more from us on the whole microservices thing. Streaming is still a big one, we think, for this year. You guys probably agree. That's why we announced the MapR streaming product last year. So again, from a go-to-market point of view, we're kind of putting some meat behind streaming, not only MapR but with partners, so streaming as a component and a delivery model for managing data in CDP. I think that's a big one. Machine learning is something that we're seeing more and more touching us from a number of customers, but also from the partner perspective. I see all the partner requests that come in to join the partner program, and there's been an uptick in the machine learning customers that want to come in and-- Excuse me, partners, that want to be talking to us. Which I think is really interesting. >> Where you would be the sort of prediction serving layer? >> Exactly, exactly. Or a data store. A lot of them are looking for just an easy data store, which the MapR file system can do. >> Infrastructure to support that, yeah. >> Commodity, right? The whole old promise of Hadoop or just a generic file system is give me easy access to storage on commodity hardware. The machine learning-- >> That works. >> Right. The existing machine learning vendors need an answer for that.
When the customer asks them, they want just an easy answer, say oh, we just use MapR FS for that and we're done. Okay, that's fine with me, I'll take that one. >> So that's the operational end of that machine learning pipeline that we call DevOps for data scientists? >> Correct, right. I guess the nice synergy there is the whole, going back to the Docker microservices one, there's a DevOps component there as well. So, might be interesting marrying those together. >> All right, we got to go, Bill, thanks very much, good to see you again. >> All right, thank you. >> All right, George and I will be back to wrap. We're going to part two of our big data forecast right now, so stay with us, right back. (digital music) (synth music)

Published Date : Feb 9 2017

SUMMARY :

Brought to you by Databricks. Bill, good to see you again. We're kind of windin' down day two. a lot of deep technical questions which is-- "Yeah, talk to him." So it's moving the needle with existing customers is all the hardening we do behind the scenes that was kind of poo-pooed early on. You always just said all right, we're going to solve So the packs allow them to come up to current level I got it earlier today, I thought you guys only did Hadoop. More Cube time will help get this out there. It's good to have you back on. and that middle part is the converged application. I think as you know, George, we support and to manage it. our file system along the bottom, and the like, and then above that is what I called Okay, maybe the customer starts nodding their head And to your original question, if that was Spark at the top so that's actually a bad question. So what are you guys saying publicly and closed her, she starts on the 27th, so. all of that is just all moving in the right direction. Honestly, George, I don't know off the top of my head. And a little bit more on this partner, elite, Yeah, yeah. So the elites are invite only, So at the elite level, we're doing both. So at the affiliate level, the lower level, What should we pay attention to, what's comin' up? Let's see, we got some events, we got Strata coming up I see all the partner requests that come in that the MapR file system can do. to storage on commodity hardware. When the customer asks them, they want just an easy answer, I guess the nice synergy there is the whole, thanks very much, good to see you again. We're going to part two of our big data forecast

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
George	PERSON	0.99+
Dave Vellante	PERSON	0.99+
UnitedHealth Group	ORGANIZATION	0.99+
George Gilbert	PERSON	0.99+
AMEX	ORGANIZATION	0.99+
Bill Peterson	PERSON	0.99+
Boston	LOCATION	0.99+
Dave	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
Europe	LOCATION	0.99+
two	QUANTITY	0.99+
MapR	ORGANIZATION	0.99+
Wells Fargo	ORGANIZATION	0.99+
Last year	DATE	0.99+
50%	QUANTITY	0.99+
five years	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
yesterday	DATE	0.99+
two years	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
Bill	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
30 plus	QUANTITY	0.99+
zero	QUANTITY	0.99+
last year	DATE	0.99+
Two years	QUANTITY	0.99+
today	DATE	0.99+
November	DATE	0.99+
both	QUANTITY	0.99+
January 31st	DATE	0.99+
Feb one	DATE	0.99+
HANA	TITLE	0.99+
This week	DATE	0.99+
Thursday night	DATE	0.99+
SAP	ORGANIZATION	0.99+
Sunday night	DATE	0.99+
five plus years	QUANTITY	0.99+
three jobs	QUANTITY	0.99+
Tableau	ORGANIZATION	0.99+
Boston, Massachusetts	LOCATION	0.99+
Seven, 800 plus customers	QUANTITY	0.99+
100%	QUANTITY	0.98+
Talend	ORGANIZATION	0.98+
NoSQL	TITLE	0.98+
Hadoop	TITLE	0.98+
seven, 800 customers	QUANTITY	0.98+
each level	QUANTITY	0.98+
a year ago	DATE	0.98+
Spark	TITLE	0.98+
Twitter	ORGANIZATION	0.98+
this year	DATE	0.98+
theCUBE	ORGANIZATION	0.98+
day two	QUANTITY	0.98+
27th	DATE	0.97+
One	QUANTITY	0.97+
one	QUANTITY	0.97+
SAP HANA	TITLE	0.97+
Spark Summit	EVENT	0.97+
East Coast	LOCATION	0.96+
