Joseph Nelson, Roboflow | AWS Startup Showcase
(chill electronic music) >> Hello everyone, welcome to theCUBE's presentation of the AWS Startups Showcase, AI and machine learning, the top startups building generative AI on AWS. This is season three, episode one of the ongoing series covering the exciting startups from the AWS ecosystem, talking about AI and machine learning. Can't believe it's been three years since season one. I'm your host, John Furrier. Got a great guest today, we're joined by Joseph Nelson, the co-founder and CEO of Roboflow, doing some cutting edge stuff around computer vision and really at the front end of this massive wave coming around, large language models, computer vision. The next gen AI is here, and it's just getting started. We haven't even scratched the surface. Thanks for joining us today. >> Thanks for having me. >> So you got to love the large language models, foundation models, really educating the mainstream world. ChatGPT has got everyone in a frenzy. This is educating the world around these next gen AI capabilities, enterprise, image and video data, all a big part of it. I mean the edge of the network, Mobile World Congress is happening right now, this month, and it's just continuing to explode. Video is huge. So take us through the company, do a quick explanation of what you guys are doing, when you were founded. Talk about what the company's mission is, and what's your North Star, why do you exist? >> Yeah, Roboflow exists to really kind of make the world programmable. I like to say, make the world be read and write access. And our North Star is enabling developers, predominantly, to build that future. If you look around, anything that you see will have software related to it, and can kind of be turned into software. The limiting reactant, though, is how to enable computers and machines to understand things as well as people can. And in a lot of ways, computer vision is that missing element that enables anything that you see to become software. So in the spirit of, if software is eating the world, computer vision kind of makes the aperture infinitely wide. That's the way I like to frame it. And the capabilities are there, the open source models are there, the amount of data is there, the compute capabilities are only improving annually, but there's a pretty big dearth of tooling, and an early but promising sign of the explosion of use cases, models, and data sets that companies, developers, and hobbyists alike will need to bring these capabilities to bear. So Roboflow is in the game of building the community around that capability, building the use cases that allow developers and enterprises to use computer vision, and providing the tooling for companies and developers to be able to add computer vision, create better data sets, and deploy to production quickly, easily, safely, and valuably. >> You know, Joseph, the word in production is actually real now. You're seeing a lot more people doing in-production activities. That's a real hot one, and usually it's slower, but it's gone faster, and I think that's going to be more of the same. And I think the parallel between what we're seeing on the large language models coming into computer vision, and as you mentioned, video's data, right? I mean we're doing video right now, we're transcribing it into a transcript, linking it up to the linguistics, the times and the timestamps, I mean everything's data, and that really kind of feeds.
So this connection between what we're seeing, the large language models and computer vision, are coming together, kind of cousins, brothers. I mean, how would you compare, how would you explain to someone, because everyone's on this wave of watching people bang out their homework assignments, and you know, write some hacks on code with some of the OpenAI technologies, there is a corollary directly related to the vision side. Can you explain? >> Yeah, the rise of large language models is showing what's possible, especially with text, and I think increasingly will get multimodal as images and video become ingested. Though there's kind of this still core missing element of basically, like, understanding. So the rise of large language models kind of creates this new area of generative AI, and generative AI in the context of computer vision is a lot of, you know, creating video and image assets and content. There's also this whole surface area to understanding what's already created. Basically digitizing physical, real world things. I mean the Metaverse can't be built if we don't know how to mirror or create or identify the objects that we want to interact with in our everyday lives. And where computer vision comes to play in, especially what we've seen at Roboflow is, you know, a little over a hundred thousand developers now have built with our tools. That's to the tune of a hundred million labeled open source images, over 10,000 pre-trained models. And they've kind of showcased to us all of the ways that computer vision is impacting and bringing the world to life. And these are things that, you know, even before large language models and generative AI, you had pretty impressive capabilities, and when you add the two together, it actually unlocks these kind of new capabilities. So for example, you know, one of our users actually powers the broadcast feeds at Wimbledon. So here we're talking about video, we're streaming, we're doing things live, we've got folks that are cropping and making sure we look good, and audio/visual all plugged in correctly. When you broadcast Wimbledon, you'll notice that the camera controllers need to do things like track the ball, which is moving at extremely high speeds, and zoom, crop, pan, tilt, as well as determine if the ball bounced in or out. The very controversial but critical call in a lot of tennis matches. And a lot of that has been historically done with the trained, but fallible, human eye, and computer vision is, you know, well suited for this task, to say, how do we track, pan, tilt, zoom, and see the tennis ball in real time, run at 30 plus frames per second, and do it all on the edge. And those are capabilities that, you know, were kind of like science fiction maybe even a decade ago, and certainly five years ago. Now the interesting thing is that with the advent of generative AI, you can start to do things like create your own training data sets, or kind of create logic around once you have this visual input. And teams at Tesla have actually been speaking about, of course the autopilot team's focused on doing vision tasks, but they've combined large language models to add reasoning and logic. So given that you see, let's say, the tennis ball, what do you want to do?
And being able to combine the capabilities of what LLMs represent, which is really a lot of basically core human reasoning and logic, with computer vision for the inputs of what's possible, creates these new capabilities, let alone multimodality, which I'm sure we'll talk more about. >> Yeah, and it's really, I mean it's almost intoxicating. It's amazing that this is so capable, because the cloud scales here, you got the edge developing, you can decouple compute power, and let Moore's law and all the new silicon and the processors and the GPUs do their thing, and you got open source booming. You're kind of getting at this next segment I wanted to get into, which is how people should be thinking about these advances in computer vision. So this is now a next wave, it's here. I mean I'd love to have that for baseball because I'm always like, "Oh, it should have been a strike." I'm sure that's going to be coming soon, but what is computer vision capable of doing today? I guess that's my first question. You hit some of it, unpack that a little bit. What does generative AI mean in computer vision? What's the new thing? Because the old technology's been around, proprietary, bolted onto hardware, but hardware advances at a different pace, but now you got new capabilities, generative AI for vision, what does that mean? >> Yeah, so computer vision, you know, at its core is basically enabling machines, computers, to understand, process, and act on visual data as effectively or more effectively than people can. Traditionally this has been, you know, task types like classification, which is, you know, identifying if a given image belongs in a certain category of goods on maybe a retail site, is it shoes or is it clothing? Or object detection, which is, you know, creating bounding boxes, which allow you to do things like count how many things are present, or maybe measure the speed of something, or trigger an alert when something becomes visible in frame that wasn't previously visible in frame. Or instance segmentation, where you're creating pixel-wise segmentations for both instance and semantic segmentation, where you often see these kind of beautiful visuals of the polygon surrounding objects that you see. Then you have keypoint detection, which is where you see, you know, athletes, and each of their joints is kind of outlined, which is another more traditional problem type in signal processing and computer vision. With generative AI, you kind of get a whole new class of problem types that are opened up. So in a lot of ways I think about generative AI in computer vision as, some of the, you know, problems that you aim to tackle might still be better suited for one of the previous task types we were discussing. Some of those problem types may be better suited for using a generative technique, and some are problem types that just previously wouldn't have been possible absent generative AI. And so if you make that kind of Venn diagram in your head, you can think about, okay, you know, visual question answering is a task type where if I give you an image and I say, you know, "How many people are in this image?", we could either build an object detection model that might count all those people, or maybe a visual question answering system would sufficiently answer this type of problem. Let alone generative AI being able to create new training data for old systems.
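To make those task types concrete, here is a minimal sketch of the "how many people are in this image?" example Joseph mentions, framed as plain object detection with an off-the-shelf detector pre-trained on COCO. The model choice, image path, and confidence threshold are illustrative assumptions, not a description of Roboflow's tooling.

```python
# Counting people with a pre-trained COCO object detector (illustrative only).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Faster R-CNN pre-trained on COCO; in COCO's label map, class 1 is "person".
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("crowd.jpg").convert("RGB")  # hypothetical input image

with torch.no_grad():
    prediction = model([to_tensor(image)])[0]  # dict of boxes, labels, scores

# Count confident detections of the "person" class.
person_count = sum(
    1
    for label, score in zip(prediction["labels"], prediction["scores"])
    if label.item() == 1 and score.item() > 0.5  # 0.5 threshold is arbitrary
)
print(f"People detected: {person_count}")
```

The same counting question could instead be posed to a visual question answering model, which is exactly the overlap in the Venn diagram described above.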
And that's something that we've seen be an increasingly prominent use case for our users, as much as something that we advise our customers and the community writ large to take advantage of. So ultimately those are kind of the traditional task types. I can give you some insight, maybe, into how I think about what's possible today, or in five years or ten years, as you sort of go out. >> Yes, definitely. Let's get into that vision. >> So I kind of think about the types of use cases in terms of what's possible. If you just imagine a very simple bell curve, your normal distribution, for the longest time, the types of things that are in the center of that bell curve are identifying objects that are very common, or common objects in context. Microsoft published the COCO dataset in 2014, of common objects in context, hundreds of thousands of images of chairs, forks, food, people, these sorts of things. And you know, the challenge of the day had always been, how do you identify just those 80 objects? So if we think about the bell curve, that'd be maybe the, like, dead center of the curve, where there's a lot of those objects present, and it's a very common thing that needs to be identified. But it's a very, very, very small sliver of the distribution. Now if you go out to the way long tail, let's go like deep into the tail of this imagined visual normal distribution, you're going to have a problem like one of our customers, Rivian, in tandem with AWS, is tackling: doing visual quality assurance in manufacturing and production processes. Now only Rivian knows what a Rivian is supposed to look like. Only they have the imagery of the goods that they're going to produce. And then between those long tails of proprietary data of highly specific things that need to be understood and the center of the curve, you have a whole kind of messy middle type of problems, I like to say. The way I think about computer vision advancing is, it's basically, you have larger and larger and more capable models that eat from the center out, right? So if you have a model that, you know, understands the 80 classes in COCO, well, pretty soon you have advances like CLIP, which was trained on 400 million image-text pairs, and has a greater understanding of a wider array of objects than just 80 classes in context. And over time you'll get more and more of these larger models that kind of eat outwards from that center of the distribution. And so the question becomes for companies, when can you rely on maybe a model that just already exists? How do you use your data to get what may be capable off the shelf, so to speak, into something that is usable for you? Or, if you're in those long tails and you have proprietary data, how do you take advantage of the greatest asset you have, which is observed visual information that you want to put to work for your customers, and you're kind of living in the long tails, and you need to adapt state of the art for your capabilities? So my mental model for how computer vision advances is you have that bell curve, and you have increasingly powerful models that eat outward. And multimodality has a role to play in that, larger models have a role to play in that, more compute, more data generally has a role to play in that. But it will be a messy and, I think, long transition. >> Well, the thing I want to get, first of all, it's a great mental model, I appreciate that, 'cause I think that makes a lot of sense.
The question is, it seems now more than ever, with the scale and compute that's available, that not only can you eat out to the middle in your example, but there's other models you can integrate with. In the past it was siloed, static, almost bespoke. Now you're looking at larger models eating into the bell curve, as you said, but also integrating in with other stuff. So this seems to be part of that interaction. How does, first of all, is that really happening? Is that true? And then two, what does that mean for companies who want to take advantage of this? Because the old model was operational, you know? I have my cameras, they're watching stuff, whatever, and now you're in this more of a distributed computing, computer science mindset, not, you know, put the camera on the wall kind of- I'm oversimplifying, but you know what I'm saying. What's your take on that? >> Well, to the first point of how these advances are happening, what I was kind of describing was, you know, almost uni-dimensional, in that you're only thinking about vision. But the rise of generative techniques and multimodality, like CLIP is a multimodal model, trained on 400 million image-text pairs, that will advance the generalizability at a faster rate than just treating everything as only vision. And that's kind of where LLMs and vision will intersect in a really nice and powerful way. Now in terms of companies, how should they be thinking about taking advantage of these trends? The biggest thing, and I think it's different, obviously, depending on the size of business, if you're an enterprise versus a startup. The biggest thing, I think, if you're an enterprise, and you have an established, scaled business model that is working for your customers, the question becomes, how do you take advantage of that established data moat, potentially, resource moats, and certainly, of course, an established way of providing value to an end user. So for example, one of our customers, Walmart, has the advantage of one of the largest inventories and stocks of any company in the world. And they also of course have substantial visual data, both from, like, their online catalogs, or understanding what's in stock or out of stock, or understanding, you know, the quality of things as they're going from the start of their supply chain to making it inside stores, for delivery and fulfillment. All these are visual challenges. Now they already have a substantial trove of useful imagery to understand and teach and train large models to understand each of the individual SKUs and products that are in their stores. And so if I'm a Walmart, what I'm thinking is, how do I make sure that my petabytes of visual information are utilized in a way where I capture the proprietary benefit of the models that I can train to do tasks like, what item was this? Or maybe I'm going to create Amazon Go-like technology, or maybe I'm going to build delivery robots, or I want to automatically know what's in and out of stock from the visual input feeds that I have across my in-store traffic. And that becomes the question and flavor of the day for enterprises. I've got this large amount of data, I've got an established way that I can provide more value to my own customers. How do I ensure I take advantage of the data advantage I'm already sitting on? If you're a startup, I think it's a pretty different question, and I'm happy to talk about it. >> Yeah, what's the startup angle on this? Because you know, they're going to want to take advantage.
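Since CLIP comes up twice in this exchange, a hedged sketch of what "trained on 400 million image-text pairs" buys in practice: zero-shot classification against labels the model was never explicitly trained on. This assumes OpenAI's reference CLIP package (installed from the openai/CLIP GitHub repository); the image path and candidate labels are made up for illustration.

```python
# Zero-shot image classification with CLIP (illustrative sketch).
import torch
import clip  # openai/CLIP reference implementation
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical retail image and candidate labels, echoing the shoes-vs-clothing example.
image = preprocess(Image.open("product.jpg")).unsqueeze(0).to(device)
labels = ["a photo of shoes", "a photo of clothing", "a photo of food"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)  # image-to-label similarity scores
    probs = logits_per_image.softmax(dim=-1).squeeze().tolist()

for label, p in zip(labels, probs):
    print(f"{label}: {p:.2f}")
```

Nothing is trained here; the shared image-text embedding space does the work, which is why such models "eat outward" from the center of the bell curve.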
It's like cloud startups, cloud native startups, they were born in the cloud, they never had an IT department. So if you're a startup, is there a similar role here? And if I'm a computer vision startup, what's that mean? So can you share your take on that, because there'll be a lot of people starting up from this. >> So the startup has the opposite advantages and disadvantages, right? Like, a startup doesn't have a proven way of delivering repeatable value in the same way that a scaled enterprise does. But it does have the nimbleness to identify and take advantage of techniques, and you can start from a blank slate. And I think the thing that startups need to be wary of in the generative AI, large language model, and multimodal world, is building what I like to call sandcastles. A sandcastle is maybe a business model or a capability that's built on top of an assumption that is going to be pretty quickly wiped away by improving underlying model technology. So almost like, if you imagine the ocean, the waves are coming in, and they're going to wipe away your progress. You don't want to be in the position of building a sandcastle business; you don't want to bet on the fact that models aren't going to get good enough to solve the task type that you might be solving. In other words, don't take a screenshot of what's capable today. Assume that what's capable today is only going to continue to become possible. And so for a startup, what you can do, that enterprises are comparatively less good at, is embedding these capabilities deeply within your products and delivering maybe a vertical-based experience, where AI kind of exists in the background. >> Yeah. >> And we might not think of companies as, you know, even AI companies, it's just so embedded in the experience they provide, but that's the vertical application example of taking AI and making it be immediately usable. Or, of course, there's tons of picks and shovels businesses to be built, like Roboflow, where you're enabling these enterprises to take advantage of something that they have, whether that's their data sets, their compute, or their intellect. >> Okay, so if I hear that right, by the way, I love it, that's horizontally scalable, that's the large language models; go up the stack and build the apps, hence your developer focus. I'm sure that's probably the reason for the tsunami of developer action. So you're saying picks and shovels tools, don't try to replicate the platform of what could be the platform. Oh, go to a VC, I'm going to build a platform. No, no, no, no, those are going to get wiped away by the large language models. Is there one large language model that will rule the world, or do you see many coming? >> Yeah, so to be clear, I think there will be useful platforms. I just think a lot of people think that they're building, let's say, you know, if we put this in the cloud context, a specific type of EC2 instance. Well, it turns out that Amazon can offer that type of EC2 instance, and immediately distribute it to all of their customers. So you don't want to be in the position of just providing something that actually ends up looking like a feature, which in the context of AI might be like a small incremental improvement on the model. If that's all you're doing, you're a sandcastle business. Now there's a lot of platform businesses that need to be built, that enable businesses to get to value and do things like, how do I monitor my models?
How do I create better models with my given data sets? How do I ensure that my models are doing what I want them to do? How do I find the right models to use? There's all these sorts of platform-wide problems that certainly exist for businesses. I just think a lot of startups that I'm seeing right now are making the mistake of assuming the advances we're seeing are not going to accelerate or even get better. >> So if I'm a customer, if I'm a company, say I'm a startup or an enterprise, either one, same question. And I want to stand up, and I have developers working on stuff, I want to start standing up an environment to start doing stuff. Is that a service provider? Is that a managed service? Is that you guys? So how do you guys fit into your customers leaning in? Is it just for developers? Are you targeting with a specific, like, managed service? What's the product consumption? How do you talk to customers when they come to you? >> The thing that we do is enable, we give developers superpowers to build automated inventory tracking, self-checkout systems, identify if this image is malignant cancer or benign cancer, ensure that these products that I've produced are correct. Make sure that the defect that might exist on this electric vehicle makes its way back for review. All these sorts of problems are immediately able to be solved and tackled. In terms of the managed services element, we have solution integrators that will often build on top of our tools, or we'll have companies that look to us for guidance, but ultimately the company is in control of developing and building and creating these capabilities in house. I really think the distinction is maybe less around managed service and tool, and more around ownership in the era of AI. So for example, if I'm using a managed service where part of their benefit is that they are learning across their customer sets, then it's a very different relationship than using a managed service where I'm developing some amount of proprietary advantage from my data sets. And I think that's a really important thing that companies are becoming attuned to, just the value of the data that they have. And so that's what we do. We tell companies that you have this proprietary, immense treasure trove of data, use that to your advantage, and think about us more like a set of tools that enable you to get value from that capability. You know, the HashiCorps and GitLabs of the world have proven what these businesses look like at scale. >> And you're targeting developers. When you go into a company, do you target developers with freemium, is there a paid service? Talk about the business model real quick. >> Sure, yeah. The tools are free to use and get started. When someone signs up for Roboflow, they may elect to make their work open source, in which case we're able to provide even more generous usage limits, to basically move the computer vision community forward. If you elect to make your data private, you can use our hosted dataset management, dataset training, model deployment, and annotation tooling up to some limits. And then usually when someone validates that what they're doing gets them value, they purchase a subscription license to be able to scale up those capabilities. So like most developer-centric products, it's free to get started, free to prove, free to poke around and develop what you think is possible. And then once you're getting to value, then we're able to capture the commercial upside in the value that's being provided.
>> Love the business model. It's right in line with where the market is. There's kind of no standards bodies these days. The developers are the ones who are deciding kind of what the standards are by their adoption. I think making it easy for developers to get value, as the open source models continue to grow, you'll see more of that. Great perspective Joseph, thanks for sharing that. Put a plug in for the company. What are you guys doing right now? Where are you in your growth? What are you looking for? How should people engage? Give the quick commercial for the company. >> So as I mentioned, Roboflow has, I think, one of the largest, if not the largest, collections of computer vision models and data sets that are open source, available on the web today, and a private set of tools that over half the Fortune 100 now rely on. So we're at the stage now where we know people want what we're working on, and we're continuing to drive that type of adoption. So companies that are looking to make better models, improve their data sets, train and deploy, often will get a lot of value from our tools, and certainly reach out to talk. I'm sure there's a lot of talented engineers that are tuning in too, we're aggressively hiring. So if you are interested in being a part of making the world programmable, and being at the ground floor of the company that's creating these capabilities writ large, we'd love to hear from you. >> Amazing, Joseph, thanks so much for coming on and being part of the AWS Startup Showcase. Man, if I was in my twenties, I'd be knocking on your door, because it's the hottest trend right now, it's super exciting. Generative AI is just the beginning of a massive sea change. Congratulations on all your success, and we'll be following you guys. Thanks for spending the time, really appreciate it. >> Thanks for having me. >> Okay, this is season three, episode one of the ongoing series covering the exciting startups from the AWS ecosystem, talking about the hottest things in tech. I'm John Furrier, your host. Thanks for watching. (chill electronic music)
Frank & Dave Convo V1
>> Narrator: From "theCUBE" studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a "theCUBE" conversation. >> Hi everybody, this is Dave Vellante, and as you know, we've been tracking the next generation of cloud. Sometimes we call it cloud 2.0. Frank Slootman is here to really unpack this with me. Frank, great to see you. Thanks for coming on. >> Yeah, you as well Dave, good to see you. >> Yeah so, obviously hot off your IPO, a lot of buzz around that, that's fine. We could talk about that, but I really want to talk about the future. What, before we get off the IPO though, was something you told me when you were CEO of ServiceNow. You said, "Hey, we're priced to perfection." So it looks like Snowflake is going to be priced to perfection, it's a marathon though. You made that clear. I presume it's not any different here for you. >> Well, I think, you know, the ServiceNow journey was different in the sense that we were kind of underdogs, and people sort of discovered over the years the full potential of the company. And, I think with Snowflake, they pretty much just discovered it day one (laughs). It's a little bit more, sometimes it's nice to be an underdog, or a bit of an overdog in this particular scenario, but yeah, it is what it is. And it's all about execution, delivering the results, being great with our customers and, hopefully the (indistinct) where they may at that point. >> Yeah, you're a poorly kept secret at this point, Frank, after a while. I've got some excerpts of your book that I've been reading, and of course I've been following your career since the 2000's. You're off sailing. You mentioned in your book that you were kind of retired, you were done, and then you got sucked back in. Now, why? Are you in this for the sport? What's the story here? >> Actually, that's not a bad way of characterizing it. I think I am in it for the sport. The only way to become the best version of yourself is to be under the gun, every single day. And that's certainly what we are. It sort of has its own rewards. Building great products, building great companies, regardless of what the spoils may be, it has its own reward. It's hard for people like us to get off the field and hang it up, so here we are. >> You're putting forth this vision now, the data cloud, which obviously it's good marketing, but I'm really happy because I don't like the term enterprise data warehouse. I don't think it reflects what you're trying to accomplish. The EDW, it's slow, only a few people really know how to use it, and the time value of data is gone; your business is moving faster than the data in the EDW. And it really got saved by Sarbanes-Oxley; that's really what it became, a reporting mechanism. So I have never seen what you guys are doing as EDW. So I want you to talk about the data cloud. I want to get into the vision a little bit, and maybe challenge you on a couple things so our audience can better understand it. >> Yeah, so the notion of a data cloud is actually a type of cloud that we haven't had. Data has been fragmented and locked up in a million different places, in different clouds, different cloud regions, and obviously, on premise. And for data science teams, they're trying to drive analysis across datasets, which is incredibly hard, which is why a lot of this resorts to programming and things of that sort. It's hardly scalable because the data is not optimized, the economics are not optimized, there's no governance model, and so on.
But the data cloud is actually the ability to loosely couple and lightly federate data, regardless of where it is, so it doesn't have the scale limitations or performance limitations the way traditional data warehouses have had. So we really have a fighting chance of really killing the silos and unlocking the bunkers, and allowing the full promise of data science and ML and AI to really happen. A lot of the analysis that happens on data is on a single dataset, because it's just too damn hard to drive analysis across multiple datasets. When we talk to our customers, they have very precise designs on what they're trying to do. They say, "Look, we are trying to discover through deep learning what the patterns are that lead to transactions, whether it's... If you're a streaming company, maybe it's that you're signing up for a channel or you're buying a movie or whatever it is. What is the pattern of datapoints that leads us to that desired outcome?" Once you have a very accurate description of the data relationships that result in that outcome, you can then search for it and scale it tens of millions of times over. That's what digital enterprises do, right? So in order to discover these patterns, enrich the data to the point where the patterns become incredibly predictive, that's what Snowflake is for, right? But it requires a completely federated data model, because you're not going to find a data pattern in a single dataset, per se, right? So that's what it's all about. The outcomes of a data cloud are very, very closely related to the business outcomes that the user is seeking, right? It's not some infrastructure process that has a very remote relationship with the business outcome. This is very, very closely related.
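A rough illustration of the "loosely coupled, lightly federated" idea Frank describes, using Snowflake's data sharing: a consumer mounts a provider's share as a read-only database and joins it against first-party data in place, with no copies or pipelines. The account, share, table, and column names below are all hypothetical, and this assumes the snowflake-connector-python package; it is a sketch of the mechanism, not Snowflake's recommended setup.

```python
# Hedged sketch: querying a shared dataset alongside first-party data in
# Snowflake. All identifiers below are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # hypothetical consumer account
    user="analyst",
    password="...",         # key-pair auth or SSO would be used in practice
    warehouse="analytics_wh",
)
cur = conn.cursor()

# One-time setup: mount the provider's share as a read-only database.
cur.execute(
    "CREATE DATABASE IF NOT EXISTS weather FROM SHARE provider_acct.weather_share"
)

# Join first-party transactions against the shared data where it lives.
cur.execute("""
    SELECT t.store_id,
           d.precipitation,
           SUM(t.amount) AS daily_sales
    FROM   sales_db.public.transactions t
    JOIN   weather.public.daily_observations d
           ON t.sale_date = d.observation_date AND t.region = d.region
    GROUP  BY t.store_id, d.precipitation
""")
for row in cur.fetchall():
    print(row)
```

The design point is that the shared dataset never moves: the consumer queries the provider's data in place, which is what makes cross-dataset analysis scale without new silos.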
>> So we talking up front about some of the differences between what you've done early in your career, like I said, you're the worst kept secret, Data Domain I would say it was somewhat of a niche market. You blew it up until it was very disruptive, but it was somewhat limited in what could be done. And maybe some of that limitation, you know, wouldn't have occurred if you stayed an independent company. ServiceNow, you mopped the table up 'cause you really had no competition there. Not the case here. You've got some of the biggest competitors in the world. So talk about that and what gives you confidence that you can continue to dominate? >> It's actually interesting that you bring up these companies. Data Domain, it was a scenario where we were constrained on market and we were a data backup company as you recall, we needed to move into backup software, needed to move into primary storage. While we knew it, we couldn't execute on it because it took tremendous resources which, back in the day, it was much harder than what it is right now. So we ended up selling the company to EMC and now part of Dell, but we're left with some trauma from that experience in the sense that, why couldn't we execute on that transformation? So coming to ServiceNow, we were extremely, and certainly me personally, extremely attuned to the challenges that we had endured in our prior company, and one of the reasons why you saw ServiceNow break out at scale, at tremendous growth rates is because of what we learned from the prior journey. We were not going to ever get caught again in the situation where we could not sustain our markets and sustain our growth. So ServiceNow is very much, the execution model, very much a reaction to what we had encountered in the prior company. Now coming into Snowflake a totally different deal because not only is this a large market this is a developing market. I think you've pointed out in some of your broadcasting, that this market is very much influx. And the reason is that technology is now capable of doing things for people and enterprises that they could never do before. So people are spending way more resources than they ever thought possible on these new capabilities. So you can't think in terms of static markets and static data definitions, it means nothing. Okay, these things are so in transition right now. It's very difficult for people to scope the scale of this opportunity. >> Yeah, I want to understand your thinking around and, you know, I've written about the TAM and can Snowflake grow into it's valuation and the way I drew it, I said, okay, I've got data lakes and you've got an enterprise data warehouse, that's pretty well understood but I called it data as a service company the closest analogy to your data cloud. And then even beyond that when you start bringing in the Edge and real time data. Talk about how you're thinking about that TAM what you have to do to participate. Do you have to bring adjacent capabilities? Or is it this read data sharing that will get you there? In other words, you're not like a transaction system. You hear people talking about converged databases. You're going to talk about real time inference at the Edge that today anyway, isn't what Snowflake is about. Does that vision of data sharing in the data cloud, does that allow you to participate in that massive multi hundred billion dollar TAM that I laid out and probably others as well? 
>> Yeah, well, it's always difficult to define markets based on historical concepts that are probably not going to apply a whole lot, or for much longer. I mean, the way we think of it is that data is the beating heart of the digital enterprise, and digital enterprises today, look at companies like, you know, DoorDash and so on, they were built from the ground up to be digital enterprises. And data is the beating heart of their operation; data operations is their manufacturing, if you will. Every other enterprise out there is working very hard to become digital, or part digital, and is going to learn to develop a data platform like what we're talking about here, the data cloud, as well as the expertise in terms of data engineering and data sciences, to really fully become a digital enterprise, right? So we view data as driving the operations of the digital enterprise, that's really what it is, right? And it's completely data driven end-to-end. There's no people involved; the people are developing and supporting the process, but in the execution, it is end-to-end data driven. Meaning that data is the signal that initiates the processes taking place as they're being detected, and then they fully execute the entire machinery, programmatic machinery if you will, of all the processes that have been designed. Now for example, I may fit a certain pattern that leads to some transactional context, but that pattern is not fully completed until I click on some link, and all of a sudden, poof, I have become a prime prospect. The system detects that in real time and then unleashes all its outreach and capabilities to get me to transact. You and I are experiencing this every day. When we're online, you just may not fully realize (laughs) that that's what's happening behind the scenes. That's really what this is all about. So to me, this sort of new online transaction processing is an end-to-end digital data process that is continually acquiring, analyzing, and acting on data. >> Well, you've talked about the time value of data. It loses value over time. And to the extent that you can actually affect decisions, maybe prior, before you lose the customer, before you lose the patient, even more importantly, or before you lose the battle. There's all kinds of mental models that you can apply here. So automation is a key part of that, and then again, I think a lot of people, like you said, if you just try to look at historical markets, you can't really squint through those and apply them. You really have to open up your mind and think about the new possibilities. And so I could see- >> Exactly. >> Your component of automation. I see what's happening in the RPA space, and I could see these just massive opportunities to really change society, change business. Your last thoughts. >> Well, there's just no scenario that I can envision where data is not completely core and central to a digital enterprise, period. >> Yeah, I really do think, Frank, your vision is misunderstood somewhat. I think people say, "Okay hey, we'll bet on Slootman, Scarpelli, the team." That's great to do that, but I think this is going to unfold in a way that people maybe haven't predicted, and maybe you guys yourselves and your founders, you know, aren't able to predict as well, but you've got that good, strong architectural philosophy that you're pursuing, and it just kind of feels right, doesn't it?
>> One of the harder conversations, and this is one of the reasons why we also wrote our book "The Rise of the Data Cloud," is to convey to the marketplace that this is not an incremental evolution. It is not just sort of building on the past. There is a real step function here. And the way to think about it is that typically enterprises and institutions will look at a platform like Snowflake from a workload context. In other words, I have this business, I have this workload, which is very much historically defined, by the way, and then they benchmark us against what they're already doing on some legacy platform, and they decide, "Yeah, this is a good fit, we're going to put Snowflake here, maybe there." But it's still very workload centric, which means that we are, essentially, perpetuating the mentality of the past. We're doing it one workload at a time, we're creating the new silos and the new bunkers of data in the process. And we're really not approaching this with the level of vision that the data scientists really require to drive maximum benefit from data. So our argument, and this is not an easy argument, is to say to CIOs and any other C-level person that wants to listen, "Look, just thinking about operational context and operational excellence, it's like, we have to have a platform that allows us unfettered access to the data that we may need to bring the analytical power to." If you have to bring analytical power to a diversity of datasets, how are we going to do that? The data lives in 500 different places, it's just not possible, other than with insane amounts of programming and complexity, and then we don't have the performance, and we don't have the economics, and we don't have the governance, and so on. So you really want to set yourself up with a data cloud, so that you can unleash your data science capabilities, your machine learning, your deep learning capabilities, and then really get the full-throttle advantage of what the technology can do. If you're going to perpetuate the siloing and bunkering of data by doing it one workload at a time, five, 10 years from now we'll be having the same conversations we've been having over the last 40 years. >> Yeah, operationalizing your data is going to require busting down those silos, and it's going to require something like the data cloud to really power that into the next decade and beyond. Frank Slootman, thanks so much for coming on "theCUBE" and helping us do a preview here of what's to come. >> You bet Dave, thanks. >> All right, thank you for watching everybody. This is Dave Vellante from "theCUBE". We'll see you next time.
Breaking Analysis: Storage...Continued Softness with Some Bright Spots
>> From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now here's your host, Dave Vellante. >> Hello everybody, and welcome to this week's CUBE Insights, powered by ETR. It is Breaking Analysis, but first, I'm coming to you from the floor of Cisco Live in Barcelona, and I want to talk about storage. Storage continues to be soft, but there are some bright spots. I've been reporting on this for awhile now, and I want to dig in and share with you some of the reasons why, and maybe give you some forecasts as to what I think is going to happen in the coming months. And of course, we want to look into some of the ETR spending data, and try to parse through that and understand who's winning, who's losing, who's got the momentum, where are the tailwinds and headwinds. So the first thing I want to show you is, let's get right into it. What this slide is showing here is a storage spending snapshot of net score. Now remember, net score in the ETR parlance is an indicator of momentum or spending velocity. Essentially every quarter, what ETR does is they go out to, in this case, 1100 respondents out of the 4500 in the dataset, and they ask them, are you spending more or are you spending less. Essentially they subtract the less from the more, and that constitutes net score. It's not that simple, but for this purpose, that's what we're showing. Now you can see here on the left hand side, I'm showing all respondents, out of 1161. You see the January survey net scores. You've got Rubrik, Cohesity, Nutanix, Pure, and VMware vSAN as the top five. So Rubrik and Cohesity, very strong, and interesting, Rubrik was very strong last quarter. Cohesity not as strong, but really shooting up. It kind of surprised me last quarter, Cohesity being a little low, but they were early into the dataset, and now they're starting to show what I think is really happening in the marketplace. That's a good indicator. But you can see 75 percent, 72 percent. Nutanix still very strong at 56 percent, driving that hyperconverged piece. You see Pure Storage at 44 percent, down a little bit, I'll talk a little bit more about that in a moment. VMware vSAN, Veeam, et cetera, down the list. The thing about the left hand side and storage in general, you can see the softness. Only about one third of the suppliers are in the green, and that's a problem. If you compare this to security, probably three quarters are in the green. It's a much hotter segment. Now, look on the right hand side. The right hand side is showing what ETR calls GPP, giant, public, and private. You can see there's an N of 403. These are the very largest public and private companies, a private company being a company like Mars, the candy company. And they say that they are the best indicators of spending momentum in the dataset. So really isolating on some of the large companies. Look what happens here. You can see Rubrik gets even stronger, as does Cohesity, they're into the 80 percent range. That's really rarefied air, so very strong. You can see Nutanix drops down. It does better in the smaller companies, it appears. They drop down to 41 percent. Pure gets stronger in the GPP, at 68 percent. You can see VMware's vSAN uptick to 45 percent. Nimble gets better, HPE's Nimble, to 54 percent. Dell drops down to 4.8 percent. HPE goes up to 33 percent. HPE was red on the left hand side. You can see Veeam drops, not surprising; Veeam in the biggest companies is not going to be as prevalent. We talked about that in our Breaking Analysis segment after the acquisition of Veeam.
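As a quick aside, the net score arithmetic described at the top reduces to a few lines: the share of respondents spending more minus the share spending less. This is a simplified sketch; ETR's actual survey has more answer buckets than the three shown here, and the numbers below are made up for illustration.

```python
# Simplified net score: percent spending more minus percent spending less.
from collections import Counter

def net_score(responses):
    """responses: one answer per respondent, e.g. 'more', 'flat', 'less'."""
    counts = Counter(responses)
    total = len(responses)
    more = counts.get("more", 0) / total
    less = counts.get("less", 0) / total
    return round((more - less) * 100, 1)  # expressed in percentage points

# Hypothetical vendor: 75 respondents spending more, 20 flat, 5 less.
survey = ["more"] * 75 + ["flat"] * 20 + ["less"] * 5
print(net_score(survey))  # 70.0 -- the kind of "rarefied air" score cited above
```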
You can see NetApp bumps up a little bit, but it's still kind of in that red zone. I also want to call your attention to Actifio. They're way down on the bottom on the left hand side, which kind of surprised me. And then I started digging into it, because I know Actifio does better in the larger companies. On the right hand side, they pop up to 33 percent. It's only an N of three, but what I'm seeing in the marketplace is Actifio solving some really hard problems in database and copy data management. You're starting to see those results as well. But generally speaking, this picture is not great for storage, with the exception of a few players like Rubrik and Cohesity, Pure, Nutanix. And I'm going to get into that a little bit and try to explain what's going on here. The market's bifurcated. Primary storage has been on the back burner for awhile now, and I've been talking about that. The one exception to that has really been Pure. There's a little bit for Dell EMC coming back, we'll dig into that a little bit more, but Pure has been the stand-out. They're even moderating lately, I'll talk about that some more. Secondary storage is where the market momentum is, and you can see that with Rubrik and Cohesity. Again, we'll talk about that some more. Let me dig into the primary side. Cloud, as I've talked about in many Breaking Analysis segments, is siphoning off demand from on-prem spend. The second big factor in storage has been that there was such an injection of flash into the marketplace, it added headroom. Customers used to buy spindles to get performance, and they don't need to do that so much anymore, because so much flash was pushed into the system. The third thing is you're still seeing in primary the consolidation dynamics play out with hyperconverged. So hyperconverged is the software-defined bringing together of storage, compute, and networking into a single logical managed unit. That is taking share away from traditional primary storage. You're also seeing tactical NAND pricing be problematic for storage suppliers. You saw that with Pure again this past quarter. NAND pricing comes down, which you'd think would be a good thing from a component standpoint, which it is, but it also lowers the prices of the systems. So that hurt Pure's revenue. Their unit volume was pretty good, but you're seeing that sort of put pressure on prices, so ASPs, average system prices, are down. Let's turn our attention to the secondary market for a moment. Huge injection of venture capital, like a billion dollars, half a billion dollars over the last year, and then another five billion just spent on the acquisition of Veeam. A lot of action going on there. You're seeing big TAM expansions, where companies like Rubrik and Cohesity, who have garnered much of that VC spending, are really expanding the notion of data protection from back-up into data management, into analytics, into security, and things of that nature, so a much bigger emphasis on TAM expansion, and of course, as I talked about, the M&A. Let's dig into each of these segments. The chart that I'm showing now really digs into primary storage. You can see here the big players, Pure, Dell EMC, HPE, NetApp, and IBM. And look at this, there's only one company in the green, Pure. You can see they're trending down just a little bit from previous quarters, but still far and away the company with the most spending momentum. Again, here I'm showing net score, a measure of spending velocity, back to the January '18 survey. You can see Dell EMC sort of fell and then is slowly coming back up.
NetApp hanging in there; Dell EMC, HPE, and NetApp kind of converging, and you can see IBM. IBM announced last quarter about three percent growth. I talked about that actually in September. I predicted that IBM storage would have growth, because they synchronized their DS8000 high-end storage announcement to the z15 mainframe, so you saw a little bit of an uptick in IBM. Pure, as I said, 15 percent growth. I mean, if you're flat in this market or growing at three percent, you're doing pretty well, you're probably a share gainer. We'll see what happens in February when Dell EMC, HPE, and NetApp announce earnings. We'll update you at that time. So that's what you're seeing now. Same story: Pure outpacing the others, everybody else fighting for share. Let's turn our attention now to secondary storage. What I'm showing here is net score for the secondary storage players. I can't isolate on a drill-down for secondary storage, like the last slide I could do on storage overall, but what I can show is pure plays. What I'm showing here is Rubrik, Cohesity, Veeam, Commvault, and Veritas. Five pure plays; you can argue Veritas isn't a pure play, but I consider it a pure play data protection vendor. Look at Rubrik and Cohesity really shooting up to the right, 75 percent and 72 percent net scores, respectively. You see Veeam hanging in there. This is, again, all respondents, the full 1100 dataset. Commvault announced last quarter that it beat earnings, but it's not growing. You can see some pressure there, and you can see Veritas under some pressure as well. You can see a net score really deep in the red, so that's cause for some concern. We'll keep watching that, maybe dig into some of the larger accounts to see how they're doing there. But you can see clear standouts with Rubrik and Cohesity. I want to look at hyperconverged now. Again, I can't drill into hyperconverged, but what I can do is show some of the pure plays. So what this slide shows is the net score for some of the pure play hyperconverged vendors, led by Nutanix. The relative newcomer here is vSAN with VMware. You can see Dell EMC VxRail, and SimpliVity. I would say this. A lot of the marketing push that you hear out of Dell and out of VMware says Nutanix is in big trouble, they're dying and so forth. Our data definitely shows something different. The one caution is, you can see Nutanix in larger accounts, not as strong. And you can see both vSAN and Dell EMC stronger in those larger accounts. Maybe that's kind of their bias and their observation space, but it's something that we've got to watch. But you can see the net scores here. Everybody's in the green, because overall, this is a strong market. Everybody is winning. It's taking share, as I said, from primary. We're watching that very closely. Nutanix continues to be strong. We're watching very carefully that competitive dynamic, and the dynamics within those larger companies, which are a bellwether. Now the big question that I want to ask here is, can storage reverse the ten-year trend of the big cloud sucking sound that we have heard for the past decade? I've been reporting with data on how the cloud generally has hurt that storage spend on-prem. So what I'm showing here in this slide is the net score for the cloud spenders. Many hundreds of cloud spenders in the dataset. What we're showing here is the net score, the spending velocity, over the last 10 years for the leaders. You can see Dell EMC, the number one. NetApp, right there in terms of market share, IBM as well.
I didn't show HPE because the slide got too busy, but they'd be up there as well. So these are the big spenders, the big on-prem players, and you can see, well, it's up and down. The highs are lower and the lows tend to be lower. You can see on the latest surveys maybe there are some upticks here in some of the companies, but generally speaking, the trend has been down. That's the siphoning away of demand by the cloud guys. Can that be reversed? That's something we're going to watch, so we're keeping an eye on that.

Let me kind of summarize, and I'll make some other comments here. One of the things we're going to watch is the Dell EMC, NetApp, and HPE earnings announcements in February. That's going to be a clear indicator. We'll look for what's happening with overall demand, what the growth trajectory looks like, and very importantly, what NAND pricing looks like. As a corollary to that, we're going to be watching elasticity. I firmly believe that as prices go down, more storage is going to be bought. That's always been the case. Flash is still only about 30 percent of the spending and about 20 percent of the terabytes, but as prices come down, expect people to buy more. If there is an elasticity of demand, though, it hasn't shown up in the earnings statements, and that's a bit of a concern. We'll keep an eye on that.

We're also going to watch the cloud siphoning demand from on-prem spend. Can the big players, and guys like Pure and others, new start-ups maybe, reverse that trend? Multi-cloud, there's an opportunity for these guys: multi-cloud management, TAM expansion into new areas, actually delivering services in the cloud. You saw Pure announce block storage in the cloud, so that's kind of interesting and we'll watch it. Other players may be getting into the data protection space. But as it relates to the cloud, one of the things I'm watching very closely is the TAM expansion of the cloud players. What do I mean by that? Late last year, Amazon announced a broader set of products, or really services, in its portfolio. Let's watch for Amazon's moves, and those of the other big cloud players, into the storage space. I fully expect they're going to want to get a bigger piece of that pie. Remember, much if not most of Amazon's revenue comes from compute. They really haven't awakened to the great storage opportunity that's out there.

Why is that important? You saw this play out on-prem. Servers became a really tough market, and Intel made all the money. Amazon is a huge customer of Intel, and Intel's getting a big piece of Amazon's EC2 business. That's why you see, in part, Amazon getting into its own chip design. I mean, in the server business, you're talking about a low gross margin business; if you're in the 20s or low 30s, you're thrilled. Pure last quarter had 70-plus percent gross margins, and storage has been a 60-plus percent gross margin business consistently. You're going to see the cloud guys wake up to that and try to grab even more share, and it's going to be interesting to see how the traditional on-prem vendors respond.

Coming into last decade, you saw tons of start-ups, but only two companies really reached escape velocity: Nutanix and Pure. At the beginning of the century, you saw Data Domain, Isilon, Compellent, and 3PAR all go public. EqualLogic and LeftHand got taken out, and a bunch of other companies got acquired. Storage was really a great market. Coming into this past decade, in the mid part of the decade, you had lots of VC opportunity here.
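Back to the elasticity point for a second: the claim is that if demand for capacity is elastic, falling NAND prices should eventually grow revenue rather than shrink it. A toy model of that arithmetic, with made-up numbers:

```python
# The elasticity argument in one line: if demand is elastic (|e| > 1),
# a price cut grows revenue; if inelastic, revenue shrinks, which is
# closer to what NAND price declines have looked like in earnings so far.

def revenue_after_price_cut(price, units, price_drop, elasticity):
    new_price = price * (1 - price_drop)
    new_units = units * (1 + elasticity * price_drop)
    return new_price * new_units

# $100/TB, 1,000 TB baseline = $100,000 of revenue, then a 20% price cut:
print(revenue_after_price_cut(100, 1000, 0.20, 1.5))  # elastic: 104000.0
print(revenue_after_price_cut(100, 1000, 0.20, 0.5))  # inelastic: 88000.0
```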
In that window, you had Fusion-io and Violin, and Tintri went public, and they all flamed out. You had a big acquisition with SolidFire, almost a billion dollars, but really Pure and Nutanix were the only ones to make it. So the question is, are you going to see anyone reach escape velocity in the next decade, and where is that going to come from? The likely players today would be Cohesity and Rubrik; those unicorns would be the opportunity. You could argue Veeam, I guess, reached it, but it's hard to tell because Veeam is a private company. By escape velocity, we're talking large companies that go public, have a big exit in the public market, and become transparent, so we really know what's going on there. Will it come from a cloud or a cloud-native play? We'll see. Are there others that might emerge, like a Nebulon or a Clumio? A company like Infinidat is doing well; will they hit escape velocity, do an IPO and, again, become more transparent? That's something else we're watching.

But you're clearly seeing moves up the stack, where there's a lot more emphasis in spending on cloud and cloud native. We clearly saw it with hyperconverged consolidation, but it's moving up the stack toward the apps, really driving digital transformations. People want to spend less on heavy lifting like storage. They're always going to need storage, but is it going to be the same type of market it has been for the last 30 or 40 years, full of great investment opportunities? We're starting to see that wane, but we'll keep track of it.

Thank you for watching this Breaking Analysis. This is CUBE Insights, powered by ETR. This is Dave Vellante. We'll see you next time.
Chandra Mukhyala, IBM - DataWorks Summit Europe 2017 - #DW17 - #theCUBE
>> Narrator: theCUBE, covering DataWorks Summit Europe 2017. Brought to you by Hortonworks.

>> Welcome back to the DataWorks Summit in Munich everybody. This is theCUBE, the leader in live tech coverage. Chandra Mukhyala is here. He's the offering manager for IBM Storage. Chandra, good to see you. It always comes back to storage.

>> It does, it's the foundation.

>> We're here at a data show, and you've got to put the data somewhere. How's the show going? What are you guys doing here?

>> The show's going good. We have lots of participation. I didn't expect this big a crowd, but there is a good crowd. Storage, people don't look at it as the most sexy thing, but I still see a lot of people coming and asking "what do you have to do with Hadoop?" kinds of questions, which is exactly the kind of question I expect. So, it's going good, we're able to--

>> It's interesting, in the early days of Hadoop and big data, I remember John and I interviewed Jeff Hammerbacher, founder of Cloudera, and he was at Facebook, and he said, "My whole goal at Facebook when we were working with Hadoop was to eliminate the storage container, the expensive storage container." They succeeded, but now you see guys like you coming in and saying, "Hey, we have better storage." Why does the world need anything different than HDFS?

>> This has been happening for the last two decades, right? In storage, every few years a startup comes along and addresses one problem very well. They address one problem and create a whole storage solution around that. Everybody understands the benefit of it, and that becomes part of mainstream storage. I say mainstream storage because these new point solutions address one problem, but what about all the rest of the features storage has been developing for decades? The same thing happened with other solutions, for example, deduplication. Very popular, right, at one point, dedupe appliances. Nowadays, every storage solution has dedupe in it. I think it's the same thing with HDFS, right? HDFS is purpose-built for Hadoop. It solves that problem in terms of giving you local access, scalable storage, parallel storage. But it's missing out on many things, you know. One of the biggest problems with HDFS is that it's siloed storage, meaning the data in HDFS is only available to Hadoop. What about the rest of the applications in the organization, which may need it through traditional protocols like NFS or SMB, or may need it through new interfaces like S3 or Swift? So, you don't want that siloed storage. That's one of the biggest problems we have.

>> So, you're putting forth a vision of some kind of horizontal infrastructure that can be leveraged across your application portfolio...

>> Chandra: Yes.

>> How common is that? And what's the value of that?

>> It's not really common; that's one of the stories, the messages, we're trying to get out. I've been talking to a lot of data scientists over the last year. One of the first things they do when they are implementing a Hadoop project is copy a lot of data into HDFS, because before they can analyze it, the data has to be in HDFS. That copy process takes days.

>> Dave: That's a big move, yeah.

>> It's not only wasting the data scientist's time, it also makes the data stale. I tell them you don't have to do that if your data is on something like IBM Spectrum Scale. You can run Hadoop straight off that; why do you even have to copy it into HDFS?
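A rough sketch of why that copy step hurts: ingest into HDFS is bandwidth-bound, so at realistic network rates, large datasets really can take days. The numbers here are illustrative only:

```python
# Rough arithmetic behind "that copy process takes days": copying a
# dataset into HDFS is limited by sustained network/disk bandwidth.

def copy_hours(dataset_tb, effective_gbit_per_s):
    seconds = dataset_tb * 8_000 / effective_gbit_per_s  # 1 TB = 8,000 Gbit
    return seconds / 3600

print(copy_hours(100, 10))   # ~22 hours for 100 TB at a sustained 10 Gbit/s
print(copy_hours(1000, 10))  # ~222 hours, roughly 9 days, for 1 PB
```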
You can take the same existing MapReduce applications, with zero changes, and point them at Spectrum Scale; they can still use the HDFS API. You don't have to copy anything. And every data scientist I talk to is like, "Really? I didn't know I could do this, I'm wasting time?" Yes. It's not very well known; most people think there's only one way to run Hadoop applications, and that's on HDFS. You don't have to. And the advantages are, one, you don't have to copy, you can share the data with the rest of the applications, and the data is no longer stale. But there's also one other big difference between the HDFS type of storage and shared storage. In the share-nothing model, which is what HDFS is, the way you scale is by adding new nodes, which adds both compute and storage. What about applications which don't necessarily need more compute, where all they need is more throughput? You're wasting compute resources, right? So there are certain applications where share-nothing is the better architecture, and others where shared storage is. Now, the solution IBM has will allow you to deploy it either way, share-nothing or shared storage, and that's one of the main reasons people, data scientists especially, want to look at these alternative solutions for storage.

>> So when I go back to my Hammerbacher example, it worked for the Facebook of the early days because they didn't have a bunch of legacy data hanging around; they could start with, pretty much, a blank piece of paper.

>> Yes.

>> Re-architect, plus they had such scale, they probably said, "Okay, we don't want to go to EMC and NetApp or IBM, or whomever, and buy storage, we want to use commodity components." Not every enterprise can do that, is what you're saying.

>> Yes, exactly. It's probably okay for somebody like a very large search engine, where all they're doing is analytics, nothing else. But if you go to any large commercial enterprise, the whole point around analytics is that they want to pool all of the data, look at it, and find the correlations, right? It's not about analyzing one small dataset from one business function; it's about pooling everything together and seeing what insights I can get out of it. So that's one of the reasons it's very important to support access to the data for your legacy enterprise applications too, right? So NFS and SMB are pretty important, and so are S3 and Swift. But also, for these analytics applications, one of the advantages of the IBM solution here is that we provide local access to the file system. Not just through NAS protocols like NFS, we do that, but we also have POSIX access, giving applications direct local access to the file system. With HDFS, you have to first copy the file into HDFS and then bring it back out to do anything else with it. All those copy operations go away. And this is important, again, in the enterprise, not just for data sharing but also to get local access.

>> You're saying your system is Hadoop ready.

>> Chandra: It is.

>> Okay.
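To make the no-copy, dual-access idea concrete, here is an illustrative sketch of reading the same file through an HDFS-compatible endpoint and through a POSIX mount. The host, port, and paths are invented; an actual Spectrum Scale deployment would expose its own endpoint and mount point, and the pyarrow route assumes a configured Hadoop client with libhdfs available.

```python
# Illustrative sketch of the "no copy" point: one dataset, read through
# an HDFS-compatible endpoint and through a plain POSIX mount.
import pyarrow.fs as pafs

# Path 1: existing Hadoop tooling talks the HDFS API to the storage layer.
hdfs = pafs.HadoopFileSystem(host="storage-cluster", port=8020)
with hdfs.open_input_stream("/datasets/telemetry.csv") as f:
    head_hdfs = f.read(1024)

# Path 2: the very same file, read through the POSIX mount, no copy step.
with open("/mnt/scale/datasets/telemetry.csv", "rb") as f:
    head_posix = f.read(1024)

assert head_hdfs == head_posix  # one copy of the data, two access paths
```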
>> And then the other thing you hear a lot, from IT practitioners anyway, not so much from the lines of business, is that when people spin up these Hadoop projects, big data projects, they go outside the edicts of the organization in terms of governance and compliance, and often security. Do you solve that problem?

>> Yeah, that's one of the reasons to consider, again, enterprise storage, right? It's not just that you're able to share the data with the rest of the applications; you also get a whole bunch of data management features, including data governance features. You can talk about encryption there, you can talk about auditing there, you can talk about features like WORM, write once, read many, so that data, especially archival data, once you write it, you can't modify it. There are a whole bunch of features around data retention and data governance; those are all part of the data management stack we have. You get that for free. You not only get universal, unified access, you also get data governance.

>> So is this one of the situations where, on the face of it, when you look at the CapEx, you say, "Oh, wow, I can use commodity components, save a bunch of money." You know, you remember the client-server days: "Oh, wow, cheap, cheap, cheap, microprocessor-based solution," and then all of a sudden, people realized, we have to manage this. Have we seen a similar trend with Hadoop, where the complexity of managing all of this infrastructure is so high that it actually drives costs up?

>> Actually, there are two parts to it, right? There is real value in utilizing commodity hardware, industry standards. That does reduce your costs, right? If you can just buy a standard x86 server, a storage server, and utilize that, why not? That is kind of a given. But the real value in any kind of storage data management solution is in the software stack. You can reduce CapEx by using industry standards, and it's a good thing to do, we should and we do support that, but in the end, the data management is there in the software stack. What I'm saying is, HDFS solves one problem while dismissing the whole set of data management problems we just touched on. And that all comes in software, which runs on standard servers.

>> Well, and you know, it's funny, I've been saying for years that if you peel back the onion on any storage device, the vast majority anyway, they're all based on standard components. It's the software that you're paying for. So it's sort of artificial in that a company like IBM will say, "Okay, we've got all this value in here, but it's on top of commodity components, and we're going to charge for the value."

>> Right.

>> And so if you strip that out, sure, you can do it yourself.

>> Yeah, exactly. And it's all standard servers; it's been like that always. One difference is that ten years ago people used proprietary array controllers. Now all of the functionality is coming into software--

>> ASICs.

>> Yeah, 3PAR still has an ASIC, but most don't.

>> Right, that's funny. Almost everybody has some kind of software-based RAID now, and they're able to utilize standard servers. Now, there is still an advantage in appliances, because yes, the software can run on industry standards, but this is storage, and storage is the foundation of all of your infrastructure. You want RAS, reliability, availability, serviceability. The only way to get that is a fully integrated, tight solution, where you're doing a lot of testing on the software and the hardware together. Yes, it's supposed to work, but what really happens when it fails? How does the system react? And that's where I think there is still value in integrated systems. If you're a large customer and you have a lot of storage-savvy administrators who know how to build solutions and validate them, then yes, software-based storage is the right answer for you.
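Circling back to the governance features he lists, WORM retention is straightforward to picture in code. A toy sketch of the concept only, not IBM's implementation; the class and key names are invented:

```python
# Minimal sketch of the WORM (write once, read many) retention idea:
# once written, an object can't be rewritten until its clock expires.
import time

class WormStore:
    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self._data, self._written_at = {}, {}

    def write(self, key, blob):
        if key in self._data and not self._expired(key):
            raise PermissionError(f"{key} is under retention; writes denied")
        self._data[key], self._written_at[key] = blob, time.time()

    def read(self, key):
        return self._data[key]  # reads are always allowed

    def _expired(self, key):
        return time.time() - self._written_at[key] > self.retention

store = WormStore(retention_seconds=3600)
store.write("xray-001", b"...")
store.read("xray-001")            # fine
# store.write("xray-001", b"!")   # would raise PermissionError
```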
>> And you're the offering manager for Spectrum Scale, which is the file offering, right?

>> Yes, that's right.

>> And it includes object as well, or--

>> Spectrum Scale is a file and object storage platform. It supports file protocols and it also supports object protocols. The thing about object storage is that it means different things to different people. To some people, it's the object interface.

>> Yeah, to me it means get, put.

>> Yeah, if that's the definition, then it is object storage; and the fact is that everybody supports S3 now. But to some people, it's not about the protocol, because they're still going to access it through file protocols; to them, it's about the object store, which means a flat namespace with no hierarchical name structure, so you can get into billions of files without having any scalability issues. That's an object store. But to some other people it's neither of those; it's about erasure coding, which object storage uses, so it's cheap storage. It allows you to run on standard servers, and you get cheap storage. So it's three different things. If you're talking about protocols, yes, Spectrum Scale by that definition is object storage, also.

>> So in thinking about, well, let's start with Spectrum Scale generally, but specifically your angle in big data and Hadoop, and we talked about that a little bit, but what are you guys doing here, what are you showing, what's your partnership with Hortonworks? Maybe talk about that a little bit.

>> So we've been supporting what we call the Hadoop connector on Spectrum Scale for almost a year now, which allows our existing Spectrum Scale customers to run Hadoop straight on it. But if you look at the Hadoop distributions, there are two or three major ones, right? Cloudera, Hortonworks, maybe MapR. One of the first questions we get when we tell our customers they can run Hadoop on this is, "Oh, is this supported by my distribution?" So that has been a problem. So what we announced is that we formed a partnership with Hortonworks, and now Hortonworks is certifying IBM Spectrum Scale. There are no new code changes, no new features, but it's a validation and a stamp from Hortonworks; that's in process. The result of it is a Hortonworks-certified reference architecture, which is what we announced about a month ago. We should be publishing that soon. Now customers can have more confidence in the joint solution. It's not just IBM saying that it's Hadoop ready, it's Hortonworks backing that up.

>> Okay, and your scope, correct me if I'm wrong, is sort of on-prem and hybrid,

>> Chandra: Yes.

>> Not cloud services. You might sell your technology internally, but--

>> Correct, IBM Storage is primarily focused on on-prem storage. We do have a separate cloud division, but almost every IBM storage product, and Spectrum Scale especially is what I can speak to, we treat as hybrid cloud storage. What we mean by that is we have built-in capabilities: most of our products have a feature called transparent cloud tiering, which allows you to set a policy on when data should be automatically tiered to the cloud. Everybody wants public cloud, and everybody wants on-prem. Obviously there are pros and cons of on-premises storage versus off-premises storage, but basically it boils down to this: if you want performance and security, you want to be on premises.
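Before he picks the thread back up, here is a hypothetical sketch of an age-, type-, and ownership-based rule like the transparent cloud tiering policy just described. The field names and thresholds are invented for illustration:

```python
# Hypothetical tiering policy: cold, non-governed data moves to the
# cloud tier; hot or compliance-owned data stays on-prem.
from dataclasses import dataclass

@dataclass
class FileMeta:
    path: str
    age_days: float
    kind: str    # e.g. "log", "scratch", "db"
    owner: str

def should_tier_to_cloud(f: FileMeta) -> bool:
    if f.kind == "db":              # hot, latency-sensitive data stays put
        return False
    if f.owner == "compliance":     # governed data stays on-prem
        return False
    return f.age_days > 90          # cold data goes to the cloud tier

print(should_tier_to_cloud(FileMeta("/logs/2016/app.log", 400, "log", "ops")))  # True
print(should_tier_to_cloud(FileMeta("/db/orders.ibd", 400, "db", "apps")))      # False
```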
But there's always some data which is better off in the cloud, and we try to automate that with transparent cloud tiering. You set a policy based on age, based on the type of data, based on ownership. The system will automatically tier the data to the cloud, and when a user accesses that data, it comes back automatically, too. It's all transparent to the end user. So yes, we're an on-prem storage business, but our solutions are hybrid cloud storage.

>> So, as somebody who knows the file business pretty well, let's talk about the file business and sort of where it's headed. There are some mega-trends and dislocations. There's obviously software-defined; you guys made a big investment in software-defined a year and a half, two years ago. There's cloud; Amazon with S3 sort of shook up the world. I mean, at first it was sort of small, but now it's really catching on. Object obviously fits in there. What do you see as the future of file?

>> That's a great question. When it comes to data layout, there's really block, file, and object. Software-defined and cloud are various ways of consuming storage. If you have a lot of servers, you would probably prefer a software-based solution, so you can run it on your existing servers. And depending on the organization's preferences, how concerned they are about security and performance, they may prefer to run some of the applications in the cloud. Those are different ways of consuming storage. But coming back to file and object, right: object is perfect if you are not going to modify the data. You're done writing that data, and you're not going to change it; it just belongs in an object store. It's more scalable storage. I say scalable because file systems are hierarchical in nature; because it's a file system tree, you have to traverse the various subdirectory trees, and beyond a few million subdirectories, it slows you down. But file systems have a strength: when you want to modify a file, any application which is going to edit the file, which is going to modify the file, belongs on file storage, not on object. Let's say you are dealing with medical images; you're not going to modify an x-ray once it's done. That's better suited to object storage. So file storage will always have a place. Take video editing, and all these videos they are doing, you know, we do a lot of video editing; that belongs on file storage, not on object. If you care about file modifications and file performance, file is your answer, but if you're done and you just want to archive it, and you want scalable storage, billions of objects, then object is the answer. Now, either of these can be software-based storage or it can be an appliance. That's again an organization's preference. Do you want an integrated, robust, ready-made solution? Then an appliance is the answer. "Ah, no, I'm a large organization, I have a lot of storage administrators," and they can build something on their own? Then software-based is the answer. Having both options gives you a choice.

>> What brought you to IBM? You used to be at NetApp. IBM's buying The Weather Company, Dell's buying EMC. What attracted you to IBM?

>> Storage is the foundation which we have, but it's really about data, and it's really about making sense of it, right? Everybody says data is the new oil, right? And IBM is probably the only company I can think of which has the tools and the AI to make sense of all this.
NetApp was great in the early 2000s, but even as a storage foundation they have issues with scale-out, true scale-out, not just a single namespace. EMC is a pure storage company. In the future it's all about, well, the reason we are here at this conference: analyzing the data. What tools do you have to make sense of it? That's where machine learning and then deep learning come in, and Watson is very well known for that. IBM has the AI, and it has a lot of research going on behind it, and I think storage will make more sense here. And also, IBM is doing the right thing by investing almost a billion dollars in software-defined storage. They're one of the first companies who did not hesitate to take the software from the integrated systems, for example XIV, and make that software available as software only. We did the same thing with Storwize: we took the software off it and made it available as Spectrum Virtualize. We did not hesitate at all to take the same software that was in those systems, where some other vendors would say, "I can't do that, I'm going to lose all my margins." We didn't hesitate; we made it available as software, because we believe that's an important need for our customers.

>> So the vision of the company, cognitive, the halo effect of that business, that's the future, and it's going to bring a lot of storage action, is sort of the premise there.

>> Chandra: Yes.

>> Excellent. Well, Chandra, thanks very much for coming to theCUBE. It was great to have you, and good luck with attacking the big data world.

>> Thank you, thanks for having me.

>> You're welcome. Keep it right there everybody, we'll be back with our next guest. We're live from Munich. This is DataWorks 2017. Right back.

(techno music)