Liran Zvibel, WekaIO | CUBEConversations, June 2019

>> From our studios in the heart of Silicon Valley, Palo Alto, California, it is a CUBE Conversation. >> Hi, and welcome to theCUBE Studios for a CUBE Conversation, where we go in depth with thought leaders driving innovation across the tech industry. I'm your host, Peter Burris. What are we talking about today? One of the key indicators of success in a digital business is how fast you can translate your data into new value streams. That means sharing it better, accelerating the rate at which you're running those models, and making it dramatically easier to administer large volumes of data at scale with a lot of different uses. That's a significant challenge, and it's going to require a rethinking of how we manage many of those data assets and how we utilize them. Now, to have that conversation, we're here with Liran Zvibel, who is the CEO of WekaIO. Liran, welcome back to theCUBE. >> Thank you very much for having me. >> So before we get to the kind of big problem, give us an update. What's going on at WekaIO these days? >> So very recently we announced our Series C financing for the company, another $31.7 million. We've actually had a very unorthodox way of raising this round: instead of going to a traditional VC-led round, we actually went to our business partners and joined forces with them into building stronger value for customers. We started with NVIDIA, which has seen a lot of success going with us to their customers, because we enable NVIDIA to deploy more GPUs, so their customers can either solve bigger problems or solve their problems faster. The second pillar of the data center is networking, so we've had Mellanox investing in the company, because they are the leader of fast networking. So between NVIDIA, Mellanox and WekaIO, you have very strong pillars around compute, network and storage. Performance is crucial, but it's not the only thing customers care about: customers need extremely fast access to their data, but they're also accumulating and keeping and storing tremendous amounts of it. So we've actually had the whole hard drive industry investing in us, with Seagate and Western Digital both investing in the company, and finally, one of our very successful go-to-market partners, Hewlett Packard Enterprise, invested in us through their Pathfinder program. So we're seeing tremendous backing from the industry, supporting our vision of enabling next-generation performance to applications and the ability to scale to any workload. >> Congratulations. And it's good money, but it's also smart money that has a lot of operational elements. Just to repeat it: it's Mellanox, NVIDIA, HPE, Seagate and Western Digital. It's an interesting group, but it's a group that will absolutely sustain and further your drive to try to solve some of these key data-oriented problems. But let's talk about what some of those key data-oriented problems are. I said up front that one of the challenges for any business that generates a lot of its value out of digital assets is how fast, how easily, and with what kind of fidelity it can reuse, process and move those data assets. How is the industry attending to that? How's that working in the industry today, and where do you think we're going?
>> So that's spot on. Businesses today, through different kinds of workloads, need to access tremendous amounts of data extremely quickly, and how they're going to compare to their cohort is actually based on how quickly and how well they can go through that data and process it. That's what we're solving for our customers. We're now looking into several applications where speed and performance on the one hand have to go hand in hand with extreme scale. We see great success in machine learning, where NVIDIA is. We're going after life sciences, where the genomic models, the cryo-EM microscopy, the computational chemistry are all now accelerated, because for the researchers to actually get to a conclusion, they have to sift through a lot of data. We are working extremely well in financial analytics, either for the banks, for the hedge funds, or for the quantitative trading companies, because we allow them to go through data much, much quicker. Actually, only last week I visited a customer where we were able to change the amount of time they take to go through one analytic cycle from almost two hours to four minutes. >> This is in financial analytics? >> Exactly. And I think last time I was here I was telling you about one of the autonomous driving companies using us, taking their time per epoch from two weeks down to four hours. So we see a consistent one to two orders of magnitude speedup in wall-clock time. We're not just showing we're faster on a benchmark; we're showing our customers that by leveraging our technology, they get results significantly faster. We're also successful in engineering, around chip design, software builds, and fluid dynamics. We've announced Mellanox as an EDA customer, a chip design customer, so they're not only a partner: they have brought our technology in house, and they're leveraging us for their next chips. And recently we've also discovered that we are a great help for running NoSQL databases in the cloud: running Spark or Cassandra over WekaIO is more than twice as fast as running over the standard elastic block services. >> All right, so let's talk about this, because you're solving problems that really only recently have come within range of some of the technology, but we still see some struggling. The way I'd describe it is that storage for a long time was focused on persisting data: a transaction executed, make sure you persist it. Now it's moved to these life sciences, machine learning, genomics types of workloads we're talking about: how can I share data, how can I deploy and use data faster? But the history of the storage industry is still predicated on designs that were mainly focused on persistence; you think about block storage and filers and whatnot. How is WekaIO advancing that technology space, reorganizing or rethinking storage for the types of performance and scale that some of these use cases require? >> This is actually a great question. When we started the company (we had a long legacy at IBM, we now have Andy from NetApp, and friends from EMC) we had seen what happens: the current storage portfolios of the large players are very big and very convoluted, and we decided when we were starting the company that we were going to solve that. So our aim is to solve all the little issues storage has had for the last four decades.
So if you look at what customers use today: if they need the utmost performance, they go to direct attached. This is what Fusion-io or a Violin Memory was; today these are NVMe devices. The downside is that the data cannot be shared, and it cannot even be backed up; if a server goes away, you're done. Then, if customers had to have some way of managing the data, they bought block SAN, deployed the volumes to servers, and still ran a local file system over that. It wasn't as performant as the DAS, but at least you could back it up and manage it somewhat. What has happened over the last 15 years is that customers realized Moore's law has ended, so up-scaling stopped working and people had to go out-scaling. And now it means that they have to share data to solve their problems. >> More parallelism. >> More servers. More computers have to share data to actually be able to solve the problem. And for a while customers were able to use a traditional filer like a NetApp, or a scale-out filer like an Isilon, or a traditional parallel file system like GPFS, these days Spectrum Scale, or Lustre, but these were significantly slower than SAN block or direct attached. Also, they could never scale metadata: you were limited in how many files you can put in a single directory, and you were limited by hot spots in that metadata. And to solve that, some customers moved to an object storage. It was a lot harder to work with, performance was unimpressive, you had to rewrite your application, but at least it could scale. What we're doing at WekaIO is reconfiguring the storage market. We're creating a storage solution that's actually not part of any of these four categories the industry has become used to. So we are faster than direct attached (when some people hear that, their minds are blown: we're faster than direct attached), we're as resilient and durable as SAN, we provide the semantics of shared file, so it's perfect shareability, and we're as scalable for capacity and metadata as an object storage. >> So performance and scale, plus administrative control and simplicity. >> Exactly. >> All right. So that's kind of what you just went through, those four things. Now as we think about this, the solution needs to borrow from the best of these, but in a way that allows it to be applied to workloads that feature very, very large amounts of data, typically organized as smaller files, requiring an enormous amount of parallelism, and a lot of change, because a big part of the hot spot with metadata is that you're constantly reshuffling things. So going forward, how does the WekaIO solution generally hit that hot spot? And specifically, how are you going to apply these partnerships that you just put together, and the investment, to actually come to market even faster and more successfully? >> All right, so these are actually two questions. True, the technology that we have is the only one that parallelizes IO in a perfect way, and also metadata in a perfect way. >> (indistinct) >> And it sustains that parallelism by load balancing. So, as we talked about, there are the hot spots some customers have, and we also run natively in the cloud.
You may get a noisy neighbor, so if you aren't employing constant load balancing alongside the extreme parallelism, you're going to be bound to a bottleneck. We're the only solution that actually couples the ability to break each operation into a lot of small ones with making sure the work is distributed to the resources that are available. Doing that allows us to provide tremendous performance at tremendous scale, so that answers the technology question.
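Liran's claim here, that each operation is broken into many small pieces and each piece is steered to whatever resource is currently least busy, can be sketched in miniature. This is an illustrative toy in Python, not WekaIO's implementation; the 4K chunk size, worker count, and least-loaded heuristic are all assumptions for the example.

```python
import heapq

CHUNK = 4096  # assumed 4K sub-operation size, purely illustrative

def split_io(offset, length, chunk=CHUNK):
    """Break one large IO into many small, independently routable sub-operations."""
    return [(o, min(chunk, offset + length - o))
            for o in range(offset, offset + length, chunk)]

def least_loaded_dispatch(subops, n_workers=8):
    """Route each sub-op to the currently least-loaded worker (toy load balancer)."""
    load = [(0, w) for w in range(n_workers)]   # (outstanding bytes, worker id)
    heapq.heapify(load)
    plan = {w: [] for w in range(n_workers)}
    for off, ln in subops:
        pending, w = heapq.heappop(load)        # least-loaded worker wins this sub-op
        plan[w].append((off, ln))
        heapq.heappush(load, (pending + ln, w))
    return plan

# One 1 MiB read becomes 256 4K sub-ops spread evenly across 8 workers.
plan = least_loaded_dispatch(split_io(0, 1 << 20))
print({w: len(ops) for w, ops in plan.items()})   # {0: 32, 1: 32, ...}
```

The point of the heap is that a slow or busy resource, the noisy neighbor, simply stops winning sub-operations, which is the load-balancing behavior described above.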
>> Without breaking, or without introducing unbelievable complexity in the administration. >> It actually makes everything simpler, because looking, for example, at our autonomous driving example: the reason they were able to go down from two weeks to four hours is that before us, they had to copy data from their object storage to a filer; but the filer wasn't fast enough, so they also had to copy the data from the filer to a local file system. And these copies are what added so much complexity to the workflow and made it so slow, because when you copy, you don't compute. >> And you lose fidelity along the way, right? Okay, so how are this money and these partnerships going to translate into accelerated monetization? >> So we are leveraging some of the funds for more engineering, coming up with more features and supporting more enterprise applications. We're going to leverage some of the funds for marketing, and we're actually spending on marketing programs with these five great partners: with NVIDIA, with Mellanox, with Seagate, with Western Digital, and with Hewlett Packard Enterprise. But we're also deploying joint sales motions. So we're now plugged into NVIDIA, plugged into Mellanox, plugged into Western Digital and into Hewlett Packard Enterprise, so we can leverage their internal resources, now that they have realized, through their business units and their investment arms, that we make sense, and we can actually go and serve their customers more effectively and better. >> Well, WekaIO is introducing a route to market through a unique and new technology that makes perfect sense. But it is unique, and it's relatively new, and sometimes enterprises might go, well, that's a little bit too immature for me; but if the problem it solves is that valuable, they'll bite the bullet. But even more importantly, a partnership lineup like this has got to be ameliorating some of the concerns that you're hearing from the marketplace. >> Definitely. So when NVIDIA tells a customer, hey, we have tested it in our labs, or Hewlett Packard Enterprise tells a customer, not only have we tested it in our lab, but the support is going to come out of Pointnext, these customers now have the ability to keep buying from their trusted partners, but get the intellectual property of a newer company with better intellectual property abilities. Another great benefit that comes to us: we are a 100% channel-led company. We are not doing direct sales, and working with these partners, we actually have their channel plans open to us, so we can go together and implement go-to-market strategies together with their partners, who already know how to work with them. We're just enabling, answering the technical questions, talking about the roadmap, talking about how to deploy. But the whole ecosystem keeps running in the efficient way it already runs, so we don't have to go and reinvent the wheel on how we interact with these partners. Obviously, we also interact with them directly. >> You get to focus on solving the problem. >> Exactly. >> Great. All right, so once again, thanks for joining us for another CUBE Conversation, Liran Zvibel of WekaIO. It's been great talking to you again on theCUBE. >> Thank you very much. I always enjoy coming over here. >> I'm Peter Burris; until next time.

Published Date : Jun 5 2019

Liran Zvibel & Andy Watson, WekaIO | CUBE Conversation, December 2018

(cheery music) >> Hi, I'm Peter Burris, and welcome to another CUBE Conversation from our studios in Palo Alto, California. Today we're going to be talking about some new advances in how data gets processed. Now, it may not sound exciting, but when you hear about some of the performance capabilities and how they liberate new classes of applications, this is important stuff. To have that conversation we've got Weka.IO here with us: specifically, Liran Zvibel, the CEO of Weka.IO, joined by Andy Watson, who's the CTO of Weka.IO. Liran, Andy, welcome to theCUBE. >> Thanks. >> Thank you very much for having us. >> So Liran, you've been here before; Andy, you're a newbie. So Liran, let's start with you. Give us the Weka.IO update; what's going on with the company? >> So 2018 has been a grand year for us. We've had great market adoption: we spent last year proving our technology, and this year we have accelerated our commercial successes. We've expanded to Europe, we've hired quite a lot of sales in the US, and we're seeing a lot of successes around machine learning, deep learning, and life sciences data processing. >> And you've hired a CTO. >> And we've hired the CTO, Andy Watson, which I am excited about. >> So Andy, what's your pedigree, what's your background? >> Well, I've been around a while, got the scars on my back to show it, mostly in storage, dating back even to Auspex before NetApp, but probably best known for the years I spent at NetApp. I was there from '95 through 2007, kind of the glory years; I was the second CTO at NetApp, as a matter of fact, and that was a pretty exciting time. We changed the way the world viewed shared storage, I think it's fair to say, at NetApp, and it feels the same here at Weka.IO. That's one of the reasons I'm so excited to have joined this company, because it's the same kind of experience of having something so revolutionary that quite often, whether it's a customer or an analyst like yourself, people are a little skeptical; they find it hard to believe that we can do the things that we do. And so it's gratifying when we have the data to back it up, and it's really a lot of fun to see how customers react when they actually have it in their environment and it changes their workflow and their life experience. >> Well, I will admit, and I might be undermining my credibility here, but I will admit that back in the mid 90s I was a little bit skeptical about NetApp; I'm considerably less skeptical about Weka.IO, just based on the conversations we've had. But let's turn to that, because there are classes of applications that are highly dependent on very large numbers of small files being able to be moved very, very rapidly, like machine learning. So, you mentioned machine learning, Liran; talk a little bit about some of the market success that you're having, some of those applications' successes. >> Right, so machine learning actually works extremely well for us, for two reasons.
For one big reason: machine learning is being performed by GPU servers, servers with several GPU offload engines in them, and what we see with this kind of server is that a single GPU server replaces ten or tens of CPU-based servers, so you actually need the IO performance to be ten or tens of times what the CPU servers needed. So we came up with a way of providing significantly higher, two orders of magnitude higher, IO to a single client on the one hand; and on the other hand, we have solved the data performance from the metadata perspective, so we can have directories with billions of files, and we can have a whole file system with trillions of files. When we look at the autonomous driving problem, for example: if you look at the high-end car makers, they have eight cameras around the cars. These cameras capture at small resolution, because you don't need very high resolution to recognize a lane, or a cat, or a pedestrian, but they capture at 60 frames per second. So in 30 minutes you get about the 100K files traditional filers could put in a directory; but if you'd like to have your cars running around the Bay Area, and you'd like to have all the data from the Bay Area in a single directory, then you would need directories with billions of files, which we offer. And what we have heard from some of our customers that have had great success with our platform is that not only do they get hundreds of gigabytes per second of small-file read performance, they tell us that they have taken their standard time to epoch from about two weeks, before they switched to us, down to four hours.
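A quick back-of-the-envelope check of the numbers Liran cites; the cameras, frame rate, and session length are his, while the fleet figures below are invented purely to show how directories grow:

```python
# One file per captured frame, per the interview.
cameras = 8        # cameras around the car
fps = 60           # frames per second per camera
minutes = 30       # one capture session

per_camera = fps * minutes * 60      # 108,000 files: the ~100K figure cited
per_session = per_camera * cameras   # 864,000 files for one 30-minute drive

# Hypothetical fleet, to show how a single logical directory reaches billions:
cars, sessions_per_day, days = 100, 10, 365
print(f"{per_session * cars * sessions_per_day * days:,} files/year")  # ~315 billion
```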
You want to put more than 100,000 files in a directory of other file systems, that is a tragedy, and we routinely can handle millions of files in a directory, doesn't matter to us at all because just like we distribute the data, we also distribute the metadata, and that's completely counter to the way the other file systems are designed because they were all designed in an era where their focus was on the physical geometry of hard disks, and we have been designed for flash storage. >> And the metadata associated with the distribution of that data typically was in a one file, in one place, and that was the master serialization problem when you come right down to it. So we've got a lot of ML workloads, very large number of files, definitely improved performance because of the parallelism through your file system, in the as I said, the ML world. Let's generalize this. What does this mean overall, you've kind of touched upon it, but what does it mean overall for the way that customers are going to think about storage architectures in the future as they are combining ML and related types of workloads with more traditional types of things? What's the impact of this on storage? >> So if you look at how people architect their solutions around storage recently, you have four different kind of storage systems. If you need the utmost performance, you're going to DAS, Fusion IO had a run, perfecting DAS and then the whole industry realized it. >> Direct attached storage. >> Direct attached storage, right, and then the industry realized hey it makes so much sense, they create a standard out of it, created NVME, but then you're wasting a lot of capacity, and you cannot manage it, you cannot back it up, and then if you need it as some way to manage it, you would put your data over SAN, actually our previous company was XAV storage that IBM acquired, vast majority of our use cases are actually people buying block, and then they overlay a local file system over it because it gets you so much higher performance then if you must get, but you don't get, you cannot share the data. Now, if you put it on a filer, which is Neta, or Islon, or the other solutions, you can share the data but your performance is limited, and your scalability is limited as Andy just said, and if you had to scale through the roof- >> With a shared storage approach. >> With a shared storage approach you had to go and port your application to an object storage which is an enormous feat of engineering, and tons of these projects actually failed. We actually bring the new kind of storage, which is assured storage, as scalable as an object storage, but faster than direct attach storage, so looking at the other traditional storage systems of the last 20 or 30 years, we actually have all the advantages people would come to expect from the different categories, but we don't have any of the downsides. >> Now give us some numbers, or do you have any benchmarks that you can talk about that kind of show or verify or validate this kind of vision that you've got, that Weka's delivering on? >> Definitely, but the i500? >> Sure, sure, we recently actually published our IO500 performance results at the SE1800, SE18 event in Dallas, and there are two different metrics- >> So fast you can go back in time? >> Yes, exactly, there are two different metrics, one metric is like an aggregate total amount of performance, it's a much longer list. 
I think the one that's more interesting is the 10-client version, which we like to focus on, because we believe the most important area for a customer to focus on is how much IO you can deliver to an individual application server. And so this part of the benchmark is most representative of that, and on that rating, we were able to come in second (well, after you filter out the irrelevant results, which, that's a separate process). >> Typical of every benchmark. >> Yes, exactly. Of the relevant, meaningful results, we came in second behind the world's largest and most expensive supercomputer at Oak Ridge, the SUMMIT system. They have a 40-rack system, and we have a half, or maybe a little bit more than half of, one rack of industry-standard hardware running our software. So compare that: the cost of our hardware footprint and so forth is much less than a million dollars. >> And what was the differential between the two? >> Five percent. >> Five percent? So, okay, sound of jaw dropping. A 40-rack system at Oak Ridge, five percent more performance than you guys running on effectively a half rack of, like, a Supermicro or something like that? >> Oh, and it was the first time we ran the benchmark; we were just learning how to run it. Those guys are all experts; they had IBM in there at their elbow helping them with all their tuning and everything. This was literally the first time our engineers ran the benchmark. >> Is a large feature of that the fact that Oak Ridge had to get all that hardware to get the physical IO necessary to run serial jobs, and you guys can just do this in parallel on a relatively standard NVMe subset? >> Because beyond that, you have to learn how to use all those resources, right? All the tuning, all the expertise; one of the things people say is you need a PhD to administer one of those systems, and they're not far off, because it's true that it takes a lot of expertise. Our systems are dirt simple. >> Well, you've got to move the parallelism somewhere, and either you create it yourself, like you do at Oak Ridge, or you do it using your guys' stuff, through a file system. >> Exactly, and what we are showing is that we have tremendously higher IO density. Instead of using a local file system, where most of them were created in the 90s, in a serial way of thinking, optimizing over hard drives, you now say: hey, NVMe devices, SSDs, are beasts at running 4K IOs. If you solve the networking problem, so the network is not the bottleneck anymore, and you just run all your IO as a massively parallelized workload of 4K IOs, you actually get much higher performance than what, up until we came along, was the pinnacle of performance: a local file system over a local device. >> Well, so NFS has an effective throughput limitation of somewhere around a gigabyte, so if you've got a bunch of GPUs that are each wanting four, five, 10 gigabytes of data coming in, you're not saturating them out of an effective one-gigabyte throughput rate. So it's almost like you've got the New York City waterworks coming in to some of these big file systems, and you've got, like, your little kitchen sink that's actually spitting the data out into the GPUs. Have I got that right?
>> Good analogy. If you are creating a data lake and then you're going to sip at it with some tiny little straw, it doesn't matter how much data you have; you can't really leverage the value of all that data you've accumulated, because if you're feeding it into your compute farm, GPU or not, slowly, then you'll never get to it all, right? And meanwhile more data's coming in every day, at a faster rate. It's an impossible situation, so the only solution really is to increase the rate at which you access the data, and that's what we do.
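Peter's waterworks analogy is easy to put numbers on. A rough sketch with assumed figures; the roughly one gigabyte per second effective NFS ceiling is from the conversation, while the per-GPU appetite and server size are illustrative:

```python
# How many GPUs can a single ~1 GB/s NFS mount actually keep fed?
nfs_gb_per_s = 1.0        # effective NFS client throughput (from the discussion)
gpu_need_gb_per_s = 5.0   # assumed per-GPU data appetite, mid-range of "four, five, 10"
gpus_per_server = 8       # assumed 8-GPU training server

demand = gpu_need_gb_per_s * gpus_per_server   # 40 GB/s wanted by one server
print(f"supply {nfs_gb_per_s} GB/s vs demand {demand} GB/s "
      f"-> GPUs {nfs_gb_per_s / demand:.1%} fed")   # ~2.5%: the kitchen sink
```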
>> So I could see how you're making the IO bandwidth junkies at Oak Ridge, or how you would make them, really happy. But the other thing that at least I find interesting about Weka.IO, as you just talked about, is that you've come up with an approach that's specifically built for SSD. You've moved the parallelism into the file system, as opposed to having it be somewhere else, which is natural, because SSD is not built to persist data, it's built to deliver data, and that suggests, as you said earlier, that we're looking at a new way of thinking about storage as a consequence of technologies like Weka, technologies like NVMe. Now Andy, you came from NetApp, and I remember what NetApp did to the industry when it started talking about the advantages of sharing storage. Are we looking at something similar happening here with SSD and NVMe and Weka? >> Indeed, I think that's the whole point; it's one of the reasons I'm so excited about it. It's not only because we have this technology that opens up this opportunity, this potential being realized. I think the other thing is, there are a lot of features, a lot of meaningful software, that need to be written around this architectural capability, and the team that I joined, their background, coming from having created XIV before, and the almost amazing way they all think together and recognize the market, and the way they interact with customers, allows the organization to address customer requirements realistically. So instead of just doing things we want to do because they seem elegant, or because the technology sparkles in some interesting way, this company, and it reminds me of NetApp in the early days, and it was a driver of NetApp's big success, is very customer-focused, very customer-driven. So when customers tell us what they're trying to do, we want to know more: tell us in detail how you're trying to get there, what are your requirements? Because if we understand better, then we can engineer what we're doing to meet you there, because we have the fundamental building blocks. Those are mostly done; now what we're trying to do is add the pieces that allow you to implement it into your workflow, into your data center, or into your strategy for leveraging the cloud. >> So Liran, when you're here in 2019, we'll be having a similar conversation with this customer focus. You've got a value proposition for the IO bandwidth junkies; you can give more. But what's next in your sights? Are you going to show how, for example, you can get higher performance with less hardware? >> So we are already showing how you can get higher performance with less hardware, and I think as we go forward, we're going to have more customers embracing us for more workloads. What we see already is they get us in for either the high end of their life sciences or their machine learning, and then people working around those people realize, hey, I could get some faster speed as well, and then we start expanding within these customers and get to see more and more workloads where people like us, and we can start telling stories about them. The other thing that comes naturally to us: we run natively in the cloud, and we actually let you move your workload seamlessly between on-premises and the cloud, and we are seeing tremendous interest in moving to the cloud today, though not a lot of organizations already do it. I think 2019 and forward, we are going to see more and more enterprises seriously considering moving to the cloud, because we have almost 100% of our customers POCing cloudbursting, but not a lot of them using it. I think as time passes, all of them that have seen it working, when they did the initial test, will start leveraging this and getting the elasticity out of the cloud, because this is what you should get out of the cloud; so this is one avenue of expansion for us. We are going to spend more resources in Europe, where we have recently started building the team, and later in the year, also JAPAC. >> Gentlemen, thanks very much for coming on theCUBE and talking to us about some new advances in file systems that are leading to greater performance, less specialized hardware, and enabling new classes of applications. Liran Zvibel is the CEO of Weka.IO, Andy Watson is the CTO of Weka.IO; thanks for being on theCUBE. >> Thank you very much. >> Yeah, thanks a lot. >> And once again, I'm Peter Burris, and thanks very much for participating in this CUBE Conversation. Until next time. (cheery music)

Published Date : Dec 14 2018

Liran Zvibel, WekaIO | CUBEConversation, April 2018

[Music] >> Hi, I'm Stu Miniman, and this is a CUBE Conversation in SiliconANGLE's Palo Alto office. Happy to welcome back to the program Liran Zvibel, who is the co-founder and CEO of WekaIO. Thanks so much for joining me. >> Thank you for having me over. >> All right, so on our research side, we've really been saying that data is at the center of everything: it's in the cloud, it's in the network, and of course in the storage industry data has always been there, but I think especially for customers it's become more front and center. Why is data becoming more important? It's not just data growth and some of the other things we've talked about for decades; how is it changing, and what are you hearing from customers today? >> So I think the main difference is that organizations are starting to understand that the more data they have, the better service they're going to provide to their customers, and they will be an overall better company than their competitors. About 10 years ago we started hearing about big data and other ways that, in a simpler form, just sieved through a lot of data and tried to get some sort of high-level meaning out of it. In the last few years, people have actually been applying deep learning and machine learning techniques to their vast amounts of data, and they're getting a much higher level of intelligence out of their huge capacities of data; and actually, with deep learning, the more data you have, the better outputs you get. >> Before we go into the ML and deep learning piece, let's focus on data itself. There are some that say digital transformation is just a buzzword; when I talk to users, absolutely they're going through transformations, and we're saying everybody's becoming a software company. But how does data specifically help them with that? What is your viewpoint, and what are you hearing from your customers? >> So if you look at it from the consumer perspective, people now keep a record of their lives at much higher resolution, and I'm not talking about the image resolution, I'm talking about the vast amount of data that they store. If I look at how many pictures I have of myself as a kid versus how many pictures I have of my kids: you could fit all of my pictures into albums; I could probably fit a week's worth of my kids' pictures into albums. So people keep a lot more data as consumers, and then organizations keep a lot more data about their customers, in order to provide better service and a better overall product. >> As an industry, we saw a real mixed bag when it came to big data. I was saying, great, I have lots more volume of data, but that doesn't necessarily mean I got more value out of it. So what are the trends you're seeing? Why is deep learning, machine learning, AI going to be different? Or is this just the next iteration: we tried and maybe didn't hit as well with big data, let's see if this does better? >> So I think big data had its glory days, and now we're coming to the end of that crescendo, because people realized that what they got was a sort of aggregate of things they couldn't make too much sense of. People really understand now that to make better use of your data, you need to work similarly to how the brain works: you look at a lot of data, then you have to make some sense out of that data, and once you've made some sense out of that data,
we can now get computers to go through way more data and make a similar amount of sense out of it, and actually get much, much better results. So instead of just finding anecdotes, the thing you were able to do with big data, you're now actually able to generate intelligent systems. >> One of the other things we saw: it used to be, okay, I have this huge back catalog, or I'm going to survey all the data I've collected. Today it's much more real time, a word that's been thrown around for many years, whether you say live data, or if you're at sensors, where I need to have something where I can train models and react immediately. That kind of immediacy is much more important; I'm assuming that's something you're seeing from customers too. >> Indeed. So what we see is that customers end up collecting vast amounts of data, they train their models on that data, and then they push these intelligent models to the edges, and you're going to have edges running inference; that could be a street camera, it could be a camera in a store, or it could be your car. Usually you run this inference at the endpoints using everything you trained the models on, and you still keep the data and push it back, and you still run inference at the data center, sort of doing QA. And now the edges also know to mark where they couldn't make sense of what they saw, so the data center systems know what to look at first, and how to make the models smarter for the next iteration, because these are closed-loop systems: you train them, you push to the edges, the edges tell you how well they think they understood, you train again, and things improve. We're now at the infancy of a lot of these loops, but I think the next two to five years will take us through a very, very fascinating revolution, where systems all around us become way, way more intelligent.
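The closed loop Liran describes, train in the data center, push the model to the edges, infer there, flag what the model couldn't make sense of, and retrain on the enlarged dataset, can be outlined in a few lines. A schematic toy in Python; the confidence threshold and every name here are placeholders, not a real Weka or framework API.

```python
import random

class Edge:
    """Toy edge device: runs the deployed model and flags low-confidence inputs."""
    def deploy(self, model):
        self.model = model
    def run_and_flag(self, samples, threshold=0.5):
        # Inference happens locally; anything the model isn't sure about gets marked.
        return [s for s in samples if self.model(s) < threshold]

def train(dataset):
    """Stand-in trainer: confidence is high only for inputs resembling seen data."""
    seen = set(dataset)
    return lambda s: 1.0 if s in seen else random.random()

data = list(range(100))                    # what the data center has collected so far
edges = [Edge() for _ in range(4)]
for _ in range(3):                         # the closed loop: train, push, infer, refine
    model = train(data)
    for e in edges:
        e.deploy(model)                    # push the intelligent model to the edges
        data += e.run_and_flag(random.sample(range(500), 20))  # hard cases flow back
print(len(data))                           # the training set grows with each loop
```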
>> Yeah, and there are interesting architectural discussions going on. If you talk about this edge environment, if I'm an autonomous vehicle or an airplane, of course I need to react there; I can't go back to the cloud. But what happens in the cloud versus what happens at the edge, and where does Weka fit into that whole discussion? >> So where we currently run is at the data centers. At Weka we created the fastest file system, one that's perfect for AI and machine learning and training, and we make sure that your GPU-filled servers, which are very expensive, never sit idle. The second component of our system is tiering to very effective object storages that can run into exabytes. So we have a system that makes sure you can have as many GPU servers as you like churning all the time, getting the results and getting the new models, while having the ability to read any form of data that was collected over several years, really through hundreds of petabytes of data sets; and now we have customers talking about exabytes of data sets representing a single application, not the whole organization, just that training application. >> So AI and ML, is that the killer use case for your customers today? >> So that's one killer application, just because of the vast amount of data and the high-performance nature of the clients. We actually show clients that run WekaIO finishing training sessions ten times faster than they would using traditional NFS-based solutions, just based on the different way we handle data. Another very strong application for us is around life sciences and genomics, where we show that we're the only storage that lets these processes remain CPU-bound. Any other storage at some point becomes IO-bound, so you couldn't parallelize the processing anymore. With us, it doesn't matter how many servers you run as clients: you double the amount of clients, and you either get twice the result in the same amount of time, or you get the same result in half the time. And with genomics nowadays there are applications that are life-saving, so hospitals run these things and they need results as fast as they can get them; so faster storage means better healthcare. >> Without getting too deep into it, because the storage industry has lots of wonkiness and there are so many pieces: I hear life sciences, I think object storage; I hear NVMe, I think block storage; you're file storage. When it comes down to it, why is that the right architecture for today, and what advantages does that give you? >> So we are actually the only company that went through the hassles and the hurdles of utilizing NVMe and NVMe over Fabrics for a parallel file system; all the other solutions went the easier route and created block. And the reason we've created a file system is that this is what computers understand, this is what the operating system understands. When you go to university and learn computer science, they teach you how to write programs that need a file system. Now, if you want to run your program over two servers or ten servers, what you need is a shared file system. Up until we came along, the gold standard was using NFS for sharing files across servers, but NFS was actually created in the 80s, when Ethernet ran at 10 megabit. Currently most of our customers already run 100 gigabit, which is four orders of magnitude faster, so they're seeing that they cannot run a network protocol designed for four orders of magnitude less speed with the current demanding workloads. This explains why we had to go and pick a totally different way of pushing data to the clients. With regard to object storages: object storages are great because they allow customers to aggregate hard drives into inexpensive, large-capacity solutions. The problem with object storages is that the programming model is different from the standard file system that computers understand, in two ways. One, when you write something, you don't know when it's actually going to get stored; it's called eventual consistency, and it's very difficult for mortal programmers to write a system that is sound, that is always correct, over eventually consistent storage. The second thing is that objects cannot change: you cannot modify them; you create them, you get them, or you delete them. They can have versions, but this is also much different from how the average programmer is used to writing programs. So we are actually tying together the highest-performance NVMe over Fabrics at the front tier, and these object storages, which are extremely efficient but very difficult to work with, at the back-end tier, into a single solution that has the highest performance and the best economics.
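The programming-model gap Liran just described is concrete: a POSIX file can be modified in place, while an object can only be created, fetched whole, or deleted. A minimal side-by-side sketch in Python; the ObjectStore class is a stand-in for any S3-like API, not a specific product:

```python
class ObjectStore:
    """Stand-in for an S3-like store: whole objects only, no in-place updates.
    (Real object stores historically added eventual consistency on top of this.)"""
    def __init__(self):
        self._objects = {}
    def put(self, key, data: bytes):
        self._objects[key] = data            # replaces the entire object
    def get(self, key) -> bytes:
        return self._objects[key]

# File semantics: change 10 bytes in place, no matter how large the file is.
with open("results.bin", "wb") as f:
    f.write(b"\0" * (1 << 20))
with open("results.bin", "r+b") as f:
    f.seek(4096)
    f.write(b"x" * 10)

# Object semantics: the same 10-byte change means read-modify-write of everything.
store = ObjectStore()
store.put("results.bin", b"\0" * (1 << 20))
blob = bytearray(store.get("results.bin"))   # fetch the whole 1 MiB object
blob[4096:4106] = b"x" * 10
store.put("results.bin", bytes(blob))        # rewrite the whole 1 MiB object
```

The read-modify-write pattern on the object side is the rewriting burden Liran says applications had to take on when they ported to object storage.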
>> Liran, I want to give you the last word. Give us a little bit of a long view: you've talked about where we've gone, how parallel architecture helps now that we're at 100 Gig. Look out five years into the future; what's going to happen? Blockchain takes over the world, cloud dominates everything? From an infrastructure, application and storage world, what does Weka think things will look like? >> So one very strong trend that we are seeing is around encryption. It doesn't matter what industry: storing things in clear text, for many organizations, just stops making sense, and people will demand more and more of their data to be encrypted, with tighter control around everything. That's one very strong trend we're seeing. Another very strong trend we're seeing is that enterprises would like to leverage the public cloud, but in an efficient way. If you run the economics, moving all your applications to the public cloud may end up being more expensive than running everything on-prem, and I think a lot of organizations have realized that. The trick is going to be that each organization will have to find a balance: which services run on-prem, and these are going to be the services that run around the clock, and which services have more of a bursty nature, where the organization will learn how to leverage the public cloud for its elasticity. Because if you're just running in the cloud but not leveraging the elasticity, you're doing it wrong. We're actually helping a lot of our customers do this with our hybrid cloud ability to have local workloads and cloud workloads, and getting these whole workflows to actually run is a fascinating process. >> Liran, thank you so much for joining us. Great to hear the update, not only on Weka but really on where the industry is going. Dynamic times here in the industry, with data at the center of it all, and theCUBE is looking to cover it at all the locations, including here in our lovely Palo Alto studio. I'm Stu Miniman; thanks so much for watching theCUBE. >> Thank you very much. [Music]

Published Date : Apr 6 2018

Breaking Analysis: Legacy Storage Spending Wanes as Cloud Momentum Builds

(digital music) >> From theCUBE Studios in Palo Alto and in Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> The storage business as we know it has changed forever. On-prem storage was once a virtually unlimited and untapped bastion of innovation, VC funding and lucrative exits. Today it's a shadow of its former self, and the glory days of storage will not return. Hello everyone, and welcome to this week's Wikibon CUBE Insights Powered by ETR. In this Breaking Analysis, we'll lay out our premise for what's happening in the storage industry and share some fresh insights from our ETR partners, and data that supports our thinking. We've had three decades of tectonic shifts in the storage business. A simplified history of this industry shows us there've been five major waves of innovation spanning five decades. The dominant industry model has evolved from what was first a mainframe-centric, vertically integrated business, led of course by IBM. It then became a disintegrated business that saw something like 70 or 80 Winchester disk drive companies rise and then fall as they served a booming PC industry; this wave was led by the likes of Seagate. Seagate, in turn, supplied the emergence of an intelligent-controller-based external disk array business that drove huge margins for a function that, while lucrative, was far cheaper than captive storage from system vendors; this era of course was led by EMC and NetApp. And then this business was disrupted by a flash and software-defined model that was led by Pure Storage and also VMware. Now the future of storage is being defined by cloud, and intelligent data management is being led by AWS and a three-letter company that we'll just call TBD, otherwise known as Jump Ball Incorporated. Now, let's get into it here. The impact of AWS cannot be overstated. While legacy storage players are sick and tired of talking about the cloud, the reality cannot be ignored. The cloud has been the most disruptive force in storage over the past 10 years, and we've reported on the spending impact extensively. But cloud is not the only factor pressuring the on-prem storage business: flash has killed what we call performance by spindles, in other words, the practice of adding more disk drives to keep performance from tanking. So much flash has been injected into the data center that that practice is no longer required. Now as you drill down into the cloud, AWS has been by far the most significant factor in our view. Lots of people talked about object storage before AWS, but there sure wasn't much spending going on; S3 changed that. AWS is getting much more aggressive about expanding its storage portfolio and its offerings. S3 came out in 2006, and it was the very first AWS service; then Elastic Block Store, EBS, came out a couple of years later, and nobody really paid much attention. Last fall at Storage Day we saw AWS announce a number of services, many file-related, and this year we saw four new storage announcements from Amazon at re:Invent. We think AWS' storage revenue will surpass 8 billion dollars this year and could be as high as 10 billion. There's not much data out there, but this would mean that AWS' storage business is larger than that of a NetApp, which means AWS is larger than every traditional storage player with the exception of Dell. Here's a little glimpse of what's coming at the legacy storage business: it's a clip of the vice president of AWS storage, her name is Mai-Lan Tomsen Bukovec. Watch this.
Okay now, you may say, Dave, what the heck does that have to do with anything? Yeah, I don't know, but as an older white guy that's been in this business for a while, I just think it's badass that this woman boxes and runs a business that we think is approaching $10 billion. Now let's take a quick look at the storage announcements AWS made at re:Invent. The company made four announcements this year; let me try to be brief. The first is EBS io2 Block Express volumes; got to love the names. AWS claims this is the first storage area network, or SAN, for the cloud, and it offers up to 256,000 IOPS, 4,000 megabytes per second of throughput, and 64 terabytes of capacity. Hey, sounds pretty impressive, right? Well, let's dig in a little bit. Okay, first of all, this is not the first SAN in the cloud, at least in my view; there may be others, but Pure Storage announced Cloud Block Store in 2019 at its annual Accelerate customer conference, and it's pretty comparable here; maybe not so much in the speeds and feeds, but in the concept of better block storage in the cloud with higher availability. Now, as you may also be saying, what's the big deal? The performance? Come on, we can smoke that, we're an on-prem vendor, we can bury that; compared to what we do, AWS' announcement is really not that impressive. Okay, let me give you a point of comparison. There's a startup out there called VAST Data, whose enclosure with bundled storage and compute can do 400,000 IOPS and 40,000 megabytes per second, and that can be scaled; so yeah, I get it. And AWS also announced that io2 was priced at 20% less than previous-generation volumes, which you might say is also no big deal, and I would agree; 20% is not as aggressive as the average price decline per gigabyte of any storage technology. AWS loves to make a big deal about its price declines, but it's essentially following the industry trends. The point, though, is that this feature will be great for a lot of workloads, and it's fully integrated with AWS services, meaning, for example, it will be very convenient for AWS customers to invoke this capability for Aurora and other AWS databases through the RDS service; just another easy button for developers to push. This is especially important as we see AWS rapidly expanding its machine learning and AI capabilities with SageMaker; it's embedding ML into things like Redshift and driving analytics, so integration is very key for its customers. Now, is Amazon retail going to run its business on io2 volumes? I doubt it. I believe they're running on Oracle, and they need much better performance, but this is a mainstream service for the EBS masses to tap. Now, the other notable announcement was EBS gp3 volumes. This is essentially a service that lets you programmatically set SLAs for IOPS and throughput independently, without needing to add additional storage. Again, you may be saying things like, well, at least I remember when SolidFire let me do this several years ago and gave me more than 3,000 IOPS and 125 megabytes per second of performance; but look, this is great for mainstream customers that want more consistent and predictable performance, that want to set some kind of threshold or floor, and it's integrated, again, into the AWS stack. Two other announcements were made: one that automatically tiers data to colder storage tiers, and a replication service. On the former, data migrates to tier two after 90 days without access, and to tier three after 180 days.
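Before moving on, the gp3 capability described above, dialing IOPS and throughput independently of capacity, surfaces in the API roughly like this. A hedged sketch using boto3; the volume size and performance numbers are arbitrary, and current limits should be checked against AWS documentation:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# gp3 decouples the three dials: size, IOPS, and throughput are set independently.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,            # GiB of capacity
    VolumeType="gp3",
    Iops=10000,          # provisioned above the 3,000 IOPS gp3 baseline
    Throughput=500,      # MiB/s, above the 125 MiB/s gp3 baseline
)
print(volume["VolumeId"])
```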
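And the age-based demotion policy just described is simple enough to state as a rule. A sketch of the policy as stated, not AWS code; the tier labels are placeholders:

```python
from datetime import datetime, timedelta

def tier_for(last_access, now=None):
    """Demotion as described: tier two after 90 idle days, tier three after 180."""
    idle = (now or datetime.utcnow()) - last_access
    if idle >= timedelta(days=180):
        return "tier-3 (coldest)"
    if idle >= timedelta(days=90):
        return "tier-2 (cold)"
    return "tier-1 (hot)"

print(tier_for(datetime(2020, 6, 1), now=datetime(2020, 12, 15)))  # 197 idle days -> tier-3
```

The intelligent tiering wished for next would add a promotion path, for instance moving a quarter-end comparison set back to the hot tier on a schedule, rather than only demoting on age.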
AWS, remember, hired a bunch of folks out of EMC years ago and put them up in the Boston Seaport area, so they've acquired lots of expertise in a lot of different areas. I'm not sure if tiering came out of that group, but look, this stuff is not rocket science, and it saves customers money. These are tried and true techniques that AWS is applying; the important thing is that it's in the cloud. Now for sure we'd like to see more policy options than, say, a fixed 90-day or 180-day policy, and more importantly we'd like to see intelligent tiering, where the machine is smart enough to elevate and promote certain datasets when they're needed, for instance at the end of a quarter or the end of the year for comparison purposes. But as NFL Hall of Fame coach Hank Stram would have said, AWS is matriculating the ball down the field.

Okay, let's look at some of the data that supports the premise we're laying out today. This chart shows spending across the ETR taxonomy. It depicts the net score, or spending velocity, for different sectors, and we've highlighted storage. Now don't put too much weight on the January data, because the survey was just launched, but you can see storage continues to be a back-burner item relative to some other spending priorities. As I've reported, CIOs are really focused on cloud, containers, container orchestration, automation, productivity and other key areas like security.

Now let's take a look at some of the financial data from the storage crowd. This chart shows data for eight leading names in "storage," and we put storage in quotes because, as we said earlier, the market is shifting, and companies like Cohesity and Rubrik are certainly not positioning as storage players; in fact, that's the last thing they want to do. Rather, they're category creators around data management, or intelligent data management, but given their adjacency to storage they're partnering with all the primary storage companies, and they're in the ETR taxonomy.

Okay, so as you can see, we're showing the year-over-year quarterly revenue growth for the leading storage companies. NetApp is a big winner; they're growing at a whopping 2%. They beat expectations, but expectations were way down. You can see in the rightmost column, upper right, we've added the ETR net score from October. A net score of 10% says that if you ask customers whether they're spending more or less with a company, the customers spending more essentially outnumber those spending less by 10 percentage points; we'll get into that a little further later. For comparison, a company like Snowflake has a net score approaching 70%. Pure Storage used to be that high several years ago, or high sixties anyway. So 10% is in the red zone, and yet NetApp is the big winner this quarter.

Now Nutanix, again, isn't really a storage company, but they're an adjacency and they sell storage, and like many of these companies they're transitioning to a subscription pricing model, which puts pressure on the income statement. That's why they went out and did a deal with Bain; Bain put in $750 million to help bridge that transition, so that's kind of an interesting move. Every company in this chart is moving to an annual recurring revenue model, and that as-a-service approach is going to be the norm by the end of the decade. HPE's doing it with GreenLake, Dell has announced Apex; virtually every company is headed in this direction.
Now speaking of HPE, its Nimble business has momentum, but other parts of the storage portfolio are quite a bit softer. Dell continues to see pressure on its storage business, although VxRail is a bright spot. Everybody's got a bright spot; everybody's got new stuff that's growing much faster than the old stuff. The problem is the old stuff is much, much bigger than the new stuff. IBM's mainframe storage cycle seems to have run its course; they had been growing for the last several quarters, but that looks like it's over. So these are very, very cyclical businesses.

Now as you can see, the data protection and data management companies are showing spending momentum, but they're not public, so we don't have revenue data. And you've got to wonder, with all the money these guys have raised and the red-hot IPO and tech markets, why haven't they gone public? The answer has to be that they're either not ready, or maybe their numbers weren't where they wanted them to be; maybe they're not predictable enough, maybe they don't have their operational act together, or maybe they need to get that in order. Some combination of those factors is likely. They'll give you other answers if you ask them, but if they had their stuff together, they'd be going out right now.

Now here's another look at the spending data in terms of net score, which again is spending velocity. ETR here is measuring the percent of respondents that are adopting new, spending more, spending flat, spending less, or retiring the platform. So net score is adoptions, which is the lime green, plus spending more, which is the forest green; add those two, then subtract spending less, which is the pink, and leaving the platform, which is the bright red. What's left over is net score; the arithmetic is sketched below.

So, let's look at the picture here. Cohesity leads all players in the ETR storage taxonomy; again, they don't position that way, but that's the way the customers are answering. They've got a 55% net score, which is really solid, and you can see the data in the upper right-hand corner. They're followed by Nutanix, which again is really not a pure-play storage company. But speaking of Pure, its net score has come down from its high of 73% in January 2016. It's not going to climb back up there, but it's going to be interesting to see if Pure's net score can rebound in a post-COVID world. We're also watching what Pure does in terms of unifying file and object, how it's faring in the cloud, and what it does with the Portworx acquisition, which is really designed to bring forth a new programming model.

Now, Dell is doing fine with VxRail, but vSAN, from VMware, is well off its net score highs, which were in the 60%-plus range a couple of years ago; vSAN has definitely been a factor, but again, it's come off its highs. HPE with Nimble still has some room to improve, and I think it actually will; I think the figures we're showing here are somewhat depressed by the COVID factor, and I expect Nimble is going to bounce back in future surveys. Dell and NetApp are the big leaders in terms of presence, or market share, in the dataset, other than VMware, 'cause VMware has a lot of instances; it's software-defined, and that's why they're so prominent. And with VMware's large share you'd expect them to have net scores that are tepid, and you can see a similar pattern with IBM.
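The net score arithmetic is simple enough to sketch. The survey percentages below are made-up for illustration; they are not ETR data.

```python
# Net score as described above: (% adopting + % spending more) minus
# (% spending less + % leaving the platform). Flat spenders don't move the needle.
def net_score(adopting, more, flat, less, leaving):
    assert (adopting + more + flat + less + leaving) == 100, \
        "the five buckets should cover all respondents"
    return (adopting + more) - (less + leaving)

# Illustrative only: 15% adopting, 40% spending more, 35% flat,
# 7% spending less, 3% leaving -> net score of 45.
print(net_score(15, 40, 35, 7, 3))   # 45

# The 10% case discussed earlier: spenders outnumber shrinkers by ten points.
print(net_score(5, 25, 50, 15, 5))   # 10
```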
So Dell and NetApp have tepid net scores, as does IBM, because of their large market share; VMware is kind of a newer entry into the play and so is doing pretty well from a net score standpoint. Now Commvault, like Cohesity and Rubrik, is really around intelligent data management, trying to go beyond backup into business recovery, data protection and DevOps, bringing in analytics, bringing that to the cloud. We didn't put Veeam in here, and we probably should have. They had pre-COVID net scores well into the thirties, and they have a steadily increasing share of the market, so we expect good things from Veeam going forward. They were acquired earlier this year by Insight Partners, the private equity firm, so big changes there as well; that was their kind of near-term exit, and maybe more to come. But look, it's all relative. This is a large and mature market that is moving to the cloud and moving to other adjacencies. And the core is still primary storage; that's the prime prerequisite, and everything else flows from there: data protection, replication, everything else.

This chart gives you another view of the competitive landscape. It's that classic XY chart: it plots net score on the vertical axis and market share on the horizontal axis, and market share, remember, is a measure of presence in the dataset. Now think about this from the CIO's perspective. They have their on-prem estate, all this infrastructure, and they're putting a brick wall around their core systems. What do they want out of storage for that class of workload? They want it to perform consistently, they want it to be efficient, and they want it to be cost-effective. So what are they going to do? They're going to consolidate: consolidate the number of vendors, consolidate the storage, minimize complexity. Yeah, they're going to worry about the blast radius, but there are ways to architect around that. The last thing they want to worry about is managing a zillion storage vendors, and this business is consolidating, as it has been for some time; we've seen the number of independent storage players that are public consolidate over the years, and that's going to continue.

So on-prem storage arrays are not giving CIOs the innovation and strategic advantage they did back when things like storage virtualization, space-efficient snapshots, data de-duplication and other storage services were worth maybe taking a flyer on a feature product, like for example a 3PAR or even a Data Domain. Flash gave CIOs more headroom and better performance, so as I said earlier, they're no longer just buying spindles to increase performance. And as more and more work gets pushed to the cloud, you're seeing a bunkering in on these large-scale, mission-critical workloads.

As you saw earlier, the legacy storage market is consolidating and has been for a while. It's essentially becoming a managed-decline business, where R&D is going to increasingly get squeezed and go to other areas, both from the vendor community and on the buy side, where they're investing in cloud, containers, building new layers in their business and, of course, DX, the digital transformation. I mentioned VAST Data before; it is a company that's growing. Another company that's growing is Infinidat. These are traditional on-prem storage models (they'd bristle if I say traditional; they're next-gen, if you will), but they don't own a cloud, so they're selling to the data center.
Now Infinidat is focused on petabyte scale, and as they say, they're growing revenues and having success consolidating storage, that thing I just talked about. Ironically, these are two companies with Israeli founders that are growing while, as you saw earlier, the market shifts share; the market is not growing overall. Part of that's COVID, but if you exclude cloud, the market is under pressure. Now these two companies are kind of the exception to the rule here. They're tiny in the grand scheme of things, they're really not going to shift the market, and their end game is to get acquired. They can steal some share, but they're not going to reverse these trends.

And everyone on this chart, every on-prem player, has to have a cloud strategy: one where they connect into the cloud, take advantage of native cloud services, and help extend their respective install bases into the cloud, including having a capability that is physically proximate to the cloud, with a colo like an Equinix or some other approach. For example, at re:Invent we saw AWS' hybrid strategy evolving. AWS is trying to bring AWS to the edge, and they treat the data center as just another edge node, so Outposts, smaller versions of Outposts, and things like Local Zones are all part of bringing AWS to the edge. And we saw a few companies, Pure, Infinidat and Veeam come to mind, that are connecting to Outposts. We saw that Qumulo was in there; Clumio, Commvault and WekaIO are also in there, and I'm sure I'm missing some, so DM me, email me, yell at me, I'm sorry I forgot you, but you get the point. These companies that are selling on-prem are connecting to the cloud. They're forced to connect to the cloud, much in the same way as they were forced to join the VMware ecosystem, and they're trying to add value and keep moving fast.

So, that's what's going on here. What's the prognosis for storage in the coming year? Well, where have all the good times gone? Look, we would never bet against data, but the days of selling storage controllers that mask the deficiencies of spinning disk, or add embedded hardware functions, or easily picking off a legacy install base with flash, well, those days are gone. Repatriation? It ain't happening, except maybe in tiny little pockets. CIOs are rationalizing their on-premises portfolios so they can invest in the cloud, AI, machine learning, machine intelligence and automation, and they're re-skilling their teams. Low-latency, high-bandwidth workloads with minimal jitter, that's the sweet spot for on-prem; it's becoming the mainframe of storage.

CIOs are also developing cloud-first strategies. Yes, the world is hybrid, but what does that mean to CIOs? It means you're going to have some work in the cloud and some work on-prem; it's hybrid, we've got both. Everything that can go to the cloud will go to the cloud, in our opinion, and everything that can't or shouldn't, won't. Yes, people will make mistakes and they'll "repatriate," but generally that's the trend. And CIOs are building an abstraction layer to connect workloads from an observability and manageability standpoint, so they can maintain control and manage lock-in risk; they have options. Everything that doesn't go to the cloud will likely have some type of hybridicity to it; the reverse won't likely be the case.
For vendors, a cloud strategy involves supporting your install base's migration to the cloud. That's where they're going, that's where they want to go, and they want your help; there's business to be made there. So enable low-latency hybrid, accommodate subscription models (that's a whole other topic, but it's the trend we see), rethink the business that you're in, for instance data management, and develop an edge strategy that recognizes that edge workloads are going to require new architectures, more efficient than what we've seen built around general-purpose systems, and wow, that's a topic for another day. You're seeing this whole as-a-service model really reshape the culture and the way in which the on-prem vendors operate. No longer is it about selling a box with dramatically marked-up controllers and disk drives; it's about services that can be invoked in the cloud.

Now remember, these episodes are all available as podcasts wherever you listen; just search "Breaking Analysis podcast" and please subscribe, I'd appreciate that. Check out etr.plus for all the survey action. We also publish a full report every week on wikibon.com and siliconangle.com. A lot of ways to get in touch: you can email me at david.vellante@siliconangle.com, you can DM me @dvellante on Twitter, or comment on our LinkedIn posts; I always appreciate that. This is Dave Vellante for theCUBE Insights Powered by ETR. Thanks for watching everyone, stay safe, and we'll see you next time. (upbeat music)

Published Date: Dec 12 2020

Wikibon Action Item | De-risking Digital Business | March 2018


 

>> Hi, I'm Peter Burris. Welcome to another Wikibon Action Item. (upbeat music) We're once again broadcasting from theCUBE's beautiful Palo Alto, California studio. I'm joined here in the studio by George Gilbert and David Floyer, and then remotely we have Jim Kobielus, David Vellante, Neil Raden and Ralph Finos. Hi guys.

>> Hey.

>> Hi.

>> How you all doing?

>> This is a great, great group of people to talk about the topic we're going to talk about, guys: the notion of de-risking digital business. Now, the reason this becomes interesting is that the Wikibon perspective for quite some time has been that the difference between business and digital business is the role that data assets play in a digital business. Think about what that means: every business institutionalizes its work around what it regards as its most important assets. A bottling company, for example, organizes around the bottling plant. A financial services company organizes around the regulatory impacts or limitations on how it shares information, and what is regarded as fair use of data and other resources and assets. The same thing exists in a digital business. There's a difference between, say, Sears and Walmart: Walmart made use of data differently than Sears, and the specific assets that were employed had a significant impact on how the retail business was structured. Along comes Amazon, which goes even deeper in the use of data as a basis for how it conducts its business, and Amazon is institutionalizing work in quite different ways and has been incredibly successful. We could go on and on with a number of different examples of this, and we'll get into that. But what it means ultimately is that the tie between data and what is regarded as valuable in the business is becoming increasingly clear, even if it's not perfect. And so traditional approaches to de-risking data, through backup and restore, now need to be rethought, so that it's not just de-risking the data, it's de-risking the data assets. And since those data assets are so central to the business operations of many of these digital businesses, it's a question of what it means to de-risk the whole business. So, David Vellante, give us a starting point. How should folks think about this different approach to envisioning business, digital business, and the notion of risk?

>> Okay, thanks Peter. I agree with a lot of what you just said and I want to pick up on that. I see the future of digital business as really built around data, sort of agreeing with you and building on what you just said, with organizations putting data at the core. And increasingly I believe that organizations that have traditionally relied on human expertise as the primary differentiator will be disrupted by companies where data is the fundamental value driver, and I think there are some examples of that and I'm sure we'll talk about them. In this new world, humans have expertise that leverages the organization's data model and creates value from that data with augmented machine intelligence; I'm not crazy about the term artificial intelligence. And you hear a lot about data-driven companies, and I think such companies are going to have a technology foundation that is increasingly described as autonomous, aware, anticipatory and, importantly in the context of today's discussion, self-healing, so able to withstand failures and recover very quickly.
So de-risking a digital business is going to require new ways of thinking about data protection, security and privacy. Specifically as it relates to data protection, I think it's going to be a fundamental component of the so-called data-driven company's technology fabric. It can be designed into applications, into data stores, into file systems, into middleware and into infrastructure, as code. And many technology companies are going to try to attack this problem from a lot of different angles, trying to infuse machine intelligence into the hardware, software and automated processes. The premise is that many companies will architect their technology foundations not as a set of remote cloud services that they're calling, but rather as a ubiquitous set of functional capabilities that largely mimic a range of human activities, including storing, backing up, and virtually instantaneous recovery from failure.

>> So let me build on that. What you're kind of saying, if I can summarize, and we'll get into whether or not it's human expertise or some other approach or notion of business, is that increasingly, patterns in the data are going to have absolutely consequential impacts on how a business ultimately behaves. Do we have that right?

>> Yeah, absolutely. And how you construct that data model, and provide access to the data model, is going to be a fundamental determinant of success.

>> Neil Raden, does that mean that people are no longer important?

>> Well no, I wouldn't say that at all. I was talking with the head of a medical school a couple of weeks ago, and he said something that really resonated: there are as many doctors who graduated at the bottom of their class as at the top of their class. And I think that's true of organizations too. You know, 20 years ago I had the privilege of interviewing Peter Drucker for an hour, and he foresaw this 20 years ago. He said that people who run companies have traditionally had IT departments that provided operational data, but they needed to start to figure out how to get value from that data, and not only from internal data but from data outside the company as well. So he kind of saw this big data thing happening 20 years ago. Unfortunately, he had a prejudice for senior executives; he never really thought about any other people in an organization except the highest people, and I think what we're talking about here is really the whole organization. I do have some concerns about the ability of organizations to really implement this without a lot of fumbles. It's fine to talk about the five digital giants, but there are a lot of companies out there where, you know, the bar isn't really that high for them to stay in business, and they just seem to get along. And I think if we're going to de-risk, we really need to help companies understand the whole process of transformation, not just the technology.

>> Well, take us through it. What is this process of transformation, which includes the role of technology but is bigger than the role of technology?

>> Well, it's like anything else, right? There has to be communication, there has to be some element of control, there has to be a lot of flexibility, and most importantly, I think there has to be acceptance, by the people who are going to be affected, that it is the right thing to do.
And I would say you start with assumptions. I call it assumption analysis: in other words, let's all get together and figure out what our assumptions are, and see if we can't line 'em up. Typically IT is not good at this, so I think it's going to require the help of a lot of practitioners who can guide them.

>> So Dave Vellante, reconcile one point that you made. I want to come back to this notion of how we're moving from businesses built on expertise in people to businesses built on expertise resident as patterns in the data, or data models. Why is it that the most valuable companies in the world seem to be the ones that have the most real, hardcore data scientists? Isn't that expertise and people?

>> Yeah it is, and I think it's worth pointing out. Look, the stock market is volatile, but right now the top five companies in terms of market cap, Apple, Amazon, Google, Facebook and Microsoft, account for about $3.5 trillion, and there's a big distance between them and the rest; they've clearly surpassed the big banks and the oil companies. Now again, that could change, but I believe it's because they are so-called data-driven. Does that mean they don't need humans? No, but their human expertise surrounds the data, whereas at most companies human expertise is at the center and the data lives in silos, and I think it's very hard to protect, and leverage, data that lives in silos.

>> Yes, so here's where I'll take exception to that, Dave, and I want to get everybody to build on top of this just very quickly. I think that human expertise has surrounded, in other businesses, the buildings. Or the bottling plant. Or the wealth management. Or the platoon. So I think that the organization of assets has always been the determining factor in how a business behaves, and we institutionalized work, in other words where we put people, based on the business' understanding of assets. Do you disagree with that? Are we wrong in that regard? I think data scientists are an example of reinstitutionalizing work around a very core asset, in this case data.

>> Yeah, you're saying that the most valuable asset is shifting from some of those physical assets, the bottling plant et cetera, to data.

>> Yeah we are, we are. Absolutely. Alright, David Floyer.

>> Neil: I'd like to come in.

>> Panelist: I agree with that too.

>> Okay, go ahead Neil.

>> I'd like to give an example from the news: Cigna's acquisition of Express Scripts for $67 billion. Who the hell is Cigna, right? Connecticut General was just a sleepy life insurance company, and INA was a second-tier property and casualty company. They merged a long time ago and got into health insurance, and suddenly, who's Express Scripts? That's a company nobody ever even heard of. They're a pharmacy benefit manager, what is that? They're an information management company, period. That's all they do.

>> David Floyer, what does this mean from a technology standpoint?

>> So I wanted to emphasize one thing that evolution has always taught us: you have to be able to come from where you are. You have to be able to evolve from where you are and take the assets that you have. And the assets that people have are their current systems of record, and other things like that. They must be able to evolve into the future to better utilize what those systems are. And the other thing I would like to say--

>> Let me give you an example just to interrupt you, because this is a very important point.
One of the primary reasons why the telecommunications companies, whom so many people, so many analysts, believed had this fundamental advantage, because so much information is flowing through them, haven't capitalized on it is this: when you're writing assets off over 30 years, that kind of locks you into an operational mode, doesn't it?

>> Exactly. And the other thing I want to emphasize is that the most important thing is the sources of data, not the data itself. So, for example, real-time data is very, very important. What is the source of your real-time data? If you've given that away to Google or your IoT vendor, you have made a fundamental strategic mistake. So understanding the sources of data, and making sure that you have access to that data, is going to enable you to build those sorts of processes and data digitalization.

>> So let's turn that concept into kind of a Geoffrey Moore-style strategy bromide. At the end of the day you look at your value proposition, and then ask what activities are central to that value proposition, what data is thrown off by those activities, and what data is required by those activities.

>> Right, both internal--

>> We got that right?

>> Yeah. Both internal and external data. What are those sources that you require? Yes, that's exactly right. And then you need to put together a plan that takes you from where you are, with the sources of data you have, and then focuses on how you can use that data to either improve revenue or reduce costs, or a combination of those two things, as a series of specific exercises. And in particular, using that data to automate in real-time as much as possible. That to me is the fundamental requirement to actually be able to do this and make money from it. If you look at every example, it's all real-time: it's real-time bidding at Google, it's real-time allocation of resources by Uber. That is where people need to focus.

>> You mention Uber. David Vellante, we're not just talking, once again, about the Uberization of things, are we? Or is that what we mean here? So, what we'll do is turn the conversation very quickly over to you, George. There exist today a number of different domains where we're starting to see a new emphasis on how we price some of this risk. When we think about de-risking as it relates to data, give us an example.

>> Well, as we were discussing earlier, in financial services risk itself is priced just the way time is priced, in terms of what premium you'll pay in terms of interest rates. But there's also something softer that's come into much more widely held consciousness recently, which is reputational risk. That's different from operational risk. Reputational risk is about: are you a trusted steward of data? Some of that could be personal information, and a use case that's very prominent now with the European GDPR regulation is, you know, if I ask you as a consumer or an individual to erase my data, can you say with extreme confidence that you have? That's just one example.

>> Well, I'll give you a specific number on that. We've mentioned it here on Action Item before. I had a conversation with a Chief Privacy Officer a few months ago who told me that they had priced out what the fines to Equifax would have been had the problem occurred after GDPR fines were enacted. The estimate was $160 billion.
There are not a lot of companies on the planet that could deal with a $160 billion liability, just like that.

>> Okay, so we have a price now for something that might have been kind of, sort of, mushy before. And the notion of trust hasn't really changed over time; what's changed is the technical implementations that support it. In the old world of systems of record, we basically collected from our operational applications as much data as we could, put it in the data warehouse and its data mart satellites, and tried to govern it within that perimeter. But now we know that data originates and goes just about anywhere. There's no well-defined perimeter; it's much more porous, far more distributed. You might think of it as a distributed data fabric, and the only way you can be a trusted steward of that is if, across the silos, without trying to centralize all the data that's in them, you can enforce who's allowed to access it, what they're allowed to do, and audit who's done what to what type of data, when and where. And there are a variety of approaches. Just to pick two: one is discovery-oriented, figuring out what's going on with the data estate using machine learning; Alation is an example. And then there's another, which is where you try to get everyone to plug into what's essentially a new system catalog, one that acts like the fabric for your data fabric.
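To make George's enforcement-and-audit point concrete, here is a minimal sketch; the roles, data classifications and policy are hypothetical illustrations, not any vendor's API.

```python
# Illustrative sketch of the two governance primitives described above:
# enforce who may access which class of data, and audit every attempt.
import json, time

# Hypothetical policy: role -> data classifications that role may read.
POLICY = {
    "analyst": {"public", "internal"},
    "data_scientist": {"public", "internal", "pii_masked"},
}

AUDIT_LOG = []  # in practice, an append-only, tamper-evident store

def read_dataset(user, role, dataset, classification):
    allowed = classification in POLICY.get(role, set())
    # Audit who did what, to what type of data, when, and whether it succeeded.
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(), "user": user, "role": role, "dataset": dataset,
        "classification": classification, "action": "read", "allowed": allowed,
    }))
    if not allowed:
        raise PermissionError(f"{user} ({role}) may not read {classification} data")
    return f"<contents of {dataset}>"  # placeholder for the actual read

read_dataset("mary", "data_scientist", "claims_2018", "pii_masked")  # allowed, logged
```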
>> That's an example of one of the properties of coming at this. But Dave Vellante, coming back to you for a second: when we think about this conversation, there's been a lot of presumption, a lot of bromide. Analysts like to say, don't get Uberized. But we're not just talking about getting Uberized; we're talking about something a little bit different, aren't we?

>> Well yeah, absolutely. I think Uber's going to get Uberized, personally. But there's a lot of evidence here. I mentioned the big five, but look at Spotify, Waze, Airbnb, yes Uber, yes Twitter, Netflix, Bitcoin is an example, 23andMe. These are all examples of companies that, going back to what I said before, are putting data at the core and building human expertise around that core to leverage it. And it's easy for some companies to sit back and say, "Well, I'm going to wait and see what happens." But to me, there's a big gap between the haves and the have-nots, and that gap is around applying machine intelligence to data and applying cloud economics: zero marginal cost economics, the API economy, an always-on sort of mentality, et cetera, et cetera. That's what the economy, in my view anyway, is going to look like in the future.

>> So let me put out a challenge; Jim, I'm going to come to you in a second, very quickly, on some of the things that start looking like data assets. Today, when we talk about data protection, we're talking simply about a whole bunch of applications and a whole bunch of devices spinning that data off so we have it at a third site, and then, if there's a catastrophe, large or small, being able to restore it, often in hours or days. So we're talking about an improvement on RPO and RTO. But when we talk about data assets, and I'm going to come to you in a second with that, David Floyer, we're talking about not only the data, the bits; we're talking about the relationships, the organization and the metadata as key elements of that. So David, I'm sorry, Jim Kobielus, just really quickly, thirty seconds: models. What do they look like? What does the new nature of some of these assets look like?

>> Well, the new nature of these assets is the machine learning models that are driving so many business processes right now. So really the core assets there are the data, obviously, from which they're developed and on which they're trained, but also very much the knowledge of the data scientists and engineers who build and tune this stuff. And so really, what you need to do is protect that knowledge and grow that knowledge base of data science professionals in your organization in a way that builds on it, and hopefully you keep the smartest people in house. And they can encode more of their knowledge in automated programs to manage the entire pipeline of development.

>> We're not talking about files. We're not even talking about databases, are we, David Floyer? We're talking about something different. Algorithms and models: are today's technologies really set up to do a good job of protecting the full organization of those data assets?

>> I would say that they're not even being thought about yet. And going back to what Jim was saying, those data scientists are the only people who understand it, in the same way as, in the year 2000, the COBOL programmers were the only people who understood what was going on inside those applications. We as an industry have to allow organizations to protect the assets inside their applications, and to use AI, if you like, to actually understand what is in those applications and how they're working. An incredibly important part of de-risking is ensuring that you're not dependent on a few experts who could leave at any moment, in the same way as the COBOL programmers could have left.

>> But it's not just the data, and it's not just the metadata; it really is the data structure.

>> It is the model: the whole way that this has been put together, and the reason why, and the ability to continue to upgrade and change that over time. So those assets are incredibly important, but at the moment there isn't technology available for you to actually protect those assets.
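One way to picture what "protecting the model asset" would even mean: persist the lineage and structure alongside the bits. A toy sketch; the manifest fields are assumptions for illustration, not a standard schema.

```python
# Illustrative sketch: a model "asset" is more than its serialized weights.
# Packaging lineage and structure with the artifact is what makes it protectable.
import hashlib, json, pickle, time

def package_model_asset(model, training_data: bytes, feature_schema, author):
    artifact = pickle.dumps(model)
    manifest = {
        "created": time.time(),
        "author": author,                        # the human expertise behind it
        "feature_schema": feature_schema,        # the structure the model expects
        "training_data_sha256": hashlib.sha256(training_data).hexdigest(),  # lineage
        "artifact_sha256": hashlib.sha256(artifact).hexdigest(),            # integrity
    }
    return artifact, json.dumps(manifest, indent=2)

# Backing up `artifact` alone protects the bits; artifact plus manifest protects
# the asset: its provenance, relationships and expected inputs.
artifact, manifest = package_model_asset(
    {"weights": [0.1, 0.2]}, b"training rows...", ["age", "claim_amount"], "jane")
print(manifest)
```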
>> So if I combine what you just said with what Neil Raden was talking about: David Vellante's put forward a good vision of what's required, and Neil Raden's made the observation that this is going to be much more than technology. There's a lot of change, not change management at a low level inside IT, but business change, and the technology companies also have to step up and support this. We're seeing a number of different vendor types start to enter this space: certainly storage guys, Dylon Sears talking about doing a better job of data protection; middleware companies, TIBCO and DISCO, talking about doing this differently; file systems, Scality and WekaIO, talking about doing this differently; backup and restore companies, Veeam and Veritas. Everybody's looking at this and they're all coming at it. Just really quickly, David, where's the inside track at this point?

>> For me there is so much whitespace as to be unbelievable.

>> So nobody has an inside track yet.

>> Nobody has an inside track. Just to start with a few things: it's clear that you should keep data where it is. The cost of moving data around an organization, from inside to out, is crazy.

>> So companies that keep data in place, or technologies that keep data in place, are going to have an advantage.

>> A much, much greater advantage. Sure, there must be backups somewhere, but you need to keep the working copies of data where they are, because it's the real-time access that's usually important. So if it originates in the cloud, keep it in the cloud. If it originates with a data provider on another cloud, that's where you should keep it. If it originates on your premises, keep it where it originated.

>> Unless you need to combine it. But that's a new origination point.

>> Then you're taking subsets of that data and combining them separately. So that would be my first point. Second, organizations are going to need to put together what George was talking about: this metadata of all the data, how it interconnects, how it's being used, the flow of data through the organization. It's amazing to me that when you go to an IT shop, they cannot define for you how the data flows through that data center or that organization. That's the requirement you have to have, and AI is going to be part of the solution: looking at all of the applications and the data, and telling you where it's going and how it's working together.
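The flow map David describes can be as simple as a directed graph over systems. A toy sketch, where the systems and edges are made-up examples:

```python
# Illustrative sketch of a data-flow (lineage) map: which systems feed which,
# and what sits downstream of any given source.
from collections import defaultdict

flows = defaultdict(list)  # source system -> downstream consumers
for src, dst in [
    ("pos_transactions", "sales_warehouse"),
    ("clickstream", "demand_forecast_model"),
    ("sales_warehouse", "demand_forecast_model"),
    ("demand_forecast_model", "replenishment_service"),
]:
    flows[src].append(dst)

def downstream(node, seen=None):
    """Everything affected if this source is corrupted, lost, or changed."""
    seen = set() if seen is None else seen
    for nxt in flows.get(node, []):
        if nxt not in seen:
            seen.add(nxt)
            downstream(nxt, seen)
    return seen

# The blast radius of the point-of-sale feed:
print(downstream("pos_transactions"))
# {'sales_warehouse', 'demand_forecast_model', 'replenishment_service'}
```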
>> So the second thing would be that companies able to build, or conceive of, networks as data will also have an advantage. And I'd add a third one: companies that demonstrate a real understanding of the unbelievable change that's required. You can't just say, oh, Facebook wants this, therefore everybody's going to want it. There's going to be a lot of push marketing that goes on from the technology side. Alright, so let's get to some Action Items. David Vellante, I'll start with you. Action Item.

>> Well, the future's going to be one where systems see, they talk, they sense, they recognize, they control, they optimize. It may be tempting to say, you know what, I'm going to wait, I'm going to sit back and figure out how I'm going to close that machine intelligence gap later. I think that's a mistake. I think you have to start now, and you have to start with your data model.

>> George Gilbert, Action Item.

>> I think you have to keep in mind the guardrails related to governance and trust when you're building applications on the new data fabric. And you can take a platform-oriented approach, where you're plugging into an API, like the Apache Atlas work that Hortonworks is driving, or a discovery-oriented one, as David was talking about, which would be something like Alation, using machine learning. But if, let's say, the use case starts out as IoT edge analytics and cloud inferencing, that data science pipeline itself has to now be part of this fabric, including the output of design time, meaning the models themselves, so they can be managed.

>> Excellent. Jim Kobielus, you've been pretty quiet, but I know you've got a lot to offer. Action Item, Jim.

>> I'll be very brief. What you need to do is protect your data science knowledge base. That's the way to de-risk this entire process, and it involves more than just a data catalog. You need a data science expertise registry within your distributed value chain, and you need to manage that as a very human asset that needs to grow. That is your number one asset going forward.

>> Ralph Finos, you've also been pretty quiet. Action Item, Ralph.

>> Yeah, I think you've got to be careful about what you're trying to get done. It depends on your industry: whether it's finance or whether it's the entertainment business, there are different requirements about data in those different environments. You need to be cautious about that, and you need leadership on the executive, business side of things. The last thing in the world you want to do is depend on data scientists to figure this stuff out.

>> And I'll give you the second-to-last Action Item. Neil Raden, Action Item.

>> I think there's been a lot of progress lately in creating tools for data scientists to be more efficient, and they need to be, because the big digital giants are draining them from other companies. So that's very encouraging. But in general, I think becoming a data-driven, digitally transformed company is, for most companies, a big job, and they need to do it in piece parts, because if they try to do it all at once they're going to be in trouble.

>> Alright, so that's been a great conversation, guys. Oh, David Floyer, Action Item. David's looking at me saying, ah, what about me? David Floyer, Action Item.

>> (laughing) So my Action Item comes from an Irish proverb: if you ask for directions, they will always answer you, "I wouldn't start from here." So my Action Item is this: if somebody is coming in saying you have to redo all of your applications, rewrite them from scratch, and start in a completely different direction, that is going to be a 20-year job and you're never going to get it done. You have to start from what you have, the digital assets that you have, and focus on improving those with additional applications and additional data, using that as the foundation for how you build the business, with a clear long-term view. If you look at some of the examples given earlier, particularly in the insurance industry, that's what they did.

>> Thank you very much guys. So, let's do an overall Action Item. We've been talking today about the challenges of de-risking digital business, which ties directly to an understanding of the role data assets play in businesses, and to technology's ability to move from just protecting and restoring data to actually restoring the relationships in the data, the structures of the data and, very importantly, the models that are resident in the data. This is going to be a significant journey, and there's clear evidence that it's driving a new valuation within the business. Folks talk about data as the new oil. We don't necessarily see things that way, because data, quite frankly, is a very, very different kind of asset: it can be shared, because it doesn't suffer the same limits of scarcity. So as a consequence, what has to happen is that you start with where you are. What is your current value proposition, and what data do you have in support of that value proposition? Then whiteboard it, clean-slate it, and ask: what data would we like to have in support of the activities we perform? Figure out what those gaps are, and find ways to get access to that data through piecemeal, piece-part investments that provide a roadmap of priorities looking forward. Out of that will come a better understanding of the fundamental data assets that are being created: new models of how you engage customers, new models of how operations works on the shop floor, new models of how financial services are employed and utilized.
And use that as the basis for starting to put forward plans for bringing technologies in that are capable of not just supporting and protecting the data, but protecting the overall organization of the data, in the form of these models and these relationships, so that as the business creates and throws off these new assets, it treats them as the special resource that the business requires. Once that's in place, we'll start seeing businesses more successfully reorganize and reinstitutionalize work around data, and it won't just be the big technology companies, the ones people call digital natives, that are well down this path. I want to thank George Gilbert and David Floyer, here in the studio with me, and David Vellante, Ralph Finos, Neil Raden and Jim Kobielus on the phone. Thanks very much guys. Great conversation. And that's been another Wikibon Action Item. (upbeat music)

Published Date: Mar 16 2018
