Fred Moore, Horison Information Strategies | CUBE Conversation, August 2020

>> Introducer: From the CUBE studios in Palo Alto and in Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.
>> Hi, everybody, this is Dave Vellante. Welcome to this special CUBE Conversation. I'm really excited to invite in my mentor and friend; we go way back. Fred Moore is here. He's the president of Horison Information Strategies. We're going to talk about managing data in the zettabyte era. Fred, I think when we first met, we were talking about the megabyte era.
>> Right, exactly. I think back then we had, you know, maybe 10 bytes in our telephone and one in the wristwatch, but now you can put a whole data center on a single cartridge of tape and take off. Things have really changed.
>> It's pretty amazing. And of course, for those who don't know Fred, he was the first systems engineer at StorageTek. And, as I said, he's somebody who taught me a lot in my early days. Of course, he's very famous for the phrase that everybody uses today: backup is one thing, recovery is everything. And Fred just wrote this fantastic paper. He's done this year after year after year. He's just dug in. He's a clear thinker, a strategic planner with a technical bent and a business bent. You're like one of those five-tool baseball players, Fred. But tell me about this paper. Why did you write it?
>> Well, the reason I wrote it is there's been so much focus in the last year or so on the archive component of the storage hierarchy. And what's happening is we're generating data a lot faster than we're analyzing it, so it's piling up unanalyzed and sitting basically untapped for years at a time. That has posed a big challenge for people. The other thing that got me deeper into this last year was the hyperscale market. Those people are so big in terms of footprint and infrastructure that they can no longer keep everything on disk. It's just not economically possible.
The energy consumption per disk, the infrastructure costs, the frequency of taking a disk out every three, four, or five years just for replacement, have made it very difficult to do that. So the hyperscalers have gone to tape in a big way, and that's where most of the tape business is going to wind up in the future: in these hyperscale businesses.
>> Right.
>> We know tape doesn't exist in the home. It doesn't exist in a small data center. It's only a large-scale data center technology. But that whole cosmos led me into the archive space and to the need for a new archive technology beyond tape.
>> So I want to set up the premise here. I'm just going to pull this out of your paper. It says 60% of all data is archival and could reach 80% or more by 2024, making archival data by far the largest storage class. And given this trajectory, the traditional storage hierarchy paradigm is going to need to disrupt itself, and quickly. We're going to talk about that. That really is the premise of your paper here, isn't it?
>> It is. You know, doing all this with traditional technologies is going to get very painful for a variety of reasons. So the stage is set for a new tier and a new technology to appear in the next five years. Fortunately, I'm actually working with somebody who is after this in a big way, and in a different way from what you and I know. So I think there is some hope here that we can redefine the hierarchy and really add a new tier down at the bottom. You can see it emerging in that picture: the deep archive tier. It's beginning to show up now, and it's, you know, infinite storage. If you look at major league sports, the World Series and the Super Bowl, that data will never be deleted. It'll be here forever, and it'll be used periodically based on circumstances.
>> Yeah, well, we've got that pyramid chart up here. I mean, you invented this chart, essentially. At least you were the first person who ever showed it to me.
I honestly think you first created this concept, where you had a high-performance tier with a high cost per bit and then an archive tier. Maybe it wasn't this granular back in the '70s and '80s, but it's constantly been changing with different media types and different use cases.
>> You know, you're right. And you know this, because when StorageTek introduced the Nearline architecture, which sat in between online and offline storage, we called it Nearline and trademarked that term. That was the tape library concept: moving data from offline status to online status with a robotic library. So that brought up the third tier: online, nearline, and offline. But you're right, this pyramid has evolved and morphed into several things, and I keep it alive. Somebody said I'll have a pyramid on my tombstone instead of my name when I go. (both chuckle) But it's really the heart and soul of the infrastructure for data. Out of it comes all the management and security, the deletion, the immutable storage concepts; the whole thing starts here. It's like your house: you've got to have a foundation, and then you can build everything on top of it.
>> Well, as you pointed out in your paper, and a minute ago, it always comes down to economics. So I want to bring up the 10-year expected total cost of ownership, the TCO, for the three levels: you've got all disk, all cloud, and LTO, broken down by the different aspects of the cost. The purple is always the biggest piece of cost; it's the labor costs. But of course, in the cloud you've got the big media cost, because they've done so much automation. I wonder if you could take us through this slide. What are the key takeaways there?
>> Well, you know, the thing that hurts with all these technologies, as you can see up on top there, is the staff and personnel.
So the fewer people you need to manage data, the better off you are. Labor is pretty high for disk, because there are a lot of things to do on disk: managing it, provisioning it, all the things you and I had to deal with years ago. A lot of this stuff is just labor intensive. The further down the pyramid you go, the less labor-intensive the storage gets, and that helps; you also get a lower cost for energy and a lower cost of ownership. TCO is kind of taking on a new meaning. In some regards I hate to put up a TCO chart, because it's all based on what your input variables are, and you can arrive at something different. But we've tried to normalize all kinds of pricing. And the cloud is a big question for most people, as to how it stacks up. If you don't ever touch the data in the cloud, the price comes way down. If you want to start moving data in and out of the cloud, you're going to have to ante up in a big way. But we're going to see dollar-a-terabyte storage prices down at the bottom of this pyramid in the next five years. Today you can get down to four or five dollars a terabyte with drives, media, and libraries on tape; disk and flash are certainly higher than that. But we're going to have the race to a dollar a terabyte, total TCO cost, by 2025.
>> So when Amazon first announced Glacier, everybody said, okay, what is that? Is that tape? Is that spun-down disk? Because it took a while to get the data back. But you're seeing tape technology, as you said, really move into the hyperscale space, and that's going to accommodate this massive lower part of the pyramid, isn't it?
>> Exactly. Yeah. And we don't have a spin-down disk solution today. I was actually on the board of a company that started that, called Copan, years ago, right up here near Boulder.
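Fred's point that a TCO chart is only as good as its input variables can be made concrete with a small model. The sketch below is a minimal illustration under stated assumptions, not the model behind the paper's chart: it charges one hardware purchase per refresh cycle (the three-to-five-year disk replacement he mentions), adds yearly labor, energy, and other costs over a 10-year horizon, and divides by capacity. The class, its field names, and the example inputs are all hypothetical placeholders.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StorageOption:
    """Hypothetical cost inputs for one storage option (money in dollars)."""
    name: str
    capacity_tb: float         # usable capacity in terabytes
    purchase: float            # hardware and media cost for one refresh cycle
    refresh_years: int         # years before the hardware must be replaced
    annual_labor: float        # staff and personnel cost per year
    annual_energy: float       # power and cooling cost per year
    annual_other: float = 0.0  # maintenance, cloud fees, egress, and so on


def tco_per_tb(opt: StorageOption, horizon_years: int = 10) -> float:
    """Total cost of ownership over the horizon, divided by capacity ($/TB)."""
    # One purchase at year 0, plus one at each refresh boundary that
    # falls strictly inside the horizon.
    purchases = math.ceil(horizon_years / opt.refresh_years)
    running = horizon_years * (opt.annual_labor + opt.annual_energy + opt.annual_other)
    return (purchases * opt.purchase + running) / opt.capacity_tb


if __name__ == "__main__":
    # Placeholder inputs chosen only to show the mechanics; not data from the paper.
    base = StorageOption("example", capacity_tb=1000, purchase=20_000,
                         refresh_years=5, annual_labor=3_000, annual_energy=500)
    longer_life = replace(base, refresh_years=10)
    print(f"5-year refresh:  ${tco_per_tb(base):.2f}/TB")
    print(f"10-year refresh: ${tco_per_tb(longer_life):.2f}/TB")
```

With these placeholder inputs, stretching the refresh cycle from five to ten years removes one purchase and drops the figure from $75 to $55 per terabyte, while labor recurs in every year of the horizon; that is the mechanism behind his point that fewer people and less replacement mean a lower TCO.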
>> (inaudible) (both chuckle)
>> You're absolutely right. A few other people tried it too, but spin-down disk never made it. You can spin a disk up and down on your desktop computer, but doing that in a data center on a Fibre Channel drive never made it. So we don't have spin-down disk to do that. The archive space is dominated by very high-capacity disk and then tape. And most of the archive data in the world today unfortunately sits on disk, spinning 24/7, 365, and not touched much. That's a bad economic move, but customers just found it easier to handle that way than going back to tape. So we've got a lot of data stored in the wrong place from a total economics point of view.
>> But are the hyperscalers solving this problem through automation, or not? You referenced storage tiering, really trying to take the labor cost out. How are they doing? Are they doing a good job?
>> They've done really well taking the labor costs down. They have optimized every screw, nut, and bolt in the 42U rack that you could imagine to make it as clean as possible. So they've done a whole lot to bring that cost down, but there's still the magnitude of these data centers: we're going to finish the year 2020 with about 570 hyperscale data centers around the world. Each one of these things is 350,000 to 400,000 square feet and up of raised-floor space. And the economics just don't allow you to keep putting inactive data on spinning disk. We don't have spin-down disk, so it's tape. I feel like the only guy in the industry who says this sometimes, but tape has had a renaissance that people don't appreciate in terms of reliability and throughput. Tape is three orders of magnitude more reliable than disk right now, and most people don't know this. So tape's viable, and the hyperscalers see that.
I read that one hyperscaler bought over a million pieces of LTO tape last year alone, just to handle this, to be the pressure valve that takes all of this inactive stuff off the gigantic disk farms they have.
>> Well, let's talk about that a little bit. Just to keep it simple, you've got flash, disk, and tape. It feels like disk is getting squeezed. We know what flash has done in terms of eating into disk. And you see it in the storage market generally; it's soft right now. I've posited that a lot of that is the headroom data centers have with flash: they don't have to buy spindles for performance reasons anymore. Only Pure is showing consistent growth, IBM's up a little bit because of mainframe, and you've got Dell popping back and forth, but generally speaking, the primary storage market is not a great place to be right now. All the action is in secondary storage and data protection. So disk is going to get squeezed. And you mentioned tape; you said you're the only person talking about it. You said in your paper that it's sequential, so time to first byte is sometimes problematic, but you can front-end tape with cache. You can use algorithms and smart scans to really address that problem and dramatically lower the cost. Plus you could do things like, you tell me, Fred, you're the technologist here, multiple heads, things you can't necessarily do in a hermetically sealed disk drive.
>> (chuckles) You can. And what you just described is called the active archive layer in the pyramid. When you front-end a tape library with a disk array as a cache buffer, you create an active archive, and that data will sit in there three, four, or five days before it gets demoted based on inactivity.
So for repetitive use, you're going to get disk-like performance for tape data, and that's the same caching concept that disk systems had 30 years ago. So that does work, and the active archive has got a lot of momentum right now. Right here near where I live in Boulder, we have the Active Archive Alliance's headquarters, and I get to do their annual report every year. This whole active archive approach is a big way to overcome the time-to-first-byte problem that we've had with tape, and will have for quite a while.
>> In your paper, you talked about some of the use cases and workloads, and you laid out, basically, taking the pyramid and saying, okay, based on the workload, a certain percentage should be up at the top of the pyramid for the high-performance stuff, and of course less for the less demanding traditional workloads, et cetera. And it was striking to see the delta: for the highest-performance workload, I think 70% was up at the top of the pyramid, versus the last use case. So when you're talking about what it costs to store a zettabyte, it's 108 million at the high end versus about 11 or 12 million, a huge delta, 10X, between the top and the bottom, based on those allocations by workload.
>> Yeah, I tried to get at the value of tiered storage based on the individual workload in your business. So I looked at five different workloads. The top one you referenced, at 108 million, is the HPC market. When I visited a few of the HPC people, DOD agencies in many cases, and threw the pyramid up, the first thing they would say is, "Our pyramid's inverted. (chuckles) Our archive data is about 10%. We're all flash, as much as we can be, and we have a little bit archived. We're in constant simulation and compute mode, producing results like crazy from the data. So we do an I/O, bring in maybe a whole file at a time, and compute for minutes before we come up with an answer." So it's just the reverse. And then I got to look into all the different workloads, talking to people, and that's how we developed these profiles.
>> So let's pull up this future of the storage hierarchy, which again talks to the premise of your paper. Walk us through it: what changes should we be expecting? You've got air gap in here, and I'm going to ask you about remastering and lifespan, but take us through this.
>> Yeah, the traditional chart that you had up in the first figure had four tiers: two disk tiers and solid state at the top, and then the big archive tier, which is kind of everything falling down into tape at this point. But again, tape has some challenges: time to first byte and sequential access. And whether we use tape or disk as an archive, most of the data that's archival is captured as unstructured data. So we don't have tags, we don't have metadata, we don't have indices, and that has led to the movement for object storage to become, maybe in the next five years, the primary format for storing archived data, because it's got all that information inside of it. So now we have a way to search things and get to objects, but in the interim it's hard to find and search out things that are unstructured, and most estimates would say at least 80% of the world's data is unstructured. So archived data is hard to find once you store it; storing is one thing, retrieving it is another. And that's led to the formation of another layer in the storage hierarchy. It's going to be for data that doesn't have to be remastered or converted to a new technology.
Today that happens every three, four, or five years in the case of disk, or every eight, maybe 10 years for a tape drive. Tape media can go 30 years with all the new modern tape media, but unfortunately the underlying drive doesn't go back that far; you can't support that many different versions. So the media actually outlives the drives that can read it. So the stage is set for a new technology to appear down here to deal with these archives. It'll have faster access, it won't need to be remastered every five or 10 years, and it'll have, you know, a 50-year life. And believe me, I've been looking for a long time to find something like this. We have a shot at this now, and I'm actually working with a technology that could pull this off.
>> Well, it's also interesting that you call out the air gap in the chart. We go back to the mainframe days; there's not a lot we haven't seen before, maybe data deduplication, but the adversary has become a lot more sophisticated. So air gaps and ransomware are on everybody's mind today, and you've highlighted three layers of the pyramid that are actually candidates for air gapping.
>> Yeah. The active archive up there, of course, with disk and tape combined; then pure tape; and then this new technology, which can be removable. When you have removability, you create an air gap. Little did we know when you and I met that removability would be important to tape. We thought we were trying to get rid of the Chevy truck access method, and now, without electricity, with a terrorist attack or a pandemic or whatever, the fastest way to move data is to put it on a truck and get it out of town. So removability has got renewed life right now, much to my shock, given where we started.
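The active archive discussed earlier in the conversation, a disk array fronting a tape library with data demoted after a few days of inactivity, is essentially a cache with time-based eviction. Here is a minimal sketch of that idea, assuming plain dictionaries stand in for the disk and tape tiers, timestamps are in seconds, and the five-day window comes from Fred's "three, four, or five days". The `ActiveArchive` class and its methods are hypothetical, not any product's API.

```python
from dataclasses import dataclass, field

SECONDS_PER_DAY = 86_400


@dataclass
class ActiveArchive:
    """Hypothetical disk cache in front of a tape tier, demoting idle data."""
    demote_after_days: float = 5.0
    disk: dict = field(default_factory=dict)  # object id -> (data, last access time)
    tape: dict = field(default_factory=dict)  # object id -> data (authoritative copy)
    tape_recalls: int = 0                     # reads that had to go back to tape

    def write(self, key: str, data: bytes, now: float) -> None:
        # New data lands on disk and is copied to tape right away, so
        # demotion only has to drop the disk copy.
        self.disk[key] = (data, now)
        self.tape[key] = data

    def read(self, key: str, now: float) -> bytes:
        if key in self.disk:
            data, _ = self.disk[key]   # disk hit: disk-like performance
        else:
            data = self.tape[key]      # miss: pay tape's time to first byte
            self.tape_recalls += 1
        self.disk[key] = (data, now)   # promote (or refresh) on every read
        return data

    def demote_inactive(self, now: float) -> list[str]:
        """Drop disk copies idle longer than the window; tape keeps the data."""
        cutoff = now - self.demote_after_days * SECONDS_PER_DAY
        idle = [k for k, (_, last) in self.disk.items() if last < cutoff]
        for k in idle:
            del self.disk[k]
        return idle


if __name__ == "__main__":
    archive = ActiveArchive()
    archive.write("clip-001", b"frames", now=0)
    archive.read("clip-001", now=1 * SECONDS_PER_DAY)        # served from disk
    print(archive.demote_inactive(now=7 * SECONDS_PER_DAY))  # idle six days: demoted
    archive.read("clip-001", now=8 * SECONDS_PER_DAY)        # recalled from tape
    print(archive.tape_recalls)
```

A real active archive would also evict under capacity pressure and run demotion on a schedule; this sketch models only the inactivity rule.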
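The 10X spread between workloads that Dave and Fred discussed comes from how much of the data each workload keeps on each tier of the pyramid. A minimal sketch of that arithmetic, assuming a flat per-terabyte price for each tier; the tier names, prices, and allocations below are hypothetical placeholders, not the inputs behind the 108 million and 11 to 12 million figures in the paper.

```python
def blended_cost(capacity_tb: float, allocation: dict[str, float],
                 price_per_tb: dict[str, float]) -> float:
    """Cost of storing capacity_tb when fixed fractions sit on different tiers."""
    if abs(sum(allocation.values()) - 1.0) > 1e-9:
        raise ValueError("tier allocation fractions must sum to 1")
    # Each tier contributes (its share of the capacity) x (its price per TB).
    return sum(capacity_tb * share * price_per_tb[tier]
               for tier, share in allocation.items())


if __name__ == "__main__":
    # Hypothetical per-TB prices and workload profiles, for illustration only.
    prices = {"flash": 100.0, "disk": 20.0, "tape": 5.0}
    top_heavy = {"flash": 0.7, "disk": 0.2, "tape": 0.1}       # HPC-like: pyramid inverted
    bottom_heavy = {"flash": 0.05, "disk": 0.15, "tape": 0.8}  # mostly archival
    high = blended_cost(1_000, top_heavy, prices)
    low = blended_cost(1_000, bottom_heavy, prices)
    print(f"top-heavy: ${high:,.0f}  bottom-heavy: ${low:,.0f}  ratio: {high / low:.1f}x")
```

Changing either the allocation or the price gap between tiers moves the ratio; these placeholder inputs are not meant to reproduce the paper's figures.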
>> You talked about remastering, and you said it's a costly, labor-intensive process that typically migrates previously archived data to new media every five to 10 years. First of all, explain why you have to do that and how data center operators can solve that problem.
>> Yeah. Let's start with where most of the data sits today: on disk. A disk's useful life is four to five years before it either fails or is replaced; that's pretty much common now. So then they have to start replacing these things, and that means you have to read the data off the disk and write it somewhere else: a big data move. And as the years go by, the amount of data to remaster gets bigger and bigger. You can do the math, as you well know: if you want to move 50 petabytes of data, it's going to take several weeks to do that electronically. So this gets to be a really time-consuming effort. Most data centers I've seen will migrate about one-fifth of their disk to a new technology every year, just rolling forward as they go rather than doing the whole thing every five years. So that's the model in the disk world. For tape, the drives stay in there longer. LTO drives could read two generations back from the current one; that's how it had been. They cut that off a year ago, but they'll go back to something like that soon. You can go up to 10 years on a tape drive. The media life, because of barium ferrite media, which is already oxidized, is 30 years or more. The old metal particle media was not oxidized, so it would oxidize and flake; the particles would fall off. People would say, "I've had this in here eight years, and it's kind of flaking when I put it back in." So that didn't work well. But now that we have barium ferrite media, which is all oxidized, media life has skyrocketed.
So the whole trick with tape was to get to something that was pre-oxidized before time could cause it to decay. So remastering happens less often on tape, by two to one or three to one, but still, when you've got petabytes, maybe an exabyte, sitting on tape in the future, that's going to take a long time.
>> Right.
>> So with remastering, you'd love a way to scale capacity without having to keep moving the data to something new every so often.
>> So my last question: you went from a technical role into a strategic planning role, and of course the more technical you are in that role, the better off you're going to be, because you understand the guardrails. You've always had a sort of telescope on the industry, and you close the paper with what's ahead, which is where I want to end here. You talk about some of the technologies that obviously have legs, like 3D NAND and, obviously, magnetic storage. You've got optical in here, but then you've got all these other ones where you even say, don't hold your breath waiting: multilayer photonics, DNA, glass media, holographic storage, quantum storage. We hear a lot about quantum. What should we be thinking about and expecting, as observers, in terms of new technologies that might drive some innovation in the storage business?
>> Well, I've listed the ones in the lab that have any life at all right in this paper, so you can take your pick of what goes on there. Optical disk has not made it in the data center. We've talked about it for 35 years. We invested in it at StorageTek, and it never saw the light of day. Optical disk has remained an entertainment technology throughout the last 35 years, and its bit rate is very low compared to data center technology. So optical would have to take a huge step going forward. We've got a lot of legs left in the solid state business.
That's really active: SSDs, the whole nonvolatile memory space. Disk shipments are probably down 45% in units from their high in 2010. Unbelievable. Disk shipments were 650 million drives a year, and now they're just under 400 million. So flash has taken this stuff away like crazy. Tape should be taking share away from disk too, but the tape industry doesn't do a very effective job of marketing itself. Most people still don't know what's going on with tape. They're still looking at tape in the rearview mirror, as opposed to through the front windshield, where you see all the new things that have happened. They have bad memories of tape in the past: load problems, stretch, edge damage, tape that wouldn't work, tears, or anything like that. It was a problem. That's pretty well gone away now. Modern tape is a whole different ball game, but most people don't know that. Tape is going to have to struggle with access time and its sequential nature. They've done a few things to overcome access time: drives now order requests with an optimizer based on physical position on the tape, which can take out 50% of your access time for multiple requests on a cartridge. The one on here that's got the most promise right now would be a version of multilayer photonic storage, which is, I would say, sort of like optical, but with data center-class characteristics: multilayer recording capability and random access, which tape doesn't have. I would say that's probably the one you'd want to take a look at going forward. The others are highly speculative. We've been talking about DNA since we were kids, and we don't have a DNA product out here yet. Its access time is eight hours; that's probably not going to work for us. That's not your deep archive anymore. That's your time capsule storage.
>> Yeah, right.
>> Lock the earth.
So I think you can see what's here. Chances are it's still going to be the magnetic technologies, tape and disk, and then the solid state NAND stuff.
>> Right.
>> But these are the ones I'm tracking and looking at, and I've worked with a few of the companies that are on this future list. I'd love to see a breakthrough out there, but it's like we've always said about holographic storage: there's been more written about it than has ever been written on it. (both chuckle)
>> Well, the paper's called "Reinventing Archival Storage." You can get it on your website, I presume, Fred: horison.com.
>> Yep, absolutely.
>> Awesome. Fred Moore, great to see you again. Thanks so much for coming on the CUBE.
>> My pleasure, Dave. Thanks a lot. Great job.
>> All right. And thank you for watching, everybody. This is Dave Vellante for the CUBE. We'll see you next time. (upbeat music)
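Fred's "you can do the math" on moving 50 petabytes is a throughput calculation. A minimal sketch, assuming decimal units and a sustained aggregate copy rate that you supply; the 20 GB/s used below is a placeholder, not a figure from the interview, and the `efficiency` factor is a hypothetical knob for verification passes and competing I/O. It also shows the rolling approach he describes, migrating one-fifth of the estate each year.

```python
PB = 10**15                       # decimal petabyte, in bytes
GB = 10**9                        # decimal gigabyte, in bytes
SECONDS_PER_WEEK = 7 * 24 * 3600


def migration_weeks(data_bytes: float, bytes_per_second: float,
                    efficiency: float = 1.0) -> float:
    """Weeks to copy data_bytes at a sustained aggregate rate.

    efficiency in (0, 1] discounts the nominal rate for verification,
    retries, and production I/O competing for the same devices.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return data_bytes / (bytes_per_second * efficiency) / SECONDS_PER_WEEK


def rolling_annual_bytes(total_bytes: float, refresh_years: int) -> float:
    """Bytes to migrate each year if 1/refresh_years of the estate rolls forward annually."""
    return total_bytes / refresh_years


if __name__ == "__main__":
    rate = 20 * GB  # placeholder aggregate throughput, bytes per second
    print(f"50 PB all at once: {migration_weeks(50 * PB, rate):.1f} weeks")
    yearly = rolling_annual_bytes(50 * PB, refresh_years=5)
    print(f"one-fifth per year: {yearly / PB:.0f} PB, "
          f"{migration_weeks(yearly, rate):.1f} weeks")
```

At that assumed rate a one-shot move of 50 PB takes roughly four weeks, in line with his "several weeks"; halving the rate or the efficiency doubles it, while the rolling approach spreads a smaller move across each year.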
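The access-order optimizer Fred mentions reorders a batch of requests for one cartridge by physical position so the drive winds the tape less. Some drives expose a feature along these lines, often called Recommended Access Order (RAO); its internal algorithm is not described in the interview. The sketch below swaps in a deliberately simplified model: it treats the tape as a single linear track (ignoring the serpentine wraps and bands that real drives must account for), measures cost as distance wound, and compares first-come order with a two-direction sweep. The positions are hypothetical fractions of tape length.

```python
def wind_distance(start: float, order: list[float]) -> float:
    """Total longitudinal distance the tape winds to visit positions in order."""
    total, here = 0.0, start
    for pos in order:
        total += abs(pos - here)
        here = pos
    return total


def sweep_order(start: float, positions: list[float]) -> list[float]:
    """Cheapest order on a single linear track: sweep toward one end, then the other."""
    ahead = sorted(p for p in positions if p >= start)                  # toward end of tape
    behind = sorted((p for p in positions if p < start), reverse=True)  # toward beginning
    # On a line, the best full tour goes to one side first and then reverses,
    # so only these two candidates need comparing.
    return min(ahead + behind, behind + ahead,
               key=lambda order: wind_distance(start, order))


if __name__ == "__main__":
    head = 0.5                            # current position, as a fraction of tape length
    requests = [0.9, 0.1, 0.8, 0.2, 0.6]  # arrival (FIFO) order, hypothetical
    fifo = wind_distance(head, requests)
    swept = wind_distance(head, sweep_order(head, requests))
    print(f"FIFO: {fifo:.2f}  sweep: {swept:.2f}  saved: {1 - swept / fifo:.0%}")
```

On real media the savings depend on the wrap layout and the size of the batch; his 50% figure refers to the drive-side optimizer, not to this toy model.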

Published Date: Aug 5, 2020

ENTITIES

Entity | Category | Confidence
Fred Moore | PERSON | 0.99+
Fred | PERSON | 0.99+
Dave Volante | PERSON | 0.99+
Boston | LOCATION | 0.99+
30 years | QUANTITY | 0.99+
Palo Alto | LOCATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
70% | QUANTITY | 0.99+
August 2020 | DATE | 0.99+
2025 | DATE | 0.99+
2010 | DATE | 0.99+
10 year | QUANTITY | 0.99+
60% | QUANTITY | 0.99+
80% | QUANTITY | 0.99+
Boulder | LOCATION | 0.99+
50 year | QUANTITY | 0.99+
10 years | QUANTITY | 0.99+
35 years | QUANTITY | 0.99+
Hyperscale | ORGANIZATION | 0.99+
eight hours | QUANTITY | 0.99+
2024 | DATE | 0.99+
50 petabytes | QUANTITY | 0.99+
last year | DATE | 0.99+
Copay | ORGANIZATION | 0.99+
five years | QUANTITY | 0.99+
Horizon Information Strategies | ORGANIZATION | 0.99+
four | QUANTITY | 0.99+
Active Archive Alliances | ORGANIZATION | 0.99+
12 million | QUANTITY | 0.99+
108 million | QUANTITY | 0.99+
10 bytes | QUANTITY | 0.99+
24 | QUANTITY | 0.99+
two | QUANTITY | 0.99+
eight years | QUANTITY | 0.99+
a year ago | DATE | 0.99+
350 400,000 square feet | QUANTITY | 0.99+
Horison Information Strategies | ORGANIZATION | 0.99+
2020 | DATE | 0.98+
45% | QUANTITY | 0.98+
first | QUANTITY | 0.98+
Dell | ORGANIZATION | 0.98+
50% | QUANTITY | 0.98+
five days | QUANTITY | 0.98+
over a million pieces | QUANTITY | 0.98+
three | QUANTITY | 0.98+
five tool | QUANTITY | 0.98+
first bite | QUANTITY | 0.98+
about 10% | QUANTITY | 0.97+
DOD | ORGANIZATION | 0.97+
First | QUANTITY | 0.97+
years | DATE | 0.97+
10 | QUANTITY | 0.97+
each one | QUANTITY | 0.97+
first person | QUANTITY | 0.96+
65 | QUANTITY | 0.96+
one thing | QUANTITY | 0.96+
30 years ago | DATE | 0.96+
today | DATE | 0.96+
dollar a terabyte | QUANTITY | 0.96+
seven | QUANTITY | 0.96+
four tiers | QUANTITY | 0.95+
one | QUANTITY | 0.95+
five different workloads | QUANTITY | 0.95+
two generations | QUANTITY | 0.95+
Storage Tech | ORGANIZATION | 0.95+
five terabyte | QUANTITY | 0.95+
three levels | QUANTITY | 0.95+
CUBE | ORGANIZATION | 0.95+
pandemic | EVENT | 0.94+
earth | LOCATION | 0.94+
under 400, 35,400 | QUANTITY | 0.94+