George Gagne & Christopher McDermott, Defense POW/MIA Accounting Agency | AWS Public Sector Summit 2019
>> Live from Washington, DC, it's theCUBE, covering AWS Public Sector Summit. Brought to you by Amazon Web Services. >> Welcome back everyone to theCUBE's live coverage of the AWS Public Sector Summit, here in our nation's capital. I'm your host, Rebecca Knight, co-hosting with John Furrier. We have two guests for this segment: we have George Gagne, he is the Chief Information Officer at the Defense POW/MIA Accounting Agency. Welcome, George. And we have Christopher McDermott, who is the CDO of the Defense POW/MIA Accounting Agency. Welcome, Chris. >> Thank you. >> Thank you both so much for coming on the show. >> Thank you. >> So I want to start with you, George. Why don't you tell our viewers a little bit about the POW/MIA Accounting Agency? >> Sure, so the mission has been around for decades, actually. In 2015, Secretary of Defense Hagel looked at the accounting community as a whole, and for efficiency gains made the decision to consolidate some of the accounting community into a single organization. They took the former JPAC, which was a direct reporting unit to PACOM out of Hawaii and the operational arm of the accounting community, responsible for research, investigation, recovery, and identification. They took that organization, they took the policy portion of the organization, which is here in Crystal City, DPMO, and then they took another part of the organization, our Life Sciences Support Equipment laboratory in Dayton, Ohio, and consolidated them to create the Defense POW/MIA Accounting Agency, under the Office of the Secretary of Defense for Policy. So that was step one. Our mission is the fullest possible accounting of missing U.S. personnel to their families and to our nation. That's our mission. We have approximately 82,000 Americans missing from our past conflicts: service members from World War II, the Korean War, Vietnam, and the Cold War. When you look at the demographics of that, we have approximately 1,600 still missing from the Vietnam conflict. We have just over 100 still missing from the Cold War. We have approximately 7,700 still missing from the Korean War, and the remainder are from World War II. So, you know, one of the challenges when our organization was first formed was that we had three different organizations that all had different reporting chains, their own disparate cultures, disparate systems, and disparate processes, and step one was to get everybody on the same backbone and the same network. Step two was to look at all those on-prem legacy systems we had across our environment and look at consolidating them. And our organization is very geographically dispersed. I just mentioned three locations; we also have a laboratory in Offutt, Nebraska, we have detachments in Southeast Asia, in Thailand, Vietnam, and Laos, and we have a detachment in Germany. And we're highly mobile. This year we're planning to conduct 84 missions around the world, in 34 countries, and those missions run in 30-to-45-day increments. So it's a highly mobile, very globally diverse organization.
So when we looked at that environment, obviously we knew the first step, after we got everybody on one network, was to look to cloud architectures and models in order to be able to communicate, coordinate, and collaborate. So we developed a case management system that consists of business intelligence software, along with some enterprise content software, coupled with some forensics software for our laboratory staff, and that makes up what we call our case management system, which is cloud hosted. >> So business challenges: the consolidation, the reset or set-up for the mission, but then the data types. It's a different kind of data problem to work, to achieve the outcomes you're looking for. Christopher, talk about that dynamic, because, >> Sure. >> You know, there are historically different types of data. >> That's right. A lot of our data started as IBM punch cards, or it started from, you know, paper files. When I started the work, we were still looking things up on microfiche and microfilm, so we've been working on an aggressive program to get all that kind of data digitized, but then we have to make it accessible. And we had, as George was saying, multiple different organizations doing similar work. So you had a lot of duplication of the same information, but kept in different structures, searchable in different pathways. So we have to bring all of that together and make it accessible, so that the government can all be on the same page. Because again, as George said, there's a large number of cases that we potentially can work on, but we have to be able to triage that down to the ones that give us the best opportunity to use our current methods to solve. So rather than look for all 82,000 at once, we want to be able to navigate through that data and find the cases that have the most likelihood of success. >> So where do you even begin? What's the data that you're looking at? What have you seen as the best indicators of success in finding those people who were prisoners of war or missing in action? >> Well, you know, to some degree, as George was saying, our mission has been going on for decades. So a lot of the files that we're working from today were created at the time of the incidents. For the Vietnam cases, we have a lot of continuity, so we're still working the leads that are the strongest out of that set. And we still send multiple teams a year into Vietnam, Laos, and Cambodia. And that's where, you know, you try to build upon the previous investigations, but that's also where, if those investigations were done in the '70s or the '80s, we have to surface what's actionable out of that information, and which pathways we've trod that didn't pay off. So a lot of it is: What can we reanalyze today? What new techniques can we bring? Can we bring in, you know, remote sensing data? Can we bring GIS applications to analyze where the best scenario is for resolving these cases after all this time? >> It's interesting, one of the things we hear from Amazon, we've done so many interviews with Amazon executives, we kind of know their messaging. So here's one of them: "Eliminate the undifferentiated heavy lifting." You hear that a lot, right? So there might be a lot of that here, and then Teresa had a slide up today talking about COBOL and mainframes, talk about punch cards. >> Absolutely. >> So you have a lot of data of different types, older data. So it's a true digitization project that you've got to enable, as well as other complexity.
>> Absolutely. When the agency was formed in 2015, we really began the process of an information modernization effort across the organization. Because like I said, these were legacy on-prem systems that were their systems of record, that had their own specific ways of working and didn't really have the ability to share the data, collaborate, coordinate, and communicate. So it was a heavy lift across the board getting everyone on one backbone, but then going through an agency information modernization evolution, if you will, that we're still working our way through, because we're so mobile and diversified as well: our field communications capability and reach-back into the cloud, being able to access that data from geographic locations around the world, whether it's in the Himalayas, whether it's in Vietnam, whether it's in Papua New Guinea, wherever we may be, not just our fixed locations. >> George and Christopher, if you each could comment for our audience, I would love to get this on record, as you guys are really doing a great modernization project. If you each could talk about key learnings, and it could be from scar tissue, it could be from pain and suffering to an epiphany or some breakthrough. What were some of the key learnings as you went through the modernization? Could you share some from a CIO perspective and from a CDO perspective? >> Well, I'll give you a couple of takeaways on what I thought we did well and some areas where I thought we could have done better. For us, as we looked at building our case management system, step one was defining our problem statement. It was years in planning before we actually took steps to start building out our infrastructure in the Amazon cloud, or our applications. But in building and defining that problem statement, we took some time to really take a look at it, because of the differences in culture among the disparate organizations and our processes and so on and so forth. Defining that problem statement was critical to our success in moving forward. I'd say one of the areas where we could have done better is probably communication and stakeholder buy-in. Because we are so geographically dispersed and highly mobile, getting the word out to everybody in all those geographic locations and all those time zones, with a workforce that's out in the field a lot, at 30 to 45 days at a time, three or four missions a year, sometimes more, certainly made it difficult to get that messaging out and get that stakeholder buy-in. And a challenge we still deal with moving forward is data hygiene. Something else we did really well was establishing this CDO role within our organization, because it's no longer about the systems that are used to process and store the data. It's really about the data. And who better to know the data but our data owners, not custodians, and our chief data officer, and the data governance council that was established. >> Christopher, your learnings, takeaways? >> What we're trying to build upon is, you define your problem statement, but the pathway there is that you have to get results in front of the end users. You have to get them to the people who are doing the work, so you can keep guiding it toward a solution that actually meets all the needs, as well as build something that can innovate continuously over time.
Because the technology space is changing so quickly and dynamically that the more we can surface our problem set, the more help we can get to find ways to navigate through it. >> So one of the things you said is that you're using data to look at the past, whereas so many of the guests we're talking to today, and so many of the people here at this summit, are talking about using data to predict the future. Are you able to look at your data sets from the past and then also sort of say, and this is how we can prevent more POWs? Are you thinking at all, are you looking at the future at all with your data? >> I mean, certainly. Especially from our laboratory science perspective, we have probably the most advanced human identification capability in the world. >> Right. >> And recovery. And so all of those lessons really go a long way toward what information needs to be accessible and actionable for us to be able to recover individuals in those circumstances and make those identifications as quickly as possible. At the same time, the cases that we're working on are the hardest ones. >> Right. >> The ones that are still left. But each success that we have teaches us something that can then be applied going forward. >> What is the human side of your job? Because here you are, these two wonky data number crunchers, and yet these are people who died fighting for their country. How do you manage those two really important parts of your job, and how do you think about that? >> Yeah, I will say that it does amp up the emotional quotient of our agency, and everybody really feels passionately about all the work that they do. About 10 times a year, our agency meets with family members of the missing at different locations around the country. And those are really powerful reminders of why we're doing this. And you do get a lot of gratitude, but at the same time, each case that's still waiting, that's the one that matters to them. And you see that in the passion our agency brings to the data questions and in how quickly they want us to progress. It's never fast enough. There's always another case to pursue. So that definitely adds a lot to it, but it is very meaningful when we can help tell that story. And even for a case where we may never have the answers, being able to say, "This is what the government knows about your case, and these are the efforts that have been undertaken to this point." >> The fact that there's an effort going on is really a wonderful thing for everybody involved, with good outcomes coming out of it. But it's an interesting angle: as a former IT techie back in the day, in the '80s and '90s, I can't help but marvel at your perspective on your project, because you're historians in a way too. You've got punch cards, you know, I never used punch cards. >> Put them in a museum. >> I was the first generation post punch cards. But you have a historical view of the IT state of the art at the time of the data you're working with. You have to make that data actionable in an outcome-oriented work stream for today. >> Yeah, another example we have is we're reclaiming chest X-rays that they took at induction, which would screen for tuberculosis when guys came into service. We're able to use those X-rays now for comparison with the remains that are recovered from the field. >> So you guys are really digging into the history of IT. >> Yeah. >> So I'd love to get your perspective.
To me, I marvel, and I've always been critical of Washington's slowness with respect to cloud, but I'm seeing you catch up now with the tailwinds here with cloud and Amazon, and now Microsoft coming in with AI. You kind of see the visibility that leads to value. As you look back at the industry of federal, state, and local governments and the public sector over the years, what's your view of the current state of modernization? Because it seems to be a renaissance. >> Yeah, the analogy I would give you is that it's the same as the industrial revolution we went through in the early 20th century, but it's more of a technology revolution that we're going through now. That's how I'd probably characterize it, if I were to look back and tell my children's children about the advent of technology and that progression of where we're at. Cloud architectures certainly take down geographical barriers that were problems for us before. Now we're able to overcome those. We can't overcome the time zone barriers, but the geographical barriers separating an organization have certainly changed with cloud computing. >> Do you see your peers within the government sector, other agencies, kind of catching wind of this, going, "Wow, I could really change the game"? And will it be a step function, in your mind, as you project forward where we are? Is it going to be a small improvement or a step function? What do you guys see? What's the sentiment around town? >> I'm from Hawaii, so Chris probably has a better perspective on that, with some of our sister organizations here in town. But I would say more and more organizations are adopting cloud architectures. It's my understanding very few organizations now are co-located in one facility and one location, right? Take a look at telework today, the cost of doing business, remote accessibility regardless of where you're at. So I'd say it's a force multiplier, by far, for any line of business, whether it's public sector, federal government, or whatever. It's certainly enhanced our capabilities, and it's a force multiplier for us. >> And I think that's where the expectation increasingly is: that the data should be available, and I should be able to act on it wherever I am, whenever the opportunity arises. And the more we can democratize our ability to get that data out to our partners and our teams in the field, the faster those answers can come through, and the faster we can make decisions based upon the information we have, not just the process that we follow. >> And it feeds the creativity and the work product of the actors involved, getting the data out there versus hoarding it, walling it off, siloing it. >> Right, yeah. You know, being the lone expert on this sack of paper in the filing cabinet doesn't have as much power as getting that data accessible to a much broader squad where everyone can contribute. >> We're doing our part. >> That's right, it's open sourcing it right here. >> To your point, death by PowerPoint, I'm sure you've heard that before. Well, business intelligence software now, at the click of a button, reduces the level of effort in manpower and resources to put together slide decks. Business intelligence software can reach out to those structured data platforms, pull out the data that you want at the click of a button, and build those presentations for you on the fly. Think about it, that's our force multiplier from advances in technology.
I think the biggest thing is how we as humans understand how to exploit and leverage the technologies and the capabilities, because I still don't think we fully grasp the potential of technology and how it can be leveraged to empower us. >> That's great insight, and we really respect what you guys do. Love your mission. Thanks for sharing. >> Yeah, thanks so much for coming on the show. >> Thank you for having us. >> I'm Rebecca Knight, for John Furrier. We will have much more coming up tomorrow on the AWS Public Sector Summit here in Washington, DC. (upbeat music)
Liran Zvibel & Andy Watson, WekaIO | CUBE Conversation, December 2018
(cheery music) >> Hi, I'm Peter Burris, and welcome to another CUBE Conversation from our studios in Palo Alto, California. Today we're going to be talking about some new advances in how data gets processed. Now, it may not sound exciting, but when you hear about some of the performance capabilities, and how it liberates new classes of applications, this is important stuff. To have that conversation we've got Weka.IO here with us: specifically, Liran Zvibel, the CEO of Weka.IO, joined by Andy Watson, the CTO of Weka.IO. Liran, Andy, welcome to theCUBE. >> Thanks. >> Thank you very much for having us. >> So Liran, you've been here before; Andy, you're a newbie. So Liran, let's start with you. Give us the Weka.IO update, what's going on with the company? >> So '18 has been a grand year for us. We've had great market adoption. We spent last year proving our technology, and this year we have accelerated our commercial successes. We've expanded to Europe, we've hired quite a lot of salespeople in the US, and we're seeing a lot of successes around machine learning, deep learning, and life sciences data processing. >> And you've hired a CTO. >> And we've hired the CTO, Andy Watson, which I am excited about. >> So Andy, what's your pedigree, what's your background? >> Well, I've been around a while, got the scars on my back to show it, mostly in storage, dating back even to Auspex before NetApp, but probably best known for the years I spent at NetApp. I was there from '95 through 2007, kind of the glory years. I was the second CTO at NetApp, as a matter of fact, and that was a pretty exciting time. We changed the way the world viewed shared storage, I think it's fair to say, at NetApp, and it feels the same here at Weka.IO. That's one of the reasons I'm so excited to have joined this company, because it's the same kind of experience of having something that is so revolutionary that quite often, whether it's a customer or an analyst like yourself, people are a little skeptical. They find it hard to believe that we can do the things that we do, and so it's gratifying when we have the data to back it up, and it's really a lot of fun to see how customers react when they actually have it in their environment and it changes their workflow and their life experience. >> Well, I will admit, I might be undermining my credibility here, but I will admit that back in the mid '90s I was a little bit skeptical about NetApp, but I'm considerably less skeptical about Weka.IO, just based on the conversations we've had. But let's turn to that, because there are classes of applications that are highly dependent on very large numbers of small files being moved very, very rapidly, like machine learning. So you mentioned machine learning, Liran, talk a little bit about some of the market success that you're having, some of those applications' successes. >> Right, so machine learning actually works extremely well for us, for two reasons.
For one big reason: machine learning is performed by GPU servers, servers with several GPU offload engines in them, and what we see with this kind of server is that a single GPU server replaces ten or tens of CPU-based servers, so you actually need the IO performance to be ten or tens of times what the CPU servers required. So we came up with a way of providing significantly higher, two orders of magnitude higher, IO to a single client on the one hand, and on the other hand, we have solved the data performance problem from the metadata perspective, so we can have directories with billions of files, and we can have a whole file system with trillions of files. When we look at the autonomous driving problem, for example: if you look at the high-end car makers, they have eight cameras around the cars. These cameras capture at a small resolution, because you don't need very high resolution to recognize a line, or a cat, or a pedestrian, but they capture at 60 frames per second, so in 30 minutes you get about 100k files, which is roughly as many as traditional filers could put in a directory. But if you'd like to have your cars running in the Bay Area, you'd like to have all the data from the Bay Area in a single directory, and then you need directories with billions of files, which is what we provide. And what we have heard from some of our customers that have had great success with our platform is that not only do they get hundreds of gigabytes per second of small-file read performance, they tell us that they have taken their standard time per epoch from about two weeks, before they switched to us, down to four hours.
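[Editor's note: a minimal sketch of the access pattern Liran describes: many independent small-file reads issued in parallel, so a single client can keep GPUs busy from a shared mount instead of a local copy. This is an illustration only, not Weka's software; the mount point and file layout are hypothetical.]

```python
# Editor's sketch (hypothetical paths): parallel small-file reads of the kind
# an autonomous-driving training pipeline generates. Serial reads would leave
# the GPUs starved, so many IOs are kept in flight at once.
import os
import time
from concurrent.futures import ThreadPoolExecutor

FRAME_DIR = "/mnt/wekafs/bay_area/camera_0"  # hypothetical shared mount

def read_frame(path):
    # One small, low-resolution frame per file.
    with open(path, "rb") as f:
        return len(f.read())

def main():
    paths = [os.path.join(FRAME_DIR, n) for n in os.listdir(FRAME_DIR)]
    start = time.time()
    with ThreadPoolExecutor(max_workers=64) as pool:  # 64 reads in flight
        total_bytes = sum(pool.map(read_frame, paths))
    elapsed = time.time() - start
    print(f"{len(paths)} files, {total_bytes / 1e9:.2f} GB in {elapsed:.1f}s "
          f"({total_bytes / elapsed / 1e9:.2f} GB/s)")

if __name__ == "__main__":
    main()
```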
>> Now let's explore that, because one of the key reasons there is the scalability of the number of files you can handle. In other words, instead of having to run against a limit on the number of files they can typically run through the system to saturate these GPUs with some other storage or file technology, they now don't have to stop, set up the job again, and run it over and over; they can run the whole job against the entire expansive set of files, and that's crucial to speeding up the delivery of the outcome, right? >> Definitely. So what these customers used to do before us: they would do local caching, because NFS was not fast enough for them, so they would copy the data locally and then run over it with the local file system, because that has been the pinnacle of performance in recent years. We are the only storage currently, and I think we'll actually be the first wave of storage solutions, where a shared platform built for NVMe is actually faster than a local file system. So we let them go through any file, they don't have to pick up front which files go to which server, and we are even faster than the traditional caching solutions. >> And imagine having to collect the data and copy it to the local application server, and do that again and again and again for a whole server farm, right? It's bad enough to even do it once, let alone many times, and then to do it over and over and over again; it's a huge amount of work. >> And a lot of time? >> And a lot of time, and cumulatively that burden is going to slow you down, so that makes a big, big difference. And secondly, as Liran was explaining, if you put 100,000 files in a directory of other file systems, that is stressful. If you want to put more than 100,000 files in a directory of other file systems, that is a tragedy. And we routinely handle millions of files in a directory; it doesn't matter to us at all, because just like we distribute the data, we also distribute the metadata, and that's completely counter to the way the other file systems are designed, because they were all designed in an era where their focus was on the physical geometry of hard disks, and we have been designed for flash storage.
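[Editor's note: that directory-scaling claim can be probed with a crude metadata micro-benchmark in the spirit of mdtest; a minimal sketch with a hypothetical mount point follows. On a design with serialized metadata, per-operation cost tends to climb as the directory grows; a distributed-metadata design aims to keep it flat.]

```python
# Editor's sketch (hypothetical path): time file creates and stats in a single
# directory to see how metadata performance holds up as the file count grows.
import os
import time

TARGET_DIR = "/mnt/wekafs/mdbench"  # hypothetical shared mount
N = 100_000  # raise this to probe the "more than 100,000 files" case

os.makedirs(TARGET_DIR, exist_ok=True)

start = time.time()
for i in range(N):
    # Empty files: this measures metadata (create) cost, not data writes.
    with open(os.path.join(TARGET_DIR, f"f{i:08d}"), "w"):
        pass
create_secs = time.time() - start

start = time.time()
for i in range(N):
    os.stat(os.path.join(TARGET_DIR, f"f{i:08d}"))
stat_secs = time.time() - start

print(f"creates: {N / create_secs:,.0f}/s  stats: {N / stat_secs:,.0f}/s")
```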
>> And the metadata associated with the distribution of that data typically was in one file, in one place, and that was the master serialization problem, when you come right down to it. So we've got a lot of ML workloads, very large numbers of files, and definitely improved performance because of the parallelism through your file system in, as I said, the ML world. Let's generalize this. What does this mean overall, you've kind of touched upon it, but what does it mean overall for the way that customers are going to think about storage architectures in the future, as they are combining ML and related types of workloads with more traditional types of things? What's the impact of this on storage? >> So if you look at how people have architected their solutions around storage recently, you have four different kinds of storage systems. If you need the utmost performance, you go to DAS; Fusion-io had a run perfecting DAS, and then the whole industry realized it. >> Direct attached storage. >> Direct attached storage, right, and then the industry realized, hey, it makes so much sense, and created a standard out of it, created NVMe. But then you're wasting a lot of capacity, you cannot manage it, and you cannot back it up. Then, if you needed some way to manage it, you would put your data on a SAN. Actually, our previous company was XIV Storage, which IBM acquired, and the vast majority of our use cases there were actually people buying block and then overlaying a local file system on top of it, because that gets you so much higher performance, but you cannot share the data. Now, if you put it on a filer, which is NetApp, or Isilon, or the other solutions, you can share the data, but your performance is limited and your scalability is limited, as Andy just said. And if you had to scale through the roof- >> With a shared storage approach. >> With a shared storage approach, you had to go and port your application to an object storage, which is an enormous feat of engineering, and tons of these projects actually failed. We actually bring a new kind of storage, a shared storage that is as scalable as an object storage but faster than direct attached storage. So looking at the traditional storage systems of the last 20 or 30 years, we actually have all the advantages people have come to expect from the different categories, but we don't have any of the downsides. >> Now give us some numbers. Do you have any benchmarks you can talk about that verify or validate this vision that Weka's delivering on? >> Definitely. The IO500? >> Sure, sure. We recently published our IO500 performance results at SC18, the supercomputing event in Dallas, and there are two different metrics- >> So fast you can go back in time? >> Yes, exactly. There are two different metrics: one metric is an aggregate total amount of performance, and that's a much longer list. I think the one that's more interesting is the 10-client version, which we like to focus on, because we believe the most important area for a customer to focus on is how much IO you can deliver to an individual application server. And so this part of the benchmark is most representative of that, and on that rating, we were able to come in second, well, after you filter out the irrelevant results, which, that's a separate process. >> Typical of every benchmark. >> Yes, exactly. Of the relevant, meaningful results, we came in second behind the world's largest and most expensive supercomputer at Oak Ridge, the Summit system. They have a 40-rack system, and we have a half, or maybe a little bit more than half of, one rack of industry-standard hardware running our software. So compare that: the cost of our hardware footprint and so forth is much less than a million dollars. >> And what was the differential between the two? >> Five percent. >> Five percent? So okay, sound of jaw dropping. A 40-rack system at Oak Ridge, and five percent more performance than you guys running on effectively a half rack of, like, Supermicro or something like that? >> Oh, and it was the first time we ran the benchmark; we were just learning how to run it. Those guys are all experts; they had IBM in there at their elbow helping them with all their tuning and everything. This was literally the first time our engineers ran the benchmark. >> Is a large feature of that the fact that Oak Ridge had to get all that hardware to get the physical IO necessary to run serial jobs, and you guys can just do this in parallel on a relatively standard NVMe subsystem? >> Beyond that, you have to learn how to use all those resources, right? All the tuning, all the expertise: one of the things people say is you need a PhD to administer one of those systems, and they're not far off, because it's true that it takes a lot of expertise. Our systems are dirt simple. >> Well, you've got to move the parallelism somewhere, and either you create it yourself, like you do at Oak Ridge, or you do it using your guys' stuff, through a file system. >> Exactly, and what we are showing is that we have tremendously higher IO density. Instead of using a local file system, most of which were created in the '90s, in the serial way of thinking, optimizing over hard drives, now you say: hey, NVMe devices, SSDs, are beasts at running 4K IOs. If you solve the networking problem, so the network is not the bottleneck anymore, and you just run all your IOs as a highly parallelized workload of 4K IOs, you actually get much higher performance than what was, up until we came along, the pinnacle of performance, which is a local file system over a local device. >> Well, so NFS has an effective throughput limitation of somewhere around a gigabyte per second, so if you've got a bunch of GPUs that each want four, five, ten gigabytes of data coming in, you're not saturating them through an effective one-gigabyte-per-second throughput rate. It's almost like you've got the New York City waterworks coming into some of these big file systems, and you've got, like, your little sink that's actually spitting the data out into the GPUs. Have I got that right?
>> Good analogy. If you are creating a data lake and then you're going to sip at it with some tiny little straw, it doesn't matter how much data you have, you can't really leverage the value of all that data you've accumulated, because if you're feeding it into your compute farm, GPU or not, slowly, then you'll never get to it all, right? And meanwhile, more data's coming in every day, at a faster rate. It's an impossible situation, so the only solution really is to increase the rate at which you access the data, and that's what we do.
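[Editor's note: on the client side, the usual way to widen that straw in ML practice is to issue many reads in parallel from the training job itself, for example with worker processes in a data loader. A hedged PyTorch sketch of that pattern follows; the dataset path is hypothetical, and the batch size and worker count are illustrative.]

```python
# Editor's sketch (hypothetical dataset path): keeping a GPU fed by running
# many parallel reader processes against a shared filesystem mount.
import torch
from torchvision import datasets, transforms

dataset = datasets.ImageFolder(
    "/mnt/wekafs/training_frames",  # hypothetical shared mount
    transform=transforms.Compose(
        [transforms.Resize((224, 224)), transforms.ToTensor()]
    ),
)

loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=256,
    num_workers=16,   # parallel readers keep many small-file IOs in flight
    pin_memory=True,  # faster host-to-GPU copies
)

for images, labels in loader:
    # A real training step would go here; this only exercises the IO path.
    images = images.cuda(non_blocking=True)
    break
```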
>> So I could see how you're making the IO bandwidth junkies at Oak Ridge really happy, but the other thing that I find interesting about Weka.IO, as you just talked about, is that you've come up with an approach that's specifically built for SSD. You've moved the parallelism into the file system, as opposed to having it be somewhere else, which is natural, because SSD is not built to persist data, it's built to deliver data. And that suggests, as you said earlier, that we're looking at a new way of thinking about storage as a consequence of technologies like Weka and technologies like NVMe. Now Andy, you came from NetApp, and I remember what NetApp did to the industry when it started talking about the advantages of sharing storage. Are we looking at something similar happening here with SSD and NVMe and Weka? >> Indeed, I think that's the whole point; it's one of the reasons I'm so excited about it. It's not only because we have this technology that opens up this opportunity, this potential being realized. I think the other thing is, there's a lot of features, a lot of meaningful software, that needs to be written around this architectural capability, and the team that I joined, their background, coming from having created XIV before, and the almost amazing way they all think together and recognize the market, and the way they interact with customers, allows the organization to realistically address customer requirements. So instead of just doing things that we want to do because they seem elegant, or because the technology sparkles in some interesting way, this company, and it reminds me of NetApp in the early days, and it was a driver of NetApp's big success, this company is very customer-focused, very customer-driven. So when customers tell us what they're trying to do, we want to know more. Tell us in detail how you're trying to get there. What are your requirements? Because if we understand better, then we can engineer what we're doing to meet you there, because we have the fundamental building blocks. Those are mostly done; now what we're trying to do is add the pieces that allow you to implement it into your workflow, into your data center, or into your strategy for leveraging the cloud. >> So Liran, when you're here in 2019 and we're having a similar conversation, with this customer focus you've got a value proposition for the IO bandwidth junkies, you can give more, but what's next in your sights? Are you going to show how, for example, you can get higher performance with less hardware? >> So we are already showing how you can get higher performance with less hardware, and I think as we go forward, we're going to have more customers embracing us for more workloads. What we see already is that they get us in for either the high end of their life sciences or their machine learning, and then the people working around those teams realize, hey, I could get some faster speed as well, and then we start expanding within these customers and get to see more and more workloads where people like us, and we can start telling stories about them. The other thing that comes naturally to us: we run natively in the cloud, and we actually let you move your workloads seamlessly between on-premises and the cloud, and we are seeing tremendous interest in moving to the cloud today, though not a lot of organizations do it yet. I think '19 and forward, we are going to see more and more enterprises seriously considering moving to the cloud, because we have almost 100% of our customers proof-of-concepting cloudbursting, but not a lot of them using it yet. I think as time passes, all of them that have seen it working, when they did the initial test, will start leveraging this and getting the elasticity out of the cloud, because this is what you should get out of the cloud. So this is one avenue of expansion for us. We are going to put more resources into Europe, where we have recently started building the team, and later in the year also JAPAC. >> Gentlemen, thanks very much for coming on theCUBE and talking to us about some new advances in file systems that are leading to greater performance, less specialized hardware, and enabling new classes of applications. Liran Zvibel is the CEO of Weka.IO, Andy Watson is the CTO of Weka.IO. Thanks for being on theCUBE. >> Thank you very much. >> Yeah, thanks a lot. >> And once again, I'm Peter Burris, and thanks very much for participating in this CUBE Conversation. Until next time. (cheery music)