Som Shahapurkar & Adam Williams, Iron Mountain | AWS re:Invent 2021
(upbeat music)
>> We're back at AWS re:Invent 2021. You're watching theCUBE, and we're really excited to have Adam Williams on. He's a senior director of engineering at Iron Mountain. And Som Shahapurkar, who's in product engineering of vertical solutions at Iron Mountain. Guys, great to see you. Thanks for coming on.
>> Thank you.
>> Thank you.
>> All right Adam, we know Iron Mountain: trucks, tapes. What's new?
>> What's new? So we've developed a SaaS platform for digitizing, classifying, and unlocking the value of our customers' data and putting their data to work. The content services platform that we've developed goes together with an IDP, what we call an intelligent document processing capability, to do basic content management, but also to do data extraction and to increase workflow capabilities for our customers.
>> Yeah, so I was kind of joking before. Iron Mountain, the legacy business, of course, everybody's seen the trucks. But it's a $4 billion company, $13 billion market cap, and the stock's been on fire. The pandemic obviously has been a tailwind for you guys. But Som, if you had to describe it to, like, my mother, what's the sound bite that you'd give?
>> Well, the sound bite: as everyone knows, data is gold today, right? And we are sitting figuratively and literally on a mountain of data. And now we have the technology to take that data, partner with AWS, the heavy machinery, to convert that into value, value that people can use to complete the human story of healthcare, of mortgage, of finance. A lot of this sits in systems, but it also sits in paper. And we are bridging that paper-to-digital divide, the physical and digital divide, to create one story.
>> This has been a journey for you guys. I mean, I recall when you kind of laid this vision out a number of years ago. I think you made some acquisitions. So maybe take us through that amazing transformation that Iron Mountain has made, and help the audience understand that.
>> The transformation's really been going from the physical records management that we've built our business around to evolving with our customers, to be able to work with all of the digital documents and not just be a transportation and records management storage company, but to actually work with them to put their data to work, allowing them to digitize a lot of their content, but also to bring in already digitized content and rich media.
>> One of the problems that always existed, especially if you go back, in the back of my brain, to 2006 and the Federal Rules of Civil Procedure, which said that emails could now be evidence in a case, and everyone was like, "Oh, how do I find email?" So one of the real problems was classifying the information for retention policies. The lawyers wanted to throw everything out after whatever, six or seven years; the business people wanted to keep everything forever. Neither of those strategies works, so you needed classification, and you couldn't do it manually. So have you guys solved that problem? How do you solve that problem? Does the machine intelligence help? It used to be, I'll use support vector machines, or math, or probabilistic latent semantic indexing, all kinds of funky stuff. And now we enter this cloud world. Have you guys been able to solve that problem, and how?
>> So our customers already have 20-plus years of retention rules and guidelines that are built within our systems, and we've helped them define those over the years. So we're able to take those records retention schedules that they have and then apply them to the documents. But instead of doing that manually, we're able to do that using our classification capabilities with AI/ML, and that's Som's expertise.
>> Awesome, so lay it on me. How do you guys do that? It's a lot of math.
>> Yeah, so it can get complicated real fast, but at a simple level, what's really changed from the support vector machines of 2006 to today is the scale at which we can do it, right?
The scale at which we are bringing those technologies, plus the latest technologies of deep learning, your convolutional neural networks, going from a bag of characters and words to really the way humans look at it. You look at a document and you know this is an invoice or this is a prescription. You don't even have to know how to read to know that. Machines are now capable of having that vision, the computer vision, to say "prescription," "invoice." So we train those models and have them do it at industrial scale.
>> Yeah, because humans are actually pretty bad at classifying at scale.
>> At scale, like, they're bad.
>> You remember, we used to try to do, oh, just tag it. Oh, what a nightmare. And then when something changes... And so now machines and the cloud have changed that. I mean, I presume highly regulated industries are the target, but maybe you could talk about the industry solutions a little bit.
>> Right. Regulated industries are a challenge, right? Especially when you talk about black-box methodologies like AI, where we don't know, okay, why does it classify this as this and that as that? But that's where I think a combined approach comes in, what we are calling composite AI. So the human knowledge plus AI knowledge, combined together to say, okay, we know about these regulations, and hey, AI, be cognizant of these regulations while you do your stuff, don't go blindly. So we keep the AI in the guardrails and guided to be within those lines.
>> And the other part of that is we know our customers really well. We've spent a lot of time with them. And so now we're able to take a lot of the challenges they have and go meet those needs with the document classification. But we also go beyond that, allowing them to implement their own workflows within the system, to define their own capabilities, and to take those records into the future and use our content management system as a true content services platform.
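The composite AI idea described above, a model's prediction checked against human-defined rules, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Iron Mountain's implementation: the retention periods, the confidence threshold, and the names `Prediction`, `Decision`, and `apply_guardrails` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical retention schedule: document class -> years to retain.
# Real schedules come from the customer's records retention policy.
RETENTION_YEARS = {"invoice": 7, "prescription": 10, "contract": 12}

# Hypothetical guardrail: below this confidence, a person reviews the document.
REVIEW_THRESHOLD = 0.90


@dataclass
class Prediction:
    label: str         # class proposed by the (assumed) classification model
    confidence: float  # model confidence in [0, 1]


@dataclass
class Decision:
    label: Optional[str]
    retention_years: Optional[int]
    needs_review: bool
    reason: str


def apply_guardrails(pred: Prediction) -> Decision:
    """Combine a model prediction with human-defined rules."""
    # Rule 1: the model may only assign classes the retention schedule knows.
    if pred.label not in RETENTION_YEARS:
        return Decision(None, None, True, "label not in retention schedule")
    # Rule 2: low-confidence predictions are routed to a person as exceptions.
    if pred.confidence < REVIEW_THRESHOLD:
        return Decision(pred.label, None, True, "confidence below threshold")
    # Otherwise the retention rule is applied automatically.
    return Decision(pred.label, RETENTION_YEARS[pred.label], False, "auto-classified")


if __name__ == "__main__":
    for p in (Prediction("invoice", 0.97), Prediction("invoice", 0.55)):
        print(apply_guardrails(p))
```

Keeping the rules outside the model is one way to make a black-box classifier auditable: every automatic decision can be traced to a schedule entry and a threshold that a person set.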
>> Okay, take me through the before and the after. So the workflow used to be, I'd ring you up, or maybe you'd come in every week, grab a box of records, put them in the truck, and then stick them in the Iron Mountain. And that was the workflow. And if you wanted them back, you'd go get them back, and it'd take a while. So you've digitized that whole thing. And when you say, I'm inferring that the customer can define their own workflow because it's now software defined, right? So that's what you guys have engineered. Some serious engineering work. So what's the tech behind that? Can you paint a picture?
>> So the tech behind it is we run all of our cloud systems on Kubernetes. So using Kubernetes, we can scale really, really large. All of our capabilities are obviously cloud-based, which allows us to scale rapidly. With that, we run Elasticsearch as our search engine and MongoDB as our NoSQL database. And that allows us to run millions of documents per minute through our system. We have customers where we're doing eight million documents a day for the overall process. And they're able to do that with a known level of accuracy. And they can go look at the documents that have had any exceptions. And we can go back to what Som was talking about, to go through and retrain models and relabel documents so that we can catch that extra percentage and get it as close to 100% accuracy as we would like, or they would like.
>> So what happens? Take me through the customer experience. What is that like? I mean, do they still... you know the joke, the paperless bathroom will occur before the paperless office, right? So there's still paper in the office. So what's the workflow? I presume a lot of this is digitized at the office, but there's still paper, so help us understand that.
>> Customers can take a couple of different paths. One is that we already have the physical documents that they'd like us to scan. We call that backfile scanning.
So we already have the documents, they're in a box, they're in a record center. We can move them between different record centers and get them imaged in our high-volume scanning operation centers. From there--
>> Sorry to interrupt. And at that point, you're auto-classifying, right? It's not already classified. I mean, it kind of is, manually, but you're going to reclassify it on creation.
>> Correct.
>> Is that the electronic document?
>> For some of our customers, we have base metadata that gives us some clues as to what documents may be. But for other documents, we're able to train the models to know if they're invoices or if they're contracts, commonly formatted documents. But customers can also bring in their already digitized content. They can bring in basic PDFs or Word documents or Google Docs, for instance, but they can also bring in rich media, such as video and audio. And from there, we also do speech-to-text for video and audio, in addition to just basic OCR for documents.
>> Public sector, financial services, healthcare, insurance, I've got to imagine that those have got to be the sweet spots.
>> Another sweet spot for us is the federal space in public sector. We achieved FedRAMP, which is a major certification to be able to work with the federal government.
>> Now, how do you work with AWS? What's your relationship with them? How do you use the cloud? Maybe you could describe that a little bit.
>> Well, yeah, at multiple levels, right? So of course we use their cloud infrastructure to run our computing, because with AI and machine learning, you need a lot of computing power, right? And AWS is the one who can reliably provide it: space to store the digital data, computing for the processes, to extract all the information, train our models, and then process these. Like he's talking about, we are talking about eight, 12, 16 million documents a day. So now you need seconds and sub-second processing times, right?
So at different levels, at the company infrastructure level and also at the AI and machine learning algorithm level, AWS has great ones. Like, Textract is one of the ones that everyone knows, but there are other purpose-built model APIs that we utilize. And then we put our secret sauce on top of that to build that pathway up and make it really compelling.
>> And the secret sauce is, obviously, there's the workflow and the flexibility of the workflow, there's the classification and the machine learning and intelligence, and all the engineering that makes the cloud work, which you manage. What else is there?
>> Knowledge graphs, like he was saying, right, the domain. So a document that looks very similar, a bank statement in mortgage and a bank statement in healthcare, has different meanings. You're looking at different things. So you have something called a knowledge graph that maintains the knowledge of a person working in that field. And then we have those created for different fields and, within those fields, different applications and use cases. So that's unique and that's powerful.
>> That gives us the ability to provide a hierarchy for our customers, so they can trace a document back to the original box that was given to us many years ago.
>> You've got that provenance and that lineage. I know you're not go-to-market guys, but conceptually, how do you price? Is it SaaS? Is it licensed? Is it term? Is it consumption-based, based on how much I ingest?
>> We have a variety of pricing models. So first off, we're in six major markets that we serve, from the EU to Latin America, North America, and others. So within those markets, we offer different capabilities. We have an Essentials offering on AWS that we've launched in the last two weeks that allows you to bring in base content. And that has per-object pricing.
And then from there, we go into our Standard edition, which has the ability to bring in additional workflows and has some custom pricing. And then we have what we call the Enterprise. And for Enterprise, we look at the customer's problem. We look at the custom AI and ML models we might be developing and the solution that we're having to build for them, and we provide a custom price and capability for what they need.
>> And then AWS this week announced a new Glacier tier. So you guys are all over that. That's where you use it, right? The cheapest and the deepest, right?
>> Yeah, one of the major things that AWS provides us as well is the compliance capabilities for our customers. So our customers really require us to have highly secure, highly trusted environments in the cloud. And then the ability to do that with data sovereignty is really important. And so we're able to meet that with AWS as well.
>> What do you do in situations where AWS might not have a region? Do you have to find your own data center to do that stuff, or?
>> Well, so data privacy laws can be really complex. When we work with the customer, we can often find that the nearest data center in their region works, but we've also explored the ability to run cloud capabilities within data centers within the region, which allows us to bridge that. We also do have offerings where we can run on-premises, but obviously our focus here is on the cloud.
>> Awesome business. Does Iron Mountain have any competitors? I mean, like...
>> Yeah.
>> You don't have to name them, but I mean, this is an awesome business. You've been around for a long time.
>> And we've found that we have new competitors now that we're in a new business.
>> They are trying to disrupt, okay. So you guys are transforming as an incumbent. You're the incumbent disruptor.
>> Yes.
>> Yes, it's self-disruption to some extent, right? Saying, hey, let's broaden our horizon, our perspective, offering value.
But I think the key thing, and I want to focus more on the competitive advantage rather than the competitors, is that we have the end-to-end flow, right? From the high-volume scanning operations, trucking, the physical world, then up into the digital world, right? So you extract it; it's not just PDFs. And then you go into databases, machine learning, unstructured-to-structured extraction. And then above that, value-added models. It's not just about classification. Well, now that you have classified and you have all these documents and all this data, what can you glean from it? What can you learn about your customers, your customers' customers, and provide them better services? So we are adding value all throughout this chain. And I think we are the only ones that can do that full stack.
>> That's the real competitive advantage. Guys, really super exciting. Congratulations on getting there. I know it's been a lot of hard work and engineering, and way to go.
>> Thank you.
>> It's fun.
>> Dave: It's good. Hope to have you back.
>> Thanks.
>> All right, and thank you for watching. This is Dave Vellante for theCUBE, the leader in live tech coverage.
(upbeat music)
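The "unstructured to structured extraction" step mentioned in the conversation can be pictured with a small sketch. It assumes OCR text is already available for a document classified as an invoice, and it uses hand-written regular expressions purely for illustration; the field names, the patterns, and `extract_invoice_fields` are hypothetical, and a production system would more plausibly rely on trained extraction models or a managed OCR service.

```python
import re
from typing import Optional

# Hypothetical patterns for two invoice fields. Real documents vary widely,
# so these are illustrative only.
INVOICE_NO = re.compile(
    r"Invoice\s*(?:No\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
)
TOTAL = re.compile(
    r"Total\s*(?:Due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
)


def _first(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    """Return the first captured group for pattern in text, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


def extract_invoice_fields(ocr_text: str) -> dict:
    """Turn free-form OCR text into a small structured record."""
    total = _first(TOTAL, ocr_text)
    return {
        "invoice_number": _first(INVOICE_NO, ocr_text),
        # Strip thousands separators before converting to a number.
        "total": float(total.replace(",", "")) if total else None,
    }


if __name__ == "__main__":
    sample = "ACME Corp\nInvoice No: INV-1042\nTotal Due: $1,250.00\n"
    print(extract_invoice_fields(sample))
```

Fields that cannot be found come back as `None` rather than a guess, which leaves room for the kind of exception review the speakers describe.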
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jane | PERSON | 0.99+ |
Adam Williams | PERSON | 0.99+ |
Som Shahapurkar | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
2006 | DATE | 0.99+ |
Dave | PERSON | 0.99+ |
$4 billion | QUANTITY | 0.99+ |
Adam | PERSON | 0.99+ |
$13 billion | QUANTITY | 0.99+ |
20 plus years | QUANTITY | 0.99+ |
six | QUANTITY | 0.99+ |
Iron Mountain | ORGANIZATION | 0.99+ |
Word | TITLE | 0.99+ |
100% | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
North America | LOCATION | 0.99+ |
one | QUANTITY | 0.99+ |
seven years | QUANTITY | 0.99+ |
today | DATE | 0.98+ |
Kubernetes | TITLE | 0.98+ |
Som | PERSON | 0.98+ |
Latin America | LOCATION | 0.98+ |
Google Docs | TITLE | 0.98+ |
EU | LOCATION | 0.98+ |
this week | DATE | 0.98+ |
MongoDB | TITLE | 0.97+ |
first | QUANTITY | 0.96+ |
Iron Mountain | LOCATION | 0.96+ |
six major markets | QUANTITY | 0.95+ |
one story | QUANTITY | 0.93+ |
pandemic | EVENT | 0.9+ |
12, 16 million documents a day | QUANTITY | 0.9+ |
millions of documents per minute | QUANTITY | 0.89+ |
second | QUANTITY | 0.87+ |
eight million documents a day | QUANTITY | 0.87+ |
last two weeks | DATE | 0.85+ |
SQL | TITLE | 0.82+ |
seconds | QUANTITY | 0.77+ |
about | QUANTITY | 0.76+ |
Invent 2021 | EVENT | 0.72+ |
Tesseract | TITLE | 0.72+ |
of years ago | DATE | 0.71+ |
re: | EVENT | 0.7+ |
many years ago | DATE | 0.68+ |
Invent | EVENT | 0.67+ |
rules of civil procedure | TITLE | 0.66+ |
eight | QUANTITY | 0.63+ |
theCUBE | ORGANIZATION | 0.51+ |
2021 | DATE | 0.48+ |