Paul Barth, Podium Data | The Podium Data Marketplace
(light techno music)

>> Narrator: From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now here's your host, Stu Miniman.

>> Hi, I'm Stu Miniman and welcome to theCUBE conversation here in our Boston area studio. Happy to welcome back to the program, Paul Barth, who's the CEO of Podium Data, also a Boston area company. Paul, great to see you.

>> Great to see you, Stu.

>> Alright, so we last caught up with you, it was a fun event that we do at MIT talking about information, data quality, kind of understand why your company would be there. For our audience that doesn't know, just give us a quick summary, your background, what was kind of the why of Podium Data back when it was founded in 2014.

>> Oh that's great Stu, thank you. I've spent most of my career helping large companies with their data and analytic strategies, next generation architectures, new technologies, et cetera, and in doing this work, we kept stumbling across the complexity of adopting new technologies. And around the time that big data and Hadoop were getting popular, with lots of hype in the marketplace, we realized that traditional large businesses couldn't manage data on this because the technology was so new and different. So we decided to form a software company that would automate a lot of the processing, manage a catalog of the data, and make it easy for nontechnical users to access their data.

>> Yeah, that's great. You know when I think back to when we were trying to help people understand this whole big data wave, one of the pithy things we did, it was turning all this glut of data from a problem to an opportunity, how do we put this in the hands of the users. But a lot of things kind of, we hit bumps in the road as an industry. We did studies: it was more than 50 percent of these projects that fail. You brought up a great point, tooling is tough, changing processes is really challenging. But that focus on data is core to our research, what we talk about all the time.
But now it's automation and AI/ML, choose your favorite acronym of the day. This is going to solve all the ills that the big data wave didn't do right. Right, Paul? So maybe you can help us connect the dots a little bit because I hear a lot in to the foundation that trend from the big data to kind of the automation and AI thing. So you're maybe just a little ahead of your time.

>> Well thanks, I saw an opportunity before there was anything in the marketplace that could help companies really corral their data, get some of the benefits of consolidation, some oversight in management through an automated catalog and the like. As AI has started to emerge as the next hype wave, what we're seeing consistently from our partners like DataRobot and others who have great AI technology is they're starved for good information. You can't learn automatically or even human learning if you're given inconsistent information, data that's not conformed or ready or consistent, which you can look at a lot of different events and start to build correlations. So we believe that we're still a central part of large companies building out their analytics infrastructure.

>> Okay, help us kind of look at how your users and how you fit into this changing ecosystem. We all know things are just changing so fast. From 2014 to today, Cloud is so much bigger, the big waves of IoT keep talking. Everybody's got some kind of machine learning initiative. So what're the customers looking for, how do you fit in some of those different environments?

>> I think when we formed the company we recognized that the cost performance differential of the open-source data management platforms like Hadoop and now Spark was so dramatically better than the traditional databases and data warehouses that we could transform the business process of how do you get data from raw to ready.
And that's a consistent problem for large companies: they have data in legacy formats, on mainframes, they have them in relational databases, they have them in flat files, in the Cloud, behind the firewall, and these silos continue to grow. This view of a consistent, or consistent view of your business, your customers, your processes, your operations, is central to optimizing and automating the business today. So our business users are looking for a couple of things. One thing they are looking for is some manageability and a consistent view of their data no matter where it lives, and our catalog can create that automatically in days or weeks depending on how big we go or how broadly we go. They're looking for that visibility but also they're looking for productivity enhancements, which means that they can start leveraging that data without a big IT project. And finally they're looking for agility, which means there's self-service, there's an ability to access data that you know is trusted and secured and safe for the end users to use without having to call IT and have a program spin something up. So they're really looking for a totally new paradigm of data delivery.

>> I tell you that hits on so many things that we've been seeing and a challenge that we've seen in the marketplace. In my world, talk about people they had their data centers and if I look at my data and I look at my applications, it's this heterogeneous nightmare. We call it hybrid or multi cloud these days, and it shows the promise of making me faster and all this stuff. But as you said, my data is all over the place, my applications are getting spun up and maybe I'm moving them and federating things and all that. But, my data is one of the most critical components of my business. Maybe explain a little bit how that works. Where do the customers come in and say oh my gosh, I've got a challenge and Podium Data's helping and the marketplace and all that.
>> Sure, first of all we targeted from the start large regulated businesses, financial services, pharmaceutical, healthcare, and we've broadened since then. But these companies' data issues were really pressure from both ends. One was a compliance pressure. They needed to develop regulatory reports that could be audited and proven correct. If your data is in many silos and it's compiled manually using spreadsheets, that's not only incredibly expensive and nonreproducible, it's really not auditable. So a lot of these folks were pressured to prove that the data they were reporting was accurate. On the other side, it's the opportunity cost. Fintech companies are coming into their space offering loans and financial products, without any human interaction, without any branches. They knew that data was the center to that. The only way you can make an offer to someone for a financial product is if you know enough about them that you understand the risk. So the use and leverage of data was a very critical mass. There was good money to invest in it and they also saw that the old ways of doing this just weren't working.

>> Paul, does your company help with the incoming GDPR challenges that are being faced?

>> Sure, last year we introduced a PII detector and protection scheme. That may not sound like such a big deal but in the Hadoop open-source world it is. At the end of the day this technology, while cheap and powerful, is incredibly immature. So when you land data, for example, into these open data platforms like S3 out in the Cloud, Podium takes the time to analyze that data and tell you what the structures of the data are, where you might have issues with sensitive data, and has the tooling like obfuscation and encryption to protect the data so you can create safe to use data. I'd say our customers right now, they started out behind the firewall. Again, these regulated businesses were very nervous about breaches.
They're looking and realizing they need to get to the Cloud 'cause frankly not only is it a better platform for them from a cost basis and scalability, it's actually where the data comes from these days, their data suppliers are in the Cloud. So we're helping them catalog their data and identify the sensitive data and prepare data sets to move to the Cloud and then migrate it to the Cloud and manage it there.

>> Such a critical piece. I lived in the storage world for about a decade. There was a little acquisition that they made of a company called Pi, P-I. It was Paul Maritz who a lot of people know, Paul had a great career at Microsoft, went on to run VMware for a bunch. But it was, the vision you talk about reminds me of what I heard Paul Maritz talking to. Gosh, that was a decade ago. Information, so much sensitivity. Expand a little bit on the security aspect there, when I looked through your website, you're not a security company per se, but are there partnerships? How do you help customers with I want to leverage data but I need to be secure, all the GRC and security things that's super challenging.

>> In this space, to achieve agility and scale on a new technology, you have to be enterprise ready. So in version one of our product, we had security features that included field level encryption and protection, but also integration with LDAP and Kerberos and other enterprise standard mechanisms and systems that would protect data. We can interoperate with Protegrity's and other kinds of encryption and protection algorithms with our open architecture. But it's kind of table stakes to get your data in a secured, monitorable infrastructure if you're going to enable this agility and self-service. Otherwise you restrict the use of the new data technologies to sandboxes. The failures you hear about are not in the sandboxes in the exploration, they're in getting those to production.
I had one of my customers talk about how before Podium they had 50 different projects on Hadoop and all of them were in code red and none of them could go to production.

>> Paul you mentioned catalogs, give us the update. What's the newest from Podium Data? Help explain that a little bit more.

>> So we believe that the catalog has to help operationalize the data delivery process. So one of the things we did from the very start was say let's use the analytical power of big data technologies, Spark, Hadoop, and others, to analyze the data on its way into the platform and build a metadata catalog out of that. So we have over 100 profiling statistics that we automatically calculate and maintain for every field of every file we ever load. It's not something you do as an afterthought or selectively. We knew from our experience that we needed to do that, data validation, and then bring in inferences such as this field looks like PII data and tag that in the metadata. That process of taking in data, and this even applies to legacy mainframe data coming in a VSAM format. It gets converted and landed to a usable format automatically. But the most important part is the catalog gets enriched with all this statistical profiling information, validation, all of the technical information, and we interoperate as well as have a GUI to help with business tagging, business definitions and the like.

>> Paul, just a little bit of a broader industry question, we talked about the value of data, I think everybody understands how important it is. How are we doing in understanding the value of that data though, is that a monetization thing? You've got academia in your background, there's debates, we've talked to some people at MIT about this. How do you look at data value as an industry in general, is there anything from Podium Data that you help people identify, are we leveraging it, are we doing the most, what are your thoughts around that?
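The profiling step Paul describes in this exchange (statistics computed for every field on load, plus an inference such as "this field looks like PII" stored as a tag) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Podium's implementation: the function names, the two regular expressions, and the 0.8 threshold are all hypothetical, and only four statistics are shown rather than the 100-plus mentioned in the interview.

```python
import re
from collections import Counter

# Hypothetical PII patterns, chosen for illustration only. A real detector
# would use many more patterns plus validation logic.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile_field(values):
    """Compute a few profiling statistics for one field of string values."""
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    pii_hits = sum(
        1 for v in non_null if SSN_PATTERN.match(v) or EMAIL_PATTERN.match(v)
    )
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        "max_length": max((len(v) for v in non_null), default=0),
        # Assumed rule: tag the field when at least 80% of its non-null
        # values match one of the PII patterns.
        "looks_like_pii": bool(non_null) and pii_hits / len(non_null) >= 0.8,
    }


def build_catalog(rows):
    """Profile every field of a list of dict records (one dict per row)."""
    fields = list(rows[0].keys()) if rows else []
    return {f: profile_field([r.get(f) for r in rows]) for f in fields}


# Made-up sample records standing in for a freshly loaded file.
rows = [
    {"name": "Ann", "ssn": "123-45-6789", "state": "MA"},
    {"name": "Bob", "ssn": "987-65-4321", "state": "MA"},
    {"name": "Cy", "ssn": "", "state": "NY"},
]
catalog = build_catalog(rows)
```

Here `catalog["ssn"]` carries the PII tag alongside its null and distinct counts, which is the kind of metadata a downstream obfuscation or encryption step could key off.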
>> So I'd say someone who's looking for a good framework to think about this, I'd recommend Doug Laney's book on infonomics, we've collaborated for a while, he's doing a great job there. But there's also just a blocking and tackling, which is what data is getting used, or a common one for our customers is where do I have data that's duplicate or it comes from the same source but it's not exactly the same. That often causes reconciliation issues in finance, or in forecasting, in sales analysis. So what we've done with our data catalog with all these profiling statistics is start to build some analytics that identify similar data sets that don't have to be exactly the same to say you may have a version of the data that you're trying to load here already available. Why don't you look at that data set and see if that one is preferred, and the data governance community really likes this. For one of our customers there were literally millions of dollars in savings of eliminating duplication, but the more important thing is the inconsistency, when people are using similar but not the same data sets. So we're seeing that as a real driver.

>> I want to give you the final word. Just what are you seeing out in the industry these days, biggest opportunities, biggest challenges from users you're talking to?

>> Well, what I'd say is when we started this it was very difficult for traditional businesses to use Hadoop in production and they needed an army of programmers, and I think we solved that. Last year we started on our work to move to a post-Hadoop world, so the first thing we've done is open up our cataloging tools so we can catalog any data set in any source and allow the data to be brought into an analytical environment or production environment more on demand than the idea that you're going to build a giant data lake with everything in it and replicate everything.
That's become really interesting because you can build the catalog in a few weeks and then actually use the analysis and all the contents to drive the strategy. What do I prioritize, where do I put things? The other big initiative is of course, Cloud. As I mentioned earlier you have to protect and make Cloud ready data behind your firewall and then you have to know where it's used and how it's used externally. We automate a lot of that process and make that transition something that you can manage over time, and that is now going to be extended into multi cloud, multi lake type of technologies.

>> Multi cloud, multi lake, alright. Well Paul Barth, I appreciate getting the update everything happening with Podium Data. Well, theCUBE had so many events this year, be sure to check out thecube.net for all the upcoming events and all the existing interviews. I'm Stu Miniman, thanks for watching theCUBE.

(light techno music)
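The idea Paul raises earlier in this answer sequence, flagging data sets that are similar without being exactly the same, might look something like the sketch below. The interview does not say which analytics Podium uses, so Jaccard similarity over each shared field's distinct values is swapped in here purely as a stand-in, and the function names and sample data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, compared as sets of values."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dataset_similarity(left, right):
    """Average Jaccard similarity over the field names both data sets share.

    Each data set is a dict mapping field name -> list of values.
    Returns 0.0 when the data sets have no field names in common.
    """
    shared = set(left) & set(right)
    if not shared:
        return 0.0
    return sum(jaccard(left[f], right[f]) for f in shared) / len(shared)


# Two made-up extracts that came from the same source but have drifted apart.
crm_extract = {
    "customer_id": [1, 2, 3, 4],
    "state": ["MA", "NY", "MA", "CT"],
}
finance_copy = {
    "customer_id": [1, 2, 3, 5],
    "state": ["MA", "NY", "CT", "CT"],
}
score = dataset_similarity(crm_extract, finance_copy)
```

A catalog could compare a newly arriving data set against existing entries this way and suggest a likely near-duplicate to the user when the score passes some chosen threshold.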
ENTITIES
Entity | Category | Confidence |
---|---|---|
2014 | DATE | 0.99+ |
Podium Data | ORGANIZATION | 0.99+ |
Paul Maritz | PERSON | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Paul Barth | PERSON | 0.99+ |
Paul | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
Stu | PERSON | 0.99+ |
Last year | DATE | 0.99+ |
Podium | ORGANIZATION | 0.99+ |
Doug Laney | PERSON | 0.99+ |
thecube.net | OTHER | 0.99+ |
more than 50 percent | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Boston, Massachusetts | LOCATION | 0.99+ |
MIT | ORGANIZATION | 0.98+ |
GRC | ORGANIZATION | 0.98+ |
One | QUANTITY | 0.98+ |
this year | DATE | 0.98+ |
both ends | QUANTITY | 0.98+ |
50 different projects | QUANTITY | 0.97+ |
Spark | TITLE | 0.97+ |
DataRobot | ORGANIZATION | 0.97+ |
Hadoop | TITLE | 0.96+ |
S3 | TITLE | 0.95+ |
millions of dollars | QUANTITY | 0.95+ |
GDPR | TITLE | 0.95+ |
theCUBE | ORGANIZATION | 0.95+ |
a decade ago | DATE | 0.94+ |
over 100 profiling statistics | QUANTITY | 0.91+ |
Cloud | TITLE | 0.9+ |
One thing | QUANTITY | 0.87+ |
first thing | QUANTITY | 0.87+ |
VMware | TITLE | 0.86+ |
Kerberos | TITLE | 0.83+ |
The Podium Data Marketplace | ORGANIZATION | 0.79+ |
first | QUANTITY | 0.79+ |
LDAP | TITLE | 0.79+ |
Pi, P-I | ORGANIZATION | 0.77+ |
SiliconANGLE Media | ORGANIZATION | 0.61+ |
a decade | QUANTITY | 0.6+ |
wave | EVENT | 0.45+ |
Protegrity | ORGANIZATION | 0.44+ |