Santhosh Mahendiran, Standard Chartered Bank | BigData NYC 2017
>> Announcer: Live from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat techno music)

>> John: Okay, welcome back, we're live here in New York City. It's theCUBE's presentation of Big Data NYC, our fifth year doing this event in conjunction with Strata Data, formerly Strata Hadoop, formerly Strata Conference, formerly Hadoop World; we've been there from the beginning, eight years covering the Hadoop ecosystem, now Big Data. This is theCUBE, I'm John Furrier. Our next guest is Santhosh Mahendiran, the global head of technology analytics at Standard Chartered Bank. A practitioner in the field, here getting the data, checking out the scene, and giving a presentation on your data journey at a bank, a big financial institution and obviously an adopter. Welcome to theCUBE.

>> Santhosh: Thank you very much.

>> John: So we always want to know what the practitioners are doing, because at the end of the day there are a lot of vendors selling stuff here, so everyone's got their story. At the end of the day, you've got to implement.

>> Santhosh: That's right.

>> John: And one of the themes is data democratization, which sounds warm and fuzzy: collaborating with data, this is all good stuff, you feel good and you move into the future. But at the end of the day it's got to have business value.

>> Santhosh: That's right.

>> John: And as you look at that, how do you look at the business value? Because you want to be on the bleeding edge, you want to provide value and get that edge operationally.

>> Santhosh: That's right.

>> John: Where's the value in data democratization? How did you guys roll this out? Share your story.

>> Santhosh: Okay, so let me start with the journey first before I come to the value part of it, right? Data democratization is an outcome, but the journey is something we started three years back. So what did we do? We had some guiding principles to start our journey. The first was to say that we believed in the three S's: speed, scale, and it should be really flexible and super fast. One of the challenges we had was that our historical data warehouses were becoming entirely redundant. And why was that? Because they were RDBMS-centric and extremely disparate, so we weren't able to scale up to meet the demands of managing huge chunks of data. So the first step was to re-pivot and say, okay, let's embrace Hadoop. And what we mean by embracing is not just putting in a data lake; we said that all our data will land in the data lake. This journey started in 2015, and we now have close to 80% of the bank's data in the lake. It is end-of-day data right now, this data flows in on a daily basis, and we have consumers who feed off that data. Now coming to your question about--

>> John: So the data lake's working?

>> Santhosh: The data lake is working, up and running.

>> John: People like it, you've got a good spot; you batch it all, you throw everything in the lake.

>> Santhosh: So it is not real-time, it is end of day. There is some data that is real-time, but the data lake is not entirely real-time, that I have to tell you. But one part is that the data lake is working.
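By way of illustration only, a minimal sketch of the end-of-day landing pattern described here, with daily extracts appended to a date-partitioned lake, might look like the following in PySpark. The paths, file format, and partition column are assumptions for the example, not the bank's actual pipeline.

```python
# Hypothetical end-of-day landing job: batch-load a source extract
# into a Hadoop-backed data lake, partitioned by business date.
# Paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("eod-landing").getOrCreate()

business_date = "2017-09-27"  # normally supplied by the scheduler

# Read the end-of-day extract delivered by an upstream system.
extract = (spark.read
           .option("header", True)
           .csv(f"/landing/source_system/{business_date}/*.csv"))

# Stamp each row with its business date so consumers can filter by day.
landed = extract.withColumn("business_date", F.lit(business_date))

# Append into the lake as a date-partitioned Parquet dataset;
# downstream consumers (BI, data prep tools) read from here.
(landed.write
 .mode("append")
 .partitionBy("business_date")
 .parquet("/lake/source_system/eod"))

spark.stop()
```

Consumers then read the Parquet dataset directly, which is what lets downstream tools work off the full data rather than extracts.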
>> Santhosh: The second part of your question is, how do I actually monetize it? Are you getting some value out of it? That's where tools like Paxata have enabled us to accelerate this journey. We call it data democratization, and the best part is that it's not just about having the data. We want the business users to actually use the data. Typically, data has always been either delayed or denied to end-users; in most cases we had end-users waiting for data they couldn't get access to. That was primarily because the size of the data was too huge and it wasn't flexible enough to be shared. So how did tools like Paxata and the data lake help? What we did with data democratization is basically to say, "hey, we'll give end-users access to the data first, in a fast manner, in a self-service manner, and with operational assurance around the data." You don't hold the data back and then hand over a subset to play with. We'll give you the entire dataset, and we'll give you the right tools to play with it. Most importantly, from an IT perspective, we'll be able to govern it. That's the key to democratization. It's not about just giving them a tool and all the data and saying, "go figure it out." It's about ensuring that, okay, you've got the tools and you've got the data, but we'll also govern it, so that you obviously have control over what they're doing.

>> John: So now you govern it; they don't have to get involved in the governance, they just have access?

>> Santhosh: No, they don't need to. Yeah, they have access. So governance works both ways. We establish the boundaries. Look at it as a referee saying, "okay, there are guidelines you don't cross," and within the datasets that people have access to, you can further set rules.
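To make that two-level governance idea concrete, here is a minimal, entirely hypothetical sketch of how dataset-level boundaries plus finer rules within a dataset might be expressed. The policy structure and names are assumptions for illustration, not a real Paxata or Standard Chartered API.

```python
# A minimal, hypothetical sketch of two-level governance: dataset-level
# boundaries (the "referee") plus finer rules within each dataset.
# Names and structure are illustrative, not a real Paxata or bank API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

@dataclass
class DatasetPolicy:
    allowed_groups: Set[str]                                # boundary: who may access the dataset
    masked_columns: Set[str] = field(default_factory=set)   # rule: columns hidden from view
    row_filter: Optional[Callable[[dict], bool]] = None     # rule: rows a user may see

POLICIES: Dict[str, DatasetPolicy] = {
    "aml_alerts": DatasetPolicy(
        allowed_groups={"financial_crime", "audit"},
        masked_columns={"customer_name"},
        row_filter=lambda row: row.get("region") == "SG",
    ),
}

def read_dataset(name: str, user_groups: Set[str], rows: List[dict]) -> List[dict]:
    policy = POLICIES[name]
    # Level 1: the boundary check; users outside the allowed groups stop here.
    if not user_groups & policy.allowed_groups:
        raise PermissionError(f"No access to dataset {name!r}")
    # Level 2: rules applied within the dataset, transparent to the user.
    kept = rows if policy.row_filter is None else [r for r in rows if policy.row_filter(r)]
    return [{k: ("***" if k in policy.masked_columns else v) for k, v in r.items()}
            for r in kept]
```

A call like `read_dataset("aml_alerts", {"financial_crime"}, rows)` would return only the permitted rows with sensitive columns masked, while a user outside the allowed groups is stopped at the boundary.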
>> Santhosh: Now, coming back to specific use cases, I can talk about two that actually helped us move the needle. The first is stress testing. Being a financial institution, we typically have to report various numbers to our regulators, and the turnaround time was extremely long. These kinds of stress tests typically involve taking huge amounts--

>> John: What were some of the turnaround times?

>> Santhosh: Normally it was two to three weeks, in some cases a month--

>> John: Wow.

>> Santhosh: We were able to narrow it down to days. As with any stress testing or reporting, it involved taking huge amounts of data, crunching it, running some models, and then showing the output; basically, a number of transformations are involved. Earlier, you couldn't even access the entire dataset, so that's the first thing we solved--

>> John: So check, that was a good step one--

>> Santhosh: That was step one.

>> John: But was there automation involved in that, the Paxata piece?

>> Santhosh: Yeah, I wouldn't say it was fully automated end-to-end, but there was definitely automation, given that Paxata now works off the data rather than someone extracting the data and then going off and figuring out what needs to be done. The ability to work off the entire dataset was a big plus. So that's stress testing, bringing down the cycle time. The second use case I can talk about is anti-money laundering, in our financial crime compliance space. We had processes that took time to report, given the clunkiness of the various handoffs we needed to do. But again, empowering the users, giving the tool to them and then saying, "hey, this"--

>> John: How about know your customer? Because for anti-money laundering you need to know your user base; that's all set there too?

>> Santhosh: Yeah. The good part is know your user, know your customer, KYC, all that part is set. But the key part is making sure the end-users are able to access the data much earlier in the life cycle and are able to play with it. In the case of anti-money laundering, again, a process of three to four weeks was shortened to a matter of days by giving them tools like Paxata, in a structured manner and one that we're able to govern.

>> John: You control this, so you knew what you were doing, but you let their tools do the job?

>> Santhosh: Correct. Look at it this way: typically, the data journey has always been IT-led, never business-led. Look at the generations of what happens. You source the data, which is IT-led; then you model the data, which is IT-led; then you prepare and massage the data, which is again IT-led; and then you have tools on top of it, which are again IT-led. So the end-users get it only after the fourth stage. Apart from sourcing the data, which is typically an IT task, the rest should be done by the actual business users, and that's what we did. That's the progression of the generations; we're now in the third generation, as I call it, where our role is just to source the data and govern it, and the preparation--

>> John: It's really an operating system. We were talking with Aaron, Alation's co-founder, and we used the analogy of a car: how this show was like a car show that started as an engine show, what's in the engine and the technology, and then it evolved every year. Now we're talking about the cars, now we're talking about the driver experience--

>> Santhosh: That's right.

>> John: At the end of the day, you just want to drive. You don't really care what's under the hood, you do but you don't, but there are those people who do care what's under the hood, so you can have the best of both worlds. You've got the engines, you've set up the infrastructure, but ultimately, on the business side, you just want to drive. That's what you're getting at?

>> Santhosh: That's right. It's about time-to-market and the speed to empower users to play around with the data. IT churning the data and confining access to it, that's a thing of the past. We want more users to have faster access to data, but at the same time to govern it in a seamless manner. The word governance is still important, because it's not about just handing out the data.

>> John: And seamless is key.

>> Santhosh: Seamless is key.

>> John: Because if you have democratization of data, you're implying that it is community-oriented, meaning it's available, with access privileges handled transparently or abstracted away from the users.

>> Santhosh: Absolutely.

>> John: So here's the question I want to ask you. There's been talk, and I've been saying it for years, going back to 2012, that an abstraction layer, a data layer, would evolve and be the real key. And here at this show I heard things like "intelligent information fabric" that is business- and consumer-friendly. Okay, it's a mouthful, but intelligent information fabric in essence talks about an abstraction layer--

>> Santhosh: That's right.

>> John: That doesn't really compromise anything but gives some enablement, creates some enabling value--

>> Santhosh: That's right.

>> John: For software. How do you see that?

>> Santhosh: As the word suggests, the earlier model was trying to build something for the end-users, but not something that was end-user friendly. Let me give you a simple example. You had a data model that existed. Historically, the way we have approached using data is to say, "hey, I've got a model, let's fit the data into this model," without actually asking, "does this model actually serve the purpose?"
You abstracted the model to a higher level. The whole point about intelligent data, I'll give you a very simple analogy: take zip codes. A zip code in the US is very different from a zip code in India, which is very different from a zip code in Singapore. So imagine if, as my data comes in, I can say not just "I know this is a zip code" but "this zip code belongs to the US, this one belongs to Singapore, and this one belongs to India." And more importantly, if I can rev it up a notch, I can say, "this belongs to India, and this zip code is valid." Look at where I'm going with intelligent sense. In the earlier model, all you could say was "this is a placeholder for a zip code." That makes sense, but what are you doing with it?

>> John: In a relational database model, it's just a field in a schema; you're taking it, abstracting it, and creating value out of it.

>> Santhosh: Precisely. So what I'm actually doing is accelerating adoption; I'm making it simpler for users to understand what the data is. As a user, I don't need to figure out, "I've got a zip code, now is it a Singapore zip code, an India zip code, or something else?"
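As a rough illustration of the semantic typing described here, and only an illustration, a tool could match incoming values against per-country zip code patterns and flag both the likely country and basic validity. The patterns below cover only the surface formats (a US five-digit ZIP with optional +4, six-digit Singapore postal codes, six-digit Indian PIN codes that don't start with zero); real validation would check against actual postal registries.

```python
# Hypothetical sketch of "intelligent" zip code typing: infer the likely
# country of a postal code from its surface format and flag validity.
# Patterns are simplified; a real tool would consult postal registries.
import re

ZIP_PATTERNS = {
    "US": re.compile(r"^\d{5}(-\d{4})?$"),   # 12345 or 12345-6789
    "SG": re.compile(r"^\d{6}$"),            # six digits
    "IN": re.compile(r"^[1-9]\d{5}$"),       # six digits, not starting with 0
}

def classify_zip(value: str) -> dict:
    """Return the countries whose zip code format this value matches."""
    value = value.strip()
    matches = [country for country, pattern in ZIP_PATTERNS.items()
               if pattern.match(value)]
    return {"value": value, "candidates": matches, "valid": bool(matches)}

if __name__ == "__main__":
    for raw in ["94105", "94105-1234", "018956", "560001", "ABCDE"]:
        print(classify_zip(raw))
```

Note that a six-digit code like 560001 matches both the Singapore and India patterns, which is exactly why format alone isn't enough and a registry lookup or other context is needed to "rev it up a notch."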
>> John: So all this automation; Paxata's got a good system, and we'll come back to the Paxata question in a second, I do want to drill down on that. But the big thing I've been seeing at the show, and Dave Vellante, my partner and co-CEO of SiliconANGLE, and I talk about this all the time: he's less bullish on Hadoop than I am. I love Hadoop, I think it's great, but it's not the end-all, be-all; it's a great use case. We were critical early on, and the thing we were critical of was that too much time was being spent on the engine and how things are built, not on the business value. So there was a lull period in the business where it was just too costly--

>> Santhosh: That's right.

>> John: Total cost of ownership was a huge, huge problem.

>> Santhosh: That's right.

>> John: So now, today, how did you deal with that, and are you measuring TCO, total cost of ownership? Because at the end of the day it's time to value: can you be up and running in 90 days with value, can you keep doing that, and what's the overall cost to get there? Thoughts?

>> Santhosh: Look, I think TCO always underpins any technology investment. If someone said they were making a technology investment without thinking about TCO, I don't think they'd be a good technology leader, so TCO is obviously a driving factor. But TCO has multiple components. One is the TCO of the solution; the other is the value I'm going to get out of the system. From an implementation perspective, what I look at as TCO is my whole ecosystem: hardware and software. You spoke about Hadoop, you spoke about RDBMS; is Hadoop cheaper? I don't want to get into that debate, but what I know is that the ecosystem is becoming much, much cheaper than before. And when I talk about the ecosystem, I'm talking about RDBMS tools, Hadoop, BI tools, governance, this whole framework becoming cheaper. It is also underpinned by the fact that hardware is becoming cheaper. So the reality is that all components in the whole ecosystem are becoming cheaper, and given that software is becoming more open-source and people are open to using open-source software, the whole question of TCO becomes much more pertinent. Now, coming to your point, do we measure it regularly? The honest answer is that I don't think we're doing a good job of measuring it that well, but we do have it as one of the criteria for measuring the success of a project. The way we do it is through our implementation cost. At the time of writing our PEDs, what we call Project Execution Documents, we talk about cost. We ask, "what's the implementation cost?" What are the business cases that will be an outcome of this? I'll give you the example of our anti-money laundering work. I told you we reduced the cycle time from a few weeks to a few days, and that in turn reduces the number of people involved in the whole process, the overheads, and the operational folks involved. That itself tells you how much we're able to save. So definitely, TCO is there, and to say that--

>> John: And you are mindful of it, it's what you look at, it's key. TCO is on your radar 100%, you evaluate it in your deals?

>> Santhosh: Yes, we do.

>> John: So Paxata, what's so great about Paxata? Obviously you've had success with them; you're a customer. What's the deal? Was it the tech, was it the automation, the team? What was the key thing that got you engaged with them, or specifically, why Paxata?

>> Santhosh: Look, I think the key is the partnership. There cannot be one ingredient that makes a partnership successful; there are multiple. We were one of the earliest adopters of Paxata. Given that we're a bank with many different systems and a lot of manual processing involved, we saw Paxata as a good fit to govern these processes while ensuring users don't lose the experience. The thing we liked about Paxata was, obviously, the simplicity and the look and feel of the tool. That's number one; simplicity was a big point. The second is scale. The fact that it can take in millions of rows means it's not about working off a sample of the data; it can work on the entire dataset. That's very key for us. The third is that it leverages our ecosystem. It's not about saying, "okay, give me this data and let me go figure out what to do with it": Paxata works off the data lake. The fact that it can leverage the lake we built, and that it's a simple self-service preparation tool that doesn't require a lot of time to bootstrap, so end-users like you--

>> John: So it makes it usable.

>> Santhosh: It's extremely user-friendly and usable in a very short period of time.

>> John: And that helped with the journey?

>> Santhosh: That really helped with the journey.

>> John: Santhosh, thanks so much for sharing. Santhosh Mahendiran, the global head of technology analytics at Standard Chartered Bank. Again, financial services, always a great early adopter, and you've got success under your belt, congratulations. Data democratization is huge, and again, it's an ecosystem; you've got all that anti-money laundering to figure out, you've got to get those reports out, a lot of heavy lifting?

>> Santhosh: That's right.

>> John: So thanks so much for sharing your story.

>> Santhosh: Thank you very much.

>> John: We'll have more coverage after this short break. I'm John Furrier, stay tuned. More live coverage in New York City; it's theCUBE.