
Prakash Nanduri, Paxata | Corinium Chief Analytics Officer Spring 2018


 

(techno music) >> Announcer: From the Corinium Chief Analytics Officer Conference Spring San Francisco. It's theCUBE. >> Hey, welcome back everybody. Jeff Frick here with theCUBE. We're in downtown San Francisco at the Parc 55 Hotel at the Corinium Chief Analytics Officer Spring 2018 event, about 100 people, pretty intimate affair. A lot of practitioners here talking about the challenges of Big Data and the challenges of Analytics. We're really excited to have a very special Cube guest. I think he was the first guy to launch his company on theCUBE. It was Big Data New York City 2013. I remember it distinctly. It's Prakash Nanduri, the co-founder and CEO of Paxata. Great to see you. >> Great seeing you. Thank you for having me back. >> Absolutely. You know we got so much mileage out of that clip. We put it on all of our promotional materials. You going to launch your company? Launch your company on theCUBE. >> You know it seems just like yesterday but it's been a long ride and it's been a fantastic ride. >> So give us just a quick general update on the company, where you guys are now, how things are going. >> Things are going fantastic. We continue to grow. If you recall, when we launched, we launched the whole notion of democratization of information in the enterprise with self-service data prep. We have gone on to now deliver real value to some of the largest brands in the world. We're very proud that 2017 was the year when a massive amount of adoption of Paxata's adaptive information platform took place across multiple industries, financial services, retail, CPG, high tech, in the IoT space. So, we just keep growing and it's the usual challenges of managing growth and managing, you know, the change in the company as you, as you grow from being a small start-up to now being a real company. >> Right, right. There's good problems and bad problems. Those are the good problems. >> Yes, yes. >> So, you know, we do so many shows and there's two big themes over and over and over like digital transformation which gets way overused and then innovation and how do you find a culture of innovation. In doing literally thousands of these interviews, to me it seems pretty simple. It is about democratization. If you give more people the data, more people the tools to work with the data, and more people the power to do something once they find something in the data, and open that up to a broader set of people, they're going to find innovations, simply by the fact of doing it. But the reality is those three simple steps aren't necessarily very easy to execute. >> You're spot on, you're spot on. I like to say that when we talk about digital transformation the real focus should be on the data. And it really centers around data and it centers around the whole notion of democratization, right? The challenge always in large enterprises is democratization without governance becomes chaos. And we always need to focus on democratization. We need to focus on data because as we all know data is the new oil, all of that, and governance becomes a critical piece too. But as you recall, when we launched Paxata, the entire vision from day one has been, while the entire focus around digitization covers many things, right? It covers people, processes. It covers applications. It's a very large topic, the whole digital transformation of the enterprise. 
But the core foundation to digital transformation is data, democratization, and governance, and the key issue is that the companies that are going to succeed are the companies that turn data into information that's relevant for every digital transformation effort. >> Right, right. >> Because if you do not turn raw data into information, you're just dealing with raw data, which is not useful >> Jeff: Right >> And it will not be democratized. >> Jeff: Right >> Because the business will only consume the information that is contextual to their need, the information that's complete and the information that is clean. >> Right, right. >> So that's really what we're driving towards. >> And that's interesting 'cause the data, there's so many more sources of data, right? There's data that you control. There's structured data, unstructured data. You know, I used to joke, just the first question when you'd ask people "Where's your data?", half the time they couldn't even, they couldn't even get beyond that step. And that's before you start talking about cleaning it and making it ready and making it available. Before you even start to get into governance and rights and access so it's a really complicated puzzle to solve on the backend. >> I think it starts with first focusing on what are the business outcomes we are driving with digital transformation. When you double-click on digital transformation and then you start focusing on data and information, there's a few things that come to the fore. First of all, how do I leverage information to improve productivity in my company? There's multiple areas, whether it is marketing or supply chain or whatever. The second notion is how do I ensure that I can actually transform the culture in my company and attract the brightest and the best by giving them the environment where democratization of information is actually a reality, where people feel like they're empowered to access data and turn it into information and then be able to do really interesting things. Because people are not interested in being subservient to somebody who gives them the data. They want to be saying, "Give it to me. I'm smart enough. I know analytics. I think analytically and I want to drive my career forward." So the second thing is the cultural aspect to it. And the last thing, which is really important, is every company, regardless of whether you're making toothpicks or turbines, you are looking to monetize data. So it's about productivity. It's about cultural change and attracting talent. And it's about monetization. And when it comes to monetization of data, you cannot be satisfied with only covering enterprise data which is sitting in my enterprise systems. You have to be able to focus on, oh, how can I leverage the IoT data that's being generated from my products or widgets. How can I bring in social and mobile data? How can I consume that? How can I bring all of this together and get the most complete insight that I need for my decision-making process? >> Right. So, I'm just curious, how do you see it with your customers? So this is the chief analytics officer, we go to chief data officer, I mean, there's all these chief something officers that want to get involved in data and marketing is much more involved with it. Forget about manufacturing. So when you see successful cultural change, what drives that? 
Who are the people that are successful and what is the secret to driving the cultural change that we are going to be data-driven, we are going to give you the tools, we are going to make the investment to turn data, which historically was even arguably a liability 'cause you had to buy a bunch of servers to stick it on, into that now being an asset that drives actionable outcomes? >> You know, recently I was having this exact discussion with the CEO of one of the largest financial institutions in the world. This gentleman is running a very large financial services firm, is dealing with all the potential disruption where they're seeing completely new types of fintech products coming in, the whole notion of blockchain et cetera coming in. Everything is changing. Everything looks very dramatic. And what we started talking about is the first thing as the CEO that we always focus on is do we have the right people? And do we have the people that are motivated and driven to basically go and disrupt and change? For those people, you need to be able to give them the right kind of tools, the right kind of environment to empower them. This doesn't start with lip service. It doesn't start with us saying "We're going to be on a digital transformation journey" but at the same time, your data is completely in silos. It's locked up. There are 15,000 checks and balances before I can even access a simple piece of data and third, even when I get access to it, it's too little, too late or it's garbage in, garbage out. And that's not the culture. So first, it needs to be CEO driven, top down. We are going to go through digital transformation which means we are going to go through a democratization effort which means we are going to look at data and information as an asset and that means we are not only going to be able to harness these assets, but we're also going to monetize these assets. How are we going to do it? It depends very much on the business you're in, the vertical industry you play in, and your strengths and weaknesses. So each company has to look at it from their perspective. There's no one size fits all for everyone. >> Jeff: Right. >> There are some companies that have fantastic cultures of empowerment and openness but they may not have the right innovation or the right kind of product innovation skills in place. So it's about looking at data across the board. First from your culture and your empowerment, second about democratization of information which is where a company like Paxata comes in, and third, along with democratization, you have to focus on governance because we are for-profit companies. We have a fiduciary responsibility to our customers and our regulators and therefore we cannot have democratization without governance. >> Right, right >> And that's really what our biggest differentiation is. >> And then what about just in terms of the political play inside the company. You know, on one hand, it used to be, if you held the information, you had the power. And now that's changed really 'cause there's so much information. It's really, if you are the conduit of information to help people make better decisions, that's actually a better position to be in. But I'm sure there's got to be some conflicts going through digital transformation where I, you know, I was the keeper of the kingdom and now you want to open that up. 
Conversely, it must just be transformational for the people on the front lines that finally get the data that they've been looking for to run the analysis that they want to rather than waiting for the weekly reports to come down from on high. >> You bet. You know what I like to say is that if you've been in a company for 10, 15 years and if you felt like a particular aspect, purely selfishly, you felt a particular aspect was job security, that is exactly what's going to likely make you lose your job today. What you thought 10 years ago was your job security, that's exactly what's going to make you lose your job today. So if you do not disrupt yourself, somebody else will. So it's either transform yourself or not. Now this whole notion of politics and, you know, struggle within the company, it's been there for as long as, well, humans generally go towards entropy. So, if you have three humans, you have all sorts of issues. >> Jeff: Right, right. >> The issue starts frankly with leadership. It starts with the CEO coming down and not only putting an edict down on how things will be done but actually walking the walk and talking the talk. If, as a CEO, you're not transparent, if you're not trusting your people, if you're not sharing information which could be confidential, but you mention that it's confidential but you have to keep this confidential. If you trust your people, you give them the ability to, I think it's a culture change thing. And the second thing is incentivisation. You have to be able to focus on giving people the ability to say "by sharing my data, I actually become a hero." >> Right, right. >> By giving them the actual credit for actually delivering the data to achieve an outcome. And that takes a lot of work. But if you do not actually drive the cultural change, you will not drive the digital transformation and you will not drive the democratization of information. >> And have you seen people try to do it without making the commitment? Have you seen 'em pay the lip service, spend a few bucks, start a project but then ultimately they, they hamstring themselves 'cause they're not actually behind it? >> Look, I mean, there's many instances where companies start on digital transformation or they start jumping into cool terms like AI or machine-learning, and there's a small group of people who are kind of the elites that go in and do this. And they're given all the kind of attention et cetera. Two things happen. Because these people who are quote, unquote, the elite team, either they are smart but they're not able to scale across the organization or many times, they're so good, they leave. So that transformation doesn't really get democratized. So it is really important from day one to start a culture where you're not going to have a small group of exclusive data scientists. You can have those people but you need to have a broader democratization focus. So what I have seen is many of the siloed, small, tight, mini science projects end up failing. They fail because number one, either the business outcome is not clearly identified early on or two, it's not scalable across the enterprise. >> Jeff: Right. >> And a majority of these exercises fail because of the whole information foundation: taking raw data and turning it into clean, complete, potentially consumable information to feed across the organization, not just for one siloed group, not just one data science team. But how do you do that across the company? That's what you need to think about from day one. 
When you do these siloed things, these departmental things, a lot of times they can fail. Now, it's important to say "I will start with a couple of test cases" >> Jeff: Right, right. >> "But I'm going to expand it across, from the beginning to think through that." >> So I'm just curious, your perspective, are there some departments that are the ripest for being that leading edge of the digital transformation in terms of, they've got the data, they've got the right attitude, they're just a short step away. Where have you seen the great place to succeed when you're starting on kind of a smaller POC, I don't know if you'd say POC, project or department level? >> So, it's funny but you will hear this, it's not rocket science. Always they say, follow the money. So, in a business, there are three incentives, making more money, saving money, or staying out of jail. (laughs) >> Those are good. I don't know if I'd put them in that order but >> Exactly, and you know what? Depending on who you are, you may have a different order but staying out of jail is pretty high on my list. >> Jeff: I'm with you on that one. >> So, what are the areas? Risk and compliance. Right? >> Jeff: Right, right. >> That's one of those things where you absolutely have to deliver. You absolutely have to do it. It's significantly high cost. It's very data and analytics centric and if you find a smart way to do it, you can dramatically reduce your cost. You can significantly increase your quality and you can significantly increase the volume of your insights and your reporting, thereby achieving all the risk and compliance requirements but doing it in a smarter way and a less expensive way. >> Right. >> That's where incentives have really been high. Second, in making money, it always comes down to sales and marketing and customer success. Those are the three things, sales, marketing, and customer success. So most of our customers who have been widely successful are the ones who have basically been able to go and say "You know what? It used to take us eight months to be able to even figure out a customer list for a particular region. Now it takes us two days because of Paxata and because of the data prep capabilities and the governance aspects." That's the power that you can deliver today. And when you see one person who's a line of business person who says "Oh my God. What used to take me eight months, now it's done in half a day." Or "What used to take me 22 days to create a report is now done in 45 minutes." All of a sudden, you will not have a small kind of trickle down, you will have a tsunami of democratization with governance. That's what we've seen in our customers. >> Right, right. I love it. And this is just so classic too. I always like to joke, you know, back in the day, you would run your business based on reports from old data. Now we want to run your business with stuff you can actually take action on now. >> Exactly. I mean, this is public, Shameek Kundu, the chief data officer of Standard Chartered Bank and Michael Gorriz, who's the global CIO of Standard Chartered Bank, they have embraced the notion that information democratization in the bank is a foundational element to the digital transformation of Standard Chartered. They are very forward thinking and they're looking at how do I democratize information for all our 87,500 employees while we maintain governance? 
And another major thing that they are looking at is they know that the data that they need to manipulate and turn into information is not sitting only on premise. >> Right, right. >> It's sitting across a multi-cloud world and that's why they've embraced the Paxata information platform to be their information fabric for a multi-cloud hybrid world. And this is where we see successes and we're seeing more and more of this, because it starts with the people. It starts with the line of business outcomes and then it starts with looking at it from scale. >> Alright, Prakash, well always great to catch up and enjoy really watching the success of the company grow since you launched it many moons ago in New York City >> yes Fantastic. Always a pleasure to come back here. Thank you so much. >> Alright. Thank you. He's Prakash, I'm Jeff Frick. You're watching theCUBE from downtown San Francisco. Thanks for watching. (techno music)

Published Date : May 17 2018


Prakash Nanduri, Paxata | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and it's ecosystem sponsors. (upbeat techno music) >> Hey, welcome back, everyone. Here live in New York City, this is theCUBE from SiliconANGLE Media Special. Exclusive coverage of the Big Data World at NYC. We call it Big Data NYC in conjunction also with Strata Hadoop, Strata Data, Hadoop World all going on kind of around the corner from our event here on 37th Street in Manhattan. I'm John Furrier, the co-host of theCUBE with Peter Burris, Head of Research at SiliconANGLE Media, and General Manager of WikiBon Research. And our next guest is one of our famous CUBE alumni, Prakash Nanduri co-founder and CEO of Paxata who launched his company here on theCUBE at our first inaugural Big Data NYC event in 2013. Great to see you. >> Great to see you, John. >> John: Great to have you back. You've been on every year since, and it's been the lucky charm. You guys have been doing great. It's not broke, don't fix it, right? And so theCUBE is working with you guys. We love having you on. It's been a pleasure, you as an entrepreneur, launching your company. Really, the entrepreneurial mojo. It's really what it's all about. Getting access to the market, you guys got in there, and you got a position. Give us the update on Paxata. What's happening? >> Awesome, John and Peter. Great to be here again. Every time I come here to New York for Strata I always look forward to our conversations. And every year we have something exciting and new to share with you. So, if you recall in 2013, it was a tiny little show, and it was a tiny little company, and we came in with big plans. And in 2013, I said, "You know, John, we're going to completely disrupt the way business consumers and business analysts turn raw data into information and they do self-service data preparation." That's what we brought to the market in 2013. Ever since, we have gone on to do something really exciting and new for our customers every year. In '14, we came in with the first Apache Spark-based platform that allowed business analysts to do data preparation at scale interactively. Every year since, last year we did enterprise grade and we talked about how Paxata is going to be delivering our self-service data preparation solution in a highly-scalable enterprise grade deployment world. This year, what's super exciting is in addition to the recent announcements we made on Paxata running natively on the Microsoft Azure HDI Spark system. We are truly now the only information platform that allows business consumers to turn data into information in a multi-cloud hybrid world for our enterprise customers. In the last few years, I came and I talked to you and I told you about work we're doing and what great things are happening. But this year, in addition to the super-exciting announcements with Microsoft and other exciting announcements that you'll be hearing. You are going to hear directly from one of our key anchor customers, Standard Chartered Bank. 150-year-old institution operating in over 46 countries. One of the most storied banks in the world with 87,500 employees. >> John: That's not a start up. >> That's not a start up. (John laughs) >> They probably have a high bar, high bar. They got a lot of data. >> They have lots of data. And they have chosen Paxata as their information fabric. We announced our strategic partnership with them recently and you know that they are going to be speaking on theCUBE this week. 
And what started as a little experiment, just like our experiment in 2013, has actually mushroomed now into Michael Gorriz, and Shameek Kundu, and the entire leadership of Standard Chartered choosing Paxata as the platform that will democratize information in the bank across their 87,500 employees. We are going in a very exciting way, a very fast way, and now delivering real value to the bank. And you can hear all about it on our website-- >> Well, he's coming on theCUBE so we'll drill down on that, but banks are changing. You talk about a transformation. What is a teller? An Internet of Things device. The watch potentially could be a terminal. So, the Internet of Things of people changes the game. Are the ATMs going to go away and become like broadcast points? >> Prakash: And you're absolutely right. And really what it is about is, it doesn't matter if you're a Standard Chartered Bank or if you're a pharma company or if you're the leading healthcare company, what it is is that every one of our customers is really becoming an information-inspired business. And what we are driving our customers to is moving from a world where they're data-driven. I think being data-driven is fine. But what you need to be is information-inspired. And what does that mean? It means that you need to be able to consume data, regardless of format, regardless of source, regardless of where it's coming from, and turn it into information that actually allows you to get insights and make decisions. And that's what Paxata does for you. So, this whole notion of being information-inspired, I don't care if you're a bank, if you're a car company, or if you're a healthcare company today, you need to have-- >> Prakash, for the folks watching that might not know our history, as you launched on theCUBE in 2013 and have been successful every year since. You guys have really been deploying the classic entrepreneurial success formula, be fast, walk the talk, listen to customers, add value. Take a minute quickly just to talk about what you guys do. Just for the folks that don't know you. >> Absolutely, let's just actually give it in the real example of, you know, a customer like Standard Chartered. Standard Chartered operates in multiple countries. They have a significant number of lines of business. And whether it's in risk and compliance, whether it is in their marketing department, whether it's in their corporate banking business, what they have to do is, a simple example could be I want to create a customer list to be able to go and run a marketing campaign. And the customer list in a particular region is not something easy for a bank like Standard Chartered to come up with. They need to be able to pull from multiple sources. They need to be able to clean the data. They need to be able to shape the data to get that list. And if you look at what is really important, the people who understand the data are actually not the folks in IT but the folks in business. So, they need to have a tool and a platform that allows them to pull data from multiple sources to be able to massage it, to be able to clean it-- >> John: So, you sell to the business person? >> We sell to the business consumer. The business analyst is our consumer. And the person who supports them is the chief data officer and the person who runs the Paxata platform on their data lake infrastructure. >> So, IT sets up the data lake and you guys just let the business guys go to town on the data. >> Prakash: Bingo. >> Okay, what's the problem that you solve? 
If you can summarize the problem that you solve for the customers, what is it? >> We take data and turn it into information that is clean, that's complete, that's consumable and that's contextual. The hardest problem in every analytical exercise is actually taking data and cleaning it up and getting it ready for analytics. That's what we do. >> It's the prep work. >> It's the prep work. >> As companies gain experience with Big Data, John, what they need to start doing increasingly is move more of the prep work, or have more of the prep work flow, closer to the analyst. And the reason's actually pretty simple. It's because of that context. Because the analyst knows more about what they're looking for and is a better evaluator of whether or not they get what they need. Otherwise, you end up in this strange cycle time problem between people in the back end that are trying to generate the data that they think they want. And so, by making the whole concept of data preparation simpler, more straightforward, you're able to have the people who actually consume the data and need it do a better job of articulating what they need, how they need it and making it presentable to the work that they're performing. >> Exactly, Peter. What does that say about how roles are starting to merge together? 'Cause you've got to be at the vanguard of seeing how some of these mature organizations are working. What do you think? Are we seeing roles start to become more aligned? >> Yes, I do think. So, first and foremost, I think what's happening is there is no such thing as having just one group that's doing data science and another group consuming. I think what you're going to be going into is a world where data and information is all-consuming and it's everybody's role. Everybody has a role in that. And everybody's going to consume. So, if you look at a business analyst that was spending 80% of their time living in Excel or working with self-service BI tools like our partners Tableau and Power BI from Microsoft, and others. What you find is these people today are living in a world where either they have to live in coding and scripting hell or they have to rely on IT to get them the real data. So, the role of a business analyst or a subject matter expert, first and foremost, the fact that they work with data and they need information, that's a given. There is no business role today where you can't deal with data. >> But it also makes them really valuable, because there aren't a lot of people who are good at dealing with data. And they're very, very reliant on these people to turn that data into something that is regarded as consumable elsewhere. So, you're trying to make them much more productive. >> Exactly. So, four years ago, when we launched on theCUBE, the whole premise was that in order to be able to really drive towards a world where you can make information and data-driven decisions, you need to ensure that the business analyst community, or what I like to call the business consumer, needs to have the power of being able to, A, get access to data, B, make sense of the data, and then turn that data into something that's valuable for her or for him. >> Peter: And others. >> And others, and others. Absolutely. And that's what Paxata is doing. In a collaborative, in a 21st Century world where I don't work in a silo, I work collaboratively. And then the tool, and the platform that helps me do that, is actually a 21st Century platform. 
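To make the pattern described above concrete, here is a minimal sketch of that kind of data prep in plain Python with pandas: pull customer records from two source systems, clean them, and shape a regional campaign list, the same shape of work as the Standard Chartered customer-list example earlier. The file names, column names, and filter values are all invented for illustration; this is not Paxata's product or API, just the underlying pattern expressed in code.

```python
# Hypothetical illustration only: plain pandas standing in for the data-prep
# steps described in the interview (pull, clean, shape). All names are made up.
import pandas as pd

# Raw extracts from two source systems (assumed CSV exports)
crm = pd.read_csv("crm_accounts.csv")       # e.g. account_id, name, email, country
core = pd.read_csv("core_banking.csv")      # e.g. account_id, segment, last_txn_date

# Clean: normalize keys, drop unusable rows, de-duplicate
crm["email"] = crm["email"].str.strip().str.lower()
crm = crm.dropna(subset=["account_id", "email"]).drop_duplicates("account_id")

# Shape: join the two sources, then filter to the region and segment of interest
customers = crm.merge(core, on="account_id", how="inner")
campaign_list = customers[
    (customers["country"] == "SG") & (customers["segment"] == "retail")
][["account_id", "name", "email"]]

campaign_list.to_csv("regional_campaign_list.csv", index=False)
```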
>> So, John, at the beginning of the session you and Jim were talking about what is going to be one of the themes here at the show. And we observed that it used to be that people were talking about setting up the hardware, setting up the clusters, getting Hadoop to work, and Jim talked about going up the stack. Well, this is one of the indicators that, in fact, people were starting to go up the stack because they're starting to worry more about the data, what it can do, the value of how it's going to be used, and how we distribute more of that work so that we get more people using data that's actually good and useful to the business. >> John: And drives value. >> And drives value. >> Absolutely. And if I may, just to put a chronological aspect to this. When we launched the company we said the business analyst needs to be in charge of the data and turning the data into something useful. Then right at that time, the world of creating data lakes came in thanks to our partners like Cloudera and Hortonworks, and others, and MapR and others. In the recent past, the world of moving from on-premise data lakes to hybrid, multi-cloud data lakes is becoming reality. Our partners at Microsoft, at AWS, and others are having customers come in and build cloud-based data lakes. So, today what you're seeing is, on one hand, this complete democratization within the business, like at Standard Chartered, where all these business analysts are getting access to data. And on the other hand, the data infrastructure is moving into a hybrid multi-cloud world. And what you need is a 21st Century information management platform that serves the needs of the business and turns that data into relevant information, ready for their consumption. While at the same time we should not forget that enterprises need governance. They need lineage. They need scale. They need to be able to move things around depending on what their business needs are. And that's what Paxata is driving. That's why we're so excited about our partnership with Microsoft, with AWS, with our customer partnerships such as Standard Chartered Bank, rolling this out in an enterprise-- 
That really doesn't work because, as Bill Schmarzo says, it's not the new fuel of business, it's new sunlight of business. And the reason why is because fuel can only be used once. >> Prakash: That's right. >> The whole point of data is that it can be used a lot, in a lot of different ways, and a lot of different contexts. And so, in many respects what we're really trying to facilitate or if someone who runs a data lake when someone in the business asks them, "Well, how do you create value for the business?" The more people, the more users, the more context that they're serving out of that common data, the more valuable the resource that they're administering. So, they want to see more utilization, more contexts, more data being moved out. But again, governance, security have to be in place. >> You bet, you bet. And using that analogy of data, and I've heard this term about data being the new oil, etc. Well, if data is the oil, information is really the refined fuel or sunlight as we like to call it. >> Peter: Yeah. >> John: Well, you're riffing on semantics, but the point is it's not a one trick pony. Data is part of the development, I wrote a blog post in 1997, I mean 2007 that said data's the new development kit. And it was kind of riffing on this notion of the old days >> Prakash: You bet. >> Here's your development kit, SDK, or whatever was how people did things back then Enter the cloud, >> Prakash: That's right. >> And boom, there it is. The data now is in the process of the refinery the developers wanted. The developers want the data libraries. Whatever that means. That's where I see it. And that is the democratization where data is available to be integrated in to apps, into feeds, into ... >> Exactly, and so it brings me to our point about what was the exciting, new product innovation announcement we made today about Intelligent Ingest. You want to be able to access data in the enterprise regardless of where it is, regardless of the cloud where it's sitting, regardless of whether it's on-premise, in the cloud. You don't need to as a business worry about whether that is a JSON file or whether that's an XML file or that's a relational file. That's irrelevant. What you want is, do I have the access to the right data? Can I take that data, can I turn it into something valuable and then can I make a decision out of it? I need to do that fast. At the same time, I need to have the governance and security, all of that. That's at the end of the day the objective that our customers are driving towards. >> Prakash, thanks so much for coming on and being a great member of our community. >> Fantastic. >> You're part of our smart network of great people out there and entrepreneurial journey continues. >> Yes. >> Final question. Just observation. As you pinch yourself and you go down the journey, you guys are walking the talk, adding new products. We're global landscape. You're seeing a lot of new stuff happening. Customers are trying to stay focused. A lot of distractions whether security or data or app development. What's your state of the industry? How do you view the current market, from your perspective and also how the customer might see it from their impact? >> Well, the first thing is that I think in the last four years we have seen significant maturity both on the providers off software technology and solutions, and also amongst the customers. I do think that going forward what is really going to make a difference is one really driving towards business outcomes by leveraging data. 
We've talked about a lot of this over the last few years. What real business outcomes are you delivering? What we are super excited about is when we see our customers, each one of them actually subscribes to Paxata, we're a SaaS company, they subscribe to Paxata not because they're doing the science experiment but because they're trying to deliver real business value. What is that? Whether that is a risk and compliance solution which is going to drive towards real cost savings. Or whether that's a top line benefit because they know what their customer 360 is and how they can go and serve their customers better or how they can improve supply chains or how they can optimize their entire efficiency in the company. I think if you take it from that lens, what is going to be important right now is there's lots of new technologies coming in, and what's important is how is it going to drive towards those top three business drivers that I have today for the next 18 months? >> John: So, that's foundational. >> That's foundational. Those are the building blocks-- >> That's what is happening. Don't jump... If you're a customer, it's great to look at new technologies, etc. There's always innovation projects-- >> R&D, POCs, whatever. Kick the tires. >> But now, if you are really going to talk the talk about saying I'm going to be, call your word, data-driven, information-driven, whatever it is. If you're going to talk the talk, then you better walk the walk by delivering the real kind of tools and capabilities that your business consumers can adopt. And they better adopt that fast. If they're not up and running in 24 hours, something is wrong. >> Peter: Let me ask one question before you close, John. So, your argument, which I agree with, suggests that one of the big changes in the next 18 months, three years as this whole thing matures and gets more consistent in its application of the value that it generates, we're going to see an explosion in the number of users of these types of tools. >> Prakash: Yes, yes. >> Correct? >> Prakash: Absolutely. >> 2X, 3X, 5X? What do you think? >> I think we're just at the cusp. I think it's going to grow at least 10X and beyond. >> Peter: In the next two years? >> In the next, I would give that next three to five years. >> Peter: Three to five years? >> Yes. And we're on the journey. We're just at the tip of the high curve taking off. That's what I feel. >> Yeah, and there's going to be a lot more consolidation. You're going to start to see people who are winning. It's becoming clear as the fog lifts. It's a cloud game, a scale game. It's democratization, community-driven. It's open source software. Just solve problems, outcomes. I think outcome is going to be much faster. I think outcomes as a service will be a model that we'll probably be talking about in the future. You know, real time outcomes. Not eight month projects or year projects. >> Certainly, we started writing research about outcome-based management. >> Right. >> Wikibon Research... Prakash, one more thing? >> I also just want to say that in addition to this business outcome thing, I think in the last five years I've seen a lot of shift in our customers' world where the initial excitement was about analytics, predictive, AI, machine-learning to get to outcomes. They've all come to the realization that none of that is possible if you're not able to handle, first, get a grip on your data, and then be able to turn that data into something meaningful that can be analyzed. So, that is also a major shift. 
That's why you're seeing the growth we're seeing-- >> John: Cause it's really hard. >> Prakash: It's really hard. >> I mean, it's a cultural mindset. You have the personnel. It's an operational model. I mean this is not like, throw some pixie dust on it and it magically happens. >> That's why I say, before you go into any kind of BI, analytics, AI initiative, stop, think about your information management strategy. Think about how you're going to democratize information. Think about how you're going to get governance. Think about how you're going to enable your business to turn data into information. >> Remember, you can't do AI with IA? You can't do AI without information architecture. >> There you go. That's a great point. >> And I think this all points to why Wikibon's research have all the analysts got it right with true private cloud because people got to take care of their business here to have a foundation for the future. And you can't just jump to the future. There's too much just to come and use a scale, too many cracks in the foundation. You got to do your, take your medicine now. And do the homework and lay down a solid foundation. >> You bet. >> All right, Prakash. Great to have you on theCUBE. Again, congratulations. And again, it's great for us. I totally have a great vibe when I see you. Thinking about how you launched on theCUBE in 2013, and how far you continue to climb. Congratulations. >> Thank you so much, John. Thanks, Peter. That was fantastic. >> All right, live coverage continuing day one of three days. It's going to be a great week here in New York City. Weather's perfect and all the players are in town for Big Data NYC. I'm John Furrier with Peter Burris. Be back with more after this short break. (upbeat techno music).

Published Date : Sep 27 2017


Nenshad Bardoliwalla, Paxata - #BigDataNYC 2016 - #theCUBE


 

>> Voiceover: Live from New York, it's The Cube, covering Big Data New York City 2016. Brought to you by headline sponsors, Cisco, IBM, Nvidia, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to New York City, everybody. Nenshad Bardoliwalla is here, he's the co-founder and chief product officer at Paxata, a company that, three years ago, I want to say three years ago, came out of stealth on The Cube. >> October 27, 2013. >> Right, and we were at the Warwick Hotel across the street from the Hilton. Yeah, Prakash came on The Cube and came out of stealth. Welcome back. >> Thank you very much. >> Great to see you guys. Taking the world by storm. >> Great to be here, and of course, Prakash sends his apologies. He couldn't be here so he sent his stunt double. (Dave and George laugh) >> Great, so give us the update. What's the latest? >> So there are a lot of great things going on in our space. The thing that we announced here at the show is what we're calling Paxata Connect, OK? Just in the same way that we created the self-service data preparation category, and now there are 50 companies that claim they do self-service data prep, we are moving the industry to the next phase of what we are calling our business information platform. Paxata Connect is one of the first major milestones in getting to that vision of the business information platform. What Paxata Connect allows our customers to do is, number one, to have visual, completely declarative, point-and-click browsing access to a variety of different data sources in the enterprise. For example, we support, we are the only company that we know of that supports connecting to multiple, simultaneous, different Hadoop distributions in one system. So a Paxata customer can connect to MapR, they can connect to Hortonworks, they can connect to Cloudera, and they can federate across all of them, which is a very powerful aspect of the system. >> And part of this involves, when you say declarative, it means you don't have to write a program to retrieve the data. >> Exactly right. Exactly right. >> Is this going into HDFS, into Hive, or? >> Yes it is. In fact, so Hadoop is one part of, this multi-source Hadoop capability is one part of Paxata Connect. The second is, as we've moved into this information platform world, our customers are telling us they want read-write access to more than just Hadoop. Hadoop is obviously a very important part, but we're actually supporting NoSQL data sources like Cloudant, MongoDB, we're supporting read and write, we're supporting, for the first time, relational databases, we already supported read, but now we actually support write to relational databases. So Paxata is really becoming kind of this fabric, a business-centric information fabric, that allows people to move data from anywhere to any destination, and transform it, profile it, explore it along the way. >> Excellent. Let's get into some of the use cases. >> Yeah, tell us where the banks are. The sense at the conference is that everyone sort of got their data lakes to some extent up and running. Now where are they pushing to go next? >> Sure, that's an excellent question. So we have really focused on the enterprise segment, as you know. So the customers that are working with Paxata from an industry perspective, banking is, of course, a very important one, we were really proud to share the stage yesterday with both Citi and Standard Chartered Bank, two of our flagship banking customers. 
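As an editorial aside, the multi-source federation described above is easier to picture with a rough sketch in plain PySpark: read from two different Hadoop clusters, a cloud object store, and a relational database, then join and query the result as one view. This is only an illustration of the concept under invented URIs, table names, and credentials; Paxata Connect itself is point-and-click and declarative rather than code, so nothing here should be read as its actual interface.

```python
# A rough, hypothetical sketch of federating data that lives in different systems.
# URIs, schemas, and credentials are invented; this shows the concept, not Paxata's API.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("federation-sketch").getOrCreate()

orders_a = spark.read.parquet("hdfs://cluster-a/warehouse/orders/")      # e.g. a Cloudera cluster
orders_b = spark.read.parquet("hdfs://cluster-b/warehouse/orders/")      # e.g. a Hortonworks cluster
events   = spark.read.json("s3a://example-bucket/clickstream/2016/10/")  # cloud object store
                                                                          # (browsed the same way; not used below)
accounts = (spark.read.format("jdbc")                                    # relational source
            .option("url", "jdbc:postgresql://db.example.com/crm")
            .option("dbtable", "accounts")
            .option("user", "analyst").option("password", "...")
            .load())

# Federate: union the order history from both clusters (matching column names assumed),
# enrich it with the relational account data, and expose it as one queryable view.
all_orders = orders_a.unionByName(orders_b)
enriched = all_orders.join(accounts, "account_id", "left")
enriched.createOrReplaceTempView("orders_360")
spark.sql("SELECT country, COUNT(*) AS n FROM orders_360 GROUP BY country").show()
```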
But Paxata is also heavily used in the United States government, in the intelligence community, I won't say any more about that. It's used heavily in retail and consumer products, it's used heavily in the high-tech space, it's used heavily by data service providers, that is, companies whose entire business is based on data. But to answer your question specifically, what's happening in the data lake world is that a lot of folks, the early adopters, have jumped onto the data lake bandwagon. So they're pouring terabytes and petabytes of data into the data lake. And then the next question the business asks is, OK, now what? Where's the data, right? One of the simplest use cases, but actually one that's very pervasive for our customers, is they say, "Look, we don't even know, "our business people, they don't even know "what's in Hadoop right now." And by the way, I will also say that the data lake is not just Hadoop, but Amazon S3 is also serving as a data lake. The capabilities inside Microsoft's cloud are also serving as a data lake. Even the notion of a data lake is becoming this sort of polymorphic distributed thing. So what they do is, they want to be able to get what we like to say is first eyes on data. We let people with Paxata, especially with the release of Connect, to just point and click their way and to actually explore the data in all of the native systems before they even bring it in to something like Paxata. So they can actually sneak preview thousands of database tables or thousands of compressed data sets inside of Amazon S3, or thousands of data sets inside of Hadoop, and now the business people for the first time can point and click and actually see what is in the data lake in the first place. So step number one is, we have taken the approach so far in the industry of, there have been a lot of IT-driven use cases that have motivated people to go to the data lake approach. But now, we obviously want to show, all of our companies want to show business value, so tools and platforms like Paxata that sit on top of the data lake, that can federate across multiple data lakes and provide business-centric access to that information is the first significant use case pattern we're seeing. >> Just a clarification, could there be two roles where one is for slightly more technical business user exposes views summarizing, so that the ultimate end user doesn't have to see the thousands of tables? >> Absolutely, that's a great question. So when you look at self-service, if somebody wants to roll out a self-service strategy, there are multiple roles in an organization that actually need to intersect with self-service. There is a pattern in organizations where people say, "We want our people to get access to all the data." Of course it's governed, they have to have the right passwords and SSO and all that, but they're the companies who say, yes, the users really need to be able to see all of the data across these different tables. But there's a different role, who also uses Paxata extensively, who are the curators, right? 
These are the people who say, look, I'm going to provision the raw data, provide the views, provide even some normalization or transformation, and then land that data back into another layer, as people call the data relay, they go from layer zero to layer one to layer two, they're different directory structures, but the point is, there's a natural processing frame that they're going through with their data, and then from the curated data that's created by the data stewards, then the analysts can go pick it up. >> One of the other big challenges that our research is showing, that chief data officers express, is that they get this data in the data lake. So they've got the data sources, you're providing access to it, the other piece is they want to trust that data. There's obviously a governance piece, but then there's a data quality piece, maybe you could talk about that? >> Absolutely. So use case number one is about access. The second reason that people are not so -- So, why are people doing data prep in the first place? They are trying to make information-driven decisions that actually help move their business forward. So if you look at researchers from firms like Forrester, they'll say there are two reasons that slow down the latency of going from raw data to decision. Number one is access to data. That's the use case we just talked about. Number two is the trustworthiness of data. Our approach is very different on that. Once people actually can find the data that they're looking for, the big paradigm shift in the self-service world is that, instead of trying to process data based on transforming the metadata attributes, like I'm going to draw on a work flow diagram, bring in this table, aggregate with this operator, then split it this way, filter it, which is the classic ETL paradigm. The, I don't want to say profound, but maybe the very obvious thing we did was to say, "What if people could actually look at the data in the first place --" >> And sort of program it by example? >> We can tell, that's right. Because our eyes can tell us, our brains help us to say, we can immediately look at a data set, right? You look at an age column, let's say. There are values in the age column of 150 years. Maybe 20 years from now there may be someone who, on Earth, lives to 150 years. But pretty much -- >> Highly unlikely. >> The customers at the banks you work with are not 150 years old, right? So just being able to look at the data, to get to the point that you're asking, quality is about data being fit for a specific purpose. In order for data to be fit for a specific purpose, the person who needs the data needs to make the decision about what is quality data. Both of you may have access to the same transactional data, raw data, that the IT team has landed in the Hadoop cluster. But now you pull it up for one use case, you pull it up for another use case, and because your needs are different, what constitutes quality to you and where you want to make the investment is going to be very different. So by putting the power of that capability into the hands of the person who actually knows what they want, that is how we are actually able to change the paradigm and really compress the latency from "Here's my raw data" to "Here's the decision I want to make on that data." >> Let me ask, it sounds like, having put all of the self-service capabilities together, you've democratized access to this data. 
Now, what happens in terms of governance, or more importantly, just trust, when the pipeline, you know, has to go beyond where you're working on it, to some of the analytics or some of the basic ingest? To say, "I know this data came from here "and it's going there." >> That's right, how do we verify the fidelity of these data sources? It's a fantastic question. So, in my career, having worked in BI for a couple of decades, I know I look much younger but it actually has been a couple of decades. Remember, the camera adds about 15 pounds, for those of you watching at home. (Dave and George laugh) >> George: But you've lost already. >> Thank you very much. >> So you've lost net 30. (Nenshad laughs) >> Or maybe I'm back to where I'm supposed to be. What I've seen as the two models of governance in the enterprise when it comes to analytics and information management, right? There's model one, which is, we're going to build an enterprise data warehouse, we're going to know all the possible questions people are going to ask in advance, we're going to preprogram the ETL routines, we're going to put something like a MicroStrategy or BusinessObjects, an enterprise-reporting factory tool. Then you spend 10 million dollars on that project, the users come in and for the first time they use the system, and they say, "Oh, I kind of want to change this, this way. "I want to add this calculation." It takes them about five minutes to determine that they can't do it for whatever reason, and what is the first feature they look for in the product in order to move forward? Download to Excel, right? So you invested 15 million dollars to build a download to Excel capability which they already had before. So if you lock things down too much, the point is, the end users will go around you. They've been doing it for 30 years and they'll keep doing it. Then we have model two. Model two is, Excel spreadsheet. Excel Hell, or spreadmarts. There are lots of words for these things. You have a version of the data, you have a version of the data, I have a version of the data. We all started from the same transactional data, yet you're the head of sales, so suddenly your forecast looks really rosy. You're the head of finance, you really don't like what the forecast looks like. And I'm the product guy, so why am I even looking at the forecast in the first place, but somehow I got access to the data, right? These are the two polarities of the enterprise that we've worked with for the last 30 years. We wanted to find sort of a middle path, which is to say, let's give people the freedom and flexibility to be able to do the transformations they need to. If they want to add a column, let them add a column. If they want to change a calculation, let them add a a calculation. But, every single step in the process must be recorded. It must be versioned, it must be auditable. It must be governed in that way. So why the large banks and the intelligence community and the large enterprise customers are attracted to Paxata is because they have the ability to have perfect retraceability for every decision that they make. I can actually sit next to you and say, "This is why the data looks like this. "This is how this value, which started at one million, "became 1.5 million." That covers the Paxata part. But then the answer to the question you asked is, how do you even extend that to a broader ecosystem? 
I think that's really about some of the metadata interchange initiatives that a lot of the vendors in the Hadoop space, but also in the traditional enterprise space, have had for the last many years. If you look at something like Apache Atlas or Cloudera Navigator, they are systems designed to collect, aggregate, and connect these different metadata steps so you can see in an end-to-end flow, this is the raw data that got ingested into Hadoop. These are the transformations that the end user did in Paxata in order to make it ready for analytics. This is how it's getting consumed in something like Zoom Data, and you actually have the entire life cycle of data now actually manifested as a software asset. >> So those not, in other words, those are not just managing within the perimeter of Hadoop. They are managers of managers. >> That's right, that's right. Because the data is coming from anywhere, and it's going to anywhere. And then you can add another dimension of complexity which is, it's not just one Hadoop cluster. It's 10 Hadoop clusters. And those 10 Hadoop clusters, three of them are in Amazon. Four of them are in Microsoft. Three of them are in Google Cloud platform. How do you know what people are doing with data then? >> How is this all presented to the user? What does the user see? >> Great question. The trick to all of this, of self service, first you have to know very clearly, who is the person you are trying to serve? What are their technical skills and capabilities, and how can you get them productive as fast as possible? When we created this category, our key notion was that we were going to go after analysts. Now, that is a very generic term, right? Because we are all, in some sense, analysts in our day-to-day lives. But in Paxata, a business analyst, in an enterprise organizational context, is somebody that has the ability to use Microsoft Excel, they have to have that skill or they won't be successful with today's Paxata. They have to know what a VLOOKUP is, because a VLOOKUP is a way to actually pull data from a second data source into one. We would all know that as a join or a lookup. And the third thing is, they have to know what a pivot table is and know how a pivot table works. Because the key insight we had is that, of the hundreds of millions of analysts, people who use Excel on a day-to-day basis, a lot of their work is data prep. But Excel, being an amazing generic tool, is actually quite bad for doing data prep. So the person we target, when I go to a customer and they say, "Are we a good candidate to use Paxata?" and we're talking to the actual person who's going to use the software, I say, "Do you know what a VLOOKUP is, yes or no? "Do you know what a pivot table is, yes or no?" If they have that skill, when they come into Paxata, we designed Paxata to be very attractive to those people. So it's completely point-and-click. It's completely visual. It's completely interactive. There's no scripting inside that whole process, because do you think the average Microsoft Excel analyst wants to script, or they want to use a proprietary wrangling language? I'm sorry, but analysts don't want to wrangle. Data scientists, the 1% of the 1%, maybe they like to wrangle, but you don't have that with the broader analyst community, and that is a much larger market opportunity that we have targeted. >> Well, very large, I mean, a lot of people are familiar with those concepts in Excel, and if they're not, they're relatively easy to learn. >> Nenshad: That's right. Excellent. 
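To make the analyst profile above concrete, here is a minimal, hypothetical pandas sketch of a self-service prep "recipe": a visible quality rule (the implausible age values mentioned earlier), a VLOOKUP-style join, a pivot-table-style aggregation, and an audit log that records every step so the result stays retraceable. This is only an illustration of the ideas discussed in the interview, not Paxata's implementation; all column names and sample values are made up.

import pandas as pd

audit_log = []  # every transformation step gets appended here

def step(description):
    # Record each prep step (name, rows in, rows out) so the lineage is auditable.
    def wrap(fn):
        def inner(df, *args, **kwargs):
            out = fn(df, *args, **kwargs)
            audit_log.append({"step": description, "rows_in": len(df), "rows_out": len(out)})
            return out
        return inner
    return wrap

@step("drop implausible ages (over 120 years)")
def drop_bad_ages(df):
    return df[df["age"].between(0, 120)]

@step("VLOOKUP-style join of region names onto transactions")
def add_region(df, regions):
    return df.merge(regions, on="region_id", how="left")

@step("pivot-table-style sum of revenue by region")
def revenue_by_region(df):
    return df.pivot_table(index="region_name", values="revenue", aggfunc="sum").reset_index()

# Made-up sample data standing in for raw files landed in a data lake.
transactions = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [34, 151, 45, 29],   # 151 is the kind of value an analyst spots at a glance
    "region_id": [10, 10, 20, 20],
    "revenue": [120.0, 80.0, 200.0, 60.0],
})
regions = pd.DataFrame({"region_id": [10, 20], "region_name": ["UK", "Ireland"]})

prepared = revenue_by_region(add_region(drop_bad_ages(transactions), regions))
print(prepared)
print(audit_log)   # the recorded, replayable trail of what was done to the data

The analyst described in the interview never writes code like this, of course; the same steps are captured through point and click. The recorded, replayable list of steps is the part that addresses the governance requirement described above.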
All right, Nenshad, we have to leave it there. Thanks very much for coming on The Cube, appreciate it. >> Thank you very much for having me. >> Congratulations for all the success. >> Thank you. >> All right, keep it right there, everybody. We'll be back with our next guest. This is The Cube, we're live from New York City at Big Data NYC. We'll be right back. (electronic music)
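The "first eyes on data" idea from earlier in this conversation, sneak-previewing thousands of compressed data sets sitting in S3 or Hadoop before pulling anything into a prep tool, can be pictured with a short sketch along these lines. It assumes boto3, s3fs, and pyarrow are installed and that credentials are configured; the bucket and prefix names are hypothetical, and this illustrates the concept only, not how Paxata Connect is built.

import boto3
import s3fs
import pyarrow.parquet as pq

BUCKET = "example-data-lake"        # hypothetical bucket
PREFIX = "landing/transactions/"    # hypothetical prefix

s3 = boto3.client("s3")
fs = s3fs.S3FileSystem()

# 1. Inventory: what files are actually sitting in this corner of the lake?
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
keys = [obj["Key"] for obj in listing.get("Contents", []) if obj["Key"].endswith(".parquet")]

# 2. Preview: read only the metadata and the first row group of each file,
#    so a business user sees column names and sample values without a full download.
for key in keys[:5]:
    pf = pq.ParquetFile(fs.open(f"{BUCKET}/{key}"))
    print(key, "-", pf.metadata.num_rows, "rows")
    print(pf.schema)                                  # column names and types
    sample = pf.read_row_group(0).to_pandas().head(10)
    print(sample)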

Published Date : Sep 30 2016


Nenshad Bardoliwalla & Stephanie McReynolds | BigData NYC 2017


 

>> Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by Silicon Angle Media and its ecosystem sponsors. (upbeat techno music) >> Welcome back, everyone. Live here in New York, Day Three coverage, winding down for three days of wall to wall coverage theCUBE covering Big Data NYC in conjunction with Strata Data, formerly Strata Hadoop and Hadoop World, all part of the Big Data ecosystem. Our next guest is Nenshad Bardoliwalla Co-Founder and Chief Product Officer of Paxata, hot start up in the space. A lot of kudos. Of course, they launched on theCUBE in 2013 three years ago when we started theCUBE as a separate event from O'Reilly. So, great to see the success. And Stephanie McReynolds, you've been on multiple times, VP of Marketing at Alation. Welcome back, good to see you guys. >> Thank you. >> Happy to be here. >> So, winding down, so great kind of wrap-up segment here in addition to the partnership that you guys have. So, let's first talk about before we get to the wrap-up of the show and kind of bring together the week here and kind of summarize everything. Tell about your partnership you guys have. Paxata, you guys have been doing extremely well. Congratulations. Prakash was talking on theCUBE. Great success. You guys worked hard for it. I'm happy for you. But partnering is everything. Ecosystem is everything. Alation, their collaboration with data. That's there ethos. They're very user-centric. >> Nenshad: Yes. >> From the founders. Seemed like a good fit. What's the deal? >> It's a very natural fit between the two companies. When we started down the path of building new information management capabilities it became very clear that the market had strong need for both finding data, right? What do I actually have? I need an inventory, especially if my data's in Amazon S3, my data is in Azure Blob storage, my data is on-premise in HDFS, my data is in databases, it's all over the place. And I need to be able to find it. And then once I find it, I want to be able to prepare it. And so, one of the things that really drove this partnership was the very common interests that both companies have. And number one, pushing user experience. I love the Alation product. It's very easy to use, it's very intuitive, really it's a delightful thing to work with. And at the same time they also share our interests in working in these hybrid multicloud environments. So, what we've done and what we announced here at Strata is actually this bi-directional integration between the products. You can start in Alation and find a data set that you want to work with, see what collaboration or notes or business metadata people have created and then say, I want to go see this in Paxata. And in a single click you can then actually open it up in Paxata and profile that data. Vice versa you can also be in Paxata and prepare data, and then with a single click push it back, and then everybody who works with Alation actually now has knowledge of where that data is. So, it's a really nice synergy. >> So, you pushed the user data back to Alation, cause that's what they care a lot about, the cataloging and making the user-centric view work. So, you provide, it's almost a flow back and forth. It's a handshake if you will to data. Am I getting that right? >> Yeah, I mean, the idea's to keep the analyst or the user of that data, data scientist, even in some cases a business user, keep them in the flow of their work as much as possible. 
But give them the advantage of understanding what others in the organization have done with that data prior and allow them to transform it, and then share that knowledge back with the rest of the community that might be working with that data. >> John: So, give me an example. I like your Excel spreadsheet concept cause that's obvious. People know what Excel spreadsheet is so. So, it's Excel-like. That's an easy TAM to go after. All Microsoft users might not get that Azure thing. But this one, just take me through a usecase. >> So, I've got a good example. >> Okay, take me through. >> It's very common in a data lake for your data to be compressed. And when data's compressed, to a user it looks like a black box. So, if the data is compressed in Avro or Parquet or it's even like JSON format. A business user has no idea what's in that file. >> John: Yeah. >> So, what we do is we find the file for them. It may have some comments on that file of how that data's been used in past projects that we infer from looking at how others have used that data in Alation. >> John: So, you put metadata around it. >> We put a whole bunch of metadata around it. It might be comments that people have made. It might be >> Annotations, yeah. >> actual observations, annotations. And the great thing that we can do with Paxata is open that Avro file or Parquet file, open it up so that you can actually see the data elements themselves. So, all of a sudden, the business user has access without having to use a command line utility or understand anything about compression, and how you open that file up-- >> John: So, as Paxata spitting out there nuggets of value back to you, you're kind of understanding it, translating it to the user. And they get to do their thing, you get to do your thing, right? >> It's making a Avro or a Parquet file as easy to use as Excel, basically. Which is great, right? >> It's awesome. >> Now, you've enabled >> a whole new class of people who can use that. >> Well, and people just >> Get turned off when it's anything like jargon, or like, "What is that? I'm afraid it's phishing. Click on that and oh!" >> Well, the scary thing is that in a data lake environment, in a lot of cases people don't even label the files with extensions. They're just files. (Stephanie laughs) So, what started-- >> It's like getting your pictures like DS, JPEG. It's like what? >> Exactly. >> Right. >> So, you're talking about unlabeled-- >> If you looked on your laptop, and if you didn't have JPEG or DOC or PPT. Okay, I don't know that this file is. Well, what you have in the data lake environment is that you have thousands of these files that people don't really know what they are. And so, with Alation we have the ability to get all the value around the curation of the metadata, and how people are using that data. But then somebody says, "Okay, but I understand that this file exists. What's in it?" And then with Click to Profile from Alation you're immediately taken into Paxata. And now you're actually looking at what's in that file. So, you can very quickly go from this looks interesting to let me understand what's inside of it. And that's very powerful. >> Talk about Alation. Cause I had the CEO on, also their lead investor Greg Sands from Costanoa Ventures. They're a pretty amazing team but it's kind of out there. No offense, it's kind of a compliment actually. (Stephanie laughs) >> They got a symbolic >> Stephanie: Keep going. system Stanford guy, who's like super-smart. >> Nenshad: Yeah. 
>> They're on something that's really unique but it's almost too simple to be. Like, wait a minute! Google for the data, it's an awesome opportunity. How do you describe Alation to people who say, "Hey, what's this Alation thing?" >> Yeah, so I think that the best way to describe it is it's the browser for all of the distributed data in the enterprise. Sorry, so it's both the catalog, and the browser that sits on top of it. It sounds very simple. Conceptually it's very simple but they have a lot of richness in what they're able to do behind the scenes in terms of introspecting what type of work people are doing with data, and then taking that knowledge and actually surfacing it to the end user. So, for example, they have very powerful scenarios where they can watch what people are doing in different data sources, and then based on that information actually bubble up how queries are being used or the different patterns that people are doing to consume data with. So, what we find really exciting is that this is something that is very complex under the covers. Which Paxata is as well being built upon Spark. But they have put in the hard engineering work so that it looks simple to the end user. And that's the exact same thing that we've tried to do. >> And that's the hard problem. Okay, Stephanie back ... That was a great example by the way. Can't wait to have our little analyst breakdown of the event. But back to Alation for you. So, how do you talk about, you've been VP of Marketing of Alation. But you've been around the block. You know B2B, tech, big data. So, you've seen a bunch of different, you've worked at Trifacta, you worked at other companies, and you've seen a lot of waves of innovation come. What's different about Alation that people might not know about? How do you describe the difference? Because it sounds easy, "Oh, it's a browser! It's a catalog!" But it's really hard. Is it the tech that's the secret? Is it the approach? How do you describe the value of Alation? I think what's interesting about Alation is that we're solving a problem that since the dawn of the data warehouse has not been solved. And that is how to help end users really find and understand the data that they need to do their jobs. A lot of our customers talk about this-- >> John: Hold on. Repeat that. Cause that's like a key thing. What problem hasn't been solved since the data warehouse? >> To be able to actually find and fully understand, understand to the point of trust the data that you want to use for your analysis. And so, in the world of-- >> John: That sounds so simple. >> Stephanie: In the world of data warehousing-- >> John: Why is it so hard? >> Well, because in the world of data warehousing business people were told what data they should use. Someone in IT decided how to model the data, came up with a KPR calculation, and told you as a business person, you as a CEO, this is how you're going to monitor you business. >> John: Yeah. >> What business person >> Wants to be told that by an IT guy, right? >> Well, it was bounded by IT. >> Right. >> Expression and discovery >> Should be unbounded. Machine learning can take care of a lot of bounded stuff. I get that. But like, when you start to get into the discovery side of it, it should be free. >> Well, no offense to the IT team, but they were doing their best to try to figure out how to make this technology work. >> Well, just look at the cost of goods sold for storage. I mean, how many EMC drives? Expensive! IT was not cheap. >> Right. 
>> Not even 10, 15, 20 years ago. >> So, now when we have more self-service access to data, and we can have more exploratory analysis. What data science really introduced and Hadoop introduced was this ability on-demand to be able to create these structures, you have this more iterative world of how you can discover and explore datasets to come to an insight. The only challenge is, without simplifying that process, a business person is still lost, right? >> John: Yeah. >> Still lost in the data. >> So, we simply call that a catalog. But a catalog is much more-- >> Index, catalog, anthology, there's other words for it, right? >> Yeah, but I think it's interesting because like a concept of a catalog is an inventory has been around forever in this space. But the concept of a catalog that learns from other's behavior with that data, this concept of Behavior I/O that Aaron talked about earlier today. The fact that behavior of how people query data as an input and that input then informs a recommendation as an output is very powerful. And that's where all the machine learning and A.I. comes to work. It's hidden underneath that concept of Behavior I/O but that's there real innovation that drives this rich catalog is how can we make active recommendations to a business person who doesn't have to understand the technology but they know how to apply that data to making a decision. >> Yeah, that's key. Behavior and textual information has always been the two fly wheels in analysis whether you're talking search engine or data in general. And I think what I like about the trends here at Big Data NYC this weekend. We've certainly been seeing it at the hundreds of CUBE events we've gone to over the past 12 months and more is that people are using data differently. Not only say differently, there's baselining, foundational things you got to do. But the real innovators have a twist on it that give them an advantage. They see how they can use data. And the trend is collective intelligence of the customer seems to be big. You guys are doing it. You're seeing patterns. You're automating the data. So, it seems to be this fly wheel of some data, get some collective data. What's your thoughts and reactions. Are people getting it? Is this by people doing it by accident on purpose kind of thing? Did people just fell on their head? Or you see, "Oh, I just backed into this?" >> I think that the companies that have emerged as the leaders in the last 15 or 20 years, Google being a great example, Amazon being a great example. These are companies whose entire business models were based on data. They've generated out-sized returns. They are the leaders on the stock market. And I think that many companies have awoken to the fact that data as a monetizable asset to be turned into information either for analysis, to be turned into information for generating new products that can then be resold on the market. The leading edge companies have figured that out, and our adopting technologies like Alation, like Paxata, to get a competitive advantage in the business processes where they know they can make a difference inside of the enterprise. So, I don't think it's a fluke at all. I think that most of these companies are being forced to go down that path because they have been shown the way in terms of the digital giants that are currently ruling the enterprise tech world. >> All right, what's your thoughts on the week this week so far on the big trends? 
What are obvious, obviously A.I., don't need to talk about A.I., but what were the big things that came out of it? And what surprised you that didn't come out from a trends standpoint buzz here at Strata Data and Big Data NYC? What were the big themes that you saw emerge and didn't emerge what was the surprise? Any surprises? >> Basically, we're seeing in general the maturation of the market finally. People are finally realizing that, hey, it's not just about cool technology. It's not about what distribution or package. It's about can you actually drive return on investment? Can you actually drive insights and results from the stack? And so, even the technologists that we were talking with today throughout the course of the show are starting to talk about it's that last mile of making the humans more intelligent about navigating this data, where all the breakthroughs are going to happen. Even in places like IOT, where you think about a lot of automation, and you think about a lot of capability to use deep learning to maybe make some decisions. There's still a lot of human training that goes into that decision-making process and having agency at the edge. And so I think this acknowledgement that there should be balance between human input and what the technology can do is a nice breakthrough that's going to help us get to the next level. >> What's missing? What do you see that people missed that is super-important, that wasn't talked much about? Is there anything that jumps out at you? I'll let you think about it. Nenshad, you have something now. >> Yeah, I would say I completely agree with what Stephanie said which we are seeing the market mature. >> John: Yeah. >> And there is a compelling force to now justify business value for all the investments people have made. The science experiment phase of the big data world is over. People now have to show a return on that investment. I think that being said though, this is my sort of way of being a little more provocative. I still think there's way too much emphasis on data science and not enough emphasis on the average business analyst who's doing work in the Fortune 500. >> It should be kind of the same thing. I mean, with data science you're just more of an advanced analyst maybe. >> Right. But the idea that every person who works with data is suddenly going to understand different types of machine learning models, and what's the right way to do hyper parameter tuning, and other words that I could throw at you to show that I'm smart. (laughter) >> You guys have a vision with the Excel thing. I could see how you see that perspective because you see a future. I just think we're not there yet because I think the data scientists are still handcuffed and hamstrung by the fact that they're doing too much provisioning work, right? >> Yeah. >> To you're point about >> surfacing the insights, it's like the data scientists, "Oh, you own it now!" They become the sysadmin, if you will, for their department. And it's like it's not their job. >> Well, we need to get them out of data preparation, right? >> Yeah, get out of that. >> You shouldn't be a data scientist-- >> Right now, you have two values. You've got the use interface value, which I love, but you guys do the automation. So, I think we're getting there. I see where you're coming from, but still those data sciences have to set the tone for the generation, right? So, it's kind of like you got to get those guys productive. >> And it's not a .. Please go ahead. 
>> I mean, it's somewhat interesting if you look at can the data scientist start to collaborate a little bit more with the common business person? You start to think about it as a little bit of scientific inquiry process. >> John: Yeah. >> Right? >> If you can have more innovators around the table in a common place to discuss what are the insights in this data, and people are bringing business perspective together with machine learning perspective, or the knowledge of the higher algorithms, then maybe you can bring those next leaps forward. >> Great insight. If you want my observations, I use the crazy analogy. Here's my crazy analogy. Years it's been about the engine Model T, the car, the horse and buggy, you know? Now, "We got an engine in the car!" And they got wheels, it's got a chassis. And so, it's about the apparatus of the car. And then it evolved to, "Hey, this thing actually drives. It's transportation." You can actually go from A to B faster than the other guys, and people still think there's a horse and buggy market out there. So, they got to go to that. But now people are crashing. Now, there's an art to driving the car. >> Right. >> So, whether you're a sports car or whatever, this is where the value piece I think hits home is that, people are driving the data now. They're driving the value proposition. So, I think that, to me, the big surprise here is how people aren't getting into the hype cycle. They like the hype in terms of lead gen, and A.I., but they're too busy for the hype. It's like, drive the value. This is not just B.S. either, outcomes. It's like, "I'm busy. I got security. I got app development." >> And I think they're getting smarter about how their valuing data. We're starting to see some economic models, and some ways of putting actual numbers on what impact is this data having today. We do a lot of usage analysis with our customers, and looking at they have a goal to distribute data across more of the organization, and really get people using it in a self-service manner. And from that, you're being able to calculate what actually is the impact. We're not just storing this for insurance policy reasons. >> Yeah, yeah. >> And this cheap-- >> John: It's not some POC. Don't do a POC. All right, so we're going to end the day and the segment on you guys having the last word. I want to phrase it this way. Share an anecdotal story you've heard from a customer, or a prospective customer, that looked at your product, not the joint product but your products each, that blew you away, and that would be a good thing to leave people with. What was the coolest or nicest thing you've heard someone say about Alation and Paxata? >> For me, the coolest thing they said, "This was a social network for nerds. I finally feel like I've found my home." (laughter) >> Data nerds, okay. >> Data nerds. So, if you're a data nerd, you want to network, Alation is the place you want to be. >> So, there is like profiles? And like, you guys have a profile for everybody who comes in? >> Yeah, so the interesting thing is part of our automation, when we go and we index the data sources we also index the people that are accessing those sources. So, you kind of have a leaderboard now of data users, that contract one another in system. >> John: Ooh. >> And at eBay leader was this guy, Caleb, who was their data scientist. And Caleb was famous because everyone in the organization would ask Caleb to prepare data for them. And Caleb was like well known if you were around eBay for awhile. 
>> John: Yeah, he was the master of the domain. >> And then when we turned on, you know, we were indexing tables on Teradata as well as their Hadoop implementation. And all of a sudden, there are table structures that are Caleb underscore cussed. Caleb underscore revenue. Caleb underscore ... We're like, "Wow!" Caleb drove a lot of Teradata revenue. (Laughs) >> Awesome. >> Paxata, what was the coolest thing someone said about you in terms of being the nicest or coolest most relevant thing? >> So, something that a prospect said earlier this week is that, "I've been hearing in our personal lives about self-driving cars. But seeing your product and where you're going with it I see the path towards self-driving data." And that's really what we need to aspire towards. It's not about spending hours doing prep. It's not about spending hours doing manual inventories. It's about getting to the point that you can automate the usage to get to the outcomes that people are looking for. So, I'm looking forward to self-driving information. Nenshad, thanks so much. Stephanie from Alation. Thanks so much. Congratulations both on your success. And great to see you guys partnering. Big, big community here. And just the beginning. We see the big waves coming, so thanks for sharing perspective. >> Thank you very much. >> And your color commentary on our wrap-up segment here for Big Data NYC. This is theCUBE live from New York, wrapping up great three days of coverage here in Manhattan. I'm John Furrier. Thanks for watching. See you next time. (upbeat techno music)
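Stephanie's "Behavior I/O" point, that how people query data is the input and recommendations are the output, can be pictured with a toy co-usage calculation like the one below. This is a hypothetical sketch, not Alation's algorithm; the query log, user names, and table names are invented. Real catalogs mine parsed query logs at much larger scale, but the shape of the idea is the same: observed behavior in, ranked suggestions out.

from collections import Counter
from itertools import combinations

# Hypothetical query log: which analyst touched which table.
query_log = [
    ("caleb", "sales.transactions"), ("caleb", "ref.regions"),
    ("maria", "sales.transactions"), ("maria", "ref.regions"),
    ("maria", "finance.forecast"),
    ("john",  "sales.transactions"),
]

# Group tables by user, then count how often two tables are used by the same person.
tables_by_user = {}
for user, table in query_log:
    tables_by_user.setdefault(user, set()).add(table)

co_usage = Counter()
for tables in tables_by_user.values():
    for a, b in combinations(sorted(tables), 2):
        co_usage[(a, b)] += 1

def recommend(table, top_n=3):
    # Suggest tables that other analysts typically use alongside `table`.
    scores = Counter()
    for (a, b), n in co_usage.items():
        if a == table:
            scores[b] += n
        elif b == table:
            scores[a] += n
    return scores.most_common(top_n)

# An analyst who has only touched sales.transactions: what do peers pair it with?
print(recommend("sales.transactions"))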

Published Date : Oct 3 2017


Santhosh Mahendiran, Standard Chartered Bank | BigData NYC 2017


 

>> Announcer: Live, from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat techno music) >> Okay welcome back, we're live here in New York City. It's theCUBE's presentation of Big Data NYC, our fifth year doing this event in conjunction with Strata Data, formerly Strata Hadoop, formerly Strata Conference, formerly Hadoop World, we've been there from the beginning. Eight years covering Hadoop's ecosystem now Big Data. This is theCUBE, I'm John Furrier. Our next guest is Santhosh Mahendiran, who is the global head of technology analytics at Standard Chartered Bank. A practitioner in the field, here getting the data, checking out the scene, giving a presentation on your journey with Data at a bank, which is big financial obviously an adopter. Welcome to theCUBE. >> Thank you very much. >> So we always want to know what the practitioners are doing because at the end of the day there's a lot of vendors selling stuff here, so you got, everyone's got their story. End of the day you got to implement. >> That's right. >> And one of the themes is the data democratization which sounds warm and fuzzy, collaborating with data, this is all good stuff and you feel good and you move into the future, but at the end of the day it's got to have business value. >> That's right. >> And as you look at that, how do you look at the business value? Cause you want to be in the bleeding edge, you want to provide value and get that edge operationally. >> That's right. >> Where's the value in data democratization? How did you guys roll this out? Share your story. >> Okay, so let me start with the journey first before I come to the value part of it, right? So, data democratization is an outcome, but the journey has been something we started three years back. So what did we do, right? So we had some guiding principles to start our journey. The first was to say that we believed in the three S's, which is speed, scale, and it should be really, really flexible and super fast. So one of the challenges that we had was our historical data warehouses was entirely becoming redundant. And why was it? Because it was RDBMS centric, and it was extremely disparate. So we weren't able to scale up to meet the demands of managing huge chunks of data. So, the first step that we did was to re-pivot it to say that okay, let's embrace Hadoop. And what you mean by embracing is just not putting in the data lake, but we said that all our data will land into the data lake. And this journey started in 2015, so we have close to 80% of the Bank's data in the lake and it is end of day data right now and this data flows in on daily basis, and we have consumers who feed off that data. Now coming to your question about-- >> So the data lake's working? >> The data lake is working, up and running. >> People like it, you just got a good spot, batch 'em all you throw everything in the lake. >> So it is not real time, it is end of day. There is some data that is real-time, but the data lake is not entirely real-time, that I have to tell you. But one part is that the data lake is working. Second part to your question is how do I actually monetize it? Are you getting some value out of it? But I think that's where tools like Paxata has actually enabled us to accelerate this journey. So we call it data democratization. So the best part it's not about having the data. We want the business users to actually use the data. 
Typically, data has always been either delayed or denied in most of the cases to end-users and we have end-users waiting for the data but they don't get access to the data. It was done because primarily the size of the data was too huge and it wasn't flexible enough to be shared with. So how did tools like Paxata and the data lake help us? So what we did with data democratization is basically to say that "hey we'll get end-users to access the data first in a fast manner, in a self-service manner, and something that gives operational assurance to the data, so you don't hold the data and then say that you're going to get a subset of data to play with. We'll give you the entire set of data and we'll give you the right tools which you can play with. Most importantly, from an IT perspective, we'll be able to govern it. So that's the key about democratization. It's not about just giving them a tool, giving them all data and then say "go figure it out." It's about ensuring that "okay, you've got the tools, you've got the data, but we'll also govern it," so that you obviously have control over what they're doing. >> So now you govern it, they don't have to get involved in the governance, they just have access? >> No they don't need to. Yeah, they have access. So governance works both ways. We establish the boundaries. Look at it as a referee, and then say that "okay, there are guidelines that you don't," and within the datasets that key people have access to, you can further set rules. Now, coming back to specific use cases, I can talk about two specific cases which actually helped us to move the needle. The first is on stress testing, so being a financial institution, we typically have to report various numbers to our regulators, etc. The turnaround time was extremely huge. These kind of stress testing typically involve taking huge amount-- >> What were some of the turnaround times? >> Normally it was two to three weeks, some cases a month-- >> Wow. >> So we were able to narrow it down to days, but what we essentially did was as with any stress testing or reporting, it involved taking huge amounts of data, crunching them and then running some models and then showing the output, basically a number of transformations involved. Earlier, you first couldn't access the entire dataset, so that we solved-- >> So check, that was a good step one-- >> That was step one. >> But was there automation involved in that, the Paxata piece? >> Yeah, I wouldn't say it was fully automated end-to-end, but there was definitely automation given the fact that now you got Paxata to work off the data rather than someone extracting the data and then going off and figuring what needs to be done. The ability to work off the entire dataset was a big plus. So stress testing, bringing down the cycle time. The second one use case I can talk about is again anti-money laundering, and in our financial crime compliance space. We had processes that took time to report, given the clunkiness in the various handoffs that we needed to do. But again, empowering the users, giving the tool to them and then saying "hey, this"-- >> How about know your user, because we have to anti-money launder, you need to have to know your user base, that's all set their too? >> Yeah. So the good part is know the user, know your customer, KYCs all that part is set, but the key part is making sure the end-users are able to access the data much more earlier in the life cycle and are able to play with it. 
In the case of anti-money laundering, again first question of three weeks to four weeks was shortened down to question of days by giving tools like Paxata again in a structured manner and with which we're able to govern. >> You control this, so you knew what you were doing, but you let their tools do the job? >> Correct, so look at it this way. Typically, the data journey has always been IT-led. It has never been business-led. If you look at the generations of what happens is, you source the data which is IT-led, then you model the data which is IT-led, then you prepare then massage the data which is again IT-led and then you have tools on top of it which is again IT-led so the end-users get it only after the fourth stage. Now look at the generations within. All these life cycles apart from the fact that you source the data which is typically an IT issue, the rest need to be done by the actual business users and that's what we did. That's the progression of the generations in which we now we're in the third generation as I call it where our role is just to source the data and then say, "yeah we'll govern it in the matter and then preparation-- >> It's really an operating system and we were talking with Aaron with Elation's co-founder, we used the analogy of a car, how this show was like a car show engine show, what's in the engine and the technology and then it evolved every year, now it's like we're talking about the cars, now we're talking about driver experience-- >> That's right. >> At the end of the day, you just want to drive. You don't really care what's under the hood, you do but you don't, but there's those people who do care what's under the hood, so you can have best of both worlds. You've got the engines, you set up the infrastructure, but ultimately, you in the business side, you just want to drive, that's what's you're getting at? >> That's right. The time-to-market and speed to empower the users to play around with the data rather than IT trying to churn the data and confine access to data, that's a thing of the past. So we want more users to have faster access to data but at the same time govern it in a seamless manner. The word governance is still important because it's not about just give the data. >> And seamless is key. >> Seamless is key. >> Cause if you have democratization of data, you're implying that it is community-oriented, means that it's available, with access privileges all transparently or abstracted away from the users. >> Absolutely. >> So here's the question I want to ask you. There's been talk, I've been saying it for years going back to 2012 that an abstraction layer, a data layer will evolve and that'll be the real key. And then here in this show, I heard things like intelligent information fabric that is business, consumer-friendly. Okay, it's a mouthful, but intelligent information fabric in essence talks about an abstraction layer-- >> That's right. >> That doesn't really compromise anything but gives some enablement, creates some enabling value-- >> That's right. >> For software, how do you see that? >> As the word suggests, the earlier model was trying to build something for the end-users, but not which was end-user friendly, meaning to say, let me just give you a simple example. You had a data model that existed. Historically the way that we have approached using data is to say "hey, I've got a model and then let's fit that data into this model," without actually saying that "does this model actually serve the purpose?" 
You abstracted the model to a higher level. The whole point about intelligent data is about saying that, I'll give you a very simple analogy. Take zip code. Zipcode in US is very different from zipcode in India, it's very different from zipcode in Singapore. So if I had the ability for my data to come in, to say that "I know it's a zipcode, but this zipcode belongs to US, this zipcode belongs to Singapore, and this zipcode belongs to India," and more importantly, if I can further rev it up a notch, if I say that "this belongs to India, and this zipcode is valid." Look at where I'm going with intelligent sense. So that's what's up. If you look at the earlier model, you have to say that "yeah, this is a placeholder for zipcode." Now that makes sense, but what are you doing with it? >> Being a relational database model, it's just a field in a schema, you're taking it and abstracting it and creating value out of it. >> Precisely. So what I'm actually doing is accelerating the adoption, I'm making it more simpler for users to understand what the data is. So I don't need to as a user figure out "I got a zipcode, now is it a Singapore, India or what zipcode." >> So all this automation, Paxata's got a good system, we'll come back to the Paxata question in a second, I do want to drill down on that. But the big thing that I've been seeing at the show, and again Dave Alonte, my partner, co-CEO of Silicon Angle, we always talk about this all the time. He's more less bullish on Hadoop than I am. Although I love Hadoop, I think it's great but it's not the end-all, be-all. It's a great use case. We were critical early on and the thing we were critical on it was it was too much time being spent on the engine and how things are built, not on the business value. So there's like a lull period in the business where it was just too costly-- >> That's right. >> Total cost of ownership was a huge, huge problem. >> That's right. >> So now today, how did you deal with that and are you measuring the TCO or total cost of ownership cause at the end of the day, time to value, which is can you be up and running in 90 days with value and can you continue to do that, and then what's the overall cost to get there. Thoughts? >> So look I think TCO always underpins any technology investment. If someone said I'm doing a technology investment without thinking about TCO, I don't think he's a good technology leader, so TCO is obviously a driving factor. But TCO has multiple components. One is the TCO of the solution. The other aspect is TCO of what my value I'm going to get out of this system. So talking from an implementation perspective, what I look at as TCO is my whole ecosystem which is my hardware, software, so you spoke about Hadoop, you spoke about RDBMS, is Hadoop cheaper, etc? I don't want to get into that debate of cheaper or not but what I know is the ecosystem is becoming much, much more cheaper than before. And when I talk about ecosystem, I'm talking about RDBMS tools, I'm talking about Hadoop, I'm talking about BI tools, I'm talking about governance, I'm talking about this whole framework becoming cheaper. And it is also underpinned by the fact that hardware is also becoming cheaper. So the reality is all components in the whole ecosystem are becoming cheaper and given the fact that software is also becoming more open-sourced and people are open to using open-source software, I think the whole question of TCO becomes a much more pertinent question. Now coming to your point, do you measure it regularly? 
I think the honest answer is I don't think we are doing a good job of measuring it that well, but we do have that as one of the criteria for us to actually measure the success of our project. The way that we do is our implementation cost, at the time of writing out our PETs, we call it PETs, which is the Project Execution Document, we talk about cost. We say that "what's the implementation cost?" What are the business cases that are going to be an outcome of this? I'll give you an example of our anti-money laundering. I told you we reduced our cycle time from few weeks to a few days, and that in turn means the number of people involved in this whole process, you're reducing the overheads and the operational folks involved in it. That itself tells you how much we're able to save. So definitely, TCO is there and to say that-- >> And you are mindful of, it's what you look at, it's key. TCO is on your radar 100% you evaluate that into your deals? >> Yes, we do. >> So Paxata, what's so great about Paxata? Obviously you've had success with them. You're a customer, what's the deal. Was it the tech, was it the automation, the team? What was the key thing that got you engaged with them or specifically why Paxata? >> Look, I think the key to partnership there cannot be one ingredient that makes a partnership successful, I think there are multiple ingredients that make a partnership successful. We were one of the earliest adopters of Paxata. Given that we're a bank and we have multiple different systems and we have lot of manual processing involved, we saw Paxata as a good fit to govern these processes and ensure at the same time, users don't lose their experience. The good thing about Paxata that we like was obviously the simplicity and the look and feel of the tool. That's number one. Simplicity was a big point. The second one is about scale. The scale, the fact that it can take in millions of roles, it's not about just working off a sample of data. It can work on the entire dataset. That's very key for us. The third is to leverage our ecosystem, so it's not about saying "okay you give me this data, let me go figure out what to do and then," so Paxata works off the data lake. The fact that it can leverage the lake that we built, the fact that it's a simple and self-preparation tool which doesn't require a lot of time to bootstrap, so end-use people like you-- >> So it makes it usable. >> It's extremely user-friendly and usable in a very short period of time. >> And that helped with the journey? >> That really helped with the journey. >> Santosh, thanks so much for sharing. Santosh Mahendiran, who is the Global Tech Lead at the Analytics of the Bank at Standard Chartered Bank. Again, financial services, always a great early adopter, and you get success under your belt, congratulations. Data democratization is huge and again, it's an ecosystem, you got all that anti-money laundering to figure out, you got to get those reports out, lot of heavylifting? >> That's right, >> So thanks so much for sharing your story. >> Thank you very much. >> We'll give you more coverage after this short break, I'm John Furrier, stay tuned. More live coverage in New York City, its theCube.
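Santhosh's zipcode example, recognizing that a column holds postal codes, inferring which country's format they follow, and flagging invalid values, can be sketched with a few pattern checks. This is purely illustrative: the regular expressions below only approximate each country's format, the sample data is made up, and real semantic-type inference is considerably richer.

import re
import pandas as pd

# Rough, illustrative postal-code patterns; real formats are more nuanced.
POSTAL_PATTERNS = {
    "US":        re.compile(r"^\d{5}(-\d{4})?$"),
    "Singapore": re.compile(r"^\d{6}$"),
    "India":     re.compile(r"^[1-9]\d{5}$"),
}

def infer_postal_country(values):
    # Guess which country's format best matches a column of postal codes.
    scores = {country: sum(bool(p.match(str(v))) for v in values)
              for country, p in POSTAL_PATTERNS.items()}
    return max(scores, key=scores.get), scores

df = pd.DataFrame({"zipcode": ["94105", "10001", "30301-1234", "ABCDE", "02139"]})
country, scores = infer_postal_country(df["zipcode"])
df["valid"] = df["zipcode"].map(lambda v: bool(POSTAL_PATTERNS[country].match(str(v))))
print("looks like", country, "postal codes:", scores)
print(df)   # the invalid value is flagged so the user can fix it for their purpose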

Published Date : Sep 29 2017


Aaron Kalb, Alation | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's the Cube. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we are here live in New York City, in Manhattan for BigData NYC, our event we've been doing for five years in conjunction with Strata Data which is formerly Strata Hadoop, which was formerly Strata Conference, formerly Hadoop World. We've been covering the big data space going on ten years now. This is the Cube. I'm here with Aaron Kalb, whose Head of Product and co-founder at Alation. Welcome to the cube. >> Aaron Kalb: Thank you so much for having me. >> Great to have you on, so co-founder head of product, love these conversations because you're also co-founder, so it's your company, you got a lot of equity interest in that, but also head of product you get to have the 20 mile stare, on what the future looks, while inventing it today, bringing it to market. So you guys have an interesting take on the collaboration of data. Talk about what the means, what's the motivation behind that positioning, what's the core thesis around Alation? >> Totally so the thing we've observed is a lot of people working in the data space, are concerned about the data itself. How can we make it cheaper to store, faster to process. And we're really concerned with the human side of it. Data's only valuable if it's used by people, how do we help people find the data, understand the data, trust in the data, and that involves a mix of algorithmic approaches and also human collaboration, both human to human and human to computer to get that all organized. >> John Furrier: It's interesting you have a symbolics background from Stanford, worked at Apple, involved in Siri, all this kind of futuristic stuff. You can't go a day without hearing about Alexia is going to have voice-activated, you've got Siri. AI is taking a really big part of this. Obviously all of the hype right now, but what it means is the software is going to play a key role as an interface. And this symbolic systems almost brings on this neural network kind of vibe, where objects, data, plays a critical role. >> Oh, absolutely, yeah, and in the early days when we were co-founding the company, we talked about what is Siri for the enterprise? Right, I was you know very excited to work on Siri, and it's really a kind of fun gimmick, and it's really useful when you're in the car, your hands are covered in cookie dough, but if you could answer questions like what was revenue last quarter in the UK and get the right answer fast, and have that dialogue, oh do you mean fiscal quarter or calendar quarter. Do you mean UK including Ireland, or whatever it is. That would really enable better decisions and a better outcome. >> I was worried that Siri might do something here. Hey Siri, oh there it is, okay be careful, I don't want it to answer and take over my job. >> (laughs) >> Automation will take away the job, maybe Siri will be doing interviews. Okay let's take a step back. You guys are doing well as a start up, you've got some great funding, great investors. How are you guys doing on the product? Give us a quick highlight on where you guys are, obviously this is BigData NYC a lot going on, it's Manhattan, you've got financial services, big industry here. You've got the Strata Data event which is the classic Hadoop industry that's morphed into data. Which really is overlapping with cloud, IoTs application developments all kind of coming together. 
How do you guys fit into that world? >> Yeah, absolutely, so the idea of the data lake is kind of interesting. Psychologically it's sort of a hoarder mentality, oh everything I've ever had I want to keep in the attic, because I might need it one day. Great opportunity to evolve these new streams of data, with IoT and what not, but just cause you can get to it physically doesn't mean it's easy to find the thing you want, the needle in all that big haystack, and to distinguish from among all the different assets that are available, which is the one that is actually trustworthy for your need. So we find that all these trends make the need for a catalog to kind of organize that information and get what you want all the more valuable. >> This has come up a lot, I want to get into the integration piece and how you're dealing with your partnerships, but the data lake integration has been huge, and having the catalog has come up, has been the buzz. Foundationally if you will saying catalog is important. Why is it important to do the catalog work up front, with a lot of the data strategies? >> It's a great question, so, we see data cataloging as step zero. Before you can prep the data in a tool like Trifacta, Paxata, or Kylo. Before you can visualize it in a tool like Tableau, or MicroStrategy. Before you can do some sort of cool prediction of what's going to happen in the future, with a data science engine, before any of that. These are all garbage in garbage out processes. The step zero is find the relevant data. Understand it so you can get it in the right format. Trust that it's good and then you can do whatever comes next. >> And governance has become a key thing here, we've heard of the regulations, GDPR outside of the United States, but that's also going to have an arm's length reach over into the United States and have an impact. So these little decisions, and there's going to be an Equifax someday out there. Another one's probably going to come around the corner. How does the policy injection change the catalog equation? A lot of people are building machine learning algorithms on top of catalogs, and they're worried they might have to rewrite everything. How do you balance the trade off between good catalog design and flexibility on the algorithm side? >> Totally, yes, it's a complicated thing with governance and consumption, right. There's people who are concerned with keeping the data safe, and there are people concerned with turning that data into real value, and these can seem to be at odds. What we find is actually a catalog is a foundation for both, and they are not as opposed as they seem. What Alation fundamentally does is we make a map of where the data is, who's using what data, when, how. And that can actually be helpful if your goal is to say let's follow in the footsteps of the best analysts and get more insights generated, or if you want to say, hey this data is being used a lot, let's make sure it's being used correctly. >> And by the right people. >> And by the right people, exactly. >> Equifax, they were fishing that pond dry for months, months before it actually happened. With good tools like this they might have seen this right? Am I getting it right? >> That's exactly right, how can you observe what's going on to make sure it's compliant and that the answers are correct and that it's happening quickly and driving results. >> So in a way you're taking the collective intelligence of the user behavior and using that into understanding what to do with the data modeling? >> That's exactly right.
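To make that "map of where the data is, who's using what data, when, how" idea concrete, here is a minimal sketch of the kind of usage index a catalog could build from query logs. The log records, table names, and review threshold are all hypothetical, and this is not Alation's actual implementation; it only illustrates the general technique of ranking and flagging datasets by observed behavior.

```python
from collections import Counter, defaultdict

# Hypothetical query-log records: who touched which table, and when.
query_log = [
    {"user": "maria", "table": "warehouse.orders",   "ts": "2017-09-25T10:02:00"},
    {"user": "maria", "table": "lake.clickstream",   "ts": "2017-09-25T10:05:00"},
    {"user": "dev_1", "table": "warehouse.orders",   "ts": "2017-09-26T09:14:00"},
    {"user": "dev_2", "table": "sandbox.tmp_orders", "ts": "2017-09-26T11:30:00"},
]

# How often each table is used, and by whom.
usage = Counter(record["table"] for record in query_log)
users_per_table = defaultdict(set)
for record in query_log:
    users_per_table[record["table"]].add(record["user"])

# Democratization view: rank tables by popularity so newcomers can
# follow in the footsteps of the most active analysts.
for table, hits in usage.most_common():
    print(f"{table}: {hits} queries by {sorted(users_per_table[table])}")

# Governance view of the same map: flag heavily used tables for review.
REVIEW_THRESHOLD = 2  # hypothetical policy
hot_tables = [t for t, hits in usage.items() if hits >= REVIEW_THRESHOLD]
print("Tables to certify or review:", hot_tables)
```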
We want to make each person in your organization as knowledgeable as all of their peers combined. >> So the benefit then for the customer would be if you see something that's developing you can double down on it. And if the users are using a lot of data, then you can provision more technology, more software. >> Absolutely, absolutely. It's sort of like when I was going to Stanford, there was a place where the grass was all dead, because people were riding their bikes diagonally across it. And then somebody smart was like, we're going to put a real gravel path there. So the infrastructure should follow the usage, instead of being something you try to enforce on people. >> It's a classic design meme that goes around. Good design is here, the more effective design is the path. >> Exactly. >> So let's get into the integration. So one of the hot topics here this year obviously besides cloud and AI, with cloud really being more the driver, the tailwind for the growth, AI being more the futuristic head room, is integration. You guys have some partnerships that you announced with integration, what are some of the key ones, and why are they important? >> Absolutely, so, there have been attempts in the past to centralize all the data in one place, have one warehouse or one lake, have one BI tool. And those generally fail, for different reasons, different teams pick different stacks that work for them. What we think is important is the single source of reference: one hub with spokes out to all those different points. If you think about it it's like Google, it's one index of the whole web even though the web is distributed all over the place. To make that happen it's very important that we have partnerships to get data in from various sources. So we have partnerships with database vendors, with Cloudera and Hortonworks, with different BI tools. What's new are a few things. One is with Cloudera Navigator, they have great technical metadata around security and lineage over HDFS, and that's a way to bolster our catalog to go even deeper into what's happening in the files before things get surfaced up higher, for places where we have a deeper offering today. >> So it's almost a connector to them in a way, you kind of share data. >> That's exactly right, we've got a lot of different connectors, this is one new one that we have. Another, go ahead. >> I was going to go ahead, continue. >> I was just going to say another place that is exciting is data prep tools, so Trifacta and Paxata are both places where you can find and understand in Alation and then begin to manipulate in those tools. We announced with Paxata yesterday the ability to click to profile, so if you want to actually see what's in some raw compressed Avro file, you can see that in one click. >> It's interesting, Paxata has really been almost lapping Trifacta, because they were the leader in my mind, but now you've got like a Nascar race going on between the two firms, because data wrangling is a huge issue. Data prep is where everyone is stuck right now, they just want to do the data science, it's interesting. >> They are both amazing companies and I'm happy to partner with both. And actually Trifacta and Alation have a lot of joint customers we're psyched to work with as well. I think what's interesting is that data prep, and this is beginning to happen with analyst definitions of that field.
It isn't just preparing the data to be used, getting it cleaned and shaped, it's also preparing the humans to use the data, giving them the confidence, the tools, the knowledge to know how to manipulate it. >> And it's great progress. So the question I wanted to ask is now the other big trend here is, I mean it's kind of a subtext in this show, it's not really front and center but we've been seeing it kind of emerge as a concept, we see in the cloud world, on premise vs cloud. On premise a lot of people bring the dev ops model in, and saying I may move to the cloud for bursting and some native applications, but at the end of the day there is a lot of work going on on premise. A lot of companies are kind of cleaning house, retooling, replatforming, whatever you want to do, resetting. They are kind of getting their house in order to do on prem cloud ops, meaning a business model of cloud operations on site. A lot of people doing that, that will impact the story, it's going to impact some of the server modeling, that's a hot trend. How do you guys deal with the on premise cloud dynamic? >> Totally, so we just want to do what's right for the customer, so we deploy both on prem and in the cloud, and then from wherever the Alation server is it will point to usually a mix of sources, some that are in the cloud like Redshift or S3, often with Amazon today, and also sources that are on prem. I do think I'm seeing a trend more and more toward the cloud, and we have people that are migrating from HDFS to S3, that's one thing we hear a lot about, Strata with sort of the Hadoop interest. But I think what's happening is people are realizing, as each Equifax-type incident happens, that this old wild west model of oh you surround your bank with people on horseback and it's physically in one place. With data it isn't like that, most people are saying I'd rather have the A+ teams at Salesforce or Amazon or Google be responsible for my security, than the people I can get over in the midwest. >> And the Paxata guys have loved the term Data Democracy, because that is really democratization, making the data free but also having the governance thing. So tell me about the Data Lake governance, because I've never loved the term Data Lake, I think it's more of a data ocean, but now you see data lake, data lake, data lake. Are they just silos of data lakes happening now? Are people trying to connect them? That's key, so that's been a key trend here. How do you handle the governance across multiple data lakes? >> That's right, so the key is to have that single source of reference, so that regardless of which lake or warehouse, or little siloed SQL Server somewhere, that you can search in a single portal and find that thing no matter where it is. >> John: Can you guys do that? >> We can do that, yeah, I think the metaphor for people who haven't seen it really is Google, if you think about it, you don't even know what physical server a webpage is hosted from. >> Data lakes should just be invisible >> Exactly. >> So you're interfacing with multiple data lakes, that's a value proposition for you. >> That's right, so it could be on prem or in the cloud, multi-cloud. >> Can you share an example of a customer that uses that and kind of how it's laid out? >> Absolutely, so one great example of an interesting data environment is eBay. They have the biggest Teradata warehouse in the world.
They also have I believe two huge data lakes, they have Hive on top of that, and Presto is used to sort of virtualize it across a mixture of Teradata and Hive, and then direct Presto queries. It gets very complicated, and they have, they are a very data driven organization, so they have people who are product owners who are in jobs where data isn't in their job title, and they know how to look at Excel and look at numbers and make choices, but they aren't real data people. Alation provides that accessibility so that they can understand it. >> We used to call the Hadoop world the car show for the data world, where for a long time it was about the engine, what was doing what, and then it became, what's the car, and now how's it drive. Seeing that same evolution now where all that stuff has to get done under the hood. >> Aaron: Exactly. >> But there are still people who care about that, right. They are the mechanics, they are the plumbers, whatever you want to call them, but then the data science are the guys really driving things and now end users potentially, and even applications, bots, or what not. It seems to evolve, that's where we're kind of seeing the show change a little bit, and that's kind of where you see some of the AI things. I want to get your thoughts on how you or your guys are using AI, how you see AI, if it's AI at all, if it's just machine learning as a baby step into AI, we all know what AI could be, but it's really just machine learning now. How do you guys use quote AI and how has it evolved? >> It's a really insightful question and a great metaphor that I love. If you think about it, it used to be how do you build the car, and now I can drive the car even though I couldn't build it or even fix it, and soon I don't even have to drive the car, the car will just drive me, all I have to know is where I want to go. That's sort of the progression that we see as well. There's a lot of talk about deep learning, all these different approaches, and it's super interesting and exciting. But I think even more interesting than the algorithms are the applications. And so for us it's like today how do we get those turn by turn directions where we say turn left at the light if you want to get there. And eventually you know maybe the computer can do it for you. The thing that is also interesting is to make these algorithms work, no matter how good your algorithm is, it's all based on the quality of your training data. >> John: Which is historical data. Historical data in essence, the more historical data you have, you need that to train the data. >> Exactly right, and we call this behavior IO, how do we look at all the prior human behavior to drive better behavior in the future. And I think the key for us is we don't want to have a bunch of unpaid >> John: You can actually get that URL, behavioral IO. >> We should do it before it's too late (Both laugh) >> We're live right now, go register that Patrick. >> Yeah so the goal is we don't want to have a bunch of unpaid interns trying to manually tag things, that's error prone and that's slow. I look at things like Luis von Ahn over at CMU, he does a thing where as you're writing in a CAPTCHA to get an email account you're also helping Google recognize a hard to read address or a piece of text from books.
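The eBay environment described above, Presto virtualizing queries across Teradata and Hive, can be pictured with a short sketch. This assumes the presto-python-client package and entirely made-up host, catalog, schema, and table names; it shows how one federated query spans two systems, not how eBay's actual deployment is configured.

```python
import prestodb  # assumes the presto-python-client package is installed

# Hypothetical coordinator with both a Hive and a Teradata catalog configured.
conn = prestodb.dbapi.connect(
    host="presto-coordinator.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()

# A single query joins data that physically lives in two different systems;
# Presto pushes work to each connector and merges the results.
cur.execute("""
    SELECT o.region, count(*) AS clicks
    FROM teradata.dw.orders AS o
    JOIN hive.web.clickstream AS c
      ON o.order_id = c.order_id
    GROUP BY o.region
    ORDER BY clicks DESC
""")
for region, clicks in cur.fetchall():
    print(region, clicks)
```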
>> John: If you shoot the arrow forward, you just take this kind of forward, you almost think augmented reality is a pretext to what we might see for what you're talking about, and ultimately VR, are you seeing some of the use cases for virtual reality be very enterprise oriented or even end consumer. I mean Tom Brady, the best quarterback of all time, he uses virtual reality to play the offense virtually before every game, he's a power user, in pharma you see them using virtual reality to do data mining without being in the lab, so lab tests. So you're seeing augmentation coming in to this turn by turn direction analogy. >> It's exactly, I think it's the other half of it. So we use AI, we use techniques to get great data from people, and then we do extra work watching their behavior to learn what's right. And to figure out if there are recommendations, but then you serve those recommendations, whether it's Google Glass, it appears right there in your field of view. We just have to figure out how do we make sure, that in a moment of you're making a dashboard, or you're making a choice, that you have that information right on hand. >> So since you're a technical geek, and a lot of folks would love to talk about this, so I'll ask you a tough question cause this is something everyone is trying to chase for the holy grail. How do you get the right piece of data at the right place at the right time, given that you have all these legacy silos, latencies and network issues as well, so you've got a data warehouse, you've got stuff in cold storage, and I've got an app and I'm doing something, there could be any points of data in the world that could be in milliseconds potentially on my phone or in my device, my internet of things wearable. How do you make that happen? Because that's the struggle, at the same time keeping all the compliance and all the overhead involved, is it more compute, is it an architectural challenge, how do you view that, because this is the big challenge of our time. >> Yeah again I actually think it's the human challenge more than the technology challenge. It is true that there is data all over the place kind of gathering dust, but again if you think about Google, billions of web pages, I only care about the one I'm about to use. So for us it's really about being in that moment of writing a query, building a chart, how do we say in that moment, hey you're using an out of date definition of profit. Or hey the database you chose to use, the one thing you chose out of the millions, that is actually broken and stale. And we have interventions to do that with our partners and through our own first party apps that actually change how decisions get made at companies. >> So to make that happen, if I imagine it, you'd have to need access to the data, and then write software that is contextually aware to then run, compute, in context to the user interaction. >> It's exactly right, back to the turn by turn directions concept, you have to know both where you're trying to go and where you are. And so for us that can be, from where I'm writing a SQL statement, after a join we can suggest the table most commonly joined with that, but also overlay onto that the fact that the most commonly joined table was deprecated by a data steward or data curator. So that's the moment that we can change the behavior from bad to good.
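That "turn by turn" idea, suggest the table most commonly joined with the one the analyst just typed, and warn if a steward deprecated it, boils down to mining historical queries and overlaying curation metadata. A rough sketch of that idea follows, with invented join history and steward data; it is not Alation's actual algorithm, only the shape of the technique.

```python
from collections import Counter

# Hypothetical history of table pairs that appeared together in past joins.
join_history = [
    ("sales.orders", "sales.customers"),
    ("sales.orders", "sales.customers"),
    ("sales.orders", "legacy.customers_v1"),
    ("web.clicks",   "sales.orders"),
]

# Curation metadata from data stewards: tables flagged as deprecated.
deprecated = {"legacy.customers_v1"}

def suggest_join(table, top_n=3):
    """Rank join partners for `table` by how often analysts joined them
    historically, and annotate anything a steward has deprecated."""
    partners = Counter()
    for a, b in join_history:
        if a == table:
            partners[b] += 1
        elif b == table:
            partners[a] += 1
    suggestions = []
    for partner, freq in partners.most_common(top_n):
        note = "WARNING: deprecated by data steward" if partner in deprecated else "ok"
        suggestions.append((partner, freq, note))
    return suggestions

# At the moment the analyst types "... FROM sales.orders JOIN", surface this:
for partner, freq, note in suggest_join("sales.orders"):
    print(f"{partner}  (joined {freq}x)  [{note}]")
```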
>> So a chief data officer out there, we've got to wrap up, but I wanted to ask one final question. There's a chief data officer out there, they might be empowered or they might be just a CFO assistant that's managing compliance, either way, someone's going to be empowered in an organization to drive data science and data value forward because there is so much proof that data science works. From military to play you're seeing examples where being data driven actually has benefits. So everyone is trying to get there. How do you explain the vision of Alation to that prospect? Because they have so much to select from, there's so much noise, there's like, we call it the tool shed out there, there's like a zillion tools out there, there's like a zillion platforms, some tools are trying to turn into something else, a hammer is trying to be a lawnmower. So they've got to be careful on who they select, so what's the vision of Alation to that chief data officer, or that person in charge of analytics to scale operational analytics. >> Absolutely, so we say to the CDO we have a shared vision for this place where your company is making decisions based on data, instead of based on gut, or expensive consultants months too late. And the way we get there, the reason Alation adds value is, we're sort of the last tool you have to buy, because with this lake mentality, you've got your tool shed with all the tools, you've got your library with all the books, but they're just in a pile on the floor, if you had a tool that had everything organized, so you just said hey robot, I need a hammer and this size nail and this textbook on this set of information, and it could just come to you, and it would be correct and it would be quick, then you could actually get value out of all the expense you've already put in this infrastructure, that's especially true on the lake. >> And also tools describe the way the work's done, so in that model tools can be in the tool shed, no one needs to know it's in there. >> Aaron: Exactly. >> You guys can help scale that. Well congratulations, and just how far along are you guys in terms of number of employees, how many customers do you have? If you can share that, I don't know if that's confidential or what not >> Absolutely, so we're small but growing very fast, planning to double in the next year, and in terms of customers, we've got 85 customers including some really big names. I mentioned eBay, Pfizer, Safeway Albertsons, Tesco, Meijer. >> And what are they saying to you guys, why are they buying, why are they happy? >> They share that same vision of a more data driven enterprise, where humans are empowered to find out, understand, and trust data to make more informed choices for the business, and that's why they come and come back. >> And that's the product roadmap, ethos, for you guys, that's the guiding principle? >> Yeah, the ultimate goal is to empower humans with information. >> Alright Aaron, thanks for coming on the Cube. Aaron Kalb, co-founder and head of product for Alation, here in New York City for BigData NYC and also Strata Data. I'm John Furrier, thanks for watching. We'll be right back with more after this short break.

Published Date : Sep 28 2017


Nenshad Bardoliwalla & Pranav Rastogi | BigData NYC 2017


 

>> Announcer: Live from Midtown Manhattan it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> OK, welcome back everyone, we're here in New York City, it's theCUBE's exclusive coverage of Big Data NYC, in conjunction with Strata Data going on right around the corner. It's our third day talking to all the influencers, CEO's, entrepreneurs, people making it happen in the Big Data world. I'm John Furrier co-host of theCUBE, with my co-host here Jim Kobielus who is the Lead Analyst at Wikibon Big Data. Nenshad Bardoliwalla. >> Bar-do-li-walla. >> Bardo. >> Nenshad Bardoliwalla. >> That guy. >> Okay, done. Of Paxata, Co-Founder & Chief Product Officer, it's a tongue twister, third day, being from Jersey, it's hard with our accent, but thanks for being patient with me. >> Happy to be here. >> Pranav Rastogi, Product Manager, Microsoft Azure. Guys, welcome back to theCUBE, good to see you. I apologize for that, third day blues here. So Paxata, we had your partner on, Prakash. >> Prakash. >> Prakash. Really a success story, you guys have done really well since launching on theCUBE, fun to watch you guys go from launching to the success. Obviously your relationship with Microsoft is super important. Talk about the relationship because I think this is really where people can start connecting the dots. >> Sure, maybe I'll start and I'll be happy to get Pranav's point of view as well. Obviously Microsoft is one of the leading brands in the world and there are many aspects of the way that Microsoft has thought about their product development journey that have really been critical to the way that we have thought about Paxata as well. If you look at the number one tool that's used by analysts the world over it's Microsoft Excel. Right, there isn't even anything that's a close second. And if you look at the evolution of what Microsoft has done in many layers of the stack, whether it's the end user computing paradigm that Excel provides to the world. Whether it's all of their recent innovation in both hybrid cloud technologies as well as the big data technologies that Pranav is part of managing. We just see a very strong synergy between trying to combine the usage by business consumers of being able to take advantage of these big data technologies in a hybrid cloud environment. So there's a very natural resonance between the 2 companies. We're very privileged to have Microsoft Ventures as an investor in Paxata and so the opportunity for us to work with one of the great brands of all time in our industry was really a privilege for us. >> Yeah, and that's the corporate side, so that wasn't actually part of it. So it's a different part of Microsoft which is great. You have also business opportunity with them. >> Nenshad : We do. >> Obviously the data science problem that we're seeing is that they need to get the data faster. All that prep work seems to be the big issue. >> It does, and maybe we can get Pranav's point of view from the Microsoft angle. >> Yeah so to sort of continue what Nenshad was saying, you know the data prep in general is sort of a key core competence which is problematic for lots of users, especially around the knowledge that you need to have in terms of the different tools you can use. Folks who are very proficient will do ETL or data preparation-like scenarios using one of the computing engines like Hive or Spark.
That's good, but there's this big audience out there who like an Excel-like interface, which is easy to use, a very visually rich graphical interface where you can drag and drop and click through. And the idea behind all of this is how quickly can I get insights from my data faster. Because in a big data space, it's volume, variety and velocity. So data is coming at a very fast rate. It's changing, it's growing. And if you spend a lot of time just doing data prep you're losing the value of data, or the value of data would change over time. So what we're trying to do with Paxata on HDInsight is enable these users to use Paxata, get insights from data faster by solving the key problems of doing data prep. >> So data democracy is a term that we've been kicking around, you guys have been talking about as well. What does it actually mean? Because we've been teasing it out the first two days here at theCUBE and BigData NYC. It's clear the community aspect of data is growing, almost on a similar path as you're seeing with open source software. That genie's out of the bottle. Open source software, tier one, it won, it's only growing exponentially. That same paradigm is moving into the data world where the collaboration is super important, in this data democracy, what does that actually mean and how does that relate to you guys? >> So the perspective we have is, first, something that one of our customers said: there is no democracy without certain degrees of governance. We all live in a democracy. And yet we still have rules that we have to abide by. There are still policies that society needs to follow in order for us to be successful citizens. So when a lot of folks hear the term democracy they really think of the wild wild west, you know. And a lot of the analytic work in the enterprise does have that flavor to it, right, people download stuff to their desktop, they do a little bit of massaging of the data. They email that to their friend, their friend then makes some changes and next thing you know we have what some folks affectionately call spreadmart hell. But if you really want to democratize the technology you have to wrap not only the user experience, like Pranav described, into something that's consumable by a very large number of folks in the enterprise. You have to wrap that with the governance and collaboration capabilities so that multiple people can work off the same data set. That you can apply the permissions so that people know who is allowed to share with each other and under what circumstances they are allowed to share. Under what circumstances are you allowed to promote data from one environment to another? It may be okay for someone like me to work in a sandbox but I cannot push that to a database or HDFS or Azure BLOB storage unless I actually have the right permissions to do so. So I think what you're seeing is that, in general, technology is becoming, it always goes on this trend, towards democratization. Whether it's the phone, whether it's the television, whether it's the personal computer, and the same thing is happening with data technologies and certainly companies like. >> Well, Pranav, we're talking about this when you were on theCUBE yesterday. And I want to get your thoughts on this. The old way to solve the governance problem was to put data in silos. That was easy, I'll just put it in a silo and take care of it, and access control was different.
But now the value of the data is about cross-pollinating and making it freely available, horizontally scalable, so that it can be used. But at the same time you need to have a new governance paradigm. So, you've got to democratize the data by making it available, addressable and usable for apps. At the same time there are also the concerns on how do you make sure it doesn't get in the wrong hands and so on and so forth. >> Yeah, and which is also very sort of common regarding open source projects in the cloud: how do you ensure that the user authorized to access this open source project or run it has the right credentials and is authorized. So, the benefit that you sort of get in the cloud is there's a centralized authentication system. There's Azure Active Directory, so you know most enterprises would have Active Directory users. Who are then authorized to access maybe this cluster, or maybe this workload, and they can run this job, and that further goes down to the data layer as well. Where we have active policies which then describe what user can access what files and what folders. So if you think about the entire scenario there is authentication and authorization happening, for the entire system, of what user can access what data. And part of what Paxata brings in the picture is how do you visualize this governance flow as data is coming from various sources, how do you make sure that the person who has access to data does have access to data, and the one who doesn't cannot access data. >> Is that the problem with data prep, is just that piece of it? What is the big problem with data prep, I mean, that seems to be, everyone keeps coming back to the same problem. What is causing all this data prep.
You know, to sift and sort and annotating shared data with the lineage of the data preserved in essentially a system of record that can follow the data throughout its natural life. Is that a fair characterization? >> Pranav: I would think so yeah. >> You mention, Pranav, the whole issue of how one visualizes or should visualize this entire chain of custody, as it were, for the data, is there is there any special visualization paradigm that you guys offer? Now Microsoft, you've made a fairly significant investment in graph technology throughout your portfolio. I was at Build back in May and Sacha and the others just went to town on all things to do with Microsoft Graph, will that technology be somehow at some point, now or in the future, be reflected in this overall capability that you've established here with your partner here Paxata? >> I am not sure. So far, I think what you've talked about is some Graph capabilities introduced from the Microsoft Graph that's sort of one extreme. The other side of Graph exists today as a developer you can do some Graph based queries. So you can go to Cosmos DB which had a Gremlin API. For Graph based query, so I don't know how. >> I'll get right to the question. What's the Paxata benefits of with HDInsight? How does that, just quickly, explain for the audience. What is that solution, what are the benefits? >> So the the solution is you get a one click install of installing Paxata HDInsight and the benefit is as a benefit for a user persona who's not, sort of, used to big data or Hadoop they can use a very familiar GUI-based experience to get their insights from data faster without having any knowledge of how Spark works or Hadoop works. >> And what does the Microsoft relationship bring to the table for Paxata? >> So I think it's a couple of things. One is Azure is clearly growing at an extremely fast pace. And a lot of the enterprise customers that we work with are moving many of their workloads to Azure and and these cloud based environments. Especially for us, the unique value proposition of a partner who truly understands the hybrid nature of the world. The idea that everything is going to move to the cloud or everything is going to stay on premise is too simplistic. Microsoft understood that from day one. That data would be in it and all of those different places. And they've provided enabling technologies for vendors like us. >> I'll just say it to maybe you're too coy to say it, but the bottom line is you have an Excel-like interface. They have Office 365 they're user's going to instantly love that interface because it's an easy to use interface an Excel-like it's not Excel interface per se. >> Similar. >> Metaphor, graphical user interface. >> Yes it is. >> It's clean and it's targeted at the analyst role or user. >> That's right. >> That's going to resonate in their install base. >> And combined with a lot of these new capabilities that Microsoft is rolling out from a big data perspective. So HDInsight has a very rich portfolio of runtime engines and capabilities. They're introducing new data storage layers whether it's ADLS or Azure BLOB storage, so it's really a nice way of us working together to extract and unlock a lot of the value that Microsoft. >> So, here's the tough question for you, open source projects I see Microsoft, comments were hell froze because LINUX is now part of their DNA, which was a comment I saw at the even this week in Orlando, but they're really getting behind open source. From open compute, it's just clearly new DNA's. 
They're they're into it. How are you guys working together in open source and what's the impact to developers because now that's only one cloud, there's other clouds out there so data's going to be an important part of it. So open source, together, you guys working together on that and what's the role for the data? >> From an open source perspective, Microsoft plays a big role in embracing open source technologies and making sure that it runs reliably in the cloud. And part of that value prop that we provide in sort of Azure HDInsight is being sure that you can run these open source big data workloads reliably in the cloud. So you can run open source like Apache, Spark, Hive, Storm, Kafka, R Server. And the hard part about running open source technology in the cloud is how do you fine tune it, and how do you configure it, how do you run it reliably. And that's what sort of what we bring in from a cloud perspective. And we also contribute back to the community based on sort of what learned by running these workloads in the cloud. And we believe you know in the broader ecosystem customers will sort of have a mixture of these combinations and their solution They'll be using some of the Microsoft solutions some open source solutions some solutions from ecosystem that's how we see our customer solution sort of being built today. >> What's the big advantage you guys have at Paxata? What's the key differentiator for why someone should work with you guys? Is it the automation? What's the key secret sauce to you guys? >> I think it's a couple of dimensions. One is I think we have come the closest in the industry to getting a user experience that matches the Excel target user. A lot of folks are attempting to do the same but the feedback we consistently get is that when the Excel user uses our solution they just, they get it. >> Was there a design criteria, was that from the beginning how you were going to do this? >> From day one. >> So you engineer everything to make it as simple as like Excel. >> We want people to use our system they shouldn't be coding, they shouldn't be writing scripts. They just need to be able. >> Good Excel you just do good macros though. >> That's right. >> So simple things like that right. >> But the second is being able to interact with the data at scale. There are a lot of solutions out there that make the mistake in our opinion of sampling very tiny amounts of data and then asking you to draw inferences and then publish that to batch jobs. Our whole approach is to smash the batch paradigm and actually bring as much into the interactive world as possible. So end users can actually point and click on 100 million rows of data, instead of the million that you would get in Excel, and get an instantaneous response. Verses designing a job in a batch paradigm and then pushing it through the the batch. >> So it's interactive data profiling over vast corpuses of data in the cloud. >> Nenshad: Correct. >> Nenshad Bardoliwalla thanks for coming on theCUBE appreciate it, congratulations on Paxata and Microsoft Azure, great to have you. Good job on everything you do with Azure. I want to give you guys props, with seeing the growth in the market and the investment's been going well, congratulations. Thanks for sharing, keep coverage here in BigData NYC more coming after this short break.

Published Date : Sep 28 2017


Day One Wrap | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's theCUBE covering BigData New York City 2017. Brought to you by SiliconANGLE Media, and its ecosystem sponsors. >> Hello everyone, welcome back to our day one, at Big Data NYC, of three days of wall to wall coverage. This is theCUBE. I'm John Furrier, with my co-hosts Jim Kobielus and Peter Burris. We do this event every year, this is theCUBE's BigData NYC. It's our event that we run in New York City. We have a lot of great content, we have theCUBE going live, we don't go to Strata anymore. We do our own event in conjunction, they have their own event. You can go pay over there and get the booth space, but we do our media event and attract all the influencers, the VIPs, the executives, the entrepreneurs, we've been doing it for five years, we're super excited, and thank our sponsors for allowing us to get here and really appreciate the community for continuing to support theCUBE. We're here to wrap up day one what's going on in New York, certainly we've had a chance to check out the Strata situations, Strata Data, which is Cloudera, and O'Reilly, mainly O'Reilly media, they run that, kind of old school event, guys. Let's kind of discuss the impact of the event in context to the massive growth that's going outside of their event. And their event is a walled garden, you got to pay to get in, they're very strict. They don't really let a lot of people in, but, okay. Outside of that the event it going global, the activity around big data is going global. It's more than Hadoop, we certainly thought about that's old news, but what's the big trend this year? As the horizontally scalable cloud enters the equation. >> I think the big trend, John, is the, and we've talked about in our research, is that we have finally moved away from big data, being associated with a new type of infrastructure. The emergence of AI, deep learning, machine learning, cognitive, all these different names for relatively common things, are an indications that we're starting to move up into people thinking about applications, people thinking about services they can use to get access, or they can get access to build their applications. There's not enough skills. So I think that's probably the biggest thing is that the days of failure being measured by whether or not you can scale your cluster up, are finally behind us. We're using the cloud, other resources, we have enough expertise, the technologies are becoming simpler and more straightforward to do that. And now we're thinking about how we're going to create value out of all of this, which is how we're going to use the data to learn something new about what we're doing in the organization, combine it with advanced software technologies that actually dramatically reduce the amount of work that's necessary to make a decision. >> And the other trend I would say, on top of that, just to kind of put a little cherry on top of that, kind of the business focus which is again, not the speeds and feeds, although under the hood, lot of great innovation going on from deep learning, and there's a ton of stuff. However, the conversation is the business value, how it's transforming work and, but the one thing that nobody's talking about is, this is why I'm not bullish on these one shows, one show meets all kind of thing like O'Reilly Media does, because there's multiple personas in a company now in the ecosystem. There are now a variety of buyers of some products. At least in the old days, you'd go talk to the IT CIO and you're in. 
Not anymore. You have an analytics person, a Chief Data Officer, you might have an IT person, you might have a cloud person. So you're seeing a completely broader set of potential buyers that are driving the change. We heard Paxata talk about that. And this is a dynamic. >> Yeah, definitely. We see a fair amount of, what I'm sensing about Strata, how it's evolving these big top shows around data, it's evolving around addressing a broader, what we call maker culture. It's more than software developers. It's business analysts, it's the people who build the hardware for the internet of things into which AI and machine learning models are being containerized and embedded. I've, you know, one of the takeaways from today so far, and the keynotes are tomorrow at Strata, but I've been walking the atrium at the Javits Center having some interesting conversations, in addition, of course, to the ones we've been having here at theCUBE. And what I'm notic-- >> John: What are those hallway conversations that you're having? >> Yeah. >> What's going on over there? >> Yeah, what I've, the conversations I've had today have been focused on, the chief trend that I'm starting to sense here is that the productionization of the machine learning development process or pipeline, is super hot. It spans multiple data platforms, of course. You've got a bit of Hadoop in the refinery layer, you've got a bit of in-memory columnar databases, like Actian discussed at their own, but the more important, not more important, but just as important is that what users are looking at is how can we build these DevOps pipelines for continuous management of releases of machine learning models for productionization, but also for ongoing evaluation and scoring and iteration and redeployment into business applications. You know there's, I had conversations with MapR, I had conversations with IBM, I mean, these were atrium conversations about things that they are doing. IBM had an announcement today on the wires and so forth with some relevance to that. And so I'm seeing a fair, I'm hearing, I'm sensing a fair amount of It's The Apps, it's more than just Hadoop. But it's very much the flow of these, these are the core pieces, like AI, core pieces of intellectual property in the most disruptive applications that are being developed these days in all manner, in business and industry and in the consumer space. >> So I did not go over to the show floor yet, I've not been over to the Atrium. But, I'll bet you dollars to donuts this is indicative of something that always happens in a complex technology environment. And again, this is something we've thought about, particularly talked about here on theCUBE, in fact we talked to Paxata about it a little bit as well. And that is, as an organization gains experience, it starts to specialize. But there's always moments, there are always inflection points in the process of gaining that experience. And by that, or one of the indications of that is that you end up with some people starting to specialize, but not quite sure what they're specializing in yet. And I think that's one of the things that's happening right now is that the skills gap is significant. At the same time that the skills gap is significant, we're seeing people start to declare their specializations that they don't have skills, necessarily, to perform yet. And the tools aren't catching up. So there's still this tension model, open source, not necessarily focusing on the core problem.
Skills looking for tools, and explosion in the number of tools out there, not focused on how you simplify, streamline, and put into operation. How all these things work together. It's going to be an interesting couple of years, but the good news, ultimately, is that we are starting to see for the first time, even on theCUBE interviews today, the emergence of a common language about how we think about the characteristics of the problem. And I think that that heralds a new round of experience and a new round of thinking about what is all the business analysts, the data scientists, the developer, the infrastructure person, business person. >> You know, you bring up that comment, those comments, about the specialists and the skills. We talked, Jim and I talked on the segment this morning about tool shed. We're talking about there are so many tools out there, and everyone loves a good tool, a hammer. But the old expression is if you're a hammer, everything looks like a nail, that's cliche. But what's happened is there are a plethora of tools, right, and tools are good. Platforms are better. As people start to replatformize everything they could have too many tools. So we asked the C Chief Data Officer, he goes yeah, I try to manage the tool tsunami, but his biggest issue was he buys a hammer, and it turns into a lawnmower. That's a vendor mentality of-- >> What a truck. Well, but that's a classic example of what I'm talking about. >> Or someone's trying to use a hammer to mow the lawn right? Again, so this is what you're getting at. >> Yeah! >> The companies out there are groping for relevance, and that's how you can see the pretenders from the winners. >> Well, a tool, fundamentally, is pedagogical. A tool describes the way work is going to be performed, and that's been a lot of what's been happening over the course of the past few years. Now, businesses that get more experience, they're describing their own way of thinking throughout a problem. And they're still not clear on how to bring the tools together because the tools are being generated, put into the marketplace by an expanding array of folks and companies, and they're now starting to shuffle for position. But I think ultimately, what we're going to see happen over the next year and I think this is an inflection point, going back to this big tent notion, is the idea that ultimately we are going to see greater specialization over the next few years. My guess is that this year will probably, should get better, or should get bigger, I'm not certain it will because it's focused on the problems that we already solved and not moving into the problems that we need to focus on. >> Yeah, I mean, a lot of the problems I have with the O'Reilly show is that they try to throw default leadership out there, and there's some smart people that go to that, but the problem is is that it's too monetization, they try to make too much money from the event when this action's happening. And this is where the tool becomes, the hammer becomes a lawnmower, because what's happening is that the vendor's trying to stay alive. And you mentioned this earlier, to your point, the customers that are buyers of the technology don't want to have something that's not going to be a fit, that's going to be agile from us. They don't want the hammer that they bought to turn into something that they didn't buy it for. And sometimes, teams can't make that leap, skillset-wise, to literally pivot overnight. Especially as a startup. 
So this is where the selection of the companies makes a big difference. And a lot of the clients, a lot of customers that we're serving on the end user side are reaching the conclusion that the tools themselves, while important, are clearly not where the value is. The value is in how they put them together for their business. And that's something that's going to have to, again, that's a maturation process, roles, responsibilities, the chief data officer, they're going to have a role in that or not, but ultimately, they're going to have to start finding their pipelines, their process for ingestion out to analysis. >> Let me get your reaction, you guys, your reactions to this tape. Because one of the things that I heard today, and I think this validates a bigger trend as we talk about the landscape of the markup from the event to how people are behaving and promoting and building products and companies. The pattern that I'm hearing, we said it multiple times on theCUBE today and one from the guy who's basically reading the script, is, in his interview, explaining 'cause it's so factual, I asked him the straight-up question, how do you deal with suppliers? What's happening is the trend is don't show me sizzle. I want to see the steak. Don't sell me hype, I got too many business things to work on right now, I need to nail down some core things. I got application development, I got security to build out big time, and then I got all those data channels that I need, I don't have time for you to sell me a hammer that might not be a hammer in the future! So I need real results, I need real performance that's going to have a business impact. That is the theme, and that trumps the hype. I see that becoming a huge thing right now. Your thoughts, reactions, guys-- >> Well I'll start-- >> What's your reaction then? True or false on the trend? Be-- >> Peter: True! >> Get down to business. >> I'll say that much, true, but go ahead. >> I'll say true as well, but let me just add some context. I think a show like O'Reilly Strata is good up to a point, especially to catalyze an industry, a growing industry like big data's own understanding of it, of the value that all these piece parts, Hadoop and Spark and so forth, can add, can provide when deployed in a unit according to some emerging patterns, whatever. But at a certain point where a space like this becomes well-established, it just becomes a pure marketing event. And customers, at a certain point say, you know, I come here for ideas about things that I can do in my environ, my business, that could actually many ways help me to do new things. You know, you can't get that at a marketing-oriented, you can get that, as a user, more at a research-oriented show. When it's an emerging market, like let's say Spark has been, like the Spark Summit was in the beginning, those are kind of like, when industries go through the phase those are sort of in the beginning, sort of research-focused shows where industry, the people who are doing the development of this new architecture, they talk ideas. Now I think in 2017, where we're at now, is what the idea is everybody's trying to get their heads around, they're all around AI, what the heck that is. For a show like an O'Reilly Ready show to have relevance in a market that's in this much ferment of really innovation around AI and deep learning, there needs to be a core research focus that you don't get at this point in the lifecycle of Strata, for example. So that's my take on what's going on. >> So, my take is this. 
And first of all, I agree with everything you said, so it's not in opposition to anything. Many years ago I had this thought that I think is still very true. And that is that the value of infrastructure is inversely correlated with the degree to which anybody knows anything about it. So if I know a lot about my infrastructure, it's not creating a lot of business value. In fact, more often than not, it's not working, which is why people end up knowing more about it. But the problem is, the way that technology has always been sold is as a differentiated, some sort of value-add thing. So you end up with this tension. And this is in an application domain, a very, very complex application domain, like big data. The tension is, my tool is so great, and it's differentiated from all that other stuff, yeah, but it becomes valuable to me if and only if nobody knows it exists. So I think, and one of the reasons why I bring this up, John, is many of the companies that are in the big data space today that are most successful are companies that are positioning themselves as a service. There's a lot of interesting SaaS applications for big data analysis, pipeline management, all the other things you can talk about, that are actually being rendered as a service, and not as a product. So that all you need to know is what the tool does. You don't need to know the tool. And I don't know that that's necessarily going to last, but I think it's very, very interesting that a lot of the more successful companies that we're talking to are themselves mere infrastructure SaaS companies. >> Because-- >> AtScale is interesting, though. They came in as a service. But their service has an interesting value proposition. They can allow you to essentially virtualize the data to play with it, so people can actually sandbox data. And if it gets traction, they can then double down on it. So to me that's a freebie. To me, I'm a customer, I got to love that kind of environment because you're essentially giving almost a developer-like environment-- >> Peter: Value without necessarily-- >> Yeah, the cost, and the guy gets the signal from the marketplace, his customer, of what data resolves. To me that's a very cool scene. I don't know, are you saying that's bad, or? >> No, no, I think it's interesting. I think it's-- >> So you're saying service is-- >> So what I'm saying is, what I'm saying is, that the value of infrastructure is inversely proportional to the degree to which anybody knows anything about it. But you've got a bunch of companies who are selling, effectively, infrastructure software, so it's a value-add thing, and that creates a problem. And a lot of other companies not only have the ability to sell something as a service as opposed to a product, they can put the service forward, and people are using the service and getting what they need out of it without knowing anything about the tool.
I can worry now about somehow extracting the value. I'm not distracted. >> This show, or this domain, is one of the domains where SaaS has moved, just as we're thinking about moving up the stack, the SaaS business model is moving down the stack in the big data world. >> All right, so, in summary, the stack is changing. Predictions for the next few days. What are we going to see come out of Strata Data, and our BigData NYC? 'Cause remember, this show was always a big hit, but it's very clear from the data on our dashboards, we're seeing all the social data. Microsoft Ignite is going on, and Microsoft Azure, just in the past few years, has burst on the scene. Cloud is sucking the oxygen out of the big data event. Or is it? >> I doubt it was sucking it out of the event, but you know, theCUBE is in, theCUBE is not at Ignite. Where's theCUBE right now? >> John: BigData NYC. >> No, it's here, but it's also at the Splunk show. >> John: That's true. >> And isn't it interesting-- >> John: We're sucking the data out of two events. >> Did a lot of people coming in, exactly. A lot of people coming-- >> We're live streaming in a streaming data kind of-- >> John just said we suck, there's that record saying that. >> We're sucking all the data. >> So we are-- >> We're sharing data. These videos are data-driven. >> Yeah, absolutely, but the point is, ultimately, is that, is that Splunk is an example of a company that's putting forward a service about how you do this and not necessarily a product focus. And a lot of the folks that are coming on theCUBE here are also going on to theCUBE down in Washington D.C., which is where the Splunk show's at. And so I think one of the things, one of the predictions I'll make, is that we're going to hear over the next couple of days more companies talk about their SaaS trash. >> Yeah, I mean I just think, I agree with you, but I also agree with the comments about the technology coming together. And here's one thing I want to throw on the table. I've gotten the sense a few times about connecting the dots on it, we'll put it out publicly for comment right now. The role that communities will play outside of developer, is going to be astronomical. I think we're seeing signals, certainly open-source communities have been around for a long time. They continue to grow shoulders of giants before them. Even these events like O'Reilly, which are a small community that they rely on is now not the only game in town. We're seeing the notion of a community strategy in things like Blockchain, you're seeing it in business, you're seeing people rolling out their recruitment to say, data scientists. You're seeing a community model developing in business, yes or no? >> Yes, but I would say, I would put it this way, John. That it's always been there. The difference is that we're now getting enough experience with things that have occurred, for example, collaboration, communal, communal collaboration in open-source software that people are now saying, and they've developed a bunch of social networking techniques where they can actually analyze how those communities work together, but now they're saying, hmm, I've figured out how to do an assessment analysis understanding that community. I'm going to see if I can take that same concept and apply it over here to how sales works, or how B-to-B engagement works, or how marketing gets conducted, or how sales and marketing work together. And they're discovering that the same way of thinking is actually very fruitful over there. 
So I totally agree, 100%. >> So they don't rely on other people's version of a community, they can essentially construct their own. >> They are, they are-- >> John: Or enabling their own. >> That's right, they are bringing that approach to thinking about a community-driven business and they're applying it to a lot of new ways, and that's very exciting. >> As the world gets connected with mobile and internet of things as we're seeing, it's one big online community. We're seeing things, I'm writing a post right now, what you could, what B-to-B markets should learn from the fake news problem. And that is content and infrastructure are now contextually tied together. >> Peter: Totally. >> And related. The payload of the fake news is also related to the gamification of the network effect, hence the targeting, hence the weaponization. >> Hey, we wrote the three Cs, we wrote a piece on the three Cs of strategy a year and a half ago. Content, community, context. And at the end of the day, the most important thing to what you're saying about, is that there is, you know, right now people talk about social networking. Social media, you think Facebook. Facebook is a community with a single context, stay in touch with your friends. >> Connections. >> Connections. But what you're really saying is that for the first time we're now going to see an enormous amount of technology being applied to the fullness of all the communities. We're going to see a lot more communities being created with the software, each driven by what content does, creates value, against the context of how it works, where the community's defined in terms of what do we do? >> Let me focus on the fact that bringing, using community as a framework for understanding how the software world is evolving. The software world is evolving towards, I've said this many times in my work about a resurge, the data scientists or data people, data science skills are the core developers in this new era. Now, what is data science all about at its heart? Machine learning, building, and training machine learning models. And so training machine learning models is everything towards making sure that they are fit for their predicted purpose of classification. Training data, where you get all the training data from to feed all, to train all these models? Where do you get all the human resources to label, to do the labeling of the data sets, and so forth, that you need communities, crowdsourcing and whatnot, and you need sustainable communities that can supply the data and the labeling services, and so forth, to be able to sustain the AI and machine learning revolution. So content, creating data and so forth, really rules in this new era, like-- >> The interest in machine learning is at an all-time high, I guess. >> Jim: Yeah, oh yeah, very much so. >> Got it, I agree. I think the social grab, interest grab, value grab is emerging. I think communities, content, context, communities are relevant. I think a lot of things are going to change, and that the scuttlebutt that I'm hearing in this area now is it's not about the big event anymore. It's about the digital component. I think you're seeing people recognize that, but they still want to do the face-to-face. >> You know what, that's right. That's right, they still want, let's put it this way. That there are, that the whole point of community is we do things together. And there are some things that are still easier to do together if we get together. 
>> But B-to-B marketing, you just can't say, we're not going to do events when there's a whole machinery behind events. Lead-gen batch marketing, we call it. There's a lot of stuff that goes on in that funnel. You can't just say hey, we're going to do a blog post. >> People still need to connect. >> So it's good, but there's some online tools that are happening, so of course. You wanted to say something? >> Yeah, I just want to say one thing. Face-to-face validates the source of expertise. I don't really fully trust an expert, I can't in my heart engage with them, 'til I actually meet them and figure out in person whether they really do have the goods, or whether they're repurposing some thinking that they got from elsewhere and they gussy it up. So there's no substitute for face-to-face to validate the expertise. The expertise that you value enough to want to engage in your solution, or whatever it might be. >> Awesome, I agree. Online activities, the content, we're streaming the data, theCUBE, this is our annual event in New York City. We've got three days of coverage, Tuesday, Wednesday, Thursday, here, theCUBE in Manhattan, right around the corner from Strata Hadoop at the Javits Center, with the influencers. We're here with the VIPs, with the entrepreneurs, with the CEOs and all the top analysts from WikiBon and around the community. Be there tomorrow all day, day one wrap-up is done. Thanks for watching, see you tomorrow. (rippling music)

Published Date : Sep 27 2017

SUMMARY :

Brought to you by SiliconANGLE Media. In this day one wrap-up from BigData NYC in Manhattan, around the corner from Strata Data at the Javits Center, John Furrier, Peter Burris, and Jim Kobielus discuss the productionization of machine learning and the significant skills gap, the explosion of tools versus platforms, buyers who want business results rather than hype, the shift toward delivering big data capabilities as a service instead of as products, the maturation of marketing-driven shows like O'Reilly Strata, and the growing role of communities, content, and context, including the communities needed to supply and label training data for machine learning models.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
O'Reilly | ORGANIZATION | 0.99+
Jim | PERSON | 0.99+
John | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
O'Reilly Media | ORGANIZATION | 0.99+
Manhattan | LOCATION | 0.99+
2017 | DATE | 0.99+
John Furrier | PERSON | 0.99+
New York City | LOCATION | 0.99+
Peter | PERSON | 0.99+
Washington D.C. | LOCATION | 0.99+
New York | LOCATION | 0.99+
tomorrow | DATE | 0.99+
five years | QUANTITY | 0.99+
two events | QUANTITY | 0.99+
100% | QUANTITY | 0.99+
Cloudera | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
first time | QUANTITY | 0.99+
today | DATE | 0.99+
Wednesday | DATE | 0.99+
a year and a half ago | DATE | 0.99+
Thursday | DATE | 0.99+
one | QUANTITY | 0.99+
Spark Summit | EVENT | 0.99+
three days | QUANTITY | 0.99+
Tuesday | DATE | 0.98+
Javits Center | LOCATION | 0.98+
Splunk | ORGANIZATION | 0.98+
Paxata | ORGANIZATION | 0.98+
Facebook | ORGANIZATION | 0.98+
next year | DATE | 0.97+
this year | DATE | 0.97+
SaaS | TITLE | 0.97+
day one | QUANTITY | 0.96+
NYC | LOCATION | 0.96+
first | QUANTITY | 0.96+
one thing | QUANTITY | 0.96+
WikiBon | ORGANIZATION | 0.95+
one show | QUANTITY | 0.94+
one shows | QUANTITY | 0.94+
BigData | ORGANIZATION | 0.94+
Many years ago | DATE | 0.93+
Strata | LOCATION | 0.93+
Strata Hadoop | LOCATION | 0.92+
each | QUANTITY | 0.91+
three Cs | QUANTITY | 0.9+
Javits Center | ORGANIZATION | 0.89+
midtown Manhattan | LOCATION | 0.88+
theCUBE | ORGANIZATION | 0.87+
Strata | TITLE | 0.87+
past few years | DATE | 0.87+