Nenshad Bardoliwalla, DataRobot | AWS re:Invent 2021

>>Welcome back everybody to AWS reinvent. You're watching the cube, the leader in high tech coverage. My name is Dave Volante with my co-host David Nicholson. We're here all week. We got two sets, 20 plus thousand people here live at AWS reinvent. 21 of course last year was virtual. We got a hybrid event running. We had two studios running before the show running. A lot of pre-records really excited to have ninja Bardelli Walla, who is the chief product officer at data robot. Really interesting AI company. We're going to talk about insights with machine intelligence and then shout. It's great to see you again. It's been awhile. >>Great to see you as well. And I'm so happy to be on the cube. I think eight years since I first came on. >>When you launched the company that you founded back then Peck Sada on the cube, that was part >>Of the inner robot >>Family part of data, robot family. And of course, friend of the cube. Chris Lynch is the executive chairman of data robot. So a lot of connections, I always joke a hundred people in our industry, 99 seats, but tell us about data robot. What's the, what's the scoop these days. >>Thanks. Thanks very much for the opportunity to speak with both of you. Uh, I think we're seeing some very interesting trends. Uh, we've all been in the industry long enough to recognize, uh, that hype cycles they're cycles. They go in waves and, uh, the level of interest in AI has never been higher. Uh, every company in the world is looking for the opportunity to take advantage of AI, to improve their business processes, whether it's to improve their revenue it's to lower their cost profile or it's to lower their risk. What we're seeing that's most interesting is that, uh, we spend a lot of time working with companies on what we consider applied AI. That is how do we solve real business problems, uh, with the technology and not just run a bunch of experiments. You know, it's very tempting for a lot of us, Dave and David, uh, to, to do, uh, you know, spin up a spark cluster with 10,000 nodes and slosh a bunch of data through it. >>But the question we always ask at data robot is what is the business value of doing this? Why are we using these AI techniques and in order to solve what problem? So the biggest trend we see a data robot and one that we feel we're very well positioned to solve is that companies are coming out of that experimental phase. There's still a lot of experimentation going on and they're saying, okay, we, we stood up a cluster. Uh, we got a bunch of Python notebooks running around here, but we haven't really seen a return on our investment yet data robot, can you help us actually make AI real and concrete in terms of achieving a specific business outcome for us? >>Well, and I want to test something on your niche. That's something we've talked about a lot on the cube is a change in the way in which companies are architecting their data. When we first, it was like, okay, create a Hadoop cluster. And that spark came along to make that easier, but it was still this highly technical, highly centralized, hyper specialized roles where the business, people who have a really good understanding of the outcome had to kind of beg to get what they wanted because it was so technical and the success was defined as, Hey, it worked or we ran the experiment and it looks like it has promise. So now it seems like with companies like data robot, you're democratizing AI, allowing organizations to inject AI into their business processes, their applications. And it seems to be more business led. One of you could comment on that. >>I think that is a various dude observation. Uh, we launched this concept a little bit earlier this year of AI cloud. And the idea behind AI cloud is if you want to democratize AI, which is in fact has been DataRobot's vision since 2012, we were the first company on the cloud. The first AI cloud that ever existed was data robots in 2014. And the entire idea was that we knew that data scientists would always play a very important role in an organization, but yet the demand for AI would vastly outstrip the supply. And so in order to solve that challenge, we built AI cloud. We've actually spent over a million engineering hours in building this technology over the, over the last decade and put this together in a way where all of the different personas and the organizations, you have people who create AI applications. >>Those are the folks we usually think about, but those are the data scientists. Those are the analysts, those are the data engineers, but then you actually have to put it into production. You've got to run the system. So you also have to democratize this capability for the folks who are going to operate the system for the folks in risk and compliance. We're actually going to, uh, ensure that the system is operating in accordance with your policies and compliance regimes. And then the third wave of democratization, which we've just embarked on is then how do you bring AI into the hands of the actual business people? How do you put on a mobile device or a web browser, or in context, in an application with the decision, the ability for AI to drive a decision in your organization, which leads to an action, which helps drive you towards the outcome you're trying to optimize for. >>So AI cloud is about this pervasive tapestry, bringing together the creators, the consumers, the individuals who operate these systems into a single system that can lower the barrier to entry for people who don't have the skills, but allow you to plug in and go deep underneath the covers and modify whatever you need to, if you have that level of technical skill and that ability for us to kind of slide, slide the slider in one direction or the other, I could slide it to the right and say, I want all automation, something data robot has pioneered and is absolutely the leader in, but we can also, especially in these last couple of years, say, I want to be able to use as much code as I want to bring in. And the beauty of the model is that customers can choose how much they want to let the machine drive or how much they want to let the human being drive. David. I love that, >>That idea of a slider, because now you're talking about generalists getting access to really powerful tools. >>Yeah, no, exactly. And I, I'm curious, what's your view on where we are culturally with AI at this point? And what I mean by culturally is the idea that, okay, that's great. You put powerful tools in the hands of business users. Um, do most of us still need to have a lot of visibility under the covers to understand the inner workings so that we trust what we're being told? You know, I'm fine pulling a lever and having a little biscuit come out of SWOT as long as I've gotten a tour of the kitchen at some point in time. Yes. I mean, where are we with that? Where where's the level of >>Absolutely fantastic question and it's one that's, it's actually pervasive to the way data robot operates. So trust gets, uh, engendered by multiple different capabilities that you build throughout the platform. The first one is around, uh, explainability. So when you get a prediction from a system, just like you mentioned, you know, if, if the stakes are not very high, you know, you, uh, we're here in Las Vegas, of course I'm thinking of slot machines. If you get a biscuit at the end of it and it tastes pretty good. Hey, great. Right? When you're making a mission critical business decision, you don't want to be in the position where you don't understand why the system is making the decision. It does. So we have historically invested an enormous amount of effort in explainability tools, having the system actually at a prediction level, explain to you, why is it making the recommendation it's making? >>For example, the system says this customer has a high likelihood of churn. Why? Because their account balance has been declining over the last five months. Uh, number two, because their credit score has been going down. And what gives you the trust is actually the machine and the human able to communicate in the same language and same vernacular about the business value. So that's one part of it. The second part is about transparency, right? So one of the things that the automated machine learning movement, that data robot pioneered, uh, has been, I'd say rightfully criticized for frankly, is that it's too much of a black box. It's too much magic. I load my dataset. I press the start button and data robot does everything else for me. Well, that's not very satisfying when you have a 10 or a hundred million dollar decision coming on the other side, even if the technology is actually doing the job correctly, which data robot usually does. >>So where we've morphed and evolved our position in the market and where I have driven our technology portfolio at data robot is to say, you know what? There is a very important aspect of trust that needs to be brought to bear here, which is that if somebody wants to see code, let them see code. And in fact, the beauty of AI cloud is that on the same platform, the people who don't like code, but are, are very good at understanding the business domain con uh, the business domain knowledge and the context. They now have the ability to do that. But when they're at the stage before they're going to deploy anything to production. Now you can raise your hand at data robot and actually use our workflow and say, I need a coder to review this. I want the professional data scientist who has all this knowledge who understands and has read up on the latest advances in hyper parameter tuning to look at the model and tell me that this is going to be okay. And so we allow both the less technical folks and the very deep technical data scientists, the ability to collaborate on the same environment, which allows you to build trust in terms of the human side of, Hey, I don't want to just let anybody throw a model into production. I like, >>I mean, I see those, the transparency and the explainability is almost two sides of the same coin, right? Because you know, if you're gonna be accused of gender bias, you can say, no, here's how the system may, it's not like, you know, you think about the internet. It tells you it's a cat, but you don't really know how the machine determined that you're breaking apart, blowing away that black box. And the other thing I like what you said was you have data producers and data consumers, and you also talked about context because a lot of times the data producers, they don't necessarily care about the context or the PI data pipeline. People necessarily care about the context. So, okay. So now we're at the point where you're democratizing data, you're doing some great work. What are some of the blockers that you see today that you're obliterating with data robot? Maybe you could talk about that a little bit. Sure. >>So, so I think, uh, you know, one very important concept is that, uh, in a democracy, we talked about democratization. You still have rules, you still have governance. It's not a free for all the free for all version of that is called NRG. That's not what any company wants, right? So we have to blend the freedom and flexibility that we want businesses to have with the compliance and regulatory observability that we need in order to be successful. So what we're seeing in, in our, in our customer base and what companies are coming to data robot to discuss is, okay, we've tried these experiments. Now we want to actually get to real business value. And one of the things that's really unique about data robot is that we have put, uh, we have, we've worked in our system on over 1 million projects, training models, inside data robot. >>We have seen every type of use case across different industries, whether it's healthcare or manufacturing, uh, or, or retail, uh, we have the ability to understand those different data sets and actually to come up with models. So we have that breadth of information there if you aggregate that over time, right? So again, we did not come to AI. This is not a fad for us. We didn't start as one kind of company than slap the AI label on and say, Hey, we're an AI company now, right? We have been AI native since day one. And in that process, what we have found is working on these, this million plus projects on these data sets across these industries, we have a very good sense of which projects will actually deliver value and which don't. And that gets to a previous point that you were making, which is that you have to know and partner with an organization who it's not just about the technology. So we have fantastic people who we call our customer facing data scientists who will tell the customer, look, I know you think this is a really high value use case, but we've tried it at other customers. And unfortunately it didn't work very well. Let's steer you, cause you need with a, with a technology that is largely at the early stage and the maturity that organizations have with it, you need to help them in order to deliver success. And no vendor has delivered more successful production deployment of AI than data road. >>No, don't go down that path. It's a dead end as a cul-de-sac. So just avoid it. So we talked about transparency, explainability governance. Can you get that to the point where it's self-serve as you, as you put data in the hands of business, people where the context lives, the domain experts, can you get to self-serve and federate that governance? Yes. >>So you can, uh, that's one of the key principles of what we, what we do at data robot. And it comes back to a concept that I learned, uh, you, you both will remember. We were in the Sarbanes-Oxley crazy world of, I dunno, was that 15 years of saved data warehousing. >>Everybody wanted to talk about socks. You know, my wife would hear me on the phone. She'd be like, what is your sudden obsession with socks? I'm like, no, no, it's not what you fit. And so, um, but what came from Sarbanes Oxley are, are these, uh, longstanding principles around the segregation of duties and segregation of responsibilities. You can have democracy democratization with governance, if you have the right segregation of duties. So for example, I have somebody who can generate lots of different models, right? But I don't allow them to, to, uh, in a self-service way, just deploy into production. I actually have a workflow system which will go through multiple rigorous approvals and say, these three people have signed off, they've done an audit, uh, an, an audit assessment of this model. It's good to go, let's go and drop it into production. So the way that you get to self-service with governance is to have the right controls and policies and frameworks that surround the self-service model with the right checks and balances that implement the segregation of duties I'm talking >>And you get that right. And then you can automate it and then you can really scale, right? You gotta have your back because it's such a great topic. We, we barely scratched the surface. It was great to see you again, congratulations on all the success. And, uh, as I say any time, let's do this again. Fantastic. Thank >>You so much. All right, you're welcome. And thank you for watching you watching the cubes coverage of AWS reinvent 2021, Dave Volante for David Nicholson. Keep it right there. You're watching the cube, the leader in high-tech coverage.

Published Date : Dec 2 2021

SUMMARY :

It's great to see you again. Great to see you as well. And of course, friend of the cube. Dave and David, uh, to, to do, uh, you know, spin up a spark cluster with 10,000 So the biggest trend we see a data robot and one that we feel we're very well positioned to the outcome had to kind of beg to get what they wanted because it was so And the idea behind AI cloud is if you want So you also have to democratize this capability for the folks who are going to operate the system that can lower the barrier to entry for people who don't have the skills, That idea of a slider, because now you're talking about generalists getting access to really the inner workings so that we trust what we're being told? So when you get a prediction from a system, just like you mentioned, you know, if, if the stakes are not very high, And what gives you the trust is actually the same environment, which allows you to build trust in terms of the human side of, And the other thing I like what you said And one of the things that's really unique about data robot is that we have put, the maturity that organizations have with it, you need to help them in order to deliver success. people where the context lives, the domain experts, can you get to self-serve and federate that governance? And it comes back to a concept that I learned, uh, you, you both will remember. So the way that you get to self-service And then you can automate it and then you can really scale, right? And thank you for watching you watching the cubes coverage of AWS reinvent 2021,

ENTITIES

Entity	Category	Confidence
David Nicholson	PERSON	0.99+
2014	DATE	0.99+
Dave Volante	PERSON	0.99+
Chris Lynch	PERSON	0.99+
David	PERSON	0.99+
Dave	PERSON	0.99+
Las Vegas	LOCATION	0.99+
15 years	QUANTITY	0.99+
99 seats	QUANTITY	0.99+
second part	QUANTITY	0.99+
one part	QUANTITY	0.99+
two sets	QUANTITY	0.99+
two studios	QUANTITY	0.99+
2012	DATE	0.99+
last year	DATE	0.99+
Nenshad Bardoliwalla	PERSON	0.99+
three people	QUANTITY	0.99+
Bardelli Walla	PERSON	0.99+
10	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
10,000 nodes	QUANTITY	0.99+
Python	TITLE	0.99+
first	QUANTITY	0.99+
both	QUANTITY	0.99+
first one	QUANTITY	0.99+
eight years	QUANTITY	0.98+
one	QUANTITY	0.98+
today	DATE	0.98+
one direction	QUANTITY	0.97+
over a million	QUANTITY	0.97+
20 plus thousand people	QUANTITY	0.96+
DataRobot	ORGANIZATION	0.96+
One	QUANTITY	0.95+
first company	QUANTITY	0.95+
over 1 million projects	QUANTITY	0.95+
21	QUANTITY	0.92+
last decade	DATE	0.9+
Sarbanes	LOCATION	0.9+
two sides	QUANTITY	0.89+
one kind	QUANTITY	0.89+
single system	QUANTITY	0.89+
third wave of democratization	EVENT	0.87+
million plus projects	QUANTITY	0.87+
NRG	ORGANIZATION	0.86+
earlier this year	DATE	0.85+
a hundred people	QUANTITY	0.83+
a hundred million dollar	QUANTITY	0.81+
Invent	EVENT	0.74+
last couple of years	DATE	0.73+
last five months	DATE	0.67+
number two	QUANTITY	0.66+
Peck Sada	ORGANIZATION	0.6+
2021	DATE	0.6+
Sarbanes Oxley	ORGANIZATION	0.59+
AWS reinvent	EVENT	0.57+
concept	QUANTITY	0.55+
day one	QUANTITY	0.53+
Oxley	LOCATION	0.36+

Nenshad Bardoliwalla & Stephanie McReynolds | BigData NYC 2017

>> Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by Silicon Angle Media and its ecosystem sponsors. (upbeat techno music) >> Welcome back, everyone. Live here in New York, Day Three coverage, winding down for three days of wall to wall coverage theCUBE covering Big Data NYC in conjunction with Strata Data, formerly Strata Hadoop and Hadoop World, all part of the Big Data ecosystem. Our next guest is Nenshad Bardoliwalla Co-Founder and Chief Product Officer of Paxata, hot start up in the space. A lot of kudos. Of course, they launched on theCUBE in 2013 three years ago when we started theCUBE as a separate event from O'Reilly. So, great to see the success. And Stephanie McReynolds, you've been on multiple times, VP of Marketing at Alation. Welcome back, good to see you guys. >> Thank you. >> Happy to be here. >> So, winding down, so great kind of wrap-up segment here in addition to the partnership that you guys have. So, let's first talk about before we get to the wrap-up of the show and kind of bring together the week here and kind of summarize everything. Tell about your partnership you guys have. Paxata, you guys have been doing extremely well. Congratulations. Prakash was talking on theCUBE. Great success. You guys worked hard for it. I'm happy for you. But partnering is everything. Ecosystem is everything. Alation, their collaboration with data. That's there ethos. They're very user-centric. >> Nenshad: Yes. >> From the founders. Seemed like a good fit. What's the deal? >> It's a very natural fit between the two companies. When we started down the path of building new information management capabilities it became very clear that the market had strong need for both finding data, right? What do I actually have? I need an inventory, especially if my data's in Amazon S3, my data is in Azure Blob storage, my data is on-premise in HDFS, my data is in databases, it's all over the place. And I need to be able to find it. And then once I find it, I want to be able to prepare it. And so, one of the things that really drove this partnership was the very common interests that both companies have. And number one, pushing user experience. I love the Alation product. It's very easy to use, it's very intuitive, really it's a delightful thing to work with. And at the same time they also share our interests in working in these hybrid multicloud environments. So, what we've done and what we announced here at Strata is actually this bi-directional integration between the products. You can start in Alation and find a data set that you want to work with, see what collaboration or notes or business metadata people have created and then say, I want to go see this in Paxata. And in a single click you can then actually open it up in Paxata and profile that data. Vice versa you can also be in Paxata and prepare data, and then with a single click push it back, and then everybody who works with Alation actually now has knowledge of where that data is. So, it's a really nice synergy. >> So, you pushed the user data back to Alation, cause that's what they care a lot about, the cataloging and making the user-centric view work. So, you provide, it's almost a flow back and forth. It's a handshake if you will to data. Am I getting that right? >> Yeah, I mean, the idea's to keep the analyst or the user of that data, data scientist, even in some cases a business user, keep them in the flow of their work as much as possible. But give them the advantage of understanding what others in the organization have done with that data prior and allow them to transform it, and then share that knowledge back with the rest of the community that might be working with that data. >> John: So, give me an example. I like your Excel spreadsheet concept cause that's obvious. People know what Excel spreadsheet is so. So, it's Excel-like. That's an easy TAM to go after. All Microsoft users might not get that Azure thing. But this one, just take me through a usecase. >> So, I've got a good example. >> Okay, take me through. >> It's very common in a data lake for your data to be compressed. And when data's compressed, to a user it looks like a black box. So, if the data is compressed in Avro or Parquet or it's even like JSON format. A business user has no idea what's in that file. >> John: Yeah. >> So, what we do is we find the file for them. It may have some comments on that file of how that data's been used in past projects that we infer from looking at how others have used that data in Alation. >> John: So, you put metadata around it. >> We put a whole bunch of metadata around it. It might be comments that people have made. It might be >> Annotations, yeah. >> actual observations, annotations. And the great thing that we can do with Paxata is open that Avro file or Parquet file, open it up so that you can actually see the data elements themselves. So, all of a sudden, the business user has access without having to use a command line utility or understand anything about compression, and how you open that file up-- >> John: So, as Paxata spitting out there nuggets of value back to you, you're kind of understanding it, translating it to the user. And they get to do their thing, you get to do your thing, right? >> It's making a Avro or a Parquet file as easy to use as Excel, basically. Which is great, right? >> It's awesome. >> Now, you've enabled >> a whole new class of people who can use that. >> Well, and people just >> Get turned off when it's anything like jargon, or like, "What is that? I'm afraid it's phishing. Click on that and oh!" >> Well, the scary thing is that in a data lake environment, in a lot of cases people don't even label the files with extensions. They're just files. (Stephanie laughs) So, what started-- >> It's like getting your pictures like DS, JPEG. It's like what? >> Exactly. >> Right. >> So, you're talking about unlabeled-- >> If you looked on your laptop, and if you didn't have JPEG or DOC or PPT. Okay, I don't know that this file is. Well, what you have in the data lake environment is that you have thousands of these files that people don't really know what they are. And so, with Alation we have the ability to get all the value around the curation of the metadata, and how people are using that data. But then somebody says, "Okay, but I understand that this file exists. What's in it?" And then with Click to Profile from Alation you're immediately taken into Paxata. And now you're actually looking at what's in that file. So, you can very quickly go from this looks interesting to let me understand what's inside of it. And that's very powerful. >> Talk about Alation. Cause I had the CEO on, also their lead investor Greg Sands from Costanoa Ventures. They're a pretty amazing team but it's kind of out there. No offense, it's kind of a compliment actually. (Stephanie laughs) >> They got a symbolic >> Stephanie: Keep going. system Stanford guy, who's like super-smart. >> Nenshad: Yeah. >> They're on something that's really unique but it's almost too simple to be. Like, wait a minute! Google for the data, it's an awesome opportunity. How do you describe Alation to people who say, "Hey, what's this Alation thing?" >> Yeah, so I think that the best way to describe it is it's the browser for all of the distributed data in the enterprise. Sorry, so it's both the catalog, and the browser that sits on top of it. It sounds very simple. Conceptually it's very simple but they have a lot of richness in what they're able to do behind the scenes in terms of introspecting what type of work people are doing with data, and then taking that knowledge and actually surfacing it to the end user. So, for example, they have very powerful scenarios where they can watch what people are doing in different data sources, and then based on that information actually bubble up how queries are being used or the different patterns that people are doing to consume data with. So, what we find really exciting is that this is something that is very complex under the covers. Which Paxata is as well being built upon Spark. But they have put in the hard engineering work so that it looks simple to the end user. And that's the exact same thing that we've tried to do. >> And that's the hard problem. Okay, Stephanie back ... That was a great example by the way. Can't wait to have our little analyst breakdown of the event. But back to Alation for you. So, how do you talk about, you've been VP of Marketing of Alation. But you've been around the block. You know B2B, tech, big data. So, you've seen a bunch of different, you've worked at Trifacta, you worked at other companies, and you've seen a lot of waves of innovation come. What's different about Alation that people might not know about? How do you describe the difference? Because it sounds easy, "Oh, it's a browser! It's a catalog!" But it's really hard. Is it the tech that's the secret? Is it the approach? How do you describe the value of Alation? I think what's interesting about Alation is that we're solving a problem that since the dawn of the data warehouse has not been solved. And that is how to help end users really find and understand the data that they need to do their jobs. A lot of our customers talk about this-- >> John: Hold on. Repeat that. Cause that's like a key thing. What problem hasn't been solved since the data warehouse? >> To be able to actually find and fully understand, understand to the point of trust the data that you want to use for your analysis. And so, in the world of-- >> John: That sounds so simple. >> Stephanie: In the world of data warehousing-- >> John: Why is it so hard? >> Well, because in the world of data warehousing business people were told what data they should use. Someone in IT decided how to model the data, came up with a KPR calculation, and told you as a business person, you as a CEO, this is how you're going to monitor you business. >> John: Yeah. >> What business person >> Wants to be told that by an IT guy, right? >> Well, it was bounded by IT. >> Right. >> Expression and discovery >> Should be unbounded. Machine learning can take care of a lot of bounded stuff. I get that. But like, when you start to get into the discovery side of it, it should be free. >> Well, no offense to the IT team, but they were doing their best to try to figure out how to make this technology work. >> Well, just look at the cost of goods sold for storage. I mean, how many EMC drives? Expensive! IT was not cheap. >> Right. >> Not even 10, 15, 20 years ago. >> So, now when we have more self-service access to data, and we can have more exploratory analysis. What data science really introduced and Hadoop introduced was this ability on-demand to be able to create these structures, you have this more iterative world of how you can discover and explore datasets to come to an insight. The only challenge is, without simplifying that process, a business person is still lost, right? >> John: Yeah. >> Still lost in the data. >> So, we simply call that a catalog. But a catalog is much more-- >> Index, catalog, anthology, there's other words for it, right? >> Yeah, but I think it's interesting because like a concept of a catalog is an inventory has been around forever in this space. But the concept of a catalog that learns from other's behavior with that data, this concept of Behavior I/O that Aaron talked about earlier today. The fact that behavior of how people query data as an input and that input then informs a recommendation as an output is very powerful. And that's where all the machine learning and A.I. comes to work. It's hidden underneath that concept of Behavior I/O but that's there real innovation that drives this rich catalog is how can we make active recommendations to a business person who doesn't have to understand the technology but they know how to apply that data to making a decision. >> Yeah, that's key. Behavior and textual information has always been the two fly wheels in analysis whether you're talking search engine or data in general. And I think what I like about the trends here at Big Data NYC this weekend. We've certainly been seeing it at the hundreds of CUBE events we've gone to over the past 12 months and more is that people are using data differently. Not only say differently, there's baselining, foundational things you got to do. But the real innovators have a twist on it that give them an advantage. They see how they can use data. And the trend is collective intelligence of the customer seems to be big. You guys are doing it. You're seeing patterns. You're automating the data. So, it seems to be this fly wheel of some data, get some collective data. What's your thoughts and reactions. Are people getting it? Is this by people doing it by accident on purpose kind of thing? Did people just fell on their head? Or you see, "Oh, I just backed into this?" >> I think that the companies that have emerged as the leaders in the last 15 or 20 years, Google being a great example, Amazon being a great example. These are companies whose entire business models were based on data. They've generated out-sized returns. They are the leaders on the stock market. And I think that many companies have awoken to the fact that data as a monetizable asset to be turned into information either for analysis, to be turned into information for generating new products that can then be resold on the market. The leading edge companies have figured that out, and our adopting technologies like Alation, like Paxata, to get a competitive advantage in the business processes where they know they can make a difference inside of the enterprise. So, I don't think it's a fluke at all. I think that most of these companies are being forced to go down that path because they have been shown the way in terms of the digital giants that are currently ruling the enterprise tech world. >> All right, what's your thoughts on the week this week so far on the big trends? What are obvious, obviously A.I., don't need to talk about A.I., but what were the big things that came out of it? And what surprised you that didn't come out from a trends standpoint buzz here at Strata Data and Big Data NYC? What were the big themes that you saw emerge and didn't emerge what was the surprise? Any surprises? >> Basically, we're seeing in general the maturation of the market finally. People are finally realizing that, hey, it's not just about cool technology. It's not about what distribution or package. It's about can you actually drive return on investment? Can you actually drive insights and results from the stack? And so, even the technologists that we were talking with today throughout the course of the show are starting to talk about it's that last mile of making the humans more intelligent about navigating this data, where all the breakthroughs are going to happen. Even in places like IOT, where you think about a lot of automation, and you think about a lot of capability to use deep learning to maybe make some decisions. There's still a lot of human training that goes into that decision-making process and having agency at the edge. And so I think this acknowledgement that there should be balance between human input and what the technology can do is a nice breakthrough that's going to help us get to the next level. >> What's missing? What do you see that people missed that is super-important, that wasn't talked much about? Is there anything that jumps out at you? I'll let you think about it. Nenshad, you have something now. >> Yeah, I would say I completely agree with what Stephanie said which we are seeing the market mature. >> John: Yeah. >> And there is a compelling force to now justify business value for all the investments people have made. The science experiment phase of the big data world is over. People now have to show a return on that investment. I think that being said though, this is my sort of way of being a little more provocative. I still think there's way too much emphasis on data science and not enough emphasis on the average business analyst who's doing work in the Fortune 500. >> It should be kind of the same thing. I mean, with data science you're just more of an advanced analyst maybe. >> Right. But the idea that every person who works with data is suddenly going to understand different types of machine learning models, and what's the right way to do hyper parameter tuning, and other words that I could throw at you to show that I'm smart. (laughter) >> You guys have a vision with the Excel thing. I could see how you see that perspective because you see a future. I just think we're not there yet because I think the data scientists are still handcuffed and hamstrung by the fact that they're doing too much provisioning work, right? >> Yeah. >> To you're point about >> surfacing the insights, it's like the data scientists, "Oh, you own it now!" They become the sysadmin, if you will, for their department. And it's like it's not their job. >> Well, we need to get them out of data preparation, right? >> Yeah, get out of that. >> You shouldn't be a data scientist-- >> Right now, you have two values. You've got the use interface value, which I love, but you guys do the automation. So, I think we're getting there. I see where you're coming from, but still those data sciences have to set the tone for the generation, right? So, it's kind of like you got to get those guys productive. >> And it's not a .. Please go ahead. >> I mean, it's somewhat interesting if you look at can the data scientist start to collaborate a little bit more with the common business person? You start to think about it as a little bit of scientific inquiry process. >> John: Yeah. >> Right? >> If you can have more innovators around the table in a common place to discuss what are the insights in this data, and people are bringing business perspective together with machine learning perspective, or the knowledge of the higher algorithms, then maybe you can bring those next leaps forward. >> Great insight. If you want my observations, I use the crazy analogy. Here's my crazy analogy. Years it's been about the engine Model T, the car, the horse and buggy, you know? Now, "We got an engine in the car!" And they got wheels, it's got a chassis. And so, it's about the apparatus of the car. And then it evolved to, "Hey, this thing actually drives. It's transportation." You can actually go from A to B faster than the other guys, and people still think there's a horse and buggy market out there. So, they got to go to that. But now people are crashing. Now, there's an art to driving the car. >> Right. >> So, whether you're a sports car or whatever, this is where the value piece I think hits home is that, people are driving the data now. They're driving the value proposition. So, I think that, to me, the big surprise here is how people aren't getting into the hype cycle. They like the hype in terms of lead gen, and A.I., but they're too busy for the hype. It's like, drive the value. This is not just B.S. either, outcomes. It's like, "I'm busy. I got security. I got app development." >> And I think they're getting smarter about how their valuing data. We're starting to see some economic models, and some ways of putting actual numbers on what impact is this data having today. We do a lot of usage analysis with our customers, and looking at they have a goal to distribute data across more of the organization, and really get people using it in a self-service manner. And from that, you're being able to calculate what actually is the impact. We're not just storing this for insurance policy reasons. >> Yeah, yeah. >> And this cheap-- >> John: It's not some POC. Don't do a POC. All right, so we're going to end the day and the segment on you guys having the last word. I want to phrase it this way. Share an anecdotal story you've heard from a customer, or a prospective customer, that looked at your product, not the joint product but your products each, that blew you away, and that would be a good thing to leave people with. What was the coolest or nicest thing you've heard someone say about Alation and Paxata? >> For me, the coolest thing they said, "This was a social network for nerds. I finally feel like I've found my home." (laughter) >> Data nerds, okay. >> Data nerds. So, if you're a data nerd, you want to network, Alation is the place you want to be. >> So, there is like profiles? And like, you guys have a profile for everybody who comes in? >> Yeah, so the interesting thing is part of our automation, when we go and we index the data sources we also index the people that are accessing those sources. So, you kind of have a leaderboard now of data users, that contract one another in system. >> John: Ooh. >> And at eBay leader was this guy, Caleb, who was their data scientist. And Caleb was famous because everyone in the organization would ask Caleb to prepare data for them. And Caleb was like well known if you were around eBay for awhile. >> John: Yeah, he was the master of the domain. >> And then when we turned on, you know, we were indexing tables on teradata as well as their Hadoop implementation. And all of a sudden, there are table structures that are Caleb underscore cussed. Caleb underscore revenue. Caleb underscore ... We're like, "Wow!" Caleb drove a lot of teradata revenue. (Laughs) >> Awesome. >> Paxata, what was the coolest thing someone said about you in terms of being the nicest or coolest most relevant thing? >> So, something that a prospect said earlier this week is that, "I've been hearing in our personal lives about self-driving cars. But seeing your product and where you're going with it I see the path towards self-driving data." And that's really what we need to aspire towards. It's not about spending hours doing prep. It's not about spending hours doing manual inventories. It's about getting to the point that you can automate the usage to get to the outcomes that people are looking for. So, I'm looking forward to self-driving information. Nenshad, thanks so much. Stephanie from Alation. Thanks so much. Congratulations both on your success. And great to see you guys partnering. Big, big community here. And just the beginning. We see the big waves coming, so thanks for sharing perspective. >> Thank you very much. >> And your color commentary on our wrap up segment here for Big Data NYC. This is theCUBE live from New York, wrapping up great three days of coverage here in Manhattan. I'm John Furrier. Thanks for watching. See you next time. (upbeat techo music)

Published Date : Oct 3 2017

SUMMARY :

Brought to you by Silicon Angle Media and Hadoop World, all part of the Big Data ecosystem. in addition to the partnership that you guys have. What's the deal? And so, one of the things that really drove this partnership So, you pushed the user data back to Alation, Yeah, I mean, the idea's to keep the analyst That's an easy TAM to go after. So, if the data is compressed in Avro or Parquet of how that data's been used in past projects It might be comments that people have made. And the great thing that we can do with Paxata And they get to do their thing, as easy to use as Excel, basically. a whole new class of people Click on that and oh!" the files with extensions. It's like getting your pictures like DS, JPEG. is that you have thousands of these files Cause I had the CEO on, also their lead investor Stephanie: Keep going. Google for the data, it's an awesome opportunity. And that's the exact same thing that we've tried to do. And that's the hard problem. What problem hasn't been solved since the data warehouse? the data that you want to use for your analysis. Well, because in the world of data warehousing But like, when you start to get into to the IT team, but they were doing Well, just look at the cost of goods sold for storage. of how you can discover and explore datasets So, we simply call that a catalog. But the concept of a catalog that learns of the customer seems to be big. And I think that many companies have awoken to the fact And what surprised you that didn't come out And so, even the technologists What do you see that people missed the market mature. in the Fortune 500. It should be kind of the same thing. But the idea that every person and hamstrung by the fact that they're doing They become the sysadmin, if you will, So, it's kind of like you got to get those guys productive. And it's not a .. can the data scientist start to collaborate or the knowledge of the higher algorithms, the car, the horse and buggy, you know? So, I think that, to me, the big surprise here is across more of the organization, and the segment on you guys having the last word. For me, the coolest thing they said, Alation is the place you want to be. Yeah, so the interesting thing is if you were around eBay for awhile. And all of a sudden, there are table structures And great to see you guys partnering. See you next time.

ENTITIES

Entity	Category	Confidence
Stephanie	PERSON	0.99+
Stephanie McReynolds	PERSON	0.99+
Greg Sands	PERSON	0.99+
John	PERSON	0.99+
Caleb	PERSON	0.99+
John Furrier	PERSON	0.99+
Nenshad	PERSON	0.99+
New York	LOCATION	0.99+
Prakash	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Aaron	PERSON	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
2013	DATE	0.99+
thousands	QUANTITY	0.99+
Costanoa Ventures	ORGANIZATION	0.99+
Manhattan	LOCATION	0.99+
two companies	QUANTITY	0.99+
both companies	QUANTITY	0.99+
Excel	TITLE	0.99+
Trifacta	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Strata Data	ORGANIZATION	0.99+
Alation	ORGANIZATION	0.99+
Paxata	ORGANIZATION	0.99+
Nenshad Bardoliwalla	PERSON	0.99+
eBay	ORGANIZATION	0.99+
three days	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
two values	QUANTITY	0.99+
NYC	LOCATION	0.99+
hundreds	QUANTITY	0.99+
Big Data	ORGANIZATION	0.99+
first	QUANTITY	0.99+
one	QUANTITY	0.99+
both	QUANTITY	0.99+
Strata Hadoop	ORGANIZATION	0.99+
Hadoop World	ORGANIZATION	0.99+
earlier this week	DATE	0.98+
Paxata	PERSON	0.98+
today	DATE	0.98+
Day Three	QUANTITY	0.98+
Parquet	TITLE	0.96+
three years ago	DATE	0.96+

Nenshad Bardoliwalla & Pranav Rastogi | BigData NYC 2017

>> Announcer: Live from Midtown Manhattan it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> OK, welcome back everyone we're here in New York City it's theCUBE's exclusive coverage of Big Data NYC, in conjunction with Strata Data going on right around the corner. It's out third day talking to all the influencers, CEO's, entrepreneurs, people making it happen in the Big Data world. I'm John Furrier co-host of theCUBE, with my co-host here Jim Kobielus who is the Lead Analyst at Wikibon Big Data. Nenshad Bardoliwalla. >> Bar-do-li-walla. >> Bardo. >> Nenshad Bardoliwalla. >> That guy. >> Okay, done. Of Paxata, Co-Founder & Chief Product Officer it's a tongue twister, third day, being from Jersey, it's hard with our accent, but thanks for being patient with me. >> Happy to be here. >> Pranav Rastogi, Product Manager, Microsoft Azure. Guys, welcome back to theCUBE, good to see you. I apologize for that, third day blues here. So Paxata, we had your partner on Prakash. >> Prakash. >> Prakash. Really a success story, you guys have done really well launching theCUBE fun to watch you guys from launching to the success. Obviously your relationship with Microsoft super important. Talk about the relationship because I think this is really people can start connecting the dots. >> Sure, maybe I'll start and I'LL be happy to get Pranav's point of view as well. Obviously Microsoft is one of the leading brands in the world and there are many aspects of the way that Microsoft has thought about their product development journey that have really been critical to the way that we have thought about Paxata as well. If you look at the number one tool that's used by analysts the world over it's Microsoft Excel. Right, there isn't even anything that's a close second. And if you look at the the evolution of what Microsoft has done in many layers of the stack, whether it's the end user computing paradigm that Excel provides to the world. Whether it's all of their recent innovation in both hybrid cloud technologies as well as the big data technologies that Pranav is part of managing. We just see a very strong synergy between trying to combine the usage by business consumers of being able to take advantage of these big data technologies in a hybrid cloud environment. So there's a very natural resonance between the 2 companies. We're very privileged to have Microsoft Ventures as an investor in Paxata and so the opportunity for us to work with one of the great brands of all time in our industry was really a privilege for us. Yeah, and that's the corporate sides so that wasn't actually part of it. So it's a different part of Microsoft which is great. You have also business opportunity with them. >> Nenshad : We do. >> Obviously data science problem that we're seeing is that they need to get the data faster. All that prep work, seems to be the big issue. >> It does and maybe we can get Pranav's point of view from the Microsoft angle. >> Yeah so to sort of continue what Nenshad was saying, you know the data prep in general is sort of a key core competence which is problematic for lots of users, especially around the knowledge that you need to have in terms of the different tools you can use. Folks who are very proficient will do ETL or data preparation like scenarios using one of the computing engines like Hive or Spark. That's good, but there's this big audience out there who like Excel-like interface, which is easy to use a very visually rich graphical interface where you can drag and drop and can click through. And the idea behind all of this is how quickly can I get insights from my data faster. Because in a big data space, it's volume, variety and velocity. So data is coming at a very fast rate. It's changing it's growing. And if you spend lot of time just doing data prep you're losing the value of data, or the value of data would change over time. So what we're trying to do would sort of enabling Paxata or HDInsight is enabling these users to use Paxata, get insights from data faster by solving key problems of doing data prep. >> So data democracy is a term that we've been kicking around, you guys have been talking about as well. What is actually mean, because we've been teasing out first two days here at theCUBE and BigData NYC is. It's clear the community aspect of data is growing, almost on a similar path as you're seeing with open source software. That genie's out the bottle. Open source software, tier one, it won, it's only growing exponentially. That same paradigm is moving into the data world where the collaboration is super important, in this data democracy, what is that actually mean and how does that relate to you guys? >> So the perspective we have is that first something that one of our customers said, that is there is no democracy without certain degrees of governance. We all live in a in a democracy. And yet we still have rules that we have to abide by. There are still policies that society needs to follow in order for us to be successful citizens. So when when a lot of folks hear the term democracy they really think of the wild wild west, you know. And a lot of the analytic work in the enterprise does have that flavor to it, right, people download stuff to their desktop, they do a little bit of massaging of the data. They email that to their friend, their friend then makes some changes and next thing you know we have what what some folks affectionately call spread mart hell. But if you really want to democratize the technology you have to wrap not only the user experience, like Pranav described, into something that's consumable by a very large number of folks in the enterprise. You have to wrap that with the governance and collaboration capabilities so that multiple people can work off the same data set. That you can apply the permissions so that people, who is allowed to share with each other and under what circumstances are they allowed to share. Under what circumstances are you allowed to promote data from one environment to another? It may be okay for someone like me to work in a sandbox but I cannot push that to a database or HDFS or Azure BLOB storage unless I actually have the right permissions to do so. So I think what you're seeing is that, in general, technology is becoming a, always goes on this trend, towards democratization. Whether it's the phone, whether it's the television, whether it's the personal computer and the same thing is happening with data technologies and certainly companies like. >> Well, Pranav, we're talking about this when you were on theCUBE yesterday. And I want to get your thoughts on this. The old way to solve the governance problem was to put data in silos. That was easy, I'll just put it in a silo and take care of it and access control was different. But now the value of the data is about cross-pollinating and make it freely available, horizontally scalable, so that it can be used. But the same time and you need to have a new governance paradigm. So, you've got to democratize the data by making it available, addressable and use for apps. The same time there's also the concerns on how do you make sure it doesn't get in the wrong hands and so on and so forth. >> Yeah and which is also very sort of common regarding open source projects in the cloud is a how do you ensure that the user authorized to access this open source project or run it has the right credentials is authorized and stuff. So, the benefit that you sort of get in the cloud is there's a centralized authentication system. There's Azure Active Directory, so you know most enterprise would have Active Directory users. Who are then authorized to either access maybe this cluster, or maybe this workload and they can run this job and that sort of further that goes down to the data layer as well. Where we have active policies which then describe what user can access what files and what folders. So if you think about the entrance scenario there is authentication and authorization happening and for the entire system when what user can access what data. And part of what Paxata brings in the picture is like how do you visualize this governance flow as data is coming from various sources, how do you make sure that the person who has access to data does have access data, and the one who doesn't cannot access data. >> Is that the problem with data prep is just that piece of it? What is the big problem with data prep, I mean, that seems to be, everyone keeps coming back to the same problem. What is causing all this data prep. >> People not buying Paxata it's very simple. >> That's a good one. Check out Paxata they're going to solve your problems go. But seriously, there seems to be the same hole people keep digging themselves into. They gather their stuff then next thing they're in the in the same hole they got to prepare all this stuff. >> I think the previous paradigms for doing data preparation tie exactly to the data democracy themes that we're talking about here. If you only have a very silo'd group of people in the organization with very deep technical skills but don't have the business context for what they're actually trying to accomplish, you have this impedance mismatch in the organization between the people who know what they want and the people who have the tools to do it. So what we've tried to do, and again you know taking a page out of the way that Microsoft has approached solving these problems you know both in the past in the present. Is to say look we can actually take the tools that once were only in the hands of the, you know, shamans who know how to utter the right incantations and instead move that into the the common folk who actually. >> The users. >> The users themselves who know what they want to do with the data. Who understand what those data elements mean. So if you were to ask the Paxata point of view, why have we had these data prep problems? Because we've separated the people who had the tools from the people who knew what they wanted to do with it. >> So it sounds to me, correct me if this is the wrong term, that what you offer in your partnership is it basically a broad curational environment for knowledge workers. You know, to sift and sort and annotating shared data with the lineage of the data preserved in essentially a system of record that can follow the data throughout its natural life. Is that a fair characterization? >> Pranav: I would think so yeah. >> You mention, Pranav, the whole issue of how one visualizes or should visualize this entire chain of custody, as it were, for the data, is there is there any special visualization paradigm that you guys offer? Now Microsoft, you've made a fairly significant investment in graph technology throughout your portfolio. I was at Build back in May and Sacha and the others just went to town on all things to do with Microsoft Graph, will that technology be somehow at some point, now or in the future, be reflected in this overall capability that you've established here with your partner here Paxata? >> I am not sure. So far, I think what you've talked about is some Graph capabilities introduced from the Microsoft Graph that's sort of one extreme. The other side of Graph exists today as a developer you can do some Graph based queries. So you can go to Cosmos DB which had a Gremlin API. For Graph based query, so I don't know how. >> I'll get right to the question. What's the Paxata benefits of with HDInsight? How does that, just quickly, explain for the audience. What is that solution, what are the benefits? >> So the the solution is you get a one click install of installing Paxata HDInsight and the benefit is as a benefit for a user persona who's not, sort of, used to big data or Hadoop they can use a very familiar GUI-based experience to get their insights from data faster without having any knowledge of how Spark works or Hadoop works. >> And what does the Microsoft relationship bring to the table for Paxata? >> So I think it's a couple of things. One is Azure is clearly growing at an extremely fast pace. And a lot of the enterprise customers that we work with are moving many of their workloads to Azure and and these cloud based environments. Especially for us, the unique value proposition of a partner who truly understands the hybrid nature of the world. The idea that everything is going to move to the cloud or everything is going to stay on premise is too simplistic. Microsoft understood that from day one. That data would be in it and all of those different places. And they've provided enabling technologies for vendors like us. >> I'll just say it to maybe you're too coy to say it, but the bottom line is you have an Excel-like interface. They have Office 365 they're user's going to instantly love that interface because it's an easy to use interface an Excel-like it's not Excel interface per se. >> Similar. >> Metaphor, graphical user interface. >> Yes it is. >> It's clean and it's targeted at the analyst role or user. >> That's right. >> That's going to resonate in their install base. >> And combined with a lot of these new capabilities that Microsoft is rolling out from a big data perspective. So HDInsight has a very rich portfolio of runtime engines and capabilities. They're introducing new data storage layers whether it's ADLS or Azure BLOB storage, so it's really a nice way of us working together to extract and unlock a lot of the value that Microsoft. >> So, here's the tough question for you, open source projects I see Microsoft, comments were hell froze because LINUX is now part of their DNA, which was a comment I saw at the even this week in Orlando, but they're really getting behind open source. From open compute, it's just clearly new DNA's. They're they're into it. How are you guys working together in open source and what's the impact to developers because now that's only one cloud, there's other clouds out there so data's going to be an important part of it. So open source, together, you guys working together on that and what's the role for the data? >> From an open source perspective, Microsoft plays a big role in embracing open source technologies and making sure that it runs reliably in the cloud. And part of that value prop that we provide in sort of Azure HDInsight is being sure that you can run these open source big data workloads reliably in the cloud. So you can run open source like Apache, Spark, Hive, Storm, Kafka, R Server. And the hard part about running open source technology in the cloud is how do you fine tune it, and how do you configure it, how do you run it reliably. And that's what sort of what we bring in from a cloud perspective. And we also contribute back to the community based on sort of what learned by running these workloads in the cloud. And we believe you know in the broader ecosystem customers will sort of have a mixture of these combinations and their solution They'll be using some of the Microsoft solutions some open source solutions some solutions from ecosystem that's how we see our customer solution sort of being built today. >> What's the big advantage you guys have at Paxata? What's the key differentiator for why someone should work with you guys? Is it the automation? What's the key secret sauce to you guys? >> I think it's a couple of dimensions. One is I think we have come the closest in the industry to getting a user experience that matches the Excel target user. A lot of folks are attempting to do the same but the feedback we consistently get is that when the Excel user uses our solution they just, they get it. >> Was there a design criteria, was that from the beginning how you were going to do this? >> From day one. >> So you engineer everything to make it as simple as like Excel. >> We want people to use our system they shouldn't be coding, they shouldn't be writing scripts. They just need to be able. >> Good Excel you just do good macros though. >> That's right. >> So simple things like that right. >> But the second is being able to interact with the data at scale. There are a lot of solutions out there that make the mistake in our opinion of sampling very tiny amounts of data and then asking you to draw inferences and then publish that to batch jobs. Our whole approach is to smash the batch paradigm and actually bring as much into the interactive world as possible. So end users can actually point and click on 100 million rows of data, instead of the million that you would get in Excel, and get an instantaneous response. Verses designing a job in a batch paradigm and then pushing it through the the batch. >> So it's interactive data profiling over vast corpuses of data in the cloud. >> Nenshad: Correct. >> Nenshad Bardoliwalla thanks for coming on theCUBE appreciate it, congratulations on Paxata and Microsoft Azure, great to have you. Good job on everything you do with Azure. I want to give you guys props, with seeing the growth in the market and the investment's been going well, congratulations. Thanks for sharing, keep coverage here in BigData NYC more coming after this short break.

Published Date : Sep 28 2017

SUMMARY :

Brought to you by SiliconANGLE Media in the Big Data world. it's hard with our accent, So Paxata, we had your partner on Prakash. launching theCUBE fun to watch you guys has done in many layers of the stack, is that they need to get the data faster. from the Microsoft angle. the different tools you can use. and how does that relate to you guys? have the right permissions to do so. But the same time and you need to have So, the benefit that you sort of get in the cloud What is the big problem with data prep, But seriously, there seems to be the same hole and instead move that into the the common folk from the people who knew what they wanted to do with it. is the wrong term, that what you offer for the data, is there is there So you can go to Cosmos DB which had a Gremlin API. What's the Paxata benefits of with HDInsight? So the the solution is you get a one click install And a lot of the enterprise customers but the bottom line is you have an Excel-like interface. user interface. It's clean and it's targeted at the analyst role to extract and unlock a lot of the value So open source, together, you guys working together and making sure that it runs reliably in the cloud. A lot of folks are attempting to do the same So you engineer everything to make it as simple They just need to be able. Good Excel you just do But the second is being able to interact So it's interactive data profiling and Microsoft Azure, great to have you.

ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Jersey	LOCATION	0.99+
Microsoft	ORGANIZATION	0.99+
Excel	TITLE	0.99+
2 companies	QUANTITY	0.99+
John Furrier	PERSON	0.99+
New York City	LOCATION	0.99+
Orlando	LOCATION	0.99+
Nenshad	PERSON	0.99+
Bardo	PERSON	0.99+
Nenshad Bardoliwalla	PERSON	0.99+
third day	QUANTITY	0.99+
both	QUANTITY	0.99+
Office 365	TITLE	0.99+
yesterday	DATE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
100 million rows	QUANTITY	0.99+
BigData	ORGANIZATION	0.99+
Paxata	ORGANIZATION	0.99+
Microsoft Ventures	ORGANIZATION	0.99+
Pranav Rastogi	PERSON	0.99+
first two days	QUANTITY	0.99+
one	QUANTITY	0.98+
One	QUANTITY	0.98+
million	QUANTITY	0.98+
second	QUANTITY	0.98+
Midtown Manhattan	LOCATION	0.98+
Spark	TITLE	0.98+
this week	DATE	0.98+
first	QUANTITY	0.97+
theCUBE	ORGANIZATION	0.97+
one click	QUANTITY	0.97+
Prakash	PERSON	0.97+
Azure	TITLE	0.97+
May	DATE	0.97+
Wikibon Big Data	ORGANIZATION	0.96+
Hadoop	TITLE	0.96+
Hive	TITLE	0.94+
today	DATE	0.94+
Strata Data	ORGANIZATION	0.94+
Pranav	PERSON	0.93+
NYC	LOCATION	0.93+
one cloud	QUANTITY	0.93+
2017	DATE	0.92+
Apache	ORGANIZATION	0.9+
Paxata	TITLE	0.9+
Graph	TITLE	0.89+
Pranav	ORGANIZATION	0.88+

Nenshad Bardoliwalla, Paxata - #BigDataNYC 2016 - #theCUBE

>> Voiceover: Live from New York, it's The Cube, covering Big Data New York City 2016. Brought to you by headline sponsors, Cisco, IBM, Nvidia, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to New York City, everybody. Nenshad Bardoliwalla is here, he's the co-founder and chief product officer at Paxata, a company that, three years ago, I want to say three years ago, came out of stealth on The Cube. >> October 27, 2013. >> Right, and we were at the Warwick Hotel across the street from the Hilton. Yeah, Prakash came on The Cube and came out of stealth. Welcome back. >> Thank you very much. >> Great to see you guys. Taking the world by storm. >> Great to be here, and of course, Prakash sends his apologies. He couldn't be here so he sent his stunt double. (Dave and George laugh) >> Great, so give us the update. What's the latest? >> So there are a lot of great things going on in our space. The thing that we announced here at the show is what we're calling Paxata Connect, OK? We are moving just in the same way that we created the self-service data preparation category, and now there are 50 companies that claim they do self-service data prep. We are moving the industry to the next phase of what we are calling our business information platform. Paxata Connect is one of the first major milestones in getting to that vision of the business information platform. What Paxata Connect allows our customers to do is, number one, to have visual, completely declarative, point-and-click browsing access to a variety of different data sources in the enterprise. For example, we support, we are the only company that we know of that supports connecting to multiple, simultaneous, different Hadoop distributions in one system. So a Paxata customer can connect to MapR, they can connect to Hortonworks, they can connect to Cloudera, and they can federate across all of them, which is a very powerful aspect of the system. >> And part of this involves, when you say declarative, it means you don't have to write a program to retrieve the data. >> Exactly right. Exactly right. >> Is this going into HTFS, into Hive, or? >> Yes it is. In fact, so Hadoop is one part of, this multi-source Hadoop capability is one part of Paxata Connect. The second is, as we've moved into this information platform world, our customers are telling us they want read-write access to more than just Hadoop. Hadoop is obviously a very important part, but we're actually supporting no-sequel data sources like Cloudant, Mongo DB, we're supporting read and write, we're supporting, for the first time, relational databases, we already supported read, but now we actually support write to relational databases. So Paxata is really becoming kind of this fabric, a business-centric information fabric, that allows people to move data from anywhere to any destination, and transform it, profile it, explore it along the way. >> Excellent. Let's get into some of the use cases. >> Yeah, tell us where the banks are. The sense at the conference is that everyone sort of got their data lakes to some extent up and running. Now where are they pushing to go next? >> Sure, that's an excellent question. So we have really focused on the enterprise segment, as you know. So the customers that are working with Paxata from an industry perspective, banking is, of course, a very important one, we were really proud to share the stage yesterday with both Citi and Standard Chartered Bank, two of our flagship banking customers. But Paxata is also heavily used in the United States government, in the intelligence community, I won't say any more about that. It's used heavily in retail and consumer products, it's used heavily in the high-tech space, it's used heavily by data service providers, that is, companies whose entire business is based on data. But to answer your question specifically, what's happening in the data lake world is that a lot of folks, the early adopters, have jumped onto the data lake bandwagon. So they're pouring terabytes and petabytes of data into the data lake. And then the next question the business asks is, OK, now what? Where's the data, right? One of the simplest use cases, but actually one that's very pervasive for our customers, is they say, "Look, we don't even know, "our business people, they don't even know "what's in Hadoop right now." And by the way, I will also say that the data lake is not just Hadoop, but Amazon S3 is also serving as a data lake. The capabilities inside Microsoft's cloud are also serving as a data lake. Even the notion of a data lake is becoming this sort of polymorphic distributed thing. So what they do is, they want to be able to get what we like to say is first eyes on data. We let people with Paxata, especially with the release of Connect, to just point and click their way and to actually explore the data in all of the native systems before they even bring it in to something like Paxata. So they can actually sneak preview thousands of database tables or thousands of compressed data sets inside of Amazon S3, or thousands of data sets inside of Hadoop, and now the business people for the first time can point and click and actually see what is in the data lake in the first place. So step number one is, we have taken the approach so far in the industry of, there have been a lot of IT-driven use cases that have motivated people to go to the data lake approach. But now, we obviously want to show, all of our companies want to show business value, so tools and platforms like Paxata that sit on top of the data lake, that can federate across multiple data lakes and provide business-centric access to that information is the first significant use case pattern we're seeing. >> Just a clarification, could there be two roles where one is for slightly more technical business user exposes views summarizing, so that the ultimate end user doesn't have to see the thousands of tables? >> Absolutely, that's a great question. So when you look at self-service, if somebody wants to roll out a self-service strategy, there are multiple roles in an organization that actually need to intersect with self-service. There is a pattern in organizations where people say, "We want our people to get access to all the data." Of course it's governed, they have to have the right passwords and SSO and all that, but they're the companies who say, yes, the users really need to be able to see all of the data across these different tables. But there's a different role, who also uses Paxata extensively, who are the curators, right? These are the people who say, look, I'm going to provision the raw data, provide the views, provide even some normalization or transformation, and then land that data back into another layer, as people call the data relay, they go from layer zero to layer one to layer two, they're different directory structures, but the point is, there's a natural processing frame that they're going through with their data, and then from the curated data that's created by the data stewards, then the analysts can go pick it up. >> One of the other big challenges that our research is showing, that chief data officers express, is that they get this data in the data lake. So they've got the data sources, you're providing access to it, the other piece is they want to trust that data. There's obviously a governance piece, but then there's a data quality piece, maybe you could talk about that? >> Absolutely. So use case number one is about access. The second reason that people are not so -- So, why are people doing data prep in the first place? They are trying to make information-driven decisions that actually help move their business forward. So if you look at researchers from firms like Forrester, they'll say there are two reasons that slow down the latency of going from raw data to decision. Number one is access to data. That's the use case we just talked about. Number two is the trustworthiness of data. Our approach is very different on that. Once people actually can find the data that they're looking for, the big paradigm shift in the self-service world is that, instead of trying to process data based on transforming the metadata attributes, like I'm going to draw on a work flow diagram, bring in this table, aggregate with this operator, then split it this way, filter it, which is the classic ETL paradigm. The, I don't want to say profound, but maybe the very obvious thing we did was to say, "What if people could actually look at the data in the first place --" >> And sort of program it by example? >> We can tell, that's right. Because our eyes can tell us, our brains help us to say, we can immediately look at a data set, right? You look at an age column, let's say. There are values in the age column of 150 years. Maybe 20 years from now there may be someone who, on Earth, lives to 150 years. But pretty much -- >> Highly unlikely. >> The customers at the banks you work with are not 150 years old, right? So just being able to look at the data, to get to the point that you're asking, quality is about data being fit for a specific purpose. In order for data to be fit for a specific purpose, the person who needs the data needs to make the decision about what is quality data. Both of you may have access to the same transactional data, raw data, that the IT team has landed in the Hadoop cluster. But now you pull it up for one use case, you pull it up for another use case, and because your needs are different, what constitutes quality to you and where you want to make the investment is going to be very different. So by putting the power of that capability into the hands of the person who actually knows what they want, that is how we are actually able to change the paradigm and really compress the latency from "Here's my raw data" to "Here's the decision I want to make on that data." >> Let me ask, it sounds like, having put all of the self-service capabilities together, you've democratized access to this data. Now, what happens in terms of governance, or more importantly, just trust, when the pipeline, you know, has to go beyond where you're working on it, to some of the analytics or some of the basic ingest? To say, "I know this data came from here "and it's going there." >> That's right, how do we verify the fidelity of these data sources? It's a fantastic question. So, in my career, having worked in BI for a couple of decades, I know I look much younger but it actually has been a couple of decades. Remember, the camera adds about 15 pounds, for those of you watching at home. (Dave and George laugh) >> George: But you've lost already. >> Thank you very much. >> So you've lost net 30. (Nenshad laughs) >> Or maybe I'm back to where I'm supposed to be. What I've seen as the two models of governance in the enterprise when it comes to analytics and information management, right? There's model one, which is, we're going to build an enterprise data warehouse, we're going to know all the possible questions people are going to ask in advance, we're going to preprogram the ETL routines, we're going to put something like a MicroStrategy or BusinessObjects, an enterprise-reporting factory tool. Then you spend 10 million dollars on that project, the users come in and for the first time they use the system, and they say, "Oh, I kind of want to change this, this way. "I want to add this calculation." It takes them about five minutes to determine that they can't do it for whatever reason, and what is the first feature they look for in the product in order to move forward? Download to Excel, right? So you invested 15 million dollars to build a download to Excel capability which they already had before. So if you lock things down too much, the point is, the end users will go around you. They've been doing it for 30 years and they'll keep doing it. Then we have model two. Model two is, Excel spreadsheet. Excel Hell, or spreadmarts. There are lots of words for these things. You have a version of the data, you have a version of the data, I have a version of the data. We all started from the same transactional data, yet you're the head of sales, so suddenly your forecast looks really rosy. You're the head of finance, you really don't like what the forecast looks like. And I'm the product guy, so why am I even looking at the forecast in the first place, but somehow I got access to the data, right? These are the two polarities of the enterprise that we've worked with for the last 30 years. We wanted to find sort of a middle path, which is to say, let's give people the freedom and flexibility to be able to do the transformations they need to. If they want to add a column, let them add a column. If they want to change a calculation, let them add a a calculation. But, every single step in the process must be recorded. It must be versioned, it must be auditable. It must be governed in that way. So why the large banks and the intelligence community and the large enterprise customers are attracted to Paxata is because they have the ability to have perfect retraceability for every decision that they make. I can actually sit next to you and say, "This is why the data looks like this. "This is how this value, which started at one million, "became 1.5 million." That covers the Paxata part. But then the answer to the question you asked is, how do you even extend that to a broader ecosystem? I think that's really about some of the metadata interchange initiatives that a lot of the vendors in the Hadoop space, but also in the traditional enterprise space, have had for the last many years. If you look at something like Apache Atlas or Cloudera Navigator, they are systems designed to collect, aggregate, and connect these different metadata steps so you can see in an end-to-end flow, this is the raw data that got ingested into Hadoop. These are the transformations that the end user did in Paxata in order to make it ready for analytics. This is how it's getting consumed in something like Zoom Data, and you actually have the entire life cycle of data now actually manifested as a software asset. >> So those not, in other words, those are not just managing within the perimeter of Hadoop. They are managers of managers. >> That's right, that's right. Because the data is coming from anywhere, and it's going to anywhere. And then you can add another dimension of complexity which is, it's not just one Hadoop cluster. It's 10 Hadoop clusters. And those 10 Hadoop clusters, three of them are in Amazon. Four of them are in Microsoft. Three of them are in Google Cloud platform. How do you know what people are doing with data then? >> How is this all presented to the user? What does the user see? >> Great question. The trick to all of this, of self service, first you have to know very clearly, who is the person you are trying to serve? What are their technical skills and capabilities, and how can you get them productive as fast as possible? When we created this category, our key notion was that we were going to go after analysts. Now, that is a very generic term, right? Because we are all, in some sense, analysts in our day-to-day lives. But in Paxata, a business analyst, in an enterprise organizational context, is somebody that has the ability to use Microsoft Excel, they have to have that skill or they won't be successful with today's Paxata. They have to know what a VLOOKUP is, because a VLOOKUP is a way to actually pull data from a second data source into one. We would all know that as a join or a lookup. And the third thing is, they have to know what a pivot table is and know how a pivot table works. Because the key insight we had is that, of the hundreds of millions of analysts, people who use Excel on a day-to-day basis, a lot of their work is data prep. But Excel, being an amazing generic tool, is actually quite bad for doing data prep. So the person we target, when I go to a customer and they say, "Are we a good candidate to use Paxata?" and we're talking to the actual person who's going to use the software, I say, "Do you know what a VLOOKUP is, yes or no? "Do you know what a pivot table is, yes or no?" If they have that skill, when they come into Paxata, we designed Paxata to be very attractive to those people. So it's completely point-and-click. It's completely visual. It's completely interactive. There's no scripting inside that whole process, because do you think the average Microsoft Excel analyst wants to script, or they want to use a proprietary wrangling language? I'm sorry, but analysts don't want to wrangle. Data scientists, the 1% of the 1%, maybe they like to wrangle, but you don't have that with the broader analyst community, and that is a much larger market opportunity that we have targeted. >> Well, very large, I mean, a lot of people are familiar with those concepts in Excel, and if they're not, they're relatively easy to learn. >> Nenshad: That's right. Excellent. All right, Nenshad, we have to leave it there. Thanks very much for coming on The Cube, appreciate it. >> Thank you very much for having me. >> Congratulations for all the success. >> Thank you. >> All right, keep it right there, everybody. We'll be back with our next guest. This is The Cube, we're live from New York City at Big Data NYC. We'll be right back. (electronic music)

Published Date : Sep 30 2016

SUMMARY :

Brought to you by headline sponsors, here, he's the co-founder across the street from the Hilton. Great to see you guys. Great to be here, and of course, What's the latest? of the business information platform. to retrieve the data. Exactly right. explore it along the way. Let's get into some of the use cases. The sense at the conference One of the simplest use These are the people who One of the other big That's the use case we just talked about. to say, we can immediately the banks you work with of the self-service capabilities together, Remember, the camera adds about 15 pounds, So you've lost net 30. of the data, I have a version of the data. They are managers of managers. and it's going to anywhere. And the third thing is, they have to know relatively easy to learn. have to leave it there. This is The Cube, we're

ENTITIES

Entity	Category	Confidence
Citi	ORGANIZATION	0.99+
October 27, 2013	DATE	0.99+
George	PERSON	0.99+
George Gilbert	PERSON	0.99+
Nenshad	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Prakash	PERSON	0.99+
Dave	PERSON	0.99+
New York City	LOCATION	0.99+
Nvidia	ORGANIZATION	0.99+
Cisco	ORGANIZATION	0.99+
Earth	LOCATION	0.99+
15 million dollars	QUANTITY	0.99+
two	QUANTITY	0.99+
30 years	QUANTITY	0.99+
Forrester	ORGANIZATION	0.99+
Excel	TITLE	0.99+
thousands	QUANTITY	0.99+
50 companies	QUANTITY	0.99+
10 million dollars	QUANTITY	0.99+
Standard Chartered Bank	ORGANIZATION	0.99+
New York City	LOCATION	0.99+
Nenshad Bardoliwalla	PERSON	0.99+
two reasons	QUANTITY	0.99+
one million	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
first	QUANTITY	0.99+
two roles	QUANTITY	0.99+
two polarities	QUANTITY	0.99+
1.5 million	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
150 years	QUANTITY	0.99+
Hadoop	TITLE	0.99+
Paxata	ORGANIZATION	0.99+
second reason	QUANTITY	0.99+
One	QUANTITY	0.99+
two models	QUANTITY	0.99+
second	QUANTITY	0.99+
one	QUANTITY	0.99+
yesterday	DATE	0.99+
Both	QUANTITY	0.99+
three years ago	DATE	0.99+
first time	QUANTITY	0.98+
first time	QUANTITY	0.98+
New York	LOCATION	0.98+
both	QUANTITY	0.98+
1%	QUANTITY	0.97+
third thing	QUANTITY	0.97+
one system	QUANTITY	0.97+
about five minutes	QUANTITY	0.97+
Paxata	PERSON	0.97+
first feature	QUANTITY	0.97+
Data	LOCATION	0.96+
one part	QUANTITY	0.96+
United States government	ORGANIZATION	0.95+
thousands of tables	QUANTITY	0.94+
20 years	QUANTITY	0.94+
Model two	QUANTITY	0.94+
10 Hadoop clusters	QUANTITY	0.94+
terabytes	QUANTITY	0.93+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for Nenshad Bardoliwalla: