Balaji Ganesan, Privacera | CUBE Conversation
(upbeat techno music) >> Welcome to this CUBE Conversation. I'm Lisa Martin; I am joined by the CEO and co-founder of Privacera, Balaji Ganesan. Balaji, it's great to have you on theCUBE. >> Great to see you, Lisa. Good to see you again, and thanks for the opportunity. >> So tell our audience about Privacera. How do you help balance data security and data sharing? >> Absolutely. At Privacera we are on a mission to help enterprises unlock their data, but do it in a secure and compliant way. We are in this balance between, we call it a dual mandate, where we see enterprise data teams, on one hand, being asked to democratize data and make this data available to all parts of the organization. So everybody in the organization is looking to get access to the data faster. On the other hand, governance, privacy, and compliance mandates have become more stringent. And it has come from regulations such as GDPR or California Privacy, but in general, the environment and the culture have changed, where, from a board level, there's more onus on making sure that you have visibility on what data you're bringing in, but also making sure that the right people have access to the right data. And that notion is no longer just in textbooks, right? The onus is now on actually making it happen. And it's really hard for these data teams to do that, as the platforms are very diverse. And again, driven by data democratization today, companies are running very diverse platforms. Even in a single cloud like AWS, they have choices between Snowflake or Databricks and Amazon's native tools and other services, which are really cropping up and becoming available in the cloud. But if you need to make sure the right people have access to the right data, in that paradigm it's really, really hard. 
And this is where a tool like Privacera comes in, where we can help them get visibility on their data, but also help them build a unified layer where they can start managing these tools more cohesively. And the end result is they can get access to the data faster, but you're compliant, you're governed, and you have visibility around who's doing what. And that's the big enabler in their data strategy. >> So, we talk about the need for data monetization, for organizations to be able to give enterprise-wide access across business units, to identify new sources of revenue and new opportunities. That's a big challenge to do. You mentioned the security and governance front at the board level. I imagine that data-sharing is as well. How are you helping customers navigate multiple platforms, multiple clouds, to be able to get access that is actually secure, so that the CEO can go back to the board and say we've got everything, you know, all I's dotted and T's crossed here? >> Absolutely, absolutely. I think this is one of the biggest challenges that CIOs face today: on one hand, they have to be agile for the business and make sure that they're present in the cloud, and they are enabling multiple services that the business needs for agility. And data is one of the business drivers today, and most companies are becoming data companies. And it is to make decisions to serve your customer better, bring in more revenue, cut costs. Even in the midst of COVID, we have seen our customers go in and leverage data to find out how they can shift to a different paradigm of doing business. We had a customer which was primarily in retail stores, but they had to go and shift and analyze data on how they could pivot into a more online world in the COVID paradigm, how they could make supply chain decisions faster. So every company is becoming a data-driven business. Data is becoming the currency. 
So more units want access to the data as fast as possible. But on the other hand, you cannot forget about governance. You cannot forget about security; it's becoming table stakes. And traditionally, this has been a zero-sum game, where, you know, in order to maintain more security, you cannot give more access to the data, or you will make copies of the data, and that creates redundancy. The newer paradigm, in our belief, is that you can do both. And that's what Privacera is built toward. And this is how we are helping our customers in their journey where, you know, if you take Comcast, for example, they're building a massive infrastructure on top of AWS to serve the digital analytics part of it. And they are collecting a lot of data and making decisions based on that. But on the other hand, in order for them to achieve compliance and privacy, there needs to be an approach, a more unified layer, which is not inhibiting them from using the data. And this is where a solution like Privacera comes in, where we have built an approach, we have built an architecture, where they can enable governance and policies, and these policies are implemented across the data infrastructure. So it doesn't matter which application you use, or where you're coming from: you're governed by the same rules and policies. And that uniformity, that consistency, is something we can bring in, by being a horizontal layer and having prebuilt those integrations. So with Comcast, the end result, they're saying, is they can be faster to the market, right? Before us, they would be spending a lot of time with manual processes to build that governance. But with an automated layer, with automated governance, which has prebuilt integrations into all the layers, they are now able to go to market faster, and they're going to market with the governance and the compliance built in, so they can have both. So again, our belief is it's not zero-sum. 
Your governance and security can be built in with this business agility. And we are helping customers do that. >> You mentioned that retail customer and COVID-19, and we saw a massive pivot about a year and a half ago. And some companies did a great job of pivoting from brick and mortar to curbside delivery, for example, which is table stakes. But we saw so much acceleration of digital transformation last year. How has COVID-19 impacted governance? And what are some of the things that you're helping customers achieve there as they're accelerating their digital journeys? >> Again, going back to the drivers we are seeing with our customers, right? On one hand, digitization and the cloud journey accelerated during COVID, right? Companies that were on their cloud journey accelerated it, because they can unlock data faster. And, to my earlier examples, they want to make decisions leveraging data. COVID accelerated some of these initiatives even further. So there have been more data initiatives than before. Digitalization has accelerated; cloud migration has accelerated. But COVID also brought in the fact that you are not physically located together. You can't sit in a room and trust each other and say, "I trust all of you and I'll give you all equal access." You are now sitting in disparate locations, without the traditional security you would have from a physical boundary. You're now remote. All of a sudden, the CIOs have to think: how can we be more agile? How do you build in security and governance at that layer, where you have to start from the bottom up and say, are you governing and protecting your data wherever it is stored and accessed, rather than relying on a perimeter, a physical boundary, or being in a physical location? Those traditional paradigms are getting shattered. And most forward-looking companies are recognizing that. They accelerated those trends. 
And from what we have seen, from our point of view, we are able to help in that transformation, both in enabling companies to become digital and democratize data faster, but also in building this bottom-up layer where they can be sure that they have visibility on what data they have, while also making sure the right people have access to the right data, irrespective of what tool they use, irrespective of where they sit. And that's a sea change we are seeing in companies now. So COVID, in our industry, in our world, has brought in massive transformation and massive opportunities to set a new paradigm for how organizations treat governance, as well as their data initiatives. >> A lot of change that it's brought. Some good, as you've mentioned. Talk to me about, so Privacera is built on Apache Ranger; how are you guys helping AWS customers from a cloud migration perspective? Because we know cloud migration is continuing to accelerate. >> Our foundation, given our work in open source, has always been built around open standards and interoperability, and we believe an enterprise solution needs to be built around standards so that we can talk to other tools. You're not the only solution that enterprises will have. There needs to be interoperability, especially around governance, where we're exchanging information with other tools. And the legacy of Ranger helps us build on those standards. Ranger as a project today is supported by the likes of Cloudera or, in the cloud, Microsoft, AWS, and Google, and most of the forward-looking tools, like Presto and Spark. It has been a de facto standard used by some of these analytical engines. The wide adoption around that, and being built on Ranger, gives us that standard of interoperability. So when we go and work with other tools, it makes it easier for us to talk. 
It makes it easier for organizations to transition in their cloud journey, where they can now very easily move their governance and policies; even if they are running Ranger on premises, they can easily move those standards, those policies, into the cloud. For example, with Sun Life, it was the same case, where they built a lot of these rules and policies in their on-premises environment. Being an insurance company, they always had governance and compliance at top of mind. Very strict rules around who can access what data and what portions of data, because this data is governed by federal laws, and by a lot of industry laws, mandates, and compliance requirements. And they always had this notion on-premises. Now, when they're migrating to the cloud, one of the bottlenecks is: how do you move this governance, and do you have to build it from scratch? But with our tool and the standards we have built in, we can migrate that in days rather than months. So for them, we help in the overall cloud migration. To my earlier point, we are helping customers achieve faster time to market by enabling this governance and making it easier. And by having this open standard, it makes it easier for customers to migrate and then interoperate, rather than having to build it again, having to reinvent the wheel when they migrate to the cloud. Because the governance and compliance mandates are not changing when you go from prem to cloud. In fact, in the cloud, in some cases, it's more diverse. So by helping organizations do that, we are helping them achieve faster acceleration, which is what happened with Sun Life. >> That time to market is absolutely imperative. If anything we've learned in the last 18 months, it's that businesses needed to pivot overnight, multiple times. And they need to be able to get to market faster, whether it's pivoting from being a brick and mortar to being able to deliver curbside delivery. 
The time to market: people don't have that time, regardless of industry, because there are competitors in the rear-view mirror who might be smaller, more agile, and able to get to market faster. So these bigger companies, and any company, need to have a faster time to market. >> Yeah, absolutely. And that's what we are seeing. And that's a big driver for the journey into the cloud: to bring that agility. In the earlier paradigm, you would have a monolithic technology stack, and you can't adopt changes fast when you are reliant on the IT team. What cloud brings in is, you can now move data into the cloud and enable any service and any team faster than ever before. You can enable a team on Snowflake, you can enable a team on a different machine learning tool, all having access to the same data, without the need for the data to be copied and servers built out. The cloud is really bringing that digital transformation, but it's also bringing in the agility of being faster and nimble as part of it. But the challenge for cloud is it's happening at the same time that governance and privacy have become real. Organizations can no longer assume that, you know, they can just move data into the cloud and be done with it. You have to really think about all layers of the cloud and say, how do you make sure that data is protected at all layers, in all consumption? How do you make sure that the right people have access to the right data? And that's a much more comprehensive problem, given that we are now not sitting in a physical office anymore; we are distributed. How do you do that? So while cloud brings that business agility, it's also happening, not because of cloud, but because of the climate we are in, that governance and compliance are real. And most forward-looking organizations are thinking about how they can build a foundation that can handle both. How they can institutionalize these governance frameworks in the newer paradigms of cloud. 
We are seeing companies implementing what is called a data mesh, which is essentially a concept of how the data can be decentralized and owned by business owners and teams. But how do you bring governance into that? A newer paradigm that most forward-looking organizations are adopting is that governance doesn't need to be managed by one team. It can be a distributed function. But can you institutionalize a foundation or a framework, where you have tools which can be used by different teams? So they are bound by the same rules, but they're operating in their own independent way. And that's the future for us: how organizations can figure out how, in the cloud, they can have a more distributed, delegated, decentralized governance that aligns with their business strategy of self-service analytics and use of data across multiple teams, but all bound by the same framework, all bound by common rules, so that you're not building your own; the tools and the methods are all common, but each team is able to operate independently. And that's where the agility, true agility, will come in, when organizations are able to do that. And I think we are probably in step one or two of the journey. It's fascinating to see some of the organizations take leaps in that. But for us, the future is, if organizations can build those foundations in, from processes and people, they can truly unlock the power of the cloud. >> You brought in technology and people; last question is, how do you advise customers when you're in conversations? We talked about data access, governance, and security being a board-level conversation, and the ability for an organization to monetize their data; but how do you talk about that balance when you're with customers? That's a tricky line. >> What we say to the customer is, it's a journey. You don't have to think of solving this on day one. 
What we really think about is the foundational steps you need to take to achieve that journey. What are the steps you can do today? And add onto them, rather than trying to solve for everything on day one. And that's where most of our focus goes: how we can help our customers put together a program which achieves both their data strategy and aligns their governance with it. And most forward-looking organizations are already doing that, where they have a multi-year journey that they're already working on. They are thinking about some of the things that we help with. And in some cases, when organizations are not thinking about it, we come in and help and advise on that. Our advice always is: start thinking about today, and what your next two or three years are going to look like. We put together a program. And that involves tools, that involves people, and that involves organization structure. We are a cog in the wheel, but we also recommend they look, holistically, at all the aspects. And that's our job at the end of the day as vendors in this industry: to collectively learn from customers and help the next set of customers coming along. But we believe, again, going back to my point, if organizations are able to set up this paradigm, where they set structures, where they can delegate governance but build those common rules and frameworks upfront, they are set up to succeed in the future. They can be more agile than their competitors. >> And that is absolutely table stakes these days. Balaji, thank you so much for joining us, telling our audience about Privacera, what you're doing, and how you're helping customers, particularly AWS customers, migrate to the cloud in such a dynamic environment. We appreciate your time. >> Thank you so much. It was a pleasure talking to you, and I appreciate it. >> Likewise. For Balaji Ganesan, I'm Lisa Martin. You're watching this CUBE Conversation. (upbeat music)
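[Editor's note: to make the mechanics discussed above concrete: Privacera is built on Apache Ranger, where access rules such as "this group may select from this table" are expressed as JSON policy documents that every Ranger-aware engine then enforces, which is the "unified layer" idea described in the interview. The sketch below is illustrative only; the service, database, table, and group names are hypothetical and are not taken from the interview.]

```python
# Hedged sketch: build a Ranger-style policy document granting one user
# group read access to one Hive table. All names here are invented for
# illustration; they do not come from the interview.

def build_table_policy(service, database, table, group, accesses):
    """Return a Ranger-style policy dict for a single Hive table."""
    return {
        "service": service,  # the Ranger service (e.g. a Hive repo) this policy belongs to
        "name": f"{database}.{table}-{group}",
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": ["*"]},  # all columns
        },
        "policyItems": [{
            "groups": [group],
            "accesses": [{"type": a, "isAllowed": True} for a in accesses],
        }],
    }

policy = build_table_policy("hivedev", "sales", "orders", "analysts", ["select"])
print(policy["name"])  # sales.orders-analysts

# In a live cluster, a document like this would be registered with the
# Ranger admin service (its public v2 policy REST endpoint), and every
# engine running a Ranger plugin would enforce the same rule.
```

The point of the sketch is the single-document, many-engines shape: one policy definition, enforced identically wherever the data is accessed, rather than per-tool manual configuration.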
Steve Wooledge, Arcadia Data & Satya Ramachandran, Neustar | DataWorks Summit 2018
(upbeat electronic music) >> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering Dataworks Summit 2018, brought to you by Hortonworks. (electronic whooshing) >> Welcome back to theCUBE's live coverage of Dataworks, here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests in this segment, we have Steve Wooledge, he is the VP of Product Marketing at Arcadia Data, and Satya Ramachandran, who is the VP of Engineering at Neustar. Thanks so much for coming on theCUBE. >> Our pleasure and thank you. >> So let's start out by setting the scene for our viewers. Tell us a little bit about what Arcadia Data does. >> Arcadia Data is focused on getting business value from these modern scale-out architectures, like Hadoop, and the Cloud. We started in 2012 to solve the problem of how do we get value into the hands of the business analysts that understand a little bit more about the business, in addition to empowering the data scientists to deploy their models and value to a much broader audience. So I think that's been, in some ways, the last mile of value that people need to get out of Hadoop and data lakes, is to get it into the hands of the business. So that's what we're focused on. >> And start seeing the value, as you said. >> Yeah, seeing is believing, a picture is a thousand words, all those good things. And what's really emerging, I think, is companies are realizing that traditional BI technology won't solve the scale and user concurrency issues, because architecturally, big data's different, right? We're on the scale-out, MPP architectures now, like Hadoop, the data complexity and variety has changed, but the BI tools are still the same, and you pull the data out of the system to put it into some little micro cube to do some analysis. Companies want to go after all the data, and view the analysis across a much broader set, and that's really what we enable. 
I want to hear about the relationship between your two companies, but Satya, tell us a little about Neustar, what you do. >> Neustar is an information services company; we are built around identity. We are the premier identity provider, the most authoritative identity provider for the US. And we've built a whole bunch of services around that identity platform. I am part of the marketing solutions group, and I head analytics engineering for marketing solutions. The product that I work on helps marketers do their annual planning, as well as their campaign or tactical planning, so that they can fine-tune their campaigns on an ongoing basis. >> So how do you use Arcadia Data's primary product? >> So we are a predictive analytics platform, and we use Arcadia for the reporting part of it. We have multiple terabytes of advertising data, and we use Arcadia to provide fast access to our customers, and also very granular and explorative analysis of this data. >> So you say you help your customers with their marketing campaigns, so are you doing predictive analytics? And are you doing churn analysis and so forth? And how does Arcadia fit into all of that? >> So we get data, and then we build an activation model, which tells how the marketing spend corresponds to the revenue. We not only do historical analysis, we also do predictive, in the sense that marketers frequently do what-if analysis, saying, what if I moved my budget from paid search to TV? And how does it affect the revenue? So all of this modeling is built by Neustar, the modeling platform is built by Neustar, but the last mile of taking these reports and providing this explorative analysis of the results is provided by the reporting solution, which is Arcadia. >> Well, I mean, the thing about data analytics is that it really is going to revolutionize marketing. 
That famous marketing adage: half my advertising works, I just don't know which half. And now we're really going to be able to figure out which half. Can you talk a little bit about return on investment and what your clients see? >> Sure, we've got some major Fortune 500 companies that have said publicly that they've realized over a billion dollars of incremental value. And that could be across both marketing analytics, and how we better treat our messaging, our brand, to reach our intended audience. There's things like supply chain, and being able to do more realtime what-if analysis for different routes; it's things like cyber security, and stopping fraud and waste at a much grander scale than what was really possible in the past. >> So we're here at Dataworks, and it's the Hortonworks show. Give us a sense of the degree of your engagement or partnership with Hortonworks and participation in their partner ecosystem. >> Yeah, absolutely. Hortonworks is one of our key partners, and what we did that's different architecturally is we built our BI server directly into the data platforms. So what I mean by that is, we take the concept of a BI server, and we install it and run it on the data nodes of the Hortonworks Data Platform. We inherit the security directly out of systems like Apache Ranger, so that all that administration and scale is done at Hadoop economics, if you will, and it leverages the things that are already in place. So that has huge advantages, both in terms of scale, but also simplicity, and then you get the performance and the concurrency that companies need to deploy out to, like, 5,000 users directly on that Hadoop cluster. So, Hortonworks is a fantastic partner for us, and a large number of our customers run on Hortonworks, as well as other platforms, such as Amazon Web Services, where Satya's got his system deployed. >> At the show they announced Hortonworks Data Platform 3.0. 
There's containerization there, and there's updates to Hive to enable it to be more of a realtime analytics and data warehousing engine. At Arcadia Data, do you follow their product enhancements in terms of your own product roadmap, with any specific, fixed cycle? Are you going to be leveraging the new features in HDP 3.0 going forward to add value to your customers' ability to do interactive analysis of this data in close to realtime? >> Sure, yeah, no, because we're a native-- >> 'Cause marketing campaigns are often in realtime increasingly, especially when you're using, you know, you've got a completely digital business. >> Yeah, absolutely. So we benefit from the innovations happening within the Hortonworks Data Platform. Because we're a native BI tool that runs directly within that system, you know, with changes in Hive, or different things within HDFS in terms of performance or compression and things like that, our customers generally benefit from that directly. >> Satya, going forward, what are some of the problems that you want to solve for your clients? What are their biggest pain points, and where do you see Neustar? >> So, data is the new oil, right? So, for marketers now too, data is what they're going after. They want faster analysis, they want to be able to get to insights as fast as they can, and they obviously want to work on as large an amount of data as possible. The variety of sources is becoming higher and higher in terms of marketing. There used to be a few channels in the '70s and '80s; the '90s increased that, and now you have hundreds of channels, if not thousands. And they want visibility across all of that. It's the ability to work across this variety of data, at increasing volume and very high speed. Those are the high-level challenges that we have at Neustar. 
>> So the difference, marketing attribution analysis you say is one of the core applications of your solution portfolio. How is that more challenging now than it had been in the past? We have far more marketing channels, digital and so forth, then how does the state-of-the-art of marketing attribution analysis, how is it changing to address this multiplicity of channels and media for advertising and for influencing the customer on social media and so forth? And then, you know, can you give us a sense for then, what are the necessary analytical tools needed for that? We often hear about a social graph analysis or semantic analysis, or for behavioral analytics and so forth, all of this makes it very challenging. How can you determine exactly what influences a customer now in this day and age, where, you think, you know, Twitter is an influencer over the conversation. How can you nail that down to specific, you know, KPIs or specific things to track? >> So I think, from our, like you pointed out, the variety is increasing, right? And I think the marketers now have a lot more options than what they have, and that that's a blessing, and it's also a curse. Because then I don't know where I'm going to move my marketing spending to. So, attribution right now, is still sitting at the headquarters, it's kind of sitting at a very high level and it is answering questions. Like we said, with the Fortune 100 companies, it's still answering questions to the CMOs, right? Where attribution will take us, next step is to then lower down, where it's able to answer the regional headquarters on what needs to happen, and more importantly, on every store, I'm able to then answer and tailor my attribution model to a particular store. Let's take Ford for an example, right? Now, instead of the CMO suite, but, if I'm able to go to every dealer, and I'm able to personal my attribution to that particular dealer, then it becomes a lot more useful. 
The challenge there is it all needs to be connected. Whatever model we are building for the dealer needs to be connected up to the headquarters. >> Yes, and that personalization very much leverages the kind of things that Steve was talking about at Arcadia. Being able to analyze all the data to find those micro, micro, micro segments that can be influenced to varying degrees, so yeah. I like where you're going with this, 'cause it very much relates to the power of distributed, federated big data fabrics like Hortonworks offers. >> And so streaming analytics is coming to the fore; it's been talked about for the longest time, but we have real use cases for streaming analytics right now. Similarly, data volumes are, indeed, becoming a lot larger. So both of them are doing a lot more right now. >> Yes. >> Great. >> Well, Satya and Steve, thank you so much for coming on theCUBE, this was really, really fun talking to you. >> Excellent. >> Thanks, it was great to meet you. Thanks for having us. >> I love marketing talk. >> (laughs) It's fun. I'm Rebecca Knight, for James Kobielus, stay tuned to theCUBE, we will have more coming up from our live coverage of Dataworks, just after this. (upbeat electronic music)
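[Editor's note: the attribution discussion above can be made concrete with a toy example. The sketch below contrasts two common attribution rules over a single invented touchpoint path; the channel names are made up for illustration and are not from the interview, and Neustar's actual activation models are far richer than this.]

```python
# Hedged sketch: two toy attribution rules applied to one customer's
# touchpoint path. Channel names are invented; this is not Neustar's model.

def last_touch(path):
    """Give all conversion credit to the final touchpoint."""
    return {path[-1]: 1.0}

def linear(path):
    """Split conversion credit evenly across every touchpoint."""
    share = 1.0 / len(path)
    credit = {}
    for channel in path:
        credit[channel] = credit.get(channel, 0.0) + share
    return credit

path = ["paid_search", "tv", "email", "paid_search"]
print(last_touch(path))  # {'paid_search': 1.0}
print(linear(path))      # paid_search 0.5, tv 0.25, email 0.25
```

A what-if question like "what if I moved budget from paid search to TV" then becomes a comparison of model outputs under shifted spend, which is the kind of analysis a reporting layer like Arcadia surfaces.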
Ram Venkatesh, Hortonworks & Sudhir Hasbe, Google | DataWorks Summit 2018
>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> We are wrapping up Day One of coverage of Dataworks here in San Jose, California on theCUBE. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests for this last segment of the day. We have Sudhir Hasbe, who is the director of product management at Google, and Ram Venkatesh, who is VP of Engineering at Hortonworks. Ram, Sudhir, thanks so much for coming on the show. >> Thank you very much. >> Thank you. >> So, I want to start out by asking you about a joint announcement that was made earlier this morning about using some Hortonworks technology deployed onto Google Cloud. Tell our viewers more. >> Sure, so basically what we announced was support for the Hortonworks Data Platform and Hortonworks DataFlow, HDP and HDF, running on top of the Google Cloud Platform. So this includes deep integration with Google's cloud storage connector layer as well as a certified distribution of HDP to run on the Google Cloud Platform. >> I think the key thing is a lot of our customers have been telling us they like the familiar environment of the Hortonworks distribution that they've been using on-premises, and as they look at moving to cloud, like in GCP, Google Cloud, they want the similar, familiar environment. So, they want the choice to deploy on-premises or Google Cloud, but they want the familiarity of what they've already been using with Hortonworks products. So this announcement actually helps customers pick and choose, like whether they want to run the Hortonworks distribution on-premises, they want to do it in cloud, or they want to build this hybrid solution where the data can reside on-premises, can move to cloud, and build these common hybrid architectures. So, that's what this does. >> So, HDP customers can store data in the Google Cloud.
They can execute ephemeral workloads, analytic workloads, machine learning in the Google Cloud. And there's some tie-in between Hortonworks' real-time or low latency or streaming capabilities from HDF in the Google Cloud. So, could you describe, at a full sort of detail level, the degrees of technical integration between your two offerings here. >> You want to take that? >> Sure, I'll handle that. So, essentially, deep in the heart of HDP, there's the HDFS layer that includes the Hadoop Compatible File System, which is a pluggable file system layer. So, what Google has done is they have provided an implementation of this API for the Google Cloud Storage Connector. So this is the GCS Connector. We've taken the connector and we've actually continued to refine it to work with our workloads, and now Hortonworks is actually bundling, packaging, and making this connector available as part of HDP.
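The pluggable file system layer Ram describes is easiest to see in miniature. The sketch below is a hypothetical illustration of the idea only; the class and method names are invented for this example and are not the real Hadoop HCFS or GCS connector APIs. The point is a single interface, with the backend chosen by URI scheme, so a workload runs unchanged against either store.

```python
# Hypothetical sketch of a pluggable file system layer, loosely modeled on
# Hadoop's "Hadoop Compatible File System" idea. All names here are
# illustrative stand-ins, not the real Hadoop or GCS connector interfaces.

from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Minimal contract every storage backend must implement."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryHDFS(FileSystem):
    """Stand-in for on-cluster HDFS storage."""

    def __init__(self):
        self._blocks = {}

    def read(self, path):
        return self._blocks[path]

    def write(self, path, data):
        self._blocks[path] = data


class InMemoryGCS(FileSystem):
    """Stand-in for a cloud object store behind a connector."""

    def __init__(self):
        self._objects = {}

    def read(self, path):
        return self._objects[path]

    def write(self, path, data):
        self._objects[path] = data


# Registry mapping URI schemes to backends, the way Hadoop configuration
# maps a scheme such as "gs://" to a connector implementation class.
REGISTRY = {"hdfs": InMemoryHDFS(), "gs": InMemoryGCS()}


def open_path(uri: str) -> FileSystem:
    """Pick the backend from the URI scheme; callers never know which."""
    scheme = uri.split("://", 1)[0]
    return REGISTRY[scheme]


# A workload written against the FileSystem interface runs unmodified
# whether the data lives "on-prem" or "in the cloud".
for uri in ("hdfs://warehouse/orders.csv", "gs://bucket/orders.csv"):
    fs = open_path(uri)
    fs.write(uri, b"order_id,amount\n1,9.99\n")
    assert fs.read(uri) == b"order_id,amount\n1,9.99\n"
```

In the real stack the analogous role is played by Hadoop's FileSystem API, where, roughly speaking, site configuration maps the `gs://` scheme onto the GCS connector class.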
In the cloud, we need to make sure that we work well when storage and compute are disaggregated and they're scaled elastically, independent of each other. So this is a fairly fundamental architectural change. We want to make sure that we enable this in a first-class manner. >> I think that's a key point, right. I think what cloud allows you to do is scale the storage and compute independently. And so, with storing data in Google Cloud Storage, you can like scale that horizontally and then just leverage that as your storage layer. And the compute can independently scale by itself. And what this is allowing customers of HDP and HDF is to store the data on GCP, on the cloud storage, and then just use the scale, the compute side of it, with HDP and HDF. >> So, if you'll indulge me to name another Hortonworks partner for just a hypothetical. Let's say one of your customers is using IBM Data Science Experience to do TensorFlow modeling and training, can they then inside of HDP on GCP, can they use the compute infrastructure inside of GCP to do the actual modeling which is more compute intensive and then the separate decoupled storage infrastructure to do the training which is more storage intensive? Is that a capability that would be available to your customers? With this integration with Google? >> Yeah, so where we are going with this is we are saying, IBM DSX and other solutions that are built on top of HDP, they can transparently take advantage of the fact that they have HDP compute infrastructure to run against. So, you can run your machine learning training jobs, you can run your scoring jobs and you can have the same unmodified DSX experience whether you're running against an on-premise HDP environment or an in-cloud HDP environment. Further, that's sort of the benefit for partners and partner solutions.
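To make the coupled-versus-disaggregated point above concrete, here is a toy sizing model. The per-node figures (10 TB and 16 cores per node) and the workload are invented for illustration, not anything from the interview:

```python
# Illustrative (made-up) numbers showing why disaggregating storage from
# compute matters: with coupled Hadoop-style nodes you add disks and CPUs
# together, while disaggregated object storage lets each scale alone.

def coupled_nodes(storage_tb: float, compute_cores: int,
                  tb_per_node: float = 10.0, cores_per_node: int = 16) -> int:
    """Nodes needed when every node carries both disks and CPUs."""
    need_for_storage = -(-storage_tb // tb_per_node)      # ceiling division
    need_for_compute = -(-compute_cores // cores_per_node)
    return int(max(need_for_storage, need_for_compute))


def disaggregated_nodes(compute_cores: int, cores_per_node: int = 16) -> int:
    """Compute nodes needed when storage lives in an object store."""
    return int(-(-compute_cores // cores_per_node))


# A storage-heavy, compute-light workload: 1 PB of data, 64 cores of work.
coupled = coupled_nodes(storage_tb=1000, compute_cores=64)
decoupled = disaggregated_nodes(compute_cores=64)
print(coupled, decoupled)   # the coupled cluster is sized by storage, not by work
```

With these assumed numbers, the coupled cluster needs 100 nodes just to hold the data, while the disaggregated design runs the same job on 4 compute nodes plus object storage billed separately.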
From a customer standpoint, the big value prop here is that customers, they're used to securing and governing their data on-prem in their particular way with HDP, with Apache Ranger, Atlas, and so forth. So, when they move to the cloud, we want this experience to be seamless from a management standpoint. So, from a data management standpoint, we want all of their learning from a security and governance perspective to apply when they are running in Google Cloud as well. So, we've had this capability on Azure and on AWS, so with this partnership, we are announcing the same type of deep integration with GCP as well. >> So Hortonworks is that one pane of glass across all your product partners for all manner of jobs. Go ahead, Rebecca. >> Well, I just wanted to ask about, we've talked about the reason, the impetus for this. With the customer, it's more familiar for customers, it offers the seamless experience. But, can you delve a little bit into the business problems that you're solving for customers here? >> A lot of times, our customers are at various points on their cloud journey, and for some of them, it's very simple, they're like, there's a broom coming by and the datacenter is going away in 12 months and I need to be in the cloud. So, this is where there is a wholesale movement of infrastructure from on-premise to the cloud. Others are exploring individual business use cases. So, for example, one of our large customers, a travel partner, they are exploring their new pricing model and they want to roll out this pricing model in the cloud. They have on-premise infrastructure, they know they have that for a while. They are spinning up new use cases in the cloud, typically for reasons of agility. So, typically many of our customers operate large, multi-tenant clusters on-prem. That's nice for very scalable compute for running large jobs.
But, if you want to run, for example, a new version of Spark, you have to upgrade the entire cluster before you can do that. Whereas in this sort of model, what they can say is, they can bring up a new workload and just have the specific versions and dependency that it needs, independent of all of their other infrastructure. So this gives them agility where they can move as fast as... >> Through the containerization of the Spark jobs or whatever. >> Correct, and so containerization as well as even spinning up an entire new environment. Because, in the cloud, given that you have access to elastic compute resources, they can come and go. So, your workloads are much more independent of the underlying cluster than they are on-premise. And this is where sort of the core business benefits around agility, speed of deployment, things like that come into play. >> And also, if you look at the total cost of ownership, really take an example where customers are collecting all this information through the month. And, at month end, you want to do closing of books. And so that's a great example where you want ephemeral workloads. So this is like do it once in a month, finish the books and close the books. That's a great scenario for cloud where you don't have to on-premises create an infrastructure, keep it ready. So that's one example where now, in the new partnership, you can collect all the data through the on-premises if you want throughout the month. But, move that and leverage cloud to go ahead and scale and do this workload and finish the books and all. That's one, the second example I can give is, a lot of customers collecting, like they run their e-commerce platforms and all on-premises, let's say they're running it. They can still connect all these events through HDP that may be running on-premises with Kafka and then, what you can do is, in-cloud, in GCP, you can deploy HDP, HDF, and you can use the HDF from there for real-time stream processing. 
So, collect all these clickstream events, use them, make decisions like, hey, which products are selling better? Should we go ahead and give? How many people are looking at that product? Or how many people have bought it? That kind of aggregation and real-time at scale, now you can do in-cloud and build these hybrid architectures that are there. And enable scenarios where in the past, to do that kind of stuff, you would have to procure hardware, deploy hardware, all of that. Which all goes away. In-cloud, you can do that much more flexibly and just use whatever capacity you have. >> Well, you know, ephemeral workloads are at the heart of what many enterprise data scientists do. Real-world experiments, ad-hoc experiments, with certain datasets. You build a TensorFlow model or maybe a model in Caffe or whatever and you deploy it out to a cluster and so the life of a data scientist is often nothing but a stream of new tasks that are all ephemeral in their own right but are part of an ongoing experimentation program that's, you know, they're building and testing assets that may or may not be deployed in the production applications. That's, you know, so I can see a clear need for that, well, that capability of this announcement in lots of working data science shops in the business world. >> Absolutely. >> And I think coming down to, if you really look at the partnership, right. There are two or three key areas where it's going to have a huge advantage for our customers. One is analytics at-scale at a lower cost, like total cost of ownership, reducing that, running at-scale analytics. That's one of the big things. Again, as I said, the hybrid scenarios. Most customers, enterprise customers have huge deployments of infrastructure on-premises and that's not going to go away. Over a period of time, leveraging cloud is a priority for a lot of customers but they will be in these hybrid scenarios.
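The clickstream scenario Sudhir sketches, counting views and purchases per product as events arrive, reduces to a very small program. In a real HDF deployment this would be Kafka topics feeding a stream processor; here a plain Python loop stands in for that pipeline, and the events are made up:

```python
# Toy version of real-time clickstream aggregation: maintain running
# per-product counters as each event arrives, instead of batch-scanning
# the full history later.

from collections import Counter

events = [
    {"product": "shoes",  "action": "view"},
    {"product": "shoes",  "action": "view"},
    {"product": "shoes",  "action": "buy"},
    {"product": "jacket", "action": "view"},
]

views, buys = Counter(), Counter()
for event in events:                       # one iteration per arriving event
    if event["action"] == "view":
        views[event["product"]] += 1
    elif event["action"] == "buy":
        buys[event["product"]] += 1

# "Which products are selling better?" becomes a running aggregate
# you can read at any moment, not a nightly batch job.
assert views["shoes"] == 2 and buys["shoes"] == 1
assert views["jacket"] == 1 and buys["jacket"] == 0
```

The design point is that the counters are updated incrementally per event, which is what lets the answer be available in real time at whatever scale the stream platform provides.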
And what this partnership allows them to do is have these scenarios that can span across cloud and on-premises infrastructure that they are building and get business value out of all of these. And then, finally, we at Google believe that the world will be more and more real-time over a period of time. Like, we already are seeing a lot of these real-time scenarios with IoT events coming in and people making real-time decisions. And this is only going to grow. And this partnership also provides the whole streaming analytics capabilities in-cloud at-scale for customers to build these hybrid plus also real-time streaming scenarios with this package. >> Well, it's clear from Google what the Hortonworks partnership gives you in this competitive space, in the multi-cloud space. It gives you that ability to support hybrid cloud scenarios. You're one of the premier public cloud providers, as we all know. And clearly now that you've had the Hortonworks partnership, you have that ability to support those kinds of highly hybridized deployments for your customers, many of whom I'm sure have those requirements. >> That's perfect, exactly right. >> Well, a great note to end on. Thank you so much for coming on theCUBE. Sudhir, Ram, thank you so much. >> Thank you, thanks a lot. >> Thank you. >> I'm Rebecca Knight for James Kobielus, we will have more tomorrow from DataWorks. We will see you tomorrow. This is theCUBE signing off. >> From sunny San Jose. >> That's right.
Scott Gnau, Hortonworks - DataWorks Summit 2017
>> Announcer: Live, from San Jose, in the heart of Silicon Valley, it's The Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to The Cube. We are live at DataWorks Summit 2017. I'm Lisa Martin with my cohost, George Gilbert. We've just come from this energetic, laser light show infused keynote, and we're very excited to be joined by one of today's keynote speakers, the CTO of Hortonworks, Scott Gnau. Scott, welcome back to The Cube. >> Great to be here, thanks for having me.
They know SQL and they've been doing predictive modeling of structured data for a very long time, and there's a lot of value generated from that. Making the business analyst successful in a Hadoop-inspired world is extremely valuable. And why is that? Well, it's because Hadoop actually now brings a lot more breadth of data and frankly a lot more depth of data than they've ever had access to before. But being able to communicate with that business analyst in a language they understand, SQL, being able to make all those tools work seamlessly, is the next extension of success for the business analyst. We spent a lot of time this morning talking about data scientists, the next great frontier where you bring together lots and lots and lots and lots of data, for instance, Skin and Math and Heavy Compute, with the data scientists and really enable them to go build out that next generation of high definition kind of analytics, all right, and we're all, certainly I am, captured by the notion of self-driving cars, and you think about a self-driving car, and the success of that is purely based on successful data science. In those cameras and those machines being able to infer images more accurately than a human being, and then make decisions about what those images mean. That's all data science, and it's all about raw processing power and lots and lots and lots of data to make those models train and become more accurate than what would otherwise happen. So enabling the data scientist to be successful, obviously, that's a use case. You know, certainly voice activated, voice response kinds of systems, for better customer service; better fraud detection, you know, the cost of a false positive is a hundred times the cost of missing a fraudulent behavior, right? That's because you've irritated a really good customer. So being able to really train those models in high definition is extremely valuable.
So bringing together the data, but also the tool set, so that data scientists can actually act as a team and collaborate and spend less of their time finding the data, and more of their time providing the models. And I said this morning, last but not least, the operations manager. This is really, really, really important. And a lot of times, especially geeks like myself, are just, ah, operations guys are just a pain in the neck. Really, really, really important. We've got data that we've never thought of. Making sure that it's secured properly, making sure that we're managing within the regulations of privacy requirements, making sure that we're governing it and making sure how that data is used, alongside our corporate mission is really important. So creating that tool set so that the operations manager can be confident in turning these massive files of data over to the business analyst and to the data scientist and be confident that the company's mission, the regulations that they're working within in those jurisdictions, are all in compliance. And so that's what we're building on, and that stack, of course, is built on open source Apache Atlas and open source Apache Ranger and it really makes for an enterprise grade experience. >> And a couple things to follow on to that, we've heard of this notion for years, that there is a shortage of data scientists, and now, it's such a core strategic enabler of business transformation. Is this collaboration, this team support that was talked about earlier, is this helping to spread data science across these personas to enable more of them to be data scientists? >> Yeah, I think there are two aspects to it, right? One is certainly really great data scientists are hard to find; they're scarce. They're unique creatures. And so, to the extent that we're able to combine the tool set to make the data scientists that we have more productive, and I think the numbers are astronomical, right?
You could argue that, with the wrong tool set, a data scientist might spend 80% or 90% of his or her time just finding the data and only 10% working on the problem. If we can flip that around and make it 10% finding the data and 90% working on the problem, that's, like, an order of magnitude more breadth of data science coverage that we get from the same pool of data scientists, so I think that from an efficiency perspective, that's really huge. The second thing, though, is that by looking at these personas and the tools that we're rolling out, can we start to package up things that the data scientists are learning and move those models into the business analyst's desktop. So, now, not only is there more breadth and depth of data, but frankly, there's more depth and breadth of models that can be run and infused into traditional business process, which means, turning that into better decision making, turning that into better value for the business, just kind of happens automatically. So, you're leveraging the value of data scientists. >> Let me follow that up, Scott. So, right now the biggest time sink for the data scientist or the data engineer is data cleansing and transformation. Where do the cloud vendors fit in in terms of having trained some very broad horizontal models in terms of vision, natural language understanding, text to speech, so where they have accumulated a lot of data assets, and then they created models that were trained and could be customized. Do you see a role for, not just mixed gen UI related models coming from the cloud vendors, but for other vendors who have data assets to provide more fully baked models so that you don't have to start from scratch? >> Absolutely. So, one of the things that I talked about also this morning is this notion, and I said it this morning, of open: open community, open source, and open ecosystem. I think it's now open to the third power, right, and it's talking about open models and algorithms.
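The "order of magnitude" arithmetic from earlier in this exchange is easy to sanity-check. The 40-hour week below is an assumption for illustration; only the 10%/90% split comes from the conversation:

```python
# Working out the efficiency claim: if a data scientist spends 90% of the
# week finding data and only 10% modeling, flipping that ratio multiplies
# actual modeling time ninefold, i.e. roughly an order of magnitude.

def modeling_hours(week_hours: float, share_modeling: float) -> float:
    """Hours per week spent on the actual data science problem."""
    return week_hours * share_modeling

before = modeling_hours(40, 0.10)   # 4 hours of real modeling per week
after = modeling_hours(40, 0.90)    # 36 hours once data-finding is fixed
print(after / before)               # a 9x gain from the same headcount
```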
And I think all of those same things are really creating a tremendous opportunity, the likes of which we've not seen before, and I think it's really driving the velocity in the market, right, so there's no, because we're collaborating in the open, things just get done faster and more efficiently, whether it be in the core open source stuff or whether it be in the open ecosystem, being able to pull tools in. Of course, the announcement earlier today, with IBM's Data Science Experience software as a framework for the data scientists to work as a team, but that thing in and of itself is also very open. You can plug in Python, you can plug in open source models and libraries, some of which were developed in the cloud and published externally. So, it's all about continued availability of open collaboration that is the hallmark of this wave of technology. >> Okay, so we have this issue of how much can we improve the productivity with better tools or with some amount of data. But then, the part that everyone's also pointing out, besides the cloud experience, is also the ability to operationalize the models and get them into production either in bespoke apps or packaged apps. How's that going to sort of play out over time? >> Well, I think two things you'll see. One, certainly in the near term, again, with our collaboration with IBM and the Data Science Experience. One of the key things there is not only, not just making the data scientists be able to be more collaborative, but also the ease with which they can publish their models out into the wild. And so, kind of closing that loop to action is really important. I think, longer term, what you're going to see, and I gave a hint of this a little bit in my keynote this morning, is, I believe in five years, we'll be talking about scalability, but scalability won't be the way we think of it today, right? Oh, I have this many petabytes under management, or, petabytes. That's upkeep.
But truly, scalability is going to be how many connected devices do you have interacting, and how many analytics can you actually push from a model perspective, actually out to the sensor or out to the device to run locally. Why is that important? Think about it as a consumer with a mobile device. The time of interaction, your attention span, do you get an offer in the right time, and is that offer relevant. It can't be rules based, it has to be models based. There's no time for the electrons to move from your device across a power grid, run an analytic and have it come back. It's going to happen locally. So scalability, I believe, is going to be determined in terms of the CPU cycles and the total interconnected IoT network that you're working in. What does that mean from your original question? That means applications have to be portable, models have to be portable so that they can execute out to the edge where it's required. And so that's, obviously, part of the key technology that we're working with in Hortonworks DataFlow and the combination of Apache NiFi and Apache Kafka and Storm to really combine that: "How do I manage, not only data in motion, but ultimately, how do I move applications and analytics to the data and not be required to move the data to the analytics?" >> So, question for you. You talked about real time offers, for example. We talk a lot about predictive analytics, advanced analytics, data wrangling. What are your thoughts on preemptive analytics? >> Well, I think that, while that sounds a little bit spooky, because we're kind of mind reading, I think those things can start to exist. Certainly because we now have access to all of the data and we have very sophisticated data science models that allow us to understand and predict behavior, yeah, the timing of real time analytics or real time offer delivery could actually, from a human being's perception, arrive before I thought about it. And isn't that really cool in a way.
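The "move the model to the device" pattern Scott describes a moment earlier can be sketched in a few lines. The `EdgeDevice` class and the linear model below are invented stand-ins for illustration, not Hortonworks or Apache APIs; whatever the data science team actually trained plays the role of the weights:

```python
# Sketch of edge deployment: instead of shipping every sensor reading to a
# data center to be scored, ship the trained model's parameters to the
# device once, then score locally with no network round trip.

class EdgeDevice:
    def __init__(self):
        self.weights = None
        self.bias = None

    def deploy(self, weights, bias):
        """One-time push of a centrally trained model out to the device."""
        self.weights = list(weights)
        self.bias = bias

    def score(self, features):
        """Local inference: a dot product plus bias, computed on-device."""
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


device = EdgeDevice()
device.deploy(weights=[0.5, -1.0], bias=2.0)   # model trained in the cloud

# Offers can now fire at interaction time, from the device itself:
# 0.5*4.0 + (-1.0)*1.0 + 2.0 = 3.0, with no trip back to the cluster.
assert device.score([4.0, 1.0]) == 3.0
```

The design choice this illustrates is that model parameters are small and cheap to distribute, while raw event data is large and latency-sensitive, so portability of the model, not movement of the data, is what scales.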
I'm thinking about, I need to go do X, Y, Z. Here's a relevant offer, boom. So it's no longer, I clicked here, I clicked here, I clicked here, and in five seconds I get a relevant offer, but before I even thought to click, I got a relevant offer. And again, to the extent that it's relevant, it's not spooky. >> Right. >> If it's irrelevant, then you deal with all of the other downstream impact. So that, again, points to more and more and more data and more and more and more accurate and sophisticated models to make sure that that relevance exists. >> Exactly. Well, Scott Gnau, CTO of Hortonworks, thank you so much for stopping by The Cube once again. We appreciate your conversation and insights. And for George Gilbert, I am Lisa Martin. You're watching The Cube live, from day one of the DataWorks Summit in the heart of Silicon Valley. Stick around, though, we'll be right back.