
Search Results for Kerberos:

Marc Linster, EDB | Postgres Vision 2021


 

(upbeat music)

>> Narrator: From around the globe, it's theCUBE, with digital coverage of Postgres Vision 2021, brought to you by EDB.

>> Well, good day, everybody. John Walls here on theCUBE, continuing our CUBE conversation as part of Postgres Vision 2021, sponsored by EDB, with EDB Chief Technology Officer, Mr. Marc Linster. Marc, good morning to you. How are you doing today?

>> I'm doing very fine, very good, sir.

>> Excellent. Glad you could join us, and we appreciate the time and the chance to look at what's going on in this world of data, which, as you know, continues to evolve quite rapidly. So let's just take that 30,000-foot perspective to begin with, and let's talk about data and management, and what Postgres is doing in terms of accelerating all these innovative techniques, and solutions, and services that we're seeing these days.

>> Yeah, so I think it's really a fantastic confluence of factors that we're seeing in Postgres today, where Postgres has really matured over the last couple of years. Things like high availability, parallel processing, use of very high core counts, et cetera, have come together with the drive towards digital transformation and the enormous amounts of data that businesses are dealing with today. And then the third factor is really the embracing of open source, right? I mean, Linux has shown the way, and has shown that this is really possible. And now we're seeing Postgres as, I think, the next big open source innovation after Linux, achieving the same type of transformation. So it's really a maturing, it's an acceptance, and the big drive towards dealing with a lot more data as part of digital transformation.

>> You know, part of that acceptance that you talk about is kind of accepting the fact that you have a legacy system that, if you're not going to completely overhaul it, you still have to integrate, right? You've got to complement it and start this kind of migration. So from your perspective, what kind of progress is Postgres allowing in the mindset of CTOs among your client base that their legacy systems can function in this new environment, that all is not lost, and that while there is some catching up to do, or some patching to do here and there, it's not as arduous or as complex as it might appear to be on the face of it?

>> Well, I think it's the maturing of Postgres that has really opened this up, where we're seeing that Postgres can handle these workloads. And at the same time, there's a growing number of success cases where companies across all industries, financial services, insurance, manufacturing, retail, are using Postgres. So you're no longer the first mover who's taking on a higher risk. Like, five or 10 years ago, Postgres knowledge was not readily available. So if you wanted Postgres, it was really hard to find somebody who could support you, or to find an employee you could hire who would be the Postgres expert. That's no longer the case. There are plenty of books about Postgres. There are lots of conferences about Postgres. It's a big meetup topic. So getting know-how and getting acceptance amongst your team to use Postgres has become a lot easier. At the same time, over 90% of all enterprises today use open source in one way or the other, which basically means they have open source policies.
They have ways to bring open source into the development stream. So that makes it possible. Whereas before it was really hard, and you had to have an individual who would be the evangelist to go get open source, et cetera, now open source is something that almost everybody is using. From government to financial services, open source is used all over the place. So now you have something that has really matured, there are a lot of references out there, and you have the policies that make it possible. You have the success stories, and now all the pieces have come together to deal with this onslaught of data. And then maybe the last thing that really plays a big role is the cloud. Postgres runs everywhere. I mean, it runs from an Arduino to Amazon. Everywhere. Which basically means that if you want to drive agile business transformation, you call Postgres, because you don't have to decide today where it's going to run. You're not locking into a vendor. You're not locking into a limited support system. You can run this thing anywhere. It'll run on your laptop. It'll run on every cloud in the world. You can have it managed, you can have it hosted. You can have every flavor you want, and there are lots of good Postgres support companies out there. So all of these factors together are really what make it so interesting.

>> Kubernetes, and this marriage, this complementary relationship right now with Kubernetes, what has that done, do you think, in terms of providing additional services, or at least providing perhaps a new approach, or new philosophies, new concepts in terms of database management?

>> Well, it's maybe the most surprising thing, or surprising from the outside, probably not from the inside. You would think that Postgres, this now 25-year-old database, a 25-year-old open source project, would be completely incompatible with Kubernetes, with containers. But what really happens is Postgres in containers today is the number one database. After NGINX, it is the number two software that is being deployed in containers. So it's really become the workhorse of the whole microservices transformation. A 25-year-old software, well, it has a very small footprint. It has a lot of interesting features like GIS, document processing, now graph capabilities, common table expressions, all those things that are really cool for developers. And that's probably what leads it to be the number one database in containers. So it's absolutely compatible with Kubernetes. And for the whole transformation towards microservices, there's nothing better out there. It runs everywhere and has the most innovative technologies in it. And that's what we're seeing. Also, if you go to the annual Stack Overflow survey of developers, it's been consistently the number one or number two most loved and most used database. So what's amazing is that it's this relatively old technology that is beating everybody else in this digital transformation and in the adoption by developers.

>> It's like old dog, new tricks, right? It's still winning.

>> Yeah, yeah. You know, the elephant is the symbol, and this elephant does dance.

>> Still dancing, that's right. You know, this is kind of a loaded question, but there are a lot of databases out there, a lot of options. Obviously, from your perspective, Postgres is winning, right?
And from the size of the marketplace, it is certainly a leader. In your opinion, what is the confluence of factors that has influenced this market position, if you will, or market acceptance of Postgres?

>> It's a maturing of the core. As I said before, the transaction rates, et cetera, that Postgres can handle are growing every year, and growing dramatically. So that's one thing. And then you have it that Postgres is really, I think, the most reliable relational database out there. That is my opinion; I'm biased, I guess. And it's super quality code. But then you add to that the innovation drive. I mean, it was the first one out there with good JSONB support, right? And now it's brought in JSON Path as part of the new SQL standard. So now you can address JSON data inside your database the same way you do it inside your browser. And that's pretty cool for developers. Then you combine that with PostGIS, which is, I think, the most advanced GIS system out there in a database. Now you've got relational, ACID-compliant, GIS, and document. You may say, what's so cool about that? Well, what's cool about it is I can do absolutely reliable, ACID-compliant transactions. I can have a fantastic personalization engine through JSONB. And then all my applications need to know: where is the transaction? Where is the next store? How far away am I from the parking spot? So now I've got a really nice recipe to put the applications of the future together. You add onto that movements toward supporting graph and supporting other capabilities inside the database, so now you've got capability, you've got reliability, and you've got fantastic innovation. I mean, there's nothing better out there.

>> Let's hit the security angle here, because you talked about the ACID test, and certainly those criteria are being met, no question about that, whether it's isolation, durability, consistency, whatever. But security, I don't have to tell you what a growing concern this is. It's already paramount, and we're seeing stories written every day about intrusions and invasions, if you will. So in terms of providing that layer of security that everybody's looking for right now, this ultra-impenetrable force, if you will, what, in your mind, is Postgres allowing for in that respect, in terms of security, peace of mind, and maybe a little additional comfort that everybody in your space is looking for these days?

>> So, look at security with a database as multiple layers, right? You don't do security in only one place. It's like when you go into a bank branch: they do lock the door, they have a camera, there is a gate in front of the safe, there's a safe door, and inside the safe there are still, again, safety deposit boxes with individual locks. The same applies to Postgres, where, let's say, we start at the heart of it, where we can secure and protect tables and data using access control lists and groups and usernames, et cetera. So that's at the heart of it. But then outside of that, we can encrypt the data when it's on disk or when it's in transit. On disk, most people use the Linux disk encryption systems, but there are also good partners out there, like Vormetric or others that we work with, that provide encryption on disk.
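To ground the developer features Linster describes above, here is a minimal PostgreSQL sketch. The orders and stores tables, their columns, and the coordinates are hypothetical, and the last query assumes the PostGIS extension is installed.

    -- JSONB: containment filter plus field extraction
    SELECT order_id, details->>'customer_name' AS customer
    FROM   orders
    WHERE  details @> '{"status": "shipped"}';

    -- SQL/JSON path (PostgreSQL 12+): address JSON data much as you would in a browser
    SELECT jsonb_path_query(details, '$.items[*] ? (@.price > 100)')
    FROM   orders;

    -- PostGIS: how far away is the nearest store? (requires CREATE EXTENSION postgis)
    SELECT name,
           ST_Distance(geom::geography,
                       ST_SetSRID(ST_MakePoint(-71.06, 42.36), 4326)::geography) AS meters
    FROM   stores
    ORDER  BY meters
    LIMIT  1;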
And then you go out from there, and you have the securing of the database itself, again through the logins and the groups. You go out from there, and now you have the securing of the hosts that the database is sitting on. Then you look at securing the data on the network, through SSL and certificates, et cetera. So there's basically a multi-layer security model that positions Postgres extremely well. And then maybe the last thing to say is that it certainly integrates very well with LDAP, Active Directory, Kerberos, all the usual suspects that you would use to secure technology inside the enterprise or on an open network, like where people work from home, et cetera.

>> You talked about the history of this 25-year-old technology, founded back at Cal Berkeley probably some 30 years ago, and it certainly has evolved and, as you have pointed out, is now a very mature technology. What do you see, though, in terms of growth from here? Where does it go from here in the next 18 months, 24 months? What do you think is the next barrier, the next challenge, that the technology and this open source community want to take on?

>> Well, I think there's the continuous effort of making it faster, right? That always happens. Every database wants to be faster, do more transactions per second, et cetera, and there's a lot of work that has been done there. I mean, just in the last couple of years, Postgres performance has increased by over 50%. So transactions per second and that kind of scalability are going to continue to be a focus. And then the other one is leading the implementation of the SQL standards, being the most advanced database, the most innovative database. Because, remember, for many years now, Postgres has come out with a new release on an annual basis. Other database vendors are now catching up to that, but Postgres has done that for years. So innovation has always been at the heart of it. We started with JSONB, key-value pairs came even before that, PostGIS has been around for a long time, graph extensions are going to be the next thing, ingestion of time series data is going to happen. So there's going to be an ongoing stream of innovations happening. But one thing that I can say is, because Postgres is a pure open source project, there's not a hard roadmap for where it's going to go. Where it's going to go is always driven by what people want to have. There is no product management department. There's no great visionary that says, "Oh, this is where we're going to go." No. What's going to happen is what people want to have. If companies or contributors want to have a certain feature because they need it, well, that's how it's going to happen. And that's really been at the heart of this since Mike Stonebraker, who's an advisor to EDB today, invented it, and then the open source project got created. This has always been the movement: only focus on things that people actually want to have, because if nobody wants it, we're just not going to build it. So when you ask me for the roadmap, I believe it's going to be faster, obviously, always faster, everybody wants faster. And then there's going to be innovation features, like making the document store even better, graph, ingestion of large time series, et cetera.
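Before the wrap-up, here is a short sketch of the layered model Linster describes. The inner layer is plain SQL access control; the outer layers live in PostgreSQL's pg_hba.conf, shown here as comments. Schema, table, and role names, the LDAP host, and the Kerberos realm are all placeholders.

    -- Inner layer: protect tables with roles and access control lists
    CREATE ROLE analysts NOLOGIN;
    GRANT USAGE ON SCHEMA sales TO analysts;
    GRANT SELECT ON sales.transactions TO analysts;
    REVOKE ALL ON sales.transactions FROM PUBLIC;

    -- Outer layers: sample pg_hba.conf entries (not SQL) that require TLS and delegate
    -- authentication to LDAP/Active Directory or Kerberos (GSSAPI)
    --   hostssl  all  all  10.0.0.0/8  ldap  ldapserver=ldap.example.com ldapbasedn="dc=example,dc=com"
    --   hostssl  all  all  0.0.0.0/0   gss   include_realm=0 krb_realm=EXAMPLE.COM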
That's really what I believe is going to drive it forward.

>> Wow. Yeah, the market has spoken, and as you point out, the market will continue to speak and drive that bus. So Marc, thank you for the time today. We certainly appreciate that, and we wish EDB continued success at Postgres Vision 2021. And thanks for the time.

>> Thanks, John, it was a pleasure.

>> You bet. Marc Linster, joining us, the CTO at EDB. I'm John Walls. You've been watching theCUBE. (upbeat music)

Published Date : Jun 3 2021



End-to-End Security in Vertica


 

>> Paige: Hello everybody and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled End-to-End Security in Vertica. I'm Paige Roberts, Open Source Relations Manager at Vertica. I'll be your host for this session. Joining me is Vertica Software Engineers, Fenic Fawkes and Chris Morris. Before we begin, I encourage you to submit your questions or comments during the virtual session. You don't have to wait until the end. Just type your question or comment in the question box below the slide as it occurs to you and click submit. There will be a Q&A session at the end of the presentation and we'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer offline. Also, you can visit Vertica forums to post your questions there after the session. Our team is planning to join the forums to keep the conversation going, so it'll be just like being at a conference and talking to the engineers after the presentation. Also, a reminder that you can maximize your screen by clicking the double arrow button in the lower right corner of the slide. And before you ask, yes, this whole session is being recorded and it will be available to view on-demand this week. We'll send you a notification as soon as it's ready. I think we're ready to get started. Over to you, Fen. >> Fenic: Hi, welcome everyone. My name is Fen. My pronouns are fae/faer and Chris will be presenting the second half, and his pronouns are he/him. So to get started, let's kind of go over what the goals of this presentation are. First off, no deployment is the same. So we can't give you an exact, like, here's the right way to secure Vertica because how it is to set up a deployment is a factor. But the biggest one is, what is your threat model? So, if you don't know what a threat model is, let's take an example. We're all working from home because of the coronavirus and that introduces certain new risks. Our source code is on our laptops at home, that kind of thing. But really our threat model isn't that people will read our code and copy it, like, over our shoulders. So we've encrypted our hard disks and that kind of thing to make sure that no one can get them. So basically, what we're going to give you are building blocks and you can pick and choose the pieces that you need to secure your Vertica deployment. We hope that this gives you a good foundation for how to secure Vertica. And now, what we're going to talk about. So we're going to start off by going over encryption, just how to secure your data from attackers. And then authentication, which is kind of how to log in. Identity, which is who are you? Authorization, which is now that we know who you are, what can you do? Delegation is about how Vertica talks to other systems. And then auditing and monitoring. So, how do you protect your data in transit? Vertica makes a lot of network connections. Here are the important ones basically. There are clients talk to Vertica cluster. Vertica cluster talks to itself. And it can also talk to other Vertica clusters and it can make connections to a bunch of external services. So first off, let's talk about client-server TLS. Securing data between, this is how you secure data between Vertica and clients. It prevents an attacker from sniffing network traffic and say, picking out sensitive data. Clients have a way to configure how strict the authentication is of the server cert. 
It's called the Client SSLMode and we'll talk about this more in a bit but authentication methods can disable non-TLS connections, which is a pretty cool feature. Okay, so Vertica also makes a lot of network connections within itself. So if Vertica is running behind a strict firewall, you have really good network, both physical and software security, then it's probably not super important that you encrypt all traffic between nodes. But if you're on a public cloud, you can set up AWS' firewall to prevent connections, but if there's a vulnerability in that, then your data's all totally vulnerable. So it's a good idea to set up inter-node encryption in less secure situations. Next, import/export is a good way to move data between clusters. So for instance, say you have an on-premises cluster and you're looking to move to AWS. Import/Export is a great way to move your data from your on-prem cluster to AWS, but that means that the data is going over the open internet. And that is another case where an attacker could try to sniff network traffic and pull out credit card numbers or whatever you have stored in Vertica that's sensitive. So it's a good idea to secure data in that case. And then we also connect to a lot of external services. Kafka, Hadoop, S3 are three of them. Voltage SecureData, which we'll talk about more in a sec, is another. And because of how each service deals with authentication, how to configure your authentication to them differs. So, see our docs. And then I'd like to talk a little bit about where we're going next. Our main goal at this point is making Vertica easier to use. Our first objective was security, was to make sure everything could be secure, so we built relatively low-level building blocks. Now that we've done that, we can identify common use cases and automate them. And that's where our attention is going. Okay, so we've talked about how to secure your data over the network, but what about when it's on disk? There are several different encryption approaches, each depends on kind of what your use case is. RAID controllers and disk encryption are mostly for on-prem clusters and they protect against media theft. They're invisible to Vertica. S3 and GCP are kind of the equivalent in the cloud. They also invisible to Vertica. And then there's field-level encryption, which we accomplish using Voltage SecureData, which is format-preserving encryption. So how does Voltage work? Well, it, the, yeah. It encrypts values to things that look like the same format. So for instance, you can see date of birth encrypted to something that looks like a date of birth but it is not in fact the same thing. You could do cool stuff like with a credit card number, you can encrypt only the first 12 digits, allowing the user to, you know, validate the last four. The benefits of format-preserving encryption are that it doesn't increase database size, you don't need to alter your schema or anything. And because of referential integrity, it means that you can do analytics without unencrypting the data. So again, a little diagram of how you could work Voltage into your use case. And you could even work with Vertica's row and column access policies, which Chris will talk about a bit later, for even more customized access control. Depending on your use case and your Voltage integration. We are enhancing our Voltage integration in several ways in 10.0 and if you're interested in Voltage, you can go see their virtual BDC talk. 
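As a rough illustration of the format-preserving pattern described above, the Vertica/Voltage integration exposes SQL functions for protecting and accessing values. The sketch below assumes function names along the lines of VoltageSecureProtect and VoltageSecureAccess, and the table, column, and 'ssn' format names are placeholders, so treat it as a shape rather than exact syntax.

    -- Encrypt a value; the result keeps the social-security-number format
    SELECT VoltageSecureProtect(ssn USING PARAMETERS format='ssn') FROM personnel;

    -- Decrypt on read, for sessions holding an appropriate Voltage identity
    SELECT VoltageSecureAccess(ssn_protected USING PARAMETERS format='ssn') FROM personnel;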
And then again, talking about roadmap a little, we're working on in-database encryption at rest. What this means is kind of a Vertica solution to encryption at rest that doesn't depend on the platform that you're running on. Encryption at rest is hard. (laughs) Encrypting, say, 10 petabytes of data is a lot of work. And once again, the theme of this talk is everyone has a different key management strategy, a different threat model, so we're working on designing a solution that fits everyone. If you're interested, we'd love to hear from you. Contact us on the Vertica forums. All right, next up we're going to talk a little bit about access control. So first off is how do I prove who I am? How do I log in? So, Vertica has several authentication methods. Which one is best depends on your deployment size/use case. Again, theme of this talk is what you should use depends on your use case. You could order authentication methods by priority and origin. So for instance, you can only allow connections from within your internal network or you can enforce TLS on connections from external networks but relax that for connections from your internal network. That kind of thing. So we have a bunch of built-in authentication methods. They're all password-based. User profiles allow you to set complexity requirements of passwords and you can even reject non-TLS connections, say, or reject certain kinds of connections. Should only be used by small deployments because you probably have an LDAP server, where you manage users if you're a larger deployment and rather than duplicating passwords and users all in LDAP, you should use LDAP Auth, where Vertica still has to keep track of users, but each user can then use LDAP authentication. So Vertica doesn't store the password at all. The client gives Vertica a username and password and Vertica then asks the LDAP server is this a correct username or password. And the benefits of this are, well, manyfold, but if, say, you delete a user from LDAP, you don't need to remember to also delete their Vertica credentials. You can just, they won't be able to log in anymore because they're not in LDAP anymore. If you like LDAP but you want something a little bit more secure, Kerberos is a good idea. So similar to LDAP, Vertica doesn't keep track of who's allowed to log in, it just keeps track of the Kerberos credentials and it even, Vertica never touches the user's password. Users log in to Kerberos and then they pass Vertica a ticket that says "I can log in." It is more complex to set up, so if you're just getting started with security, LDAP is probably a better option. But Kerberos is, again, a little bit more secure. If you're looking for something that, you know, works well for applications, certificate auth is probably what you want. Rather than hardcoding a password, or storing a password in a script that you use to run an application, you can instead use a certificate. So, if you ever need to change it, you can just replace the certificate on disk and the next time the application starts, it just picks that up and logs in. Yeah. And then, multi-factor auth is a feature request we've gotten in the past and it's not built-in to Vertica but you can do it using Kerberos. So, security is a whole application concern and fitting MFA into your workflow is all about fitting it in at the right layer. And we believe that that layer is above Vertica. If you're interested in more about how MFA works and how to set it up, we wrote a blog on how to do it. 
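A hedged sketch of what the authentication building blocks above look like in SQL. Host ranges, the LDAP server, and role names are placeholders, and the exact parameter names should be checked against the Vertica documentation for your version.

    -- LDAP authentication for connections from the internal network
    CREATE AUTHENTICATION v_ldap METHOD 'ldap' HOST '10.0.0.0/8';
    ALTER  AUTHENTICATION v_ldap SET host='ldap://ldap.example.com', basedn='dc=example,dc=com';
    GRANT  AUTHENTICATION v_ldap TO analyst_role;

    -- Kerberos (GSS) for everything else, and reject non-TLS connections outright
    CREATE AUTHENTICATION v_krb    METHOD 'gss'    HOST TLS    '0.0.0.0/0';
    CREATE AUTHENTICATION no_plain METHOD 'reject' HOST NO TLS '0.0.0.0/0';
    GRANT  AUTHENTICATION v_krb TO PUBLIC;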
And now, over to Chris, for more on identity and authorization. >> Chris: Thanks, Fen. Hi everyone, I'm Chris. So, we're a Vertica user and we've connected to Vertica, but once we're in the database, who are we? What are we? So in Vertica, the answer to that question is principals. Users and roles, which are like groups in other systems. Since roles can be enabled and disabled at will and multiple roles can be active, they're a flexible way to use only the privileges you need in the moment. For example here, you've got Alice, who has DBADMIN as a role, and those are some elevated privileges. She probably doesn't want them active all the time, so she can set the role and add them to her identity set. All of this information is stored in the catalog, which is basically Vertica's metadata storage. How do we manage these principals? Well, it depends on your use case, right? So, if you're a small organization, or maybe only some people or services need Vertica access, the solution is just to manage it with Vertica. You can see some commands here that will let you do that. But what if we're a big organization and we want Vertica to reflect what's in our centralized user management system? Sort of a similar motivating use case to LDAP authentication, right? We want to avoid duplication hassles, we just want to centralize our management. In that case, we can use Vertica's LDAPLink feature. So with LDAPLink, principals are mirrored from LDAP. They're synced in a configurable fashion from LDAP into Vertica's catalog. What this does is it manages creating and dropping users and roles for you, and then mapping the users to the roles. Once that's done, you can do any Vertica-specific configuration on the Vertica side. It's important to note that principals created in Vertica this way support multiple forms of authentication, not just LDAP. This is a separate feature from LDAP authentication, and if you created a user via LDAPLink, you could have them use a different form of authentication, Kerberos, for example. Up to you. Now, of course, this kind of system is pretty mission-critical, right? You want to make sure you get the right roles and the right users and the right mappings in Vertica. So you probably want to test it. And for that, we've got new and improved dry run functionality, from 9.3.1. What this feature offers you is new metafunctions that let you test various parameters without breaking your real LDAPLink configuration. So you can mess around with parameters and the configuration as much as you want, and you can be sure that all of that is strictly isolated from the live system. Everything's separated. And when you use this, you get some really nice output through a Data Collector table. You can see some example output here. It runs the same logic as the real LDAPLink and provides detailed information about what would happen. You can check the documentation for specifics. All right, so we've connected to the database, we know who we are, but now, what can we do? So for any given action, you want to control who can do that, right? So what's the question you have to ask? Sometimes the question is just: who are you? It's a simple yes or no question. For example, if I want to upgrade a user, the question I have to ask is, am I the superuser? If I'm the superuser, I can do it; if I'm not, I can't. But sometimes the actions are more complex, and the question you have to ask is more complex. Does the principal have the required privileges?
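Before the privileges question gets answered, here is roughly what managing principals directly in Vertica looks like for a small deployment; every name and password below is a placeholder.

    CREATE USER alice IDENTIFIED BY 'change-me';
    CREATE ROLE reporting;
    GRANT  reporting TO alice;

    -- Roles stay inert until enabled, so Alice opts in per session...
    SET ROLE reporting;
    -- ...or makes it part of her default identity set
    ALTER USER alice DEFAULT ROLE reporting;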
If you're familiar with SQL privileges, there are things like SELECT, INSERT, and Vertica has a few of their own, but the key thing here is that an action can require specific and maybe even multiple privileges on multiple objects. So for example, when selecting from a table, you need USAGE on the schema and SELECT on the table. And there's some other examples here. So where do these privileges come from? Well, if the action requires a privilege, these are the only places privileges can come from. The first source is implicit privileges, which could come from owning the object or from special roles, which we'll talk about in a sec. Explicit privileges, it's basically a SQL standard GRANT system. So you can grant privileges to users or roles and optionally, those users and roles could grant them downstream. Discretionary access control. So those are explicit and they come from the user and the active roles. So the whole identity set. And then we've got Vertica-specific inherited privileges and those come from the schema, and we'll talk about that in a sec as well. So these are the special roles in Vertica. First role, DBADMIN. This isn't the Dbadmin user, it's a role. And it has specific elevated privileges. You can check the documentation for those exact privileges but it's less than the superuser. The PSEUDOSUPERUSER can do anything the real superuser can do and you can grant this role to whomever. The DBDUSER is actually a role, can run Database Designer functions. SYSMONITOR gives you some elevated auditing permissions and we'll talk about that later as well. And finally, PUBLIC is a role that everyone has all the time so anything you want to be allowed for everyone, attach to PUBLIC. Imagine this scenario. I've got a really big schema with lots of relations. Those relations might be changing all the time. But for each principal that uses this schema, I want the privileges for all the tables and views there to be roughly the same. Even though the tables and views come and go, for example, an analyst might need full access to all of them no matter how many there are or what there are at any given time. So to manage this, my first approach I could use is remember to run grants every time a new table or view is created. And not just you but everyone using this schema. Not only is it a pain, it's hard to enforce. The second approach is to use schema-inherited privileges. So in Vertica, schema grants can include relational privileges. For example, SELECT or INSERT, which normally don't mean anything for a schema, but they do for a table. If a relation's marked as inheriting, then the schema grants to a principal, for example, salespeople, also apply to the relation. And you can see on the diagram here how the usage applies to the schema and the SELECT technically but in Sales.foo table, SELECT also applies. So now, instead of lots of GRANT statements for multiple object owners, we only have to run one ALTER SCHEMA statement and three GRANT statements and from then on, any time that you grant some privileges or revoke privileges to or on the schema, to or from a principal, all your new tables and views will get them automatically. So it's dynamically calculated. Now of course, setting it up securely, is that you want to know what's happened here and what's going on. So to monitor the privileges, there are three system tables which you want to look at. The first is grants, which will show you privileges that are active for you. 
That is, your user and active roles, and theirs, and so on down the chain. Grants will show you the explicit privileges, and inherited_privileges will show you the inherited ones. And then there's one more, inheriting_objects, which will show all tables and views that inherit privileges, so that's useful more for managing inherited privileges in general than for seeing the privileges themselves. And finally, how do you see all privileges from all these sources in one go — you want to see them together, right? Well, there's a metafunction added in 9.3.1, get_privileges_description, which, given an object, will sum up all the privileges the current user has on that object. I'll refer you to the documentation for usage and supported types. Now, the problem with SELECT: SELECT lets you see everything or nothing. You can either read the table or you can't. But what if you want some principals to see a subset or a transformed version of the data? So for example, I have a table with personnel data, and different principals, as you can see here, need different access levels to sensitive information — social security numbers. Well, one thing I could do is make a view for each principal. But I could also use access policies, and access policies can do this without introducing any new objects or dependencies. It centralizes your restriction logic and makes it easier to manage. So what do access policies do? Well, we've got row and column access policies. Row access policies will hide rows, and column access policies will transform the data in the column, depending on who's doing the SELECTing. So it transforms the data, as we saw on the previous slide, to look as requested. Now, if access policies let you see the raw data, you can still modify the data. And the implication of this is that when you're crafting access policies, you should only use them to refine access for principals that need read-only access. That is, if you want a principal to be able to modify it, the access policies you craft should let through the raw data for that principal. So in our previous example, the loader service should be able to see every row, and it should be able to see untransformed data in every column. And as long as that's true, then they can continue to load into this table. All of this is of course monitorable via a system table, in this case access_policy. Check the docs for more information on how to implement these. All right, that's it for access control. Now on to delegation and impersonation. So what's the question here? Well, the question is: who is Vertica? And that might seem like a silly question, but here's what I mean by that. When Vertica's connecting to a downstream service, for example, cloud storage, how should Vertica identify itself? Well, most of the time, we do the permissions check ourselves and then we connect as Vertica, like in this diagram here. But sometimes we can do better. And instead of connecting as Vertica, we connect with some kind of upstream user identity. And when we do that, we let the service decide who can do what, so Vertica isn't the only line of defense. And in addition to the defense-in-depth benefit, there are also benefits for auditing, because the external system can see who is really doing something. It's no longer just Vertica showing up in that external service's logs, it's somebody like Alice or Bob trying to do something. One system where this comes into play is with Voltage SecureData. So, let's look at a couple use cases.
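Before the delegation use cases, here are two short sketches of the access-control features just described: the one-ALTER-SCHEMA-plus-a-few-GRANTs pattern for inherited privileges, and a column access policy that masks social security numbers for everyone except HR and the loader. Schema, table, column, and role names are placeholders, and the exact keyword forms should be verified against the documentation.

    -- Schema-inherited privileges: new tables and views in sales pick these up automatically
    ALTER SCHEMA sales DEFAULT INCLUDE PRIVILEGES;
    GRANT USAGE  ON SCHEMA sales TO salespeople;
    GRANT SELECT ON SCHEMA sales TO salespeople;
    GRANT INSERT ON SCHEMA sales TO loader_svc;

    -- Column access policy: raw SSNs for HR and the loader, a masked value for everyone else
    CREATE ACCESS POLICY ON personnel FOR COLUMN ssn
      CASE WHEN ENABLED_ROLE('hr')         THEN ssn
           WHEN ENABLED_ROLE('loader_svc') THEN ssn   -- loaders need raw data to keep writing
           ELSE '***-**-' || RIGHT(ssn, 4)
      END
    ENABLE;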
The first one: I'm just encrypting for compliance or anti-theft reasons. In this case, I'll just use one global identity to encrypt or decrypt with Voltage. But imagine another use case: I want to control which users can decrypt which data. Now I'm using Voltage for access control. So in this case, we want to delegate. The solution here is, on the Voltage side, give Voltage users access to appropriate identities, and these identities control encryption for sets of data. A Voltage user can access multiple identities, like groups. Then on the Vertica side, a Vertica user can set their Voltage username and password in a session, and Vertica will talk to Voltage as that Voltage user. So in the diagram here, you can see an example of how this is leveraged so that Alice can decrypt something but Bob cannot. Another place the delegation paradigm shows up is with storage. So Vertica can store and interact with data on non-local file systems, for example, HDFS or S3. Sometimes Vertica's storing Vertica-managed data there. For example, in Eon mode, you might store your projections in communal storage in S3. But sometimes, Vertica is interacting with external data. For example, this usually maps to a user storage location on the Vertica side, and it might, on the external storage side, be something like Parquet files on Hadoop. And in that case, it's not really Vertica's data, and we don't want to give Vertica more power than it needs, so let's request the data on behalf of who needs it. Let's say I'm an analyst and I want to copy from or export to Parquet, using my own bucket. It's not Vertica's bucket, it's my data. But I want Vertica to manipulate data in it. So the first option I have is to give Vertica as a whole access to the bucket, and that's problematic, because in that case, Vertica becomes kind of an AWS god. It can see any bucket that any Vertica user might want to push or pull data to or from, any time Vertica wants. So it's not good for the principles of least access and zero trust. And we can do better than that. So in the second option, use an ID and secret key pair for an AWS IAM principal, if you're familiar, that does have access to the bucket. So I might use my own credentials, the analyst's, or I might use credentials for an AWS role that has even fewer privileges than I do — sort of a restricted subset of my privileges. And then I use that. I set it in Vertica at the session level, and Vertica will use those credentials for the copy and export commands. And it gives more isolation. Something that's in the works is support for keyless delegation, using assumable IAM roles. So similar benefits to option two here, but also not having to manage keys at the user level. We can do basically the same thing with Hadoop and HDFS, with three different methods. So the first option is Kerberos delegation. I think it's the most secure. If access control is your primary concern here, this will definitely give you the tightest access control. The downside is it requires the most configuration outside of Vertica, with Kerberos and HDFS, but with this, you can really determine which Vertica users can talk to which HDFS locations. Then, you've got secure impersonation. If you've got a highly trusted Vertica userbase, or at least some subset of it is, and you're not worried about them doing things wrong, but you want to know about auditing on the HDFS side — that's your primary concern — you can use this option. This diagram here gives you a visual overview of how that works. But I'll refer you to the docs for details.
And then finally, option three, this is bringing your own delegation token. It's similar to what we do with AWS. We set something in the session level, so it's very flexible. The user can do it at an ad hoc basis, but it is manual, so that's the third option. Now on to auditing and monitoring. So of course, we want to know, what's happening in our database? It's important in general and important for incident response, of course. So your first stop, to answer this question, should be system tables. And they're a collection of information about events, system state, performance, et cetera. They're SELECT-only tables, but they work in queries as usual. The data is just loaded differently. So there are two types generally. There's the metadata table, which stores persistent information or rather reflects persistent information stored in the catalog, for example, users or schemata. Then there are monitoring tables, which reflect more transient information, like events, system resources. Here you can see an example of output from the resource pool's storage table which, these are actually, despite that it looks like system statistics, they're actually configurable parameters for using that. If you're interested in resource pools, a way to handle users' resource allocation and various principal's resource allocation, again, check that out on the docs. Then of course, there's the followup question, who can see all of this? Well, some system information is sensitive and we should only show it to those who need it. Principal of least privilege, right? So of course the superuser can see everything, but what about non-superusers? How do we give access to people that might need additional information about the system without giving them too much power? One option's SYSMONITOR, as I mentioned before, it's a special role. And this role can always read system tables but not change things like a superuser would be able to. Just reading. And another option is the RESTRICT and RELEASE metafunctions. Those grant and revoke access to from a certain system table set, to and from the PUBLIC role. But the downside of those approaches is that they're inflexible. So they only give you, they're all or nothing. For a specific preset of tables. And you can't really configure it per table. So if you're willing to do a little more setup, then I'd recommend using your own grants and roles. System tables support GRANT and REVOKE statements just like any regular relations. And in that case, I wouldn't even bother with SYSMONITOR or the metafunctions. So to do this, just grant whatever privileges you see fit to roles that you create. Then go ahead and grant those roles to the users that you want. And revoke access to the system tables of your choice from PUBLIC. If you need even finer-grained access than this, you can create views on top of system tables. For example, you can create a view on top of the user system table which only shows the current user's information, uses a built-in function that you can use as part of the view definition. And then, you can actually grant this to PUBLIC, so that each user in Vertica could see their own user's information and never give access to the user system table as a whole, just that view. Now if you're a superuser or if you have direct access to nodes in the cluster, filesystem/OS, et cetera, then you have more ways to see events. Vertica supports various methods of logging. 
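Two more sketches tied to the points above: session-scoped AWS credentials, so Vertica talks to S3 as a restricted principal rather than as an "AWS god", and the roll-your-own grants-and-views approach to system tables. Keys, buckets, schema and role names, and the exact session parameter names are placeholders to check against your version's documentation.

    -- Delegation to S3: credentials live only in this session
    ALTER SESSION SET AWSAuth   = 'AKIAEXAMPLE:wJalrEXAMPLESECRET';
    ALTER SESSION SET AWSRegion = 'us-east-1';
    COPY analyst.events FROM 's3://analyst-bucket/exports/*.parquet' PARQUET;

    -- System-table access through your own role instead of SYSMONITOR
    CREATE ROLE monitor_ro;
    GRANT SELECT ON v_monitor.sessions TO monitor_ro;
    GRANT monitor_ro TO alice;

    -- Finer grain: a view that shows each user only their own catalog row
    CREATE VIEW my_user AS
      SELECT * FROM v_catalog.users WHERE user_name = CURRENT_USER;
    GRANT SELECT ON my_user TO PUBLIC;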
You can see a few methods here which are generally outside of running Vertica, you'd interact with them in a different way, with the exception of active events which is a system table. We've also got the data collector. And that sorts events by subjects. So what the data collector does, it extends the logging and system table functionality, by the component, is what it's called in the documentation. And it logs these events and information to rotating files. For example, AnalyzeStatistics is a function that could be of use by users and as a database administrator, you might want to monitor that so you can use the data collector for AnalyzeStatistics. And the files that these create can be exported into a monitoring database. One example of that is with the Management Console Extended Monitoring. So check out their virtual BDC talk. The one on the management console. And that's it for the key points of security in Vertica. Well, many of these slides could spawn a talk on their own, so we encourage you to check out our blog, check out the documentation and the forum for further investigation and collaboration. Hopefully the information we provided today will inform your choices in securing your deployment of Vertica. Thanks for your time today. That concludes our presentation. Now, we're ready for Q&A.

Published Date : Mar 30 2020



Will Nowak, Dataiku | AWS re:Invent 2019


 

>>long from Las Vegas. It's the Q covering a ws re invent 2019. Brought to you by Amazon Web service is and in along with its ecosystem partners. >>Hey, welcome back to the Cube. Lisa Martin at AWS Reinvent 19. This is Day three of the Cubes coverage. We have two sets here. Lots of cute content are joined by Justin Warren, the founder and chief analyst at Pivot nine. Justin. How's it going? Great, right? You still have a voice? Three days? >>Just barely. I've been I've been trying to take care of it. >>Impressed. And you probably have talked to at least half of the 65,000 attendees. >>I'm trying to talk to as many as I can. >>Well, we're gonna talk to another guy here. Joining us from data ICU is well, Novak, the solutions architect will be the Cube. >>Thanks for having me. >>You have a good voice too. After a three day is that you >>have been doing the best I can. >>Yeah, he's good. So did ICU. Interesting name. Let's start off by sharing with our audience. Who did a coup is and what you guys do in technology. >>Yes. So the Entomology of date ICU. It's like hi cooze for data. So we say we take your data and, you know, we make poetry out of it. Make your data so beautiful. Wow, Now, But for those who are unaware Day like it was an enterprise data science platform. Eso we provide a collaborative environment for we say coders and clickers kind of business analyst and native data scientists to make use of organizations, data bill reports and Bill productive machine learning base models and deploy them. >>I'm only the guy's been around around for eight years. Eight years. Okay, >>so start up. Still >>mourning the cloud, the opportunity there That data is no longer a liability. It's an asset or should be. >>So we've been server based from the start, which is one of our differentiators. And so by that we see ourselves as a collaborative platform. Users access it through a Web browser, log into a shared space and share code, can share visual recipes, as we call them to prepare data. >>Okay, so what customers using the platform to do with machine learning is pretty hot at the moment. I think it might be nearing the peak of the life cycle pretty hot. Yeah, what a customer is actually actually doing on the platform, >>you know, So we really focus on enabling the enterprise. So, for example, G has been a customer for some time now, and Sergey is a great prototypical example on that. They have many disparate use cases, like simple things like doing customer segmentation for, you know, marketing campaigns but also stuff like Coyote predicted maintenance. So use cases kind of run the gamut, and so did ICU. Based on open source, we're enabling all of G's users to come into a centralized platform, access their data manipulated for whatever purposes. Maybe >>nobody talked about marketing campaigns for a second. I'm wondering. Are, is their integration with serum technologies? Or how would a customer like wanting to understand customer segmentation or had a segment it for marketing campaign? How would they work in conjunction with a serum and data ICU, for example? >>It's a great question. So again, us being a platform way sit on a single server, something like an Amazon ec2 instance, and then we make connections into an organization's data sources. So if using something like Salesforce weaken seamlessly, pull in data from Salesforce Yuka manipulated in date ICU, but the same time. Maybe also have some excel file someone you know me. I can bring that into my data to work environment. 
And I also have a red shift data table. All those things would come into the same environment. I can visualize. I can analyze, and I can prepare the data. I see. >>So you tell you it's based on open source? I'm a longtime fan of over. It's always been involved in it for longer than I care to remember. Actually, that's an interesting way t base your product on that. So maybe talk us through how you how you came to found the company based on basic an open source. What? What led to that choice? What? What was that decision based on? >>Yeah, for sure. So you talked about how you know the hype cycle? A. I saw how hot is a I and so I think again, our founders astutely recognize that this is a very fast moving place to be. And so I'm kind of betting on one particular technology can be risky. So instead, by being a platform, we say, like sequel has been the data transformation language do jour for many days now. So, of course, that you can easily write Sequel and a lot of our visual data Transformations are based on the sequel language, but also something like Python again. It's like the language de jour for machine law machine learning model building right now, so you can easily code in python. Maintain your python libraries in date, ICU And so by leveraging open source, we figured we're making our clients more future proof as long as they're staying in date ICU. But using data ICU to leverage the best in breed and open source, they'll always be kind of where they want to be in the technological landscape by supposed to locked into some tech that is now out of date. >>What's been the appetite for making data beautiful for a legacy enterprise, like a G E that's been around for a very long time versus a more modern either. Born in the Cloud er's our CEO says, reborn in the cloud. What are some of the differences but also similarities that you see in terms of we have to be able to use emerging tech. Otherwise someone's gonna come in behind us and replace us. >>Yeah, I mean, I think it's complicated in that there's still a lot of value to be had in someone says, like a bar chart you can rely on right, So it's maybe not sexy. But having good reporting and analytics is something that both you know, 200 year old enterprise organizations and data native organizations startups needs. At the same time, building predicted machine learning models and deploying those is rest a p i n points that developers can use in your organization to provide a data driven product for your consumers. Like that's amore advanced use case that everyone kind of wants to be a part of again data. Who's a nice tool, which says Maybe you don't have developers who are very fluent in turning out flashed applications. We could give you a place to build a predictive model and deploy that predictive model, saving you time to write all that code on the back end. >>One of the themes of the show has been transformation, so it sounds like data ICU would be It's something that you can dip your toes in and start to get used to using. Even if you're not particularly familiar with Time machine learning model a model building. >>Yeah, that's exactly right. So a big part of our product and encourage watchers to go try it out themselves and go to our website. Download a free version pretrial, but is enablement. So if you're the most sophisticated applied math PhD there is, like, Who's a great environment for you to Code and Bill predictive models. 
If you never built the machine learning model before you can use data ICU to run visual machine learning recipes, we call them, and also we give you documentation, which is, Hey, this is a random forest model. What is a random forest model? We'll tell you a little bit about it. And that's another thing that some of these enterprises have really appreciated about date I could. It is helping up skill there user base >>in terms of that transformation theme that Justin just mention which we're hearing a lot about, not visit this show. It's a big thing, but we hear it all the time, right? But in terms of customers transformation, journey, whatever you wanna call it, cloud is gonna be an essential enabler of being able to really love it value from a I. So I'm just wondering from a strategic positioning standpoint. Is did ICU positioned as a facilitator or as fuel for a cloud transformation that on enterprise would undergo >>again? Yes, great point. So for us, I can't take the credit. This credit goes to our founders, but we've thought from the start the clouds and exciting proposition Not everyone is. They're still in 2019. Most people, if not all of them, want to get there. Also, people want too many of our clients want the multi cloud on a day. Like who says, If you want to be on prim, if you want to be in a single cloud subscription. If you want to be multi cloud again as a platform, we're just gonna give you connection to your underlying infrastructure. You could use the infrastructure that you like and just use our front end to help your analyst get value. They can. I >>think I think a lot of vendors across the entire ecosystem around to say the customer choice is really important, and the customers, particularly enterprise customers, want to be able to have lots of different options, and not all of them will be ready to go completely. All in on cloud today. They made it may take them years, possibly decades, to get there. So having that choice is like it's something that it would work with you today and we'll work with you tomorrow, depending on what choices you make. >>It's exactly right. Another thing we've seen a lot of to that day, like who helps with and whether it's like you or other tools. Like, of course, you want best in breed, but you also want particularly for a large enterprise. You don't want people operating kind of in a wild West, particularly in like the ML data science space. So you know we integrate with Jupiter notebooks, but some of our clients come to us initially. Just have I won't say rogues that has a negative connotation. But maybe I will say Road road data Scientists are just tapping into some day the store. They're using Jupiter notebooks to build a predictive model, but then to actually production allies that to get sustainable value out of it like it's to one off and so having a centralized platform like date ICU, where you can say this is where we're going to use our central model depository, that something where businesses like they can sleep easier at night because they know where is my ML development happening? It's happening in one ecosystem. What tools that happening with, well, best in breed of open source. So again, you kind of get best of both worlds like they like you. >>It sounds like it's more about the operations of machine learning. It is really, really important rather than just. It's the pure technology. 
Yes, that's important as well, and you need to have the data scientists to build it, but you also need something that allows you to operationalize it, so that you can just bake it into what we do every day as a business. >> Yeah, I think in a conference like this, all about tech, it's easy to forget what we firmly believe, which is that AI, and maybe tech more broadly, is still about human problems at the core, right? Once you get the tech right, the code runs correctly, the code is written correctly. But things like human interactions, project management, model deployment in an organization, these are really hard, human-centered problems, and so having tech that enables that human-centric collaboration helps with that, we find. >> Let's talk about some of the things that we can't ever go to an event and not talk about, and that is data quality, reliability and security. How does Dataiku facilitate those three cornerstones? >> Yeah, sure. So again, viewers, I would encourage you to check it out: Dataiku has some nice visual indications of data quality. So an analyst or data scientist can come in and very easily understand, you know, does this data conform to the standards that my organization has set, and what I mean by standards, that can be configured. Right? So does this column have the appropriate schema? Does it have the appropriate cardinality? These are things that an individual might decide to use. And then for security, Dataiku has its own security mechanisms. However, we also, to this point about incorporating the best tech, will work with whatever underlying security mechanisms organizations have in place. So, for instance, if you're using AWS, you have IAM roles to manage your security. Dataiku can import those and apply those to the Dataiku environment. Or if you're using something like on-premise Hadoop, you can use something like Kerberos, which, again, has the technology to manage access to resources. So we're taking the best of breed that the organization has already invested time, energy and resources into, and saying we're not trying to compete with them, but rather we're trying to enable organizations to use these technologies efficiently. >> Yeah, I like that consistency of customer choice. We spoke about that just before, and I'm seeing that here with their choices around, well, if you're on this particular platform, we'll integrate with whatever the tools are there. People underestimate how important that is for enterprises; it has to be a heterogeneous environment, and playing well with others is actually quite important. >> Yeah, and to that point, it's the combination of heterogeneity but also uniformity. It's a hard balance to strike, and I think it's really important, giving someone a unified environment but still choice at the same time. Like a good restaurant: you want to be able to pick your dish, but you want to know that the entire quality is high. And so having that consistent ecosystem, I think, really helps. >> What are, in your opinion, some of the next industries that you see as really ripe to start leveraging machine learning to transform? You mentioned GE, a very old legacy business. If we think of what happened with the ride-hailing industry, Uber for example, or fitness with Peloton, or Pinterest with visual search, what do you think is the next industry, where it's like, you guys taking advantage of machine learning will completely transform this and our lives? >> I mean, the easy answer that I'll give, because it's easy to say it's going to transform
but hard to operationalize, is health care, right? There is structured data, but the data quality is so disparate and heterogeneous. And I think, you know, a lot of this, again, is a human-centered problem. If people could decide on data standards; and data privacy is, of course, a huge issue too. We talked about data security internally, but also as a customer, what data do I want, you know, this hospital, this health care provider, to have access to? Those are human issues we have to resolve. But conditional on that being resolved, and on figuring out a way to anonymize data and respect data privacy while having consistent data structure, then we could say, hey, let's really set these AI and ML models loose and figure out things like personalized medicine, which we're starting to get to. But I feel like there's still a lot of room to go. >> That sounds like it's an exciting time to be in machine learning. People should definitely check out products such as Dataiku and see what happens. >> Last question for you: so much news has come out in the last three days, it's mind-boggling. What are some of the takeaways, some of the things that you've heard, from Andy Jassy through this morning? >> Yeah, I think a big thing for me, which was something for me before this week too, but it's always nice to hear, is Amazon reinforcing the concept of white box AI. We've been talking about that at Dataiku for some time. Everyone wants performant AI and ML solutions, but increasingly there's a real appetite publicly for interpretability, and so you have to be responsible, you have to have interpretable AI. So it's nice to hear a leader like Amazon echo that. That's something we've been talking about since our start. >> A little bit validating then for Dataiku, for sure, for sure. Well, thank you for joining Justin and me on theCUBE, we appreciate it. >> Appreciate it. >> All right. For my co-host, Justin Warren, I'm Lisa Martin, and you're watching theCUBE from Vegas. It's AWS re:Invent 2019.
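(Editor's aside: the column-level quality checks mentioned earlier in this conversation, such as whether a column matches an expected schema or has a sensible cardinality, can be sketched generically in a few lines of Python. This is an illustration of the idea only, not Dataiku's actual API; the column names, rules and thresholds below are made up.)

# Generic sketch of configurable data-quality checks (not a Dataiku API).
import pandas as pd

RULES = {
    "customer_id": {"dtype": "int64", "max_null_frac": 0.0, "min_distinct": 1000},
    "region":      {"dtype": "object", "max_null_frac": 0.01, "max_distinct": 50},
}

def check_quality(df: pd.DataFrame, rules: dict) -> dict:
    report = {}
    for col, rule in rules.items():
        series = df[col]
        report[col] = {
            "schema_ok": str(series.dtype) == rule["dtype"],
            "null_frac_ok": series.isna().mean() <= rule["max_null_frac"],
            "cardinality_ok": rule.get("min_distinct", 0)
                              <= series.nunique()
                              <= rule.get("max_distinct", float("inf")),
        }
    return report

# Example usage (hypothetical file):
# report = check_quality(pd.read_csv("customers.csv"), RULES)
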

Published Date : Dec 5 2019

SUMMARY :

Lisa Martin and Justin Warren talk with Will Nowak of Dataiku at AWS re:Invent 2019 about Dataiku's open-source-based collaborative data science platform: SQL and Python support, on-premises, single-cloud and multi-cloud deployment, data quality and security integration, centralized model management, and interpretable AI.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Justin Warren | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
2019 | DATE | 0.99+
Justin | PERSON | 0.99+
Andy Jassy | PERSON | 0.99+
Las Vegas | LOCATION | 0.99+
Will Nowak | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Eight years | QUANTITY | 0.99+
python | TITLE | 0.99+
200 year | QUANTITY | 0.99+
Python | TITLE | 0.99+
Vegas | LOCATION | 0.99+
AWS | ORGANIZATION | 0.99+
echo | COMMERCIAL_ITEM | 0.99+
Sergey | PERSON | 0.99+
today | DATE | 0.99+
tomorrow | DATE | 0.99+
Novak | PERSON | 0.99+
two sets | QUANTITY | 0.99+
Three days | QUANTITY | 0.99+
Virginia | LOCATION | 0.98+
Dataiku | PERSON | 0.98+
both | QUANTITY | 0.98+
Dead Rock | TITLE | 0.97+
single server | QUANTITY | 0.97+
both worlds | QUANTITY | 0.97+
three day | QUANTITY | 0.97+
Serge | PERSON | 0.96+
one | QUANTITY | 0.96+
single cloud | QUANTITY | 0.96+
Retek | ORGANIZATION | 0.95+
uber | ORGANIZATION | 0.95+
Salesforce | ORGANIZATION | 0.95+
a day | QUANTITY | 0.93+
Day three | QUANTITY | 0.93+
One | QUANTITY | 0.91+
65,000 attendees | QUANTITY | 0.91+
This'll Morning | TITLE | 0.9+
Coyote | ORGANIZATION | 0.89+
Amazon Web | ORGANIZATION | 0.89+
Kerberos | ORGANIZATION | 0.88+
decades | QUANTITY | 0.88+
one ecosystem | QUANTITY | 0.87+
ec2 | TITLE | 0.85+
last three days | DATE | 0.82+
three cornerstones | QUANTITY | 0.79+
G | ORGANIZATION | 0.79+
19 | QUANTITY | 0.78+
eight years | QUANTITY | 0.74+
Cube | ORGANIZATION | 0.74+
this week | DATE | 0.73+
Eso | ORGANIZATION | 0.72+
G E | ORGANIZATION | 0.7+
Pivot nine | ORGANIZATION | 0.69+
excel | TITLE | 0.67+
Saletan | PERSON | 0.59+
Cubes | ORGANIZATION | 0.57+
second | QUANTITY | 0.57+
Yuka | COMMERCIAL_ITEM | 0.53+
half | QUANTITY | 0.5+
Jupiter | ORGANIZATION | 0.48+
Invent 2019 | EVENT | 0.46+
Reinvent 19 | EVENT | 0.39+
invent | EVENT | 0.24+

Tom Phelan, HPE | KubeCon + CloudNativeCon NA 2019


 

Live from San Diego, California it's theCUBE! covering KubeCon and CloudNativeCon brought to you by Red Hat a CloudNative computing foundation and its ecosystem partners. >> Welcome back, this is theCube's coverage of KubeCon, CloudNativeCon 2019 in San Diego I'm Stu Miniman with my co-host for the week, John Troyer, and happy to welcome to the program, Tom Phelan, who's an HPE Fellow and was the BlueData CTO >> That's correct. >> And is now part of Hewlett-Packard Enterprise. Tom, thanks so much for joining us. >> Thanks, Stu. >> All right, so we talked with a couple of your colleagues earlier this morning. >> Right. >> About the HPE container platform. We're going to dig in a little bit deeper later. >> So, set the table for us as to really the problem statement that HP is going to solve here. >> Sure, so Blue Data which is what technologies we're talking about, we addressed the issues of how to run applications well in containers in the enterprise. Okay, what this involves is how do you handle security how do you handle Day-2 operations of upgrade of the software how do you bring CI and CD actions to all your applications. This is what the HPE container platform is all about. So, the announcement this morning, which went out was HPE is announcing the general availability of the HPE container platform, an enterprise solution that will run not only CloudNative applications, are typically called microservices applications, but also Legacy applications on Kubernetes and it's supported in a hybrid environment. So not only the main public cloud providers, but also on premise. And a little bit of divergence for HPE, HPE is selling this product, licensing this product to work on heterogeneous hardware. So not only HPE hardware, but other competitors' hardware as well. >> It's good, one of the things I've been hearing really over the last year is when we talked about Kubernetes, it resonated, for the most part, with me. I'm an infrastructure guy by background. When I talk in the cloud environment, it's really talking more about the applications. >> Exactly. >> And that really, we know why does infrastructure exist? Infrastructure is just to run my applications, it's about my data, it's about my business processes >> Right. >> And it seems like that is a y'know really where you're attacking with this solution. >> Sure, this solution is a necessary portion of the automated infrastructure for providing solutions as a service. So, um, historically, BlueData has been specializing in artificial intelligence, machine learning, deep learning, big data, that's where our strong suit came from. So we, uh, developed a platform that would containerize those applications like TensorFlow, um, Hadoop, Spark, and the like, make it easy for data scientists to stand up some clusters, and then do the horizontal scalability, separate, compute, and storage so that you can scale your compute independent of your storage capacity. What we're now doing is part of the HPE container platform is taking that same knowledge, expanding it to other applications beyond AI, ML, and DL. >> So what are some of those Day-2 implications then uh what is something that folks run into that then now with an HPE container platform you think will eliminate those problems? >> Sure, it's a great question, so, even though, uh, we're talking about applications that are inherently scalable, so, AI and ML and DL, they are developed so they can be horizontal- horizontally scalable, they're not stateless in the true sense of the word. 
When we say a stateless application, that means that there is no state in the container itself that matters. So if you destroy the container, reinstate it, there's no loss of continuity. That's a true stateless or CloudNative application. AI and ML and DL applications tend to have configuration information and state information that's stored in what's known as the root storage of the compute node, okay, what's in slash, so you might see per-node configuration information in a configuration file in the /etc directory. Okay, today, if you just take standard off-the-shelf Kubernetes, if you deploy Hadoop for example, or TensorFlow, and you configure that, you lose that state when the container goes down. With the HPE container platform, we have been moving forward with, or driving, an open source project known as KubeDirector. A portion of KubeDirector's functionality is to preserve that root storage, so that if a container goes down, we are enabled to bring up another instance of that container and have it have the same root storage. So it'll look like just a reboot of the node rather than a reinstall of that node. So that's a huge value when you're talking about these machine learning and deep learning applications that have their state in root. >> All right, so, Tom, how does KubeDirector fit, compare and contrast it, does it kind of sit alongside something like Rook, which was talked about in the keynote, about being able to really have that kind of universal backplane across all of my clusters >> Right, you're going to have to be >> Is that specific for AI and ML or is this >> Well, that's a great question, so KubeDirector itself is a Kubernetes operator, okay, and we have implemented that with the open-source community joining in. What it allows is this: KubeDirector is application agnostic, so you could author a YAML file with some pertinent information about the application that you want to deploy on Kubernetes. You give that YAML file to the KubeDirector operator, it will then deploy the application on your Kubernetes cluster and then manage the Day-2 activities, so this is beyond Helm, or beyond Kubeflow, which are deployment engines. This also handles, well, what happens if I lose my container? How do I bring the services back up? And those services are dependent upon the type of application that's there. That's what KubeDirector does. So KubeDirector allows a new application to be deployed and managed on Kubernetes without having to write an operator in Go code. Makes it much easier to bring a new application to the platform.
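(Editor's aside: to make the KubeDirector workflow described above a little more concrete, here is a rough Python sketch that creates a KubeDirectorCluster custom resource with the standard Kubernetes client. The group/version, plural name and spec fields are assumptions drawn from public KubeDirector examples rather than anything stated in this interview; check the project documentation for the real schema.)

# Hypothetical sketch: create a KubeDirectorCluster custom resource from Python.
# Group/version, plural and field names are assumptions, not a documented HPE API.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run inside a pod
api = client.CustomObjectsApi()

cluster = {
    "apiVersion": "kubedirector.hpe.com/v1beta1",   # assumed group/version
    "kind": "KubeDirectorCluster",
    "metadata": {"name": "tensorflow-demo"},
    "spec": {
        "app": "tensorflow",            # refers to a registered KubeDirectorApp definition
        "roles": [
            {"id": "worker", "members": 3,
             "resources": {"requests": {"cpu": "2", "memory": "8Gi"}}},
        ],
    },
}

api.create_namespaced_custom_object(
    group="kubedirector.hpe.com", version="v1beta1",
    namespace="default", plural="kubedirectorclusters", body=cluster)
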
So we did that, and we suffered all the slings and arrows of how to make the, um, security of the container, uh, to meet enterprise class standards. How do we automatically integrate with active directory and LDAP, and Kerberos, with a single sign on all those things that enterprises require for their infrastructure, we learned that the hard way through working with, y'know, international banking organizations, financial institutions, investment houses, medical companies, so our, our, all our customers were those high-demand enterprises. Now that we're apart of HP, we're taking all that knowledge that we acquired, bringing it to Kubernetes, exposing it through KubeDirector, where we can, and I agree there will be follow on open-source projects, releasing more of that technology to the open-source community. >> Mhm that was, that was actually part-two of my question is okay, what about, with now with HPE, the apps that are not AI, ML and you nailed it, right, >> Yeah. >> All those enterprise requirements. >> Same problems exist, right, there is secure data, you have secure data in a public cloud, you have it on premise, how do you handle data gravity issues so that you store, you run your compute close to your data where it's necessary you don't want to pay for moving data across the web like that. >> All right, so Tom, platforms are used for lots of different things, >> Yes. >> Bring us inside, what do you feel from your early customers, some of the key use cases that should be highlighted? >> Our key use cases were those customers who were very interested, they had internal developers. So they had a lot of expertise in house, maybe they had medical data scientists, or financial advisors. They wanted to build up sandboxes, so we helped them stand up, cookie-cutter sandboxes within a few moments, they could go ahead and play around with them, if they screwed them up, so what? Right, we tear them down and redo it within moments, they didn't need a lot of DevOps, heavy weight-lifting to reinstall bare-metal servers with these complex stacks of applications. The data scientist that I want to use this software which just came out of the open-source community last week, deployed in a container and I want to mess it up, I want to tighten, y'know, really push the edge on this and so we did that. We developed this sandboxing platform. Then they said, okay, now that you've tested this, I have it in queue A, I've done my CI/CD, I've done my testing, now I want to promote it into production. So we did that, we allowed the customer to deploy and define different quality of service depending on what tier their application was running in. If it was in testing dev, it got the lowest tier. If it was in CI/CD, it got a higher level of resource priority. Once it got promoted to production, it got guaranteed resource priority, the highest solution, so that you could always make sure that the customer who is using the production cluster got the highest level of access to the resources. So we built that out as a solution, KubeDirector now allows us to deploy that same sort of thing with the Kubernetes container orchestrator. >> Tom, you mentioned blue metal, uh, bare-metal, we've talked about VMs, we've been hearing a lot of multicloud stories here, already today, the first day of KubeCon, it seems like that's a reality out in the world, >> Can you talk about where are people putting applications and why? 
>> Well, clearly, uh, the best practices today are to deploy virtual machines and then put containers in virtual machines, and they do that for two very legitimate reasons. One is concern about the security, uh, plane for containers. So if you had a rogue actor, they could break out of the container, and if they're confined within the virtual machine, you can limit the impact of the damage. One very good, uh, reason for virtual machines, also there's a, uh, feeling that it's necessary to maintain, um, the container's state running in a virtual machine, and then be allowed to upgrade the the Prom Code, or the host software itself. So you want to be able to vMotion a virtual machine from one physical host to another, and then maintain the state of the containers. What KubeDirector brings and what BlueData and HP are stating is we believe we can provide both of those functionalities on containers on bare-metal. Okay, and we've spoken a bit about today already about how KubeDirector allows the Root File System to be preserved. That is a huge component of of why vMotion is used to move the container from one host to another. We believe that we can do that with a reboot. Also, um, HPE container platform runs all virtual machines as, um, reduced priority. So you're not, we're not giving root priority or privileged priority to those containers. So we minimize the attack plane of the software running in the container by running it as an unprivileged user and then tight control of the container capabilities that are configured for a given container. We believe it's just enough priority or just enough functionality which is granted to that container to run the application and nothing more. So we believe that we are limiting the attack plane of that through the, uh and that's why we believe we can validly state we can run these containers on bare-metal without, without the enterprise having to compromise in areas of security or persistence of the data. >> All right, so Tom, the announcement this week, uh is HP container platform available today? >> It will be a- we are announcing it. It's a limited availability to select customers It'll be generally available in Queue 1 of 2020. >> All right, and y'know, give us, y'know, we come back to KubeCon, which will actually be in Boston >> Yes. >> Next year in November >> When we're sitting down with you and you say hugely successful >> Right. >> Give us some of those KPIs as to y'know >> Sure. >> What are your teams looking at? >> So, we're going to look at how many new customers these are not the historic BlueData customers, how many new customers have we convinced that they can run their production work loads on Kubernetes And we're talking about I don't care how many POCs we do or how many testing dev things I want to know about production workloads that are the bread and butter for these enterprises that HP is helping run in the industry. And that will be not only, as we've talked about, CloudNative applications, but also the Legacy, J2EE applications that they're running today on Kubernetes. >> Yeah, I, uh, I don't know if you caught the keynote this morning, but Dan Kohn, y'know, runs the CNCF, uh, was talking about, y'know, a lot of the enterprises have been quitting them with second graders. 
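(Editor's aside: the "just enough privilege" approach Tom describes, running containers as a non-root user with a tightly controlled capability set, can be illustrated generically with the Kubernetes Python client. This is not HPE's implementation; the image name and the single added capability are placeholders.)

# Illustrative only: run a pod unprivileged, with a minimal capability set.
from kubernetes import client, config

config.load_kube_config()

container = client.V1Container(
    name="worker",
    image="example.com/ml-worker:latest",          # placeholder image
    security_context=client.V1SecurityContext(
        run_as_non_root=True,
        run_as_user=1000,
        allow_privilege_escalation=False,
        capabilities=client.V1Capabilities(
            drop=["ALL"],                           # start from no capabilities...
            add=["NET_BIND_SERVICE"],               # ...add only what the app needs
        ),
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="unprivileged-worker"),
    spec=client.V1PodSpec(restart_policy="Never", containers=[container]),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
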
Y'know, we need to get over the fact that y'know things are going to break and we're worried about making changes y'know the software world that y'know we've been talking about for a number of years, absolutely things will break, but software needs to be a resilient and distributed system, so, y'know, what advice do you give the enterprise out there to be able to dive in and participate? >> It's a great question, we get it all the time. The first thing is identify your most critical use case. Okay, that we can help you with and, and don't try to boil the ocean. Let's get the container platform in there, we will show you how you have success, with that one application and then once that's you'll build up confidence in the platform and then we can run the rest of your applications and production. >> Right, well Tom Phelan, thanks so much for the updates >> Thank you, Stu. >> Congratulations on the launch >> Thank you. >> with the HP container platform and we look forward to seeing the results in 2020. >> Well I hope you invite me back 'cause this was really fun and I'm glad to speak with you today. Thank you. >> All right, for John Troyer, I'm Stu Miniman, still watch more to go here at KubeCon, CloudNativeCon 2019. Thanks for watching theCUBE. (energetic music)

Published Date : Nov 20 2019

SUMMARY :

Stu Miniman and John Troyer talk with Tom Phelan, HPE Fellow and former BlueData CTO, at KubeCon + CloudNativeCon 2019 in San Diego about the HPE Container Platform announcement, the open-source KubeDirector operator for running stateful AI/ML and legacy applications on Kubernetes, and running containers securely and persistently on bare metal.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Tom Phelan | PERSON | 0.99+
John Troyer | PERSON | 0.99+
Dan Kohn | PERSON | 0.99+
2020 | DATE | 0.99+
Red Hat | ORGANIZATION | 0.99+
Stu Miniman | PERSON | 0.99+
HP | ORGANIZATION | 0.99+
Boston | LOCATION | 0.99+
two-part | QUANTITY | 0.99+
Tom | PERSON | 0.99+
San Diego, California | LOCATION | 0.99+
BlueData | ORGANIZATION | 0.99+
KubeCon | EVENT | 0.99+
Stu | PERSON | 0.99+
last week | DATE | 0.99+
Next year | DATE | 0.99+
first part | QUANTITY | 0.99+
today | DATE | 0.99+
last year | DATE | 0.99+
San Diego | LOCATION | 0.98+
one | QUANTITY | 0.98+
CloudNativeCon | EVENT | 0.98+
Hewlett-Packard Enterprise | ORGANIZATION | 0.98+
this week | DATE | 0.98+
both | QUANTITY | 0.98+
One | QUANTITY | 0.98+
OpenStack | TITLE | 0.98+
HPE | TITLE | 0.98+
CNCF | ORGANIZATION | 0.98+
hundred percent | QUANTITY | 0.97+
Etsy | ORGANIZATION | 0.97+
HPE | ORGANIZATION | 0.97+
TensorFlow | TITLE | 0.97+
KubeDirector | TITLE | 0.97+
first day | QUANTITY | 0.96+
CloudNativeCon 2019 | EVENT | 0.96+
CloudNative | TITLE | 0.95+
Spark | TITLE | 0.95+
one application | QUANTITY | 0.95+
first | QUANTITY | 0.94+
Kubernetes | TITLE | 0.94+
Hadoop | TITLE | 0.94+
this morning | DATE | 0.9+
first thing | QUANTITY | 0.9+
two very legitimate reasons | QUANTITY | 0.89+
vMotion | TITLE | 0.89+
one physical | QUANTITY | 0.88+
this morning | DATE | 0.88+
earlier this morning | DATE | 0.87+
Kerberos | TITLE | 0.83+

Arun Murthy, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost, Jim Kobielus. We're joined by Aaron Murphy, Arun Murphy, sorry. He is the co-founder and chief product officer of Hortonworks. Thank you so much for returning to theCUBE. It's great to have you on >> Yeah, likewise. It's been a fun time getting back, yeah. >> So you were on the main stage this morning in the keynote, and you were describing the journey, the data journey that so many customers are on right now, and you were talking about the cloud saying that the cloud is part of the strategy but it really needs to fit into the overall business strategy. Can you describe a little bit about how you're approach to that? >> Absolutely, and the way we look at this is we help customers leverage data to actually deliver better capabilities, better services, better experiences, to their customers, and that's the business we are in. Now with that obviously we look at cloud as a really key part of it, of the overall strategy in terms of how you want to manage data on-prem and on the cloud. We kind of joke that we ourself live in a world of real-time data. We just live in it and data is everywhere. You might have trucks on the road, you might have drawings, you might have sensors and you have it all over the world. At that point, we've kind of got to a point where enterprise understand that they'll manage all the infrastructure but in a lot of cases, it will make a lot more sense to actually lease some of it and that's the cloud. It's the same way, if you're delivering packages, you don't got buy planes and lay out roads you go to FedEx and actually let them handle that view. That's kind of what the cloud is. So that is why we really fundamentally believe that we have to help customers leverage infrastructure whatever makes sense pragmatically both from an architectural standpoint and from a financial standpoint and that's kind of why we talked about how your cloud strategy, is part of your data strategy which is actually fundamentally part of your business strategy. >> So how are you helping customers to leverage this? What is on their minds and what's your response? >> Yeah, it's really interesting, like I said, cloud is cloud, and infrastructure management is certainly something that's at the foremost, at the top of the mind for every CIO today. And what we've consistently heard is they need a way to manage all this data and all this infrastructure in a hybrid multi-tenant, multi-cloud fashion. Because in some GEOs you might not have your favorite cloud renderer. You know, go to parts of Asia is a great example. You might have to use on of the Chinese clouds. You go to parts of Europe, especially with things like the GDPR, the data residency laws and so on, you have to be very, very cognizant of where your data gets stored and where your infrastructure is present. And that is why we fundamentally believe it's really important to have and give enterprise a fabric with which it can manage all of this. And hide the details of all of the underlying infrastructure from them as much as possible. >> And that's DataPlane Services. >> And that's DataPlane Services, exactly. >> The Hortonworks DataPlane Services we launched in October of last year. Actually I was on CUBE talking about it back then too. 
We see a lot of interest, a lot of excitement around it because now they understand that, again, this doesn't mean that we drive it down to the least common denominator. It is about helping enterprises leverage the key differentiators at each of the cloud renderers products. For example, Google, which we announced a partnership, they are really strong on AI and MO. So if you are running TensorFlow and you want to deal with things like Kubernetes, GKE is a great place to do it. And, for example, you can now go to Google Cloud and get DPUs which work great for TensorFlow. Similarly, a lot of customers run on Amazon for a bunch of the operational stuff, Redshift as an example. So the world we live in, we want to help the CIO leverage the best piece of the cloud but then give them a consistent way to manage and count that data. We were joking on stage that IT has just about learned how deal with Kerberos and Hadoob And now we're telling them, "Oh, go figure out IM on Google." which is also IM on Amazon but they are completely different. The only thing that's consistent is the name. So I think we have a unique opportunity especially with the open source technologies like Altas, Ranger, Knox and so on, to be able to draw a consistent fabric over this and secured occurrence. And help the enterprise leverage the best parts of the cloud to put a best fit architecture together, but which also happens to be a best of breed architecture. >> So the fabric is everything you're describing, all the Apache open source projects in which HortonWorks is a primary committer and contributor, are able to scheme as in policies and metadata and so forth across this distributed heterogeneous fabric of public and private cloud segments within a distributed environment. >> Exactly. >> That's increasingly being containerized in terms of the applications for deployment to edge nodes. Containerization is a big theme in HTP3.0 which you announced at this show. >> Yeah. >> So, if you could give us a quick sense for how that containerization capability plays into more of an edge focus for what your customers are doing. >> Exactly, great point, and again, the fabric is obviously, the core parts of the fabric are the open source projects but we've also done a lot of net new innovation with data plans which, by the way, is also open source. Its a new product and a new platform that you can actually leverage, to lay it out over the open source ones you're familiar with. And again, like you said, containerization, what is actually driving the fundamentals of this, the details matter, the scale at which we operate, we're talking about thousands of nodes, terabytes of data. The details really matter because a 5% improvement at that scale leads to millions of dollars in optimization for capex and opex. So that's why all of that, the details are being fueled and driven by the community which is kind of what we tell over HDP3 Until the key ones, like you said, are containerization because now we can actually get complete agility in terms of how you deploy the applications. You get isolation not only at the resource management level with containers but you also get it at the software level, which means, if two data scientists wanted to use a different version of Python or Scala or Spark or whatever it is, they get that consistently and holistically. That now they can actually go from the test dev cycle into production in a completely consistent manner. 
So that's why containers are so big because now we can actually leverage it across the stack and the things like MiNiFi showing up. We can actually-- >> Define MiNiFi before you go further. What is MiNiFi for our listeners? >> Great question. Yeah, so we've always had NiFi-- >> Real-time >> Real-time data flow management and NiFi was still sort of within the data center. What MiNiFi does is actually now a really, really small layer, a small thin library if you will that you can throw on a phone, a doorbell, a sensor and that gives you all the capabilities of NiFi but at the edge. >> Mmm Right? And it's actually not just data flow but what is really cool about NiFi it's actually command and control. So you can actually do bidirectional command and control so you can actually change in real-time the flows you want, the processing you do, and so on. So what we're trying to do with MiNiFi is actually not just collect data from the edge but also push the processing as much as possible to the edge because we really do believe a lot more processing is going to happen at the edge especially with the A6 and so on coming out. There will be custom hardware that you can throw and essentially leverage that hardware at the edge to actually do this processing. And we believe, you know, we want to do that even if the cost of data not actually landing up at rest because at the end of the day we're in the insights business not in the data storage business. >> Well I want to get back to that. You were talking about innovation and how so much of it is driven by the open source community and you're a veteran of the big data open source community. How do we maintain that? How does that continue to be the fuel? >> Yeah, and a lot of it starts with just being consistent. From day one, James was around back then, in 2011 we started, we've always said, "We're going to be open source." because we fundamentally believed that the community is going to out innovate any one vendor regardless of how much money they have in the bank. So we really do believe that's the best way to innovate mostly because their is a sense of shared ownership of that product. It's not just one vendor throwing some code out there try to shove it down the customers throat. And we've seen this over and over again, right. Three years ago, we talk about a lot of the data plane stuff comes from Atlas and Ranger and so on. None of these existed. These actually came from the fruits of the collaboration with the community with actually some very large enterprises being a part of it. So it's a great example of how we continue to drive it6 because we fundamentally believe that, that's the best way to innovate and continue to believe so. >> Right. And the community, the Apache community as a whole so many different projects that for example, in streaming, there is Kafka, >> Okay. >> and there is others that address a core set of common requirements but in different ways, >> Exactly. >> supporting different approaches, for example, they are doing streaming with stateless transactions and so forth, or stateless semantics and so forth. Seems to me that HortonWorks is shifting towards being more of a streaming oriented vendor away from data at rest. Though, I should say HDP3.0 has got great scalability and storage efficiency capabilities baked in. 
I wonder if you could just break it down a little bit, what the innovations or enhancements are in HDP 3.0 for those of your core customers, which is most of them, who are managing massive multi-terabyte, multi-petabyte distributed, federated, big data lakes. What's in HDP 3.0 for them? >> Oh, lots. Again, like I said, we obviously spend a lot of time on the streaming side, because that's where we see it going; we live in a real-time world. But again, we don't do it at the cost of our core business, which continues to be HDP. And as you can see, the community continues to drive it. We talked about containerization, a massive step up for the Hadoop community. We've also added support for GPUs. Again, think about truly at-scale machine learning. >> Graphics processing units, >> Graphical-- >> AI, deep learning >> Yeah, it's huge. Deep learning, TensorFlow and so on really, really need custom hardware, a GPU, if you will. So that's coming, that's in HDP 3. We've added a whole bunch of scalability improvements with HDFS. We've added federation, because now you can go over a billion files, a billion objects in HDFS. We also added capabilities for-- >> But you indicated yesterday when we were talking that very few of your customers need that capacity yet, but you think they will, so-- >> Oh, for sure. Again, part of this is, as we enable more sources of data in real time, that's the fuel which drives it, and that was always the strategy behind the HDF product. It was about, can we leverage the synergies between the real-time world, feed that into what you do today in your classic enterprise with data at rest, and that is what is driving the necessity for scale. >> Yes. >> Right. We've done that. We spent a lot of work, again, on lowering the total cost of ownership, the TCO, so we added erasure coding. >> What is that exactly? >> Yeah, so erasure coding is a classic storage concept. You know, HDFS has always had three replicas, for redundancy, fault tolerance and recovery. Now, it sounds okay having three replicas because it's cheap disk, right? But when you start to think about our customers running 70, 80, a hundred terabytes of data, those three replicas add up, because you've now gone from 80 terabytes of effective data to actually a quarter of a petabyte in terms of raw storage. So what we can do with erasure coding is, instead of storing the three blocks, we actually store parity. We store the encoding of it, which means we can actually go down from three to, like, two, one and a half, whatever we want to do. So, if we can get from three blocks to one and a half, especially for your core data, >> Yeah >> the ones you're not accessing every day, it results in massive savings in terms of your infrastructure costs. And that's kind of what we're in the business of doing: helping customers do better with the data they have, whether it's on-prem or on the cloud. We want to help customers be comfortable getting more data under management, along with security and a lower TCO. The other big piece I'm really excited about in HDP 3 is all the work that's happened in the Hive community for what we call the real-time database. >> Yes. >> As you guys know, you follow the whole SQL-on-Hadoop space. >> And Hive has changed a lot in the last several years; this is very different from what it was five years ago.
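(Editor's aside: the replication-versus-erasure-coding arithmetic above is easy to sanity-check in Python. The sketch below compares 3x replication with a Reed-Solomon RS(6,3) layout, one common HDFS 3 policy; the 80 TB figure is simply the example used in the conversation.)

# Rough comparison of raw storage needed: 3x replication vs. erasure coding.
# RS(6,3): every 6 data blocks carry 3 parity blocks -> 1.5x raw storage.
# (In HDFS 3 this corresponds to a policy like RS-6-3-1024k; see `hdfs ec -listPolicies`.)
def raw_storage_tb(effective_tb, data_blocks=6, parity_blocks=3, replicas=None):
    if replicas is not None:                       # plain N-way replication
        return effective_tb * replicas
    return effective_tb * (data_blocks + parity_blocks) / data_blocks

effective = 80  # TB of actual data, as in the example above
print(raw_storage_tb(effective, replicas=3))       # 240.0 TB, roughly a quarter of a petabyte
print(raw_storage_tb(effective))                   # 120.0 TB with RS(6,3)
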
>> The only thing that's same from five years ago is the name (laughing) >> So again, the community has done a phenomenal job, kind of, really taking sort of a, we used to call it like a sequel engine on HDFS. From there, to drive it with 3.0, it's now like, with Hive 3 which is part of HDP3 it's a full fledged database. It's got full asset support. In fact, the asset support is so good that writing asset tables is at least as fast as writing non-asset tables now. And you can do that not only on-- >> Transactional database. >> Exactly. Now not only can you do it on prem, you can do it on S3. So you can actually drive the transactions through Hive on S3. We've done a lot of work to actually, you were there yesterday when we were talking about some of the performance work we've done with LAP and so on to actually give consistent performance both on-prem and the cloud and this is a lot of effort simply because the performance characteristics you get from the storage layer with HDFS versus S3 are significantly different. So now we have been able to bridge those with things with LAP. We've done a lot of work and sort of enhanced the security model around it, governance and security. So now you get things like account level, masking, row-level filtering, all the standard stuff that you would expect and more from an Enprise air house. We talked to a lot of our customers, they're doing, literally tens of thousands of views because they don't have the capabilities that exist in Hive now. >> Mmm-hmm 6 And I'm sitting here kind of being amazed that for an open source set of tools to have the best security and governance at this point is pretty amazing coming from where we started off. >> And it's absolutely essential for GDPR compliance and compliance HIPA and every other mandate and sensitivity that requires you to protect personally identifiable information, so very important. So in many ways HortonWorks has one of the premier big data catalogs for all manner of compliance requirements that your customers are chasing. >> Yeah, and James, you wrote about it in the contex6t of data storage studio which we introduced >> Yes. >> You know, things like consent management, having--- >> A consent portal >> A consent portal >> In which the customer can indicate the degree to which >> Exactly. >> they require controls over their management of their PII possibly to be forgotten and so forth. >> Yeah, it's going to be forgotten, it's consent even for analytics. Within the context of GDPR, you have to allow the customer to opt out of analytics, them being part of an analytic itself, right. >> Yeah. >> So things like those are now something we enable to the enhanced security models that are done in Ranger. So now, it's sort of the really cool part of what we've done now with GDPR is that we can get all these capabilities on existing data an existing applications by just adding a security policy, not rewriting It's a massive, massive, massive deal which I cannot tell you how much customers are excited about because they now understand. They were sort of freaking out that I have to go to 30, 40, 50 thousand enterprise apps6 and change them to take advantage, to actually provide consent, and try to be forgotten. The fact that you can do that now by changing a security policy with Ranger is huge for them. >> Arun, thank you so much for coming on theCUBE. It's always so much fun talking to you. >> Likewise. Thank you so much. >> I learned something every time I listen to you. >> Indeed, indeed. 
I'm Rebecca Knight for James Kobielus, we will have more from theCUBE's live coverage of DataWorks just after this. (Techno music)

Published Date : Jun 19 2018

SUMMARY :

Rebecca Knight and Jim Kobielus talk with Arun Murthy, co-founder and chief product officer of Hortonworks, at DataWorks Summit 2018 about hybrid and multi-cloud data strategy, DataPlane Services, open-source innovation, and HDP 3.0 features including containerization, GPU support, HDFS federation, erasure coding, Hive 3, and GDPR-related security and governance.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
James | PERSON | 0.99+
Aaron Murphy | PERSON | 0.99+
Arun Murphy | PERSON | 0.99+
Arun | PERSON | 0.99+
2011 | DATE | 0.99+
Google | ORGANIZATION | 0.99+
5% | QUANTITY | 0.99+
80 terabytes | QUANTITY | 0.99+
FedEx | ORGANIZATION | 0.99+
two | QUANTITY | 0.99+
Silicon Valley | LOCATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Arun Murthy | PERSON | 0.99+
HortonWorks | ORGANIZATION | 0.99+
yesterday | DATE | 0.99+
San Jose, California | LOCATION | 0.99+
three replicas | QUANTITY | 0.99+
James Kobeilus | PERSON | 0.99+
three blocks | QUANTITY | 0.99+
GDPR | TITLE | 0.99+
Python | TITLE | 0.99+
Europe | LOCATION | 0.99+
millions of dollars | QUANTITY | 0.99+
Scala | TITLE | 0.99+
Spark | TITLE | 0.99+
theCUBE | ORGANIZATION | 0.99+
five years ago | DATE | 0.99+
one and a half | QUANTITY | 0.98+
Enprise | ORGANIZATION | 0.98+
three | QUANTITY | 0.98+
Hive 3 | TITLE | 0.98+
Three years ago | DATE | 0.98+
both | QUANTITY | 0.98+
Asia | LOCATION | 0.97+
50 thousand | QUANTITY | 0.97+
TCO | ORGANIZATION | 0.97+
MiNiFi | TITLE | 0.97+
Apache | ORGANIZATION | 0.97+
40 | QUANTITY | 0.97+
Altas | ORGANIZATION | 0.97+
Hortonworks DataPlane Services | ORGANIZATION | 0.96+
DataWorks Summit 2018 | EVENT | 0.96+
30 | QUANTITY | 0.95+
thousands of nodes | QUANTITY | 0.95+
A6 | COMMERCIAL_ITEM | 0.95+
Kerberos | ORGANIZATION | 0.95+
today | DATE | 0.95+
Knox | ORGANIZATION | 0.94+
one | QUANTITY | 0.94+
hive | TITLE | 0.94+
two data scientists | QUANTITY | 0.94+
each | QUANTITY | 0.92+
Chinese | OTHER | 0.92+
TensorFlow | TITLE | 0.92+
S3 | TITLE | 0.91+
October of last year | DATE | 0.91+
Ranger | ORGANIZATION | 0.91+
Hadoob | ORGANIZATION | 0.91+
HIPA | TITLE | 0.9+
CUBE | ORGANIZATION | 0.9+
tens of thousands | QUANTITY | 0.9+
one vendor | QUANTITY | 0.89+
last several years | DATE | 0.88+
a billion objects | QUANTITY | 0.86+
70, 80 hundred terabytes of data | QUANTITY | 0.86+
HTP3.0 | TITLE | 0.86+
two 1/4 of an exobyte | QUANTITY | 0.86+
Atlas and | ORGANIZATION | 0.85+
DataPlane Services | ORGANIZATION | 0.84+
Google Cloud | TITLE | 0.82+

Paul Barth, Podium Data | The Podium Data Marketplace


 

(light techno music) >> Narrator: From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now here's your host, Stu Miniman. >> Hi, I'm Stu Miniman and welcome to theCUBE conversation here in our Boston area studio. Happy to welcome back to the program, Paul Barth, who's the CEO of Podium Data, also a Boston area company. Paul, great to see you. >> Great to see you, Stu. >> Alright, so we last caught up with you, it was a fun event that we do at MIT talking about information, data quality, kind of understand why your company would be there. For our audience that doesn't know, just give us a quick summary, your background, what was kind of the why of Podium Data back when it was founded in 2014. >> Oh that's great Stu, thank you. I've spent most of my career in helping large companies with their data and analytic strategies, next generation architectures, new technologies, et cetera, and in doing this work, we kept stumbling across the complexity of adopting new technologies. And around the time that big data and Hadoop was getting popular and lots of hype in the marketplace, we realized that traditional large businesses couldn't manage data on this because the technology was so new and different. So we decided to form a software company that would automate a lot of the processing, manage a catalog of the data, and make it easy for nontechnical users to access their data. >> Yeah, that's great. You know when I think back to when we were trying to help people understand this whole big data wave, one of the pithy things we did, it was turning all this glut of data from a problem to an opportunity, how do we put this in to the users. But a lot of things kind of, we hit bumps in the road as an industry. Did studies it was more than 50 percent of these projects fail. You brought up a great point, tooling is tough, changing processes is really challenging. But that focus on data is core to our research, what we talk about all the time. But now it's automation and AIML, choose your favorite acronym of the day. This is going to solve all the ills that the big data wave didn't do right. Right, Paul? So maybe you can help us connect the dots a little bit because I hear a lot in to the foundation that trend from the big data to kind of the automation and AI thing. So you're maybe just a little ahead of your time. >> Well thanks, I saw an opportunity before there was anything in the marketplace that could help companies really corral their data, get some of the benefits of consolidation, some oversight in management through an automated catalog and the like. As AI has started to emerge as the next hype wave, what we're seeing consistently from our partners like Data Robot and others who have great AI technology is they're starved for good information. You can't learn automatically or even human learning if you're given inconsistent information, data that's not conformed or ready or consistent, which you can look at a lot of different events and start to build correlations. So we believe that we're still a central part of large companies building out their analytics infrastructure. >> Okay, help us kind of look at how your users and how you fit into this changing ecosystem. We all know things are just changing so fast. From 2014 to today, Cloud is so much bigger, the big waves of IoT keep talking. Everybody's got some kind of machine learning initiative. So what're the customers looking for, how do you fit in some of those different environments? 
>> I think when we formed the company we recognized that the cost performance differential between the open-sourced data management platforms like Hadoop and now Spark, were so dramatically better than the traditional databases and data warehouses, that we could transform the business process of how do you get data from Rotaready. And that's a consistent problem for large companies they have data in legacy formats, on mainframes, they have them in relational databases, they have them in flat files, in the Cloud, behind the firewall, and these silos continue to grow. This view of a consistent, or consistent view of your business, your customers, your processes, your operations, is cental to optimizing and automating the business today. So our business users are looking for a couple of things. One thing they are looking for is some manageability and a consistent view of their data no matter where it lives, and our catalog can create that automatically in days or weeks depending on how how big we go or how broadly we go. They're looking for that visibility but also they're looking for productivity enhancements, which means that they can start leveraging that data without a big IT project. And finally they're looking for agility which means there's self-service, there's an ability to access data that you know is trusted and secured and safe for the end users to use without having to call IT and have a program spin something up. So they're really looking for a totally new paradigm of data delivery. >> I tell you that hits on so many things that we've been seeing and a challenge that we've seen in the marketplace. In my world, talk about people they had their data centers and if I look at my data and I look at my applications, it's this heterogeneous nightmare. We call it hybrid or multi cloud these days, and it shows the promise of making me faster and all this stuff. But as you said, my data is all over the place, my applications are getting spun up and maybe I'm moving them and federating things and all that. But, my data is one of the most critical components of my business. Maybe explain a little bit how that works. Where do the customers come in and say oh my gosh, I've got a challenge and Podium Data's helping and the marketplace and all that. >> Sure, first of all we targeted from the start large regulated businesses, financial services, pharmaceutical healthcare, and we've broadened since then. But these companies' data issues were really pressure from both ends. One was a compliance pressure. They needed to develop regulatory reports that could be audited and proven correct. If your data is in many silos and it's compiled manually using spreadsheets, that's not only incredibly expensive and nonreproducible, it's really not auditable. So a lot of these folks were pressured to prove that the data they were reporting was accurate. On the other side, it's the opportunity cost. Fintech companies are coming into their space offering loans and financial products, without any human interaction, without any branches. They knew that data was the center to that. The only way you can make an offer to someone for financial product is if you know enough about them that you understand the risk. So the use and leverage of data was a very critical mass. There was good money to invest in it and they also saw that the old ways of doing this just weren't working. >> Paul, does your company help with the incoming GDPR challenges that are being faced? 
>> Sure, last year we introduced a PII detector and protection scheme. That may not sound like such a big deal but in the Hadoop open-source world it is. At the end of the day this technology while cheap and powerful is incredibly immature. So when you land data, for example, into these open data platforms like S3 out in the Cloud, Podium takes the time to analyze that data and tell you what the structures of the data are, where you might have issues with sensitive data, and has the tooling like obfuscation and encryption to protect the data so you can create safe to use data. I'd say our customers right now, they started out behind the firewall. Again, these regulated businesses were very nervous about breaches. They're looking and realizing they need to get to the Cloud 'cause frankly not only is it a better platform for them from a cost basis and scalability, it's actually where the data comes from these days, their data suppliers are in the Cloud. So we're helping them catalog their data and identify the sensitive data and prepare data sets to move to the Cloud and then migrate it to the Cloud and manage it there. >> Such a critical piece. I lived in the storage world for about a decade. There was a little acquisition that they made of a company called Pi, P-I. It was Paul Maritz who a lot of people know, Paul had a great career at Microsoft went on to run VMware for a bunch. But it was, the vision you talk about reminds me of what I heard Paul Maritz talking to. Gosh, that was a decade ago. Information, so much sensitivity. Expand a little bit on the security aspect there, when I looked through your website, you're not a security company per se, but are there partnerships? How do you help customers with I want to leverage data but I need to be secure, all the GRC and security things that's super challenging. >> At this space to achieve agility and scale on a new technology, you have to be enterprise ready. So in version one of our product, we had security features that included field level encryption and protection, but also integration with LDAB and Kerberos and other enterprise standard mechanisms and systems that would protect data. We can interoperate with Protegrity's and other kinds of encryption and protection algorithms with our open architecture. But it's kind of table stakes to get your data in a secured, monitorable infrastructure if you're going to enable this agility and self-service. Otherwise you restrict the use of the new data technologies to sandboxes. The failures you hear about are not in the sandboxes in the exploration, they're in getting those to production. I had one of my customers talk about how before Podium they had 50 different projects on Hadoop and all of them were in code red and none of them could go to production. >> Paul you mentioned catalogs, give us the update. What's the newest from Podium Data? Help explain that a little bit more. >> So we believe that the catalog has to help operationalize the data delivery process. So one of the things we did from the very start was say let's use the analytical power of big data technologies, Spark, Hadoop, and others, to analyze the data on it's way in to the platform and build a metadata catalog out of that. So we have over 100 profiling statistics that we automatically calculate and maintain for every field of every file we ever load. It's not something you do as an afterthought or selectively. 
We knew from our experience that we needed to do that, data validation, and then bring in inferences such as "this field looks like PII data" and tag that in the metadata. That process of taking in data applies even to legacy mainframe data coming in a VSAM format. It gets converted and landed to a usable format automatically. But the most important part is the catalog gets enriched with all this statistical profiling information, validation, all of the technical information, and we interoperate as well as have a GUI to help with business tagging, business definitions, and the like. >> Paul, just a little bit of a broader industry question. We talked about the value of data; I think everybody understands how important it is. How are we doing in understanding the value of that data though, is that a monetization thing? You've got academia in your background, there's debates, we've talked to some people at MIT about this. How do you look at data value as an industry in general, is there anything from Podium Data that you help people identify, are we leveraging it, are we doing the most, what are your thoughts around that? >> So I'd say to someone who's looking for a good framework to think about this, I'd recommend Doug Laney's book on infonomics; we've collaborated for a while, and he's doing a great job there. But there's also just blocking and tackling, which is what data is getting used, or a common one for our customers is where do I have data that's duplicate, or it comes from the same source but it's not exactly the same. That often causes reconciliation issues in finance, or in forecasting, in sales analysis. So what we've done with our data catalog, with all these profiling statistics, is start to build some analytics that identify similar data sets that don't have to be exactly the same, to say you may have a version of the data that you're trying to load here already available. Why don't you look at that data set and see if that one is preferred, and the data governance community really likes this. For one of our customers there were literally millions of dollars in savings from eliminating duplication, but the more important thing is the inconsistency, when people are using similar but not the same data sets. So we're seeing that as a real driver. >> I want to give you the final word. Just what are you seeing out in the industry these days, biggest opportunities, biggest challenges from users you're talking to? >> Well, what I'd say is when we started this it was very difficult for traditional businesses to use Hadoop in production, and they needed an army of programmers, and I think we solved that. Last year we started on our work to move to a post-Hadoop world, so the first thing we've done is open up our cataloging tools so we can catalog any data set in any source and allow the data to be brought into an analytical environment or production environment more on demand, rather than the idea that you're going to build a giant data lake with everything in it and replicate everything. That's become really interesting because you can build the catalog in a few weeks and then actually use the analysis and all the contents to drive the strategy. What do I prioritize, where do I put things? The other big initiative is, of course, cloud. As I mentioned earlier you have to protect and make cloud-ready data behind your firewall, and then you have to know where it's used and how it's used externally.
We automate a lot of that process and make that transition something that you can manage over time, and that is now going to be extended into multi-cloud, multi-lake types of technologies. >> Multi cloud, multi lake, alright. Well Paul Barth, I appreciate getting the update on everything happening with Podium Data. theCUBE has so many events this year, so be sure to check out thecube.net for all the upcoming events and all the existing interviews. I'm Stu Miniman, thanks for watching theCUBE. (light techno music)
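The duplicate and near-duplicate data set detection Barth describes above is, at its simplest, a comparison of catalog profiles. The toy Python sketch below is only an assumed illustration of that idea, comparing hypothetical column fingerprints with Jaccard similarity; it is not Podium's algorithm.

```python
# Illustrative sketch only -- a naive way to flag "similar but not identical" data sets
# from their catalog profiles. Field names and thresholds are hypothetical.

def column_fingerprint(profile):
    """Reduce a column profile to a hashable fingerprint (normalized name, type, rough cardinality)."""
    return (profile["name"].lower().replace("_", ""),
            profile["type"],
            round(profile["distinct_ratio"], 1))

def dataset_similarity(profiles_a, profiles_b):
    """Jaccard similarity over the column fingerprints of two data sets."""
    a = {column_fingerprint(p) for p in profiles_a}
    b = {column_fingerprint(p) for p in profiles_b}
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical catalog entries for two loads of the same feed
orders_v1 = [{"name": "OrderID", "type": "int", "distinct_ratio": 1.0},
             {"name": "Amount", "type": "decimal", "distinct_ratio": 0.70}]
orders_v2 = [{"name": "order_id", "type": "int", "distinct_ratio": 1.0},
             {"name": "amount", "type": "decimal", "distinct_ratio": 0.72}]

if dataset_similarity(orders_v1, orders_v2) > 0.5:
    print("A similar data set may already be loaded; consider using the preferred copy.")
```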

Published Date : Apr 26 2018


Peter Smails, Datos IO | CUBE Conversation with John Furrier


 

(light orchestral music) >> Hello, everyone, and welcome to the Cube Conversation here at the Palo Alto studios for theCUBE. I'm John Furrier, the co-founder of SiliconANGLE Media. We're here for some news analysis with Peter Smails, the CMO of Datos.IO D-a-t-o-s dot I-O. Hot new start up with some news. Peter was just here for a thought leader segment with Chris Cummings talking about the industry breakdown. But the news is hot, prior to re:Invent which you will be at? >> Absolutely. >> RecoverX is the product. 2.5, it's a release. So, you've got a point release on your core product. >> Correct. >> Welcome to this conversation. >> Thanks for having me. Yeah, we're excited to share the news. Big day for us. >> All right, so let's get into the hard news. You guys are announcing a point release of the latest product which is your core flagship, RecoverX. >> Correct. >> Love the name. Love the branding of the X in there. It reminds me of the iPhone, so makes me wanna buy one. But you know ... >> We can make that happen, John. >> You guys are the X Factor. So, we've been pretty bullish on what you guys are doing. Obviously, like the positioning. It's cloud. You're taking advantage of the growth in the cloud. What is this new product release? Why? What's the big deal? What's in it for the customer? >> So, I'll start with the news, and then we'll take a small step back and sort of talk about why exactly we're doing what we're doing. So, RecoverX 2.5 is the latest in our flagship RecoverX line. It's a cloud data management platform. And the market that we're going after and the market we're disrupting is the traditional data management space. The proliferation of modern applications-- >> John: Which includes which companies? >> So, the Veritas' of the world, the Commvault's of the world, the Dell EMC's of the world. Anybody that was in the traditional-- >> 20-year-old architected data backup and recovery software. >> You stole my fun fact. (laughs) But very fair point which is that the average age approximately of the leading backup and recovery software products is approximately 20 years. So, a lot's changed in the last 20 years, not the least of which has been this proliferation of modern applications, okay? Which are geo-distributed microservices oriented and the rapid proliferation of multicloud. That disrupts that traditional notion of data management specifically backup and recovery. That's what we're going after with RecoverX. RecoverX 2.5 is the most recent version. News on three fronts. One is on our advanced recovery, and we can double-click into those. But it's essentially all about giving you more data awareness, more granularity to what data you wanna recover and where you wanna put it, which becomes very important in the multicloud world. Number two is what we call data center aware backup and recovery. That's all about supporting geo-distributed application environments, which again, is the new normal in the cloud. And then number three is around enterprise hardening, specifically around security. So, it's all about us increased flexibility and new capabilities for the multicloud environment and continue to enterprise-harden the product. >> Okay, so you guys say significant upgrade. >> Peter: Yep. >> I wanna just look at that. I'm also pretty critical, and you know how I feel on this so don't take it personal, multicloud is not a real deal yet. It's in statement of value that customers are saying-- It's coming! But cloud is here today, regular cloud. So, multicloud ... 
Well, what does multicloud actually mean? I mean, I can have multiple clouds but I'm not actually moving workloads across clouds, yet. >> I disagree. >> Okay. >> I actually disagree. We have multiple customers. >> All right, debunk that. >> I will debunk that. The number one use case for RecoverX is backup and recovery, but with the twist that it's for these modern applications running in these geo-distributed environments. Which means it's not about backing up my data center; it's about, I need to make a copy of my data but I wanna back it up in the cloud. I'm running my application natively in the cloud, so I want a backup in the cloud. I'm running my application in the cloud but I actually wanna back up from the cloud back to my private cloud. So therein lies a backup and recovery, and operational recovery, use case that involves multicloud. That's number one. The number two use case for RecoverX is what we talk about as data mobility. >> So, you have a different definition of multicloud. >> Sorry, what was your-- Our definition of multicloud is fundamentally a customer using multiple clouds, whether it be a private on-prem cloud, GCP, AWS, Oracle, any mix and match. >> I buy that. I buy that. Where I was getting critical was on workloads. >> Okay. >> I have a workload and I'm running it on Amazon. It's been architected for Amazon. Then I also wanna run that same workload on Azure and Google. >> Okay. >> Or Oracle or somewhere else. >> Yep. >> I have to re-engineer it (laughs) to move, and I can't share the data. So, to me what multicloud means is I can run it anywhere. My app anywhere. Backup is a little bit different. You're saying the cloud environments can be multiple environments for your solution. >> That is correct. >> So, you're looking at it from the other perspective. >> Correct. The way we define ourselves is application-centric data management. And what that essentially means is we don't care what the underlying infrastructure is. So, if you look at traditional backup and recovery products, they're LUN-based. So, I'm going to back up my storage LUN. Or they're VM-based. And a lot of big companies made a lot of money doing that. The problem is there are no LUNs and VMs in a hybrid cloud or multicloud environment. The only thing that's consistent across applications, across cloud environments, is the data and the applications that are running. Where we focus is we're 100% application-centric. So, we integrate at the database level. The database is the foundation of any application you create. We integrate there, which makes us agnostic to the underlying infrastructure. Just as examples, we have customers running next generation applications on-prem. We have customers running next generation applications on AWS and GCP. Any permutation of the above, and to your point about multicloud, we've got organizations doing backup with us, but then we also have organizations using us to take copies of their backup data and put them on whatever clouds they want for things like test and refresh, or performance testing, or business analytics. Whatever you might wanna do. >> So, you're pretty flexible. I like that. So, we talked before on other segments, and certainly even this morning, about modern stacks. >> Yeah. >> Modern applications. This is the big to-do item for all CXOs and CIOs. I need a modern infrastructure. I need modern applications. I need modern developers. I need modern everything. Hyper, micro, ultra. >> Whatever buzzword you use.
>> But you guys in this announcement have a couple key things I wanna just get more explanation on. One, advanced recovery, backup anywhere, recover anywhere, and you said enterprise-grade security is the third thing. >> Yep. >> So, let's just break them down one at a time. Advanced recovery for Datos 2.5, RecoverX 2.5. >> Yep. >> What is advanced recovery? >> It's very specifically about providing high levels of granularity for recovering your data, on two fronts. So, the use case is, again, backup. I need to recover data. But I don't wanna necessarily recover everything. I wanna get smarter about the data I wanna recover. Or it could be for non-operational use cases, which is I wanna spin up a copy of data to run test and dev or to do performance testing on. What advanced recovery specifically means is, number one, we've introduced the notion of queryable recovery. And what that means is that I can say things like star dot John star. And the results come back from that; because we're application-centric and we integrate at the database level, we give you visibility to that. I wanna see everything star dot John star. Or I wanna recover data from a very specific row, in a very specific column. Or I want to mask data that I do not wanna be recovered and I don't want people to see. The implications of that: think about it from a performance standpoint. Now, I only recover the data I need. So, I get very, very high levels of granularity based upon a query. So, I'm fast from an RTO standpoint. The second part of it is, for non-operational requirements, I only move the data that is selected for that data set. And number three is it helps you with things like GDPR compliance and PII compliance, because you can mask data. So, that's query-based recovery. That's number one. The second piece of advanced recovery is what we call incremental recovery. That is granular recovery based upon a time stamp. So, you can get within individual points in time. So, you can get to a very high level of granularity based upon time. So, it's all about visibility. It's your data, and getting very granular in a smart way to what you wanna recover. >> So, if I kind of hear what you're saying, what you're saying is essentially you built in the operational effectiveness of being effective operationally. You know, time to backup recovery, all that good RTO stuff. Restoring stuff operationally >> Peter: Very quickly. >> very fast. >> Peter: In a smart way. >> So, there's a speed game there, which is table stakes. But your real value here is all these compliance nightmares that are coming down the pike, GDPR and others. There's gonna be more. >> Peter: Absolutely. I mean, it could be HIPAA, it could be GDPR, anything that involves-- >> Policy. >> Policies. Anything that requires it, we're completely policy-driven. And you can create a policy to mask certain data based upon the criteria you wanna put in. So, it's all about-- >> So you're the best of performance, and you got some tunability. >> And it's all about being data aware. It's all about being data aware. So, that's what advanced recovery is. >> Okay, backup anywhere, recover anywhere. What does that mean? >> So, what that means is the old world of backup and recovery was I had a database running in my data center, and I would say, database, please take a snapshot of yourself so I can make a copy.
The new world of cloud is that these microservices-based modern applications are by definition distributed, and in many cases they're geo-distributed. So, what is data center aware backup and recovery? Let me use a perfect example. We have a customer, a leading online restaurant reservations company. They're running their eCommerce application on-prem, interestingly enough, but it's based on a distributed database. Cassandra, excuse me, MongoDB. Sorry. They're running geo-distributed, sharded MongoDB clusters. Anybody in the traditional backup and recovery world, their head would explode when you say that. In the modern application world, that's a completely normal use case. They have a data center in the U.S. They have a data center in the U.K. What they want is to be able to do local backup and recovery while maintaining complete global consistency of their data. So again, it's about recovery time ultimately, but it's also being data aware and focusing only on the data that you need to back up and recover. So, it's about performance, but then it's also about compliance. It's about governance. That's what data center aware backup is. >> And that's a global phenomenon people are having with geo-distribution. >> Absolutely. Yeah, you could be within country. It could be any number of different things that drive that. We can do it because we're data aware-- >> And that creates complexity for the customer. You guys can take that complexity away >> Correct. >> From the whole global, regional, where the data can sit. >> Correct. I'd say two things actually. To give the customers credit, the customers building these apps are actually getting a lot smarter about what their data is and where their data is. >> So they expect this feature? >> Oh, absolutely. Absolutely. I wouldn't call it table stakes, 'cause we're the only kids on the block that can do it. But this is in direct response to our customers that are building these new apps. >> I wanna get into some of the environmental and customer drivers in a second. I wanna nail the last segment down, 'cause I wanna unpack the whole why is this trend happening? What's the gestation period? What's the main enabler for you? But okay, final point on the significant announcements. My favorite topic, enterprise-grade security. What the hell does that mean? First of all, from your standpoint, the industry's trying to solve the same thing. Enterprise-grade security, what are you guys providing in this? >> Number one, it's basically security protocol. So, TLS and SSL. This is in-the-weeds stuff. TLS, SSL, so secure protocol support. It's integration with LDAP. So, if organizations are running, primarily if they're running on-prem and they're running in an LDAP environment, we support that. And then we've got Kerberos support for Kerberos authentication. So, it's all about just checking the boxes around the different security >> So, this is like in between >> and transport protocols. >> the toes, the details around compliance, identity management. >> Bingo. >> I mean we just had Centrify's CyberConnect conference, and you're seeing a lot of focus on identity. >> Absolutely. And the reason, from a market standpoint, that these are very important now is because the applications that we're supporting are not science experiments. These are eCommerce applications.
These are core business applications that mainstream enterprises are running, and they need to be protected and they're bringing the true, classic enterprise security, authentication, authorization requirements to the table. >> Are you guys aligning with those features? Or is there anything significant in that section? >> From an enterprise security standpoint? It's primarily about we provide the support, so we integrate with all of those environments and we can check the boxes. Oh, absolutely TLS. Absolutely, we've got that box checked because-- >> So, you're not competing with other cybersecurity? >> No, this is purely we need to do this. This is part of our enterprise-- >> This is where you partner. >> Peter: Well, no. For these things it's literally just us providing the protocol support. So, LDAP's a good example. We support LDAP. So, we show up and if somebody's using my data management-- >> But you look at the other security solutions as a way to integrate with? >> Yeah. >> Not so much-- >> Absolutely, no. This has nothing to do with the competition. It's just supporting ... I mean Google has their own protocol, you know, security protocols, so we support those. So, does Amazon. >> I really don't want to go into the customer benefits. We'll let the folks go to the Datos website, d-a-t-o-s dot i-o is the website, if you wanna check out all their customer references. I don't wanna kind of drill on that. I kind of wanna really end this segment on the real core issue for me is reading the tea leaves. You guys are different. You're now kind of seeing some traction and some growth. You're a new kind of animal in the zoo, if you will. (Peter laughs) You've got a relevant product. Why is it happening now? And I'm trying to get to understanding Cloud Oss is enabling a lot of stuff. You guys are an effect of that, a data point of what the cloud is enabled as a venture. Everything that you're doing, the value you create is the function of the cloud. >> Yes. >> And how data is moving. Where's this coming from? Is it just recently? Is it a gestation period of a few years? Where did this come from? You mentioned some comparisons like Oracle. >> So, I'll answer that in sort of, we like to use history as our guide. So, I'll answer that both in macro terms, and then I'll answer it in micro terms. From a macro term standpoint, this is being driven by the proliferation of new data sources. It's the easiest way to look at it. So, if you let history be your guide. There was about a seven to eight year proliferation or gap between proliferation of Oracle as the primary traditional relational database data source and the advent of Veritas who really defined themselves as the defacto standard for traditional on-prem data center relational data management. You look at that same model, you'll look at the proliferation of VMware. In the late 90s, about a seven to eight year gestation with the rapid adoption of Veeam. You know the early days a lot of folks laughed at Veeam, like, "Who's gonna backup VMs? People aren't gonna use VMs in the enterprise. Now, you looked at Veeam, great company. They've done some really tremendous things carving out much more than a niche providing backup and recovery and availability in a VM-based environment. The exact same thing is happening now. If you go back six to seven years from now, you had the early adoption of the MongoDBs, the Cassandras, the Couches. More recently you've got a much faster acceleration around the DynamoDBs and the cloud databases. 
We're riding that same wave to support that. >> This is a side effect of the enabling of the growth of cloud. >> Yes. >> So, similar to what you did in VMware with VMs, and the database for Oracle, you guys are taking it to the next level. >> These new data sources are completely driven by the fact that the cloud is enabling this completely distributed, far more agile, far more dynamic, far less expensive application deployment model, and a new way of providing data management is required. That's what we do. >> Yeah, I mean it's a function of maturity, one. As Jeff Rickard, General Manager of theCUBE, always says, when the industry moves to its next point of failure, in this case the failure is a problem and you solve it. So, the headaches that come from the awesomeness of the growth. >> Absolutely. And to answer that micro-wise briefly. So, that was the macro. The micro is the proliferation of, the movement from monolithic apps to microservices-based apps; it's happening. And the cloud is what's enabling them. The move from traditional on-prem to hybrid cloud is absolutely happening. That's by definition the cloud. The third piece, which is cloud-centric, is the world's moving from a scale-up world to an elastic-compute, elastic-storage model. We call that the modern IT stack. Traditional backup and recovery, traditional data management, doesn't work in the new modern IT stack. That's the market we're playing in. That's the market we're disrupting: all that traditional stuff moving to the modern IT stack. >> Okay, Datos IO announcing a 2.5 release of RecoverX, their flagship product, their startup growing out of Los Gatos. Peter Smails here, the CMO. Where ya gonna be next? What's going on-- I know we're gonna see you at re:Invent in a week and a half. >> Absolutely. So, we've got two stops. Well, actually the next stop on the tour is re:Invent. So, absolutely looking forward to being back on theCUBE at re:Invent. >> And the company feels good, things are good. You've got good money in the bank. You're growing. >> We feel fantastic. It's fascinating to watch as things develop. The conversations we have now versus even six months ago. It's sort of the tipping point of people get it. You sort of explain, "Oh, yeah, it's data management for modern applications. Are you deploying modern applications?" Absolutely. >> Share one example to end this segment on what you hear over and over again from customers that illuminates what you guys are about as a company, the DNA, the value proposition, and their impact on results and value for customers. >> So, I'll use a case study as an example. They're one of the world's largest home improvement retailers. The old way was they ran their multi-billion dollar eCommerce infrastructure on an IBM Db2 database, running in their on-prem data center. They've moved their world. They're now running, they've re-architected their application. It's now completely microservices-based, running on Cassandra, deployed 100% in Google Cloud Platform. And they did that because they wanted to be more agile. They wanted to be more flexible. It's a far more cost effective deployment model. They are all in on the cloud. And they needed a next generation backup and recovery, data protection, data management solution, which is exactly what we do. So, that's the value. Backup's not a new problem. People need to protect data and they need to be able to take better advantage of the data. >> All right, so here's the final, final question. I'm a customer watching this video.
Bottom line maybe, I'm kind of hearing all this stuff. When do I call you? What are the signals? What are the little smoke signals I see in my organization burning? When do I need to call you guys, Datos? >> You should call Datos IO anytime, if you're doing anything with development of modern applications, number one. If you're doing anything with hybrid cloud you should call us. Because you're gonna need to reevaluate your overall data management strategy it's that simple. >> All right, Peter Smails, the CMO of Datos, one of the hot companies here in Silicon Valley, out of Los Gatos, California. Of course, we're in Palo Alto at theCube Studios. I'm John Furrier. This is theCUBE conversation. Thanks for watching. (upbeat techno music)
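To make the advanced recovery ideas described above concrete, here is a deliberately simplified sketch of query-based, point-in-time, mask-aware restore logic. The record layout, field names, and the `selective_restore` helper are hypothetical illustrations of the concept, not the RecoverX API.

```python
# Illustrative sketch only -- a toy model of query-based, point-in-time restore with masking.
from datetime import datetime
from fnmatch import fnmatch

def selective_restore(backup_records, pattern, as_of, masked_fields=()):
    """Yield only the records matching a wildcard pattern (e.g. '*.john.*'),
    no newer than the requested point in time, with sensitive fields masked."""
    for rec in backup_records:
        if rec["timestamp"] > as_of:
            continue                      # point-in-time / incremental boundary
        if not fnmatch(rec["key"], pattern):
            continue                      # query-based selection
        safe = dict(rec)
        for field in masked_fields:       # policy-driven masking (e.g. PII)
            if field in safe:
                safe[field] = "****"
        yield safe

# Hypothetical backed-up records
backup_records = [
    {"key": "users.john.profile", "timestamp": datetime(2017, 11, 1), "ssn": "123-45-6789"},
    {"key": "users.jane.profile", "timestamp": datetime(2017, 11, 2), "ssn": "987-65-4321"},
]
for rec in selective_restore(backup_records, "*.john.*",
                             as_of=datetime(2017, 11, 15), masked_fields=("ssn",)):
    print(rec)   # only John's record is restored, with the SSN masked
```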

Published Date : Nov 16 2017


David McNeely, Centrify | CyberConnect 2017


 

(upbeat music) >> Narrator: Live from New York City, it's theCUBE, covering CyberConnect 2017. Brought to you by Centrify and the Institute for Critical Infrastructure Technology. >> Hey, welcome back everyone. Live here in New York is theCUBE's exclusive coverage of Centrify's CyberConnect 2017, presented by Centrify. It's an industry event that Centrify is underwriting, but it's really not a Centrify event; it's really where industry and government are coming together to talk about the best practices of architecture, how to solve the biggest crisis of our generation and the computer industry, that is, security. I am John Furrier, with my co-host Dave Vellante. Next guest: David McNeely, who is the vice president of product strategy with Centrify. Welcome to theCUBE. >> Great, thank you for having me. >> Thanks for coming on. I'm really impressed by Centrify's approach here. You're underwriting the event but it's not a Centrify commercial. >> Right >> This is about the core issues of the community coming together, and the culture of tech. >> Right. >> You are the product. You got some great props from the general on stage. You guys are foundational. What does that mean, when he said that Centrify could be a foundational element for solving this problem? >> Well, I think a lot of it has to do with, if you look at the problems that people are facing, the breaches are misusing computers in order to use your account. If your account is authorized to still gain access to a particular resource, whether that be servers or databases, somehow the software and the systems that we put in place, and even some of the policies, need to be retrofitted in order to go back and make sure that it really is a human gaining access to things, and not malware running around the network with compromised credentials. We've been spending a lot more time trying to help customers eliminate the use of passwords and try to move to stronger authentication. Most of the regulations now start talking about strong authentication, but what does that really mean? It can't just be a one-time passcode delivered to your phone. They've figured out ways to break into that. >> Certificates are being hacked, and data just came out, the source story, that certificate authorities were being compromised even before Stuxnet, before the big worm hit, in what he calls the atom bomb of malware. But this is the new trend that we are seeing: the independent credentials of a user are being compromised. With Equifax and all these breaches, all the personal information is out there. This is a growth area for the hacks, where people are actually getting compromised emails sent to them. How do you know it's not a fake account if you think it's your friend? >> Exactly. >> And that's the growth area, right? >> The biggest problem is trying to make sure that if you do allow someone to use my device here to gain access to my mail account, how do we make it stronger? How do we make sure that it really is David that is logged onto the account? If you think about it, my laptop, my iPad, my phone all authenticate and access the same email account, and if that's only protected with a password, then how good is that? How hard is it to break passwords? So we are starting to challenge a lot of base assumptions about different ways to do security, because if you look at some of the tools that the hackers have, their tooling is getting better all the time. >> So when-- go ahead, sorry, finish your thoughts.
Tools like HashCat can break passwords, like millions and millions a second. >> You're hacked, and basically out there. >> When you talk about eliminating passwords, you're talking about doing things other than just passwords, or you mean eliminating passwords? >> I mean eliminating passwords. >> So how does that work? >> The way that works is you have to have a stronger vetting process around who the person is, and this is actually going to be a challenge as people start looking at how do you vet a person. We ask them a whole bunch of questions: your mother's maiden name, where you've lived, other stuff that Equifax asked-- >> Yeah, yeah, yeah, everybody has. >> We ask you all of that information to find out, is it really you? But really the best way to do it now is going to be to go back to government-issued IDs, because they have a vetting process where they're establishing an identity for you. You've got a driver's license, we all have social security numbers, maybe a passport. That kind of information is really the only way to start making sure it really is me. This is where you start, and the next place is assigning a stronger credential. So there is a way to get a strong credential onto your mobile device. The issuance process itself generates the first key pair inside the device in a protected place that can't be compromised, because it is part of the hardware, part of the chip that runs the processes of the phone, and that starts acting as strong as a smart card. In the government they call it derived credentials. It's kind of new technology; NIST has had documentation describing how to make that work for quite some time, but actually implementing it and delivering it as a solution that can be used for authentication to other things is kind of new here. >> A big theme of your talk tomorrow is on designing this in. So with all of this infrastructure out there, I presume you can't just bolt this stuff on and spread it like peanut butter across everything, so how do we solve that problem? Is it just going to take time-- >> Well that's actually-- >> New infrastructure? Modernization? >> Dr. Ron Ross is going to be joining me tomorrow, and he is from NIST, and we will be talking with him about some of these security frameworks that they've created. There's the cybersecurity framework, and there's also other guidance that they've created, NIST 800-160, that describes how to start building security in from the very start. We actually have to back all the way up to the app developers and the operating system developers and get them to design security into the applications and also into the operating systems in such a way that you can trust the OS. Applications sitting on top of an untrusted operating system is not very good, so the applications have to be sitting on top of trusted operating systems. Then we will probably get into a little bit of the newer technology. I am starting to find a lot of our customers that move to cloud-based infrastructures starting to move their applications into containers, where there is a container around the application, and it actually is not bound so heavily to the OS. I can deploy as many of these app containers as I want and start scaling those out. >> So separate the workload from some of your infrastructure. You're kind of seeing that trend? >> Exactly, and that changes a whole lot of the way we look at security. So now your security boundary is not the machine or the computer; it's now the application container. >> You are the product strategist.
You have the keys to the kingdom at Centrify, but we also heard today that it's a moving train, this business; it's not like you can lock into someone. Dave calls it the silver bullet, and it's hard to get a silver bullet in security. How do you balance the speed of the game, the product strategy, and how do you guys deal with bringing customer solutions to the market that have an architectural scalability to them? Because that's the challenge. I am a slow enterprise, but I want to implement a product; I don't want to be obsolete by the time I roll it out. I need to have a scalable solution that can give me the headroom and flexibility. So you're bringing a lot to the table. Explain what's going on in that dynamic. >> I try as much as possible to adhere to standards before they exist and push and promote those, like on the authentication side of things. For the longest time we used LDAP and Kerberos to authenticate computers to Active Directory. Now almost all of the web app developers are using SAML or OpenID Connect or OAuth 2 as a mechanism for authenticating the applications. Just keeping up with standards like that is one of the best ways. That way the technologies and tools that we deliver just have APIs that the app developers can use and take advantage of. >> So I wanted to follow up on that, because I was going to ask you: isn't there a sort of organizational friction in that you've got companies, if you have to go back to the developers and the guys who are writing code around the OS, there's an incentive from up top to go for fast profits. Get to market as soon as you can. If I understand what you just said, if you are able to use open standards, things like OAuth, that maybe could accelerate your time to market. Help me square that circle. Is there an inherent conflict between the desire to get short term profits versus designing in good security? >> It does take a little bit of time to design, build, and deliver products, but as we moved to cloud-based infrastructure we are able to more rapidly deploy and release features. Part of having a cloud service, we update that every month. Every 30 days we have a new version of that rolling out that's got new capabilities in it. Part of it is adopting an agile delivery model, but everything we deliver also has an API, so when we go back and talk to the customers and the developers at the customer organizations, we have a rich set of APIs that the cloud service exposes. If they uncover a use case or a situation that requires something new or different that we don't have, then that's when I go back to the product managers and engineering teams and talk about adding that new capability into the cloud service, and the monthly cadence helps me deliver that more rapidly to the market. >> So as you look at the bell curve in the client base, what's the shape of those that are kind of on the cutting edge and doing, by definition, I shouldn't use the term cutting edge, but on the path to designing in as you would prescribe? What's it look like? Is it 2080? 199?
So that's the one that's a little bit newer that fewer of my customers are using but most everybody wants to adopt. If you think about some of the attacks that have taken place, if I can get a piece of email to you, and you think it's me and you open up the attachment, at that point you are now infected and the malware that's on your machine has the ability to use your account to start moving around and authenticating the things that you are authorized to get to. So if I can send that piece of email and accomplish that, I might target a system administrator or system admins and go try to use their account because it's already authorized to go long onto the database servers, which is what I'm trying to get to. Now if we could flip it say well, yeah. He's a database admin but if he doesn't have permissions to go log onto anything right now and he has to make a request then the malware can't make the request and can't get the approval of the manager in order to go gain access to the database. >> Now, again, I want to explore the organizational friction. Does that slow down the organization's ability to conduct business and will it be pushed back from the user base or can you make that transparent? >> It does slow things down. We're talking about process-- >> That's what it is. It's a choice that organizations have to make if you care about the long term health of your company, your brand, your revenues or do you want to go for the short term profit? >> That is one of the biggest challenges that we describe in the software world as technical debt. Some IT organizations may as well. It's just the way things happen in the process by which people adhere to things. We find all to often that people will use the password vault for example and go check out the administrator password or their Dash-A account. It's authorized to log on to any Windows computer in the entire network that has an admin. And if they check it out, and they get to use it all day long, like okay did you put it in Clipboard? Malware knows how to get to your clipboard. Did you put it in a notepad document stored on your desktop? Guess what? Malware knows how to get to that. So now we've got a system might which people might check out a password and Malware can get to that password and use it for the whole day. Maybe at the end of the day the password vault can rotate the password so that it is not long lived. The process is what's wrong there. We allow humans to continue to do things in a bad way just because it's easy. >> The human error is a huge part in this. Administrators have their own identity. Systems have a big problem. We are with David McNeely, the vice president of product strategy with Centrify. I've got to get your take on Jim Ruth's, the chief security officer for Etna that was on the stage, great presentation. He's really talking about the cutting edge things that he's doing unconventionally he says, but it's the only way for him to attack the problem. He did do a shout out for Centrify. Congratulations on that. He was getting at a whole new way to reimagine security and he brought up that civilizations crumble when you lose trust. Huge issues. How would you guys seeing that help you guys solve problems for your customers? Is Etna a tell-sign for which direction to go? >> Absolutely, I mean if you think about problem we just described here the SysAdmin now needs to make a workflow style request to gain access to a machine, the problem is that takes time. It involves humans and process change. 
It would be a whole lot nicer, and we've already been delivering solutions that do this: machine learning, behavior-based access controls. We tied it into our multifactor authentication system. The whole idea was to get the computers to make a decision based on behavior. Is it really David at the keyboard trying to gain access to a target application or a server? The machine can learn by patterns and by looking at my historical access to go determine, does that look, and smell, and feel like David? >> The machine learning, for example. >> Right, and that's a huge part of it, right? Because if we can get the computers to make these decisions automatically, then we eliminate so much time that is being chewed up by humans and putting things into a queue and then waiting for somebody to investigate. >> What's the impact of machine learning on security, in your opinion? Is it massive in the sense that it's going to be significant, and what areas is it attacking? The speed of the solution? The amount of data it can go through? Unique domain expertise of the applications? Where is the a-ha moment for the machine learning value proposition? >> It's really going to help us enormously in making more intelligent decisions. If you think about access control systems, they all make a decision based on, did you supply the correct user ID and password, or credential, and did you have access to whatever that resource is? But we only looked at two things: the authentication and the access policy. These behavior-based systems look at a lot of other things. He mentioned 60 different attributes that they're looking at. And all of these attributes, we're looking at, where's David's iPad? What's the location of my laptop, which would be in the room upstairs; my phone is nearby; and making sure that somebody is not trying to use my account from California, because there's no way I could get from here to California at a rapid pace. >> Final question for you while we have a couple seconds left here. What is the value proposition for Centrify? What's the bottom line of the product strategy, in a nutshell? >> Well, kind of a tough one there. >> Identity? Stop the Breach is the tagline. Is it the identity? Is it the tech? Is it the workflow? >> Identity and access control. At the end of the day we are trying to provide identity and access controls around how a user accesses an application, how we access servers, privileged accounts, how you would access your mobile device, and how your mobile device accesses applications. Basically, if you think about what defines an organization: identity, the humans that work at an organization, and your rights to go gain access to applications, is what links everything together, because as you start adopting cloud services, as we've adopted mobile devices, there's no perimeter anymore, really, for the company. Identity makes up the definition and the boundary of the organization. >> Alright, David McNeely, vice president of product strategy, Centrify. More live coverage here in New York City from theCUBE, at CyberConnect 2017, the inaugural event. Cube coverage continues after this short break. (upbeat music)
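A toy illustration of the behavior-based scoring idea discussed above: the hedged sketch below combines just a few signals (impossible travel, unknown device, unusual hour) into a risk score and a step-up decision. Real systems of the kind described weigh dozens of attributes with learned models; the thresholds and attributes here are assumptions for illustration only.

```python
# Illustrative sketch only -- a toy behavior-based access decision, not a production model.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

def risk_score(last_login, attempt):
    """Combine a few behavioral signals into a 0..1 risk score.
    `last_login` and `attempt` are dicts with time, lat/lon, device, and hour fields."""
    hours = (attempt["time"] - last_login["time"]).total_seconds() / 3600
    speed = haversine_km(last_login["lat"], last_login["lon"],
                         attempt["lat"], attempt["lon"]) / max(hours, 0.01)
    score = 0.0
    if speed > 900:                               # faster than a flight: "impossible travel"
        score += 0.6
    if attempt["device_id"] not in last_login["known_devices"]:
        score += 0.3                              # unrecognized device
    if attempt["hour"] not in range(7, 20):       # outside the user's usual working hours
        score += 0.1
    return min(score, 1.0)

def decide(score):
    """Low risk passes silently, medium risk steps up to MFA, high risk is denied."""
    return "allow" if score < 0.3 else "require-mfa" if score < 0.7 else "deny"
```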

Published Date : Nov 6 2017


Mark Grover & Jennifer Wu | Spark Summit 2017


 

>> Announcer: Live from San Francisco, it's theCUBE, covering Spark Summit 2017, brought to you by Databricks. >> Hi, we're back here where theCUBE is live, and I didn't even know it. Welcome, we're at Spark Summit 2017, having so much fun talking to our guests I didn't know the camera was on. We are doing a talk with Cloudera, a couple of experts that we have here. First is Mark Grover, who's a software engineer and an author; he wrote the book "Hadoop Application Architectures." Mark, welcome to the show. >> Mark: Thank you very much. Glad to be here. >> And just to his left we also have Jennifer Wu, and Jennifer's director of product management at Cloudera. Did I get that right? >> That's right. I'm happy to be here, too. >> Alright, great to have you. Why don't we get started talking a little bit more about what Cloudera is introducing at the show? I saw a booth over here. Mark, do you want to get started? >> Mark: Yeah, there are two exciting things that we've launched at least recently. There's Cloudera Altus, which is for transient workloads and being able to do ETL-like workloads, and Jennifer will be happy to talk more about that. And then there's Cloudera Data Science Workbench, which is this tool that allows folks to use data science at scale. So, get away from doing data science in silos on your personal laptops, and do it in a secure environment on cloud. >> Alright, well, let's jump into Data Science Workbench first. Tell me a little bit more about that; you mentioned it's for exploratory data science. So give us a little more detail on what it does. >> Yeah, absolutely. So, there was a private beta for Cloudera Data Science Workbench earlier in the year and then it went GA a few months ago. And it's, like you said, an exploratory data science tool that brings data science to the masses within an enterprise. Previously it was this dichotomy, right? As a data scientist, I want to have the latest and greatest tools. I want to use the latest version of Python, the latest notebook kernel, and I want to be able to use R and Python to crunch this data and run my models in machine learning. However, on the other side of this dichotomy is the IT organization, which wants to make sure that all tools are compliant, that your clusters are secure, and that your data is not going into places that are not secured by state-of-the-art security solutions, like Kerberos for example, right? And of course if the data scientists are putting the data on their laptops and taking the laptop around wherever they go, that's not really a solution. So, that was one problem. And the other one was, if you were to bring them all together in the same solution, data scientists have different requirements. One may want to use Python 2.6. Another one may want to use 3.2, right? And so Cloudera Data Science Workbench is a new product that allows data scientists to visualize and do machine learning through this very nice notebook-like interface, share their work with the rest of their colleagues in the organization, but also allows you to keep your clusters secure. So it allows you to run against a Kerberized cluster, allows single sign-on to your web interface to Data Science Workbench, and provides a really nice developer experience in the sense that my workflow and my tools and my version of Python do not conflict with Jennifer's version of Python.
We all have our own Docker and Kubernetes-based infrastructure that makes sure that we use the packages that we need, and they don't interfere with each other. >> We're going to go to Jennifer on Altus in just a few minutes, but George, first I'll give you a chance to maybe dig in on Data Science Workbench. >> Two questions on the data science side. Some of the really toughest nuts to crack have been sort of a common environment for the collaborators, but also the ability to operationalize the models once you've sort of agreed on them, and manage the lifecycle across teams, you know? Like, challenger/champion, promote something, or even before that doing the A/B testing, and then sort of what's in production is typically in a different language from what, you know, it was designed in, and sort of integrating it with the apps. Where is that on the road map? 'Cause no one really has a good answer for that. >> Yeah, that's an excellent question. In general I think it's the problem to crack these days: how do you productionalize something that was written by a data scientist in a notebook-like system onto the production cluster, right? And for the part where the data scientist works in a different language than the language that's in production, the best I can say right now is to actually have someone rewrite that, rewrite it in the language you're going to run in production, right? I don't see that as the more common part. I think the more widespread problem is, even when the language is the production language, how do you take the part that the data scientist wrote, the model or whatever that would be, onto a production cluster? And so, Data Science Workbench in particular runs on the same cluster that is being managed by Cloudera Manager, right? So this is a tool that you install, but that is available to you as a web server, as a web interface, and so that allows you to move your development machine learning algorithms from your Data Science Workbench to production much more easily, because it's all running on the same hardware and the same systems. There's no separate Cloudera Manager that you have to use to manage the workbench compared to your actual cluster. >> Okay. A tangential question, but one of the difficulties of doing machine learning is finding all the training data and sort of the data science expertise to sit with the domain expert to, you know, figure out a proper model and features, things like that. One of the things we've seen so far from the cloud vendors is they take their huge datasets in terms of voice, you know, images. They do the natural language understanding, speech to text, you know, facial recognition, 'cause they have such huge datasets they can train on. We're hearing noises that they're going to take that down to the more mundane statistical kind of machine learning algorithms, so that you wouldn't be, like, here's an algorithm to do churn, you know, go to town, but that they might have something that's already kind of pre-populated that you would just customize. Is that something that you guys would tackle, too? >> I can't speak for the road map in that sense, but I think some of that problem needs to be tackled by projects like Spark, for example. So I think as the stack matures, it's going to raise the level of abstraction as time goes on. And I think whatever benefits the Spark ecosystem has will come directly to distributions like Cloudera. >> George: That's interesting.
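One concrete, assumed illustration of the notebook-to-production handoff discussed above: a data scientist can persist a fitted Spark ML pipeline to a shared location, and the production job can reload it on the same cluster. The paths, columns, and churn example are hypothetical; this is a generic Spark ML sketch, not a prescribed Cloudera workflow.

```python
# Illustrative sketch only -- persisting a model from an exploratory session
# and reloading it in a production scoring job on the same cluster.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("churn-train").getOrCreate()
train = spark.read.parquet("hdfs:///data/churn/train")   # hypothetical dataset

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["tenure", "monthly_spend"], outputCol="features"),
    LogisticRegression(labelCol="churned", featuresCol="features"),
])
model = pipeline.fit(train)
model.write().overwrite().save("hdfs:///models/churn/v1")   # shared, versioned location

# Later, in the production job (same cluster, same language):
scoring = PipelineModel.load("hdfs:///models/churn/v1")
scoring.transform(spark.read.parquet("hdfs:///data/churn/today")) \
       .select("customer_id", "prediction") \
       .write.mode("overwrite").parquet("hdfs:///data/churn/scores")
```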
>> Yeah. >> Okay. >> Alright, well, let's go to Jennifer now and talk about Altus a little bit. Now, you've been on theCUBE before, right? >> I have not. >> Okay, well, we're familiar with your work. Tell us again, you're the product manager for Altus. What does it do, and what was the motivation to build it? >> Yeah, we're really excited about Cloudera Altus. So, we released Cloudera Altus in its first GA form in April, and we launched Cloudera Altus in a public environment at Strata London about two weeks ago, so we're really excited about this and we are very excited to now open this up to all of the customer base. And what it is is a platform-as-a-service offering designed to leverage, basically, the agility and the scale of cloud, and make a very easy-to-use type of experience to expose Cloudera capacity, in particular for data engineering type of workloads. So the end user will be able to very easily, in a very agile manner, get data engineering capacity on Cloudera in the cloud, and they'll be able to do things like ETL, large-scale data processing, and productionized machine learning workflows in the cloud, with this new data-engineering-as-a-service experience. And we wanted to abstract away the cloud and cluster operations, and make the end user experience very easy. So, jobs and workloads are first-class objects. You can do things like submit jobs, clone jobs, terminate jobs, troubleshoot jobs. We wanted to make this very, very easy for the data engineering end user. >> It does sound like you've sort of abstracted away a lot of the infrastructure that you would associate with on-prem, and sort of almost made it, like, programmable and invisible. But, um, I guess one of my questions is, when you put it in a cloud environment... when you're on-prem you have a certain set of competitors, which is kind of restrictive, because you are the standalone platform. But when you go to the cloud, someone might say, "I want to use Redshift on Amazon," or Snowflake, you know, as the MPP SQL database at the end of a pipeline. And I'm just using those as examples. There's, you know, dozens, hundreds, thousands of other services to choose from. >> Yes. >> What happens to the integrity of that platform if someone carves off one piece? >> Right. So, interoperability and a unified data pipeline are very important to us, so we want to make sure that we can still service the entire data pipeline, all the way from ingest and data processing to analytics. So our team has 24 different open source components that we deliver in the CDH distribution, and we have committers across the entire stack. We know the application, and we want to make sure that everything's interoperable, no matter how you deploy the cluster. So if you deploy data engineering clusters through Cloudera Altus, but you deployed Impala clusters for data marts in the cloud through Cloudera Director or through any other format, we want all these clusters to be interoperable, and we've taken great pains in order to make everything work together well.
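Going back to Jennifer's "jobs as first-class objects" description: the snippet below is a purely hypothetical pseudo-client, not the actual Altus API or CLI, meant only to illustrate what it means to submit, clone, and terminate jobs as named objects rather than managing clusters directly. Every class and method name here is invented.

```python
# Hypothetical illustration only -- these names are invented, not the Cloudera Altus SDK.
class JobService:
    def __init__(self):
        self.jobs = {}

    def submit(self, name, job_type, script, args):
        self.jobs[name] = {"type": job_type, "script": script,
                           "args": list(args), "state": "RUNNING"}
        return name

    def clone(self, name, new_name, args=None):
        spec = dict(self.jobs[name])          # reuse the existing job definition
        if args is not None:
            spec["args"] = list(args)
        spec["state"] = "RUNNING"
        self.jobs[new_name] = spec
        return new_name

    def terminate(self, name):
        self.jobs[name]["state"] = "TERMINATED"


svc = JobService()
svc.submit("daily-etl", "spark", "etl.py", ["--date", "2017-06-06"])
# Re-run yesterday's pipeline for a different date without touching cluster configuration.
svc.clone("daily-etl", "backfill-etl", args=["--date", "2017-06-05"])
svc.terminate("backfill-etl")
```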
>> George: Okay. So how do Altus and Data Science Workbench interoperate with Spark? Maybe start with... >> You want to go first with Altus? >> Sure. So, in terms of interoperability, we focus on things like making sure there are no data silos, so that the data in your entire data lake can be consumed by the different components in our system, the different compute engines and different tools. So if you're processing data, you can also look at this data and visualize it through Data Science Workbench. So after you do data ingestion and data processing, you can use any of the other analytic tools, and this includes Data Science Workbench. >> Right, and Data Science Workbench runs, for example, with the latest version of Spark you could pick, the currently latest released version of Spark, Spark 2.1. Spark 2.2 is being onboarded, of course, and will be integrated soon after its release. For example, you could use Data Science Workbench with your flavor of the Spark 2 versions and run PySpark or Scala jobs on this notebook-like interface, and be able to share your work. And because you're using Spark, underneath the hood it uses YARN for resource management; Data Science Workbench itself uses Docker for configuration management and Kubernetes for resource-managing those Docker containers. >> What would be, if you had to describe it, sort of the edge conditions and the sweet spot of the application? I mean, you talked about data engineering. One thing we were talking to Matei Zaharia and Reynold Xin about, and Ali Ghodsi as well, was that if you put Spark on a database, or at least a, you know, sophisticated storage manager, like Kudu, all of a sudden there's a whole new class of jobs or applications that open up. Have you guys thought about what that might look like in the future, and what new applications you would tackle? >> I think a lot of that benefit, for example, could be coming from the underlying storage engine. So let's take Spark on Kudu, for example. The inherent characteristics of Kudu today allow you to do updates without having to either deal with the complexity of something like HBase, or the crappy performance of dealing with HDFS compactions, right? So the sweet spot comes from Kudu's capabilities. Of course it doesn't support transactions or anything like that today, but imagine putting something like Spark on top and being able to use the machine learning libraries. We have sometimes been limited in the machine learning algorithms that we have implemented in Spark by the storage system, and, for example, new machine learning algorithms, or the existing ones, could be rewritten to make use of the update features in Kudu. And so that allows applications which were previously siloed, one application that ran off of HDFS and another that ran off of HBase that you then had to correlate, to just be one single application that can train and then also use the trained model to make decisions on the new transactions that come in.
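For readers who want to see what "Spark on Kudu" looks like in practice, here is a minimal read-side sketch using the kudu-spark connector; the Kudu master address and table name are illustrative, and the connector package has to be available on the Spark classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kudu-scan").getOrCreate()

# Read a Kudu table as a DataFrame through the kudu-spark data source.
events = (spark.read
          .format("org.apache.kudu.spark.kudu")
          .option("kudu.master", "kudu-master-1:7051")       # illustrative address
          .option("kudu.table", "impala::default.events")    # illustrative table name
          .load())

# An analytic scan over a store that also supports row-level updates,
# which is the combination being discussed above.
events.groupBy("event_type").count().orderBy("count", ascending=False).show()
```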
>> So that's very much within the sort of scope of imagination, or scope. That's part of sort of the ultimate plan? >> Mark: I think it's definitely conceivable now, yeah. >> Okay. >> We're up against a hard break coming up in just a minute, so you each get a 30-second answer here, and it's the same question. You've been here for a day and a half now. What's the most surprising thing you've learned that you think should be shared more broadly with the Spark community? Let's start with you. >> I think one of the great things that's happening in Spark today is, people have been complaining about latency for a long time, and if you saw the keynote yesterday, you would see that Spark is making forays into reducing that latency. If you are interested in Spark, or using Spark, it's very exciting news. You should keep tabs on it. We hope to deliver lower latency as a community sooner. >> How long is one millisecond? (Mark laughs) >> Yeah, I'm largely focused on cloud infrastructure, and I found here at the conference that many, many people are very much prepared to actually start taking on more POCs and showing more interest in cloud, and the response to all of this and to Altus has been very encouraging. >> Great. Well, Jennifer, Mark, thank you so much for spending some time here on theCUBE with us today. We're going to come by your booth and chat a little bit more later. It's some interesting stuff. And thank you all for watching theCUBE today here at Spark Summit 2017, and thanks to Cloudera for bringing us these two experts. And thank you for watching. We'll see you again in just a few minutes with our next interview.

Published Date : Jun 7 2017

Jack Norris - Strata Conference 2012 - theCUBE


 

>> Hi everybody. We're back. This is Dave Vellante from Wikibon.org. We're live at Strata in Santa Clara, California. This is SiliconANGLE TV's continuous coverage of the Strata conference. O'Reilly Media is a great partner of ours, and thanks to them for allowing us to be here. We've been going all week, cause it's day three for us. I'm here with Jeff Kelly, Wikibon's lead big data analyst, and we're here with Jack Norris, who's the VP of marketing at MapR. Jack, welcome to theCUBE. >> Thank you, Dave. >> Thanks very much for coming on. And you know, we've been going all week. You guys are a great sponsor of ours. Thank you for the support. We really appreciate it. How's the show going for you? >> Great. A lot of attention, a lot of focus, a lot of discussion about Hadoop and big data. >> Yeah. So you guys are getting a lot of traffic. I mean, I hear there are 2,500 people here, up from 1,400 last year. So that's... >> Yeah, we've had like five, six people deep in the, in the booth. So I think there's a lot of, a lot of interest, it's interesting. >> You know, when we were here last year, when you looked at the, the infrastructure and the competitive landscape, there wasn't a lot going on, and in just a very short time, that's completely changed. And you guys have had your hand in that. So, so that's good. Competition is a good thing, right? And, and obviously customers want choice. But so we want to talk about that a little bit. We want to talk about MapR, the kind of problems you're solving. So why don't we start there? What is MapR all about? You've got your own distribution of, of, of enterprise Hadoop. You make Hadoop enterprise-ready? Let's start there. >> Okay. Yeah, I mean, we invested heavily in creating an alternative distribution, one that took the best of the open source community with the best of the MapR innovations, and really it's, it's about making Hadoop more applicable: broader use cases, more mission-critical support, you know, being able to sit in and work in a lights-out data center environment. >> Okay. So what was the problem that you set out to solve? Why, why do, why do we need another distribution of Hadoop? Let me ask it that way. Get nice and close to. >> So there, there are some just big issues with, with Hadoop. >> One of those issues, let's talk about that. >> There are some ease-of-use issues. There are some deep dependability issues. There are some, some performance issues. So, you know, let's take those in order. Right now, if you look at some of the distributions, Apache Hadoop, great technology, but it requires a programmer, right? To get access to the data, it's through the Hadoop API; you can't really see the data. So there's a lot of focus on, you know, what do I do once the data's in there: opening that up, providing full file-based access, right? So I can look at it and treat it like enterprise storage, see the data, use my standard tools, standard commands, you know, drag and drop from a file browser. You can do that with MapR. You can't do that with other distributions. >> You're talking about mounting HDFS as NFS, for example? >> Correct. And then, and then just the underlying storage services. The fact that it's append-only instead of full random read-write, you know, causes some, some issues. So, you know, that's some of the, the ease-of-use features. There's a whole lot we could discuss there. Big picture for reliability, dependability: there's a single point of failure, multiple single points of failure, within Hadoop. So you risk data loss. So people have looked at Hadoop traditionally as batch-oriented, a scratchpad, right? We were out to solve that, right? We want to make sure that you can use it for mission-critical data, that you don't have a risk of data loss, that you've got full high availability, and you've got the full data protection in terms of snapshots and mirroring that you would expect with enterprise products.
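To make the file-based access Jack describes concrete: once the cluster is exposed over NFS, ordinary tools and libraries treat cluster data like any other files. A minimal sketch; the mount point and file paths are illustrative.

```python
import csv

# With the cluster mounted over NFS (here at a hypothetical /mapr/my.cluster),
# standard file APIs read cluster-resident data directly -- no Hadoop-specific client.
path = "/mapr/my.cluster/projects/web/clicks.csv"

with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(f"{len(rows)} rows")

# Random-write/append semantics, the contrast with append-only HDFS discussed above.
with open(path, "a", newline="") as f:
    csv.writer(f).writerow(["2012-02-29", "homepage", "42"])
```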
>> It gets back to when you guys were, you know, thinking about doing this. I'm not even sure you were at the company at the time, but your DNA was there and you're familiar with it. So you guys saw this big data movement, you saw this Hadoop movement, and you said, okay, this is cool, it's going to be big, and it's gonna take a long time for the community to fix all these problems. We can fix them, so let's go do that. Is that the general discussion? >> Yeah. You know, I think, I think what's different about this is that this is the first open source package, the first open source project, that's created a market. If you look at the other open source, you know, Linux, MySQL, et cetera, it was really late in the life cycle of a product. Everyone knew what the features were. It was about, you know, giving an alternative choice, a better Unix. Here, the focus is on innovation, and our founders, you know, have deep enterprise backgrounds. Our CTO was at Google, in charge of Bigtable, understands MapReduce at scale, and spent time as chief software architect at Spinnaker, which was kind of the fastest clustered NAS on the planet. So we recognized that the underlying layers of Hadoop needed some rearchitecture and needed some deep investment, and to do that effectively and do that quickly required a whole lot of focus. And we thought that was the best way to go to market. >> Talk about the early validation from customers. Obviously you guys didn't just do this in a vacuum, I presume. So you went out and talked to some customers? >> Yeah. We had all sorts of conversations with customers while we were in stealth mode. We were probably the loudest stealth... >> As you were nodding. And I mean, what were they telling you at the time? "Yeah, please go do this"? >> What we address weren't secrets. There have been JIRAs open for four or five years on, on these issues. >> Yeah. But at the same time, Jack, you've got this, you got this purist community out there that says, I don't want to, I don't want to rip out HDFS. You know, I want it to be pure. What'd you, what'd you say to those guys? Do you just say, okay, thank you, we, we understand you're not a prospect? >> And I think, I think that, you know, Hadoop has a huge amount of momentum. And I think a lot of that momentum is that there isn't any risk to adopting Hadoop, right? It's not like the fractured NoSQL market, where there are 122 different entrants and which one's going to win? Hadoop's got the ecosystem. So when you say pure, it's about the APIs. It's about making sure that if I create a MapReduce job, it's going to run on Apache, it's going to run on MapR, it's going to run on the other distributions. That's where I think the heat and the focus is. Now, to do that, you also have to have innovation occurring up and down the stack that, that provides choice and alternatives. >> So when I'm talking about purists... I don't, I agree with you on the whole lock-in thing, which is the elephant in the room here. People will worry about lock-in. >> Pun intended. >> No, no, but good one, good catch.
But so, but you're basically saying, hey, we're no more locked in than Cloudera. Right? I mean, they've got their own... >> Actually, I think we're less, because it's so easy to get data in and out with our NFS. So there's probably less. >> So, and I'm gonna come back to that. But so for instance, when I, when I say purists, I mean some users and ISVs, some guys we've had on here. We had Abhi Mehta from Tresata on the other day, for instance. He's one who said, I just don't have time to mess with that stuff and figure out all that API integration. I mean, there are people out there that just don't want to go that route. Okay. But, but you're saying, I'm, I'm inferring, there's plenty who do, right? >> And by the API route, I want to make sure I understand what you're saying. You talked about, hey, it's all about the API integration. >> It's not. It's about the APIs being consistent, a hundred percent compatible, right? So if I, you know, write a program that's, that's going after HDFS and the HDFS API, I want to make sure that that'll run on other distributions, right? >> And that's your promise. Yeah. Okay. All right. So now, where I was going with this was, again, there are some purists who say, oh, I just don't want to mess with all that. Now let's talk about what that means, to mess with all that. So comScore was a big, high-profile case study for you guys. They, they were a Cloudera customer. They basically, in my understanding, in a couple of days migrated from Cloudera to MapR. And the impetus was... let's talk about that. Why'd they do that? >> Performance, data protection, ease of use. >> License fee issues. There were some license issues there as well, right? The, the, your, your maintenance pricing was more attractive. Is that true? Or... >> It was really more about price, performance, and reliability. And, you know, they tested our stuff, it worked real well in a test environment, and they put it in a production environment. They didn't actually tell all their users; they had one guy debug the software for half a day because something was wrong: it finished so quickly. >> So, so it took them a couple of days to migrate and then, boom. >> Boom. And they've, they handle about 30 billion objects a day. So there, you know, the use of that really high-performance support for, for streaming data flows... you know, they're talking about, they're doing forecasts and insights into web behavior, and, you know, the earlier they can do that, the better off they are. >> Great. >> So talk about the implications of, of your approach in terms of the customer base. So I'm, I'm imagining that your customers are more, perhaps, advanced than a lot of your typical Hadoop users who are just getting started tinkering with Hadoop. Is it fair to say, you know, your customers know what they want, and they want performance and they want it now, and they're a little more advanced than perhaps some of the typical early adopters? >> We've got people who go to our website and download the free version, and some of them are just starting off and getting used to Hadoop, but we did specifically target those very experienced Hadoop users that, you know, were kind of, you know, stubbing their toes on, on the issues. And so they're very receptive to the message of: we've made it faster, we've made it more reliable, you know, we've, we've added a lot of ease of use to, to Hadoop.
>> So I found this, let me interrupt, go back to what I was saying before. I found this comment online from Mike Brown, comScore. Skipio, I presume you mean. He said comScore's MapR Direct Access NFS feature, which exposes Hadoop Distributed File System data as NFS files that can then be easily mounted, modified, or overwritten. So that's a data access simplification. He also said, we could capitalize on the purchase of MapR with an annual maintenance charge versus a yearly cost per node. NFS allowed our enterprise systems to easily access the data in the cluster. So does that make sense to you, that annual maintenance charge versus yearly cost per node? I didn't get that. >> Oh, I think he's talking about how some, some organizations prefer to do a perpetual license versus a subscription model. >> Oh, okay. So the traditional way of licensing software. >> And the fact that we can do that basically reinforces that we've really invested in, have kind of a, a product orientation, you know, rather than just services on top of, of some open source. >> Okay. So you go in, you license it, and then, yeah, perpetual license. >> And you can also start with the free edition that has all the performance and NFS support, kick the tires... >> Before you buy it. Sorry. Sorry, Jeff. Sorry to interrupt. >> No, no problem at all. So another topic of a lot of interest is security, making Hadoop enterprise-ready. One of the pillars there is security, making sure of access controls, for instance. Let's talk about how you guys approach that, and maybe how you differentiate from some of the other vendors out there. >> Full Kerberos support. We link in to enterprise standards for access, LDAP, et cetera. We leverage the Linux PAM security, and we also provide volume control. So, you know, right now in Hadoop, in Apache Hadoop and other distributions, you put policies at the file level or at the entire cluster. And we see many organizations having separate physical clusters because of that limitation, right? And we provide volumes. So you can define a volume, and on that volume control access, administrative privileges, data protection class, and, you know, in a sense kind of segregate that content. And that provides a lot of, a lot of control and a lot more, you know, security and protection and separation of data. >> Is that scenario, the comScore scenario, common, where somebody's moving off an existing distribution onto MapR? Or are you more seeing demand from new customers that are saying, hey, what's this big data thing, I really want to get into it? How's it shaking out right now? >> There's this huge pent-up demand for these features, and we're seeing a lot of people that have run on other distributions switch to MapR. >> A little bit of everything. How about, can you talk a little bit about your, your channel, your go-to-market strategy, maybe even some of your ecosystem and partnerships, in the little time left? >> Sure. So EMC is a big partner. The EMC Greenplum MR Edition is basically MapR; you can start with any of our editions and upgrade to that Greenplum edition with just a license key, and that gives you worldwide service and support. It's been a great partnership. >> We hear about a lot of proof of concepts out there. >> Yeah. And then it just hit the news today about EMC's distribution, the MR distribution, being available with Cisco's UCS gear. So now that's further expanded the, the footprint that we have.
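Stepping back to the volume-based separation Jack described a moment ago, the sketch below is a purely hypothetical Python illustration of the idea, not MapR's actual CLI or API: a volume is the unit of policy, so access control, quotas, snapshots, and mirroring attach to each volume rather than to a whole cluster.

```python
# Hypothetical sketch only -- it models the concept of per-volume policies,
# not any real command or API.
from dataclasses import dataclass, field

@dataclass
class Volume:
    name: str
    mount_path: str
    read_users: set = field(default_factory=set)
    write_users: set = field(default_factory=set)
    quota_gb: int = 0
    snapshot_schedule: str = "none"
    mirror_target: str = ""

finance = Volume("finance-raw", "/finance/raw",
                 read_users={"analytics"}, write_users={"etl"},
                 quota_gb=500, snapshot_schedule="hourly",
                 mirror_target="dr-cluster:/finance/raw")

web_logs = Volume("web-logs", "/logs/web",
                  read_users={"analytics", "ops"}, write_users={"ingest"},
                  quota_gb=2000, snapshot_schedule="daily")

# Two datasets with different policies can share one physical cluster
# instead of requiring separate clusters, which is the point made above.
for v in (finance, web_logs):
    print(v.name, v.mount_path, v.snapshot_schedule)
```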
>> Okay. So that's the EMC relationship. Anything else that you can share with us? >> We have other announcements coming out and... >> Then you want to pre-announce on theCUBE? >> Oops. Did I let that slip? >> It's live, so be careful. And so, in terms of your, your channel strategy, are you guys mostly selling direct, indirect, a combination? >> It, it's kind of an indirect model through these, these large partners, with a direct assist. >> Yeah. Okay. So you guys come in and help evangelize. Yep. Excellent. All right. Do you have anything else before we've got to roll here? >> Yeah, I did wonder if you could talk a little bit about... you mentioned EMC Greenplum, so there's a lot of talk about the data warehouse market, the MPP data warehouses, versus Hadoop. Based on that relationship, I'm assuming that MapR thinks, well, they're certainly complementary. Can you just touch on that? And, you know, as opposed to some who think, well, Hadoop is going to be the platform where we go... >> Well, th-there's just, I mean, if you look at the typical organization, they're just really trying to get their, excuse me, their arms around a lot of this machine-generated content, this, you know, unstructured data that's just growing like wildfire. So there's a lot of Hadoop-specific use cases that are being rolled out. There are also kind of data lakes, data oceans, whatever you want to call them: large pools where that information is then being extracted and loaded into data warehouses for further analysis. And I think the big pivot there is, if it's well understood what the issue is, you define the schema, and then there's a whole host of, of data warehouse applications out there that can be deployed. But there are many things where you don't really understand that yet, and having Hadoop, where you don't need to define a schema, is a, is a big value. >> Jack, I'm sorry, we have to go, we're running a couple of minutes behind. Thank you very much for coming on theCUBE. Great story. Good luck with everything. And it sounds like things are really going well and the market's heating up and you're in the right place at the right time. So thank you again. Thank you to Jeff. And we'll be right back, everybody, to the Strata conference, live in Santa Clara, California, right after this word from our...

Published Date : Apr 27 2012
