Haseeb Budhani, Rafay & Kevin Coleman, AWS | AWS Summit New York 2022
(gentle music) (upbeat music) (crowd chattering) >> Welcome back to The City That Never Sleeps. Lisa Martin and John Furrier in New York City for AWS Summit '22 with about 10 to 12,000 of our friends. And we've got two more friends joining us here today. We're going to be talking with Haseeb Budhani, one of our alumni, co-founder and CEO of Rafay Systems, and Kevin Coleman, senior manager of Go-to-Market for EKS at AWS. Guys, thank you so much for joining us today. >> Thank you very much for having us. Excited to be here. >> Isn't it great to be back at an in-person event with 10, 12,000 people? >> Yes. There are a lot of people here. This is packed. >> A lot of energy here. So, Haseeb, we've got to start with you. Your T-shirt says it all. Don't hate k8s. (Kevin giggles) Talk to us about some of the trends, from a Kubernetes perspective, that you're seeing, and then Kevin will give his follow-up. >> Yeah. >> Yeah, absolutely. So, I think the biggest trend I'm seeing on the enterprise side is that enterprises are forming platform organizations to make Kubernetes a practice across the enterprise. So it used to be that a BU would say, "I need Kubernetes. I have some DevOps engineers, let me just do this myself." And the next one would do the same, and then the next one would do the same. And that's not practical, long term, for an enterprise. And this is now becoming a consolidated effort, which, I think, is great. It speaks to the power of Kubernetes, because it's becoming so important to the enterprise. But that also puts on pressure, because what the platform team has to solve for now is they have to find this fine line between automation and governance, right? I mean, the developers, you know, they don't really care about governance. Just give me stuff, I need compute, I'm going to go. But then the platform organization has to think about, how is this going to play for the enterprise across the board?
So that combination of automation and governance is where we are finding, frankly, a lot of success in making enterprise platform teams successful. I think that's a really new thing to me. It's something that's changed in the last six months, I would say, in the industry. I don't know if, Kevin, if you agree with that or not, but that's what I'm seeing. >> Yeah, definitely agree with that. We see a ton of customers in EKS who are building these new platforms using Kubernetes. The term that we hear a lot of customers use is standardization. So they've got various ways that they're deploying applications, whether it's on-prem or in the cloud, in region. And they're really trying to standardize the way they deploy applications. And Kubernetes is really that compute substrate that they're standardizing on. >> Kevin, talk about the relationship with Rafay Systems that you have and why you're here together. And two, second part of that question, why is EKS kicking ass so much? (Haseeb and Kevin laughing) All right, go ahead. First one, your relationship. Second one, EKS is doing pretty well. >> Yep, yep, yep. (Lisa laughing) So yeah, we work closely with Rafay, Rafay, excuse me. A lot of joint customer wins with Haseeb and Co, so they're doing great work with EKS customers and, yeah, love the partnership there. In terms of why EKS is doing so well, a number of reasons, I think. Number one, EKS is vanilla, upstream, open-source Kubernetes. So customers want to use that open-source technology, that open-source Kubernetes, and they come to AWS to get it in a managed offering, right? Kubernetes isn't the easiest thing to self-manage. And so customers, you know, back before EKS launched, they were banging down the door at AWS for us to have a managed Kubernetes offering. And, you know, we launched EKS and there's been a ton of customer adoption since then.
>> You know, Lisa, when we, theCUBE's 12 years, now everyone knows we started in 2010, we used to cover a show called OpenStack. >> I remember that. >> OpenStack Summit. >> What's that now? >> And at the time, at that time, Kubernetes wasn't there. So theCUBE was present at creation. We've been to every KubeCon ever, CNCF then took it over. So we've been watching it from the beginning. >> Right. And it reminds me of the same trend we saw with MapReduce and Hadoop. Very big promise, everyone loved it, but it was hard, very difficult. In Hadoop's case, big data, it ended up becoming a data lake. Now you've got Spark, or Snowflake, and Databricks, and Redshift. Here, Kubernetes has not yet been taken over. But, instead, it's being abstracted away, and/or managed services are emerging. 'Cause general enterprises can't hire enough Kubernetes people. >> Yep. >> There aren't that many out there yet. So there's the training issue. But there's been the rise of managed services. >> Yep. >> Can you guys comment on what your thoughts are relative to that trend of hard to use, abstracting away the complexity, and, specifically, the managed services? >> Yeah, absolutely. You want to go? >> Yeah, absolutely. I think, look, it's important to not kid ourselves. It is hard. (John laughs) But that doesn't mean it's not practical, right. When Kubernetes is done well, it's a thing of beauty. I mean, we have enough customers at scale, like, you know, it's like a, forget a hockey stick, it's a straight line up, because they just are moving so fast when they have the right platform in place. I think that the mistake that many of us make, and I've made this mistake when we started this company, was trivializing the platform aspect of Kubernetes, right. And a lot of my customers, you know, when they start, they kind of feel like, well, this is not that hard. I can get this up and running. I just need two people. It'll be fine.
And it's hard to hire, but then, I need two, then I need two more, then I need two, it's a lot, right. I think, the one thing I keep telling, like, when I talk to analysts, I say, "Look, somebody needs to write a book that says, 'Yes, it's hard, but, yes, it can be done, and here's how.'" Let's just be open about what it takes to get there, right. And, I mean, you mentioned OpenStack. I think the beauty of Kubernetes is that because it's such an open system, right, even with the managed offering, companies like Rafay can build really productive businesses on top of this Kubernetes platform because it's an open system. I think that is something that was not true with OpenStack. I've spent time with OpenStack also, I remember how it is. >> Well, Amazon had a lot to do with stalling the momentum of OpenStack, but your point about difficulty. Hadoop was always difficult to maintain and hire for. There were no managed services, and no one saw the value of big data yet. Here with Kubernetes, people are living a problem called, I'm scaling up. >> Yep. And so it sounds like it's a foundational challenge. The ongoing stuff sounds easier or manageable. >> Once you have the right tooling. >> Is that true? >> Yeah, no, I mean, once you have the right tooling, it's great. I think, look, I mean, you and I have talked about this before, I mean, the thesis behind Rafay is that, you know, there's like 8, 12 things that need to be done right for Kubernetes to work well, right. And my whole thesis was, I don't want my customer to buy 10, 12, 15 products. I want them to buy one platform, right. And I truly believe that, in our market, similar to what vCenter, like what VMware's vCenter did for VMs, I want to do that for Kubernetes, right. And the reason why I say that is because, see, vCenter is not about hypervisors, right?
vCenter is about hypervisor access, networking, storage, all of the things, like multitenancy, all the things that you need to run an enterprise-grade VM environment. What is that equivalent for the Kubernetes world, right? So what we are doing at Rafay is truly building a vCenter, but for Kubernetes, like a kCenter. I've tried getting the domain. I couldn't get it. (Kevin laughs) >> Well, after the Broadcom news, you don't know what's going to happen. >> Ehh. (John laughs) >> I won't go there! >> Yeah. Yeah, let's not go there today. >> Kevin, EKS, I've heard people say to me, "Love EKS. Just add serverless, that's a home run." There's been a relationship with EKS and some of the other Amazon tools. Can you comment on what you're seeing as the most popular interactions among the services at AWS? >> Yeah, and was your comment there, add serverless? >> Add serverless with EKS at the edge- >> Yeah. >> and things are kind of interesting. >> I mean, so, one of the serverless offerings we have today is actually Fargate. So you can use Fargate, which is our serverless compute offering, or one of our serverless compute offerings, with EKS. And so customers love that. Effectively, they get the beauty of EKS and the Kubernetes API, but they don't have to manage nodes. So that's, you know, a good amount of adoption with Fargate as well. But then, we also have other ways that they can manage their nodes. We have managed node groups as well, in addition to self-managed nodes also. So there's a variety of options that customers can use from a compute perspective with EKS. And you'll continue to see us evolve the portfolio as well. >> Can you share, Haseeb, can you share a customer example, a joint customer example that you think really articulates the value of what Rafay and AWS are doing together? >> Yeah, absolutely. In fact, we announced a customer very recently on this very show, which is MoneyGram, which is a joint AWS and Rafay customer.
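To make the Fargate-with-EKS point above concrete: a Fargate profile selects pods by namespace and, optionally, labels, and matching pods run on Fargate instead of on node groups. The sketch below models that selection logic with invented pod and selector data; it illustrates the behavior, it is not AWS's implementation.

```python
# Illustrative sketch: how an EKS Fargate profile's selectors decide
# whether a pod lands on Fargate rather than a node group.
# The selector shape (namespace + labels) mirrors the EKS API;
# the matcher itself is a simplified stand-in.

def matches_fargate_profile(pod, profile_selectors):
    """Return True if the pod matches any selector in the profile."""
    for selector in profile_selectors:
        if pod["namespace"] != selector["namespace"]:
            continue
        labels = selector.get("labels", {})
        # Every label in the selector must be present on the pod.
        if all(pod.get("labels", {}).get(k) == v for k, v in labels.items()):
            return True
    return False

selectors = [
    {"namespace": "serverless", "labels": {"compute": "fargate"}},
]

pod_a = {"namespace": "serverless", "labels": {"compute": "fargate"}}
pod_b = {"namespace": "serverless", "labels": {}}

print(matches_fargate_profile(pod_a, selectors))  # True: runs on Fargate
print(matches_fargate_profile(pod_b, selectors))  # False: falls back to node groups
```

Pods that match no profile are scheduled onto managed node groups or self-managed nodes, the other compute options Kevin mentions.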
Look, we have enough, you know, the thing about these massive customers is that, you know, not everybody's going to give us their logo to use. >> Right. >> But MoneyGram has been a Rafay plus EKS customer for a very, very long time. You know, at this point, I think we've earned their trust, and they've allowed us to, kind of, say this publicly. But there's enough of these financial services companies who have, you know, standardized on EKS. So it's EKS first, Rafay second, right. They standardized on EKS. And then they looked around and said, "Who can help me platform EKS across my enterprise?" And we've been very lucky. We have some very large financial services, some very large healthcare companies now, who, A, EKS, B, Rafay. I'm not just saying that because my friend Kevin's here, (Lisa laughs) it's actually true. Look, EKS is a brilliant platform. It scales so well, right. I mean, people try it out, relative to other platforms, and it's just a no-brainer, it just scales. You want to build a big enterprise on the back of a Kubernetes platform. And I'm not saying that because I'm biased. Like, EKS is really, really good. There's a reason why so many companies are choosing it over many other options in the market. >> You're doing a great job of articulating why the theme (Kevin laughs) of the New York City Summit is "scale anything." >> Oh, yeah. >> There you go. >> Oh, yeah. >> I did not even know that, but I'm speaking the language, right? >> You are. (John laughs) >> Yeah, absolutely. >> One of the things that we're seeing, also, I want to get your thoughts on, guys, is the app modernization trend, right? >> Yep. >> Because unlike other standards that were hard, that didn't have any benefit downstream 'cause they were too hard to get to, here, Kubernetes is feeding into real app developer pressure. They've got to get cloud-native apps out. It's fairly new in the mainstream enterprise, and a lot of hyperscalers have experience.
So I'm going to ask you guys, what is the key thing that you're enabling with Kubernetes in the cloud-native apps? What is the key value? >> Yeah. >> I think there's a bifurcation happening in the market. One is the Kubernetes engine market, which is like EKS, AKS, GKE, right. And then there's the, you know, what, back in the day, we used to call operations and management, right. So the O&M layer for Kubernetes is where there's need, right. People are learning, right. Because, as you said before, the skill isn't there, you know, there's not enough talent available to the market. And that's the opportunity we're seeing. Because to solve for the standardization, the governance, and automation that we talked about earlier, you know, you have to solve for, okay, how do I manage my network? How do I manage my service mesh? How do I do chargebacks? What's my, you know, policy around actual Kubernetes policies? What's my blueprinting strategy? How do I do add-on management? How do I do pipelines for updates of add-ons? How do I upgrade my clusters? And we're not done yet, there's a longer list, right? This is a lot, right? >> Yeah. >> And this is what happens, right. It's just a lot. And really, the companies who understand that plethora of problems that need to be solved and build easy-to-use solutions that enterprises can consume with the right governance automation, I think they're going to be very, very successful here. >> Yeah. >> Because this is a train, right? I mean, this is happening whether it's us or not, right? Enterprises are going to keep doing this. >> And open-source is a big driver in all of this. >> Absolutely. >> Absolutely. >> And I'll tag onto that. I mean, you talked about platform engineering earlier. Part of the point of building these platforms on top of Kubernetes is giving developers an easier way to get applications into the cloud.
So building unique developer experiences that really make it easy for you, as a software developer, to take the code from your laptop, get it out to production as quickly as possible. The question is- >> So is that what you mean, does that tie to your point earlier about that vertical, straight-up value once you've set it up, right? >> Yep. >> Because it's taking the burden off the developers for stopping their productivity. >> Absolutely. >> To go check in, is it configured properly? Is the supply chain software going to be there? Who's managing the services? Who's orchestrating the nodes? >> Yep. >> Is that automated, is that where you guys see the value? >> That's a lot of what we see, yeah. In terms of how these companies are building these platforms, it's taking all the component pieces that Haseeb was talking about and really putting them into a cohesive whole. And then, you, as a software developer, you don't have to worry about configuring all of those things. You don't have to worry about security policy, governance, how your app is going to be exposed to the internet. >> It sounds like infrastructure as code. >> (laughs) Yeah. >> Come on, like. >> (laughs) Infrastructure as code is a big piece of it, for sure, for sure. >> Yeah, look, infrastructure as code actually- >> Infrastructure's security is code too, the security. >> Yeah. >> Huge. >> Well, it all goes together. Like, we talk about developer self-service, right? The way we enable developer self-service is by teaching developers, here's a snippet of code that you write and you check it in, and your infrastructure will just magically be created. >> Yep. >> But not automatically. It's going to go through a check, like a check through the platform team. These are the workflows that, if you get them right, developers don't care, right. All developers want is, I want compute. But then all these 20 things need to happen in the back.
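The check-it-in, platform-gated self-service flow just described can be sketched as a simple policy review step that runs before anything is provisioned. The rules and field names here are invented for illustration; they are not Rafay's or AWS's actual policy engine.

```python
# Hypothetical sketch of developer self-service with a governance gate:
# a developer submits a small request, and a platform-side check approves
# or rejects it before any infrastructure gets created.

GOVERNANCE_RULES = {
    "max_cpu": 8,                               # assumed org-wide cap
    "required_labels": {"team", "cost-center"}, # assumed tagging policy
}

def review_request(request):
    """Return (approved, reasons) for a developer's compute request."""
    reasons = []
    if request["cpu"] > GOVERNANCE_RULES["max_cpu"]:
        reasons.append("cpu exceeds platform cap")
    missing = GOVERNANCE_RULES["required_labels"] - set(request["labels"])
    if missing:
        reasons.append(f"missing labels: {sorted(missing)}")
    return (not reasons, reasons)

ok, why = review_request({"cpu": 4, "labels": {"team": "payments", "cost-center": "42"}})
print(ok)   # True: the check passes and provisioning would proceed

bad, why = review_request({"cpu": 16, "labels": {"team": "payments"}})
print(bad, why)  # False, plus the reasons the platform would surface
```

The point of the sketch is the shape of the workflow: the developer only writes the request, and the "20 things in the back" happen behind the gate.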
That's what, if you nail it, right, I mean, I keep trying to kind of pitch the company, I don't want to do that today. But if you nail that, >> I'll give you a plug at the end. >> you have a good story. >> But I've got to, I just have a tangent question 'cause you reminded me. There's two types of developers that have emerged, right. You have the software developer that wants infrastructure as code. I just want to write my code, I don't want to stop. I want to build in shift-left for security, shift-right for data. All that's in there. >> Right. >> I'm coding away, I love coding. Then you've got the under-the-hood person. >> Yes. >> I'm into the engines. >> Certainly. >> So that's more of an SRE, data engineer, I'm wiring services together. >> Yeah. >> A lot of people are like, they don't know who they are yet. They're in college or they're transforming from an IT job. They're trying to figure out who they are. So the question is, how do you tell a person that's watching, like, who am I? Like, should I be just coding? But I love the tech. Would you guys have any advice there? >> You know, I don't know if I have any guidance in terms of telling people who they are. (all laughing) I mean, I think about it in terms of a spectrum, and this is what we hear from customers, is some customers want to shift as much responsibility onto the software teams to manage their infrastructure as well. And then some want to shift it all the way over to the very centralized model. And, you know, we see everything in between as well with our EKS customer base. But, yeah, I'm not sure if I have any direct guidance for people. >> Let's see, any wisdom? >> Aside from experiment. >> If you're coding more, you're a coder. If you like to play with the hardware, >> Yeah. >> or the gears. >> Look, I think it's really important for managers to understand that developers, yes, they have a job, you have to write code, right. But they also want to learn new things. It's only fair, right. >> Oh, yeah.
>> So what we see is, developers want to learn. And we enable them to understand Kubernetes in small pieces, like small steps, right. And that is really, really important, because if we completely abstract things away, like Kubernetes, from them, it's not good for them, right. It's good for their careers also, right. It's good for them to learn these things. This is going to be with us for the next 15, 20 years. Everybody should learn it. But I want to learn it because I want to learn, not because this is part of my job, and that's the distinction, right. I don't want this to become my job because I want, I want to write my code. >> Do what you love. If you're more attracted to understanding how automation works, and robotics, or making things scale, you might be under-the-hood. >> Yeah. >> Yeah, look under the hood all day long. But then, in terms of, like, who keeps the lights on for the cluster, for example. >> All right, see- >> That's the job. >> He makes a lot of value. Now you know who you are. Ask these guys. (Lisa laughing) Congratulations on your success on EKS, too. >> Yeah, thank you. >> Quick, give a plug for the company. I know you guys are growing. I want to give you a minute to share with the audience a plug that's going to be, what are you guys doing? You're hiring? How many employees? Funding? Customer new wins? Take a minute to give a plug. >> Absolutely. And look, I come, see, John, I think, every show you guys are doing, a summit or a KubeCon, I'm here. (John laughing) And every time we come, we talk about new customers. Look, platform teams at enterprises seem to love Rafay because it helps them build that, well, Kubernetes platform that we've talked about on the show today.
I think many large enterprises on the financial service side, healthcare side, digital native side seem to have recognized that running Kubernetes at scale, or even starting with Kubernetes in the early days, getting it right with the right standards, that takes time, that takes effort. And that's where Rafay is a great partner. We provide a great SaaS offering, which you can have up and running very, very quickly. Of course, we love EKS. We work with our friends at AWS. But it also works with Azure; we have enough customers in Azure. It also runs in Google; we have enough customers in Google. And it runs on-premises with OpenShift or with EKS Anywhere, right, whichever option you want to take. But in terms of that standardization and governance and automation for your developers to move fast, there's no better product in the market right now when it comes to Kubernetes platforms than Rafay. >> Kevin, while we're here, why don't you plug EKS too, come on. >> Yeah, absolutely, why not? (group laughing) So yes, of course. EKS is AWS's managed Kubernetes offering. It's the largest managed Kubernetes service in the world. We help customers who want to adopt Kubernetes and adopt it wherever they want to run Kubernetes, whether it's in region or whether it's on the edge with EKS Anywhere or running Kubernetes on Outposts, and the evolving portfolio of EKS services as well. We see customers running extremely high-scale Kubernetes clusters, excuse me, and we're here to support them as well. So yeah, that's the managed Kubernetes offering. >> And I'll give the plug for theCUBE, we'll be at KubeCon in Detroit this year. (Lisa laughing) Lisa, look, we're giving a plug to everybody. Come on. >> We're plugging everybody. Well, as we get to plugs, I think, Haseeb, you have a book to write, I think, on Kubernetes. And I think you're wearing the title. >> Well, I do have a book to write, but I'm one of those people who does everything at the very end, so I will never get it right.
(group laughing) So if you want to work on it with me, I have some great ideas. >> Ghostwriter. >> Sure! >> But I'm lazy. (Kevin chuckles) >> Ooh. >> So we got to figure something out. >> Somehow I doubt you're lazy. (group laughs) >> No entrepreneur's lazy, I know that. >> Right? >> You're being humble. >> He is. So Haseeb, Kevin, thank you so much for joining John and me today, >> Thank you. >> talking about what you guys are doing at Rafay with EKS, the power, why you shouldn't hate k8s. We appreciate your insights and your time. >> Thank you as well. >> Yeah, thank you very much for having us. >> Our pleasure. >> Thank you. >> We appreciate it. With John Furrier, I'm Lisa Martin. You're watching theCUBE live from New York City at the AWS NYC Summit. John and I will be right back with our next guest, so stick around. (upbeat music) (gentle music)
Dr. Matt Wood, AWS | AWS Summit SF 2022
(gentle melody) >> Welcome back to theCUBE's live coverage of AWS Summit in San Francisco, California. Events are back. AWS Summit in New York City this summer, theCUBE will be there as well. Check us out there. I'm glad to have events back. It's great to have everyone here. I'm John Furrier, host of theCUBE. Dr. Matt Wood is with me, CUBE alumni, now VP of the Business Analytics Division of AWS. Matt, great to see you. >> Thank you, John. It's great to be here. I appreciate it. >> I always call you Dr. Matt Wood because Andy Jassy always says, "Dr. Matt, we would introduce you on the arena." (Matt laughs) >> Matt: The one and only. >> The one and only, Dr. Matt Wood. >> In joke, I love it. (laughs) >> Andy style. (Matt laughs) I think you had walk up music too. >> Yes, we all have our own personalized walk up music. >> So talk about your new role, not a new role, but you're running the analytics business for AWS. What does that consist of right now? >> Sure. So I work. I've got what I consider to be one of the best jobs in the world. I get to work with our customers and the teams at AWS to build the analytics services that millions of our customers use to slice, dice, pivot, better understand their data, look at how they can use that data for reporting, looking backwards. And also look at how they can use that data looking forward, so predictive analytics and machine learning. So whether it is slicing and dicing in the lower level of Hadoop and the big data engines, or whether you're doing ETL with Glue, or whether you're visualizing the data in QuickSight or building your models in SageMaker, I've got my fingers in a lot of pies.
Look at just what happened. Machine learning comes in and then a slew of services come in. You've got SageMaker, became a hot seller right out of the gate. The database stuff was kicking butt. So all this is now booming. That was a real generational change over for database. What's the perspective? What's your perspective on that's evolved? >> I think it's a really good point. I totally agree. I think for machine learning, there's sort of a Renaissance in machine learning and the application of machine learning. Machine learning as a technology has been around for 50 years, let's say. But to do machine learning right, you need like a lot of data. The data needs to be high quality. You need a lot of compute to be able to train those models and you have to be able to evaluate what those models mean as you apply them to real world problems. And so the cloud really removed a lot of the constraints. Finally, customers had all of the data that they needed. We gave them services to be able to label that data in a high quality way. There's all the compute you need to be able to train the models. And so where you go? And so the cloud really enabled this Renaissance with machine learning. And we're seeing honestly a similar Renaissance with data and analytics. If you look back five to ten years, analytics was something you did in batch, your data warehouse ran an analysis to do reconciliation at the end of the month, and that was it. (John laughs) And so that's when you needed it. But today, if your Redshift cluster isn't available, Uber drivers don't turn up, DoorDash deliveries don't get made. Analytics is now central to virtually every business, and it is central to virtually every business's digital transformation. 
And being able to take that data from a variety of sources, be able to query it with high performance, to be able to actually then start to augment that data with real information, which usually comes from technical experts and domain experts, to form wisdom and information from raw data. That's kind of what most organizations are trying to do when they kind of go through this analytics journey. >> It's interesting. Dave Vellante and I always talk on theCUBE about the future. And you look back, the things we were talking about six years ago are actually happening now. And it's not a hyped-up statement to say digital transformation is actually happening now. And there's also times when we bang our fists on the table saying, say, "I really think this is so important." And Dave says, "John, you're going to die on that hill." (Matt laughs) And so I'm excited that this year, for the first time, I didn't die on that hill. I've been saying- >> Doing all right. >> Data as code is the next infrastructure as code. And Dave's like, "What do you mean by that?" We're talking about how data gets... And it's happening. So we just had an event on our AWS Startups.com site, a showcase for startups, and the theme was data as code. And interesting new trends emerging really clearly, the role of a data engineer, right? Like an SRE, what an SRE did for cloud, you have a new data engineering role, because the developer onboarding is massively increasing, exponentially, new developers. Data scientists are growing, but the pipelining and managing and engineering as a system, almost like an operating system. >> Kind of as a discipline. >> So what's your reaction to that about this data engineer, data as code? Because if you have horizontally scalable data, you've got to be open, that's hard (laughs), okay? And you've got to silo the data that needs to be siloed for compliance reasons. So that's a big policy around that.
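The "raw data plus domain expertise" augmentation Matt describes above can be sketched in a few lines: combine records from one source with a second, then enrich them with a mapping that only a domain expert could supply. All data and field names here are invented for illustration.

```python
# Illustrative sketch of the raw-data-to-information idea:
# join raw records against a catalog, then augment with
# domain knowledge that is not derivable from the data itself.

orders = [{"sku": "A1", "qty": 3}, {"sku": "B2", "qty": 1}]
catalog = {"A1": {"name": "widget", "unit_price": 2.5},
           "B2": {"name": "gadget", "unit_price": 10.0}}
# Supplied by a domain expert, not present in either raw source.
risk_class = {"widget": "low", "gadget": "high"}

def augment(orders, catalog, risk_class):
    """Enrich raw order records with catalog data and expert labels."""
    enriched = []
    for o in orders:
        item = catalog[o["sku"]]
        enriched.append({
            **o,
            "name": item["name"],
            "revenue": o["qty"] * item["unit_price"],
            "risk": risk_class[item["name"]],
        })
    return enriched

rows = augment(orders, catalog, risk_class)
print(rows[0]["revenue"])  # 7.5
print(rows[1]["risk"])     # high
```

In practice the sources would be purpose-built engines rather than Python dicts, but the shape of the work, aggregate then augment, is the same.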
So what's your reaction to data as code and the data engineering phenomenon? >> It's a really good point. I think with any technology project inside of an organization, success with analytics or machine learning is kind of 50% technology and then 50% cultural. And you have often domain experts. Those could be physicians or drug design experts, or they could be financial experts or whoever they might be, who've got deep domain expertise, and then you've got technical implementation teams. And there's kind of a natural, often repulsive force. I don't mean that rudely, but they just don't talk the same language. And so the more complex a domain and the more complex the technology, the stronger that repulsive force. And it can become very difficult for domain experts to work closely with the technical experts to be able to actually get business decisions made. And so what data engineering does, and data engineering is, in some cases, a team, or it can be a role that you play, is really allowing those two disciplines to speak the same language. You can think of it as plumbing, but I think of it as like a bridge. It's a bridge between the technical implementation and the domain experts, and that requires a very disparate range of skills. You've got to understand about statistics, you've got to understand about the implementation, you've got to understand about the data, you've got to understand about the domain. And if you can put all of that together, that data engineering discipline can be incredibly transformative for an organization because it builds the bridge between those two groups. >> I was advising some young computer science students at the sophomore, junior level just a couple of weeks ago, and I told them I would ask someone at Amazon this question. So I'll ask you, >> Matt: Okay. >> since you've been in the middle of it for years. They were asking me, and I was trying to mentor them on, how do you become a data engineer, from a practical standpoint?
Courseware, projects to work on, how to think, not just coding Python, because everyone's coding in Python, but what else can they do? So I was trying to help them. I didn't really know the answer myself. I was just trying to kind of help figure it out with them. So what is the answer, in your opinion, or the thoughts around advice to young students who want to be data engineers? Because data scientist is pretty clear on what that is. You use tools, you make visualizations, you manage data, you get answers and insights, and then apply that to the business. That's an application. That's not standing up a stack or managing the infrastructure. So what does that coding look like? What would your advice be to folks getting into a data engineering role? >> Yeah, I think if you believe what I said earlier about 50% technology, 50% culture, the number one technology to learn as a data engineer is the tools in the cloud which allow you to aggregate data from virtually any source into something which is incrementally more valuable for the organization. That's really what data engineering is all about. It's about taking from multiple sources. Some people call them silos, but silos indicates that the storage is kind of fungible or undifferentiated. That's really not the case. Success requires you to have really purpose-built, well-crafted, high-performance, low-cost engines for all of your data. So understanding those tools and understanding how to use them, that's probably the most important technical piece. Python and programming and statistics go along with that, I think. And then the most important cultural part, I think, is... It's just curiosity. As a data engineer, you want to have a natural curiosity that drives you to seek the truth inside an organization, seek the truth of a particular problem, and to be able to engage, because you're probably going to make some choices as you go through your career about which domain you end up in.
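Matt's core technical point here, aggregating data from virtually any source into something incrementally more valuable, can be sketched in a few lines of Python. Everything below (the source names, field names, and records) is invented purely for illustration; real pipelines would use cloud ingestion and query engines rather than dictionaries:

```python
# Hypothetical sketch: merge records from two differently-shaped sources
# into one uniform, incrementally more valuable view (revenue per customer).
# Source names and field names are made up for illustration.

def normalize_crm(record):
    # Pretend the CRM export uses "customer" / "spend_usd".
    return {"customer_id": record["customer"], "revenue": record["spend_usd"]}

def normalize_billing(record):
    # Pretend the billing system uses "acct" / "amount_cents".
    return {"customer_id": record["acct"], "revenue": record["amount_cents"] / 100}

def aggregate(sources):
    """Normalize every record, then total revenue per customer."""
    totals = {}
    for normalize, records in sources:
        for record in records:
            row = normalize(record)
            totals[row["customer_id"]] = totals.get(row["customer_id"], 0) + row["revenue"]
    return totals

crm = [{"customer": "acme", "spend_usd": 120.0}]
billing = [{"acct": "acme", "amount_cents": 5000}, {"acct": "globex", "amount_cents": 900}]

print(aggregate([(normalize_crm, crm), (normalize_billing, billing)]))
# {'acme': 170.0, 'globex': 9.0}
```

The bridging work Matt describes is exactly the `normalize_*` step: each domain keeps its own vocabulary, and the data engineer makes them speak one language before aggregation.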
Maybe you're really passionate about healthcare, or you're really just passionate about transportation or media, whatever it might be. And you can allow that to drive a certain amount of curiosity. But within those roles, the domains are so broad you've kind of got to allow your curiosity to develop and lead you to ask the right questions and engage in the right way with your teams, because you can have all the technical skills in the world. But if you're not able to help the teams truth-seek through that curiosity, you simply won't be successful. >> We just had a guest, 20-year-old founder, Johnny Dallas, who was 16 when he worked at Amazon. Youngest engineer- >> Johnny Dallas is a great name, by the way. (both chuckle) >> It's his real name. It sounds like a football player. >> That's awesome. >> Rock star. Johnny CUBE, it's me. But he's young and he was saying... His advice was just do projects. >> Matt: And get hands on. Yeah. >> And I was saying, hey, I came from the old days where you'd stand stuff up and you hung on to the assets because you didn't want to kill the project, because you spent all this money. And he's like, yeah, with cloud you can shut it down. If you do a project that's not working and you get bad data, no one's adopting it, or you don't like it anymore, you shut it down, do something else. >> Yeah, totally. >> Instantly abandon it and move on to something new. That's a progression. >> Totally! The blast radius of decisions is just way reduced. We talk a lot about, in the old world, trying to find the resources and get the funding. It's like, all right, I want to try out this kind of random idea that could be a big deal for the organization. I need $50 million and a new data center. You're not going to get anywhere. >> And you do a proposal, working backwards documents, all kinds of stuff. >> All that sort of stuff. >> Jump through hoops. >> So all of that is gone.
But we sometimes forget that a big part of that is just the prototyping and the experimentation and the limited blast radius in terms of cost, and honestly, the most important thing is time, just being able to jump in there, fingers on keyboards, just try this stuff out. And that's why at AWS, we have... Part of the reason we have so many services is because we want, when you get into AWS, we want the whole toolbox to be available to every developer. And so as your ideas develop, you may want to jump from data that you have that's already in a database to doing real-time data. And then you have the tools there. And when you want to get into real-time data, you don't just have Kinesis, you have Kinesis Data Analytics, and you can run SQL against that data. The capabilities and the breadth really matter when it comes to prototyping. >> That's the culture piece, because what was once a dysfunctional behavior, I'm going to go off the reservation and try something behind my boss' back, now is a side hustle or fun project. So for fun, you can just code something. >> Yeah, totally. I remember my first Hadoop projects. I found almost literally a decommissioned set of servers in the data center that no one was using. They were super old. They were about to be literally turned off. And I managed to convince the team to leave them on for me for another month. And I installed Hadoop on them and got them going. That just seems crazy to me now, that I had to go and convince anybody not to turn these servers off. But what it was like when you- >> That's when you came up with Elastic MapReduce, because you said this is too hard, we've got to make it easier. >> Basically, yes. (John laughs) I was installing Hadoop version Beta 9.9 or whatever. It was like, this is really hard. >> We've got to make it simpler. All right, good stuff. I love the walk down memory lane. And also your advice. Great stuff. I think culture is huge.
That's why I like Adam's keynote at re:Invent, Adam Selipsky talking about Pathfinders and trailblazers, because that's a blast radius impact when you can actually have innovation organically just come from anywhere. That's totally cool. >> Matt: Totally cool. >> All right, let's get into the product. Serverless has been hot. We hear a lot that EKS is hot. Containers are booming. Kubernetes is getting adopted, still a lot of work to do there. Cloud native developers are booming. Serverless, Lambda. How does that impact the analytics piece? Can you share the hot products around how that translates? >> Absolutely, yeah. >> Aurora, SageMaker. >> Yeah, I think it's... If you look at kind of the evolution and what customers are asking for, they don't just want low cost. They don't just want this broad set of services. They don't just want those services to have deep capabilities. They want those services to have as low an operating cost over time as possible. So we kind of really got it down. We built a lot of muscle, a lot of services, around getting up and running and experimenting and prototyping and turning things on and turning them off. And that's all great. But actually, in most projects, you really only start something once and then stop something once, and maybe there's an hour in between or maybe there's a year. But the real expense in terms of time and operations and complexity is sometimes in that running cost. And so we've heard very loudly and clearly from customers that running cost is just undifferentiated to them. And they want to spend more time on their work. And in analytics, that is slicing the data, pivoting the data, combining the data, labeling the data, training their models, running inference against their models, and less time doing the operational pieces. >> Is that why the service focuses there? >> Yeah, absolutely. It dramatically reduces the skill required to run these workloads at any scale.
And it dramatically reduces the undifferentiated heavy lifting, because you get to focus more of the time that you would have spent on the operations on the actual work that you want to get done. And so if you look at something just like Redshift Serverless, that we launched at re:Invent, we have a lot of customers that want to run the cluster, and they want to get into the weeds where there is benefit. We have a lot of customers that say there's no benefit for me, I just want to do the analytics. So you run the operational piece, you're the experts. We run 60 million instance start-ups every single day. We do this a lot. >> John: Exactly. We understand the operations- >> I just want the answers. Come on. >> So just give me the answers, or just give me the notebook, or just give me the inference prediction. Today, for example, we announced Serverless Inference. So now, once you've trained your machine learning model, you just run a few lines of code or you just click a few buttons, and then you've got an inference endpoint that you do not have to manage. And whether you're doing one query against that endpoint per hour or you're doing 10 million, we'll just scale it on the back end. >> I know we've got not a lot of time left, but I want to get your reaction on this. One of the things about the data lakes not being data swamps has been, from what I've been reporting and hearing from customers, that they want to retrain their machine learning algorithms. They need that data, they need the real-time data, and they need the time series data. Even though the time has passed, they've got to store it in the data lake. So now the data lake's main function is reusing the data to actually retrain, so it works properly. So a lot of post mortems turn into actual business improvements, to make the machine learning smarter, faster. Do you see it the same way? >> Yeah, I think it's really interesting >> Or is that just...
>> No, I think it's totally interesting because it's convenient to kind of think of analytics as a very clear progression from point A to point B. But really, you're navigating terrain for which you do not have a map, and you need a lot of help to navigate that terrain. And so having these services in place, not having to run the operations of those services, being able to have those services be secure and well governed. And we added PII detection today. It's something you can do automatically, to be able to use any unstructured data, run queries against that unstructured data. So today we added text queries. So you can just say, well, you can scan a badge, for example, and say, well, what's the name on this badge? And you don't have to identify where it is. We'll do all of that work for you. It's more like a branch than it is just a normal A to B path, a linear path. And that includes loops backwards. And sometimes you've got to get the results and use those to make improvements further upstream. And sometimes you've got to use those... And when you're downstream, it will be like, "Ah, I remember that." And you come back and bring it all together. >> Awesome. >> So it's a wonderful world for sure. >> Dr. Matt, we're here in theCUBE. Just take the last word and give the update while you're here what's the big news happening that you're announcing here at Summit in San Francisco, California, and update on the business analytics group. >> Yeah, we did a lot of announcements in the keynote. I encourage everyone to take a look at, that this morning with Swami. One of the ones I'm most excited about is the opportunity to be able to take dashboards, visualizations. We're all used to using these things. We see them in our business intelligence tools, all over the place. 
However, what we've heard from customers is, yes, I want those analytics, I want that visualization, I want it to be up to date, but I don't actually want to have to go from the tools where I'm actually doing my work to another, separate tool to be able to look at that information. And so today we announced 1-click public embedding for QuickSight dashboards. So today, you can literally, as easily as embedding a YouTube video, take a dashboard that you've built inside QuickSight, cut and paste the HTML, paste it into your application, and that's it. That's all you have to do. It takes seconds. >> And it gets updated in real time. >> Updated in real time. It's interactive. You can do everything that you would normally do. You can brand it, there's no "powered by QuickSight" button or anything like that. You can change the colors, fit it in perfectly with your application. So that's an incredibly powerful way of being able to take an analytics capability that today sits inside its own little fiefdom and put it just everywhere. Very transformative. >> Awesome. And the business is going well. You got the Serverless detail win for you there. Good stuff. Dr. Matt Wood, thank you for coming on theCUBE. >> Anytime. Thank you. >> Okay, this is theCUBE's coverage of AWS Summit 2022 in San Francisco, California. I'm John Furrier, host of theCUBE. Stay with us for more coverage of day two after this short break. (gentle music)
Venkat Venkataramani and Dhruba Borthakur, Rockset | CUBE Conversation
(bright intro music) >> Welcome to this "Cube Conversation". I'm your host, Lisa Martin. This is part of our third AWS Start-up Showcase. And I'm pleased to welcome two gentlemen from Rockset, Venkat Venkataramani is here, the CEO and co-founder, and Dhruba Borthakur, CTO and co-founder. Gentlemen, welcome to the program. >> Thanks for having us. >> Thank you. >> Excited to learn more about Rockset. Venkat, talk to me about Rockset and how it's putting real-time analytics within the reach of every company. >> If you see the Confluent IPO, if you see where the world is going in terms of analytics, the way we look at this, real-time analytics is like the last frontier. Everybody wants fast queries on fresh data. Nobody wants to say, "I don't need that. You know, give me slow queries on stale data," right? I think if you see what data warehouses and data lakes have done, especially in the cloud, they've really, really made batch analytics extremely accessible. But real-time analytics still seems too clumsy, too complex, and too expensive for most people. And we are on a mission to make real-time analytics very, very easy and affordable, for everybody to be able to take advantage of it. So that's what we do. >> But you're right, nobody wants stale data or slower queries. And it seems like one of the things that we learned, Venkat, sticking with you, in the last 18 months of a very strange world that we're living in, is that real-time is no longer a nice-to-have. It's really a differentiator and table stakes for businesses in every industry. How do you make it more affordable and accessible to businesses in so many different industries? >> I think that's a great question. At a very high level, there are two categories of use cases we see. I think there is one full category of use cases where business teams and business units are demanding almost like business observability.
You know, if you think about one domain that actually understood real-time and made everything work in real-time, it's the DevOps world, you know, metrics and monitoring coming out of all these machines, because they really want to know as soon as something goes wrong. Immediately, I want to, you know, be able to dive in and click and see what happens. But now businesses are demanding the same thing, right? Like a CEO wants to know, "Are we on track to hit our quarterly estimates or not? And tell me now what's happening," because, you know, the larger the company, the more complex their operations dashboards are. And, you know, if you don't give them real-time visibility, the window of opportunity to do something about it disappears. And so businesses are really demanding that. And so that is one big use case we have. And the other interesting thing we're also seeing is that customers are demanding real-time even from the products they are using. So you could be using a SaaS product for sales automation, support automation, marketing automation. Now I don't want to use a product if it doesn't have real-time analytics baked into the product itself. And so all these software companies, you know, providing a SaaS service to their cloud customers and clients, they are also looking at this, because their proof of value really comes from the analytics that they can show within the product. And if that is not interactive and real-time, then they are also going to be left behind. So it's really a huge differentiator, whether you're building a software product or you're running a business. Real-time observability gives you a window of opportunity, so that when something goes wrong, you can actually act on it very, very quickly. >> Right, which is absolutely critical. Dhruba, I want to get your take on this.
As the CTO and co-founder, as I introduced you, what were some of the gaps in the market back in 2016 that you saw that really necessitated the development of this technology? >> Yeah, for real-time analytics, the difference compared to what it was earlier is that things used to be a lot of batch processes, the reason being that there was something called MapReduce, and that was a scanning system, kind of an invention from Google, which talked about processing big data sets. And it was about scanning, scanning large data sets to give answers. Whereas for real-time analytics, the new trend is, how can you index these big data sets so that you can answer queries really fast? So this is what Rockset does as well: we have capabilities to index humongous amounts of data cheaply, efficiently, in a way that's economically feasible for our customers. And queries leverage the index to give fast (indistinct). This is one of the big changes. The other change, obviously, is that it has moved to the cloud, right? A lot of analytics have moved to the cloud. So Rockset is built natively for the cloud, which is why we can scale up and scale down resources when queries come, and we can provide a great (indistinct) for people, as far as data latency and as far as query latencies go, both of these things. So these two trends, I think, are kind of the power behind making people use more real-time analytics. >> Right, and as Venkat was talking about how it's an absolute differentiator for businesses, you know, last year we saw all these quick pivots to survive and ultimately thrive. And we're seeing the businesses now coming out of this that were able to do that, were able to pivot to digital, be successful, and out-compete those who maybe were not as fast.
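Dhruba's contrast between MapReduce-style scanning and indexing can be illustrated with a toy inverted index. This is only a sketch of the general idea, not how Rockset's actual index works, and the records and fields below are invented:

```python
# Toy scan-versus-index illustration: a batch-style query touches every
# record, while an inverted index jumps straight to the matching ones.
# Records and field names are invented for illustration.

records = [
    {"id": 0, "city": "NYC", "amount": 12},
    {"id": 1, "city": "SF", "amount": 7},
    {"id": 2, "city": "NYC", "amount": 3},
]

def scan_query(records, city):
    # Batch style: examine every record and keep the matches (O(n) per query).
    return [r["id"] for r in records if r["city"] == city]

def build_index(records):
    # Real-time style: pay an indexing cost once, on ingest...
    index = {}
    for r in records:
        index.setdefault(r["city"], []).append(r["id"])
    return index

index = build_index(records)

def index_query(index, city):
    # ...so each query becomes a dictionary lookup, not a full scan.
    return index.get(city, [])

assert scan_query(records, "NYC") == index_query(index, "NYC") == [0, 2]
```

Both queries return the same answer; the difference is where the work happens, at query time versus at ingest time, which is the trade that makes low-latency queries on fresh data possible.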
I saw that recently, Venkat, you guys had a new product release a few weeks ago, a major product release, that is making real-time analytics on streaming data sources like Apache Kafka, Amazon Kinesis, Amazon DynamoDB, and data lakes a lot more accessible and affordable. Break down that launch for me, and how is it delivering the accessibility and affordability that you talked about before? >> Extremely good question. So we're really excited about that release, what we call SQL-based roll-ups. So what does that do? If you think about real-time analytics, and even teeing off the previous question you asked on what is the gap in the market, the gap in the market is really that warehouses and lakes are built for batch. You know, they're really good at letting people accumulate huge volumes of data, and once a week an analyst asks a question, generates a report, and everybody's looking at it. But with real-time, the data never stops coming. The queries never stop coming. So if I want real-time metrics on all these huge volumes of data coming in, and I drain it into a huge data lake and then I'm doing analytics on that, it gets very expensive and very complex very quickly. And so the new release that we had is called SQL-based roll-ups, where simply using SQL, you can define any real-time metric that you want to track across any dimensions you care about. It could be geo, demographic, and other dimensions you care about. And Rockset will automatically maintain all those real-time metrics for you, in real-time, in a highly accurate fashion. So you never have to doubt whether the metrics are valid, and it will be accurate up to the second. And the best part is, you don't have to learn a new language. You can actually use SQL to define those metrics, and Rockset will automatically maintain that and scale that for you in the cloud. And that, I think, reduces the barrier.
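The conversation doesn't show the actual SQL, but the idea behind a roll-up, maintaining a metric per dimension incrementally on ingest so queries never re-scan the raw events, can be sketched in Python. The event shape and names below are invented; Rockset's real roll-ups are defined in SQL, not code like this:

```python
from collections import defaultdict

# Sketch of the roll-up idea: update the aggregate as each event arrives,
# so reads are a constant-time lookup instead of a scan over raw events.
# Event fields ("region", "revenue") are invented for illustration.

class Rollup:
    def __init__(self, dimension, measure):
        self.dimension = dimension
        self.measure = measure
        self.totals = defaultdict(float)

    def ingest(self, event):
        # Do the aggregation work on write...
        self.totals[event[self.dimension]] += event[self.measure]

    def query(self, key):
        # ...so the metric is always up to date and cheap to read.
        return self.totals[key]

revenue_by_region = Rollup(dimension="region", measure="revenue")
for event in [
    {"region": "us-east", "revenue": 10.0},
    {"region": "eu-west", "revenue": 4.0},
    {"region": "us-east", "revenue": 2.5},
]:
    revenue_by_region.ingest(event)

print(revenue_by_region.query("us-east"))  # 12.5
```

The "accurate up to the second" property Venkat mentions falls out of this shape: each event is folded into the metric at ingest, so there is no batch window for the answer to go stale in.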
So if somebody wants to build something real-time, you know, track something for their business in real-time, you used to have to duct-tape together multiple disparate components and systems that were never meant to work with each other. Now you have a real-time database built for the cloud that, you know, supports full-featured SQL. So you can do this in a matter of minutes, which would probably take you days or weeks with alternate technologies. >> That's a dramatic reduction in time there. I want to mention the Snowflake IPO, since you guys mentioned the Confluent IPO. You say that Rockset does for real-time what Snowflake did for batch. Dhruba, I want to get your perspective on that. Tell me about that. What do you mean by that? >> Yeah, so we see this trend in the market where a lot of analytics, which are very batch, get a lot of value if they move more real-time, right? Like Venkat mentioned, when analytics powers actual products, which need to use analytics to make the product better. So Rockset very much plays in this area. So Rockset is the only solution... I shouldn't say solution. It's a database, it's a real-time database, which powers these kinds of analytic systems. If you don't use Rockset, then you might be using maybe a warehouse or something, but you cannot get real-time, because there is always a latency of putting data into the warehouse. It could be minutes, it could be hours. And then also you don't get too many people making concurrent queries on the warehouse. So this is another difference for real-time analytics: because it powers applications, the query volume could be large. So that's why you need a real-time database, and not a real-time warehouse or other technologies, for this. And this trend has really caught up, because most people are pretty much on this journey already. You asked me this previous question about what has changed since 2016 as well.
And this is a journey that most enterprises we see are already embarking upon. >> One thing, too, that we're seeing is that more and more applications are becoming data-intensive applications, right? We think of, whether it's Instagram or DoorDash or whatnot, or even our banking app, we expect to have the information updated immediately. How do you help, Dhruba, sticking with you, how do you help businesses build and power those data-intensive applications that the consumers are demanding? >> That's a great question. Both me and Venkat, we have seen these data applications at large scale when we were at Facebook earlier. We were both part of the Facebook team. So we saw how real-time was really important for building that kind of a business, that was social media. But now we are taking the same kind of back ends, which can scale to huge volumes of data, to the enterprises as well. Venkat, do you have anything to add? >> Yeah, I think when you're trying to go from batch to real-time, you're 100% spot on that a static report, a static dashboard, actually becomes an application, becomes a data application, and it has to be interactive. So you're not just showing a newspaper where you just get to read. You want to click and deep dive, do slice and dice on the data, to not only understand what happened, but why it happened, and come up with hypotheses to figure out what I want to do with it. So the interactivity is important, and the real-timeliness now becomes important. So the way we think about it is, once you go into real-time analytics, you know, the data never stops coming. That's obvious. Data freshness is important. But the queries never stop coming also, because when your dashboards and metrics are getting updated in real-time, you really want alerts and anomaly detection to be automatically built in.
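The kind of built-in alerting Venkat is describing can be sketched as a simple threshold check against recent history. The window and the three-sigma threshold below are arbitrary choices for illustration, not anything Rockset-specific:

```python
from statistics import mean, stdev

# Minimal sketch of alert-style monitoring on a real-time metric: flag a
# new reading that sits far outside its recent history. Window size and
# threshold are arbitrary illustrative choices.

def is_anomaly(history, value, threshold=3.0):
    """Return True if value is more than `threshold` standard
    deviations away from the mean of the recent history."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > threshold * sigma

history = [100, 102, 98, 101, 99]
assert not is_anomaly(history, 103)   # within normal variation
assert is_anomaly(history, 150)       # far outside recent history: alert
```

In a real system this check would run continuously against the freshest metric values, which is exactly why query-on-fresh-data never stops: the system, not an analyst, is asking most of the questions.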
When something is off, the system will come and tap on your shoulder and say, "Hey, something is going on." And so that really is a real-time application at that point, because it's constantly looking at the data and querying on your behalf, and only alerting you when something actually interesting is happening that you might need to look at. So yeah, the whole movement towards data applications and data-intensive apps is a huge use case for us. I think most of our customers, I would say, are building a data application in one shape or form or another. >> And if I think of use cases like customer 360, you know, as customers and consumers of whatever product or solution we're talking about, we expect that these brands know who we are, know what we've done with them, what we've bought. What to show me next is what I expect, whether again, it's my bank or it's Instagram or something else. So that personalization approach is absolutely critical, and I imagine another big game changer, differentiator, for the customers that use Rockset. What do you guys think about that? >> Absolutely, personalized recommendation is a huge use case. We see this a lot. We have, you know, Ritual as one of the customers. We have a case study on that, I think. They want to personalize. They generate offline recommendations for anything that the user is buying, but they want to use behavioral data from the product to personalize that experience and combine the two before they serve anything on the checkout lane, right? We also see in B2B companies real-time analytics and data applications becoming a very important thing. And we have another customer, Command Alkon, who, you know, has a supply chain platform for heavy construction, and 80% of concrete in North America flows through their platform, for example. And what they want to know in real-time is reporting on how many concrete trucks are arriving at a big construction site, which ones are late, and whatnot.
And the real-time, you know, analytics needs to be accurate and needs to be, you know, up to the second. You know, don't tell me what trucks were coming an hour ago. No, I need this right now. And so even in a B2B platform, we see that very similar trend where real-time reporting, real-time search, real-time indexing is actually a very, very important piece of the puzzle, and not just for the B2C examples that you mentioned. And the Instagram comment is also very appropriate, because a hedge fund customer came to us and said, "I have kind of dashboards built on top of, like, Snowflake. They're taking two to five seconds for certain parts of my dashboards, and I actually have 50 or 60 visualizations. You do the math, it takes many minutes to load." And so they said, "Hey, you have some indexing tech. Can you make this faster?" Three weeks later, the queries that would take two to five seconds on a traditional warehouse or a cloud data warehouse came back in 18 milliseconds with Rockset. And it is so fast that they said, you know, "If my internal dashboards are not as fast as Instagram, no one in my company uses it." These are their words. And so, you know, the speed is really, really important. The scale is really, really important. Data freshness is important. If you combine all of these things, and also make it simple for people to access with SQL, that's really the unique value prop that we have at Rockset, which is what our customers love.
And as you were saying, you know, the employees are expecting databases to be as fast as what they see on Instagram when they're, you know, surfing in their free time. Then adoption, I imagine, gets better, and obviously the benefit, from the end user and customer's perspective, is that speed. Talk to me a little bit about how Rockset, and I would like to get both of your opinions here, is a facilitator of that employee productivity for your customers. >> This is a great question. In fact, the same hedge fund, you know, customer, I pushed them to go and measure how many times people even look at all the data that you produce. (laughs) How many analysts and investors actually use your dashboards? I asked them to go investigate that. And one of the things that they eventually showed me was there was a huge uptake. When their dashboards went from two-to-three-second, kind of like, you know, lags to 18 milliseconds, the daily active users for their own internal dashboards went from five people to almost the entire company, you know. So I think you're absolutely spot on. So it really goes back to, you know, really leveraging the data and actually doing something about it. Like, you know, if I ask a question and the system is going to take 20 minutes to answer it, you know, I will probably not ask as many questions as I want to. When it becomes interactive and very, very fast, all of a sudden I not only start with a question, you know, I can ask a follow-up question, and then another follow-up question, and really drive that to, you know, a conclusion, and I can actually act upon it. And this really accelerates. So even if you kind of look at the macro, you hear these phrases, the world is going from batch to real-time, and in my opinion, when I look at this, people want to, you know, accelerate their growth. People want to make faster decisions.
People want to get to, what can I do about this, and get actionable insights. And that is not really going to come from systems that take 20 minutes to give a response. It's going to come from systems that are interactive and real-time, and that need for acceleration is what's really driving this movement from batch to real-time. And we're very happy to facilitate and accelerate that movement. >> And it really drives the opportunity for your customers to monetize more and more data so that they can actually act on it, as you said, in real-time and do something about it, whether it's a positive experience or it's remediating a challenge. Last question, guys, since we're almost out of time here, but I want to understand, talk to me about the Rockset-AWS partnership and what the value is for your customers. >> Okay, yeah. I'll get to that in a second, but I wanted to add something to your previous question. My observation, for all the customers that we see, is that real-time analytics is addictive. Once they get used to it, they can't go back to the old stuff. This is what we have found with all our customers. So, yeah, for the AWS question, I think maybe Venkat can answer that better than me. >> Yeah, I mean, we love partnering with AWS. I think they are the world's leader when it comes to public clouds. We have a lot of joint happy customers that are all AWS customers. Rockset is entirely built on top of AWS, and we love that. And there are a lot of integrations that Rockset natively comes with. So if you're already managing your data in AWS, there are no data transfer costs or anything like that involved for you to also index that data in Rockset, build real-time applications, and stream the data to Rockset. So the partnership goes very, very deep: we are an AWS customer, we are a partner, and our go-to-market teams work with them.
And so, yeah, we're very, very happy, you know, AWS fanboys here, yeah. >> Excellent, it sounds like a very synergistic, collaborative relationship, and I love, Dhruba, what you said. This is a great quote: "Real-time analytics is addictive." That sounds to me like a good addiction (all subtly laugh) for businesses in every industry to take up. Guys, it's been a pleasure talking to you. Thank you for joining me, talking to the audience about Rockset, what differentiates you, and how you're helping customers really improve their customer productivity, their employee productivity, and beyond. We appreciate your time. >> Thanks, Lisa. >> Thank you, thanks a lot. >> For my guests, I'm Lisa Martin. You're watching this "Cube Conversation". (bright ending music)
Joshua Burgin, AWS Outposts & Michael Sotnick, Pure Storage
(digital music) >> My, what a difference 10 years makes in the tech industry. At the beginning of the last decade, the cloud generally, and AWS specifically, ushered in the era where leading developers tapped into a powerful collection of remote services through programmable interfaces, you know, out there in the cloud. By the end of the decade this experience would shape the way virtually every IT professional thinks about acquiring, deploying, consuming and managing technology. Today that remote cloud is becoming ubiquitous, expanding to the "edge" with connections to on-premises data centers and other local points throughout the globe. One of the most talked about examples of this movement is AWS Outposts, which brings the Amazon experience to the edge, wherever that may be. Welcome everyone to this CUBE conversation. My name is Dave Vellante. We're going to explore the ever expanding cloud and how two companies are delivering on customer needs to connect their data center operations to the cloud and the cloud to their on-prem infrastructure and applications. And with me are Joshua Burgin, who's the General Manager of AWS Outposts, and Michael Sotnick, who's the VP of Global Alliances at Pure Storage. Gents, welcome, come inside theCUBE. >> Right on. Well, thrilled to be here, Dave. >> Great. >> Pleasure is mine, thank you. >> Awesome to have this conversation with you, it's really our pleasure. So Joshua, let's start with Outposts. Maybe you could, for the audience, describe what it is and maybe some of the use cases that you're seeing. You heard my narrative upfront, maybe you can course correct anything I missed. >> Oh sure. I mean, I think you got it right on. AWS Outposts is a fully managed service that allows you to use AWS APIs, tools, technology, and hardware and software innovation in your own data center or a colocation facility.
And coming later this year, as you put the edge in quotes, at almost any edge site, with the small form factor 1U and 2U Outposts we announced at last year's re:Invent. >> I was excited when I saw Outposts a couple of years ago. We were doing theCUBE at re:Invent and I said, wow, this is truly going to be interesting. And I'm wondering, how's Amazon, how are they going to partner? Where do some of the ecosystem folks fit in? So Michael, you're an AWS Outposts ready partner. You know, what is that program all about? What does that mean for customers? >> Yeah, it's a great question. And you know, like you, Dave, I think we as a vendor in technology are inspired by what AWS has done. And when we look at Pure and see the opportunity we have, you know, shared customer obsession, focus on outcomes, focus on NPS, great customer experience, seeing AWS deliver the cloud to the edge, deliver the cloud to the data center, that's just a great fit for us. So we rallied internally across our FlashArray block storage solution, our unified fast file and object FlashBlade solution, and our container solution Portworx, and, you know, across the entire portfolio we're the first in our segment to be service ready with AWS Outposts. And to us, it's an opportunity to link arms with AWS and cover some ground that's very familiar to us in the data center, and clearly cover some ground that's very familiar to AWS in terms of great customer relationships across the board. >> Right, and, you know, I got to say, I've been a student of Andy Jassy. I always have listened to all his talks and go back and read the transcripts, and Joshua, I've learned to never say never when it comes to AWS. And you see you guys moving into that, whatever you call it, the hybrid cloud, the on-premises world, really leaning in in a big way with Outposts, and I wonder if you could talk about what's behind that expansion strategy?
>> Sure, I mean, the way we looked at it, obviously, is always kind of working backwards from our customers. We have people tell us that they had some applications with low latency needs, or where data residency or sovereignty was driven by regulations, or in some cases where they needed to do local data processing, something like an autonomous vehicle workload, or in a factory or a healthcare facility. And they really wanted to say, look, we're going to move all of our applications, you know, the bulk of them, to one of your regions in the fullness of time, but what's holding us back is that we want a consistent environment on-prem and in what you call the cloud. So we wanted a continuum of offerings from AWS to be able to serve all those needs. And that's really where Outposts came from. And, you know, we're seeing a lot of traction across financial services with companies like Morningstar and First Abu Dhabi Bank, the iGaming space, as you can imagine a highly regulated industry, every city and, you know, municipality around the world wants to get in on that, but they have their own regulations and they really require the infrastructure to be in a specific location and run a certain way. A company like TYPICA, which is based out of Europe, doesn't want to deliver different solutions depending on whether something's deployed in Minnesota or Germany or, you know, Vancouver. So that's where AWS Outposts comes in; it works the same way as things do in the region, and they can use the same tooling. >> Yeah, so Michael, I'm going to ask you this question, and maybe Joshua, you can chime in as well. I mean, you've got this, it's sort of a win-win-win, you know: Pure, AWS, you bringing that experience to on-premises, the customer gets that experience that Joshua just explained. I wonder if you could, I mean, you've been out now for a little bit, testing the market, learning here and there.
What are the big takeaways and the learnings you're getting from customers? >> Yeah, I'll start, and I'm sure Joshua can complement quite a bit. And like Joshua hit on, right, you know, I think we take our cues from our customers, Dave, and what the customers are looking for, you know, is a commercial relationship. And so in addition to the technological inspiration we've got from AWS, we offer the solution for Outposts in a Pure as-a-Service model. So it's 100% subscription-based for the customer, and they're able to consume it, you know, the same way that they would all of their services from AWS, including Outposts, and it's also available on the AWS Marketplace. So you've got to meet the customer where they want to be met, first and foremost, and so they appreciate that. And they see that as a great value in the relationship. You know, the growth of object, you know, I think is another one of those macro trends that's happening in our space. And as customers are deploying locations that are putting out petabytes of object storage requirements, there's an increasing need for high-performance object. And that's where we can really complement an Outposts implementation and deliver high performance and that kind of ubiquitous experience, that hybrid experience, to allow the customer, in a policy-based way, to maximize that on-prem performance with Outposts and Pure around that object data set. And then also manage the life cycle of that data and the economics of that data in the cloud. >> So, but Joshua, you guys obviously invented, you know, the modern subscription model for infrastructure, but this is different; you're actually installing hardware. So you had to sort of rethink how you did that. What have you learned, and how is that model... How do you get it as substantially similar as possible to the public cloud? >> Yeah, I mean, I think you called it a win-win-win earlier.
And as much as we like to innovate, we also like to make things feel kind of comfortable and familiar to people, 'cause you think about it, there's both the developer who's using the APIs and the tools, and also the CFO and the people in finance or procurement who are looking at the spending. So with Outposts, it actually feels very similar to the region. If you're used to purchasing our compute savings plans, or what people used to call reserved instances or RIs, the underlying infrastructure on the Outpost works in a very similar way. You're not going to be deploying a multi-rack Outpost and then ripping it out three weeks later, so on-demand doesn't really make sense there. But for all the services that are deployed on top of Outposts, whether it's Application Load Balancer or ElastiCache or Elastic MapReduce, those have the same kind of on-demand service model, the pricing model, that they do in the region. And so very similarly, the Outposts ready program, which lets you use trusted and certified third-party solutions, such as ones from Pure, those are also going to feel familiar, whether you're coming from the on-prem world and you're already using that technology for your storage, your network monitoring, your security, or if you're using that solution from the marketplace in the AWS region; it's going to be a totally seamless deploy on the Outpost. So you're going to get something that's kind of the best of both worlds, familiar to you economically and from an installation perspective, but also removing all that undifferentiated heavy lifting of having to patch and manage firmware upgrades. And you asked this earlier: what customers really want is that there's this whole world of innovation, things that haven't even been invented yet. A few years ago, we hadn't invented Outposts. People want to know that as those innovations get released to the market, they can take advantage of them without having to redeploy, and so that's what having an AWS Outpost means.
That as third parties or Amazon innovate, new services can be made available without shipping a DVD or spinning up an entire staff to manage that. >> Yeah, it's kind of interesting watching this equilibrium, you know, take place. And I think it's going to continue to evolve. Obviously AWS has a huge impact on how people think about price, as I said upfront. And it seems like, you know, culturally, Michael, there's a fit. I mean, you guys have always sort of been into that, you know, with your Evergreen model, one of the first with that subscription sort of mindset. So it's sort of natural for you, whereas, you know, maybe a legacy company might not (chuckles) be able to lean in as hard as you guys are. Maybe some quick thoughts on that. >> Yeah, look, I love the way you framed that up, and couldn't agree more. I think AWS is famous for a lot of things, and some of the values that they embrace, putting the customer at the center of everything they do, couldn't be more shared, you know, with Pure. I think, you know, we talk about our company as one that runs toward fires, right, to give the customer a great experience. And so we know our way around the data center, and I think the opportunity to give that customer, you know, a consistent experience with AWS as they deliver Outposts to the data center is a really powerful combination. You know, I think one thing, just look at the backdrop of the pandemic, Dave: every part of a company's organization is going through significant change. And I think the data center is absolutely at the center of some of those changes. And I think everyone now, as they look at the next generation data center, is asking themselves, what are containers, what does Kubernetes mean to my business?
And I think the opportunity that, you know, we see jointly with EKS as a partner is really to help customers achieve that goal of, you know, application deployments anywhere, and the ability to drive that application, you know, modernize that next generation application cycle. So I love the way you framed it up, giving us credit for being highly differentiated from our legacy competitors, and we take great pride in that and really want to give a cloud-like experience to our customers. And I think what we're able to do with AWS Outposts is bring that cloud-like experience that they have come to love from AWS into the data center, and at the same time shine a light on what we've always done in terms of a cloud-like experience for the Pure customer. >> There's a lot of ways to skin a cat, but when you've invented the cloud and you don't have a lot of legacy baggage, you can kind of move faster. And I think that, you know, we're really excited about what's occurring here. Because take the term digital transformation: I mean, before the pandemic (groaning) it's like, yeah okay, it had some meaning, but you really had to squint through it, and a lot of people were complacent about it. Well, we know what digital means now: if you're not a digital business, you're out of business. And so it was kind of this forced march to digital, I call it, and as a result it really increases the need for things like automation and that cloud experience on-prem, because I don't have time to be provisioning LUNs anymore. It's just what you guys call undifferentiated heavy lifting that is really a no-no these days; I just absolutely can't afford it. Let's close on what's next. I mean, we've got new form factors coming, and we're super excited when we see things like what Amazon is doing with custom silicon. We see these innovations coming out with processing power going through the roof.
Everybody says Moore's law is dead, but processing power is increasing faster than it ever has when you combine all these innovations of GPUs and NPUs and accelerators; it's just, it's amazing. And the costs are coming down, so you're going to be able to take advantage of that. Outposts will take advantage of that, Pure will, new designs will. But specifically as it relates to Outposts, you've got 1U, you've got 2U coming, optimized for the edge. What do customers need to know about these solutions? Why should they consider this combination of Pure and AWS? Maybe Joshua you can start and Michael you can bring us home. >> Yeah, I mean, you hit a lot of the reasons that people should consider it, right. The pace of innovation is not going to slow down here at AWS or, of course, with Pure. Whether you have the need for a single server, or you're somebody like DISH rolling out a new cloud-enabled, you know, cloud native 5G network, you want to work with somebody who can deploy all the way at the telco edge, right, with hardware innovation, up to a local zone, all the way up to a region. You don't want to be working with different providers for that, and you don't know what you're going to need in three or five years, and frankly, I'm not sure that we know everything yet either, but we're going to continue to listen to our customers and, as you mentioned, deliver things like Graviton and Inferentia and Trainium, which are our innovations in custom silicon. Those are delivering 40% price performance improvements for people who are migrating; that's really an enormous benefit. And we're bringing all of those to the Outpost as well, so you don't have to choose between moving to the cloud and that being your only modernization option. You can move to the cloud and at the same time still operate on-prem, you know, at a colo facility or all the way at the edge, using all of the same tooling. And you can work with best-in-breed third-party technologies like what's offered by Pure.
>> Well, and Michael, I'm going to cut you off before you get a chance to close, but I'll let you close. The Portworx acquisition was really interesting to us because it brings that kind of portability, a new programming model, and something that Joshua said stuck in my mind. When I think about the edge, what's going to win the edge, you know, obviously there's the flexibility, the agility, but also the programmability and the customization. There are so many different use cases. We're not just going to take general purpose boxes and throw them over the fence and say, here you go. You know, general purpose is not what's going to win the edge; it's really going to take a lot more thought than that. But, so I just wanted to put that in there. Michael, bring us home, please. (laughing) >> Right on. Well, look, you two, and no surprise here, right, you two covered so much great ground there. From first principles, you know, what does Pure look at? Like what we did being first in terms of service ready across Portworx for EKS, FlashBlade for unified fast file and object, and FlashArray, you know, for block storage, being first with Outposts; we want to be first for the 1U and 2U solutions. So I think customers can expect, you know, that our partnership is going to continue to deliver that cloud-like experience, that cloud experience in the AWS context, that cloud-like experience in the Pure context, you know, for their on-prem and hybrid workloads. And I think you hit it up so well: if you're not a digital business, you're not in business. And so I think one thing that everyone learned over the last year is exactly that. The other thing they learned is they don't know what they don't know. And so they need to make bets on partners that are modern, that are delivering simple solutions that solve complex problems, that are automated, and that are being delivered with a customer-first mindset.
And I think in the combination of AWS Outposts and Pure, we're doing exactly that. >> Great point, so a lot of unknowns out there. Hey guys, congratulations on the progress you've made. It's a great partnership, two super innovative companies, and it's been a real pleasure to have you in theCUBE. Thank you for coming on. >> Thanks for having us. >> Yeah, always a pleasure. Thank you so much. >> All right, thank you for watching everybody. This is Dave Vellante. We'll see you next time. (digital music)
Brett McMillen, AWS | AWS re:Invent 2020
>>From around the globe, it's theCUBE, with digital coverage of AWS re:Invent 2020, sponsored by Intel and AWS. >>Welcome back to theCUBE's coverage of AWS re:Invent 2020. I'm Lisa Martin. Joining me next is one of our cube alumni: Brett McMillen is back, the director of U.S. Federal for AWS. Brett, it's great to see you, glad that you're safe and well. >>Great. It's great to be back. I think last year when we did theCUBE, we were on the convention floor. It feels very different this year here at re:Invent. It's gone virtual, and yet it's still true to how re:Invent has always been. It's a learning conference, and we're releasing a lot of new products and services for our customers. >>Yes. A lot of content, as you say. The one thing I would say about this re:Invent, one of the things that's different, is it's so quiet around us. Normally we're talking loudly over tens of thousands of people on the showroom floor, but it's great that AWS is still able to connect in such an, actually, an even bigger way with its customers. So during Theresa Carlson's keynote, and I want to get your opinion on this, she talked about the AWS open data sponsorship program, and that you guys are going to be hosting the National Institutes of Health (NIH) Sequence Read Archive data. The former biologist in me gets really excited about that. Talk to us about that, because especially during the global health crisis that we're in, that sounds really promising. >>Very much is. I am so happy that we're working with NIH on this and multiple other initiatives. So the Sequence Read Archive, or SRA, essentially what it is, is a very large data set of sequenced genomic data. And it's a wide variety of genomic data; it's not just human genetic data, but all life forms, or all branches of life, are in SRA, to include viruses. And that's really important here during the pandemic.
It's one of the largest and oldest sequenced genomic data sets out there, and yet it's very modern. It has been designed for next generation sequencing. So it's growing, it's modern, and it's well used. It's one of the more important ones that's out there. One of the reasons this is so important is that we want to find cures for human ailments and disease and death, and by studying the genomic code, the scientists can come up with answers for that. And that's what Amazon is doing: we're putting in the hands of the scientists the tools so that they can help cure heart disease and diabetes and cancer and depression, and yes, even viruses that can cause pandemics. >>So making this data, sorry, making this data available to those scientists worldwide is incredibly important. Talk to us about that. >>Yeah, it is. And so within NIH, we're working with the NCBI. When you're dealing with NIH, there's a lot of acronyms, and at NIH, that's the National Center for Biotechnology Information. And so we're working with them to make this available as an open data set. Why this is important is it's all about increasing the speed of scientific discovery. I personally think that in the fullness of time, the scientists will come up with cures for just about all of the human ailments that are out there, and it's our job at AWS to put into the hands of the scientists the tools they need to make things happen quickly, in our lifetime. And I'm really excited to be working with NIH on that. When we start talking about it, there are multiple things the scientists need. One is access to these data sets like SRA.
It's so much easier if you just go into the cloud, compute against it and do your research there in the cloud. And so it's super important. 45 petabytes, give you an idea if it were all human data, that's equivalent to have a seven and a half million people or put another way 90% of everybody living in New York city. So that's how big this is. But then also what AWS is doing is we're bringing compute. So in the cloud, you can scale up your compute, scale it down, and then kind of the third they're. The third leg of the tool of the stool is giving the scientists easy access to the specialized tool sets they need. >>And we're doing that in a few different ways. One that the people would design these toolsets design a lot of them on AWS, but then we also make them available through something called AWS marketplace. So they can just go into marketplace, get a catalog, go in there and say, I want to launch this resolve work and launches the infrastructure underneath. And it speeds the ability for those scientists to come up with the cures that they need. So SRA is stored in Amazon S3, which is a very popular object store, not just in the scientific community, but virtually every industry uses S3. And by making this available on these public data sets, we're giving the scientists the ability to speed up their research. >>One of the things that Springs jumps out to me too, is it's in addition to enabling them to speed up research, it's also facilitating collaboration globally because now you've got the cloud to drive all of this, which allows researchers and completely different parts of the world to be working together almost in real time. So I can imagine the incredible power that this is going to, to provide to that community. So I have to ask you though, you talked about this being all life forms, including viruses COVID-19, what are some of the things that you think we can see? I expect this to facilitate. Yeah. 
>>So earlier in the year we took the, um, uh, genetic code or NIH took the genetic code and they, um, put it in an SRA like format and that's now available on AWS and, and here's, what's great about it is that you can now make it so anybody in the world can go to this open data set and start doing their research. One of our goals here is build back to a democratization of research. So it used to be that, um, get, for example, the very first, um, vaccine that came out was a small part. It's a vaccine that was done by our rural country doctor using essentially test tubes in a microscope. It's gotten hard to do that because data sets are so large, you need so much computer by using the power of the cloud. We've really democratized it and now anybody can do it. So for example, um, with the SRE data set that was done by NIH, um, organizations like the university of British Columbia, their, um, cloud innovation center is, um, doing research. And so what they've done is they've scanned, they, um, SRA database think about it. They scanned out 11 million entries for, uh, coronavirus sequencing. And that's really hard to do in a typical on-premise data center. Who's relatively easy to do on AWS. So by making this available, we can have a larger number of scientists working on the problems that we need to have solved. >>Well, and as the, as we all know in the U S operation warp speed, that warp speed alone term really signifies how quickly we all need this to be progressing forward. But this is not the first partnership that AWS has had with the NIH. Talk to me about what you guys, what some of the other things are that you're doing together. >>We've been working with NIH for a very long time. Um, back in 2012, we worked with NIH on, um, which was called the a thousand genome data set. This is another really important, um, data set and it's a large number of, uh, against sequence human genomes. 
And we moved that into, again, an open dataset on AWS, and what's happened in the last eight years is many scientists have been able to compute against it. And the other wonderful power of the cloud is that, over time, we continue to bring out tools to make it easier for people to work. So whether they're computing using our instance types, what we call Elastic Compute Cloud, or they're doing some high-performance computing using EMR, Elastic MapReduce, they can do that. And then we've brought out new things that really take it to the next level, like Amazon SageMaker. >> And this makes it really easy for the scientists to launch machine learning algorithms on AWS. So we've done the 1000 Genomes dataset, and there are a number of other areas within NIH that we've been working on. So for example, over at the National Cancer Institute, we've been providing some expert guidance on best practices for how you can architect and work on these COVID-related workloads. NIH does things in collaboration with many different universities, over 2,500 academic institutions, and they do that through grants. And so we've been working with the Office of the Director, and they run their grant management applications and RFAs on AWS, and that allows them to scale up and to work very efficiently. And then we entered with NIH into this program called STRIDES. STRIDES is a program for not only NIH, but also all these other institutions that work with NIH, to use the power of the commercial cloud for scientific discovery. And when we started that back in July of 2018, long before COVID happened, it was so great that we had it up and running, because now we're able to help them out through the STRIDES program. >> Right. Can you imagine if... let's not even go there, I was going to say. But so, okay.
So the SRA data is available through the AWS Open Data Sponsorship Program. You talked about STRIDES. What are some of the other ways that AWS is assisting? >> Yeah, so STRIDES is wide-ranging, through multiple different institutes. So for example, over at the National Heart, Lung, and Blood Institute, NHLBI (I did say there are a lot of acronyms), they've been working on harmonizing genomic data. And so, working with the University of Michigan, they've been analyzing through a program that they call TOPMed. We've also been working with NIH on establishing best practices and making sure everything's secure. So we've been providing AWS Professional Services to show them how to do this. So one portion of STRIDES is getting the right datasets, the right compute, and the right tools into the hands of the scientists. The other area that we've been working on is making sure the scientists know how to use it. And so we've been developing these cloud learning pathways, and we started this quite a while back, and it's been so helpful here during COVID. So scientists can now go on and do self-paced online courses, which we've been really helping with during the pandemic, and they can learn how to maximize their use of cloud technologies through these pathways that we've developed for them. >> Well, education is imperative. I mean, you think about all of the knowledge that they have within their scientific discipline, and being able to leverage technology in a way that's easy is absolutely imperative to the timing. So let's talk about other datasets that are available. You've got the SRA available. What other datasets are available through this program?
>> We have a wide range of open datasets, and in general, these datasets are improving the human condition or improving the world in which we live. And so I've talked about a few things; there are a few more. So for example, there's The Cancer Genome Atlas that we've been working on with the National Cancer Institute as well as the National Human Genome Research Institute. And that's a very important dataset that's being computed against throughout the world. Commonly, within the scientific community, that dataset is called TCGA. Then we also have some datasets that are focused on certain groups. So for example, Kids First is a dataset that's looking at a lot of the challenges in diseases that kids get, everything from very rare pediatric cancers to heart defects, et cetera. >> And so we're working with them, but it's not just on the medical side. We have open datasets with, for example, NOAA, the National Oceanic and Atmospheric Administration, to understand what's happening better with climate change and to slow the rate of climate change. Within the Department of the Interior, they have a Landsat database of pictures of the earth, so we can better understand the world we live in. Similarly, NASA has a lot of data that we put out there, and over in the Department of Energy there are datasets that the scientists are researching against to make sure that we have better clean, renewable energy sources. But it's not just government agencies that we work with when we find a dataset that's important. >> We also work with nonprofit organizations. Nonprofit organizations are not flush with cash, and they're trying to make every dollar work.
And so we've worked with organizations like the Child Mind Institute or the Allen Institute for Brain Science. These are largely neuroimaging data, and we made that available via our open data program. So there's a wide range of things that we're doing, and what's great about it is, when we do it, you democratize science and you allow many, many more scientists to work on these problems that are so critical for us. >> The availability is incredible, but also the breadth and depth of what you just described. It's not just government, for example. You've got about 30 seconds left; I'm going to ask you to summarize some of the announcements that you think are really critical for federal customers to be paying attention to from re:Invent 2020. >> Yeah. So one of the things that these federal government customers have been coming to us on is that they've had to find new ways to communicate with their customers, with the public. And so we have a product that we've had for a while called Amazon Connect, and it's been used very extensively throughout government customers, and it's used in industry too. We've had a number of announcements this week. Jassy made multiple announcements on enhancements to Amazon Connect and additional services, everything from helping to verify that it's the right person with Connect Voice ID, to making sure that the customer gets a good customer experience with Connect Wisdom, to making sure that the managers of these call centers can manage the call centers better. And so I'm really excited that we're putting a cloud-based solution in the hands of both government and industry to make their connections to the public better.
>> It's all about connections these days. I wish we had more time, because I know we could unpack so much more with you, but thank you for joining me on theCUBE today and sharing some of the insights, some of the impacts and availability that AWS is enabling for the scientific and other federal communities. It's incredibly important, and we appreciate your time. >> Thank you, Lisa. >> For Brett McMillan, I'm Lisa Martin. You're watching theCUBE's coverage of AWS re:Invent 2020.
Breaking Analysis: How Snowflake Plans to Change a Flawed Data Warehouse Model
>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Snowflake is not going to grow into its valuation by stealing the croissant from the breakfast table of the on-prem data warehouse vendors. Look, even if Snowflake got 100% of the data warehouse business, it wouldn't come close to justifying its market cap. Rather, Snowflake has to create an entirely new market based on completely changing the way organizations think about monetizing data. Every organization I talk to says it wants to be, or many say they already are, data-driven. Why wouldn't you aspire to that goal? There's probably nothing more strategic than leveraging data to power your digital business and create competitive advantage. But many businesses are failing, or I predict will fail, to create a true data-driven culture because they're relying on a flawed architectural model formed by decades of building centralized data platforms. Welcome everyone to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis, I want to share some new thoughts and fresh ETR data on how organizations can transform their businesses through data by reinventing their data architectures. And I want to share our thoughts on why we think Snowflake is currently in a very strong position to lead this effort. Now, on November 17th, theCUBE is hosting the Snowflake Data Cloud Summit. Snowflake's ascendancy and its blockbuster IPO have been widely covered by us and many others. Now, since Snowflake went public, we've been inundated with outreach from investors, customers, and competitors that wanted to either better understand the opportunities or explain why their approach is better or different. And in this segment, ahead of Snowflake's big event, we want to share some of what we learned and how we see it.
Now, theCUBE is getting paid to host this event, so I need you to know that, and you can draw your own conclusions from my remarks. But neither Snowflake nor any other sponsor of theCUBE or client of SiliconANGLE Media has editorial influence over Breaking Analysis. The opinions here are mine, and I would encourage you to read my ethics statement in this regard. I want to talk about the failed data model. The problem is complex, I'm not debating that. Organizations have to integrate data and platforms with existing operational systems, many of which were developed decades ago. And there's a culture and a set of processes that have been built around these systems, and they've been hardened over the years. This chart here tries to depict the progression of the monolithic data source, which, for me, began in the 1980s when Decision Support Systems, or DSS, promised to solve our data problems. The data warehouse became very popular and data marts sprung up all over the place. This created more proprietary stovepipes with data locked inside. The Enron collapse led to Sarbanes-Oxley. Now, this tightened up reporting. The requirements associated with that breathed new life into the data warehouse model. But it remained expensive and cumbersome, I've talked about that a lot, like a snake swallowing a basketball. The 2010s ushered in the big data movement, and data lakes emerged. With Hadoop, we saw the idea of schema on read, where you put structured and unstructured data into a repository and figure it all out on the read. What emerged was a fairly complex data pipeline that involved ingesting, cleaning, processing, analyzing, preparing, and ultimately serving data to the lines of business. And this is where we are today, with very hyper-specialized roles around data engineering, data quality, and data science. There's lots of batch processing going on, and Spark has emerged to improve the complexity associated with MapReduce, and it definitely helped improve the situation.
We're also seeing attempts to blend in real-time stream processing with the emergence of tools like Kafka and others. But I'll argue that, in a strange way, these innovations actually compound the problem. And I want to discuss that, because what they do is heighten the need for more specialization, more fragmentation, and more stovepipes within the data life cycle. Now, in reality, and it pains me to say this, the outcome of the big data movement, as we sit here in 2020, is that we've created thousands of complicated science projects that have once again failed to live up to the promise of rapid, cost-effective time to insights. So, what will the 2020s bring? What's the next silver bullet? You hear terms like the lakehouse, which Databricks is trying to popularize, and I'm going to talk today about the data mesh. These are other efforts that look to modernize datalakes and sometimes merge the best of data warehouse and second-generation systems into a new paradigm that might unify batch and stream frameworks. And this definitely addresses some of the gaps, but in our view, it still suffers from some of the underlying problems of previous-generation data architectures. In other words, if the next-gen data architecture is incremental, centralized, rigid, and primarily focused on making the technology to get data in and out of the pipeline work, we predict it's going to fail to live up to expectations again. Rather, what we're envisioning is an architecture based on the principles of distributed data, where domain knowledge is the primary target citizen, and data is not seen as a by-product, i.e., the exhaust of an operational system, but rather as a service that can be delivered in multiple forms and use cases across an ecosystem. This is why we often say data is not the new oil. We don't like that phrase. A specific gallon of oil can either fuel my home or lubricate my car engine, but it can't do both.
Data does not follow the same laws of scarcity as natural resources. Again, what we're envisioning is a rethinking of the data pipeline and the associated cultures to put the data needs of the domain owner at the core and provide automated, governed, and secure access to data as a service at scale. Now, how is this different? Let's take a look and unpack the data pipeline today and look deeper into the situation. You all know this picture that I'm showing. There's nothing really new here. The data comes from inside and outside the enterprise. It gets processed, cleansed, or augmented so that it can be trusted and made useful. Nobody wants to use data that they can't trust. And then we can add machine intelligence and do more analysis, and finally deliver the data so that domain-specific consumers can essentially build data products and services or reports and dashboards or content services, for instance, an insurance policy, a financial product, a loan. These are packaged and made available for someone to make decisions on or to make a purchase. And all the metadata associated with this data is packaged along with the dataset. Now, we've broken down these steps into atomic components over time so we can optimize on each and make them as efficient as possible. And down below, you have these happy stick figures. Sometimes they're happy. But they're highly specialized individuals, and they each do their job, and they do it well, to make sure that the data gets in, gets processed, and gets delivered in a timely manner. Now, while these individual pieces seemingly are autonomous and can be optimized and scaled, they're all encompassed within the centralized big data platform. And it's generally accepted that this platform is domain agnostic, meaning the platform is the data owner, not the domain-specific experts. Now, there are a number of problems with this model.
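To make those linear hand-offs concrete, here's a minimal toy pipeline in Python. The stage names and records are illustrative, not any specific platform's API; the point is that every result must flow through the whole chain before a domain consumer sees it:

```python
# A toy linear data pipeline: ingest -> clean -> process -> serve.
# Each stage is a plain function; the pipeline is their composition.

def ingest(sources):
    """Pull raw records from inside and outside the enterprise."""
    return [record for source in sources for record in source]

def clean(records):
    """Drop records that can't be trusted (here: missing a 'value')."""
    return [r for r in records if r.get("value") is not None]

def process(records):
    """Augment records so downstream consumers can use them."""
    return [{**r, "value_doubled": r["value"] * 2} for r in records]

def serve(records):
    """Package results for a domain-specific consumer, e.g. a report."""
    return {"row_count": len(records), "rows": records}

internal = [{"id": 1, "value": 10}, {"id": 2, "value": None}]
external = [{"id": 3, "value": 30}]

report = serve(process(clean(ingest([internal, external]))))
print(report["row_count"])  # 2
```

Notice that a change requested by the domain consumer at the end, say a new source or a new field, forces coordination across every upstream stage, which is exactly the dependency problem described above.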
The first: while it's fine for organizations with a smaller number of domains, organizations with a large number of data sources and complex domain structures struggle to create a common data parlance, for example, in a data culture. Another problem is that, as the number of data sources grows, organizing and harmonizing them in a centralized platform becomes increasingly difficult, because the context of the domain and the line of business gets lost. Moreover, as ecosystems grow and you add more data, the processes associated with the centralized platform tend to get further genericized. They again lose that domain-specific context. Wait (chuckling), there are more problems. Now, while in theory organizations are optimizing on the piece parts of the pipeline, the reality is, as the domain requires a change, for example, a new data source or an ecosystem partnership requires a change in access or processes that can benefit a domain consumer, the change is subservient to the dependencies and the need to synchronize across these discrete parts of the pipeline, or is actually orthogonal to each of those parts. In other words, in actuality, the monolithic data platform itself remains the most granular part of the system. Now, when I complain about this faulty structure, some folks tell me this problem has been solved. That there are services that allow new data sources to easily be added. A good example of this is Databricks Ingest, which is an auto loader. What it does is simplify ingestion into the company's Delta Lake offering. And rather than centralizing in a data warehouse, which struggles to efficiently allow things like machine learning frameworks to be incorporated, this feature allows you to put all the data into a centralized datalake.
More so, the argument goes. But the problem that I see with this is that, while the approach definitely minimizes the complexities of adding new data sources, it still relies on this linear end-to-end process that slows down the introduction of data sources from the domain consumer side of the pipeline. In other words, the domain expert still has to elbow her way into the front of the line, or the pipeline in this case, to get stuff done. And finally, the way we are organizing teams is a point of contention, and I believe it is going to continue to cause problems down the road. Specifically, we've again optimized on technology expertise, where, for example, data engineers, who are really good at what they do, are often removed from the operations of the business. Essentially, we created more silos and organized around technical expertise versus domain knowledge. As an example, a data team has to work with data that is delivered with very little domain specificity and serves a variety of highly specialized consumption use cases. All right. I want to step back for a minute and talk about some of the problems that people bring up with Snowflake, and then I'll relate it back to the basic premise here. As I said earlier, we've been hammered by dozens and dozens of data points, opinions, and criticisms of Snowflake. And I'll share a few here. But I'll post a deeper technical analysis from a software engineer that I found to be fairly balanced. There are five Snowflake criticisms that I'll highlight. And there are many more, but here are some that I want to call out. Price transparency. I've had more than a few customers tell me they chose an alternative database because of the unpredictable nature of Snowflake's pricing model. Snowflake, as you probably know, prices based on consumption, just like AWS and other cloud providers. So, just like AWS, for example, the bill at the end of the month is sometimes unpredictable. Is this a problem? Yes.
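To see why that unpredictability bites procurement, here's a toy consumption-billing model. The credit rates and dollar figures are illustrative placeholders, not Snowflake's actual price list; the shape of the calculation is the point:

```python
# Toy consumption billing: cost = credits/hour * hours * $/credit.
# Credit rates per warehouse size are illustrative, NOT a real price list.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

def monthly_cost(usage_hours_by_size, dollars_per_credit=3.0):
    """Sum cost across warehouse sizes given hours used per size."""
    credits = sum(CREDITS_PER_HOUR[size] * hours
                  for size, hours in usage_hours_by_size.items())
    return credits * dollars_per_credit

# A quiet month vs. a month where someone left a big warehouse busy:
quiet = monthly_cost({"S": 100})
busy = monthly_cost({"S": 100, "XL": 200})
print(quiet, busy)  # 600.0 10200.0
```

The bill scales with usage rather than a fixed license, so one enthusiastic team can multiply the month's spend, which is good for value creation and hard on budget forecasts.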
But like AWS, I would say, "Kill me with that problem." Look, if users are creating value by using Snowflake, then that's good for the business. But clearly this is a sore point for some users, especially for procurement and finance, which don't like unpredictability. And Snowflake needs to do a better job communicating and managing this issue with tooling that can predict and help better manage costs. Next, workload management, or lack thereof. Look, if you want to isolate higher-performance workloads with Snowflake, you just spin up a separate virtual warehouse. It's kind of a brute-force approach. It works generally, but it will add expense. I'm kind of reminded of Pure Storage and its approach to storage management. The engineers at Pure always design for simplicity, and this is the approach that Snowflake is taking. The difference between Pure and Snowflake, as I'll discuss in a moment, is that Pure's ascendancy was based largely on stealing share from legacy EMC systems. Snowflake, in my view, has a much, much larger incremental market opportunity. Next is caching architecture. You hear this a lot. At the end of the day, Snowflake is based on a caching architecture, and a caching architecture has to be working for some time to optimize performance. Caches work well when the size of the working set is small. Caches generally don't work well when the working set is very, very large. In general, transactional databases have pretty small datasets, and in general, analytics datasets are potentially much larger. Is Snowflake in the analytics business? Yes. But the good thing that Snowflake has done is they've enabled data sharing, and its caching architecture serves its customers well because it allows domain experts, and you're going to hear this a lot from me today, to isolate and analyze problems or go after opportunities based on tactical needs.
That said, very big queries across whole datasets or badly written queries that scan the entire database are not the sweet spot for Snowflake. Another good example would be if you're doing a large audit and you need to analyze a huge, huge dataset. Snowflake's probably not the best solution. Complex joins, you hear this a lot. The working set of complex joins, by definition, is larger. So, see my previous explanation. Read only. Snowflake is pretty much optimized for read-only data. Maybe stateless data is a better way of thinking about this. Heavily write-intensive workloads are not the wheelhouse of Snowflake. So where this is maybe an issue is real-time decision-making and AI inferencing. And I've talked about this: Snowflake might be able to develop products or acquire technology to address this opportunity. Now, I want to explain. These issues would be problematic if Snowflake were just a data warehouse vendor. If that were the case, this company, in my opinion, would hit a wall, just like the MPP vendors that preceded them, who built a better mousetrap for certain use cases, hit a wall. Rather, my premise in this episode is that the future of data architectures will really be to move away from large centralized warehouse or datalake models to a highly distributed data-sharing system that puts power in the hands of domain experts at the line of business. Snowflake is less computationally efficient and less optimized for classic data warehouse work. But it's designed to serve the domain user much more effectively, in our view. We believe that Snowflake is optimizing for business effectiveness, essentially. And as I said before, the company can probably do a better job at keeping passionate end users from breaking the bank. But as long as these end users are making money for their companies, I don't think this is going to be a problem. Let's look at the attributes of what we're proposing around this new architecture.
We believe we'll see the emergence of a total flip of the centralized and monolithic big data systems that we've known for decades. In this architecture, data is owned by domain-specific business leaders, not technologists. Today, it's not much different in most organizations than it was 20 years ago. If I want to create something of value that requires data, I need to cajole, beg, or bribe the technology and data teams to accommodate. The data consumers are subservient to the data pipeline. Whereas in the future, we see the pipeline as a second-class citizen, with the domain expert elevated. In other words, getting the technology and the components of the pipeline to be more efficient is not the key outcome. Rather, the time it takes to envision, create, and monetize a data service is the primary measure. The data teams are cross-functional and live inside the domain, versus today's structure, where the data team is largely disconnected from the domain consumer. Data in this model, as I said, is not the exhaust coming out of an operational system or an external source that is treated as generic and stuffed into a big data platform. Rather, it's a key ingredient of a service that is domain-driven and monetizable. And the target system is not a warehouse or a lake. It's a collection of connected, domain-specific datasets that live in a global mesh. What is a distributed global data mesh? A data mesh is a decentralized architecture that is domain aware. The datasets in the system are purposely designed to support a data service or data product, if you prefer. The ownership of the data resides with the domain experts, because they have the most detailed knowledge of the data requirements and its end use. Data in this global mesh is governed and secured, and every user in the mesh can have access to any dataset as long as it's governed according to the edicts of the organization.
Now, in this model, the domain expert has access to a self-service and abstracted infrastructure layer that is supported by a cross-functional technology team. Again, the primary measure of success is the time it takes to conceive and deliver a data service that can be monetized. Now, by monetize, we mean a data product or data service that either cuts cost, drives revenue, or saves lives, whatever the mission is of the organization. The power of this model is that it accelerates the creation of value by putting authority in the hands of those individuals who are closest to the customer and have the most intimate knowledge of how to monetize data. It reduces the diseconomies of scale of having a centralized or monolithic data architecture. And it scales much better than legacy approaches, because the atomic unit is a data domain, not a monolithic warehouse or a lake. Zhamak Dehghani is a software engineer who is attempting to popularize the concept of a global mesh. Her work is outstanding, and it's strengthened our belief that practitioners see this the same way that we do. And to paraphrase her view, a domain-centric system must be secure and governed with standard policies across domains. It has to be trusted. As I said, nobody's going to use data they don't trust. It's got to be discoverable via a data catalog with rich metadata. The datasets have to be self-describing and designed for self-service. Accessibility for all users is crucial, as is interoperability, without which distributed systems, as we know, fail. So what does this all have to do with Snowflake? As I said, Snowflake is not just a data warehouse. In our view, it's always had the potential to be more. Our assessment is that attacking the data warehouse use cases gave Snowflake a straightforward, easy-to-understand narrative that allowed it to get a foothold in the market.
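Those mesh principles, domain ownership, self-describing metadata, discoverability, and governed access, can be pictured with a purely illustrative sketch. All names and policies here are hypothetical, not any vendor's API:

```python
from dataclasses import dataclass, field

# Illustrative sketch of data-mesh principles: every dataset is a
# self-describing data product, owned by a domain, discoverable in a
# catalog, with access governed by the owning domain's policy.

@dataclass
class DataProduct:
    name: str
    domain_owner: str          # the domain team, not the platform team
    description: str           # self-describing metadata
    allowed_roles: set = field(default_factory=set)

class MeshCatalog:
    def __init__(self):
        self._products = {}

    def register(self, product: DataProduct):
        """Domain teams publish their own data products to the catalog."""
        self._products[product.name] = product

    def discover(self):
        """Any user can browse the catalog (discoverability)."""
        return sorted(self._products)

    def access(self, name: str, role: str):
        """Access is granted per the owning domain's policy (governance)."""
        product = self._products[name]
        if role not in product.allowed_roles:
            raise PermissionError(f"{role} may not read {name}")
        return product

catalog = MeshCatalog()
catalog.register(DataProduct("claims-risk-scores", "insurance-claims",
                             "Daily risk score per open claim",
                             allowed_roles={"underwriter", "actuary"}))
print(catalog.discover())  # ['claims-risk-scores']
```

The key design choice mirrored here is that the policy travels with the data product and is set by its domain owner, while the catalog itself stays generic, which is the inverse of the centralized-platform model.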
Data warehouses are notoriously expensive, cumbersome, and resource intensive, but they're a critical aspect of reporting and analytics. So it was logical for Snowflake to target on-premises legacy data warehouses, and their smaller cousins the datalakes, as early use cases. By putting forth and demonstrating a simple data warehouse alternative that can be spun up quickly, Snowflake was able to gain traction, demonstrate repeatability, and attract the capital necessary to scale to its vision. This chart shows the three layers of Snowflake's architecture that have been well documented: the separation of compute and storage, and the outer layer of cloud services. But I want to call your attention to the bottom part of the chart, the so-called cloud agnostic layer that Snowflake introduced in 2018. This layer is somewhat misunderstood. Not only did Snowflake make its cloud-native database compatible to run on AWS, then Azure, and in 2020 GCP; what Snowflake has done is abstract cloud infrastructure complexity and create what it calls the data cloud. What's the data cloud? We don't believe the data cloud is just a marketing term that doesn't have any substance. Just as SaaS simplified application software and iOS made it possible to eliminate the value drain associated with provisioning infrastructure, a data cloud, in concept, can simplify data access, break down fragmentation, and enable shared data across the globe. Snowflake has a first-mover advantage in this space, and we see a number of fundamental aspects that comprise a data cloud. First, massive scale with virtually unlimited compute and storage resources that are enabled by the public cloud. We talk about this a lot. Second is a data or database architecture that's built to take advantage of native public cloud services. This is why Frank Slootman says, "We've burned the boats. We're not ever doing on-prem. We're all in on cloud and cloud native."
Third is an abstraction layer that hides the complexity of infrastructure, and fourth is a governed and secured shared-access system where any user in the system, if allowed, can get access to any data in the cloud. So a key enabler of the data cloud is this thing called the global data mesh. Now, earlier this year, Snowflake introduced its global data mesh. Over the course of its recent history, Snowflake has been building out its data cloud by creating data regions, strategically tapping key locations of AWS regions and then adding Azure and GCP. The complexity of the underlying cloud infrastructure has been stripped away to enable self-service, and any Snowflake user becomes part of this global mesh, independent of the cloud that they're on. Okay. So now, let's go back to what we were talking about earlier. Users in this mesh will be our domain owners. They're building monetizable services and products around data. They're most likely dealing with relatively small, read-only datasets. They can ingest data from any source very easily and quickly set up security and governance to enable data sharing across different parts of an organization, or, very importantly, an ecosystem. Access control and governance is automated. The datasets are addressable. The data owners have clearly defined missions and they own the data through the life cycle, data that is specific and purposely shaped for their missions. Now, you're probably asking, "What happens to the technical team and the underlying infrastructure and the cluster it's in? How do I get the compute close to the data? And what about data sovereignty and the physical storage layer, and the costs?" All these are good questions, and I'm not saying these are trivial. But the answer is these are implementation details that are pushed to a self-service layer managed by a group of engineers that serves the data owners. 
And as long as the domain expert/data owner is driving monetization, this piece of the puzzle becomes self-funding. As I said before, Snowflake has to help these users optimize their spend with predictive tooling that aligns spend with value and shows ROI. While there may not be a strong motivation for Snowflake to do this, my belief is that they'd better get good at it or someone else will do it for them and steal their ideas. All right. Let me end with some ETR data to show you just how Snowflake is getting a foothold in the market. Followers of this program know that ETR uses a consistent methodology to go to its practitioner base, its buyer base, each quarter and ask them a series of questions. They focus on the areas that the technology buyer is most familiar with, and they ask a series of questions to determine the spending momentum around a company within a specific domain. This chart shows one of my favorite examples. It shows data from the October ETR survey of 1,438 respondents. And it isolates on the data warehouse and database sector. I know I just got through telling you that the world is going to change and Snowflake's not a data warehouse vendor, but there's no construct today in the ETR dataset to cut a data cloud or globally distributed data mesh. So you're going to have to deal with this. What this chart shows is net score on the y-axis. That's a measure of spending velocity, and it's calculated by asking customers, "Are you spending more or less on a particular platform?" and then subtracting the lesses from the mores. It's more granular than that, but that's the basic concept. Now, on the x-axis is market share, which is ETR's measure of pervasiveness in the survey. You can see superimposed in the upper right-hand corner a table that shows the net score and the shared N for each company. Now, shared N is the number of mentions in the dataset within, in this case, the data warehousing sector. 
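The net score calculation described above can be sketched in a few lines. Here's a minimal, illustrative version of the idea; the response labels and sample data are hypothetical, and, as noted, ETR's actual methodology is more granular than this:

```python
from collections import Counter

def net_score(responses):
    """Simplified net score: (% spending more) minus (% spending less).

    Each response is one of: "adopting", "increasing", "flat",
    "decreasing", "replacing". Result is expressed as a percentage.
    """
    counts = Counter(responses)
    n = len(responses)
    more = counts["adopting"] + counts["increasing"]
    less = counts["decreasing"] + counts["replacing"]
    return 100.0 * (more - less) / n

# Hypothetical sample: 7 of 10 respondents spending more, 1 spending less.
sample = ["adopting"] * 3 + ["increasing"] * 4 + ["flat"] * 2 + ["decreasing"]
print(net_score(sample))  # -> 60.0
```

A 75% net score, like Snowflake's below, means the "mores" outnumber the "lesses" by 75 points across the surveyed base, which is why it reads as elevated spending momentum.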
Snowflake, once again, leads all players with a 75% net score. This is a very elevated number and is higher than that of all other players, including the big cloud companies. Now, we've been tracking this for a while, and Snowflake is holding firm on both dimensions. When Snowflake first hit the dataset, it was in the single digits along the horizontal axis, and it continues to creep to the right as it adds more customers. Now, here's another chart. I call it the wheel chart; it breaks down the components of Snowflake's net score or spending momentum. The lime green is new adoption, the forest green is customers spending more than 5%, the gray is flat spend, the pink is declining by more than 5%, and the bright red is retiring the platform. So you can see the trend. It's all momentum for this company. Now, what Snowflake has done is grabbed hold of the market by simplifying the data warehouse. But the strategic aspect of that is that it enables the data cloud, leveraging the global mesh concept. And the company has introduced a data marketplace to facilitate data sharing across ecosystems. This is all about network effects. In the mid to late 1990s, as the internet was being built out, I worked at IDG with Bob Metcalfe, who was the publisher of InfoWorld. During that time, we'd go on speaking tours all over the world, and I would listen very carefully as he applied Metcalfe's law to the internet. Metcalfe's law states that the value of a network is proportional to the square of the number of connected nodes or users on that system. Said another way, while the cost of adding new nodes to a network scales linearly, the consequent value scales exponentially. Now, apply that to the data cloud. The marginal cost of adding a user is negligible, practically zero, but the value of being able to access any dataset in the cloud... Well, let me just say this. There's no limitation to the magnitude of the market. 
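The linear-cost-versus-quadratic-value point behind Metcalfe's law can be shown in a tiny sketch. The constants here are arbitrary placeholders, since the law only speaks to proportionality:

```python
def network_value(n, k=1):
    """Metcalfe's law: network value is proportional to the square
    of the number of connected nodes (k is an arbitrary constant)."""
    return k * n * n

def network_cost(n, cost_per_node=1):
    """Build-out cost scales linearly with the number of nodes."""
    return cost_per_node * n

# Doubling the nodes doubles the cost but quadruples the value.
for n in (10, 20, 40):
    print(n, network_cost(n), network_value(n))
```

Applied to a data cloud, if the marginal cost of adding a user is near zero, each new participant still adds value quadratically through the connections it enables, which is the network-effects argument being made here.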
My prediction is that this idea of a global mesh will completely change the way leading companies structure their businesses and, particularly, their data architectures. It will be the technologists that serve domain specialists, as it should be. Okay. Well, what do you think? DM me @dvellante, or email me at david.vellante@siliconangle.com, or comment on my LinkedIn posts. Remember, these episodes are all available as podcasts, so please subscribe wherever you listen. I publish weekly on wikibon.com and siliconangle.com, and don't forget to check out etr.plus for all the survey analysis. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching. Be well, and we'll see you next time. (upbeat music)
Clive Charlton and Aditya Agrawal | AWS Public Sector Summit Online
(upbeat music) >> Narrator: From around the globe, it's The CUBE, with digital coverage of AWS Public Sector Online, (upbeat music) brought to you by Amazon Web Services. >> Everyone, welcome back to The CUBE's virtual coverage of the AWS Public Sector Summit Online. I'm John Furrier, your host of The CUBE. Normally we're in person, out in Asia-Pacific and at all the different events related to public sector, but this year we have to do it remote, and we're going to do the remote virtual CUBE with the Public Sector Online Summit. And we have two great guests here to talk about the Digital Earth Africa project: Clive Charlton, Head of Solutions Architecture, Sub-Saharan Africa with AWS. Clive, thanks for coming on. And Aditya Agrawal, founder of D4DInsights, and also the advisor for the Digital Earth Africa project with AWS. So gentlemen, thank you for coming on. Appreciate you coming on remotely. >> Thanks for having us. >> Thank you for having us, John. >> So Clive, take us through real quickly. Just take a minute to describe what is the Digital Earth Africa project. What are the problems that you're aiming to solve? >> Well, we're really aiming to provide actionable data to governments and organizations around Africa, by providing satellite imagery in an easy-to-use format, and doing that on the cloud in a way that serves countries throughout Africa. >> And just from a cloud perspective, give us a quick taste of what's going on, just with the tech. It's on Amazon. You got a little satellite action. Is there ground station involved? Give us a little bit more color around, you know, what's the scope of the project. >> Yeah, so, historically speaking, you'd have to process satellite imagery, downlink it, and then do some heavy, heavy lifting around the processing of the data. 
Digital Earth Africa was built from the experiences of Digital Earth Australia, originally developed by Geoscience Australia. They use the container service for Kubernetes called Elastic Kubernetes Service to spin up the virtual machines which are required to process the raw satellite imagery into a format called a Cloud Optimized GeoTIFF. This format is used to store very large volumes of data in a format that's really easy to query. So, organizations can just use an HTTP GET range request to query just the part of the file that they're interested in, which means the results are served much, much quicker, for a much, much better overall experience. Under the hood, the data is stored in the Amazon Simple Storage Service, which is S3, and the metadata is indexed in a Relational Database Service that runs the Open Data Cube library, which allows Digital Earth Africa to store this data in both space and time. >> It's interesting. I just did some interviews last week at a symposium on space and cybersecurity, and we were talking about the impact of satellites and GPS and just the overall infrastructure shift. And it's just another part of the edge of the network. Aditya, I want to get your thoughts on this, and your reaction to Digital Earth, 'cause you're an advisor. Let's zoom out. What's the impact on people's lives? Give us a quick overview of how you see it playing out, because, explaining to someone who doesn't know anything about the project, like, okay, what is it about, and how does it actually impact people? >> Sure. So, you know, as Clive mentioned, there's definitely a digital infrastructure behind Digital Earth Africa, in a way that it's going to be able to serve free and open satellite data. And often the issue around satellite data, especially within the context of Africa and other parts of the world, is that there's a level of capacity that's required in order to be able to use that data. 
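The HTTP GET range request Clive describes is the key mechanism behind Cloud Optimized GeoTIFFs: a reader fetches only the byte ranges holding the tiles it needs instead of downloading the whole file. A minimal sketch of how such a request is formed; the object URL and byte offsets here are hypothetical:

```python
from urllib.request import Request

def range_request(url, offset, length):
    """Build an HTTP GET request for `length` bytes starting at `offset`.

    The Range header is inclusive on both ends, so the last byte
    requested is offset + length - 1.
    """
    end = offset + length - 1
    return Request(url, headers={"Range": f"bytes={offset}-{end}"})

# Hypothetical COG object: fetch 64 KiB of tile data starting at 1 MiB.
req = range_request("https://example-bucket.s3.amazonaws.com/scene.tif",
                    1_048_576, 65_536)
print(req.get_header("Range"))  # -> bytes=1048576-1114111
```

Real COG readers first fetch the file's header to learn the tile layout, then issue range requests like this for just the tiles covering the area of interest, which is why results come back so much faster than a full download.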
But there's also all kinds of access issues, because, traditionally, satellite data is heavy. There's the old model of downloading the data and then being able to do something with it. And often about 80% of the time that you spend on satellite data is spent just pre-processing the data before you can actually do any of the fun analysis around it that really drives the kinds of decisions and actions that you're looking for. And so that's why Digital Earth Africa, and that's why this partnership with Amazon, is a fantastic partnership, because it really allows us to scale the approach across the entire continent, make it easy for that data to be accessed, and make it easier for people to be able to use that data. The way that Digital Earth Africa is being operationalized is that we're not just looking at it from the perspective of, let's put another infrastructure into Africa. We want this program, and it is a program, to be institutionalized within Africa itself. One that leverages expertise across the continent, and one that brings in organizations across the continent to really take the leadership and ownership of this program as it moves forward. The idea of it is that, once you're able to have this information, you're able to address issues like food security, climate change, coastal resilience, land degradation, where illegal mining is, where the water is. We want to be able to do that in a way that really looks at the national development priorities within the countries themselves, and how it also then supports regional and global frameworks like Africa's Agenda 2063 and the sustainable development goals. >> No doubt in my mind, obviously, there are huge benefits to these kinds of technologies. I want to also just ask you, as a follow-up: there's a huge space race going on right now, an explosion of availability of satellite data. 
And again, more satellites going up. There's more congestion, more contention. Again, we had a big event on that cybersecurity and the congestion issue, but, you know, satellite data powers everyone here in the United States: you want an Uber, you want Google Maps, you've got GPS everywhere. Without it, we'd be kind of like (laughing), wondering what's going on. How do we even vote these days? So certainly an impact, but there's a huge surge of availability of the use of satellite data. How do you explain this? And what are some of the challenges, from the data side, that the Digital Earth Africa project hopes to resolve? >> Sure. I mean, that's a great question. I mean, I think at one level, when you're looking at the space race right now, satellites are becoming cheaper. They're becoming more efficient. There's increased technology now on the types of sensors that you can deploy. There's companies like Planet that are really revolutionizing how even small countries are able to deploy their own satellites, and the constellation that they're putting forward in terms of the frequency by which you're able to get data for any given part of the earth on a daily basis. Coupled with that, and, you know, this is really sort of in Clive's purview, but the cloud computing capabilities and overall computing power that you have today versus what you had 10 or 15 years ago is so vastly different. What used to take weeks to do before, for any kind of analysis on satellite data, which is heavy data, now takes, you know, minutes or hours to do. So when you put all that together, again, you know, I think it really speaks to the power of this partnership with Amazon and really what that means for how this data is going to be delivered to Africa, because it really allows for the scalability for anything that happens through Digital Earth Africa. 
And so, for example, one of the approaches that we're taking is, we identify what the priorities and needs are at the country level. Let's say that it's land degradation. There are often common issues across countries. And so when we can take one particular issue, test it with additional countries, then we can scale it across the whole continent, because the infrastructure is there for the whole continent. >> Yeah. That's a great point. So many storylines here. We'll get to Clive in a second on sustainability. And I want to talk about the Open Data Platform. Obviously, with open data, having data is one thing, but now having trusted data becomes a huge issue. Again, I want to dig into that for a second, but, Clive, I want to ask you first, what region are we in? I mean, you guys actually have a great... first of all, we've been covering the region expansion from Bahrain all the way as it moves around the world, probably soon in space. There'll probably be an Amazon space station region someday in the future, but what region are you running the project out of, and why is it important? Can you share the update on the regional piece? >> Well, we're very pleased that Digital Earth Africa is using the new Africa region in Cape Town, in South Africa, which was launched in April of this year. It's one of 24 regions around the world, and we have another three new regions announced. What this means for users of Digital Earth Africa is they're able to use the region closest to them, which gives them the best user experience. It's the quickest connection for them. But more importantly, we also wanted to use an African solution for African people, and using the Africa region in Cape Town really aligned with that thinking. >> So, localization of the data, latency, all that stuff is kind of within the region, within country here. Right? >> That's right, yeah. >> And why is that important? Are there any other benefits? 
Why should someone care? Obviously, there's the failover option, I mean, other regions to go to, but why is having something in that region important for this project? >> Well, it comes down to latency for the users. So, being as close to the data as possible is really important for the user experience, especially when you're looking at large datasets and big queries. You don't want to be waiting a long lag time for that query to go backwards and forwards between the user and the region. So, having the data in the Africa region in Cape Town is important. >> So it's about the region. I love when these new regions roll out from Amazon, 'cause obviously it's this huge CapEx buildup in data center servers and everything. Sustainability is a huge part of the story. How does the sustainability piece fit into the data initiative supported in Africa? Can you share some updates on that? >> Well, this project is also closely aligned with the Amazon Sustainability Data Initiative, which looks to accelerate sustainability research and innovation, really by minimizing the cost and the time required to acquire and analyze large sustainability datasets. So the initiative supports innovators and researchers with the data, tools, and technical experience that they need to move sustainability to the next level. These are public datasets, publicly available to anyone. In addition to that, the initiative provides cloud grants to those who are interested in exploring the use of AWS technology and scalable infrastructure to serve sustainability challenges of this nature. >> Aditya, I want to hear your thoughts on this comment that Clive made around latency, and certainly having a region there has great benefits. You don't need to hop on that. 
Everyone knows I'm a big fan of the regional model, but it brings up the issue of what's going on in the country from an infrastructure standpoint: a lot of mobility, a lot of edge computing. I can almost imagine that. So how do you see that evolving, from a business standpoint, a project standpoint, a data standpoint? Can you comment and react to that edge angle? >> Yeah, I mean, I think that the value of an open data infrastructure is that you want to use that infrastructure to create a whole data ecosystem type of an approach. And so, from the perspective of making this data readily accessible, making it efficiently accessible, and really being able to bring industry into that ecosystem, what we really want, as the program matures, is for this program to then also instigate the development of new businesses and entrepreneurship, to really get the young people across Africa, which has the largest proportion of young people anywhere in the world, engaged around what you can do with satellite data and the types of businesses that can be developed around it. And so, by having all of our data reside in Cape Town, on the continent, there's obviously technical benefits to that in terms of being able to apply the data and create new businesses. There's also a perception in the fact that the data that Digital Earth Africa is serving is in Africa and residing in Africa, which does go a long way. 
We all know we're living with a lot of data. You're starting to see that the commoditization and horizontal scalability of data is one thing, but to put it into software-defined environments, whether it's an entrepreneur coding up an app or doing something to share some transparency around some initiatives going on within the region or on the continent, it's about trusted data. It's about sharing algorithms. AI is also a consumer of data; machines consume data. So, it's not just the technology; data is part of this new normal. What's this Open Data Platform, and how does that translate into value, in your opinion? >> Yeah. You know, when data is shared on AWS, anyone can analyze it and build services on top of it, using a broad range of compute and data analytics products, you know, things like Amazon EC2, or Lambda, which is serverless compute, to things like Amazon Elastic MapReduce for complex extract and transformation processes. But sharing data in the cloud lets users spend more time on the data analysis rather than the data acquisition. And researchers can analyze data shared on AWS without needing to pay to store their own copy, which is what the Open Data Platform provides. You only have to pay for the compute that you use, and you don't need to purchase storage to start a new project. So the registry of open data on AWS makes it easy to find those datasets by making them publicly available through AWS services. And when you share your data on AWS, you make it available to a large and growing community of developers, and startups, and enterprises all around the world. And, you know, we've been talking particularly around Africa. 
It doesn't cost you anything to get started; maybe down the road, if usage gets heavy, there may be charges, but for the most part it's easy for scientists to use, and then you're leveraging it into the open, contributing back. Is that right? >> Yep. That's right. It means getting researchers, and startups, and organizations going quickly. Without having to worry about the data acquisition, they can just get going and start building. >> I want to get back to Aditya on this skill gap issue, because you brought up something that I thought was really cool. People are going to start building apps. We're going to start to see more innovation. What are the needs out there? Because we're seeing a huge onboarding of new talent, young talent, people reskilling from existing jobs; certainly COVID accelerated people looking for different kinds of work. I'm sure there's a lot of (laughing) demand to do some innovative things. The question I always get, and want to get your reaction to, is: what are the skills needed to get involved, to contribute, but also to benefit from it, whether it's the satellite data or just how to get involved skill-wise? >> Sure. >> Yes. >> Yeah. So most recently we've created a six-week training course that really takes users from understanding the basics of Earth observation data, to how to work with Python, to how to create their own Jupyter notebooks and their own use cases. And so there's a wide range of skill sets that are required depending on who you are, because, effectively, what we want to be able to do is get everyone from kind of the technical user that might have some remote sensing background, to the developer, to the policy maker and decision maker, to understand the value of this infrastructure, whether you're the one who's actually analyzing the data. 
If you're the one who's developing new applications, or you're taking that information from a managerial or policy-level discussion to actually deliver the action and sort of impact that you're looking for. And so, you know, in that regard, we're working with ITC in the Netherlands and, again, with institutions across Africa that already have a mandate and expertise in this particular area, to create a holistic capacity development program that will address all of those different factors. >> So I guess the follow-up question I want to have is, how do you ensure the priorities of Africa are addressed as part of this program? >> Yeah, so we've created a governance model that really is both top-down and bottom-up. At the bottom-up level, we have a technical advisory committee that has over 15 institutions, many of which are based across Africa, that really have a good understanding of the needs, the priorities, and the mandate for how to work with countries. And at the top-down level, we're developing a governing board that will be inclusive of the key continental-level institutions that really provide the political buy-in, the sustainability of the program, and overall guidance. And within that, we're also creating an operational model such that the institutions that do have the capacity to support the program are actually the ones who are also going to be supporting the implementation of the program itself. >> And there have been some United Nations sustainable development projects, all kinds of government involvement, around making sure certain things would happen within the country. Can you just share some of the highlights, or some of the key initiatives that are going on, that you're supporting, to make it a better world? >> Yeah. So this program is very closely aligned to a sustainable development agenda. 
And so we're looking at developing methods that really address the sustainable development goals as one facet. In Africa, there's another program looking at overall national development priorities and sustainability called Agenda 2063. And really, I think what it comes down to is this wouldn't be happening without the country-level involvement themselves. So, this started with five countries originally: Senegal, Ghana, Kenya, Tanzania. And the government of Kenya itself has really been kind of a founding partner for how Digital Earth Africa and its predecessor, the Africa Regional Data Cube, came to be. And so, without high-level support and political buy-in within those governments... I mean, it's really because of that that we're where we are. >> Aditya, thank you for coming on and sharing that insight. Clive, we'll give you the final word for the folks watching. Digital Earth Africa processes petabytes of data, I mean the satellite data as well, huge, and you mentioned it's a new region. You're running Kubernetes, Elastic Kubernetes Service, making containers easy to use, pay as you go. So you get cutting edge. Take the one minute to share why this region's cutting edge. Does it have the scale of other regions? What should they know about AWS in Cape Town, Africa's new region? Take a minute to put the plug in. >> Yeah, thank you for that, John. So all regions are built in the same way, all around the world. So they're built for redundancy and reliability. They typically have a minimum of three of what we call Availability Zones, and each one contains a cluster of data centers, all interconnected with fast fiber. So, you know, you can survive a failure with no impact to your services. And the Cape Town region is built in exactly the same way. We have most of the services available in the Cape Town region, like most other regions. 
So, as a user of AWS, you can have the confidence that you can deploy your services and workloads into AWS and run them in the same way, with the same kind of speed, and the same kind of support and infrastructure that's backing any region anywhere else in the world. >> Well, great. Thanks for that plug. Aditya, thank you for your insight. And again, innovation follows cloud computing, whether you're building on top of it as a startup, a government, or an enterprise, or making society better, in this case with the Digital Earth Africa project. A great story. Thank you for sharing. I appreciate it. >> Thank you for having us. >> Thank you for having us, John. >> I'm John Furrier with The CUBE, virtual remote, not in person this year. I hope to see you next time in person. Thanks for watching. (upbeat music) (upbeat music decreases)
Ed Walsh | CUBE Conversation, August 2020
>> From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is theCUBE Conversation. >> Hey everybody, this is Dave Vellante, and welcome to this CXO Series. As you know, I've been running this series discussing major trends with CXOs and how they've navigated through the pandemic. And we've got some good news and some bad news today, and Ed Walsh is here to talk about that. Ed, how you doing? Great to see you. >> Great seeing you. Thank you for having me on, I really appreciate it. >> So the bad news is Ed Walsh is leaving IBM as the head of the storage division (indistinct). But the good news is, he's joining a new startup as CEO, and we're going to talk about that. But Ed, always a pleasure to have you. You've had quite a run at IBM; you really have done a great job there. So let's start there, if we can, before we get into the other part of the news. Give us the update. You're coming off another strong quarter for the storage business. >> I would say, listen, it's bittersweet, but to be honest, we're leaving them in a really good position, where they have sustainable growth. IBM storage is actually in a very good position, and I think you're seeing it in the numbers as well. So yeah, listen, I think the team... I'm very proud of what they were able to pull off. Four years ago, they kind of brought me in: hey, can we get IBM storage back to leadership? They were kind of on their heels, not growing, and falling back in market share. You know, kind of a distant third place finisher. And basically through real innovation that mattered to clients, and that's a big deal, it's the right innovation that matters to the clients, we really were able to dramatically grow all four different segments of the portfolio, but also get things like profitability growing, and also NPS growing. It really allowed us to get into a sustainable model. And it's really about the team.
You've heard me talk about team all the time, which is: you get a good team, they really nail great client experiences, and they take the right offerings and go-to-market and merge it. And I'll tell you, I'm very proud of what the IBM team put together, and I'm still the number one fan, inside or outside IBM. So it might be bittersweet, but I actually think they're ready for quite some growth. >> You know, Ed, when you came on theCUBE right after you had joined IBM, a lot of people were saying Ed Walsh joined the IBM storage division to sell the division. And I asked you on theCUBE, are you there to sell the division? And you said no, absolutely not. It always seemed to me, well, hey, it's a good business, a good cash flow business, with a big customer base, so why would IBM sell it? Never really made sense to me. >> I think it's integral to what IBM does; I think it plays to their client base in a big way. And under my leadership, really, we got more aligned with what IBM is doing from the big-IBM perspective, right: what we're doing around Red Hat hybrid multi-cloud and what we're doing with AI. Those are big focuses of the storage portfolio. So listen, I think IBM as a company is in a position where they're really innovating and thriving, and really customer centric. And I think IBM storage is benefiting from that, and vice versa. I think it's a good match. >> So one of the things I want to bring up before we move on. You had said you were seeing the numbers, so I want to bring up a chart here. As you know, we've been using a lot of data and sharing data reporting from our partner ETR, Enterprise Technology Research. They do quarterly surveys, and they have a very tight methodology; it's similar to NPS, but it's a net score methodology, as we call it. And every quarter they go out, and what we're showing here is the results from the last three quarters, specific to IBM storage and IBM's net score in storage.
And net score is essentially: we ask people, are you spending more, are you spending less; we subtract the less from the more, and that's the net score. And you can see, when you go back to the October '19 survey, you know, low single digits, and then it dipped in the April survey, which was the height of the pandemic. And this is forward looking. So at the height of the pandemic lockdown, people were saying, maybe I'm going to hold off on budgets. But now look at the July survey: a huge, huge uptick. And I think this is testament to a couple of things. One is, as you mentioned, the team. But the other is, you guys have done a good job of taking R&D, building a product pipeline, and getting it into the field, and I think that shows up in the numbers. That was really one of the hallmarks of your leadership. >> Yeah, I mean, it's the innovation. IBM has almost an embarrassment of riches inside; it's how do you get it into the pipeline? We went from typically four to four-and-a-half-year product cycles to a two-year product cycle, so we're able to innovate and bring it to market much quicker. And I think that's what clients are looking for. >> Yeah, so I mean, you brought a startup mentality to the division. And of course now, 'cause you're a startup guy, let's face it, you're going back to the startup world. So the other part of the news is Ed Walsh is joining ChaosSearch as the CEO. ChaosSearch is a local Boston company focused on log analytics, but more than that, and we're going to talk about it. So first of all, congratulations, and tell us about your decision. Why ChaosSearch? >> Yeah, listen, as you can tell from the way I describe IBM, it was a hard decision to leave IBM, but it was a very, very easy decision to go to Chaos, right? I knew the founder, I knew what he was working on for the last seven years.
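(Editor's aside: the ETR net score arithmetic Dave describes above, the percentage of respondents spending more minus the percentage spending less, can be sketched in a few lines. The survey responses below are invented purely to mirror the dip-and-rebound shape he describes.)

```python
# Editor's sketch: ETR-style "net score" = % of respondents spending more
# minus % spending less. All survey responses here are illustrative only.

def net_score(responses):
    """responses: list of 'more', 'less', or 'flat' answers."""
    n = len(responses)
    more = sum(1 for r in responses if r == "more") / n * 100
    less = sum(1 for r in responses if r == "less") / n * 100
    return more - less

# Hypothetical quarterly surveys mirroring the October '19 / April / July
# pattern described in the conversation (low, dip, big uptick).
october = ["more"] * 22 + ["flat"] * 60 + ["less"] * 18
april   = ["more"] * 15 + ["flat"] * 60 + ["less"] * 25
july    = ["more"] * 40 + ["flat"] * 50 + ["less"] * 10

print([round(net_score(s)) for s in (october, april, july)])  # [4, -10, 30]
```

The same subtraction works at any sample size, which is why net score can be compared across quarters even when the respondent pool changes.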
The last five years as a company, and I was just blown away at their fundamental innovation, and how they're really driving how to get insights at scale from your data lake in the cloud, but also, at the same time, slashing cost dramatically. And they make it so simple: simply put your data in your S3, or really cloud object storage. Right now it's Amazon, and they'll go to the rest of the clouds, but just put your data in S3, and what we'll do is index it and give you APIs so you can search it and query it. It literally brings a way to do data analysis at scale, and also log analytics, on everything you put into, basically, an S3 bucket. It makes it very simple. And because they did really fundamental, hard technology at the data layer, but kept all the APIs, you're using your normal tools. They did the Elasticsearch APIs, so whether you want to use Grafana, you want to use Kibana, or you want to do SQL, or you want to use Looker or Tableau, all those work. That's a part of it. It's really revolutionary what they're doing as far as the value prop, and we can explain it. But they also made it evolutionary: it's very easy for clients to go. Just run in parallel, and then they basically turn off what they currently have running. >> So data lakes... really, the term became popular during the sort of early big data, Hadoop era. And Hadoop obviously brought a lot of innovation, you know, leave the data where it is, bring the compute to the data. It really launched the big data initiative, but it was very complicated. You had MapReduce, and Elastic MapReduce in the cloud, and it really was a big batch job, where storage was really kind of a second class citizen, if you will. There wasn't a lot of real time stuff going on. And then Spark comes in, and still there's this very complicated situation. So it sounds like ChaosSearch is really attacking that problem.
And the first use case it's really going after is log analytics. Explain that a little bit more, please. >> Yeah, so listen, they went after it with what's called a data lake engine for scalable log analytics, we'll say, first. It was the first use case to go after. Basically, it allows for log analytics; everyone does it, and everyone's kind of struggling to get to scale with it, right? If you asked your IT department, are you challenged with scale, or cost, or retention levels, but also the management overlay of what they're doing on log analytics, or security log analytics, or all this machine data they're collecting, the answer will be: absolutely, it's a nightmare. It starts easy and becomes a big, very costly application for their environments. And what Chaos does, because they deal with the real issue, which is the data layer, but keep the APIs on top so people can easily use the data insights at scale, what they're able to do is very simply run in parallel, and we'll save 80% of your cost, but also get better data retention. 'Cause there's typically a trade off: clients basically have this trade off where it gets really expensive as it gets to scale, so I should just retain less. We have clients that went from nine-day retention in security logs to literally four and five days; if they didn't catch it in that time, it was too late. Now what they're able to do is go to our solution, not change their applications, because you're using the same APIs, but literally save 80%, and this is millions and tens of millions of dollars of savings, but also basically get 90-day retention. It's really limitless: whatever you put into your S3 bucket, we're going to give you access to. So that alone shows you that it's literally revolutionary. The CFO wins because they save money. The IT department wins because they don't have to wrestle with this data technology that wasn't really built for this.
It was really built 30 years ago; it wasn't built for this volume and velocity of data coming in. And then the data analytics guys say, hey, I keep my tool set but I get all the retention I want; no one's limiting me anymore. So it's kind of an easy win-win, and it makes it really easy for clients to get this really big benefit: dramatic cost savings, but also the scale, which really means a lot in security logging or anything else. >> So let's dig into that a little bit. Cloud object storage has kind of become the de facto bucket, if you will. Everybody wants it because it's simple, it's a get-put kind of paradigm, and it's cheap. But it's also got performance issues, so people will throw cash at the problem, or they'll have to move data around. So is that the problem that you're solving? Is it a performance problem, is it a cost problem, or both? And explain that a little bit. >> Yeah, so it's all of the above. Basically, if companies were building a data lake, they would like to just put all their data in one very cost effective, scalable, resilient environment, and that is cloud object storage, or S3, which every cloud has, right? You can also do it on prem. Everyone would love to do that, and then literally get their insights out of it. But they want to go after it with their own tools, whether it's search or SQL. That's the vision everyone wants. But the core special sauce ChaosSearch provides is that we built, from the ground up, the indexing technology, the database technology, how to actually make your cloud object storage a database. We don't move it somewhere, we don't cache it. You put it inside the bucket, and we literally make the cloud object storage the database. And then around it, we basically built a Chaos fabric that allows you to spin up compute nodes to go at the data in different ways.
We truly have separated the data from the compute. And also, with the beauty of containerization technology, if a worker node goes away, nothing happens. It's not like on prem, where all of a sudden you have to rebuild clusters. So it's by fundamentally solving that data layer. But what's really interesting is they just published APIs; you mentioned put and get. The APIs you're using for cloud object storage are put and get. Imagine we just added to that API your search API from Elastic, or your SQL interface. All we're doing is extending: you put it in the bucket, and we'll extend your ability to get after it. It really is an API company, but it's hard tech putting that data layer together, so you have cost effectiveness and scale simultaneously. Take, for instance, log analytics: we don't cache, nothing's on SSD, nothing's on local storage, and we're as fast as running Elasticsearch on SSDs. So we've solved the performance and scale issues simultaneously, and that's really the core fundamental technology. >> And you do that with math, with algorithms, with machine learning? What's the secret sauce? >> I'll tell you, my founder just has the right, interesting way of looking at problems. He really looked at this differently, and went after it in a different way, a really modern way. And the reason it differentiates itself is he built it from the ground up to do this on object storage, where basically everyone else is using 30-year-old technology, right? So even really new, up and coming companies, they're using Tableau, Looker, or Snowflake could be another example. They're not changing how the data is stored; they always have to move it, ETL it somewhere, to go after it. We avoid all that. In fact, we're probably a pretty good ecosystem player for all those partners as we go forward.
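(Editor's aside: the idea Ed outlines, keeping object storage's put/get verbs and extending them with a search call instead of moving data into a separate database, can be illustrated with a toy in-memory object store. This is a conceptual sketch only, not ChaosSearch's actual implementation; the class and key names are invented.)

```python
# Editor's sketch: a toy object store with put/get, extended with a
# search() call over an index built at ingest, so the data never
# leaves the "bucket". Conceptual illustration only.

from collections import defaultdict

class SearchableBucket:
    def __init__(self):
        self.objects = {}                  # key -> raw log text
        self.index = defaultdict(set)      # term -> keys containing it

    def put(self, key, body):              # standard object-store verb
        self.objects[key] = body
        for term in body.lower().split():  # index at ingest time
            self.index[term].add(key)

    def get(self, key):                    # standard object-store verb
        return self.objects[key]

    def search(self, term):                # the "extended" API
        return sorted(self.index[term.lower()])

bucket = SearchableBucket()
bucket.put("logs/1", "ERROR disk full on node7")
bucket.put("logs/2", "INFO backup complete")
bucket.put("logs/3", "ERROR timeout on node7")

print(bucket.search("error"))   # ['logs/1', 'logs/3']
print(bucket.get("logs/2"))     # INFO backup complete
```

The point of the toy is the API shape: search is additive on top of put/get, so existing put/get callers keep working unchanged.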
>> So you're talking about Tom Hazel, your founder and CTO, and he's brought in the team, and they've been working on this for a while. What's his background? >> He came out of telecom, building out God boxes, so he's always been in the database space. On my first day of the job, I can't do justice to his deep technology; there's a really good white paper on our website that does that pretty well. But literally, the patented technology is the Chaos Index, which is a database that makes your object storage the database. And then it's really the Chaos Fabric that it puts around it, and the Chaos Refinery, that give you virtual views. But that's one solution. If you look at log analytics, you come in, log in, and you get all the tools you're used to, but underneath the covers we're just saving about 80% of overall cost, but also giving almost limitless retention. We see people who had literally reduced the number of logs they're keeping, because of cost and complexity and scale, down to a very small amount, going right back up to 90 days. You could do longer, but that's what we see most people go to when they come to our service. >> Let's talk about the market. I mean, as a startup person, you always look for large markets. Obviously you've got to have good tech and a great team, and you want large markets. So the space that you're in, I would think it started in the early days as kind of decision support, sort of morphed into the data warehouse; you mentioned ETL, that's kind of part of it; business intelligence, it's sort of all in there. If you look at the EDW market, it's probably around 18 to 20 billion. A small slice of that is data lakes, maybe a billion or a billion plus. And then you've got this sort of BI layer on top, you mentioned a lot of those, you've got ETL, and you probably get up into the 30 to 35 billion, just sort of off the top of my head and from my historical experience looking at these markets.
But I have to say these markets have traditionally failed to live up to expectations. Things like 360-degree views of the customer, real time analytics, delivering insights and self service to the business: those are promises that these industries made, and they ended up being cumbersome, slow, maybe requiring real experts and a lot of infrastructure. The cloud is changing that. Is that right? Is that the way to look at the market that you're going after? You're a player inside of that very large TAM. >> Yeah, I think we're a key fundamental component underneath that whole ecosystem. And yes, you're seeing us build a full stack solution for log analytics, because it's a really good way to prove just how game changing the technology is, but also how we publish APIs and make it seamless for how you're using log analytics. The same thing can be applied as we go across SQL and the different BI and analytic type platforms. So that's exactly how we're looking at the market. And it's those players that are all struggling with the same thing: how do they add more value to clients? It's a big cost game, right? So if I can literally make how you store your underlying data 80% more cost effective, that's a big deal, or simultaneously save 80% and give you much longer retention. Those two things are typically a trade off you have to go through, and we don't have to do that. That's really what makes this kind of the underlying core technology. And really, I look at log analytics as the first application set. If you have any log analytics issues, talk to your teams and find out: scale, cost, management issues. We make it very easy: just run in parallel, we'll do a POC, and you'll see how easy it is. You can just save 80%, and 80% savings plus better retention is really the value proposition you see at scale, right? >> So this is day zero for you. Give us the hundred day plan. What do you want to accomplish?
Where are you going to focus your priorities? I mean, obviously the company's been started and it's well funded, but where are you going to focus in the next 100 days? >> I think it's building out where we're taking it next. There are a lot of things we could do; the degrees of freedom as far as where we could go with this technology are pretty wide. You're going to see us be the best log analytics company out there. We're getting really a (mumbling); we, you saw the announcement, had our best quarter ever last quarter, and you're seeing this nice as-a-service ramp. You're going to see us go to VPC, so you can do as-a-service with us, but now we can put this same thing in your own virtual private data center. You're going to see us go to Google, Azure, and also IBM Cloud. And really, clients are driving this. It's not us driving it; you're going to see it actually from the clients. We'll go to Google because we had a couple of financial institutions saying they're driving us to go do exactly that. So it's really working with our client sets and making sure we've got the right roadmap to support what they're trying to do. And then the ecosystem is another play. You know, my core technology is not necessarily competitive with anyone else; no one else is doing this. They're just kind of, hey, move it here, I'll put it on this foundational DB, or they'll put it on a Presto environment. They're not really worried about the bottom line economics, which is really the value prop, and that's the hard tech and patented technology that we bring to this ecosystem. >> Well, people are definitely worried about their cloud bills. The CFO is saying whoa, 'cause it's so easy to spin up instances in the cloud. And so, Ed, it really looks like you're going after a real problem, and you've got some great tech behind you. And of course, we love the fact that it's another Boston-based company that you're joining, 'cause we need more Boston-based startups.
Better for us here at the East Coast Cube. So give us your final thoughts. What should we look for? I'm sure we're going to be in touch, and congratulations. >> Hey, thank you for the time. I'm really excited about this. I really just think it's fundamental technology that allows you to get the most out of everything you're doing around analytics in the cloud. And if you look at the data lake model, I think that's our philosophy, and we're going to drive it pretty aggressively. I think it's a good fundamental innovation for the space, and that's the type of tech that I like. And I think we can also do a lot of partnering across ecosystems to make it work for a lot of different people. So anyway, I guess, thank you very much for the time, I appreciate it. >> Yeah, well, thanks for coming on theCUBE, and best of luck. I'm sure we're going to be learning a lot more and hearing a lot more about ChaosSearch. Ed Walsh, this is Dave Vellante. Thank you for watching everybody, and we'll see you next time on theCUBE. (upbeat music)
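(Editor's aside: the cost-and-retention trade-off discussed in this interview, 80% savings at the same retention window, can be made concrete with a toy calculation. The per-gigabyte price, daily volume, and the flat 80% discount below are all invented for illustration; they are not ChaosSearch's actual pricing.)

```python
# Editor's sketch: comparing log-retention cost at the same 90-day window.
# All prices and volumes are hypothetical.

def monthly_cost(gb_per_day, retention_days, usd_per_gb_month):
    # storage held at steady state = daily volume * retention window
    return gb_per_day * retention_days * usd_per_gb_month

RATE = 0.40                                   # hypothetical $/GB-month
baseline_90d = monthly_cost(500, 90, RATE)            # conventional stack
discounted_90d = monthly_cost(500, 90, RATE * 0.20)   # same window, 80% less

print(round(baseline_90d), round(discounted_90d))  # 18000 3600
```

The same function also shows why teams historically cut retention from 90 days to single digits: shrinking the window is the only other lever on the same formula.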
Christian Romming, Etleap | AWS re:Invent 2019
>>LA from Las Vegas. It's the cube covering AWS reinvent 2019, brought to you by Amazon web services and along with its ecosystem partners. >>Oh, welcome back. Inside the sands, we continue our coverage here. Live coverage on the cube of AWS. Reinvent 2019. We're in day three at has been wall to wall, a lot of fun here. Tuesday, Wednesday now Thursday. Dave Volante. I'm John Walls and we're joined by Christian Rahman who was the founder and CEO of for Christian. Good morning to you. Good morning. Thanks for having afternoon. If you're watching on the, uh, on the East coast right now. Um, let's talk about sleep a little bit. I know you're all about data, um, but let's go ahead and introduce the company to those at home who might not be familiar with what your, your poor focus was. The primary focus. Absolutely. So athlete is a managed ETL as a service company. ETL is extract, transform, and load basically about getting data from different data sources, like different applications and databases into a place where it can be analyzed. >>Typically a data warehouse or a data Lake. So let's talk about the big picture then. I mean, because this has been all about data, right? I mean, accessing data, coming from the edge, coming from multiple sources, IOT, all of this, right? You had this proliferation of data and applications that come with that. Um, what are you seeing that big picture wise in terms of what people are doing with their data, how they're trying to access their data, how to turn to drive more value from it and how you serve all those masters, if you will. So there are a few trends that we see these days. One is a, you know, an obvious one that data warehouses are moving to the cloud, right? So, you know, uh, companies used to have, uh, data warehouses on premises and now they're in the cloud. They're, uh, cheaper and um, um, and more scalable, right? With services like a Redshift and snowflake in particular on AWS. 
Um, and then, uh, another trend is that companies have a lot more applications than they used to. You know, in the, um, in the old days you would have maybe a few data ware, sorry, databases, uh, on premises that you would integrate into your data warehouses. Nowadays you have companies have hundreds or even thousands of applications, um, that effectively become data silos, right? Where, um, uh, analysts are seeing value in that data and they want to want to have access to it. >>So, I mean, ETL is obviously not going away. I mean, it's been here forever and it'll, it'll be here forever. The challenge with ETL has always been it's cumbersome and it's expensive. It's, and now we have this new cloud era. Um, how are you guys changing ETL? >>Yeah. ETL is something that everybody would like to see go away. Everybody would just like, not to do it, but I just want to get access to their data and it should be very unfortunate for you. Right. Well, so we started, uh, we started athlete because we saw that ETL is not going away. In fact, with all the, uh, all these applications and all these needs that analysts have, it's actually becoming a bigger problem than it used to be. Um, and so, uh, what we wanted to do is basically take, take some of that pain out, right? So that companies can get to analyzing their data faster and with less engineering effort. >>Yeah. I mean, you hear this, you know, the typical story is that data scientists spend 80% of their time wrangling data and it's, and it's true in any situation. So, um, are you trying to simplify, uh, or Cloudify ETL? And if so, how are you doing that? >>So with, uh, with the growth in the number of data analysts and the number of data analytics projects that companies wants to take on the, the traditional model of having a few engineers that know how to basically make the data available for analysts, that that model is essentially now broken. 
And so, uh, just like you want to democratize, uh, BI and democratize analytics, you essentially have to democratize ETL as well, right? Basically that process of making the data ready for analysis. And, uh, and that is really what we're doing at athlete. We're, we're opening up ETL to a much broader audience. >>So I'm interested in how I, so I'm in pain. It's expensive. It's time consuming. Help me Christian, how, how can you help me, sir? >>So, so first of all, we're, we're, um, uh, at least specifically we're a hundred percent AWS, so we're deeply focused on, uh, Redshift data warehouses and S3 and good data lakes. Uh, and you know, there's tremendous amount of innovation. Um, those two sort of sets of technologies now, um, Redshift made a bunch of very cool announcements era at AWS reinvent this year. Um, and so what we do is we take the, uh, the infrastructure piece out, you know, so you can deploy athlete as a hosted service, uh, where we manage all the infrastructure for you or you can deploy it within your VPC. Um, again, you know, in a much, much simplified way, uh, compared to a traditional ETL technologies. Um, and then, you know, beyond that taking, uh, building pipelines, you know, building data pipelines used to be something that would take engineers six months to 18 months, something like that. But, um, but now what we, what we see is companies using athlete, they're able to do it much faster often, um, often an hours or days. >>A couple of questions there. So it's exclusively red shift, is that right? Or other analytic databases and make is >>a hundred percent AWS we're deeply focused on, on integrating well with, with AWS technologies and services. So, um, so on the data warehousing side, we support Redshift and snowflake. >>Okay, great. So I was going to ask you if snowflake was part of that. So, well you saw red shift kind of, I sort of tongue in cheek joke. 
They took a page out of snowflake separating compute and storage that's going to make customers very happen so they get happy. So they can scale that independently. But there's a big trend going on. I wonder if you can address it in your, you were pointing out before that there's more data sources now because of the cloud. We were just having that conversation and you're seeing the data exchange, more data sources, things like Redshift and snowflake, uh, machine intelligence, other tools like Databricks coming in at the Sage maker, a Sage maker studios, making it simpler. So it's just going to keep going faster and faster and faster, which creates opportunities for you guys. So are you seeing that trend? It's almost like a new wave of compute and workload coming into the cloud? >>Yeah, it's, it's super interesting. Companies can now access, um, a lot more data, more varied data, bigger volumes of data that they could before and um, and they want faster access to it, both in terms of the time that it takes to, you know, to, to bite zero, right? Like the time, the time that it takes to get to the first, uh, first analysis. Um, and also, um, and also in terms of the, the, the data flow itself, right? They, they not want, um, up to the second or up to the millisecond, um, uh, essentially fresh data, uh, in their dashboards and for interactive analysis. And what about the analytics side of this then when we were talking about, you know, warehousing but, but also having access to it and doing something with it. Um, what's that evolution looking like now in this new world? So lots of, um, lots of new interesting technologies there to, um, um, you know, on the, on the BI side and, um, and our focus is on, on integrating really well with the warehouses and lakes so that those, those BI tools can plug in and, and, um, um, and, and, you know, um, get access to the data straight away. Okay. >>So architecturally, why are you, uh, how are you solving the problem? 
Why are you able to simplify? I'm presuming it's all built in the cloud. That's been, that's kind of an obvious one. Uh, but I wonder if you could talk about that a little bit because oftentimes when we talk to companies that have started born in the cloud, John furrier has been using this notion of, you know, cloud native. Well, the meme that we've started is you take out the T it cloud native and it's cloud naive. So you're cloud native. Now what happens oftentimes with cloud native guys is much simpler, faster, lower cost, agile, you know, cloud mentality. But maybe some, sometimes it's not as functional as a company that's been around for 40 years. So you have to build that up. What's the state of ETL, you know, in your situation. Can you maybe describe that a little bit? How is it that the architecture is different and how address functionality? >>Yeah, I mean, um, so a couple of things there. Uh, um, you, you mentioned Redshift earlier and how they now announce the separation of storage and compute. I think the same is true for e-tail, right? We can, we can build on, um, on these great services that AWS develops like S three and, and, uh, a database migration service and easy to, um, elastic MapReduce, right? We can, we can take advantage of all these, all these cloud primitives and um, um, and, and so the, the infrastructure becomes operationally, uh, easier that way. Um, and, and less expensive and all, all those good things. >>You know, I wonder, Christian, if I can ask you something, given you where you live in a complicated world, I mean, data's complicated and it's getting more complicated. We heard Andy Jassy on Tuesday really give a message to the, to the enterprise. It wasn't really so much about the startups as it previously been at, at AWS reinvent. I mean, certainly talking to developers, but he, he was messaging CEOs. He had two or three CEOs on stage. 
But what we're describing here with Redshift, and I threw in Databricks, SageMaker, Elastic MapReduce, your tooling; we just had a company on that does governance. Builders have to kind of cobble these things together. Do you see an opportunity to actually create solutions for the enterprise, or is that antithetical to the AWS cloud model? What are your thoughts? >> Oh, absolutely. These cloud services are fantastic primitives, but enterprises clearly have a lot of additional requirements, and we're seeing a lot of that, right? We started out in venture-backed tech and got a lot of venture-backed tech companies up and running quickly. But now that we're moving up market and into the enterprise, we're seeing that they have requirements that go way beyond what venture-backed tech needs, in terms of security and governance. In ETL specifically, that manifests itself in terms of not allowing data to flow out of the company's virtual private cloud, for example. That's something that's very important in the enterprise, and much less important in venture-backed tech. Data lineage is another one: understanding how data makes it from all those sources into the warehouse, and what happens along the way. In regulated industries in particular, that's very important. >> Yeah. I mean, you know, AWS's mindset is, we've got engineers, we're going to throw engineers at the problem and solve it. Many enterprises look at it differently: we'll pay money to save time, because we don't have the time, we don't have the resources. I feel like I'd like to see sort of an increasing solutions focus. Maybe it's the big SIs that provide that. Now, are you guys in the marketplace today? >> We are. >> Yup. That's awesome. So how's that going? >> Yeah, you mean the AWS Marketplace? >> Yes. Yes.
Yeah, it's definitely one channel where there's a lot of promise, I think, both for us and for enterprise companies. >> 'Cause I mean, you've got to work it, obviously. The money doesn't just start rolling in; you've got to market yourselves. >> But it definitely simplifies that model, right? So, delivering solutions to the enterprise, for sure. >> So what's down the road for you then, from Etleap's perspective? You've talked about the complexities and what's occurred, and you're not going away; ETL is here to stay, and the problems are getting bigger. What do you see over the next 12, 18, 24 months as far as where you want to focus? What do you think your customers are going to need you to focus on? >> So the big challenge is that bigger and bigger companies are now realizing that there is a ton of value in their data, in all these applications. But in order to get value out of it, you have to put engineering effort today into building and maintaining these data pipelines. And so our focus is on reducing those engineering requirements, both in terms of infrastructure and in terms of pipeline operation and pipeline setup, those kinds of things. We believe that a lot of what has traditionally been done with specialized engineering can be done with great software. So that's what we're focused on building. >> I love the company tag, "the perfect data pipeline." I think of, like, the perfect summer, a guy catching a big wave out in Maui or someplace. Good luck on catching that perfect data pipeline; you guys are solving a real problem. Good to meet you. We are live at AWS re:Invent 2019, and you are watching theCUBE.
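The extract-transform-load pattern discussed above, pulling records from sources, enforcing types and quality in the transform step, then loading into a warehouse, can be sketched minimally as follows. This is an illustrative sketch only: the table, field names, and in-memory source are assumptions, not Etleap's actual API, and SQLite stands in for a warehouse like Redshift or Snowflake.

```python
import sqlite3

# Extract: pull raw records from a source (an in-memory list stands in
# for an API, log stream, or operational database).
def extract():
    return [
        {"user": "alice", "amount": "10.50"},
        {"user": "bob", "amount": "3.25"},
        {"user": "alice", "amount": "1.00"},
    ]

# Transform: enforce types and basic quality rules before loading.
def transform(rows):
    return [(r["user"], float(r["amount"])) for r in rows]

# Load: write into the warehouse (SQLite as a stand-in).
def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS purchases (user TEXT, amount REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
total = conn.execute("SELECT SUM(amount) FROM purchases").fetchone()[0]
print(total)  # 14.75
```

The point Romming makes about reducing engineering effort is that the transform and load plumbing above, trivial at toy scale, is exactly what grows into a maintenance burden across hundreds of sources.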
Colin Mahony, Vertica | MIT CDOIQ 2019
>> From Cambridge, Massachusetts, it's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts, everybody. You're watching theCUBE, the leader in tech coverage. My name is Dave Vellante, here with my cohost Paul Gillin. This is day one of our two-day coverage of the MIT CDOIQ conference. CDO: Chief Data Officer; IQ: information quality. Colin Mahony is here; he's a good friend and long-time CUBE alum. I haven't seen you in a while. >> I know. >> But thank you so much for taking some time. You're like a special guest here. >> Thank you, yeah, it's great to be here. Thank you. >> Yeah, so this is not, you know, something that you would normally attend. I caught up with you, invited you in. This conference started as, like, back-office governance, information quality, kind of wonky stuff, hidden. And then when the big data meme took off, kind of around the time we met, the Chief Data Officer role emerged, the whole Hadoop thing exploded, and then this conference kind of got bigger and bigger. Still intimate, but very high level, very senior. It's kind of come full circle, as we've been saying: you know, information quality still matters. You have been in this data business forever, so I wanted to invite you in just to get your perspectives. We'll talk about what's new with what's going on in your company, but let's go back a little bit. When we first met, and even before, you saw it coming; you kind of invested your whole career into data. So take us back 10 years. I mean, it was so different. Remember, it was batch, it was Hadoop, but it was cool. There was a lot of cool >> It's still cool. (laughs) >> projects going on, and it's still cool. But take a look back. >> Yeah, so it's changed a lot. Look, I got into it a while ago; I've always loved data. I had no idea about the explosion in the three V's of data that we've seen over the last decade.
But data's really important, and it's just going to get more and more important. As I look back, I think what's really changed, even if you just go back a decade, is that there's an insatiable appetite for data. And that is not slowing down; it hasn't slowed down at all. And I think everybody wants that perfect solution where they can ask any question and get an immediate answer. We went through the Hadoop boom; I'd argue that we're going through the Hadoop bust. But what people actually want is still the same. You know, they want real answers, accurate answers, they want them quickly, and they want them against all their information and all their data. And I think Hadoop evolved a lot as well. You know, it started as one thing 10 years ago, with MapReduce, and I think in the end what it's really been about is disrupting the storage market. But if you really look at what's disrupting storage right now, it's public clouds, S3, right? That's the new data lake. So there are always a lot of hype cycles. Everybody talks about, you know, now it's cloud everything; for maybe the last 10 years it was a lot of Hadoop. But at the end of the day, I think what people want to do with data is still very much the same. And a lot of companies are still struggling with it, hence the role for Chief Data Officers: to really figure out how do I monetize data on the one hand, and how do I protect that asset on the other hand. >> Well, so, and the cool thing is, this conference is not a tech conference, really. And we love tech, we love talking about this; this is why I love having you on. We kind of have a little Vertica thread that I've created here. So Colin, essentially, is the current CEO of Vertica. I know that's not your title, you're GM and Senior Vice President, but you're running Vertica. So, Michael Stonebraker's coming on tomorrow, >> Yeah, excellent. >> Chris Lynch is coming on tomorrow, >> Oh, great, yeah. >> we've got Andy Palmer >> Awesome, yeah.
>> coming up as well. >> Pretty cool. (laughs) >> So we have this connection; why is that important? It's because, you know, Vertica is a very cool company that is all about data, and it was all about disrupting the traditional relational database: doing more with data. And if you go back to the roots of Vertica, it was, how do you do things faster? How do you really take advantage of data to drive new business? And that's kind of what it's all about, and the tech behind it is really cool. We did your conference for many, many years. >> It's coming back, by the way. >> Is it? >> Yeah, this March. So, March 30th. >> Oh, wow, mark that down. >> At Boston, at the new Encore Hotel. >> Well, we better have theCUBE there, bro. (laughs) >> Yeah, that's great. And yeah, you've done that conference >> Yep. >> haven't you before? So, very cool customers, kind of leading edge. I want to get to some of that, but let's talk about the disruption for a minute. So you guys started with the whole architecture, MPP and so forth, and you talked about cloud. Cloud really disrupted Hadoop. What are some of the other technology disruptions that you're seeing in the market space? >> I think, I mean, you know, it's hard not to talk about AI and machine learning, and what one means versus the other, who knows, right? But I think one thing that is definitely happening is people are leveraging the volumes of data, and they're trying to use all the processing power and storage power that we have, to do things that humans either are too expensive to do or simply can't do at the same speed and scale. And so I think we're going through a renaissance where a lot more is being automated, certainly on the Vertica roadmap, and our path has always been initially to get the data in, and then we want the platform to do a lot more for our customers: lots more analytics, lots more machine learning in the platform.
So that's definitely been a lot of the buzz, but what's really funny is, when you talk to a lot of customers, they're still struggling with just some basic stuff. Forget about the predictive thing; first you've got to get to what happened in the past. Let's give accurate reporting on what's actually happening. The other big thing I think is a disruption is IoT. For all the hype that it's getting, it's very real. Every device is kicking off lots of information, and the feedback loop of A/B testing or quality testing for predictive maintenance is happening almost instantly. So you're getting massive amounts of new data coming in, all this machine sensor-type data; you've got to figure out what it means really quickly, and then you actually have to do something and act on it within seconds. And that's a whole new area for so many people. It's not their traditional enterprise data warehouse. And you know, back to your comment on Stonebraker: he got a lot of this right from the beginning. I think he looked at the architectures and took a lot of the best-in-class designs. We didn't necessarily invent everything, but we put a lot of that together. And then the other thing you've got to do is constantly reinvent your platform. We came out with our Eon Mode to run cloud native, and we just got rated the best cloud data warehouse from a net promoter score rating perspective. But we've got to keep going, you know; we've got to keep reinventing ourselves, but leverage everything that we've done in the past as well. >> So one of the things that you said, which is kind of relevant for here, Paul, is you're still seeing a real data quality issue that customers are wrestling with, and that's a big theme here, isn't it? >> Absolutely, and what goes around comes around. As Dave said earlier, we're still talking about information quality 13 years after this conference began. Have the tools to improve quality improved all that much?
>> I think the tools have improved. I think that's another area where machine learning, if you look at Tamr, and I know you're going to have Andy here tomorrow, they're leveraging a lot of the augmented things you can do with the processing to make it better. But I think one thing that makes the problem worse now is that it's gotten really easy to pour data in. It's gotten really easy to store data without having the right structure or the right quality. You know, 10 years ago, 20 years ago, everything was perfect before it got into the platform. Right, there was quality, everything was there. What's been happening over the last decade is you're pumping data into these systems, nobody knows if it's redundant data, nobody knows if the quality's any good, and the amount of data is massive. >> And it's cheap to store >> Very cheap to store. >> So people keep pumping it in. >> But I think that creates a lot of issues when it comes to data quality. So I do think the technology's gotten better, and I think there are a lot of companies that are doing a great job with it, but I think the challenge has definitely upped. >> So, go ahead. >> I'm sorry. You mentioned earlier that we're seeing the death of Hadoop, but I'd like you to elaborate on that, because (Dave laughs) Hadoop actually came up this morning in the keynote; it's part of what GlaxoSmithKline did. It came up in a conversation I had with the CEO of Experian last week. I mean, it's still out there. Why do you think it's in decline? >> I think, I mean, first of all, if you look at the Hadoop vendors that are out there, they've all been struggling. Some of them are shutting down, and two of them have merged and have gotten killed lately. I think there are some very successful implementations of Hadoop. I think Hadoop as a storage environment is wonderful, and I think you can process a lot of data on Hadoop. But the problem with Hadoop is it became the panacea that was going to solve all things data.
It was going to be the database, it was going to be the data warehouse, it was going to do everything. >> That's usually the kiss of death, isn't it? >> It's the kiss of death. And, you know, the killer app on Hadoop, ironically, became SQL. I mean, SQL's the killer app on Hadoop. If you want a SQL engine, you don't need Hadoop. In the beginning Mike sort of made fun of it, Stonebraker, and joked a lot about how he'd heard of MapReduce, it's called Group By, (Dave laughs) and that created a lot of tension between the early Vertica and Hadoop. I think, in the end, we embraced it. We sit next to Hadoop, we sit on top of Hadoop, we sit behind it, we sit in front of it; it's there. But I think the reality check of the industry, certainly by the business folks in these companies, is that it has not fulfilled the promises, not even a fraction of the promises that they bet on, and so they need to figure those things out. So I don't think it's going to go away completely, but I think its best success has been disrupting the storage market, and I think there are some much larger disruptions from technologies that frankly are better than HDFS to do that. >> And the cloud was a gamechanger >> And a lot of them are in the cloud. >> Which is ironic, 'cause you know, Cloudera, (Colin laughs) they didn't really have a cloud strategy, neither did Hortonworks, neither did MapR, and it just so happened Amazon had one, Google had one, and Microsoft has one, so it's just convenient to-- >> Well, how is that affecting your business? We've seen this massive migration to the cloud. (mumbles) >> It's actually been great for us. One of the things about Vertica is we run everywhere, and we made a decision a while ago; we had our own data-warehouse-as-a-service offering. It might have been ahead of its time, and it never really took off. What we did instead was pivot, and we said, "You know what?
"We're going to invest in that experience "so it's a SaaS-like experience, "but we're going to let our customers "have full control over the cloud. "And if they want to go to Amazon they can, "if they want to go to Google they can, "if they want to go to Azure they can." And we really invested in that and that experience. We're up on the Amazon marketplace, we have lots of customers running up on Amazon Cloud as well as Google and Azure now, and then about two years ago we went down and did this endeavor to completely re-architect our product so that we could separate compute and storage so that our customers could actually take advantage of the cloud economics as well. That's been huge for us, >> So you scale independent-- >> Scale independently, cloud native, add compute, take away compute, and for our existing customers, they're loving the hybrid aspect, they love that they can still run on Premise, they love that they can run up on a public cloud, they love that they can run in both places. So we will continue to invest a lot in that. And it is really, really important, and frankly, I think cloud has helped Vertica a lot, because being able to provision hardware quickly, being able to tie in to these public clouds, into our customers' accounts, give them control, has been great and we're going to continue on that path. >> Because Vertica's an ISV, I mean you're a software company. >> We're a software company. >> I know you were a part of HP for a while, and HP wanted to mash that in and run it on it's hardware, but software runs great in the cloud. And then to you it's another hardware platform. >> It's another hardware platform, exactly. >> So give us the update on Micro Focus, Micro Focus acquired Vertica as part of the HPE software business, how many years ago now? Two years ago? >> Less than two years ago. >> Okay, so how's that going, >> It's going great. >> Give us the update there. 
>> Yeah, so first of all it is great, HPE and HP were wonderful to Vertica, but it's great being part of a software company. Micro Focus is a software company. And more than just a software company it's a company that has a lot of experience bridging the old and the new. Leveraging all of the investments that you've made but also thinking about cloud and all these other things that are coming down the pike. I think for Vertica it's been really great because, as you've seen Vertica has gotten its identity back again. And that's something that Micro Focus is very good at. You can look at what Micro Focus did with SUSE, the Linux company, which actually you know, now just recently spun out of Micro Focus but, letting organizations like Vertica that have this culture, have this product, have this passion, really focus on our market and our customers and doing the right thing by them has been just really great for us and operating as a software company. The other nice thing is that we do integrate with a lot of other products, some of which came from the HPE side, some of which came from Micro Focus, security products is an example. The other really nice thing is we've been doing this insource thing at Micro Focus where we open up our source code to some of the other teams in Micro Focus and they've been contributing now in amazing ways to the product. In ways that we would just never be able to scale, but with 4,000 engineers strong in Micro Focus, we've got a much larger development organization that can actually contribute to the things that Vertica needs to do. And as we go into the cloud and as we do a lot more operational aspects, the experience that these teams have has been incredible, and security's another great example there. So overall it's been great, we've had four different owners of Vertica, our job is to continue what we do on the innovation side in the culture, but so far Micro Focus has been terrific. 
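As an aside on Mahony's earlier point that "the killer app on Hadoop became SQL," and Stonebraker's quip that MapReduce "is called Group By": the equivalence is easy to demonstrate. This is a toy sketch with invented data; SQLite stands in for any SQL engine, and neither side is meant to represent Vertica's or Hadoop's actual internals.

```python
import sqlite3
from collections import defaultdict

# Toy event data: (region, count) pairs, standing in for records
# spread across a cluster.
events = [("us", 1), ("eu", 1), ("us", 1), ("ap", 1), ("eu", 1)]

# MapReduce style: the map phase emits (key, value) pairs; the reduce
# phase sums the values for each key.
mapped = [(region, n) for region, n in events]  # map is the identity here
reduced = defaultdict(int)
for key, value in mapped:
    reduced[key] += value

# SQL style: the same aggregation is a single GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, n INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", events)
via_sql = dict(conn.execute("SELECT region, SUM(n) FROM events GROUP BY region"))

print(dict(reduced) == via_sql)  # True
```

Both paths produce the same per-key totals, which is why a SQL engine makes a perfectly good "reduce" without any of the Hadoop machinery around it.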
>> Well, I'd like to say, you're kind of getting that mojo back, because you guys as an independent company were doing your own thing, and then you did for a while inside of HP, >> We did. >> And that obviously changed, 'cause they wanted more integration, but, and Micro Focus, they know what they're doing, they know how to do acquisitions, they've been very successful. >> It's a very well run company, operationally. >> The SUSE piece was really interesting, spinning that out, because now RHEL is part of IBM, so now you've got SUSE as the lone independent. >> Yeah. >> Yeah. >> But I want to ask you, go back to a technology question, is NoSQL the next Hadoop? Are these databases, it seems to be that the hot fad now is NoSQL, it can do anything. Is the promise overblown? >> I think, I mean NoSQL has been out almost as long as Hadoop, and I, we always say not only SQL, right? Mike's said this from day one, best tool for the job. Nothing is going to do every job well, so I think that there are, whether it's key value stores or other types of NoSQL engines, document DB's, now you have some of these DB's that are running on different chips, >> Graph, yeah. >> there's always, yeah, graph DBs, there's always going to be specialty things. I think one of the things about our analytic platform is we can do, time series is a great example. Vertica's a great time series database. We can compete with specialized time series databases. But we also offer a lot of, the other things that you can do with Vertica that you wouldn't be able to do on a database like that. So, I always think there's going to be specialty products, I also think some of these can do a lot more workloads than you might think, but I don't see as much around the NoSQL movement as say I did a few years ago. >> But so, and you mentioned the cloud before as kind of, your position on it I think is a tailwind, not to put words in your mouth, >> Yeah, yeah, it's a great tailwind. 
You're in the Amazon Marketplace. I mean, they have products that are competitive, right? >> They do, they do. >> So how are you differentiating there? >> I think the way we differentiate, whether it's Redshift from Amazon, or BigQuery from Google, or even what Azure DB does, is, first of all, Vertica, from a feature, functionality, and performance standpoint, is ahead; that's number one. I think the second thing, and we hear this from a lot of customers, especially at the C-level, is that they don't want to be locked into the full stacks of the clouds. Having the ability to take a product and run it across multiple clouds is a big thing, because the stack lock-in, the full-stack lock-in of these clouds, is scary. It's really easy to develop in their ecosystems, but you get very locked into them, and I think a lot of people are concerned about that. So that works really well for Vertica. But at the end of the day, it's the robustness of the product, and we continue to innovate. When you look at separating compute and storage, believe it or not, a lot of these cloud-native databases don't do that, and so we can actually leverage a lot of the cloud hardware better than the native cloud databases do themselves. So, like I said, we have to keep going; those guys aren't going to stop. And we actually have great relationships with those companies. We work really well with the clouds; they seem to care just as much about their cloud ecosystems as their own database products, and I think that's going to continue as well. >> Well, Colin, congratulations on all the success. >> Yeah, thank you. >> It's awesome to see you again, and we really appreciate you coming to >> Oh, thank you, it's great. I appreciate the invite. >> MIT. >> It's great to be here. >> All right, keep it right there, everybody. Paul and I will be back with our next guest from MIT. You're watching theCUBE. (electronic jingle)
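The compute/storage separation Mahony credits for cloud economics, stateless compute nodes scanning a shared object store so that either side scales on its own, can be illustrated with a toy model. This is an assumption-laden sketch of the general architecture only, not Vertica's Eon Mode internals: the shard layout, node class, and query are all invented for illustration.

```python
# Shared "object store": durable data lives here, independent of compute.
object_store = {
    "shard-0": [1, 2, 3],
    "shard-1": [4, 5],
    "shard-2": [6, 7, 8, 9],
}

class ComputeNode:
    """Stateless worker: holds no data of its own, reads shards from
    the shared store on demand."""
    def scan(self, shard_key):
        return sum(object_store[shard_key])

def run_query(num_nodes):
    # Elasticity: any node count can serve the same data, because adding
    # or removing nodes never moves the data itself.
    nodes = [ComputeNode() for _ in range(num_nodes)]
    shards = sorted(object_store)
    return sum(nodes[i % num_nodes].scan(s) for i, s in enumerate(shards))

print(run_query(1), run_query(3))  # 45 45, the same answer at any scale
```

The design point is the one made in the interview: because compute holds no state, you can "add compute, take away compute" to match load, and pay for storage and processing separately.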
Gabe Monroy, Microsoft & Tim Hockin, Google | KubeCon + CloudNativeCon EU 2019
>> Live from Barcelona, Spain, it's theCUBE, covering KubeCon + CloudNativeCon Europe 2019, brought to you by Red Hat, the Cloud Native Computing Foundation, and ecosystem partners. >> Welcome back. We're here in Barcelona, Spain, where 7,700 attendees are here for KubeCon + CloudNativeCon. I'm Stu Miniman, and this is theCUBE's live two-day coverage. Happy to have on the program two returning guests to talk about five years of Kubernetes. To my right is Tim Hockin, wearing the Barcelona contributor's shirt, and sitting to his right is Gabe Monroy. I didn't introduce their titles and companies, but Tim's at Google, Gabe's at Microsoft, and both have been heavily involved in Kubernetes since the very early days. I mean, Tim, you're on the Wikipedia page; Gabe, you know, I think we have to do some re-editing to make sure we get the community expanded and some of the major contributors on there. But gentlemen, thanks so much for joining us. >> Thank you for having us. >> All right. So, Tim, I just spoke to Joe Beda, and we talked about the idea of Craig and Brendan and him sitting in the room, and open source, and really bringing this out to the community. But let's start with you, because, you know, I remember many times in my career thinking, "Oh, I read this phenomenal paper from Google; we're going to spend the next decade figuring out the ripple effect of this technology." Kubernetes, in five years, has had a major impact on what we're doing. Give us a little bit of your insight as to what you've seen from those early days. >> Yeah. You know, in the early days we had those same conversations. We produced these papers that are, you know, seminal in the industry, and then sometimes, as Google, we sort of don't follow up on them. We didn't want this to be that, right.
We wanted this to be a live, living thing with a real community, one that took root in a different way than the MapReduce and Hadoop situation did. So that was very much front of mind as we worked through what we were going to build, how we were going to build it, and how we were going to manage it. How are we going to build a community? How do you get people involved? How do you find folks like Gabe and Deis and get them to say, "We're in, we want to be a part of this"? >> All right, so Gabe, it was actually Joe who corrected me when I said, well, Google started it and they pulled in some other like-minded vendors. He said, "No, no, Stu. We didn't pull vendors in. We pulled in people," people that believed in the project and the vision. You were one of those people that got pulled in early, so help give us a little context from your viewpoint. >> I did. And, you know, at the time I was working for a company that I had started, and we were out there trying to make developers more productive in industry using modern technology like containers. It was through the process of trying to solve problems for customers, sort of the lens that I was bringing to this, that I was introduced to some really novel technology approaches, first through Docker. I was close with Solomon Hykes, the founder over there, and then started to work closely with folks at Google, namely Brendan Burns, who I now work with at Microsoft, part of the founding Kubernetes team. And I agree with that statement that it is really about people. It's really about individual connections at the end of the day. We do these things at these KubeCon events called contributor summits, and it's very interesting, because when folks land at one of these summits, it's not about who you work for or what jersey you're wearing, that sort of thing.
It's people talking to people, trying to solve technical problems, trying to solve organizational challenges. And I think that phenomenon, and the scale with which it's happened, is part of the reason why there are 8,000 people here in Barcelona today. >> Yeah, it's interesting, Tim, because I used to be involved in some standards work, and I've been working with the open source community for about 20 years. It used to be the side project that people did at night. Today a lot of the people contributing do have a full-time job, and their job will either let them do it or is asking them to do it. So I do talk to people here who, when they're involved in the working groups, when they're doing these things, yes, you think about where their paycheck comes from, but that's secondary to what they're doing as part of the community. >> Absolutely. It's part of the ethos of the project that the project comes first and the company comes second, or maybe even third. And for the most part, this has been wildly successful. There's this huge base of trust among the leadership and among the contributors. It's a big enough project now that I don't know every one of the contributors, but we have this web of trust. I have this army of people that I know and trust very well, and they know people, and they know people, and it works out that the project has been wildly successful, and we've never yet had a major conflict or strife that centered on company this or company that. >> Yeah. And I'd also add that an important development has happened in the wake of Kubernetes where, for example, in my teams at Microsoft, I actually have dedicated PM and engineering staff whose only job is to focus on community engagements, right?
Running the release team for Kubernetes 1.15, or working on IPv6 support, or Windows container support. And that upstream work puts folks in contact with people from all different companies, Google and Microsoft working closely together on countless initiatives, and the same is true really for the entire community. So I think it's really great to see that you get not just the interpersonal interactions; you also get corporate sponsorship of that model. Because I do think, at the end of the day, people need to get their paychecks, and oftentimes that's going to come from a big company. Seeing that level of investment is, I think, pretty encouraging. >> Okay. Well, luckily, five years in we've solved all the problems and everything works perfectly. If that's maybe not the case, where do we need people involved? What things should we be looking at over the next year or two in this space? >> A project of this size, a community of this size, a system of this scope has infinite work to do, right? The barrel is never going to be empty, and in some cases it's filling faster than it's draining. Every special interest group, every SIG, has a backlog of issues, of things that they would like to see fixed, of features where they have some user pounding the table saying, I need this thing to work. IPv6 is a great example, right? And we have people now stepping up to take on these big issues because they have customers who need it, or they see it as important foundational work for building future stuff. So there's no shortage of work to do. That's not just engineering work, though; it's not just product definition or API. We have what we call contributor experience.
People who work with our community to onboard new contributors and streamline how to get them in and involved, and documentation, and testing, and release engineering. There's so much non-core work. I could go on about this forever. >> Yeah, you're just reminding me of the session this morning: I don't manage clusters, I manage fleets. And you have the same challenge with the people. >> Yeah. And I'd add another dimension to this about just the breadth of contribution. We were just talking before the show that outside, at the logo, there are these book characters and such. And really that came from a children's book that was created to demonstrate core concepts to developers who were new to Kubernetes. It ended up taking off and was eventually donated to the CNCF. Things like that, you can't underestimate the importance and impact they can have on making sure that Kubernetes is accessible to a really broad audience. >> Okay. Look, I want to give you both the final word: a shout-out for the community, and any special things that have surprised you or excited you here in 2019. >> Exciting is being here. If you rewind five years and tell me I'm going to be in Barcelona with 7,500 of my best friends, I would think you are crazy or from Mars. This is amazing. And I thank everybody who's here, who's made this thing possible. We have a ton of work to do, and if you feel like you can't figure out what you need to work on, come talk to me and we'll figure it out. >> Yeah.
And for me, I just want to give a big thank you to all the maintainers, folks like Tim, but also some other folks whose names you may not know, but they're the ones slogging it out in the GitHub PR queue, trying to just make the project work and function day to day. Were it not for their ongoing efforts, we wouldn't have any of this. So thank you for that. >> Well, look, thank you, of course, to the community, and thank you both for sharing with our community. We're always happy to be a small piece of helping to spread the word and give some voice to everything that's going on here. Thank you so much. All right, we will be back with more coverage here from KubeCon + CloudNativeCon 2019. I'm Stu Miniman, and thank you for watching theCUBE.
Brian Grant & Tim Hockin, Google Cloud | KubeCon 2018
>> Live from Seattle, Washington, it's theCUBE covering KubeCon and CloudNativeCon North America 2018, brought to you by Red Hat, the Cloud Native Computing Foundation and its ecosystem partners. >> Okay, welcome back, everyone, this is theCUBE's live coverage here in Seattle for KubeCon and CloudNativeCon 2018. I'm John Furrier with Stu Miniman breaking down all the action, talking to all the top people: influencers, executives, start-ups, vendors, the foundation itself. We're here with two co-leads of Kubernetes at Google, legends in the Kubernetes industry. Tim Hockin and Brian Grant, both with Google, both co-leads at GKE. Thanks for joining us, legends in the industry. Kubernetes has still had a short life, but being there from the beginning, you guys were instrumental at Google in building out and contributing to this massive tsunami of 8000 people here. Who would have thought? >> It's amazing! >> It's a little overwhelming. >> It's almost like you guys are celebrity-status here inside this crowd. How's that feel? >> It's a little weird. I don't buy into the celebrity culture for technologists. I don't think it works well. >> We agree, but it's great to have you on. Let's get down to it. Kubernetes, certainly the rise of Kubernetes has grown. It's now pretty mainstream; people look at it as a key linchpin for the center of Cloud Native. And we see the growth of cloud; you guys are living it with Google. What is the importance of Kubernetes? Why is it so important? Fundamentally, at its core, it has a lot of impact. What's the fundamental reason why it's so successful? >> I think fundamentally Kubernetes provides a framework for driving migration towards Cloud Native patterns across your entire operational infrastructure. The basic design of Kubernetes is pretty simple and can be applied to automating pretty much anything.
We're seeing that here; there are at least half a dozen talks about how people are using the Kubernetes control plane to manage their applications or workflows or functions or things other than just core Kubernetes containers, for example. One of the things I'm involved with is the Technical Oversight Committee of the Cloud Native Computing Foundation; I drove the update of the Cloud Native definition. If you're trying to operate with high velocity, deploying many times a day, if you're trying to operate at scale, especially with containers and functions, scale is increasing and compounding as people break their applications into more and more microservices. Kubernetes really provides the framework for managing that scale and for integrating other infrastructure that needs to accommodate that scale and that pace of change. >> I think Kubernetes speaks to the pain points that users are really having today. Everybody's a software company now, right? And they have to deploy their software, they have to build their software, they have to run their software, and these things build up pain. When it was just a little thing, you didn't have to worry about scale, internet-scale and web-scale; you could tolerate it within your organization. But more and more, you need to deploy faster, you need to automate things. You can't afford to have giant staffs of people who are running your applications. These things are all part of Kubernetes' purview. I think it just spoke to people in a way; they said, I suffer from that every day, and you just made it go away. >> And what's the core impact now? Because now people are seeing it, what is the impact to the organizations that are rethinking their entire operation, from all parts of the staff, from how they buy infrastructure, which is also cloud, and then deploying applications? What's the real impact?
>> I think the most obvious, the most important part here is the way it changes how people operate and how they think about how they manage systems. It no longer becomes scary to update your application. It's just a thing you do. If you can do it with high confidence, you're going to do it more often, which means you get features and bugs fixed and you get your roll-outs done quicker. It's amazing, the result that it can have on the user experience. A user reports a bug in the morning, and you fix it in the afternoon, and you don't worry about that. >> You bring up some really interesting points. I think back 10 years ago, from a research standpoint, we were looking at how the enterprise could do some of the things that the hyperscale vendors were doing. I feel over the last 10 years, every time Google released one of those great scientific papers, we'd all get a peek inside and say, oh hey. When I went to the first DockerCon and heard how Google was using containers, when Kubernetes first came out, it was like, oh wow, maybe the rest of us will get to do something that Google's been doing for the last 10 years. Maybe bring us back a little bit to Borg and how that led to Kubernetes. Are the rest of us still just doing whatever Google did 10 years ago? >> Yeah, Tim and I both worked on Borg previously, Tim on the node-agent side and I on the control-plane side. One lesson we really took from Borg is that you can run all types of applications. People started with stateless applications, and we started with that because it's simpler in Kubernetes. But really it's just a general management control plane for managing applications. With the model of one application per container, you can manage the applications in a much more first-class way and unlock a lot of opportunities for automation in the management control plane.
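The control-plane idea Brian describes, declared desired state with automation converging actual state toward it, can be sketched in a few lines. This is a hypothetical toy, not Borg or Kubernetes code; the names and the replica-count model are illustrative assumptions only:

```python
# Toy reconcile loop: compute the actions needed to converge observed
# state toward desired state. A real controller watches an API server
# and handles errors, backoff, and concurrency; this sketch only shows
# the core declarative idea.

def reconcile(desired: dict, observed: dict) -> list:
    """Return the actions needed to make observed match desired.

    Keys are app names, values are replica counts.
    """
    actions = []
    for app, want in desired.items():
        have = observed.get(app, 0)
        if have < want:
            actions.append(("start", app, want - have))
        elif have > want:
            actions.append(("stop", app, have - want))
    # Anything running that is no longer declared gets cleaned up.
    for app in observed:
        if app not in desired:
            actions.append(("delete", app, observed[app]))
    return actions

desired = {"web": 3, "worker": 2}
observed = {"web": 1, "cron": 1}
print(reconcile(desired, observed))
# [('start', 'web', 2), ('start', 'worker', 2), ('delete', 'cron', 1)]
```

The point of the loop is that it is level-based rather than edge-based: it compares whole states, so it can be rerun at any time and still produce the right answer, which is what makes "one application per container" manageable in a first-class way.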
At Google, several years ago when we started, Google had already gone through the transition of moving most of its applications to Borg. It was after that phase that Google started its cloud effort, and the rest of the world was doing VMs. When Docker emerged, we were, as Tim mentioned in our keynote yesterday, in the early phases of open-sourcing our container runtime. It was clear Docker had a much better user experience for the way folks were managing applications outside of Google, and we just pivoted to that immediately. >> When Docker first came out, we took a look at it, my node-agent team in Borg, and we went, yeah, it's kind of a poor man's version of the Borglet. We sort of ignored it for a while because we were already working on our own open-source effort. We were open-sourcing it not really to change the world and make everybody use it, but more so that we could have conversations with people like the Linux kernel community. When we said we need this feature, and they'd say, well, why do you need this, we could actually demonstrate for them why we needed it. When Docker landed, we saw the community building, and building, and building. That was a snowball of its own, right? As it caught on, we realized we knew where this was going. We knew once you embrace the Docker mindset that you very quickly need something to manage all of your Docker nodes once you get beyond two or three of them. We knew how to build that. We had a ton of experience here. We went to our leadership and said, please, this is going to happen with us or without us, and I think the world would be better if we helped. >> I think that's an interesting point. You guys had to open-source to do collaboration with Linux, to get that flywheel going, out of necessity. Then when Docker validated the community acceptance of, hey, we can just use containers and a lot of magic will happen, it hit the second trigger point. What happened after that?
You guys just had a debate internally? Is this another MapReduce? What's happening? Like, we should get behind this. I knew there was a big argument, or debate, I should say, within Google. At that time there were a lot of conversations about how to handle this. >> That was around the time that Google Compute Engine, our infrastructure-as-a-service platform, was going GA and really starting to get usage. So then we had an opportunity to enable our customers to benefit from the kinds of techniques we had been using internally. So I don't think the debate was whether we should participate; it was more how. For example, should we have a fully managed product, should we do open-source, or should we do managed open-source? Those were really the three alternatives that we were discussing. >> Well, congratulations, you guys have done great work and certainly had a huge impact on the industry. I think it's clear that the motivation to have some sort of standardization, a de facto standard, whatever word you want to use, to let people be enabled on top of or below Kubernetes is great. I guess the next question is how you guys envision this going forward as a core. If we're going to go to decomposition with low levels of granularity, tied together through the network at cloud scale under a new operating model, how does the industry maintain the greatness of what Kubernetes is delivering and bring new things to market faster? What's your vision on this? >> I talked a little bit about this this week. We put a ton of work into extension points, extensibility of the system, trying to stay very true to the original vision of Kubernetes. It is a box, and Kubernetes fits inside a box, and anything that's outside the box has to stay outside the box. This gives us the opportunity to build new ecosystems.
You can see it in the networking space, you can see it in the storage space, where whole cottage industries are now springing up around doing networking for Kubernetes and doing storage for Kubernetes. And that's fantastic! You see projects like Istio, which I'm a big fan of; it's outside of Kubernetes. It works really well with Kubernetes, it's designed on top of Kubernetes infrastructure, but it's not Kubernetes. It's totally removable and you don't need it. There are systems like Knative which are taking the serverless idea and upleveling Kubernetes into the serverless space. It's happening all over the place. We're trying to stay almost fanatical about it and say, no, we're staying this big and no bigger. >> From an engineering standpoint, it's much simpler if I just build a product and build everything into it. All those connection points, I go back to my engineering training: every connection point is going to be another place where it could fail. Now it's got all these APIs, there are all the security issues, and things like that. But what I love about what I heard right here, one of the learnings we've had in open source, is that these are all individual components, and most of them can stand on their own. They don't even have to be with Kubernetes, but together you can build lots of different offerings. How do you balance that? How do you look at that from a design and architecture standpoint? >> So one thing I've been looking at is how we ensure compatibility of workloads across Kubernetes in all different environments and different configurations. How do we ensure that the tools and other systems in the ecosystem work with Kubernetes everywhere? This is why we created the Conformance Program, to certify that the critical APIs that everybody depends on behave the same way. As we try to improve the test coverage of the conformance suite, we're focusing on the areas of the system that are highly pluggable and extensible.
So for example, the kubelet on the node has a pluggable container runtime, pluggable networking, and now pluggable storage systems with CSI. So we're really focusing on ensuring we have good coverage of the Pod API, for example. In other parts of the system, people have swapped out components in the ecosystem, whether it's kube-proxy for Kubernetes services or the scheduler. So we'll be working through those areas to make sure they have really good coverage, so users can deploy, say, a Helm chart or their own take on a configuration or whatever, however they manage their applications, and have that behave the same way on Kubernetes everywhere. >> I think you guys have done a great job of identifying this enabling concept. What is good enabling technology? Allowing others to do innovation around it. I think that's a nice positioning. What are the new problem areas that you guys see to work on next? Now I see things developing in the ecosystem. You mentioned the Istio service mesh, and people see value in that. Security is certainly a big conversation we've been having this week. What new problem areas or problem sets do you guys see emerging that need to be tackled and knocked down right away? >> The most obvious, the thing that comes up in sort of every conversation with users now, is multi-cluster, multi-cloud, hybrid, whether that's two clouds, or on-prem plus cloud, or even across different data centers on your premises. It's a hard topic. For a long time we were able to sort of put our fingers in our ears and pretend it didn't exist while we built out the Kubernetes model. Now we're at a place where we've crossed the adoption chasm. We're into the real adoption now. It's a real problem. It actually exists and we have to deal with it, and so we're now looking at how it's supposed to work. Philosophically, what do we think is supposed to happen here? Technologically, how do we make it happen? How do these pieces fit together?
What primitives can we bring into Kubernetes to make these higher-level systems possible? >> Would you consider 2019 to be the year of multi-cloud, in terms of the evolution of trying to tackle some of these things, from latency on down? >> Yeah, I'm always reluctant to say the year of something because... >> Someone has to get killed, someone dies, and someone's winning. >> It's the year of the Linux desktop. >> It's the year of something. (laughs) I'm just saying. >> I think multi-cluster is definitely the hot topic right now. It's certainly on with almost every customer that we talk to through Google, and there's tons of community chatter about how to make this work. >> You've seen companies like NetApp and Cisco, for instance, and how they've been getting a tailwind from Kubernetes. It's been interesting. You need networks. They have a lot of networks. They can play a role in it. So it's interesting how it's designed to allow people to put their hands in there without mucking up the main... >> Yeah, I think that really contributes to the success of Kubernetes. The more people that can help add value to Kubernetes, the more people have a stake in its success, both users and vendors, and developers, and contributors. We're all stakeholders in this endeavor now, and we all share common goals, I think. >> Well guys, final question for you. I know we've got to break on time. Thanks for coming; I really appreciate the time. Talk about an area of Kubernetes that most people might not know about but should. In other words, there's a lot of hype around Kubernetes, and it's warranted, it's a lot of buzz. What's an important area that's not talked about much, that people should know more about and pay attention to within the Kubernetes realm? Is there any area that you think is not talked about enough, that should be focused on in the conversations, the press, or just in general? >> Wow, that's a challenging question.
I spend a lot of my time on the infrastructure side of Kubernetes, the lower end of the stack, so my brain immediately goes to networking and storage and all the lower-level pieces there. I think there are a lot of policy knobs that Kubernetes has that not everybody's aware of, whether those are security policies or network policies. There's a whole family of these things, and I think we're going to continue to accrete more and more policy as more people come up with real use cases for doing stuff. It's hard to keep it all in your mind, but it's really valuable stuff down there. >> For programmability, it's like a Holy Grail, really. Thoughts on that? (chuckles) Put you on the spot there. >> I think about this question of how people should change what they were doing before if they're going to migrate to Kubernetes. To operate any workload, you need at least monitoring, and you really need CI/CD if you want to operate with any amount of velocity. When you bring those practices to Kubernetes, should you just lift and shift them into Kubernetes, or do you really need to change your mindset? I think Kubernetes really provides some capabilities that create opportunities for changing the way some things happen. I'm a big fan of GitOps, for example: managing resources declaratively, using version control as the source of truth, and keeping that in sync with the state in your live clusters. I think that enables a lot of interesting capabilities, like instant disaster recovery, for example, or migrations to new locations. There are some key folks here who are talking about that, giving that message, but we're really at the early stages there. >> All right, well, great to have you guys on. Thanks for the insight. We've got to wrap up. Thanks Brian, thanks Tim, appreciate it. Live coverage here, theCUBE is at KubeCon and CloudNativeCon 2018. I'm John Furrier with Stu Miniman, we'll be back after this short break.
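The GitOps workflow Brian describes, with version control as the source of truth kept in sync with live cluster state, boils down to a diff between two maps of resources. Here is a minimal sketch of that sync calculation, with hypothetical resource names; real tools diff full Kubernetes manifests rather than flat dicts:

```python
# Minimal GitOps-style sync plan: git holds the declared state, and the
# live cluster state is compared against it. Hypothetical sketch only.

def sync_plan(git_state: dict, live_state: dict) -> dict:
    """Classify resources by what a sync would have to do."""
    plan = {"create": [], "update": [], "delete": [], "in_sync": []}
    for name, spec in git_state.items():
        if name not in live_state:
            plan["create"].append(name)
        elif live_state[name] != spec:
            plan["update"].append(name)
        else:
            plan["in_sync"].append(name)
    # Resources live in the cluster but absent from git get pruned.
    for name in live_state:
        if name not in git_state:
            plan["delete"].append(name)
    return plan

git = {"deploy/web": {"replicas": 3}, "svc/web": {"port": 80}}
live = {"deploy/web": {"replicas": 1}, "deploy/stale": {"replicas": 1}}
print(sync_plan(git, live))
# {'create': ['svc/web'], 'update': ['deploy/web'],
#  'delete': ['deploy/stale'], 'in_sync': []}
```

The "instant disaster recovery" capability he mentions follows directly from this model: against an empty `live_state`, the plan is simply to create everything git declares.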
Brent Compton, Red Hat | theCUBE NYC 2018
>> Live from New York, it's theCUBE, covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hello, everyone, welcome back. This is theCUBE live in New York City for theCUBE NYC, #CUBENYC. This is our ninth year covering the big data ecosystem, which has now merged into cloud. All things coming together. It's really about AI, it's about developers, it's about operations, it's about data scientists. I'm John Furrier, with my co-host Dave Vellante. Our next guest is Brent Compton, Technical Marketing Director for the Storage Business at Red Hat. As you know, we cover Red Hat Summit, and it's great to have the conversation. Open source and DevOps is the theme here. Brent, thanks for joining us, thanks for coming on. >> My pleasure, thank you. >> We've been talking about the role of AI, and AI needs data, and data needs storage, which is what you do. But if you look at what's going on in the marketplace, there's kind of an architectural shift. It's harder to find a cloud architect than it is to find diamonds these days. You can't find a good cloud architect. Cloud is driving a lot of the action. Data is a big part of that. What's Red Hat doing in this area, and what's emerging for you guys in this data landscape? >> Really, the days of specialists are over. You mentioned it's more difficult to find a cloud architect than to find diamonds. What we see is that infrastructure has become less about compute versus storage and networking; it's the architect who can bring the confluence of those specialties together. One of the things that we see is people bringing their analytics workloads onto the common platforms where they've been running the rest of their enterprise applications. For instance, if they're running a lot of their enterprise applications on AWS, of course they want to run their analytics workloads on AWS, and that's EMR; that's long since in the history books.
Likewise, if they're running a lot of their enterprise applications on OpenStack, it's natural that they want to run a lot of their analytics workloads on the same type of dynamically provisioned infrastructure. Emerging, of course, as we just announced on Monday this week with Hortonworks and IBM: if they're running a lot of their enterprise applications on a Kubernetes substrate like OpenShift, they want to run their analytics workloads on that same kind of agile infrastructure. >> Talk about the private cloud impact and hybrid cloud, because obviously we just talked to the CEO of Hortonworks. Normally it's about the early days, about Hadoop, data lakes and then data planes. They had a good vision. They're years into it, and I like what Hortonworks is doing. But he said Kubernetes; on a data show, Kubernetes. Kubernetes is a multi-cloud, hybrid cloud concept, containers. This is really enabling a lot of value, and you guys have OpenShift, which became very successful over the past few years; the growth has been phenomenal. So congratulations, but it's pointing to a bigger trend, and that is that the infrastructure software, the platform as a service, is becoming the middleware, the glue, if you will, and Kubernetes and containers are facilitating a new architecture for developers and operators. How important is that for you guys, and what's the impact on the customer when they think, okay, I'm going to have an agile DevOps environment, workload portability, but do I have to build that out? You mentioned people don't necessarily have to do that anymore. The trend has become on-premises. What's the impact on customers as they hear Kubernetes and containers and the data conversation? >> You mentioned an agile DevOps environment and workload portability. One of the things that customers come to us for is having that same thing, but infrastructure agnostic. They say, I don't want to be locked in. Love AWS, love Azure, but I don't want to be locked into those platforms.
I want to have an abstraction layer for my Kubernetes layer that sits on top of those infrastructure platforms. As I bring my workloads, one by one, custom DevOps from a lift and shift of legacy apps onto that substrate, I want to have it be independent, private cloud or public cloud, and, time permitting, we'll go into more details about what we've seen happening in the private cloud with analytics as well, which is effectively what brought us here today. The pattern that we've discovered with a lot of our large customers who are saying, hey, we're running OpenStack. They're large institutions that, for lots of reasons, store a lot of their data on-premises, and they're saying, we want to use the utility compute model that OpenStack gives us as well as the shared data context that Ceph gives us. We want to use that same thing for our analytics workloads. So effectively some of our large customers taught us this program. >> So they're building infrastructure for analytics essentially. >> That's what it is. >> One of the challenges with that is the data is everywhere. It's all in silos, it's locked in some server somewhere. First of all, am I overstating that problem, and how are you seeing customers deal with that? What are some of the challenges that they're having and how are you guys helping? >> Perfect lead-in. In fact, one of our large government customers recently sent us an unsolicited email after they deployed the first 10 petabytes in a deca-petabyte solution. It's OpenStack based as well as Ceph based. Three taglines in their email. The first was releasing the lock on data. The second was releasing the lock on compute. And the third was releasing the lock on innovation. Now, that sounds a bit buzzword-y, but when it comes from a customer to you. >> That came from a customer? Sounds like a marketing department wrote that.
>> In the details, as you know, traditional HDFS clusters, traditional Hadoop clusters, Spark clusters or whatever, HDFS is not shared between clusters. One of our large customers has 50-plus analytics clusters. Their data platforms team employs a maze of scripts to copy data from one cluster to the other. And if you are a scientist or an engineer, you'd say, I'm trying to obtain these types of answers, but I need access to data sets A, B, C, and D, but data sets A and B are only on this cluster. I've got to go contact the data platforms team and have them copy it over and ensure that it's up to date and in sync, so it's messy. >> It's a nightmare. >> Messy. So that's why the one customer said releasing the lock on data, because now it's in a shared context. Similar paradigm as AWS with EMR. The data's in a shared context, in S3. You spin up your analytics workloads on EC2. Same paradigm discussion as with OpenStack. You're spinning up your analytics workloads via OpenStack virtualization and they're sourcing a shared data context inside of Ceph, S3-compatible Ceph, so same architecture. I love his last bit, the one that sounds the most buzzword-y, which was releasing the lock on innovation. And this individual, English was not this person's first language, so love the word. He said, our developers no longer fear experimentation because it's so easy. In minutes they can spin up an analytics cluster with a shared data context; if they get the wrong mix of things, they shut it down and spin it up again. >> In the previous example you used HDFS clusters. There's so many trip wires, right. You can break something. >> It's fragile. >> It's like scripts. You don't want to tinker with that. Developers don't want to get their hand slapped. >> The other thing is also the recognition that innovation comes from data. That's what my takeaway is.
The customer saying, okay, now we can innovate because we have access to the data, we can apply intelligence to that data whether it's machine intelligence or analytics, et cetera. >> This is the trend in infrastructure. You mentioned the shared context. What other observations and learnings have you guys come to as Red Hat starts to get more customer interactions around analytical infrastructure? Is it an IT problem? You mentioned abstracting away the different infrastructures, and that means multi-cloud's probably a setup for you guys in a big way. But what does that mean for a customer? If you had to explain infrastructure analytics, what needs to get done, what does the customer need to do? How do you describe that? >> I love the term that the industry uses of multi-tenant workload isolation with shared data context. That's such a concise term to describe what we talk to our customers about. And most of them, that's what they're looking for. They've got their data scientist teams that don't want their workloads mixed in with the long-running batch workloads. They say, listen, I'm on deadline here. I've got an hour to get these answers. They're working with Impala. They're working with Presto. They iterate, they don't know exactly the pattern they're looking for. So it takes a long time because their jobs are mixed in with these long MapReduce jobs. They need to be able to spin up infrastructure; workload isolation meaning they have their own space, shared context meaning they don't want to be placing calls over to the platform team saying, I need data sets C, D, and E. Could you please send them over? I'm on deadline here. That phrase, I think, captures so nicely what customers are really looking to do with their analytics infrastructure. Analytics tools, they'll still do their thing, but the infrastructure underneath analytics delivering this new type of agility is giving that multi-tenant workload isolation with shared data context.
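The "maze of scripts" pattern the conversation keeps returning to, where a data platforms team copies datasets between per-cluster HDFS silos on request, can be sketched in a few lines. This is purely an illustrative sketch, not anything from the interview: the cluster names, paths, and the `distcp_commands` helper are all hypothetical, though `hadoop distcp -update` is the standard tool such scripts typically wrap.

```python
# Illustrative sketch (hypothetical clusters and paths): with per-cluster
# HDFS, every dataset a team needs must first be copied to the cluster
# where their jobs will run.

DATASET_LOCATIONS = {
    "A": "hdfs://cluster1/data/a",
    "B": "hdfs://cluster1/data/b",
    "C": "hdfs://cluster2/data/c",
    "D": "hdfs://cluster3/data/d",
}

def distcp_commands(needed, target_cluster):
    """Build one `hadoop distcp` invocation per requested dataset that
    does not already live on the target cluster."""
    cmds = []
    for name in needed:
        src = DATASET_LOCATIONS[name]
        if not src.startswith(f"hdfs://{target_cluster}/"):
            dst = f"hdfs://{target_cluster}/data/{name.lower()}"
            cmds.append(f"hadoop distcp -update {src} {dst}")
    return cmds

# A scientist on cluster2 needs data sets A, B, C, and D; only C is local,
# so three copies (and three chances for stale data) are required.
for cmd in distcp_commands(["A", "B", "C", "D"], "cluster2"):
    print(cmd)
```

Multiply this by 50-plus clusters and dozens of datasets and the appeal of a single shared, S3-compatible data context becomes obvious: the copy step disappears entirely.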
>> You know what's funny is we were talking at the kickoff. We were looking back nine years. We've been at this event for nine years now. We made a prediction there would be no Red Hat of big data. John, years ago, said, unless it's Red Hat. You guys got dragged into this by your customers, really, is how it came about. >> Customers and partners, of course. With your recent guest from Hortonworks, the announcement that Red Hat, Hortonworks, and IBM had on Monday of this week dials up the agility even further. Okay, OpenStack is great for agility, private cloud, utility-based computing and storage with OpenStack and Ceph, great. OpenShift dials up that agility another notch. Of course, we heard from the CEO of Hortonworks how much they love the agility that a Kubernetes-based substrate provides their analytics customers. >> That's essentially how you're creating that sort of same-same experience between on-prem and multi-cloud, is that right? >> Yeah, OpenShift is deployed pervasively on AWS, on-premises, on Azure, on GCE. >> It's a multi-cloud world, we see that for sure. Again, the validation was at VMworld. AWS CEO Andy Jassy announced RDS on VMware on-premises, which they've never done. Amazon's never done any product on-premises. We were speculating it would be a hardware device. We missed that one, but it's software. But this is the validation: seamless cloud operations on-premises and in the cloud really is what people want. They want one standard operating model and they want to abstract away the infrastructure, as you were saying, as the big trend. The question that we have is, okay, go to the next level. From a developer standpoint, what is this modern developer using for tools in the infrastructure? How can they get that agility, spinning up isolated, multi-tenant infrastructure all the time? This is the demand we're seeing; that's an evolution.
Question for Red Hat is, how does that change your partnership strategy? Because you mentioned Rob Bearden. They've been hardcore enterprise and you guys are hardcore enterprise. You kind of know the little things that customers want that might not be obvious to people: compliance, certification, a decade of support. How is Red Hat's partnership model changing with this changing landscape, if you will? You mentioned the IBM and Hortonworks release this week, but in general, how does the partnership strategy look for you? >> The more it changes, the more it looks the same. When you go back 20 years ago, what Red Hat has always stood for is any application on any infrastructure. But back in the day it was, we had n thousand applications that were certified on Red Hat Linux and we ran on anybody's server. >> Box. >> Running on a box, exactly. It's a similar play, just in 2018, in the world of hybrid, multi-cloud architectures. >> Well, you guys have done some serious heavy lifting. Don't hate me for saying this, but you're kind of like the mules of the industry. You do a lot of stuff that nobody either wants to do or knows how to do, and it's really paid off. You just look at the ascendancy of the company, it's been amazing. >> Well, multi-cloud is hard. Look at what it takes to do multi-cloud in DevOps. It's not easy and a lot of pretenders will fall out of the way; you guys have done well. What's next for you guys? What's on the horizon? What's happening for you guys these next couple of months for Red Hat and technology? Any new announcements coming? What's the vision, what's happening? >> One of the announcements that you saw last week was with Red Hat, Cloudera, and Eurotech. Analytics in the data center is great. Increasingly, the world's businesses run on data-driven decisions. That's great, but analytics at the edge, for more realtime industrial automation, et cetera.
Per the announcements we did with Cloudera and Eurotech, we haven't even talked about Red Hat's middleware platforms, such as AMQ Streams, a Kafka distribution, and Fuse, an integration platform, effectively bringing Red Hat technology to the edge of analytics so that you have the ability to do some processing in realtime before calling all the way back to the data center. That's an area that you'll also see: pushing some analytics to the edge through our partnerships such as those announced with Cloudera and Eurotech. >> You guys got the Red Hat Summit coming up next year. theCUBE will be there, as usual. It's great to cover Red Hat. Thanks for coming on theCUBE, Brent. Appreciate it, thanks for spending the time. We're here in New York City live. I'm John Furrier, with Dave Vellante; stay with us. All-day coverage today and tomorrow in New York City. We'll be right back. (upbeat music)
Alan Stearn, Cisco | VeeamON 2018
>> Narrator: Live from Chicago, Illinois, it's theCUBE, covering VeeamON 2018. Brought to you by Veeam. >> Dave: Welcome back to VeeamON 2018. You're watching theCUBE, the leader in live tech coverage. We go out to the events, we extract the signal from the noise. My name is Dave Vellante and I'm here with my cohost, Stu Miniman. This is our second year at VeeamON, #VeeamON. Alan Stearn is here. He's the technical solutions architect at Cisco. Alan, thanks for coming to theCUBE. >> Alan: Great to be here. It's a real honor and privilege, so I'm excited. >> It's a great show. It's smallish. It's not as big as Cisco Live, which will be next month, but it's clean, it's focused. Let's start with your role at Cisco as a solutions architect. What's your focus? >> So my focus is really on three areas of technology. Data protection being one of them, software defined storage or object storage, and then the Hadoop ecosystem. And I work with our sales teams to help them understand how the technology is relevant to Cisco as a solutions partner, and also work with the partners to help them understand how the benefit of working with Cisco is advantageous to all of us in order to help our customers come to solutions that benefit their enterprise. >> So your job as a catalyst and a technical expert-- so you identify workloads, use cases, and figure out how we can take Cisco products and services and point them there and add the most value for customers. That's really your job. >> To some degree, yeah. I mean, in a lot of these solutions, this is an area that our executive team has said, "Hey, this is something we can go help our customers with," and then it's handed down to my team and my job is then to make it happen. Along with a lot of other people. >> So let's look at these. Data protection is obviously relevant at VeeamON. What role does Cisco play in the data protection matrix? >> So Cisco provides an optimal platform for great partners like Veeam to land these backups.
It's critical. It's funny, we often talk about backup, and what we should be talking about is restore, 'cause nobody backs up just for the sake of backing up. But how do I restore quickly? Having that backup on premises, on an optimized platform where Cisco has done all of the integration work to make sure everything is going to work, is critical to the customer's success. Because, as we know, maintenance windows and downtime are a thing of the past. They don't exist anymore. We live in an always-on enterprise, and that's really where folks like Veeam are focused. >> For you younger people out there, we used to talk about planned downtime, which is just-- what? What is that? Why would anybody plan for downtime? It's ridiculous. >> Stu: Alan, what if we can unpack that a little. I think back in the data center group, you and Cisco launched UCS; the memory that it had was really geared for virtualization, and I could see why Veeam and Cisco would work well together because of some unique architecture that's there. It's been a few years now that UCS has been on the market. What's the differentiation? And maybe bring us inside some of the engineering work that happened between Cisco and Veeam in some of these spaces. >> So we take our engineers and lock them in with Veeam engineers in a lab, and they go in and deploy the solution; they turn all the various nerd knobs to get the platform optimized. Primarily we talk about our S3260, which in a 4U space holds about 672 terabytes of storage, and they optimize it and then publish a document that goes with it. We call them Cisco Validated Designs. And these designs allow the customer to deploy the solution without having to go through the hit-or-miss of "what happens when I turn this nerd knob or that nerd knob, alter this network configuration or that one," and to get the best performance in the shortest possible time.
>> Those CVDs are critical, but the field knows them, they trust them. Can you speak a bit to the presence that you have having Veeam in your pricebook, what that means, to kind of take that out to the broad Cisco ecosystem? >> Yeah, and it's more than just having it on the pricelist. It's the integrated support, so that the customer knows that if there's a problem they're not going to end up in a finger-pointing situation of Cisco saying "Call Veeam" or Veeam saying "Call Cisco." They have a solution, and we're in lockstep so that there aren't going to be those problems. The CVD ensures that problems are kept to a minimum. Cisco has fantastic support, Veeam has great support. They were talking this morning about the net promoter score being 73, which is unbelievably good. So in the event that there is a problem, they know they're going to get to resolution incredibly quickly and they're going to get their environment restored as quickly as possible. >> So when I think about the three areas of your focus, data protection, object storage, and the Hadoop ecosystem, there's definitely intersection amongst those. We talked a little bit about data protection. The object store piece, the whole software defined, is a trend that's taking off; we were talking earlier about some of the trade-offs of software defined. Bill Philbin was saying, "Well if I go out and put it together myself, when there's a problem, I've got to fix it myself." So there's a trade-off there. I don't know if you watch Silicon Valley, Stu, but the box. Sometimes it's nice to have an appliance. What are you seeing in terms of the trends toward software defined-- what's driving that? Is it choice, is it flexibility? What are the trade-offs? >> It's a couple of things. The biggest thing that's driving it is just the explosion of data. Data that's born in the cloud-- it's probably pretty good to store with one of the cloud providers.
But data that's born in your data center, or that is extremely proprietary and sensitive, customers are increasingly looking to say, "You know what, I want to keep that onsite," and that's in addition to the regulatory issues that we're going to see with GDPR and others. So they want to keep it on site, but they like the idea of the ease of use of cloud and the nature of object storage, and the cost-- the cost model for object storage is great. I take an x86-based server like UCS and I overlay storage software that's going to give me that resiliency through erasure coding or replication. And now I've got a cost model that looks a lot like the cloud, but it's on premises. So that also allows me, if I'm putting archival data there, to store it cheaply and bring it back quickly. Because the one challenge with the cloud is my connectivity to my cloud provider is finite. >> Just a quick follow-up on that. I know Scality's a partner; are there other options for object storage? >> Sure, both Scality and SwiftStack are on our global pricelist like Veeam. We also work with some other folks like IBM Cloud Object Storage and Cohesity, which sort of fits in that in-between space, as well as, we're doing some initial work with Cloudy. >> Think about the Hadoop ecosystem. That brings in new challenges. I mean, a lot of Hadoop is basically a software defined file system. And it's also distributed-- the idea of bringing five megabytes of computing to a petabyte of data. So it's leave the data where it is. So that brings new challenges with regard to architectures, protecting that data. Talk about that a little bit. >> The issue with Hadoop is data has gravity. Moving lots of data around is really inefficient. That's where MapReduce was born. The data is already there. I don't have to move it across the network to process it. Data protection was sort of an afterthought. You do have replication of data, but that was really for locality, not so much for data protection.
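As an illustrative aside on the replication-versus-erasure-coding cost model mentioned above (the numbers are not from the interview): the raw-capacity overhead of the two schemes can be compared with a couple of lines of arithmetic. The 8+3 profile below is a hypothetical example of the kind of layout object stores such as Ceph can be configured with.

```python
# Illustrative arithmetic (not from the interview): raw capacity needed to
# hold a given amount of usable data under replication vs. erasure coding.

def raw_capacity_needed(usable_tb, scheme):
    """Raw storage required for `usable_tb` of data.
    scheme is ('replica', copies) or ('ec', data_chunks, parity_chunks)."""
    if scheme[0] == "replica":
        return usable_tb * scheme[1]
    if scheme[0] == "ec":
        k, m = scheme[1], scheme[2]
        return usable_tb * (k + m) / k
    raise ValueError("unknown scheme")

usable = 1000  # 1 PB of usable data, expressed in TB
print(raw_capacity_needed(usable, ("replica", 3)))  # 3x replication: 3000 TB raw
print(raw_capacity_needed(usable, ("ec", 8, 3)))    # 8+3 erasure coding: 1375 TB raw
```

Triple replication needs 200% extra raw capacity; an 8+3 erasure-coded pool needs only 37.5% extra, at the cost of more compute and network work on writes and rebuilds, which is one reason the object-storage cost model "looks a lot like the cloud."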
>> Or recovery, to your earlier point. >> But even with all of that, the network is still critical. Without sounding like an advertisement for Cisco, we're really the only server provider that thought about the network as we're building the servers and as we're designing the entire ecosystem. Nobody else can do that. Nobody has that expertise. And a number of hardware features that we have in the products give us that advantage, like the Cisco virtual interface card. >> That's a true point, you mentioned your heritage, so of course that's where you started. So what advantage does that give you? One of the things we talk about on theCUBE a lot is, Flash changed everything. We used to just use spinning disks to persist, and we certainly didn't do it for performance. We did unnatural acts to try to get performance going. So, in many respects, Flash exposed some of the challenges with network performance. So how has that affected the market, technology, and Cisco's business? >> We're in this period of shift on Flash. Because if you think about it, at the end of the day, the Flash is still sitting on a PCI bus; it's probably iSCSI with a SATA interface. >> You got the horrible storage stack. >> We moved the bottleneck away from the disk drive itself, now to the bus. Now we're going to solve a lot of that with NVMe, and then it will come to the network. But the network's already ahead of that. We're looking at-- we have 10 gig, 40 gig, we're going to see 100 gig ethernet. So we're in pretty good shape in order to survive and really flourish as the storage improves the performance. We know with compute, the bottlenecks just move. You know, I think this morning you said Whack-a-Mole. >> Thinking about the next progression in the Whack-a-Mole, what is the next bottleneck? Is it the latency to the cloud, is it-- I mean if it's not the network, because it sounds like you're prepared for NVMe. Is it getting outside the data center? Is that the next bottleneck?
>> I think that's always going to be the bottleneck. I use analogies like roads. We think about a roadway; inside my network it's sort of the superhighway, but then once I go off, I'm on a connector road. And gigabit ethernet, multi-gigabit, some folks will have fiber in the metropolitan area, but at some point they're going to hit that bottleneck. And so it becomes increasingly important to manage the data properly so that you're not moving the data around unnecessarily. >> I wonder if we could talk a little bit about the cloud here. At the Veeam show we're talking about beyond just the data center virtualization. Talking about a multi-cloud world. I had the opportunity to go to Cisco Live Barcelona, interviewed Rowan Trollope; he talked heavily about Cisco's software strategy, living in that multi-cloud world. Maybe help connect the dots for us as to how Cisco and Veeam go beyond the data center and where Cisco lives beyond that. >> So beyond the data center, we really believe the multi-cloud world is where it's going to happen. Whether the cloud is on-prem, off-prem, multiple providers, software, and servers, all of those things, both Cisco and Veeam are committed to giving that consistent performance, availability, security. Veeam, obviously, is an expert at the data management, data availability. Cisco, we're going to provide some application availability and performance through AppDynamics, we have our security portfolio in order to protect the data in the cloud, and then the virtualized networking features that are there to again ensure that the network policy is consistent whether you're on prem, in Cloud A, Cloud B, or the cloud yet to be developed. >> So we'll come back to backup, which is the first of the three that we talked about. What's Cisco's point of view, your point of view, on how that's evolving? Think about Veeam: they started out as a virtualization specialist generally, but specifically for VMware.
Now we've got messaging around the digital economy, multi-cloud, hyperavailability, etc. What does that mean from a customer's standpoint? How is it evolving? >> Well, it's evolving in ways we couldn't have imagined. Everything is connected now, and that data -- that's the value. The data that the customer has is their crown jewels. What Veeam has done really well is, yeah, they started off as a small virtualization player, but as they've seen the market grow and evolve, they've made adaptations to really be able to expand and stay with their customers as their needs have morphed and changed. And in many ways, similar to Cisco. We didn't start in the server space; we saw an opportunity to do something that nobody else was doing, to make sure the network was robust and well-built and the system was well managed, and that's when we entered the space. So I think it's two companies that understand consistency is critical and availability is critical. And we both evolved with our customers as the markets and demands of the business have changed. >> Last question: What are some of the biggest challenges you're working on with customers that get you excited, that make you say, "Alright, I'm really going to attack this one"? Give me some color on that. >> I think the biggest challenge we're seeing today is that a lot of customers' infrastructure, because of budgets, hasn't been able to evolve fast enough, and they have legacy platforms and legacy software on those platforms, in terms of availability, that they've got to migrate away from. So helping them determine which platform is going to be best, which platform is going to let them scale the way they need, and then which software package is going to give them all the tools and features that they need. That's exciting because you're making sure that that company is going to be around tomorrow. >> Well that's a great point.
And we've been talking all day, Stu, about some of the research that we've done at Wikibon. The day before, we quantified that a Fortune 1000 company leaves between one and a half and two billion dollars on the table over a three to four year period because of poorly architected or non-modern infrastructure and poorly architected availability and backup and recovery procedures. It's a hard problem because you can't just snap your fingers and modernize, and the CFO's going, "How are we going to pay for this?" We've got this risk, this threat; we're sort of losing soft dollars, but at the end of the day, they actually do affect the bottom line. Do you agree that-- I said last question, I lied. Do you agree that CXOs are becoming aware of this problem and ideally will start to fund it? >> Absolutely, because we talked earlier about the days of planned downtime being gone. Let a CXO have a minute of downtime and look at the amount of lost revenue that he sees, and suddenly you've got his/her attention. >> Great point. Alan, we've got to run. Thanks very much for coming to theCUBE. >> My pleasure. Great to meet you both. >> Thanks for watching everybody. This is theCUBE live from VeeamON 2018 in Chicago. We'll be right back.
Scott Gnau, Hortonworks | Dataworks Summit EU 2018
(upbeat music) >> Announcer: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Hi, welcome to theCUBE. We're separating the signal from the noise and tuning into the trends in data and analytics, here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year I believe it was in Munich; now it's in Berlin. It's a great show. The host is Hortonworks, and our first interviewee today is Scott Gnau, who is the chief technology officer of Hortonworks. Of course Hortonworks established themselves about seven years ago as one of the up-and-coming startups commercializing a then brand-new technology called Hadoop and MapReduce. They've moved well beyond that in terms of their go-to-market strategy, their product portfolio, their partnerships. So Scott, this morning, it's great to have ya'. How are you doing? >> Glad to be back and good to see you. It's been awhile. >> You know, yes, I mean, you're an industry veteran. We've both been around the block a few times, but I remember you years ago. You were at Teradata and I was at another analyst firm. And now you're with Hortonworks. And Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your financials look pretty good, your latest. You're growing, your deal sizes are growing. Your customer base is continuing to deepen. So you guys are on a roll. So we're here in Europe, we're here in Berlin in particular. It's five weeks--you did the keynote this morning--it's five weeks until GDPR. The sword of Damocles, the GDPR sword of Damocles. It's not just affecting European-based companies, but it's affecting North American companies and others who do business in Europe.
So your keynote this morning, your core theme was that, if you're in enterprise, your business strategy is equated with your cloud strategy now, is really equated with your data strategy. And you got to a lot of that. It was a really good discussion. And where GDPR comes into the picture is the fact that protecting data, personal data of your customers is absolutely important, in fact it's imperative and mandatory, and will be in five weeks or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or the right to withdraw consent to have it profiled, and so forth. So enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline time. So, one of the things you discussed this morning, you had an announcement overnight that Hortonworks has released a new solution in technical preview called The Data Steward Studio. And I'm wondering if you can tie that announcement to GDPR? It seems like data stewardship would have a strong value for your customers. >> Yeah, there's definitely a big tie-in. GDPR is certainly creating a milestone, kind of a trigger, for people to really think about their data assets. But it's certainly even larger than that, because when you even think about driving digitization of a business, driving new business models and connecting data and finding new use cases, it's all about finding the data you have, understanding what it is, where it came from, what's the lineage of it, who had access to it, what did they do to it? These are all governance kinds of things, which are also now mandated by laws like GDPR. And so it's all really coming together in the context of the new modern data architecture era that we live in, where a lot of data that we have access to, we didn't create. 
And so it was created outside the firewall by a device, by some application running with some customer, and so capturing and interpreting and governing that data is very different than taking derivative transactions from an ERP system, which are already adjudicated and understood, and governing that kind of a data structure. And so this is a need that's driven from many different perspectives, it's driven from the new architecture, the way IoT devices are connecting and just creating a data bomb, that's one thing. It's driven by business use cases, just saying what are the assets that I have access to, and how can I try to determine patterns between those assets where I didn't even create some of them, so how do I adjudicate that? >> Discovering and cataloging your data-- >> Discovering it, cataloging it, actually even... When I even think about data, just think the files on my laptop, that I created, and I don't remember what half of them are. So creating the metadata, creating that trail of bread crumbs that lets you piece together what's there, what's the relevance of it, and how, then, you might use it for some correlation. And then you get in, obviously, to the regulatory piece that says sure, if I'm a new customer and I ask to be forgotten, the only way that you can guarantee to forget me is to know where all of my data is. >> If you remember that they are your customer in the first place and you know where all that data is, if you're even aware that it exists, that's the first and foremost thing for an enterprise to be able to assess their degree of exposure to GDPR. >> So, right. It's like a whole new use case. It's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project. >> Interviewer: You and IBM have done some major work-- >> We work with IBM and the community on Apache Atlas. 
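The metadata-tagging and lineage ideas Gnau describes (registering assets, keeping a trail of bread crumbs back to the source, and locating every asset tied to a person for GDPR erasure) can be sketched in a few lines of Python. The catalog below is a hypothetical toy model for illustration only, not Apache Atlas or Data Steward Studio:

```python
# Toy metadata catalog: every asset carries tags and a lineage pointer,
# so derived copies can be traced back to their source and all assets
# tied to one data subject can be found for a GDPR erasure request.

class MetadataCatalog:
    def __init__(self):
        self.assets = {}  # asset_id -> {"tags": set, "parent": asset_id | None}

    def register(self, asset_id, tags, parent=None):
        # Derived assets inherit the parent's tags: lineage travels with the data.
        inherited = set(self.assets[parent]["tags"]) if parent else set()
        self.assets[asset_id] = {"tags": inherited | set(tags), "parent": parent}

    def lineage(self, asset_id):
        # Walk the bread-crumb trail back to the original source.
        chain = [asset_id]
        while self.assets[chain[-1]]["parent"] is not None:
            chain.append(self.assets[chain[-1]]["parent"])
        return chain

    def find_by_tag(self, tag):
        # e.g. locate every asset holding a given customer's personal data
        return sorted(a for a, m in self.assets.items() if tag in m["tags"])

catalog = MetadataCatalog()
catalog.register("crm.customers", tags={"pii", "customer:42"})
catalog.register("cloud.copy_of_customers", tags=set(), parent="crm.customers")

print(catalog.lineage("cloud.copy_of_customers"))  # ['cloud.copy_of_customers', 'crm.customers']
print(catalog.find_by_tag("customer:42"))          # both assets, despite the copy
```

The point of the sketch is the second lookup: because the copy inherited its parent's tags, a right-to-be-forgotten request finds the cloud copy as well as the system of record.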
You know, metadata tagging is not the most interesting topic for some people, but in the context that I just described, it's kind of important. And so I think one of the areas where we can really add value for the industry is leveraging our lowest common denominator, open source, open community kind of development to really create a standard infrastructure, a standard open infrastructure for metadata tagging, into which all of these use cases can now plug. Whether it's I want to discover data and create metadata about the data based on patterns that I see in the data, or I've inherited data and I want to ensure that the metadata stay with that data through its life cycle, so that I can guarantee the lineage of the data, and be compliant with GDPR-- >> And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing. >> And that was part of this morning's keynote close also. It all really flowed nicely together. Anyway, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest common denominator, standard metadata tagging, Apache Atlas, and uplevel it, and not have it be part of a cluster, but actually have it be a cloud service that can be in force across multiple data stores, whether they're in the cloud or whether they're on prem. >> Interviewer: That's the Data Steward Studio? >> Well, Data Plane and Data Steward Studio really enable those things to come together. >> So the Data Steward Studio is the second service >> Like an app. >> under the Hortonworks DataPlane service. >> Yeah, so the whole idea is to be able to tie those things together, and when you think about it in today's hybrid world, and this is where I really started, where your data strategy is your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. So I've copied a bunch of data out to the cloud. 
All memory of any lineage is gone. Or I've got to go set up manually another set of lineage that may not be the same as the lineage it came with. And so being able to provide that common service across footprint, whether it's multiple data centers, whether it's multiple clouds, or both, is a really huge value, because now you can sit back and through that single pane, see all of your data assets and understand how they interact. That obviously has the ability then to provide value like with Data Steward Studio, to discover assets, maybe to discover assets and discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance, 'cause it's over here already. Or to be compliant and say yeah, I've got these assets here, here, and here, I am now compelled to do whatever: delete, protect, encrypt. I can now go do that and keep a record through the metadata that I did it. >> Yes, in fact that is very much at the heart of compliance, you got to know what assets there are out there. And so it seems to me that Hortonworks is increasingly... the H-word rarely comes up these days. >> Scott: Not Hortonworks, you're talking about Hadoop. >> Hadoop rarely comes up these days. When the industry talks about you guys, it's known that's your core, that's your base, that's where HDP and so forth, great product, great distro. In fact, in your partnership with IBM, a year or more ago, I think it was IBM standardized on HDP in lieu of their distro, 'cause it's so well-established, so mature. But going forward, you guys in many ways, Hortonworks, you have positioned yourselves now. Wikibon sees you as being the premier solution provider of big data governance solutions specifically focused on multi-cloud, on structured data, and so forth. So the announcement today of the Data Steward Studio very much builds on that capability you already have there. So going forward, can you give us a sense to your roadmap in terms of building out DataPlane's service? 
'Cause this is the second of these services under the DataPlane umbrella. Give us a sense for how you'll continue to deepen your governance portfolio in DataPlane. >> Really the way to think about it, there are a couple of things that you touched on that I think are really critical, certainly for me, and for us at Hortonworks to continue to repeat, just to make sure the message got there. Number one, Hadoop is definitely at the core of what we've done, and was kind of the secret sauce. Some very different stuff in the technology, also the fact that it's open source and community, all those kinds of things. But that really created a foundation that allowed us to build the whole beginning of big data data management. And we added and expanded to the traditional Hadoop stack by adding Data in Motion. And so what we've done is-- >> Interviewer: NiFi, I believe, you made a major investment. >> Yeah, so we made a large investment in Apache NiFi, as well as Storm and Kafka as kind of a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle, from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then in that, obviously as we discuss whether it be regulation, whether it be, frankly, future functionality, there's an opportunity to uplevel those services from an overall security and governance perspective. And just like Hadoop kind of upended traditional thinking... and what I mean by that was not the economics of it, specifically, but just the fact that you could land data without describing it. That seemed so unimportant at one time, and now it's like the key thing that drives the difference. 
Think about sensors that are sending in data that reconfigure firmware, and those streams change. Being able to acquire data and then assess the data is a big deal. So the same thing applies, then, to how we apply governance. I said this morning, traditional governance was hey, I started this employee, I have access to this file, this file, this file, and nothing else. I don't know what else is out there. I only have access to what my job title describes. And that's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data. Now, that doesn't mean we need to give away PII. We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought inversely as it's been thought about for 30 years. >> It's so great you've worked governance into an increasingly streaming, real-time in motion data environment. Scott, this has been great. It's been great to have you on The Cube. You're an alum of The Cube. I think we've had you at least two or three times over the last few years. >> It feels like 35. Nah, it's pretty fun. >> Yeah, you've been great. So we are here at Dataworks Summit in Berlin. (upbeat music)
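The tokenization approach Gnau describes above, masking PII while preserving referential integrity, can be sketched with a keyed hash: the same raw value always maps to the same token, so joins across datasets still work, while only key-holders can resolve tokens back to the raw value. This is an illustrative assumption, not Hortonworks' actual implementation; the key and datasets below are made up:

```python
# Deterministic tokenization sketch: PII is replaced with a keyed token.
import hmac
import hashlib

SECRET_KEY = b"demo-key"  # in practice this lives in a vault, never in code

def tokenize(pii_value: str) -> str:
    # Same input + same key -> same token, so referential integrity survives.
    return hmac.new(SECRET_KEY, pii_value.encode(), hashlib.sha256).hexdigest()[:16]

orders   = [("alice@example.com", "order-1"), ("bob@example.com", "order-2")]
payments = [("alice@example.com", "pay-9")]

tok_orders   = [(tokenize(email), o) for email, o in orders]
tok_payments = [(tokenize(email), p) for email, p in payments]

# The same customer tokenizes identically in both datasets, so a data
# scientist can still join them without ever seeing the raw email.
assert tok_orders[0][0] == tok_payments[0][0]
assert "alice" not in tok_orders[0][0]
```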
SUMMARY :
Brought to you by Hortonworks. So Scott, this morning, it's great to have ya'. Glad to be back and good to see you. So, one of the things you discussed this morning, of the new modern data architecture era that we live in, forgotten, the only way that you can guarantee and foremost thing for an enterprise to be able And so what we've been trying to do is really leverage so that I can guarantee the lineage of the data, discussing the open metadata framework you're describing And that was part of this morning's keynote close also. those things to come together. of lineage that may not be the same as the lineage And so it seems to me that Hortonworks is increasingly... When the industry talks about you guys, it's known And so what we've done is-- Interviewer: NiFi, I believe, you made So the same thing applies, then, to how we apply governance. It's been great to have you on The Cube. Nah, it's pretty fun.. So we are here at Dataworks Summit in Berlin.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Europe | LOCATION | 0.99+ |
Scott | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Berlin | LOCATION | 0.99+ |
Scott Gnau | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Teradata | ORGANIZATION | 0.99+ |
Last year | DATE | 0.99+ |
May 25th | DATE | 0.99+ |
five weeks | QUANTITY | 0.99+ |
Mandy Chessell | PERSON | 0.99+ |
GDPR | TITLE | 0.99+ |
Munich | LOCATION | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
second service | QUANTITY | 0.99+ |
30 years | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
first | QUANTITY | 0.99+ |
Berlin, Germany | LOCATION | 0.99+ |
second | QUANTITY | 0.99+ |
DataPlane | ORGANIZATION | 0.99+ |
sixth year | QUANTITY | 0.98+ |
three times | QUANTITY | 0.98+ |
first interviewee | QUANTITY | 0.98+ |
Dataworks Summit | EVENT | 0.98+ |
one | QUANTITY | 0.97+ |
this morning | DATE | 0.97+ |
DataWorks Summit 2018 | EVENT | 0.97+ |
MapReduce | ORGANIZATION | 0.96+ |
Hadoop | TITLE | 0.96+ |
Hadoop | ORGANIZATION | 0.96+ |
one time | QUANTITY | 0.96+ |
35 | QUANTITY | 0.96+ |
single pane | QUANTITY | 0.96+ |
NiFi | ORGANIZATION | 0.96+ |
today | DATE | 0.94+ |
DataWorks Summit Europe 2018 | EVENT | 0.93+ |
Data Steward Studio | ORGANIZATION | 0.93+ |
Dataworks Summit EU 2018 | EVENT | 0.92+ |
about seven years ago | DATE | 0.91+ |
a year or | DATE | 0.88+ |
years | DATE | 0.87+ |
Storm | ORGANIZATION | 0.87+ |
Wikibon | ORGANIZATION | 0.86+ |
Apache NiFi | ORGANIZATION | 0.85+ |
The Cube | PERSON | 0.84+ |
North American | OTHER | 0.84+ |
DataWorks | ORGANIZATION | 0.84+ |
Data Plane | ORGANIZATION | 0.76+ |
Data Steward Studio | TITLE | 0.75+ |
Kafka | ORGANIZATION | 0.75+ |
Allan Naim, Google | DevNet Create 2018
>> Announcer: Live from the Computer History Museum in Mountain View, California. It's theCUBE, covering DevNet Create 2018, brought to you by Cisco. >> Hi there and welcome to the special CUBE live broadcast here at the Computer History Museum in Mountain View, California. It's theCUBE's exclusive coverage of Cisco's DevNet Create. This is Cisco's developer ecosystem brand new, second event that they've done, and it's one and a half years in existence. This is Cisco's extension to their DevNet developer program, which is mostly Cisco developers, mostly networking, and theCUBE is here covering the future of cloud native Kubernetes, and the future of application development, as networks become more programmable. I'm John Furrier, your host, with Lauren Cooney, analyst today co-hosting with me, all day coverage. Our guest is Allan Naim, who is the product manager at Google Kubernetes Engine, at Google, right down the street here. Allan, great to have you, thanks for joining us. >> Yeah, thanks for inviting me. >> So, you are the key man with the fireside chat with Susie Wee who is heading up this whole program, doing an amazing job. Google's no stranger. We all know Google at the scale level, massive scale, running infrastructure, building your own stuff, really inventing the category and then fast followers, Facebook among others, large scale. So, you guys invented Kubernetes. So that's a fact. So, tell the story of how it started because there was a moment in Google where Kubernetes, there was a debate. Do we keep it internally, open it up? And you guys have history. You've created MapReduce, you've created the data surge that we're seeing now and changing the game there. Maybe a little bit differently than how Kubernetes is handled. What's the inside story about the creation of Kubernetes and how it's evolved? >> Yeah, so Google has been working with containers for a long, long time.
It's nothing new to Google, and we wanted really to take a lot of the best practices associated with how we manage and run containers internally and share that with the community as a whole. What we found initially was the move to the cloud was very much traditionally a lift and shift and modernized move. And, there's a reason why only, I think the latest statistic I've seen is less than 10% of the applications have actually moved to the cloud. What about the other 90%? So, we wanted to bring some of the magic that Google uses internally and bring that to the world, right, so that you can modernize wherever you're running, right, for those applications that can't just move to the cloud. Why not provide a way to take advantage of some of the innovations that we've created around packaging applications up, deploying applications very seamlessly, and then eventually moving them to the cloud with less friction? And that was really behind the reason we took Kubernetes, which is really a set of best practices around how Google runs and operates containers, and made it available to the open source community. We could've kept it internally, right, and not shared it with the community, but then that really stifles innovation. Google is not about stifling innovation. We're about enabling the community to really drive innovation and build an ecosystem around it. And looking back now, it was a tremendous move. >> Yeah, and you know what, the leadership I remember at that time, and I wanted to get that out there. Thank you for sharing that. Craig McLuckie, Brendan Burns, Joe Beda, those guys and the team around them, it was kind of a small team, held the line on that. And the conversation was, this needs to happen in an open way mainly because you saw, though, how to manage your workloads internally and wanted to bring it to the masses. So, real props to the original team, a really good call, and again, it worked out great. >> Yes. >> So, okay, today. Where are we today? 
Because now you go back at the creation of Kubernetes, you guys open it up, still contributed and nurtured it, and now it becomes part of the bigger part of the open source community. You have now new innovations. What is the update from your standpoint where Kubernetes is today? Okay, it's well known that containers are now standard. Now the business model around containers hasn't materialized. That's okay. The technical architecture is very solid. Kubernetes has become the favorite child in the architecture because of the benefits. What's the update? What's Kubernetes doing today that's compelling? What's the update? >> Yeah, so just as you said, containers are mainstream now. Kubernetes is on fire. We see a world today where Kubernetes is literally running everywhere, right, from Google Cloud to other clouds to partnerships that we have with the likes of Cisco. You now have these clusters that are popping up in heterogeneous environments. So, we've enabled developers now to really build services very efficiently and update those services in a consistent manner regardless of where those services are running. Now, as you build more and more clusters and expose more and more services, the day two experience starts coming in, right? How do I manage this environment? How do I manage my services? How do I find out what these services are actually doing, which services are talking to each other? How do I do more of the networking aspect around traffic management? And this is where I see a lot of the investments happening right now in the open source world with projects like Istio, which are fairly new, but are taking a lot of the goodness that Kubernetes is bringing and applying more of an operations mindset around networking. >> And what problem is that solving? Can you be specific? Because I like this day two experience. I mean, day three will be like, oh my God. How do you manage it beyond that, but, what is the problem that's being solved?
Is it more industrial strength, is it more tolerance? Is it security, or all of the above? What's the main problem? >> It's security, it's when you're running services in heterogeneous environments, there is no consistent security model, right? Istio helps solve some of that. It's service discovery. When services are running, again, in environments where you have different mechanisms for storing services, how do you discover these services? Now, how do you route traffic to the right service? How do you do canary deployments where perhaps I'd like to trickle certain load onto a new version and eventually move all my work into the new version that I've deployed? So, canary testing. Running services in geographic locations and using networking algorithms to route my requests to the closest location. Those are all really hard challenges that you need to solve, and technologies like Istio really make it possible for developers to get those benefits without having to write a single line of code, right? So, you leverage the API to get all these benefits that I just talked about. >> I want to get you in for a minute to talk about that if we can. Talk about Google Cloud right now vis a vis the momentum because a lot's changed with Google just in the past couple of years. A lot of people on board, new hires, industry veterans, leaders. We've heard Lou Tucker from Cisco say at KubeCon that Istio is probably the biggest thing he's seen in years in terms of its implementation capability to impact the value creation of application developers and also in creating efficiencies in networks. How is the Google team right now doing? Give an update, because you guys are now in the center of it and I've called you guys the real competitor to Amazon, because I consider you and Amazon probably the coolest cloud and most relevant clouds vis a vis what clients want to do in a modern era.
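The canary-deployment pattern Naim mentions, trickling a fixed share of traffic onto a new version before a full cutover, can be sketched as a weighted, deterministic router. This is a toy illustration of the idea, not Istio's actual routing logic; the version names and percentage are made up:

```python
# Weighted canary routing sketch: hash each request key into a bucket
# 0-99 and send buckets below the canary percentage to the new version.
import hashlib

def route(request_id: str, canary_percent: int) -> str:
    # The hash is stable, so a given user consistently lands on the same
    # version for the duration of the canary (no flip-flopping mid-session).
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "v2-canary" if bucket < canary_percent else "v1-stable"

hits = [route(f"user-{i}", canary_percent=10) for i in range(1000)]
share = hits.count("v2-canary") / len(hits)
# share comes out at roughly 10%: only that slice of traffic sees v2
```

Dialing `canary_percent` from 10 up to 100 is the "eventually move all my work into the new version" step; a service mesh does the same thing declaratively, without application code changes.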
Not so much retrofitting legacy cloud to make it kind of retrofit, but really doing ground zero cutting edge cloud stuff. What's the update from Google Cloud? What are you guys most proud of? What's the things that you want to highlight that are notable? >> So, Google Cloud's been growing at a tremendous rate. It's just mind-boggling how fast customer adoption has been. What we've seen is, the adoption has spanned all the way from startup to small, medium-sized businesses, extending into the Fortune 100s regardless of industry. And what we hear from customers is they like the clean APIs that Google provides. They like our compute infrastructure from a resiliency standpoint, the transparency that we provide in terms of enabling customers in running their workloads on Google Cloud. We've made a lot of investments in Google Cloud and we continue to make these investments. Now, on the cloud native and container fronts, what we're doing and what we're focusing on is really a differentiated model where we are working with customers to enable them to modernize in place and move to the cloud at their own pace versus having to lift and shift an application to take advantage of modernization and APIs in the cloud. That's really a differentiating story that we're bringing to the table. Along with that, we continue to invest in storage, in optimizing our networking, in setting up more and more points of presence around the world. We added, I believe, over 12 zones last year around the world. So, the growth rate has just been phenomenal. On the Kubernetes side, it's all about value, right? It's all about differentiated value as well. Google has been operating a managed Kubernetes service now for over two years. Building and providing a managed service is hard, right? We have the expertise to do that. We feel that Google Cloud is the best environment on the planet for running containers. 
And through this expertise, we'll continue to invest to bring our services and make it a first class experience to run managed scale containers as well. >> So, would it be safe to say that you guys are focused on differentiating and not trying to be the whole world, everything to everybody, to really kind of narrow the focus? >> Well, there are table stakes that you need to address, especially around storage and networking, and we feel we've gotten there, right? Now, for a customer that's picking a cloud, whether it's Google or any other cloud, we've addressed those table stakes. But on the cloud native side of the house, when building containerized applications, we feel that we have a differentiated offering that really no other cloud on the planet can deliver on. >> That's awesome. Let's talk about, and my last question is much more about developers' relationship to the new architecture. We'll call this the new architecture. >> Yeah. >> You've got Kubernetes which has done some great innovative work, containers continue to be a great resource aspect of architecture, and storage infrastructure becoming more programmable like what Cisco's offering. Great stuff. App developers. I just want to write code. So, you've got some developers. How does a developer, in your opinion, Google's opinion, yours and Google's opinion, how do they determine their relationship to the network or the new architecture? You've got some guys who just want to write apps. So, I don't want to do any kind of speeds and feeds. Some guys want to get down and dirty and wire up some services when you get in the middle layer, and some might want to get down low in the stack. How does a developer kind of peg their orientation to different parts of the cloud architecture? >> So, when you really think about it, Kubernetes is a logical layer that sits on top of infrastructure that makes it possible to take an application that runs a certain way in one location to run consistently in other locations.
So, for application developers that just want to write code, we've got a clean set of APIs that they can take advantage of to spin up cluster resources, deploy their applications. We've been heavily focused as well on not just creating an amazing story for stateless applications, but stateful applications as well. So, being able to orchestrate, you choreograph your application deployment. Now, for developers that want to get their hands dirty, the way we've designed Kubernetes is very much an extensible model. So, the Kubernetes APIs can be extended and functionality can actually be overridden to tailor the experience. A developer may want to plug in a different type of controller, for example, versus the standard Kubernetes controller. So, we enabled that, think of it as a peel-the-onion approach, so that we can meet the developer where they are and give them the tools required for them to actually be productive in their companies or in the community. >> Awesome. Right, and you guys have a deal with Cisco, or relationship with Cisco, or else you're here, at the DevNet Create event, which is about cloud native, not so much about being kind of Cisco or DevNet, the classic developer program. On stage you talked about Istio. Is that the key to the partnership with Cisco? What specifically is your relationship to Cisco? >> Yeah, it's a great question. So, with Cisco, we've been hearing from customers a lot that getting Kubernetes up and running on premise is really hard. We've also been hearing a lot from customers that they want support. So, we got together with Cisco to provide a hybrid offering that tailors customers that want to start their journey to cloud native on prem. So, Cisco basically provides a mechanism, right, for customers to actually run Kubernetes on prem with a single support model for all their needs, which is great for Google because this is something that Cisco-- >> They know a lot about that. >> Absolutely.
Now, for customers that want to start building in the cloud and connecting to the cloud, but you need secure performance networking. How do you do that, right? Well, Cisco is an innovator in networking and security. Google is an innovator in cloud and open source technology and cloud native technology. So, we bring these two things together to give really developers and sys admins a world where they can collaborate and have an API-driven approach to running workloads that span a hybrid estate. >> John: And it's great for you guys too. You open up your market to the enterprise. >> Yeah, I would say that also it really gives an opportunity for network engineers and developers, and I think you talked about clusters ops and Arkino and new types of app ops that you're bringing to the table-- >> Yep, yes. >> And what kind of roles do you see these people playing as you grow that ecosystem? >> Exactly. It's not just about the technology, but it's the culture within the company that oftentimes really drives, it's a hard obstacle to bypass. For customers that I talk to, a lot of times they tell me, look, we've settled. We want to go with Kubernetes, but what about the internal culture? How do we build our teams around Kubernetes? How do we scale our services in such a way where we have specialization of service? And I talked about Narkino, the whole notion of separation of concerns where we introduce this new notion in terms of how Google does things of an application ops team that's typically small in size, but their role starts where the developer role ends, and basically, they're responsible for taking an application from a developer and deploying it out into an environment. Then you have a cluster ops role team that's focused on the underlying infrastructure and maintains all the various cluster APIs, the Kubernetes environment.
So, think of them as shared services that are very much tailored to enabling developers to do what they do best and build great applications and push changes in production very quickly. >> Well, thanks for coming out to theCUBE. I know you've got another hard stop. You've got another panel. Real quick, I'll give you the final word. What's the one thing people should know about Google Cloud that they may not know about or gets buried in the noise out in the marketplace? >> Yeah. Google Cloud is the most innovative cloud out there on the market. We have points of presence in literally every region around the world. Our APIs are some of the cleanest out there of any cloud, as well as the Kubernetes experience running in Google has been something that we've been invested in for over two years and it's actually a highly optimized experience for developers that want to run their containerized application and very differentiated. And 100% upstream compatible with Kubernetes open source. >> That's great stuff. I got to tell you, just Google team, we covered all the cloud players from day one. There's no shortcut. You've got to put the work in, whether it's public sector or getting the building blocks in there. You guys do a great job. Congratulations. >> Thank you. >> Kubernetes is worth noting. theCUBE covering all the action, and the story here is Kubernetes, Google's creation, which is now open standard for all, 100% upstream compatible here at Cisco's DevNet Create event. Back with more live coverage. I'm John Furrier with Lauren Cooney after this short break. (upbeat music)
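The extensible controller model Naim described earlier in the interview, where functionality can be overridden and a custom controller plugged in, rests on one reconcile pattern: a control loop compares desired state (the spec) against observed state and issues whatever actions close the gap. The sketch below is a toy illustration of that shape with made-up function and pod names, not real Kubernetes controller code:

```python
# Toy reconcile loop: given a desired replica count and the pods actually
# running, emit the actions that converge observed state toward the spec.

def reconcile(desired_replicas: int, running: list) -> list:
    actions = []
    if len(running) < desired_replicas:
        # Too few pods: start enough to reach the spec.
        actions += [("start", f"pod-{i}") for i in range(len(running), desired_replicas)]
    elif len(running) > desired_replicas:
        # Too many pods: stop the surplus.
        actions += [("stop", name) for name in running[desired_replicas:]]
    return actions  # an empty list means state has already converged

print(reconcile(3, ["pod-0"]))           # [('start', 'pod-1'), ('start', 'pod-2')]
print(reconcile(1, ["pod-0", "pod-1"]))  # [('stop', 'pod-1')]
print(reconcile(2, ["pod-0", "pod-1"]))  # [] -- nothing to do
```

A real controller runs this loop continuously against the API server; swapping in a custom controller means swapping in a different reconcile function while keeping the same loop.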
Murthy Mathiprakasam, Informatica | Big Data SV 2018
>> Narrator: Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley, brought to you by SiliconAngle Media and its ecosystem partners. >> Welcome back to theCUBE. We are live in San Jose, at Forger Eatery, super cool place. Our first day of our two days of coverage at our event called Big Data SV. Down the street is the Strata Data Conference, and we've got some great guests today that are going to share a lot of insight and different perspectives on Big Data. This is our 10th big data event on theCUBE, our fifth in San Jose. We invite you to come on down to Forger Eatery, and we also invite you to come down this evening. We've got a party going on, and we've got a really cool breakfast presentation on the analyst side in the morning. Our first guest needs no introduction to theCUBE; he's a Cube alum, Murthy Mathiprakasam, did I get that right? >> Murthy: Absolutely. >> Murthy, awesome, as we're going to call him. The director of product marketing for Informatica, welcome back to theCUBE, it's great to have you back. >> Thanks for having me back, and congratulations on the 10 year anniversary. >> Yeah! So, interesting, exciting news from Informatica in the last two days, tell us about a couple of those big announcements that you guys just released. >> Absolutely, yes. So this has been a very exciting year for us, lots of, you know, product innovations and announcements. Just this week alone, actually, there's one announcement that's probably going out right now as we speak, around API management. So one of the things we were probably talking about before we started the interview, you know, around the trend toward cloud: lots of people doing a lot more data integration and application integration in the cloud space. But they face all the challenges that we've always seen in the data management space, around developer productivity, and hand coding, just a lot of complexity that organizations have around maintenance.
So one of the things that Informatica has always brought to every domain that we cover is this ability to kind of abstract the underlying complexity, use a graphical user interface, make things work at the logical level instead of the physical level. So we're bringing that entire kind of paradigm to the API management space. That's going to be very exciting, very game changing on the kind of app-to-app integration side of things. Back on the data world, of course, which is what we're, you know, mainly talking about here today, we're doing a lot there as well. So we announced kind of a next generation of our data management platforms for the big data world, and part of that is also a lot of cloud capabilities, 'cause again, that's one of the bigger trends. >> Peter: Have you made a big bet there? >> Absolutely, and I mean this is the investment, return on investments over like 10 years, right? We started in the kind of cloud game about 10 years ago with our platform as a service offering. So that has been continuously innovated on, and we've been engineering, re-imagining that, to now include more of the big data stuff in it too, because more and more people are building data lakes in the cloud. So it's actually quite surprising, you know, the rate at which the data lake kind of projects are now either migrating or just starting in the cloud environments. So given that being the trend, we were kind of innovating against that as well. So now our platform as a service offering supports the ability to connect to data sources in the cloud natively. You can do processing and ingestion in the cloud. So there's a lot of really cool capabilities; again, it's kind of bringing the Informatica ease of use, and the kind of acceleration that comes with the platform approach, to the cloud environment. And there's a whole bunch of other announcements too, I mean I could spend 20 minutes just on different innovations, but you know, bringing artificial intelligence into the platform, so we can talk more about that.
>> Well I want to connect what you just announced with the whole notion of the data lake, 'cause really Informatica's strength has always been in between. And it turns out that's where a lot of the enterprise problems have been. So the data lake has been there, but it's been big, it's been large, it was big data, and the whole notion was make this as big as you can and we'll figure out what to do with it later. >> Murthy: Right. >> And now you're doing the API, which is just an indication that we're seeing further segmentation and a specificity, a targeting of how we're going to use data, the value that we create out of data and apply it to business problems. But really Informatica's strength has been in between. >> Murthy: Absolutely. >> It's been in knowing where your data is, it's been in helping to build those pipelines and managing those pipelines. How have the investments that you've made over the last few years made it possible for you to actually deliver an API orientation that will actually work for enterprises? >> Yeah, absolutely, and I would actually phrase it as sort of platform orientation, but you're exactly right. So what's happening is, I view this as sort of maturation of a lot of these new technologies. You know, Hadoop was a very, very, as you were saying, kind of experimental technology four or five years ago. And we had customers too who were kind of in that experimental phase. But what's happening now is, big data isn't just a conversation with data engineers and developers. We're talking to CDOs, Chief Data Officers, and VPs of data infrastructure about using Hadoop for enterprise-scale projects. Now, the minute you start having a conversation with a Chief Data Officer, you're not just talking about simple tools for ingestion and stuff like that. You're talking about security, you're talking about compliance, you're talking about GDPR if you're in Europe.
So there's a whole host of sort of data management challenges that are now relevant for the big data world, just because the big data world has become mainstream. And so this is exactly to your point, where the investments that I think Informatica has been making, in bringing our kind of comprehensive, platform-oriented approach to this space, are paying off. Because for a Chief Data Officer, they can't really do big data without those features. They can't not deal with security and compliance, they can't not deal with not knowing what the data is, 'cause they're accountable for knowing what the data is, right? And so, there's a number of things that, by virtue of the maturation of the industry, I think the trends are pointing toward: the enterprises kind of going more toward that platform approach. >> On that platform approach, Informatica's really one of the only vendors that's talking about that, and delivering it. So that clearly is an area of differentiation. Why do you think that's nascent, this platform approach versus a kind of fit-for-purpose approach? >> Yeah, absolutely. And we should be careful with even the phrase fit-for-purpose too, 'cause I think that word gets thrown around a lot; it's one of those buzzwords in the industry. Because it's sort of the positive way of saying incomplete, you know? And so, I think there are vendors who have tried to kind of address, you know, one aspect, sort of one feature of the entire problem that a Chief Data Officer would care about. They might call it fit-for-purpose, but you have to actually solve a problem at the end of the day. The Chief Data Officers are trying to build enterprise data pipelines. You know, you've got raw information from all sorts of data sources, on premise, in the cloud.
You need to push that through a process, like a manufacturing process, of being able to ingest it, repair it, cleanse it, govern it, secure it, master it. All that stuff has to happen in order to serve all the various communities that a Chief Data Officer has to serve. And so you're either doing all that or you're not. You know, that's the problem, that's the way we see the problem. And so the platform approach is a way of addressing the comprehensive set of problems that a Chief Data Officer, or these kinds of data executives, care about, but also doing it in a way that fosters productivity and re-usability. Because the more you sort of build things in a kind of infrastructure-level way, as soon as the infrastructure changes you're hosed, right? So you're seeing a lot of this in the industry now too, where somebody built something in MapReduce three years ago, and as soon as Spark came out, they're throwing all that stuff away. And it's not just, you know, major changes like that; even versions of Spark, or versions of Hadoop, can sometimes trigger a need to recode and throw away stuff. And organizations can't afford this. When you're talking about 40 to 50% growth in data overall, the last thing you want to do is make an investment that you're going to end up throwing away. And so the platform approach, to go back to your question, is the sort of most efficient pathway, from an investment standpoint, that an enterprise can take to build something now that they can actually reuse and maintain and kind of scale in a very, very pragmatic way. >> Well, let me push you on that a little bit. >> Murthy: Yeah. >> 'Cause what we would say is that the fit-to-purpose is okay so long as you're true about the purpose, and you understand what it means to fit. What a lot of the open-source, a lot of companies have done, is they've got a fit-to-purpose, but then they make promises that they say, oh, this is fit-to-purpose, but it's really a platform.
And as a consequence you get a whole bunch of, you know, duck-like solutions, (laughing) that are, you know, are they swimming, or are they flying, kind of problems. So, I think that what we see clients asking for, and this is one of my questions, what we see clients asking for is: I want to invest in technologies that allow me to sustain my investments, including perhaps some of my mistakes, if they are generating business value. >> Murthy: Right. >> So it's not a rip and replace, that's not what you're suggesting. What you're suggesting, I think, is, you know, use what you've got, if it's creating value continue to use it, and then over time, invest in the platform approach that's able to generate additional returns on top of it. Have I got that right? >> Absolutely. So it goes back to flexibility, that's the key word, I think that's kind of on the minds of a lot of Chief Data Officers. I don't want to build something today that I know I'm going to throw away a year from now. >> Peter: I want to create options for the future. >> Create options. >> When I build them today. >> Exactly. So even the cloud you were bringing up earlier, right? Not everybody knows exactly what their cloud strategy is. And it's changing extremely rapidly, right? We were seeing very few big data customers in the cloud maybe even a year or two ago. Now close to almost 50% of our big data business is people deploying off premise; I mean, that's amazing, you know, in a period of just a year or two. So Chief Data Officers are having to operate in these extreme, kind of high-velocity environments. The last thing you want to do is make a bet today with the knowledge that you're going to end up having to throw away that bet in six months or a year.
So the platform approach is sort of like your insurance policy, because it enables you to design for today's requirements, but then very, very quickly migrate or modify for new requirements that may be six months, a year or two down the line. >> On that front, I'd love for you to give us an example of a customer that has, maybe in the last year, since you've seen so much velocity, come to you, but also had other technologies in their environment that, from a cost perspective, I mean, to Peter's point, are still generating value, business value. How do you help customers that have multiple different products, maybe exploring different vendors? How do they come and start working with Informatica and not have to rip out other stuff, but be able to move forward and achieve ROI? >> So, it's really interesting, kind of how people think about the whole rip and replace concept. So we actually had a customer dinner last night, and I'm sitting next to a guy, and I was kind of asking a very similar question. Tell me about your technology landscape, you know, where are things going, where have things gone in the past? And he basically said there's a whole portfolio of technologies that they plan to obsolete, 'cause they just know that, like, they probably, they don't even bother thinking about sustainability, to your point. They just want to use something just to kind of try it out. It's basically like a series of like three-month trials of different technologies. And that's probably why we see such proliferation of different technologies, 'cause people are just kind of trying stuff out, but it's like, I know I'm going to throw this stuff out. >> Yeah but that's, I mean, let me make sure I got that, 'cause I want to reconcile a point. That's if they're in pilot and the pilot doesn't work. But the minute it goes into production and value's being created, they want to be able to sustain that stream of value. >> This is a production environment. I'm glad you asked that question.
So this is a customer that, and I'll tell you where I'm going with the point. So they've been using Informatica for over four years, for big data, which is essentially almost the entire time big data's been around. So the reason this customer's making the point is, Informatica's the only technology that has actually sustained, precisely for the point that you're bringing up, because their requirements have changed wildly during this time. Even the internal politics of who needs access to data, all of that has changed radically over these four years. But the platform has enabled them to actually make those changes, and it's, you know, been able to give them that flexibility. Everything else as far as, you know, developer tools, you know, visualization tools, like every year there's some kind of new thing that sort of comes out. And I don't want to be terribly harsh; there's probably one or two kind of vendors that have also persisted in those other areas. But the point that they were trying to make, to your original point, is the point about sustainability. Like, at some point, to avoid complete and utter chaos, you've got to have like some foundation in the data environment. Something actually has to be something you can invest in today, knowing that as these changes, internally and externally, are happening, you can kind of count on it. And you can go to cloud, you can be on premise, you can have structured data, unstructured data, you know, for any type of data, any type of user, any type of deployment environment, I need something that I can count on that's actually existed for four or more years. And that's where Informatica fits in. And meanwhile there's going to be a lot of other tools that, like this guy was saying, they're going to try out for three months or six months, and that's great, but they're almost using it with the idea that they're going to throw it away.
>> Couple questions here; what are some of the business values that you were citing, like this gentleman that you were talking to last night. What's the industry that he's in, and also, are there any like stats or ranges you can give us, like reduction in TCO, or new business models opening up. What's the business impact that Informatica is helping these customers achieve? >> Yeah, absolutely, I'll use this example; I can't mention the name of the company, but it's an insurance company. >> Lisa: Lots of data. >> Lots of data, right. Not only do they have a lot of data, but there's a lot of sensitivity around the data. Because basically the only way they grow is by identifying patterns in consumers, and they want to look at, if somebody's been using car insurance for so long and they're ready to get married, they need home insurance; they have these like really, really sophisticated models around human behavior. So they know when to go and position new forms of insurance. There's also, obviously, security and governance types of issues that are at play as well. So the sensitivity around data is very, very important. So for them, the business value is increased revenue, and, you know, the ability to meet kind of regulatory pressure. I think that's generally, I mean, every industry has some variant of that. >> Right. >> Cost reduction, increased revenue, you know, meeting regulatory pressures. And so Informatica facilitates that, because instead of having to hire armies of people, and having to change them out maybe every three months or six months 'cause the underlying infrastructure's changing, there's this one team, the Informatica team, that's actually existed for this entire journey. They just keep changing use cases, and projects, and new data sets, new deployment models, but the platform is sort of fixed, and it's something that they can count on; it's robust, it enables that kind of. >> Peter: It's an asset.
>> It's an asset that delivers that sustainable value that you were talking about. >> Last question, we've got about a minute left. In terms of delivering value, Informatica's not the only game in town; your competitors are kind of going with this M&A partnership approach. What makes Informatica stand out, why should companies consider Informatica? >> So they say, what, there's a quote about it: imitation is the most sincere form of flattery. Yeah! (laughing) I guess we should feel a little bit flattered, you know, by what we're seeing in the industry. But why, from a customer's standpoint, should they, you know, continue to rely on Informatica? I mean, we keep pushing the envelope on innovations, right? So one of the other areas that we innovated on is machine learning within the platform, because ultimately, if one of the goals of the platform is to eliminate manual labor, a great way to do that is to just not have people doing it in the first place; have machines doing it. So we can automatically understand the structure of data without any human intervention, right? We can understand if there's a file and it's got customer names and, you know, costs and SKUs, it must be an order. You don't actually have to say that it's an order. We can infer all this because of the machine learning that we have. We can give recommendations to people as they're using our platform; if you're using a data set and you work with another person, we can go to you and say, hey, maybe this is a data set that you would be interested in. So those types of recommendations, predictions, discovery, totally change the economic game for an organization. 'Cause the last thing you want is to have 40 to 50% growth in data translate into 40 to 50% growth in labor. Like, you just can't afford it. It's not sustainable, again, to go back to your original point.
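Murthy's example of inferring that a file with customer names, costs, and SKUs is probably an order can be sketched in a few lines. This is a hypothetical, rule-based illustration of the idea only, not Informatica's actual machine learning; the signature table, function names, and scoring are invented for this sketch.

```python
# Hypothetical sketch of semantic type inference from column names,
# in the spirit of the auto-discovery described above. The signature
# table and the overlap score are illustrative assumptions.

SIGNATURES = {
    "order":    {"customer", "sku", "cost", "quantity"},
    "customer": {"name", "email", "address", "phone"},
    "sensor":   {"timestamp", "device_id", "reading"},
}

def infer_entity(columns):
    """Return (best_label, score): score is the fraction of a
    signature's fields found among the dataset's column names."""
    cols = {c.strip().lower() for c in columns}
    best, best_score = None, 0.0
    for label, sig in SIGNATURES.items():
        score = len(cols & sig) / len(sig)
        if score > best_score:
            best, best_score = label, score
    return best, best_score

label, score = infer_entity(["Customer", "SKU", "Cost", "OrderDate"])
print(label, score)  # order 0.75
```

A real system would also look at value patterns and relationships between datasets, but even this toy version shows how "you don't actually have to say that it's an order" can work.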
The only sustainable approach to managing data for the future is to have a machine learning based approach, and so that's why, to your question, I think just gluing a bunch of stuff together still doesn't actually get to the nut of sustainability. You actually have to have, the glue has to have something in it, you know? And in our case it's the machine learning approach that ties everything together, that brings a data organization together, so they can actually deliver the maximum business value. >> Literally creates a network of data that delivers business value. >> You got it. >> Well Murthy, Murthy Awesome, thank you so much for coming back to theCUBE. >> Thank you! >> And sharing what's going on at Informatica and what's differentiating you guys. We wish you a great rest of the Strata Conference. >> Awesome, you as well. Thank you. >> Absolutely. We want to thank you for watching theCUBE. I'm Lisa Martin with Peter Burris; we are live in San Jose at the Forger Eatery. Come down here and join us, we've got a really cool space, we've got a part-tay tonight, so come join us. And we've got a really interesting breakfast presentation tomorrow morning. Stick around and we'll be right back with our next guest after this short break. (fun upbeat music)
Donna Prlich, Hitachi Vantara | PentahoWorld 2017
>> Announcer: Live from Orlando, Florida, it's The Cube. Covering PentahoWorld 2017. Brought to you by Hitachi Vantara. >> Welcome back to Orlando, everybody. This is PentahoWorld, #pworld17, and this is The Cube, the leader in live tech coverage. My name is Dave Vellante and I'm here with my co-host, Jim Kobielus. Donna Prlich is here; she's the Chief Product Officer of Pentaho and a many-time Cube guest. Great to see you again. >> Thanks for coming on. >> No problem, happy to be here. >> So, I'm thrilled that you guys decided to re-initiate this event. You took a year off, but we were here in 2015 and learned a lot about Pentaho, and especially about your customers and how they're applying this, sort of, end-to-end data pipeline platform that you guys have developed over a decade plus, but it was right after the acquisition by Hitachi. Let's start there, how has that gone? So they brought you in, kind of left you alone for a while, but what's going on, bring us up to date. >> Yeah, so it's funny, because it was 2015, it was PentahoWorld, the second one, and we were like, wow, we're part of this new company, which is great. So for the first year we were really just driving against our core: big-data integration, analytics business, and capturing a lot of that early big-data market. Then, probably in the last six months, with the initiation of Hitachi Vantara, which really is less about Pentaho being merged into a company, and I think Brian covered it in a keynote: we're going to become a brand new entity, which Hitachi Vantara is now, a new company focused around software. So, obviously, they acquired us for all that big-data orchestration and analytics capability, and so now, as part of that bigger organization, we're really at the center of that in terms of moving from edge to outcome, as Brian talked about, and how we focus on data, digital transformation and then achieving the outcome. So that's where we're at right now, which is exciting.
So now we're part of this bigger portfolio of products that we have access to in some ways. >> Jim: And I should point out that Dave called you the CPO of Pentaho, but in fact you're the CPO of Hitachi Vantara, is that correct? >> No, I am not. I am the CPO for the Pentaho product line. So it's a good point, though, because Pentaho, the product brand, stays the same, because obviously we have 1,800 customers, and a whole bunch of them are all around here. So I cover that product line for Hitachi Vantara. >> David: And there's a diverse set of products in the portfolio. >> Yes. >> So I'm actually not sure if it makes sense to have a Chief Products Officer for Hitachi Vantara, right? Maybe for different divisions it makes sense, right? But I've got to ask you, before the acquisition, how much were you guys thinking about IOT and industrial IOT? It must have been on your mind; in about 2015 it certainly was a discussion point, and GE was pushing all this stuff out there with the ads and things like that. But how much was Pentaho thinking about it, and how has that accelerated since the acquisition? >> At that time in my role, I had product marketing, and I think I had just taken product management, and what we were seeing was all of these customers that were starting to leverage machine-generated data, and we were thinking, well, this is IOT. And I remember going to a couple of our friendly analyst folks, and they were like, yeah, that's IOT. So it was interesting; it was right before we were acquired. So, we'd always focused on these blueprints of, we've got to find the repeatable patterns, whether it's Customer 360 in big data, and we said, well, there is some kind of emerging pattern here of people leveraging sensor data to get a 360 of something, whether it's a customer or a ship at sea.
So, we started looking at that and going, we should start going after this opportunity, and, in fact, some of the customers we've had for a long time, like IMS, who spoke today all around the connected cars, they were one of the early ones. And then in the last year we've probably seen more than 100% growth in customers, purely from a Pentaho perspective, leveraging machine-generated data with some other type of data for context to see the outcome. So, we were seeing it then, and then when we were acquired it was kind of like, oh, this is cool, now we're part of this bigger company that's going after IOT. So, absolutely, we were looking at it and starting to see those early use cases. >> Jim: A decade or more ago, Pentaho, at that time, became very much a pioneer in open-source analytics; you incorporated Weka, the open-source code base for machine learning, data mining of sorts, into the core of your platform. Today, here, at the conference you've announced Pentaho 8.0, which from what I can see is an interesting release because it brings stronger integration with the way the open-source analytic stack has evolved; there's some Spark Streaming integration, there's some Kafka, some Hadoop and so forth. Can you give us a sense of what are the main points of 8.0, the differentiators for that release, and how it relates to where Pentaho has been and where you're going as a product group within Hitachi Vantara?
So, with 8.0, it's a perfect spot for us to be in because we look at IOT and the amount of data that's being generated and then need to address streaming data, data that's moving faster. This is a great way for us to pull in a lot of the capabilities needed to go after those types of opportunities and solve those types of challenges. The first one is really all about how can we connect better to streaming data. And as you mentioned, it's Spark Streaming, it's connecting to Kafka streams, it's connecting to the Knox gateway, all things that are about streaming data and then in the scale-up, scale-out kind of, how do we better maximize the processing resources, we announced in 7.1, I think we talked to you guys about it, the Adaptive Execution Layers, the idea that you could choose execution engine you want based on the processing you need. So you can choose the PDI engine, you can choose Spark. Hopefully over time we're going to see other engines emerge. So we made that easier, we added Horton Work Support to that and then this concept of, so that's to scale up, but then when you think about the scale-out, sometimes you want to be able to distribute the processing across your nodes and maybe you run out of capacity in a Pentaho server, you can add nodes now and then you can kind-of get rid of that capacity. So this concept of worker-nodes, and to your point earlier about the Hitachi Portfolio, we use some of the services in the foundry layer that Hitachi's been building as a platform. >> David: As a low balancer, right? >> As part of that, yes. So we could leverage what they had done which if you think about Hitachi, they're really good at storage, and a lot of things Pentaho doesn't have experience in, and infrastructure. So we said, well why are we trying to do this, why don't we see what these guys are doing and we leverage that as part of the Pentaho platform. 
So that's the first time we brought some of their technology into the mix with the Pentaho platform, and I think we're going to see more of that. And then, lastly, around the visual data prep: how can we keep building on that experience to make data prep faster and easier. >> So can I ask you a really Columbo question on those sort of load-balancing capabilities that you just described. >> That's a nice looking trench coat you're wearing. >> (laughter) Gimme a little cigar. So, is that the equivalent of a resource negotiator? Do I think of that as sort of your own YARN? >> Donna: I knew you were going to ask me about that. (laughter) >> Is that unfair to position it that way? >> It's a little bit different, conceptually, right? It's going to help you to better manage resources, but if you think about Mesos and some of the capabilities that are out there that folks are using to do that, that's what we're leveraging. So it's really more about, sometimes I just need more capacity for the Pentaho server, but I don't need it all the time. Not every customer is going to get to the scale where they need that, so it's a really easy way to just keep bringing in as much capacity as you need and have it available. >> David: I see, so really efficient, sort of low-level kind of stuff. >> Yes. >> So, when you talk about distributed load execution, you're pushing more and more of the processing to the edge and, of course, Brian gave a great talk about edge to outcome. You and I were on a panel with Mark Hall and Ella Hilal about the so-called "power of three," and you did a really good blog post on that: the power of IoT, and big data, and the third is either predictive analytics or machine learning. Can you give us a quick sense for our viewers about what you mean by the power of three, how it relates to pushing more workloads to the edge, and where Hitachi Vantara is going in terms of your roadmap in that direction for customers.
>> Well, it's interesting, because one of the things we, maybe we have a recording of it, but let me shrink down that conversation, because it was a great conversation and we covered a lot of ground. Essentially, that power of three is this: we started with big data, so as we could capture more data we could store it, and that gave us the ability to train and tune models much more easily than we could before, because it was always a challenge of, how do I get enough data to make my model more accurate. Then, over time, everybody's become a data scientist with the emergence of R, and it's become a little bit easier for people to take advantage of those kinds of tools, so we saw more of that. And then you think about IoT: IoT is now generating even more data, so, as you said, you're not going to be able to process all of that, bring it all in and store it; it's not really efficient. So that's creating this need: we might need the machine learning there, at the edge. We definitely need it in that data store to keep training and tuning those models. And so, if you think about IMS, they've captured all that data, and they can use the predictive algorithms to do some of the associations between customer information and the sensor data about driving habits, and bring that together. So it's sort of this perfect storm of the amount of data that's coming in from IoT, the availability of the machine learning, and the data is really what's driving all of that. And I think that Mark Hall, on our panel, who's a really well-known data-mining expert, was like, yeah, it all started because we had enough data to be able to do it. >> So I want to ask you, again, a product and maybe philosophy question. We've talked on theCUBE a lot about the cornucopia of tooling that's out there and people who try to roll their own. The big internet companies and the big banks have the resources to do it, but most enterprises need companies like you.
When we talk to your customers, they love the fact that there's an integrated data pipeline and you've made their lives simple. I think in 8.0 I saw Spark, you're probably replacing MapReduce and making life simpler, so you've curated a lot of these tools. But at the same time, you don't own your own cloud, your own database, et cetera. So, what's the philosophy of how you future-proof your platform when you know that there are new projects in Apache and new tooling coming out there? What's the secret sauce behind that? >> Well, the first one is the open-source core, because that just gave us the ability to have APIs, to extend, to build plugins, all of that, in a community that does quite a bit of that. In fact, the Kafka step started with a customer who built it initially; we've now brought that into the product and made it part of the platform. Those are the things that, in an early market, a customer can do first. We can see what emerges around that and then go. We will offer it to our customers as a step, but we can also say, okay, now we're ready to productize this. So that's the first thing. And then I think the second one is really around when you see something like Spark emerge: we were all so focused on MapReduce, and how are we going to make it easier, and let's create tools to do that, and we did that, but then it was like, MapReduce is going to go away. Well, there's still a lot of MapReduce out there, we know that. So we can see that MapReduce is going to be here, and I think the numbers are around 50/50; you probably know better than I do where Spark is versus MapReduce. I might be off, but. >> Jim: If we had George Gilbert, he'd know. >> (laughs) Maybe ask George, yeah, it's about 50/50. So you can't just abandon that, 'cause there's MapReduce out there, so it was, what are we going to do?
Well, what we did in the Hadoop distro days is we created an adaptive big data layer that said, let's abstract a layer so that when we have to support a new distribution of Hadoop, we don't have to go back to the drawing board. So it was the same thing with the execution engines. Okay, let's build this Adaptive Execution Layer so that we're prepared to deal with other types of engines. I can build the transformation once, execute it anywhere. So that kind of philosophy of stepping back: if you have that open platform, you can do those kinds of things. You can create those layers to remove all of that complexity, because if you try to one-off and take on each one of those technologies, whether it's Spark or Flink or whatever's coming, as a product, and a product management organization, and a company, that's really difficult. So the community helps a ton on that, too. >> Donna, when you talk to customers. You gave a great talk on the roadmap today to give a glimpse of where you guys are headed, your basic philosophy, your architecture. What are they pushing you for? Where are they trying to take you, or where are you trying to take them? (laughs) >> (laughs) Hopefully, a little bit of both, right? I think it's being able to take advantage of the kinds of technologies, like you mentioned, that are emerging, when they need them. But they also want us to make sure that all of that is really enterprise-ready, that we're making it solid. Because we know from history and big data, a lot of those technologies are early; somebody has to get their knees skinned with the first one. So they're really counting on us to make it solid and quality, and take care of all of those intricacies of delivering it in a non-open-source way, where you're making it a real commercial product. So I think that's one thing.
Then the second piece that we're seeing a lot more of, as part of Hitachi, as we've moved up into the enterprise, is that we also need to think a lot more about monitoring, administration, security, all of the things that go at the base of a pipeline. So that's an area where they want us to focus. The great thing is, as part of Hitachi Vantara now, those aren't areas that we always had a lot of expertise in, but Hitachi does, 'cause those are kind of infrastructure-type technologies. So I think the push to do that is really strong, and now we'll actually be able to do more of it, because we've got that access to the portfolio. >> I don't know if this is a fair question for you, but I'm going to ask it anyway, because you just talked about some of the things Hitachi brings that you can leverage, and it's obvious that there are a lot of things Pentaho brings to Hitachi, the family. But one of the things that's not talked about a lot is go-to-market. Hitachi Data Systems traditionally doesn't have a lot of expertise at going to market with developers as the first step, which is where, in your world, you start. Has Pentaho been able to bring that cultural aspect to the new entity? >> For us, even though we come from the open-source world, it's less the developer and more an architect or a CIO or somebody who's looking at that. >> David: Early adopter or. >> More and more it's the Chief Data Officer and that type of a persona. I think that, now that we are a new entity, a brand new entity that's a software-oriented company, we're absolutely going to play a way bigger role in that, because we've brought software to market for 13 years. I think we've had early wins, we've had places where we're able to help. In an account, for instance, if you're in the data center, if that's where Hitachi is, you start to get that partnership, and we can start to draw the lines from, okay, who are the people that are now looking at, what's the big data strategy, what's the IoT strategy, where's the CDO.
That's where we've had a much better opportunity to get to bigger sales in the enterprise in those global accounts, so I think we'll see more of that. Also, there's the whole transformation of Hitachi as well, so I think there'll be a need for much more of that software experience. And Hitachi's hired two new executives, one on the sales side from SAP, and one who's now my boss, Brad Surak, from GE Digital, so I think there's a lot of good, strong leadership around the software side and, obviously, all of the expertise that the folks at Pentaho have. >> That's interesting, that Chief Data Officer role is emerging as a target for you. We were at an event on Tuesday in Boston; there were about 200 Chief Data Officers there, and I think about 25% had a robotic process automation initiative going on. They didn't ask about IoT, just this little piece of IoT. And then, Jim, data scientists and that whole world is now your world, okay great. Donna Prlich, thanks very much for coming to theCUBE. Always a pleasure to see you. >> Donna: Yeah, thank you. >> Okay, Dave Vellante for Jim Kobielus. Keep it right there everybody, this is theCUBE. We're live from PentahoWorld 2017, hashtag pworld17. Brought to you by Hitachi Vantara, we'll be right back. (upbeat techno)
Amit Walia, Informatica | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, live here in New York City, it's theCUBE's coverage of Big Data NYC. It's our event we've been doing for five years, in conjunction with Strata Hadoop, now called Strata Data, right around the corner, separate place. Every year we get the best voices in tech. Thought leaders, CEOs, executives, entrepreneurs, anyone who's bringing the signal, we share that with you. I'm John Furrier, the co-host of theCUBE. Eight years covering big data, since 2010, the original Hadoop World. I'm here with Amit Walia, who's the Executive Vice President, Chief Product Officer for Informatica. Welcome back, good to see you. >> Good to be here, John. >> theCUBE alumni, always great to have you on. Love product, we had everyone on from Hortonworks. >> I just saw that. >> Product guys are great, they can share the roadmap and kind of connect the dots. As Chief Product Officer, you have to have a 20-mile stare into the future. You've got to know what the landscape is today and where it's going to be tomorrow. So I've got to ask you, where's it going to be tomorrow? It seems that the rubber's hit the road; real value has to be produced. The hype of AI is out there, which I love, by the way. People can see through that, but they get that it's good. Where's the value today? That's what customers want to know. I've got hybrid cloud on the table, I've got a lot of security concerns. Governance is a huge problem. The European regulations are coming over the top. I don't have time to do IoT and these other things, or do I? I mean, this is a lot of challenges, but how do you see it playing out? >> I think, to be candid, it's the best of times. The changing times are the best of times, because people can experiment. I would say if you step back and take a look, we've been talking for such a long time.
If there was ever a time when, forget the technology jargon of infrastructure, cloud, IoT: data has become the currency for every enterprise, right? Everybody wants data. I say, you know, business users want today's data yesterday to make a decision tomorrow. IT has always been in the business of data; everybody wants more data. But the point you're making is that while data has become more relevant to the enterprise, it brings in a lot of other things: GDPR, governance, security issues. I mean, hybrid clouds, some data on-prem, some data in the cloud. But in essence, what I think every company has realized is that they will live and die by how well they predict the future with the data they have on all their customers, products, whatever it is, and that's the new normal. >> Well, I hate to say it, and I'll pat myself on the back, but we in theCUBE team and Wikibon saw this early. You guys did too, and I want to bring up a comment we talked about a couple of years ago. One, you guys were in the data business, Informatica. You guys went private, but that was an early indicator of the trend that everyone's going private now. And that's a signal. For the first time, private equity financings have trumped the bigger venture capital asset class financings. Which is a signal that the waves are coming. We're surfing these little waves right now, we think they're big, but the big ones are coming. The indicator is everyone's retrenching. Private equity's a sign of undervaluation. They want to actually also transform maybe some of the product engineering side of it, or go-to-market. Basically get the new surfboard. >> Yeah. >> For the big waves. >> I mean, that was the premise for us too, because we saw it as we were chatting, right. We knew the new world was going towards predictive analytics, or AI. See, data is the richest thing for AI to be applied to, but the thing is that it requires some heavy lifting.
In fact, that was our thesis as we went private: look, we can double down on things like cloud and invest truly for the next four years, which, being in the public markets, is sometimes hard. So we stepped back and looked at where we are, as you were asking earlier. We're big believers that, look, there's so much data, so many varying architectures, so many different places. People are in Azure, or AWS, on-prem, and, by the way, still on the mainframe. That hasn't gone away; go back to the large customers. But ultimately, when you talk about the biggest, I would say the new normal, which is AI, which clearly has been overtalked about but in my opinion has been barely touched, the biggest application of machine learning is on data. And that predicts things, whether you want to predict forecasting or predict what's coming down the line, and that's where we believe the world is going to go, and that's what we doubled down on with our CLAIRE technology. Just go deep, bring AI to data across the enterprise. >> We've got to give you guys props, you guys are right on the line. I've got to say, as a product person myself, I see you guys executing a great strategy; you've been very complimentary to your team, and I think you're doing a great job. Let's get back to AI. I think if you look at the hype cycles of things, IoT certainly has hype, and still I think there's a lot more to do there. Cloud was overhyped, remember cloud washing? Back in 2010-11, oh, they're just cloud washing. Well, that hype ended up becoming real. It did turn out. AI is the same thing. And I think it's real, because you can almost connect the dots and be there, but the reality is that it's just getting started. And so we had Rob Thomas from IBM on theCUBE and, you know, we were talking. He made a comment I want to get your reaction to. He said, "You can't have AI without IA." Information architecture.
And you're in the information business, Informatica; you guys have been laying out an architecture, specifically around governance. You guys kind of saw that early too. You can't just do AI; AI needs to be trained on data models. There's a lot of data involved that feeds AI. Who trains the machines that are doing the learning? So, you know, all these things come into play back to data. So what is the preferred information architecture, IA, that can power AI, artificial intelligence? >> I think it's a great question. What we typically recommend, and what we see large companies do, is look at the current complex architectures those companies are in. Hybrid cloud, multicloud, old architectures. By the way, mainframe, client-server, big data, pick your favorite architecture; everything exists in any enterprise, right? Companies are not going to magically move everything to one place, just to start putting data in one place and start running some kind of AI on it. Our belief is that it will get organized around metadata. Metadata is data about data, right? The organizing principle for any enterprise has to be around metadata. Leave your data wherever it is, organize your metadata, which is a much lighter footprint, and then that layer becomes the true central nervous system for your new next-gen information architecture. That's the layer on which you apply machine learning too. So a great example is, look, take GDPR. Large companies have their GDPR obligations. I mean, who's touching my data? Where is my data coming from? Which database has sensitive data? All of these things are such complex problems. You will not move everything magically to one place. You apply a metadata approach to it, and then machine learning starts telling you, gee, I see an anomaly: I'm seeing some data which is not allowed to leave the geographical boundaries of, let's say, Germany, going to, let's say, the UK.
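A policy check like that Germany-to-UK example can be sketched as a scan over lightweight metadata records about data flows, rather than over the data itself. The field names and the policy table below are invented for illustration; this is not CLAIRE's actual model, just the shape of the idea.

```python
# Metadata-driven residency check: inspect records *about* each data flow and
# flag flows that would move restricted data out of its allowed region.

POLICY = {"germany": {"germany"}}  # hypothetical rule: German data stays in Germany

flows = [
    {"dataset": "customers_de", "origin": "germany", "destination": "germany"},
    {"dataset": "customers_de", "origin": "germany", "destination": "uk"},
]

def violations(flows, policy):
    # A flow violates policy when its origin is governed and its destination
    # is not in the origin's allowed set.
    return [
        f["dataset"] + " -> " + f["destination"]
        for f in flows
        if f["origin"] in policy and f["destination"] not in policy[f["origin"]]
    ]

print(violations(flows, POLICY))  # flags the Germany -> UK transfer
```

The footprint is the attraction: the check never touches the data, only a small record per flow, which is what makes it cheap to apply across every architecture an enterprise runs.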
Those are the kinds of things that become a lot easier to solve once you organize yourself at the metadata layer, and that's the layer on which you apply AI. To me, that's the simplest way to describe the organizing principle of what I call the data architecture, or the information architecture, for the next ten years. >> And that metadata, you guys saw that earlier, but how does that relate to these new things coming in? Because, you know, one would argue that the ideal preferred infrastructure would be one that says, hey, no matter what next GDPR thing happens, there'll be another Equifax, there'll be some sort of state-sponsored cyber attack on the US; all these things are happening. I mean hell, all security attacks are going up-- >> Security's a great example of that. We saw it four years ago, you know, and we worked on a metadata-driven approach to security. Look, I've been in the security business myself. Security's a classic example of where it was all at the infrastructure layer: network, database, server. But the problem is that it doesn't matter anymore. Where is your database? In the cloud. Where is your network? I mean, do you run a data center anymore, right? Figuratively, you don't. Ultimately, it's all about the data, and we want more users like you and me to have access to data. So security has to be applied at the data layer. So in that context, I just talked about the whole metadata-driven approach. Once you have the context of your data, you can apply governance to your data, you can apply security to your data, and as you keep adding new architectures, you don't have to create a parallel architecture; you just have to append your metadata. So security, governance, hybrid cloud, all of those things become a lot easier for you, versus creating one new architecture after another, which you can never get to.
>> Well, people will be afraid of malware and these malicious attacks, so auditing now becomes a big thing. If you look at Equifax, I have some data that shows there was other action; they were fleeced for weeks and months before the hack was even noticed. >> All this happens. >> I mean, they were phished ten times over even before it was discovered. They were inside, so an audit trail would be interesting. >> Absolutely. Typically, if you read any external report, and this is nothing tied to Equifax, it takes any enterprise three months minimum to figure out they're under attack. And a sophisticated attacker, right away when they enter your enterprise, goes after the weakest link. You're only as secure as your weakest link in security. And they will go to some data trail that was left behind by some business user who moved on to the next big thing. But data was still flowing through that pipe. Or, by the way, the biggest issue is the insider attack, right? Somebody will hack your or my credentials, and they don't download, like Snowden, a big fat document in one day. They'll go drip by drip by drip by drip. You won't even know it. That again is an anomaly detection thing. >> Well, it's going to get down to the firmware level. I mean, look at the sophisticated hacks in China; they run their own DNS. They have certificates, they hack the iPhones. They make the phones and stuff, so you've got to assume hacking. But now it's about knowing what's going on, and this is really the dynamic nature. So we're on the same page here. I'd love to do a security feature; come into the studio at our office in Palo Alto, I think that's worthy. I just had a great cyber chat with Junaid, the CTO of Vidder; he's awesome, did some work with the government. But this brings up the question around big data. The landscape that we're in is fast and furious right now.
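Stepping back to the "drip by drip" insider pattern Walia describes: it is exactly where per-event thresholds fail and only a running aggregate catches the theft. A minimal sketch, with invented numbers and a single fixed window:

```python
# "Drip by drip" detection: no single download is large enough to alarm anyone,
# but the running total per user over a window crosses the line.
from collections import defaultdict

def drip_alerts(events, limit_mb=100):
    # events: (user, size_mb) pairs, assumed to fall within one detection window
    totals = defaultdict(int)
    alerted = set()
    for user, size_mb in events:
        totals[user] += size_mb
        if totals[user] > limit_mb:
            alerted.add(user)
    return alerted

events = [("alice", 5)] * 25 + [("bob", 40)]  # alice drips out 125 MB in 5 MB pulls
print(drip_alerts(events))  # {'alice'}
```

A real system would use sliding windows and a learned per-user baseline rather than a fixed limit, but the aggregation-before-thresholding step is the core of the idea.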
You have big data being impacted by cloud, because you now have unlimited compute, low-latency storage, an unlimited power source in that engine. Then you've got the security paradigm. You could argue that that's going to slow things down maybe a little bit, but it's also going to change the face of big data. What is your reaction to the impact of security and cloud on big data? Because even though AI is the big talk of the show, what's really happening here at Strata Data is it's no longer a data show, it's a cloud and security show, in my opinion. >> I mean, cloud to me is everywhere. When Hadoop started it was on-prem, but it's pretty much in the cloud now; look at AWS and Azure, everyone runs natively there, so you're exactly right. To me, what has happened is that companies look at things two ways. If I'm experimenting, then I can look at it in a way where I'm in dev mode. But you're right. As things are getting more operational and into production, then you have to worry about security and governance. So I don't think it's a matter of slowing down; it's the nature of the business, where you can be fast and experiment on one side, but as you go to prod, as you go truly operational, you have to worry about controls, compliance and governance. By the way, in that case-- >> And by the way, you've got to know what's going on, you've got to know the flows. A data lake is a data lake, but you've got the Niagara Falls >> That's right. >> of streaming content. >> Every customer of ours who's gone to production always wants to understand full governance and lineage in the data flow. Because when I go talk to a regulator, or I go talk to my CEO, you may have a hundred people going at the data lake. I want to know who has access to it, if it's a production data lake, what they are doing, and, by the way, what data is going in. The other one is, I mean, walk around here. How much has changed? The world of big data was the wild wild west.
Look at the amount of consolidation that has happened. I mean, you see it around the big distributions, right? To me, it's going to continue to happen, because it's the nature of any new industry. You've seen it in cyber security, big data, AI, you know: massive investment happens, and then, as customers want to truly go to scale, they say, look, I can only bet on a few that can not only scale, but have the governance and compliance that a large company wants. >> The waves are coming, there's no doubt about it. Okay, so let me get your reaction to end this segment. What's Informatica doing right now? I mean, I've seen a whole lot, 'cause we've covered you guys with the show and we also keep in touch, but I want you to spend a minute to talk about why you guys are better than what's out there on the floor. You have a different approach. Why are customers working with you, and if the folks aren't working with you yet, why should they work with Informatica? >> Our approach in a way has changed but not changed. We believe we operate in what we call enterprise cloud data management. Our thing is, look, we embrace open source. Open source, Spark, Spark Streaming, Kafka, you know, Hive, MapReduce, we support them all. To us, that's not where customers are spending their time. They're spending their time on, once I've got all that stuff, what can I do with it? If I'm truly building a next-gen predictive analytics platform, I need some way to manage batch and streaming together. I want to make sure that it can scale. I want to make sure it has security, it has governance, it has compliance. So customers work with us to make sure that they can run a hybrid architecture, whether it is cloud or on-prem, whether it is traditional or big data or IoT, all in one place, and that it is scalable and has governance and compliance baked into it.
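The "manage batch and streaming together" point can be sketched as one transformation definition shared by a batch runner and a stream runner. This is only an illustration of the idea, with invented record fields; it is not Informatica's API.

```python
# One transformation, two execution modes: a finite batch (a list) and an
# unbounded stream (modeled here as a generator over any iterable source).

def transform(record):
    # The single, shared transformation definition.
    return {"customer": record["customer"].strip().lower(), "amount": record["amount"]}

def run_batch(records):
    return [transform(r) for r in records]

def run_stream(records):
    for r in records:  # in a real system, an unbounded source such as a topic
        yield transform(r)

batch = [{"customer": " Alice ", "amount": 10}]
print(run_batch(batch))
print(next(run_stream(iter(batch))))
```

Writing the logic once and letting the runner decide materialization is the same build-once, run-anywhere instinct behind unified batch/stream engines.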
And then they also look for somebody that can provide things like not only data integration, but quality, cataloging, all of those things. So whether we're working with large or small customers, whether you are in dev or prod, we're ultimately helping with what I call taking you from an experiment stage to a large-scale operational stage, you know, without batting an eyelid. That's the business we are in, and in that case-- >> So you are in the business of operationalizing data for customers who want to add scale. >> Our belief is, we want to help our customers succeed. And customers will only succeed not just by experimenting, but by taking their experiments to production. So we have to think of the entire lifecycle of a customer. We cannot stop and say, great for experiments, sorry, don't go operational with us. >> So we've had a theme here in theCUBE this week, I'm calling it, don't be a tool, and too many tools are out there right now. We call it the tool shed phenomenon. The tool shed phenomenon is, customers are tired of having too many tools, and they bought a hammer a couple years ago that wants to try to be a lawn mower now. So you've got to understand the nature of having great tooling, which you need, which defines the work, but don't confuse a tool with a platform. And this is a huge issue, because a lot of these companies that are falling by the wayside are groping for platforms. >> Customers tell us the same thing, which is why we-- >> But tools have to work in context. >> That's exactly right, so that's why you've heard us talk about it for the last couple of years: the intelligent data platform. Customers want to build the next-gen data management platform, which is the intelligent data platform, and all of our products are like microservices on that platform.
A lot of little things are features or tools along the way, but if I am a large bank, or a large airline, and I want to go at-scale operational, I can't stitch a hundred tools together and expect to run my IT shop from there. >> Yeah >> I can't. I will never be able to do it. >> There's good tools out there that have a nice business model, a lifestyle business or a cashflow business, or even tools that are just highly focused, and that's all they do, and that's great. It's the guys who try to become something that they're not. It's hard, it's just too difficult. >> I think you have to-- >> The tool shed phenomenon is real. >> I think companies have to realize whether they are a feature. I always say, are you a feature or are you a product? You have to realize the difference between the two, and in between sits a tool. (John laughing) >> Well, that quote came, the tool comment came from one of our chief data officers, that's kind of what sparked the conversation. But when people buy a hammer, everything looks like a nail, and you don't want to mow your lawn with a hammer; get a lawn mower, right? Use the right tool for the job. But you have to have a platform; the data has to have a holistic view. >> That's exactly right. The intelligent data platform, that's what we call it. >> What's new with Informatica, what's going on? Give us a quick update, we'll end the segment with a quick update on Informatica. What do you have going on, what events are coming up? >> Well, we just came off a very big release, we call it 10.2, which had big data, hybrid cloud, AI and catalog, and security and governance, all five of them. Big release, just came out, and customers are adopting it. It was obviously all centered around the things we've talked about: again, single platform, cloud, hybrid, big data, streaming, and governance and compliance.
And then right now, we are basically in the middle of, after Informatica World, going on a barrage of tours across multiple cities across the globe so customers can meet us there. Paris is coming up; I was in London a few weeks ago. And then separately, coming up, I will probably see you at Amazon re:Invent. I mean, we are obviously an all-in partner for-- >> Do you have anything in China? >> China is a-- >> Alibaba? >> We're working with them, I'll leave it there. >> We'll be at Alibaba in two weeks for their cloud event. >> Excellent. >> So theCUBE is breaking into China, CUBE China. We need some translators, so if anyone out there wants to help us with our China blog. >> We'll be at Dreamforce. We were obviously, so you'll see us there. We were at Amazon Ignite, obviously very close to-- >> re:Invent will be great. >> Yeah, we will be there, and Amazon obviously is a great partner and, by the way, a great customer of ours. >> Well, congratulations, you guys are doing great, Informatica. Great to see the success. We'll see you at re:Invent, and keep in touch. Amit Walia, the Executive Vice President, EVP, Chief Product Officer, Informatica. They get the platform game, they get the data game, check 'em out. It's theCUBE ending day two coverage. We've got a big event tonight. We're going to be streaming live the research that we're rolling out here at Big Data NYC, our event that we're running in conjunction with Strata Data. They run their event, we run our event. Thanks for watching and stay tuned, stay with us. At five o'clock, live Wikibon coverage of their new research, and then the party at seven, which will not be filmed; that's when we're going to have some cocktails. I'm John Furrier, thanks for watching. Stay tuned. (techno music)
SUMMARY :
Brought to you by SiliconANGLE Media I'm John Furrier, the co-host of theCUBE. theCUBE alumni, always great to have you on. and kind of connect the dots. I say like you know, business users want today's data of the product engineering side of it or go to market. See data is the richest thing for AI to be applied to We got to give you guys props, and that's the layer on which you apply AI. And that metadata, you guys saw that earlier, and we want more users like you and me access to data. I have some data on that show that there was other action, I mean, they were if you read any external report I mean look at the sophisticated hacks in China, it's a nature of the business where you can be fast And by the way you got to know what's going on, I mean you see around the big distribution right? and if the folks aren't working with you yet, That's the business we are in and in that case-- So you are in the business of operationalizing data but taking their experiments to production. and so you got to understand the nature That's exactly, so that's why you heard, I will never be able to do it. It's the guys who try to become something that they're not. I always say are you a feature or are you a product? and you don't want to mow your lawn with a hammer, The intelligent data platform, that's what we call it. What do you got going on, what events are coming up? I will probably see you there at Amazon re:Invent. wants to help us with our China blog. We were obviously, so you'll see us there. is a great partner and by the way a great customer of ours. you guys are doing great, Informatica.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Amit Walia | PERSON | 0.99+ |
London | LOCATION | 0.99+ |
Alibaba | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
China | LOCATION | 0.99+ |
ten times | QUANTITY | 0.99+ |
Informatica | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
Equifax | ORGANIZATION | 0.99+ |
New York City | LOCATION | 0.99+ |
yesterday | DATE | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
tomorrow | DATE | 0.99+ |
five years | QUANTITY | 0.99+ |
hundred people | QUANTITY | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
20 mile | QUANTITY | 0.99+ |
three months | QUANTITY | 0.99+ |
Paris | LOCATION | 0.99+ |
today | DATE | 0.99+ |
five | QUANTITY | 0.99+ |
Wikibon | ORGANIZATION | 0.99+ |
two | QUANTITY | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
iPhones | COMMERCIAL_ITEM | 0.99+ |
theCUBE | ORGANIZATION | 0.99+ |
2010 | DATE | 0.99+ |
one side | QUANTITY | 0.99+ |
UK | LOCATION | 0.99+ |
Palo Alto | LOCATION | 0.98+ |
Germany | LOCATION | 0.98+ |
AWS | ORGANIZATION | 0.98+ |
one | QUANTITY | 0.98+ |
four years ago | DATE | 0.98+ |
one place | QUANTITY | 0.98+ |
Dreamforce | ORGANIZATION | 0.98+ |
two ways | QUANTITY | 0.98+ |
Eight years | QUANTITY | 0.98+ |
Vidder | ORGANIZATION | 0.98+ |
2010-11 | DATE | 0.98+ |
tonight | DATE | 0.97+ |
GDPR | TITLE | 0.97+ |
NYC | LOCATION | 0.97+ |
Junaid | PERSON | 0.97+ |
this week | DATE | 0.97+ |
MapReduce | ORGANIZATION | 0.96+ |
Pexus | ORGANIZATION | 0.95+ |
One | QUANTITY | 0.95+ |
two weeks | QUANTITY | 0.95+ |
five o'clock | DATE | 0.94+ |
first time | QUANTITY | 0.94+ |
big | EVENT | 0.94+ |
single platform | QUANTITY | 0.92+ |
CTO | PERSON | 0.92+ |
Strata Hadoop | ORGANIZATION | 0.91+ |
Claire | ORGANIZATION | 0.9+ |
Strata Data | ORGANIZATION | 0.89+ |
US | LOCATION | 0.88+ |
Christian Rodatus, Datameer | BigData NYC 2017
>> Announcer: Live from Midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome to theCUBE's coverage in New York City for Big Data NYC, the hashtag is BigDataNYC. This is our fifth year doing our own event in conjunction with Strata Hadoop, now called Strata Data, used to be Hadoop World, our eighth year covering the industry; we've been there from the beginning in 2010, the beginning of this revolution. I'm John Furrier, the co-host, with Jim Kobielus, our lead analyst at Wikibon. Our next guest is Christian Rodatus, who is the CEO of Datameer. Datameer, obviously, one of the startups now evolving on, I think, its eighth year or so, roughly seven or eight years old. Great customer base, been successful blocking and tackling, just doing good business. Your shirt says "Show me the data." Welcome to theCUBE, Christian, appreciate it. >> So well established, I barely think of you as a startup anymore. >> It's kind of true. Actually, a couple of months ago, after I took on the job, I met Mike Olson, and Datameer and Cloudera were sort of founded the same year, I believe late 2009, early 2010. He told me there were two open source projects then, with MapReduce and Hadoop, basically, and Datameer was founded to actually enable customers to do something with it, as an entry platform to help get data in, curate the data, and do something with it. And now, if you walk the show floor, it's a completely different landscape. >> We've had you guys on before; the founder, Stefan, has been on. Interesting migration, we've seen you guys grow from a customer base standpoint. You've come on as the CEO to kind of take it to the next level. Give us an update on what's going on at Datameer. Obviously, the shirt says "Show me the data." Show me the money kind of play there, I get that. That's where the money is, the data is where the action is.
Real solutions, not pie in the sky. We're now in our eighth year of this market, so there's not a lot of tolerance for hype, even though there's a lot of AI washing going on. What's going on with you guys? >> I would say, interestingly enough, I met with a prospective customer this morning, and this was a very typical organization. So, this is a customer that's an insurance company, and they're just about to spin up their first Hadoop cluster to actually work on customer management applications. And they are overwhelmed with what the market offers now. There are 27 open source projects, there are dozens and dozens of other different tools that basically try best-of-breed approaches at certain layers of the stack for specific applications, and they don't really know how to stitch this all together. And if I reflect on a customer meeting at a Canadian bank recently that has very successfully deployed applications on the data lake, like fraud management and compliance applications and things like this, they still struggle to basically replicate the same performance and the service-level agreements that they were used to from their old EDW that they still have in production. And so, everybody's now going out there and trying to figure out how to get value out of the data lake for the business users, right? There are a lot of approaches that these companies are trying. There's SQL-on-Hadoop that supposedly doesn't perform properly.
There are other solutions, like OLAP-on-Hadoop, that try to emulate what they've been used to from the EDWs, and we believe these are the wrong approaches, so we want to stay true to the stack and be native to the stack, and offer a platform that really operates end-to-end, from ingesting the data into the data lake, to curation and preparation of the data, and ultimately, building the data pipelines for the business users, and this is certainly something-- >> So this is more of a play for the business users now, not the data scientists and statistical modelers. I thought the data scientists were your core market. Is that not true? >> So, our primary user base at Datameer used to be, until last week, the data engineers in the companies, basically the people that built the data lake, that curated the data and built these data pipelines for the business user community, no matter what tool they were using. >> Jim, I want to get your thoughts on this for Christian's interest. Last year, so these guys can fix your microphone, I think you guys fixed the microphone for us, his earpiece there, but I want to get a question to Chris, and I'll ask it redirected through you. Gartner, another analyst firm. >> Jim: I've heard of 'em. >> Not a big fan personally, but you know. >> Jim: They're still in business? >> The magic quadrant, they use that tool. Anyway, they had a good intro stat. Last year, they predicted that through 2017, 60% of big data projects would fail. So, the question for both you guys is, did that actually happen? I don't think it did; I'm not hearing that 60% have failed, but we are seeing the struggle around analytics and scaling analytics in a way that's like a DevOps mentality. So, thoughts on this 60% of data projects failing. >> I don't know whether it's 60%; there was another statistic that said only 14% of Hadoop deployments are in production, or something-- >> They said 60, six zero. >> Or whatever.
>> Define failure. I mean, you've built a data lake, and maybe you're not using it immediately for any particular application. Does that mean you've failed, or does it simply mean you haven't found the killer application for it yet? I don't know, your thoughts. >> I agree with you, it's probably not a failure to that extent. It's more like, so they dump the data into it, right, they build the infrastructure; now it's about the next step, data lake 2.0, to figure out how do I get value out of the data, how do I go after the right applications, how do I build a platform and tools that basically promote the use of that data throughout the business community in a meaningful way. >> Okay, so what's going on with you guys from a product standpoint? You guys have some announcements. Let's get to some of the latest and greatest. >> Absolutely. I think we were very strong in data curation, data preparation and the entire data governance around it, and we are using, as a user interface, this spreadsheet-like user interface called a workbook. It really looks like Excel, but it's not; it operates at a completely different scale. It's basically an Excel spreadsheet on steroids. Our customers build data pipelines, so this is the data engineers that we discussed before, but we also have a relatively small power user community in our client base that uses that spreadsheet for deep data exploration. Now, we are lifting this to the next level, and we put a visualization layer on top of it that runs natively in the stack, and what you get is basically a visual experience not only in the data curation process but also in deep data exploration, and this is combined with two platform technologies that we use: it's based on highly scalable distributed search in the backend engine of our product, number one. And we have also adopted a columnar data store, Parquet, for our file system now.
In combination, the data exploration capabilities we bring to the market will allow power analysts to really dig deep into the data, so there are literally no limits in terms of the breadth and the depth of the data. It could be billions of rows, it could be thousands of different attributes and columns that you are looking at, and you will get sub-second response times, as we create indices on demand as we run this through the analytic process. >> With these fast queries and visualization, do you also have the ability to do semantic data virtualization roll-ups across multi-cloud or multi-cluster? >> Yeah, absolutely. Also, there's a second trend that we discussed right before we started the live transmission here. Things are also moving into the cloud, so what we are seeing right now is the EDW's not going away, the on-prem data lakes will prevail, right, and now they are thinking about moving certain workload types into the cloud, and we understand ourselves as a platform play that builds a data fabric that really ties all these data assets together, and it enables business. >> On the trends, we weren't on camera, we'll bring it up here: the impact of cloud on the data world. You've seen this movie before; you have extensive experience in this space going back to the origination, your days at Teradata, when it was the classic, old-school data warehouse. And then, great purpose, great growth, massive value creation. Enter the Hadoop kind of disruption. Hadoop evolved from batch to do ranking stuff, and then it was a hammer that turned into a lawnmower, right? Then they started going down the path, and really, it wasn't workable for what people were looking at, but everyone was still trying to be the Teradata of whatever. Fast forward, things have evolved and things are starting to shake out, same picture of data warehouse-like stuff, now you've got cloud. It seems to be changing the nature of what it will become in the future.
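The two platform technologies Rodatus describes above — a columnar store like Parquet plus indices created on demand — can be sketched in a few lines. The toy Python class below is illustrative only (the class name, data, and structure are invented for this sketch, not Datameer's engine): a query touches only the one column it needs, and an inverted index is built lazily the first time that column is queried.

```python
# Toy sketch of a columnar layout with on-demand indexing.
# Illustrative only -- not Datameer's actual backend.
from collections import defaultdict

class ColumnStore:
    """Rows are stored column-by-column, so a query that filters on one
    attribute never reads the thousands of other columns."""
    def __init__(self, columns):
        self.columns = columns            # {name: [values]}
        self.indices = {}                 # built lazily, on demand

    def build_index(self, name):
        """Create an inverted index over one column: value -> row ids."""
        index = defaultdict(list)
        for row_id, value in enumerate(self.columns[name]):
            index[value].append(row_id)
        self.indices[name] = index

    def lookup(self, name, value):
        if name not in self.indices:      # index on demand, per the interview
            self.build_index(name)
        return self.indices[name].get(value, [])

store = ColumnStore({
    "country": ["US", "UK", "US", "DE"],
    "amount":  [100, 250, 75, 300],
})
rows = store.lookup("country", "US")
print(rows)                                        # -> [0, 2]
print([store.columns["amount"][r] for r in rows])  # -> [100, 75]
```

Real columnar engines layer compression, predicate pushdown, and distributed execution on top of this idea; the sketch only shows why selective queries can skip unrelated columns and answer in sub-second time once an index exists.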
What's your perspective on that evolution? What's different about now, and what's the same about now, from the old days? What are the similarities with the old school, and what's different that people are missing? >> I think a lot of it is related to cloud, just in general. It is extremely important for fast adoption throughout the organization to get performance and service-level agreements with our customers. This is where we clearly can help, and we give them a user experience that is meaningful and that resembles what they were used to from the old EDW world, right? That's number one. Number two, and this comes back to the question of why 60% fail, or why it is failing or working: I think there are a lot of really interesting projects out there, and our customers are betting big time on data lake projects, whether on premise or in the cloud. And we work with HSBC, for instance, in the United Kingdom. They've got 32 data lake projects throughout the organization, and I spoke to one of these-- >> Not 32 data lakes, 32 projects that involve tapping into the data lake. >> 32 projects that involve various data lakes. >> Okay. (chuckling) >> And I spoke to one of the chief data officers there, and they said their data center infrastructure will explode just from having kick-started these projects. And they're not in the business of operating all the hardware and things like this, and so, a major bank like them, they made a public announcement recently, you can read about it, and started moving their data assets into the cloud. This is clearly happening at a rapid pace, and it will change the paradigm in terms of elasticity and being able to satisfy peak workload requirements as they come up, when you run a compliance report at quarter end or something like this, so this will certainly help with adoption and creating business value for our customers. >> We talk all the time about real-time, and there are so many examples of how data science has changed the game.
I mean, I was talking about, from a cyber perspective, how data science helped capture Bin Laden, to how I can get increased sales, to better user experience on devices. Having real-time access to data, and putting some quick data science around things, really helps things at the edge. What's your view on real-time? Obviously, that's super important; you've got to kind of get your house in order in terms of base data hygiene and foundational work, building blocks. At the end of the day, real-time seems to be super hot right now. >> Real-time is a relative term, right? So there are certainly applications, like IoT applications, or machine data that you analyze, that require real-time access. I would call it right-time: what's the increment of data load that is required for certain applications? We are certainly not a real-time application yet. We can possibly load data through Kafka and stream data through Kafka, but in general, we are still a batch-oriented platform. We can do--
We enter, and we can basically satisfy all requirements from interesting the data, from blending and integrating the data, preparing the data, building the data pipelines, and analyzing the data. And all this we do in a highly secure and governed environment, so if you stitch it together, as a customer, the customer this morning asked me, "Whom do you compete with?" I keep getting this question all the time, and we really compete with two things. We compete with build-your-own, which customers still opt to do nowadays, while our things are really point and click and highly automated, and we compete with a combination of different products. You need to have at least three to four different products to be able to do what we do, but then you get security breaks, you get lack of data lineage and data governance through the process, and this is the biggest value that we can bring to the table. And secondly now with visual exploration, we offer capability that literally nobody has in the marketplace, where we give power users the capability to explore with blazing fast response times, billion rows of data in a very free-form type of exploration process. >> Are there more power users now than there were when you started as a company? It seemed like tools like Datameer have brought people into the sort of power user camp, just simply by the virtue of having access to your tool. What are your thoughts there? >> Absolutely, it's definitely growing, and you see also different companies exploiting their capability in different ways. You might find insurance or financial services customers that have a very sophisticated capability building in that area, and you might see 1,000 to 2,000 users that do deep data exploration, and other companies are starting out with a couple of dozen and then evolving it as they go. >> Christian, I got to ask you as the new CEO of Datameer, obviously going to the next level, you guys have been successful. 
We were commenting yesterday on theCUBE about, we've been covering this for eight years in depth in terms of CUBE coverage, we've seen the waves come and go of hype, but now there's not a lot of tolerance for hype. You guys are one of the companies, I will say, that stay to your knitting, you didn't overplay your hand. You've certainly rode the hype like everyone else did, but your solution is very specific on value, and so, you didn't overplay your hand, the company didn't really overplay their hand, in my opinion. But now, there's really the hand is value. >> Absolutely. >> As the new CEO, you got to kind of put a little shiny new toy on there, and you know, rub the, keep the car lookin' shiny and everything looking good with cutting edge stuff, the same time scaling up what's been working. The question is what are you doubling down on, and what are you investing in to keep that innovation going? >> There's really three things, and you're very much right, so this has become a mature company. We've grown with our customer base, our enterprise features and capabilities are second to none in the marketplace, this is what our customers achieve, and now, the three investment areas that we are putting together and where we are doubling down is really visual exploration as I outlined before. Number two, hybrid cloud architectures, we don't believe the customers move their entire stack right into the cloud. There's a few that are going to do this and that are looking into these things, but we will, we believe in the idea that they will still have to EDW their on premise data lake and some workload capabilities in the cloud which will be growing, so this is investment area number two. Number three is the entire concept of data curation for machine learning. This is something where we've released a plug-in earlier in the year for TensorFlow where we can basically build data pipelines for machine learning applications. This is still very small. 
We see some interest from customers, but it's growing interest. >> It's a directionally correct kind of vector; you're looking at it and saying, it's a good sign, let's kick the tires on that and play around. >> Absolutely. >> 'Cause machine learning's got to learn, too. You've got to learn from somewhere. >> And quite frankly, deep learning and machine learning tools for the rest of us, there aren't really all that many for the rest of us power users. They're going to have to come along and get really super visual in terms of enabling visual model development and tuning of these models. What are your thoughts there, in terms of going forward, about a visualization layer to make machine learning and deep learning developers more productive? >> That is an area where we will not engage, in a way. We will stick with our platform play, where we focus on building the data pipelines into those tools. >> Jim: Gotcha. >> And the last area where we invest is ecosystem integration, so we think our visual explorer backend, which is built on search and on a Parquet file format, or columnar store, is really a key differentiator in feeding or building data pipelines into the incumbent BI ecosystems and accelerating those as well. We currently have prototypes running where we can basically give the same performance and depth of analytic capability to some of the existing BI tools that are out there. >> What are some of the ecosystem partners you guys have? I know partnering is a big part of what you guys have done. Can you name a few? >> I mean, the biggest one-- >> Everybody, Switzerland. >> No, not really. We are focused on staying true to our stack and how we can provide value to our customers, so we work actively, and very importantly, with Microsoft and Amazon AWS in evolving our cloud strategy. We've started working with various BI vendors that you know about, right, and we definitely also have a play with some of the big SIs, and IBM is a more popular one.
>> So, BI guys, mostly on the tool visualization side. You said you were a pipeline. >> On the tool and visualization side, right. We have very effective integration of our data pipelines into the BI tools; today we support TDE for Tableau, we have a native integration. >> Why compete there, just be a service provider. >> Absolutely, and we have more and better technology coming up to even accelerate those tools as well with our big data stuff. >> You're focused, you're scaling. Final word I'll give to you for the segment: share with the folks that are a Datameer customer, or have not yet become a customer, what's the outlook, what does the new Datameer look like under your leadership? What should they expect? >> Yeah, absolutely. I think they can expect utmost predictability in the way we roll out the vision and how we build our product in the next couple of releases. The next five, six months are critical for us. We have launched Visual Explorer here at the conference, and we're going to launch our native cloud solution, probably middle of November, to the customer base. So, these are the big milestones that will help us in our next fiscal year and provide really great value to our customers, and that's what they can expect: predictability, a very solid product, and all the enterprise-grade features they need and require for what they do. And if you look at it, we are really an enterprise play, and the customer base that we have is very demanding and challenging, and we want to keep up and deliver a capability that is relevant for them and helps them create value from their data lakes. >> Christian Rodatus, technology enthusiast, passionate, now CEO of Datameer. Great to have you on theCUBE, thanks for sharing. >> Thanks so much. >> And we'll be following your progress. Datameer here inside theCUBE, live coverage, hashtag BigDataNYC, our fifth year doing our own event here in conjunction with Strata Data, formerly Strata Hadoop, Hadoop World, eight years covering this space.
I'm John Furrier with Jim Kobielus here inside theCUBE. More after this short break. >> Christian: Thank you. (upbeat electronic music)
SUMMARY :
Brought to by SiliconANGLE Media and its ecosystem sponsors. I'm John Furrier, the co-host, with Jim Kobielus, So well established, I barely think of you create the data and doing something with it. You've come on as the CEO to kind of and the service level agreements that they used Here's more of a play for the business users now, that created the data and built these data pipelines and I ask to redirect through you. So, the question for both you guys is the killer application yet for it? the next step data lake 2.0 to figure out Okay, so what's going on with you guys and columns that you are looking at, and we understand ourselves as a platform play the impact of cloud to the data world. and that resembles what they were used to tapping into the data lake. and being able to satisfy peak workload requirements and you put in some quick data science around things, or machine data that you analyze Which, by the way, is not going away any time soon. more streaming types of capability as we move this forward. What do the customer architectures look like? and the stack's only going to get more robust. and analyzing the data. just simply by the virtue of having access to your tool. and you see also different companies and so, you didn't overplay your hand, the company and what are you investing in to keep that innovation going? and now, the three investment areas let's kick the tires on that and play around. You got to learn from somewhere. for the rest of us power users, We will stick with our platform play and depth of analytic capability to some of What are some the ecosystem partners do you guys have? and how we can provide value to our customers, on the tool visualization side. into the BI tools today we support TD for Tableau, Absolutely, and we have more and better technology Share with the folks that are a Datameer customer and the customer base that we have is Great to have you on theCUBE, here in conjunction with Strata Data, Christian: Thank you.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jim Kobielus | PERSON | 0.99+ |
Chris | PERSON | 0.99+ |
HSBC | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Jim | PERSON | 0.99+ |
Christian Rodatus | PERSON | 0.99+ |
Stefan | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
60% | QUANTITY | 0.99+ |
2017 | DATE | 0.99+ |
Datameer | ORGANIZATION | 0.99+ |
2010 | DATE | 0.99+ |
32 projects | QUANTITY | 0.99+ |
Last year | DATE | 0.99+ |
United Kingdom | LOCATION | 0.99+ |
1,000 | QUANTITY | 0.99+ |
New York City | LOCATION | 0.99+ |
14% | QUANTITY | 0.99+ |
eight years | QUANTITY | 0.99+ |
fifth year | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
Excel | TITLE | 0.99+ |
eighth year | QUANTITY | 0.99+ |
late 2009 | DATE | 0.99+ |
early 2010 | DATE | 0.99+ |
Mike Olson | PERSON | 0.99+ |
60 | QUANTITY | 0.99+ |
27 open source projects | QUANTITY | 0.99+ |
last week | DATE | 0.99+ |
thousands | QUANTITY | 0.99+ |
two things | QUANTITY | 0.99+ |
Kafka | TITLE | 0.99+ |
seven | QUANTITY | 0.99+ |
second trend | QUANTITY | 0.99+ |
Midtown Manhattan | LOCATION | 0.99+ |
yesterday | DATE | 0.99+ |
Christian | PERSON | 0.99+ |
both | QUANTITY | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.98+ |
two open source projects | QUANTITY | 0.98+ |
Gartner | ORGANIZATION | 0.98+ |
two platform technologies | QUANTITY | 0.98+ |
Wikibon | ORGANIZATION | 0.98+ |
Switzerland | LOCATION | 0.98+ |
billions of rows | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
MapReduce | ORGANIZATION | 0.98+ |
2,000 users | QUANTITY | 0.98+ |
Bin Laden | PERSON | 0.98+ |
NYC | LOCATION | 0.97+ |
Strata Data | ORGANIZATION | 0.97+ |
32 data lakes | QUANTITY | 0.97+ |
six | QUANTITY | 0.97+ |
Hadoop | TITLE | 0.97+ |
secondly | QUANTITY | 0.96+ |
next fiscal year | DATE | 0.96+ |
three things | QUANTITY | 0.96+ |
today | DATE | 0.95+ |
four different products | QUANTITY | 0.95+ |
Teradata | ORGANIZATION | 0.95+ |
Christian | ORGANIZATION | 0.95+ |
this morning | DATE | 0.95+ |
TD | ORGANIZATION | 0.94+ |
EDW | ORGANIZATION | 0.94+ |
BigData | EVENT | 0.92+ |
Doug Merritt, Splunk | Splunk .conf 2017
>> Narrator: Live from Washington D.C., it's theCUBE, covering .conf 2017. Brought to you by Splunk. >> Welcome back to the district, everybody. We are here at .conf 2017. This is theCUBE, the leader in live tech coverage. I'm Dave Vellante with my co-host George Gilbert. Doug Merritt here, the CEO of Splunk. Doug, thanks for stopping by theCUBE. >> Thanks for having me here, Dave. >> You're welcome! Good job this morning. You are a positive guy, great energy. You've got the fun T-shirt: I like big data and I cannot lie. The T-shirts I love, so great. You guys are a fun company. So congratulations. >> Doug: Well, thank you. >> How's it feel? >> It feels great. You're surrounded by 7,000 fans that are getting value out of the products that you distribute to them, and the energy is just off the charts, as you said. It's truly an honor to be surrounded by people that care about your company as much as these people do. >> Well, one of the badges of honor that Splunk has at your shows is spontaneous laughter and spontaneous applause. You get a lot of that. And that underscores the nature of your customer base and the passion that they have for you guys, so that's a pretty good feeling. >> From the very beginning, from the first code that Erik Swan and Rob Das pushed out, the whole focus has been on making sure that you please the user. The tenets that they created to drive Splunk still stand today, and I think a lot of that spontaneous laughter and applause goes back to: if you really pay attention to your customer and you really focus all your energy on making sure they're successful, then life gets a lot easier. >> Well, it's interesting to watch the ascendancy of Splunk, and when you, you know, go back to 2010, 2011, everybody was talking about big data, it was the next big thing, but Splunk never really hopped on that meme from a narrative standpoint. And now you kind of are big data. You kind of need big data platforms to analyze all this data. Talk about that shift.
>> I still don't think that we are the lead flag-waver on big data. And I think so much of that goes back to our belief on how you serve customers. Customers have problems, and you've got to create a solution to solve that problem for them. Increasingly these days, those problems can be solved in a much more effective way with big data. But big data is the after-effect. It's not the lead of the story, it's the substantiation of the story. So what I think Splunk has done uniquely well, whether it's our origins in IT operations and systems administration or our foray into security operations centers and analytics and security analyst support, is we started with: what is the problem that we're trying to solve? And then, because we're so good at dealing with big data, obviously we're going to take an unstructured data, big data approach to that problem. >> So, about two years in. You were telling us off camera that Splunk has a tendency to be a little ADD. You came in, helped with a little prioritization exercise. But what have you learned in two years? >> Ah, an infinite amount. You'd have to have an hour for that. I think part of the ADD is because the platform is so powerful, it can solve almost any problem. And what we need to do to help our customers is listen to them and figure out what the repeat problems are, so that we can actually scale and bring it to lots of different people. And that's been part of that focus problem, or focus opportunity, we have: if you can solve just about anything, how do you help your customers understand what they should do first, second and third? I think that's part of the dilemma we see in the big data space: people started with, I want to just amass all the data. And I think that was a leftover from where those big data platforms started; George and I were talking about this.
If I'm Yahoo, if I'm Google, if I'm LinkedIn, if I'm Facebook, the guys that originated MapReduce and the whole Hadoop ecosystem, my job is data. Literally, that's all I have, that's all I monetize and drive. So I have both the motivation and the technical engineering know-how to just put every bit of data I possibly can somewhere for later retrieval. But even those organizations have a hard time really optimizing that data. So the average, ordinary enterprise has to start in a different spot. It's not just put everything somewhere so that I can later retrieve it; it's what problem are you trying to solve, what data do I need to solve that problem, and then how do I use it, how do I bring it into something and then visualize it so that I get immediate payback and return. I think you guys talked to Michael Ibbitson on the show, he was in my keynote. That's a lot of the magic he brought to Gatwick and to Dubai Airports: let's just start with, can we get people through security in five minutes or less? What data do we need? And then you can move on to the next problem and the next problem. I think the more practical and more effective way of looking at big data is through a customer solution lens. >> Dave: Yeah, great story, Dubai Airports. Go ahead George. >> When you look at the customer adjacencies, are you looking at what is the most relevant next batch of data relative to what I've accumulated for the first problem? Or is it an analytic solution that addresses a similar end customer, similar department? How do you find those adjacencies and attack them? >> So the good news, and the beauty of Splunk, is it's not difficult to get data into the platform. When you do the surveys on data scientists, and I think Richard talked about this in his keynote, they all unanimously come back and say, "We spend 60 to 80% of our time just trying to wrangle data." Well, with Splunk, that part is not super hard for them. How do you get data in quickly?
So we've always been effective at getting massive amounts of data in, because of the way that we architected the system. The challenge for us is, how do you marry domain expertise and the different algorithms, queries, or usage of the data so you get that specific solution to a problem? So we've built up a whole practice of looking at the data sources that are in. What do we know from our customer base that says here are the top use cases that have been able to take advantage of those data sources for these outcomes? And that's how we try to work with customers, to say, "Alright, you've already brought server logs, firewall logs and API streams from these services into Splunk. You've already got this benefit. What are the next two things you can do with that data to get additional benefit?" >> So in a sense, you've got a template for mapping out a customer journey that says, "Here are the next steps." It's like a field guide to move them along in maturity. >> Dave: And you can codify that? >> The hard part has been creating the, for lack of a better word, open source contribution framework: what are all these different uses? But the final mile, or final inch, that most of these customers are trying to drive to is different for every single customer. And that's, again, part of the challenge with AI and ML, and what we were highlighting on stage this morning. There are two, really three, different dimensions you're dealing with simultaneously. One is, what data sets are you bringing together? As you add different data, it radically changes the outcome. What algorithms are you driving? As you tweak an algorithm, even on the same data, it radically changes the outcome. And then, what functional lens are you putting in place?
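The template-style guidance described earlier in this exchange, given the sources a customer has already ingested, suggest the next use cases those sources unlock, can be sketched as a simple lookup. The source and use-case names below are purely illustrative, not Splunk's actual catalog:

```python
# Sketch of a "next best use case" lookup, in the spirit of the
# source-to-outcome mapping described above. Names are made up
# for illustration.

NEXT_USE_CASES = {
    "server_logs": ["capacity planning", "error-rate alerting"],
    "firewall_logs": ["perimeter visibility", "threat detection"],
    "api_streams": ["transaction tracing", "latency dashboards"],
}

def recommend_next(ingested_sources):
    """Return de-duplicated candidate use cases for data already ingested."""
    seen, recs = set(), []
    for source in ingested_sources:
        for use_case in NEXT_USE_CASES.get(source, []):
            if use_case not in seen:
                seen.add(use_case)
                recs.append(use_case)
    return recs

print(recommend_next(["server_logs", "firewall_logs"]))
```

In practice the mapping itself is the hard-won asset: it encodes what the vendor has learned across its customer base about which data sources actually pay off for which outcomes.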
And so if you want to solve baggage handling at the airport, like one of Michael Ibbitson's guys, you need some rich aviation and logistics experience to actually understand that domain. How do you bring that domain expertise together with the actual data and the algorithms so that you get that rapid payback? Creating enough of those so they're easily digestible and easily actionable by our customers, that is the horizon that we're trying to pierce through. >> And that leads to an ecosystem question, does it not? >> Doug: It does. >> Is that the answer, or part of the answer, for that last mile or last inch, that micro-vertical? >> That's a huge chunk of the answer. Because you just go back to: I need that domain expertise. And pharmaceutical drug exploration expertise is different than general healthcare medical expertise. If you're not able to bring that practical experience together with the ability to easily wrangle data, and some data scientists that can write these really interesting and effective ML routines, then it's difficult to get that value. >> So I know you'll jump in here in a second, so what are you guys doing explicitly on that front? Where does that fall in the priority list? Is it percolating? >> So many things made Splunk unique from the very beginning, a whole host of things. But one is that we made it accessible for an average person to get data in, to store data and to extract value. A lot of the technologies that are out there you can cobble together and eventually get to Splunk, but it's really long, painful and difficult. If you take that same orientation around this now over-hyped ML and AI world, it's the same thing: how do you raise the bar so that an average person on an average day, with domain expertise and some understanding of data, can find ways to get value back out? So I think there's certainly a technology problem, because you've got to be able to do it at scale, at speed, with integrity.
But I think it's almost as much, or maybe more, of a user interface and approachability problem, 'cause there just aren't enough data scientists and data experts that are also computer science experts to go around and solve this problem for the world. >> So it sounds like there's two approaches. There's the customer-specific last mile, and then what you were talking about earlier, sort of in the keynote and the (mumbles) breakout, which is try and find the horizontal use cases that you can bake into what Richard called curated experiences, which are really ML models that need minimal, light touch from the customer. >> Doug: Yes. >> So help us understand how those can build out alongside the customer last mile, and then the customer wakes up with a platform. >> We have over 1,500 solutions as part of Splunkbase, which really are those mini curated experiences. For my Palo Alto environment, a combination of Palo Alto, us, and third parties created a Palo Alto solution that is able to read data in from the different Palo Alto technologies and provide dashboards, alerts, and remediations to really assist the Palo Alto team in doing their job more effectively. So there's over 1,500 of those in Splunkbase. What Rick and the teams in the IT operations and app dev arena and the security arena are responsible for is, how do we continue to gin up the ecosystem so we get more and more of those experiences? How can we extend from Palo Alto firewalls to overall network and perimeter visibility? Which is a combination now of bringing in Palo Alto firewall logs, plus the other firewall technologies they likely have, plus network data, plus endpoint data, so we can get visibility. And that almost always is a hyper-heterogeneous environment, especially when you start to drive the applications (mumbles), maybe some in GCP, maybe some in Azure. They all have different formats. They've got different virtualization technologies that represent all those different on-prem renditions.
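Correlating those differently formatted sources usually starts with normalizing each vendor's events into one shared schema before any cross-platform analysis. A minimal sketch; the vendor names and field names here are invented for illustration and are not Splunk's actual data model or any real vendor's log format:

```python
# Illustrative sketch: normalize firewall events from two hypothetical
# vendor formats into a common schema so they can be correlated.
# All field names are made up for illustration.

def normalize(event):
    """Map a vendor-specific event dict to a common {src, dst, action} schema."""
    if event.get("vendor") == "vendor_a":
        return {"src": event["SourceIP"], "dst": event["DestIP"],
                "action": event["Action"].lower()}
    if event.get("vendor") == "vendor_b":
        return {"src": event["s_ip"], "dst": event["d_ip"],
                "action": "allow" if event["permitted"] else "deny"}
    raise ValueError("unknown vendor")

events = [
    {"vendor": "vendor_a", "SourceIP": "10.0.0.1", "DestIP": "10.0.0.9",
     "Action": "ALLOW"},
    {"vendor": "vendor_b", "s_ip": "10.0.0.2", "d_ip": "10.0.0.9",
     "permitted": False},
]
print([normalize(e) for e in events])
```

Once every source speaks the same schema, questions like "show me all denied traffic to this host, across every firewall vendor" become a single query instead of one per format.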
So I think that the world continues to get more complex. And the more that we can help the community, corral the community into here are the buying centers and here are the pain points, and use the technology to finish and deliver that curated experience, the easier it is and the better it is for our customers. >> Doug, I know you're super busy and you've got to go, so last question. We've seen Splunk go from startup, to pre-IPO, to a successful IPO, with a couple of bumps along the way. Now you guys are over a billion dollars. I feel like there's much more to come. The ecosystem is growing, the adoption is really, really solid. The richness of the platform continues to grow. Where do you see it going from here? >> I really do believe in my heart, my deepest heart, that this is the next five, ten, 20 billion dollar organization out there. And it's less the money than the representation of what that means. Reaching millions to tens of millions to hundreds of millions of people with these curated experiences, with these solutions and insights, across hundreds of thousands to potentially millions of different entities out there, organizations, whether it's non-profit, governmental, commercial. Marc Andreessen is famous for saying, "The world is becoming a software world." I agree. I take it one step further. I think the world is becoming a data-driven and a data-insight world. Software is key to that, but you implement software so you can get insights and be intelligent and sense and respond and continue to iterate and grow. And I believe that Splunk is the best-positioned company and technology on the planet right now to lean in and make this practical and approachable for the millions of end users and the hundreds of thousands of organizations that need that capability. >> So much more to talk about with Doug Merritt. Thanks so much for coming, brother. >> Thank you. >> Really a pleasure having you. >> Thank you George. >> Alright, keep it right there everybody, we'll be back with our next guest.
This is #splunkconf17, check that out. Check out #cubegems. This is The Cube. We're live, and we'll be right back from D.C. Bye bye. (electronic pulse music)
Tendü Yogurtçu, Syncsort | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hello everyone, welcome back to theCUBE's special BigData NYC coverage here in Manhattan, in New York City; we're in Hell's Kitchen. I'm John Furrier, with my cohost Jim Kobielus, who's the Wikibon analyst for big data. In conjunction with Strata Data going on right around the corner, this is our annual event where we break down big data, AI, the cloud, all the goodness of what's going on in big data. Our next guest is Tendu Yogurtcu, who's the Chief Technology Officer at Syncsort. Great to see you again, CUBE alumni, been on multiple times. Always great to have you on and get the perspective, a CTO perspective, and the Syncsort update. So good to see you. >> Good seeing you, John and Jim. It's a pleasure being here too. Again, the pulse of big data is in New York, and it's a great week with a lot happening. >> I always borrow the quote from Pat Gelsinger, who's the CEO of VMware. He said on theCUBE, I think in 2011, before he joined VMware as CEO, when he was at EMC: if you're not out in front of that next wave, you're driftwood. And the key to being successful is to ride the waves, and the big waves are coming in now with AI. Certainly big data has been a rising tide with its own bubble, but now the aperture of the scale of data is larger. Syncsort has been riding the wave with us; we've had you guys on multiple times. Syncsort was important to the mainframe in the early days, but now it just keeps on adding more and more capabilities, and you're riding the wave, the big wave, the big data wave. What's the update now with you guys? Where are you guys now in the context of today's emerging data landscape? >> Absolutely.
As organizations progress with their modern data architectures and build their next-generation analytics platforms, leveraging machine learning and leveraging cloud elasticity, we have observed that data quality and data governance have become more critical than ever. For a couple of years we have been seeing this trend: I would like to create a data lake, data as a service, and enable bigger insights from the data. And this year, really every enterprise is trying to get that trusted data set created, because data lakes are turning into data swamps, as Dave Vellante often says, (John laughs) and the collection of these diverse data sets, whether it's mainframe, whether it's messaging queues, whether it's relational data warehouse environments, is challenging for customers. We can take one simple use case, like Customer 360, which we have been talking about for decades now, right? Yet it's still a complex problem. Everybody is trying to get that trusted single view of their customers so that they can serve customer needs in a better way, offer better solutions and products to customers, and get better insights about customer behavior, whether leveraging deep learning, machine learning, et cetera. However, in order to do that, the data has to be in a clean, trusted, valid format, and every business is going global. You have data sets coming from Asia, from Europe, from Latin America, from many different places, in different formats, and it's becoming a challenge. We acquired Trillium Software in December 2016, and our vision was really to bring that world-leading, enterprise-grade data quality into big data environments. So last week we announced our Trillium Quality for Big Data product. This product brings unmatched capabilities of data validation, cleansing, enrichment, and matching, fuzzy matching, to the data lake. We are also leveraging the Intelligent eXecution engine that we developed for our data integration product, DMX-h.
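Fuzzy matching, one of the capabilities just mentioned, is about recognizing that records like "John Furrier" and "J Furrier" likely refer to the same customer. As a rough illustration only, using Python's standard library; real data-quality engines use far richer techniques (phonetic encodings, address standardization, survivorship rules), and this sketch is not Trillium's actual matching logic:

```python
# Minimal sketch of clustering name variants by string similarity.
# Greedy: each name joins the first cluster whose seed it resembles.
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized name strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def cluster_variants(names, threshold=0.7):
    """Group names whose similarity to a cluster's first member exceeds threshold."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

variants = ["John Furrier", "J Furrier", "John Currier", "Jim Kobielus"]
print(cluster_variants(variants))
```

Even this toy version shows why matching is hard at scale: the threshold trades false merges against missed matches, and production systems tune that per field, per country, and per data source.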
So we are enabling organizations to take this data quality offering, whether it's in Hadoop, MapReduce or Apache Spark, whichever compute framework it's going to be in the future. So we are very excited about that. >> Congratulations. You mentioned the data lake becoming a swamp, as Dave Vellante put it. It's interesting, because how does it become a swamp if it's a silo, right? We've seen data silos being the antithesis of governance; that creates challenges, certainly with IoT. Then you've got the complication of geopolitical borders, which you mentioned earlier. So you've still got to integrate the data, and you need data quality, which has been around for a while, but now it's more complex. What specifically about the cleansing and the quality of the data is more important now in the landscape? Are those factors the drivers of the challenges today, and what's the opportunity for customers? How do they figure this out? >> The complexity comes from many different factors. Some of it comes from being global. Every business is trying to have a global presence, and the data is originating from web, from mobile, from many different data sets, and if we just take a simple address, these address formats are different in every single country. With Trillium Quality for Big Data, we support postal data from over 150 countries, and data enrichment with that data. So it becomes really complex, because you have to deal with different types of data from different countries, and the matching also becomes very difficult. Whether it's John Furrier, J Furrier, John Currier, you have to be... >> All my handles on Twitter, knowing what that's about. (Tendu laughs) >> All of the handles you have. Every business is trying to have better targeting in terms of offering products and understanding the one and only John Furrier as a customer. That creates complexity, and in any data management and data processing challenge, the variety of data and the speed at which data is being generated are higher than we have ever observed. >> Hold on, I want to get Jim involved in this conversation, 'cause I want to just make sure those guys can get settled in; adjust your microphone there. Jim, she's bringing up a good point. I want you to weigh in, just to add to the conversation and take it in the direction of where the automation's happening. If you look at what Tendu's saying, that complexity is going to create an opportunity in software. Machine learning, root-level cleanliness can be automated, because Facebook and others have shown that you can apply machine learning techniques to that volume of data. No human can get at all the nuances. How is that impacting the data platforms and some of the tooling out there, in your opinion? >> Yeah well, much of the issue, one of the core issues, is where do you place the data matching and data cleansing logic, or its execution, in this distributed infrastructure? At the source, in the cloud, or at the consumer level, in terms of rolling up the disparate versions of data into a common view? So by acquiring a very strong, well-established, reputable brand in data cleansing, Trillium, as Syncsort has done, you've added a great asset to your portfolio and for your customers. You know, Trillium is well known for offering lots of options in terms of where to configure the logic, where to deploy it within distributed hybrid architectures. Give us a sense, going forward, of the range of options you're going to be providing for customers on where to place the cleansing and matching logic. How are you going to support flexible workflows in terms of curation of the data and so forth? Because the curation cycle for data is critically important, the stewardship. So how do you plan to address all of that going forward in your product portfolio, Tendu? >> Thank you for asking the question, Jim, because that's exactly the challenge that we hear from our customers, especially from larger enterprises and financial services, banking and insurance. So our plan is that our next upcoming release, at the end of the year, is targeting very flexible deployment. Flexible deployment in the sense that, once you understand the data and create the business rules that define what kind of matching and enrichment you'll be performing on the data sets, you can actually have those business rules executed at the source of the data, or in the data lake, or switch between the source and the enterprise data lake that you are creating. That flexibility is what we are targeting; that's one area. On the data curation side, we see these percentages: 80% of data stewards' time is spent on data prep, data curation and data cleansing, and that is really a very high percentage. From our customers we see this still being a challenge. One area where we started investing is using machine learning to understand the data, and using the data discovery capabilities we currently have to make recommendations on what those business rules can be, or what kind of data validation, cleansing and matching might be required. So that's an area that we will be investing in. >> Are you contemplating, in terms of incorporating in your product portfolio, using machine learning to drive, the term I like to use is a recommendation engine, something that presents recommendations to the data stewards, human beings, about different data schemas, different ways of matching the data, the optimal way of reconciling different versions of customer data? So is there going to be a recommendation engine of that sort... >> It's going to be... >> In line with your... >> That's what we currently plan: recommendations, so the users can opt to apply them or not, or to modify them, because sometimes when you go too far with automation you still need some human intervention in making these decisions. You might be operating on a sample of data versus the full data set, and you may actually have to infuse some human understanding and insight as well. So our plan is to make it a recommendation, in the first phase at least. And when we look at the portfolio of the products, and our CEO Josh was actually on theCUBE today as well, as part of Splunk .conf, we have acquisitions happening, we have organic innovation happening, and we really try to stay focused in terms of how we create more value from your data and how we increase business serviceability. Whether it's with our Ironstream product, and we made an announcement this week, Ironstream transaction tracing, to create more visibility into application performance and more visibility for IT operations. For example, when you make a payment with your mobile, you might be having a problem, and you want to be able to trace it back to the back end, which is usually a legacy mainframe environment. Or whether you are populating the data lake and you want to keep the data in sync and fresh with the data source, applying the changes with CDC. Or whether you are taking that data from a raw data set to more consumable data by creating the trusted, high-quality data set. We are very much focused on creating more value and bigger insights out of the data sets. >> And Josh'll be on tomorrow, so folks watching, we're going to get the business perspective. I have some pointed questions I'm going to ask him, but I'll take one of the questions I was going to ask him and get your response from a technical perspective as CTO. As Syncsort continues your journey, you keep on adding more and more things. It's been quite impressive; you guys have done a great job. >> Tendu: Thank you. >> We enjoy covering the success there, watching you guys really evolve. What is the value proposition for Syncsort today, technically? If you go in and talk to a customer, a prospective new customer, why Syncsort? What's the enabling value that you're providing under the hood, technically, for customers? >> We are enabling our customers to access and integrate data sets in a trusted manner. So we are ultimately liberating the data from all of the enterprise data stores, and making that data consumable in a trusted manner. And everything we provide in that data management stack is about making data available, making data accessible, and integrating with the modern data architecture, bridging the gap between those legacy environments and the modern data architecture. And it becomes really a big challenge because this is a cross-platform play. It is not a single environment that enterprises are working with. Hadoop is real now, right? Hadoop is in the center of the data warehouse architecture, whether it's on-premise or in the cloud, and there is also a big trend toward the cloud. >> And certainly batch, they own the batch thing. >> Yeah, and as part of that, it becomes very important to be able to leverage the existing data assets in the enterprise, and that requires an understanding of the legacy data stores, the existing infrastructure, and the existing data warehouse attributes. >> John: And you guys say you provide that. >> We provide that, that's our baby, and we provide it in an enterprise-grade manner. >> Hold on Jim, one second, just let her finish the thought. Okay, so given that, cool, you got that out there. What's the problem that you're solving for customers today? What's the big problem in the enterprise and in the data world today that you address? >> I want to have a single view of my data, and whether that data is originating on mobile, or that data is originating on the mainframe, or in the legacy data warehouse, we provide that single view in a trusted manner. >> When you mentioned Ironstream, that reminded me that one of the core things we're seeing at Wikibon is that IT operations is increasingly being automated through AI, some call it AIOps and whatnot; we're going deeper on the research there. Ironstream brings mainframe and transactional data, like the IT operations use case you brought up, into a data lake alongside machine data that you might source from the internet of things and so forth. It seems to me that that's a great enabler, potentially, for Syncsort, if it wished to position its solutions into IT operations as an enabler, leveraging your machine learning investments to build more automated anomaly detection and remediation into your capabilities. What are your thoughts? Is that where you're going, or do you see AI for IT ops as an opportunity for Syncsort going forward? >> Absolutely. We target use cases around IT operations and application performance. We integrate with Splunk ITSI, and we also make this data available in the big data analytics platforms. So application performance and IT operations are really the main use cases we target, and as part of an advanced analytics platform, for example, we can correlate that data set with other machine data that's originating on other platforms in the enterprise. Nobody's looking at what's happening on the mainframe, or what's happening in my Hadoop cluster, or what's happening in my VMware environment, in isolation, right? They want to correlate the data across platforms, and that's one of the biggest values we bring, whether it's on the machine data or on the application data. >> Yeah, that's quite a differentiator for you. >> Tendu, thanks for coming on theCUBE, great to see you. Congratulations on your success. Thanks for sharing. >> Thank you. >> Okay, CUBE coverage here in BigData NYC, exclusive coverage of our event, BigData NYC, in conjunction with Strata Hadoop right around the corner. This is our annual event for SiliconANGLE, theCUBE and Wikibon. I'm John Furrier, with Jim Kobielus, who's our analyst at Wikibon on big data. Peter Burris has been on theCUBE, and he's here as well. Three big days of wall-to-wall coverage on what's happening in the data world. This is theCUBE, thanks for watching, be right back with more after this short break.
Phu Hoang, DataTorrent Inc. | CUBEConversation
>> Narrator: From Palo Alto, California, it's CUBEConversations with John Furrier. >> Hello, welcome to our special CUBEConversation here in Palo Alto, California. I'm John Furrier, co-founder of SiliconAngle Media and co-host for the CUBE. I'm here with Phu Hoang, who's the co-founder and chief strategy officer of DataTorrent. Great to see you again. Welcome back. >> Thank you so much, John. >> This CUBEConversation. So, you're now the chief strategy officer, which is code words for: you were the CEO and are the co-founder of the company. You bring in a pro, Guy Churchward, who we know very well, former EMC-er, real pro. Gives you a chance to kind of get down and dirty into the organization and get back to your roots and kind of look at the big picture. Great management team. Talk about what your background is, because I think I want to start there, because you have an interesting background. Former Yahoo executive, we've talked before. Take a minute to talk about your background. >> Yeah, sure. You know, I think I'm just one of those super lucky engineers. I got involved with Yahoo way early, in 1996. I think I was the fifth engineer or so. I stayed there for 12 years, ended up running close to 3,000 engineers, and had the chance to really experience the whole growth of the internet. We built out hundreds of sites worldwide; our engineering team developed all of those websites throughout the world. >> You must have a tear in your eye at how Yahoo ended up. We don't want to go there. Folks that don't remember Yahoo during the web 1.0 days, it was the beginning of a revolution. I kind of see the same thing happening, like blockchain and what's going on now. A whole new wild west is happening, but back then you couldn't buy off the shelf. You had to certainly buy servers, but the software, you guys were handling kind of a first generation use case. >> That's right. >> Folks may or may not know, but Yahoo really was the inventor of Hadoop.
Doing Hadoop at large scale, honestly... MapReduce was written up by Google, but the rest of it, you guys were deploying a lot of that stuff. You had to deal with scale and write your own software for big data, before it was called big data. >> That's exactly right. It's interesting, because originally we thought that our job was really the customer-facing websites, and that for all of the data crunching and massaging we would actually be able to use enterprise software. Very quickly we learned, at the pace and scale of the data we were generating, that we really couldn't use that software. We were kind of on our own, so we had to invent approaches to do it. The thing we knew a lot about was commodity servers on racks. So, we ended up saying, "How do I solve this big data processing problem using that hardware?" It didn't happen overnight. It took many years of doing it right, doing it wrong, and fixing it. You start to iterate around how to do distributed processing across many hundreds of servers. >> It's interesting, Yahoo had the same situation. And ultimately Amazon ended up there too, cause they were a pioneer. People dismissed Amazon Web Services. Like, "It's just hosting and bare metal on the cloud." Really what's interesting is that you guys were hardening and operationalizing big data. >> That's right. >> So, I got to ask you the question, cause this is more of a geeky computer science concept, but batch processing has been around since the mainframe, and that's become normal. Databases, et cetera, software. But now, over the past 8 years in particular, as big data and unstructured data have proliferated at massive scale, certainly now with the internet of things you see it booming. This notion of real time data in motion. You have two paradigms out there: batch processing, which is well known, and data in motion, which is essentially real time. Self-driving cars... Evidence is everywhere of where this is going. Real time is not near real time. >> That's right.
>> In nanoseconds, people want results. This is a fundamental data challenge. What are your thoughts on this, and how does this relate to how big data will evolve for customers? >> I think you're exactly right. I think as big data came, and people were able to process data, and understand it better, and derive insights from it, very quickly, for competitive reasons, they found out that they want those insights sooner and sooner. They couldn't get them soon enough. So, you have those opposing trends of more and more data, yet at the same time, faster and faster insight. Where does that go? When you really come down to it, people don't really want to do batch processing. They do batch processing because that was the technology that they had. If they had their way, they wouldn't just... Information is coming into their business. Customers are interacting with their products constantly, 24 by 7. Those events, if you will, that are giving them insights are happening all the time. Except, for a long time, they store it into a file. They wait till midnight, and then they process it overnight. More and more, there are now distributed in-memory capabilities to do that processing as it comes in. That's one of the big motivations for forming DataTorrent. >> I want to get to DataTorrent in a minute, but I want to get to some of these trends, cause I think they're important to kind of put together the two big pieces of the puzzle, if you will. One is, you mentioned batch processing and real time. Companies, historically, have built their infrastructure and their operations, IT, and whatever, around that, around how storage was procured and deployed. Now with IOT and the edge of the network becoming huge, it's a big deal. So, data in motion, it's pretty much well agreed upon amongst most of the smart people, this is a big issue. Let me throw a little wrench in the equation. Cloud computing kind of changes the security paradigm.
There's no perimeter anymore, so there's no door you can secure, no firewall model. Once you get in, you're in. That's where we've seen a lot of ransomware and other cyber attacks. The penetration is everywhere. Now there's APIs and everything. When you bring cloud into it, and you bring in the fact that you've got data in motion, what is the challenge for the customer? How do top architects get their arms around this? What's the solution? What's your vision on that? >> Well, I will start by saying it's a hard problem. I think you're absolutely right. I think we're still in the phase where the problems are very visible, but how to solve them is not. I think we're still, as an industry, figuring that out, cause you're right, the security issue... Security is not one point tool. It's an entire ecosystem and process. The cloud opens up all of those opportunities for fraud and so on. It's still an ongoing challenge. I think the trend of memory becoming cheaper and cheaper, so that things are done more in memory and less in storage, could actually help a bit on that. But overall, security, internal and external processes, it's... >> It's a moving train. >> Yeah, it's moving. Exactly. >> Let me ask you about the other big trend to throw on top of this. This is really kind of where you see a lot of the activity, although some will claim that the app store is not seeing as many apps now as it used to. Certainly in the enterprises, massive growth in application development. So, ready-made apps, with DevOps and cloud having built a whole culture of infrastructure as code, which is essentially saying that I'm going to build apps and make the infrastructure kind of invisible. You're seeing a lot of apps like that, called ready-made apps, whatever you want to call it. Those are the things. How are you guys at DataTorrent handling and supporting that trend? >> We are right smack in the middle of exactly that trend.
One of the theses that we had was that big data's hard. Hadoop is hard. Hadoop is now 12 years old. Lots of people are using Hadoop, trying Hadoop, but it's still not something that is fully operationalized and easy for everybody. I think part of that is big data's hard, distributed processing is hard, how to get all that working. There were two things we were focusing on. One was the real time thing. The other one was, how do we make this stuff a lot easier to use? So, we focus a lot on building tools on top of the open source engine, if you will, to kind of make it easy to use. The other one is exactly that, ready-made apps. As we continue to learn in working with our customers, and start to see the patterns, we're putting bigger functional blocks together, so that it's easier to build these big data applications at this next layer. Machine learning, rule engines, what not. How do you piece that together in a way that is 80 percent done, so that the customer only has a little bit left, the last mile? >> So, you guys want to be the tooling for that? >> Yeah, I think so. I think you have to. This stuff, you have to kind of go through the whole six layers of what it takes to get the final business value out. You're not going to have the skillset to do it all. The more we can abstract and get it to the top, the better. >> Every company's got their own DNA. Intel has Moore's Law. You're the co-founder of DataTorrent. What's the DNA of your company, as the founder? Talk about what you try to instill into your employees and culture, the DNA that you want to be known for. >> Interesting. So, I started out sort of on the technical or product side. Actually, our DNA is all about ops. We think that, especially in big data, there's lots of ways to do prototypes and get some proof of concept going. Getting that to production, to run it 24 by 7, never lose data, that really has been hard.
Our entire existence is around how to truly build 24 by 7, no-data-loss, fast applications. All of our engineers live and breathe how to do that well. >> Ops is consistent with stability. It's interesting, Silicon Valley's going through its own transformation around programmers and the role of entrepreneurship. It's interesting, in the enterprise, they always kind of were like, "Oh, no big deal." Because at the end of the day they need stuff to run at five 9s. These are networks. The old saying that Mark Zuckerberg used to have is, "Move fast and break stuff." They've changed their tune to, "Move fast and be a hundred percent reliable." This is the trend that the enterprises will always put out there. How do companies maintain that ops integrity and still be innovative without a lot of command and control, and compliance restrictions? How do they experiment with this data tsunami that's happening and maintain that integrity? >> My answer to that is, I think, as an industry, we have to build products and tools to allow for that. Some of that is processes inside a company, but I think a lot of that can be productized. The advances in that big data processing layer, and how to recover, get new containers, and do all the right things, allow for the application developer not to have to worry about many of those segments. I think the technology exists out there for tools to be developed to deal with a lot of that. >> I love talking with entrepreneurs, and you're the co-founder of DataTorrent. Talk about the journey you've been on from the beginning. You have a new CEO, which, as the founder, lets you lighten the load up a little bit. It gets bigger, you've got HR issues, things are happening. You're putting culture in place and trying to scale out and get a groove swing. Certainly Uber could've taken a few tips from your playbook in bringing in senior management. You did it at the right time.
Talk about your journey, the company, and what people should know about DataTorrent. >> We're just a bunch of guys that are still trying to make a contribution to the industry. I think we saw an opportunity to really help people move towards big data, move towards real time analytics, and really help them solve some really hairy problems that they have coming up with data. From a skillset standpoint, personally, I think my particular strength was really about that initial vision. Being able to build out a set of capabilities, and maybe get a first set of half a dozen wins, and really prove the point. To make it into a machine that has all the right marketing tools, and business development tools, and so on, it was great to bring in someone like Guy, who has done that many, many times over, and has been super successful at that, to take us to the next level. >> Takes a lot of self awareness, too. You probably had your moments where, should you stay on, be the CEO... But, what are you doing now, cause you get to go down and get into the products. Are you doing a lot more product tinkering? Are you involved in the road map? What's your involvement day-to-day now? >> I love it, cause it's exactly what I enjoy most, which is really interacting with customers and users, and really continuing to hone in on the product-market fit. And continuing to understand, what are the pain points? What are the issues? And, how can we solve them? All coming from, not so much a services mentality, but a product mentality. >> At the cloud ops, too. That's a big area. So, what's the big problem that you solve for the customers? What's the big, hairy problem? >> Really easy: how to productize, how to operationalize this data pipeline that they have, so that they can truly be accepting the real live business data coming in, and getting the insights out. >> Been a lot of talk about automation and AI, lately. Obviously, it's a buzzword.
Wikibon just put out a report called True Private Cloud that shows automation is actually going after and replacing non-differentiated labor, like racking and stacking gear. As the work moves up to higher value, there's actually going to be more employment on that side. Talk about the role of automation in the data world, because if you just think about the amount of data that companies like Facebook and Yahoo take in, you need machine learning. You need automation. What is the key to automation in a lot of these new emerging areas around large data sets? >> It's so funny, yesterday I was driving, and I was listening to a KQED segment, and they were talking about how, in its next phase, AI and machine learning are going to do sort of the first layer of all the reporting. So, you actually have reporters doing much more sophisticated reporting, cause there's an AI layer that has a template of what questions to answer, and it can just spill out all the news for you. >> Paid by cryptocurrency. >> Yeah. I think machine learning and AI will be everywhere. It will continue to learn, and it will continue to get better at doing more and more things for us, so that we get to kind of play at that creative, disruptive layer, while it does all the menial tasks. I think it will touch every part of our civilization. The technology is getting incredible. The algorithms are incredible. The computing power to allow for it is growing exponentially. I think it's super interesting, and the engineers are super interested in it. Everything we do now revolves around it... When we talk about the analytics layer in real time, it's all about machine learning scoring, rules, and all that. >> Great to have you here on the CUBEConversation. Give you the last word. Give a quick plug about DataTorrent. What should your customers know about you guys? Why should they call you? >> We're a company solely focused on bringing big data applications to production.
We focus on making sure that as long as you understand what you want to do with data, we can make it super fast, super reliable, super scalable. All that stuff. >> Co-founder of DataTorrent here and the CUBEConversation here in Palo Alto. I'm John Furrier. Thanks for watching. (synth music)
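(Editor's sketch: the in-stream "machine learning scoring" Phu mentions can be illustrated with a toy example that scores each event the moment it arrives against a running baseline, instead of waiting for a nightly batch. The z-score rule, threshold, warm-up period, and data below are hypothetical stand-ins for a trained model, not DataTorrent's product.)

```python
import math

# Minimal sketch of scoring inside a stream: flag events that deviate
# sharply from the running mean, the moment they arrive. Uses Welford's
# online mean/variance so no history needs to be stored.

def streaming_anomalies(values, threshold=3.0, warmup=5):
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        if count >= warmup:
            std = math.sqrt(m2 / count)
            if std > 0 and abs(x - mean) / std > threshold:
                yield x  # scored as anomalous immediately, not at midnight
        # Welford's update for the running baseline
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)

normal = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 10.1]
print(list(streaming_anomalies(normal + [50.0])))  # → [50.0]
```

The design point this illustrates is the one from the conversation: because the baseline is updated incrementally, the "score" is available per event with constant memory, which is what makes 24-by-7 in-stream analytics feasible at all.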
Distributed Data with Unifi Software
>> Narrator: From the Silicon Angle Media Office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Stu Miniman. >> Hi, I'm Stu Miniman, and we're here at the east coast studio for Silicon Angle Media. Happy to welcome back to the program a many-time guest, Chris Selland, who is now the Vice President of strategic growth with Unifi Software. Great to see you, Chris. >> Thanks so much Stu, great to see you too. >> Alright, so Chris, we've had you on in your previous role many times. >> Chris: Yes >> I think not only is this the first time we've had you on since you made the switch, but it's also the first time we've had somebody from Unifi Software on. So, why don't you give us a little bit of background on Unifi and what brought you to this opportunity. >> Sure, absolutely. Happy to sort of open up the relationship with Unifi Software; I'm sure it's going to be a long and good one. I joined the company about six months ago at this point, so I joined earlier this year. I had actually worked with Unifi for a bit as a partner, when I was previously at the Vertica business inside of HP/HPE, as you know, for a number of years prior to that, where we did a lot of work together. I also knew the founders of Unifi, who were actually at Greenplum, which was a direct Vertica competitor. Greenplum was acquired by EMC, Vertica was acquired by HP; we were sort of friendly, respected competitors. So I have known the founders for a long time. It was partly the people, but it was really the idea, the product. I was actually reading the report, the piece that Peter Burris just did on wikibon.com about distributed data, and it played so into our value proposition. We just see it's where things are going. I think it's where things are going right now, and I think the market's bearing that out. >> The piece you reference, it was actually... it's a Wikibon research meeting, we run those weekly.
We run those internally, and we're actually going to be broadcasting video of them soon, cause, of course, we do a lot of video. We pull the whole team together, and this was one George Gilbert actually led for us, talking about what architectures I need to build when I start doing distributed data. With my background really more in kind of the cloud and infrastructure world, we see it's a hybrid, and many times a multi-cloud, world. And, therefore, one of the things we look at that's critical is: wait, I've got things in multiple places. I've got my SaaS over here, I've got multiple public clouds I'm using, and I've got my data center. How do I get my arms around all the pieces? And of course data is critical to that. >> Right, exactly, and the fact is that more and more people need data to do their jobs these days. Working with data is no longer just the domain of data scientists. Organizations are certainly investing in data scientists, and there's a shortage, but at the same time, marketing people, finance people, operations people, supply chain folks, they need data to do their jobs. And as you said, where is it? It's distributed, it's in legacy systems, it's in the data center, it's in warehouses, it's in SaaS applications, it's in the cloud, it's on premise. It's all over the place, so, yep. >> Chris, I've talked to so many companies that are... everybody seems to be nibbling at a piece of this. We go to the Amazon show and there's this just ginormous ecosystem that everybody's picking at. Can you drill in a little bit on what problems you solve there? I have talked to people about everything from just trying to get the licensing in place, to trying to empower the business unit to do things, to trying to do governance and compliance, of course. So where's Unifi's point in this?
And now of course this has been going on, of course with all the investments in HDFS, Hadoop infrastructure, and open source infrastructure. There's been this fundamental thinking that, well the answer's if I get all of the data in one place then I can analyze it. Well that just doesn't work. >> Right. >> Because it's just not feasible. So I think really and its really when you step back it's one of these like ah-ha that makes total sense, right. What we do is we basically catalog the data in place. So you can use your legacy data that's on the main frame. Let's say I'm a marketing person. I'm trying to do an analysis of selling trends, marketing trends, marketing effectiveness. And I want to use some order data that's on the main frame, I want some click stream data that's sitting in HDFS, I want some customer data in the CRM system, or maybe it's in Sales Force, or Mercado. I need some data out of Workday. I want to use some external data. I want to use, say, weather data to look at seasonal analysis. I want to do neighborhooding. So, how do I do that? You know I may be sitting there with Qlik or Tableau or Looker or one of these modern B.I. products or visualization products, but at the same time where's the data. So our value proposition it starts with we catalog the data and we show where the data is. Okay, you've got these data sources, this is what they are, we describe them. And then there's a whole collaboration element to the platform that lets people as they're using the data say, well yes that's order data, but that's old data. So it's good if you use it up to 2007, but the more current data's over here. Do things like that. And then we also then help the person use it. And again I almost said IT, but it's not real data scientists, it's not just them. It's really about democratizing the use. Because business people don't know how to do inner and outer joins and things like that or what a schema is. 
They just know: I'm trying to do a better job of analyzing sales trends. I've got all these different data sources, but once I've found them, once I've decided what I want to use, how do I use them? So we answer that question too. >> Yeah, Chris, that reminds me a lot of some of the early value propositions we heard when Hadoop and the whole big data wave came. It was: how do I, as a smaller company, or even a bigger company, do it faster, do it for less money, than the way things used to be? Okay, it's going to be millions of dollars and it's going to take me 18 months to roll out. Is it right to say this is kind of an extension of that big data wave, or what's different and what's the same? >> Absolutely, we use a lot of that stuff. We've got flexibility in what we can use, but for most of our customers we use HDFS to store the data. We use Hive as the most typical data format, though you have flexibility there. We use MapReduce, or Spark, to do transformation of the data. So we use all of those open source components, and as the platform is being used, and multiple users are using it (cause it's designed to be an enterprise platform), the data does eventually migrate into the data lake, but we don't require you to get it there as a prerequisite. As I said, this is one of the things we really talk about a lot: we catalog the data where it is, in place, so you don't have to move it to use it, you don't have to move it to see it. But at the same time, if you want to move it, you can. The fundamental idea that I've got to move it all first, that I've got to put it all in one place first, never works. We've come into so many projects where organizations have tried to do that, and they just can't; it's too complex these days. >> Alright, Chris, what are some of the organizational dynamics you're seeing from your customers? You mentioned data scientists, the business users.
Who is identifying it, who's driving these issues, who's got the budget to try to fix some of these challenges? >> Well, our best implementations, really almost all of them these days, are driven by use cases. They're driven by business needs. Some of the big ones... I've sort of talked about customers already, but take customer 360 views. For instance, there's a very large credit union client of ours; they have all of their data organized by accounts, but they can't really look at Stu Miniman as a customer. How do I look at Stu's value to us as a customer? I can look at his mortgage account, I can look at his savings account, I can look at his checking account, I can look at his debit card, but I can't just see Stu. I want to organize my data that way. That type of customer 360, or the marketing analysis I talked about, is a great use case. Another one that we've been seeing a lot of is compliance, where it's about having a better handle on what data is where. This is where some of the governance aspects of what we do also come into play. Even though we're very much about solving business problems, there's a very strong data governance element. Because when you are doing things like data compliance... We're working, for instance, with MoneyGram, a customer of ours. In this day and age in particular, when money flows across borders, regulators often want to know: that money that went from here to there, tell me where it came from, tell me where it went, tell me the lineage. And they need to be able to respond to those inquiries very, very quickly. Now the reality is that data sits in all sorts of different places, both inside and outside of the organization. Being able to organize that, and to respond more quickly and effectively, is a big competitive advantage. It both helps with avoiding regulatory fines and helps with customer responsiveness.
And then you've got things like GDPR, the General Data Protection Regulation, which is being driven by the EU. It's sort of like the next Y2K. Anybody in data who's not paying attention to it needs to be, pretty quickly, at least at a big enough company doing business in Europe. Because if you are doing business with European companies or European customers, this is going to be a requirement as of May next year. There's a whole 'nother set of rules for how data's kept, how data's stored, what control customers have over data. Things like 'Right to Be Forgotten'. This need to comply with regulation... As data's gotten more important, as you might imagine, the regulators have gotten more interested in what organizations are doing with data. Having a framework that organizes that, and helps you be more compliant with those regulations, is absolutely critical. >> Yeah, my understanding of GDPR is, if you don't comply, there are hefty fines. >> Chris: Major fines. >> Major fines that are going to hit you. Does Unifi solve that? Is there re-architecture or redesign that customers need to do to be compliant? [speaking at the same time] >> No, no, that's the whole idea again: being able to leave the data where it is, but know what it is, know where it is, know if and when I need to use it, where it came from, and where it went. All of those things. So we provide the platform that enables the customers to use it, or the partners to build the solutions for their customers. >> Curious about customers and their adoption of public cloud; how does that play into what you are doing, as they deploy more SaaS environments? We were having a conversation off camera today about the consolidation that's happening in the software world. What do those dynamics mean for your customers?
>> Well, public cloud is obviously booming and growing, and just about any organization has some public cloud infrastructure at this point. There are some very heavily regulated areas, actually healthcare's probably a good example, where there's very little public cloud. But even there we're working with... we're part of the Microsoft Accelerator Program, and we work very closely with the Azure team, for instance. And they're working in some healthcare environments where you have to be things like HIPAA compliant, so there is a lot of caution around that. But nonetheless, the move to public cloud is certainly happening. I think I was just reading some stats the other day, I can't remember if they were Wikibon or other stats, and it's still only about 5% of IT spending. And the reality is organizations of any size have plenty of on-prem data. And of course with all the use of SaaS solutions, with Salesforce, Workday, Marketo, all of these different SaaS applications, much of our data is also in somebody else's data center. So it's absolutely a hybrid environment. That's why the report that you guys put out on distributed data really spoke so much to what our value proposition is, and that's why, you know, I'm really glad to be here to talk to you about it. >> Great. Chris, tell us a little bit about the company itself: how many employees you have, what metrics can you share about the number of customers, revenue, things like that? >> Sure. We've got, I believe, about 65 people at the company right now. I joined, like I said, earlier this year, late February, early March. At that point we were about 40 people, so we've been growing very quickly. I can't get too specific about our revenue, but basically we're well into the triple-digit growth phase. We're still a small company, but we're growing quickly. Our number of customers is up in the triple digits as well. So, expanding very rapidly.
And again, we're a platform company, so we serve a variety of industries. Some of the big ones are healthcare and financial services. But even more than the industries, it tends to be driven by these use cases I talked about as well. And we're building out our partnerships also, so that's a big part of what I do. >> Can you share anything about funding, where you are? >> Oh yeah, funding, you asked about that, sorry. Yes, we raised our B round of funding, which closed in March of this year. So Pelion Venture Partners, who you may know, Canaan Partners, and then most recently Scale Venture Partners are investors. The company has raised a little over $32 million so far. >> Partnerships, you mentioned Microsoft already. Any other key partnerships you want to call out? >> We're doing a lot of work. We have a very broad partner network, which we're building up, but some of the ones that we are leaning in the most with: Microsoft is certainly one. We're doing a lot of work with the folks at Cloudera as well. We also work with Hortonworks, we also work with MapR. We're really working almost across the board in the BI space. We have spent a lot of time with the folks at Looker, who were also a partner I was working with very closely during my Vertica days. We're working with Qlik, we're working with Tableau. We're really working with just about everybody in BI and visualization. I don't think people like the term BI anymore; the desktop visualization space. And then on public cloud, also Google and Amazon, so really all the major players. I would say those are the ones that we've worked with the most closely to date. As I mentioned earlier, we're part of the Microsoft Accelerator Program, so we're certainly very involved in the Microsoft ecosystem.
I actually just wrote a blog post, which I don't believe has been published yet, about some of what we call the full-stack solutions we have been rolling out with Microsoft for a few customers, where we're sitting on Azure, we're using HDInsight, which is essentially Microsoft's cloud Hadoop distribution, visualized in Power BI. So we've really got a lot of deep integration with Microsoft, but we've got a broad network as well. And then I should also mention service providers; we're building out our service provider partnerships also. >> Yeah, Chris, I'm surprised we haven't talked about AI yet at all, machine learning. It feels like everybody that was doing big data now has kind of pivoted, maybe a little bit early in the buzzword phase. What's your take on that? You've been a part of this for a while. Is big data just old now and we have a new thing, or how do you put those together? >> Well, I think what we do maps very well to, at least in my personal view of, what's going on with AI/ML, in that it's really part of the fabric of what our product does. I talked before about, once you've found the data you want to use, how do I use it? Well, there's a lot of ML built into that, where essentially, I see these different datasets, I want to use them... We do what's called one-click functions, and what happens is these one-click functions get smarter as more and more people use the product and use the data. So if I've got some table over here, and then I've got some SaaS data source over there, and one user of the product... We grab the metadata, even though we don't require moving the data; we grab the metadata, we look at the field names, and then we'll suggest to the user that you join this data source with that data source and see what it looks like. And if they say, ah, that worked, then that becomes part of the whole ML infrastructure.
Then we are more likely to advise the next few folks, with the one-click function, that hey, if you're trying to do an analysis of sales trends, you might want to use this source and that source, and you might want to join them together this way. So it's a combination of AI and ML built into the fabric of what we do, and then also the community aspect of more and more people using it. But going back to your original question, there was a quote, and I'll misquote it, so I'm not going to directly cite it, but I think it might have been John Ferrier who was recently talking about ML and saying, you know, eventually we're not going to talk about ML any more than we talk about the phone business or something. It's just going to become integrated into the fabric of how organizations do business and how organizations do things. So we very much have it built in. You could certainly call us an AI/ML company if you want; it's actually definitely part of our slide deck. But at the same time, it's something that will just become a part of doing business over time. But it really depends on large data sets. As we all know, this is why it's so cheap to get Amazon Echos and such these days, because it's really beneficial... there's value in that data. There was just another piece, I actually shared it on LinkedIn today as a matter of fact, talking about Amazon and Whole Foods and asking, why are they getting such a valuation premium? They're getting such a valuation premium because they're smart about using data, but one of the reasons they're smart about using the data is because they have the data. So the more data you collect, the more data you use, the smarter the systems get, and the more useful the solutions become.
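The "one-click function" idea described above — suggest joins from column metadata alone, then boost suggestions that earlier users accepted — can be sketched in a few lines. Everything here (the catalog shape, the scoring, the names) is an invented illustration of the concept, not Unifi's actual implementation:

```python
# Hypothetical sketch of a metadata-driven join suggester: compare column
# names across datasets (no data movement), rank overlaps, and boost pairs
# that previous users joined successfully ("the community aspect").

def suggest_joins(catalog, accepted=()):
    """catalog: {dataset_name: set of column names}.
    accepted: pairs of dataset names users previously joined successfully.
    Returns (dataset_a, dataset_b, shared_columns) tuples, best first."""
    boost = {frozenset(p) for p in accepted}
    names = sorted(catalog)
    suggestions = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = catalog[a] & catalog[b]
            if shared:
                # More shared columns = better; a prior accepted join
                # outranks any fresh metadata-only match.
                score = len(shared) + (10 if frozenset((a, b)) in boost else 0)
                suggestions.append((score, a, b, sorted(shared)))
    suggestions.sort(reverse=True)
    return [(a, b, cols) for _, a, b, cols in suggestions]

catalog = {
    "crm_accounts": {"customer_id", "region", "email"},
    "sales_orders": {"order_id", "customer_id", "amount"},
    "web_clicks":   {"session_id", "email"},
}
print(suggest_joins(catalog, accepted=[("crm_accounts", "sales_orders")]))
```

The feedback loop is the key design point: accepted joins feed back into the ranking, so the suggestions get smarter as more people use the product, exactly as described in the interview.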
>> Absolutely. Last year at Amazon re:Invent, John Ferrier interviewed Andy Jassy, and I had posited that the customer flywheel is going to be replaced by that data flywheel, and enhanced to make things spin even further. >> That's exactly right, and once you get that flywheel going it becomes a bigger and bigger competitive advantage. By the way, that's also why the regulators are getting interested these days too, right? There's sort of a flywheel going back the other way. But from our perspective... I mean, first of all it just makes economic sense, right? These things could conceivably get out of control, that's at least what the regulators think, if you're not careful, so at least there's some oversight, and I would say that, yes, probably some oversight is a good idea. So you've got flywheels pushing in both directions. But one way or another, organizations need to get much smarter and much more precise and prescriptive about how they use data. And that's really what we're trying to help with. >> Okay, Chris, want to give you the final word. Unifi Software, you're working on kind of the strategic roadmap pieces. What should we look for from you and your segment through the rest of 2017? >> Well, I've always been a big believer, I've probably cited 'Crossing the Chasm' so many times on theCUBE during my prior HP tenure and such, but you know, I'm a big believer that we should be talking about customers, we should be talking about use cases. It's not about alphabet-soup technology or data lakes, it's about the solutions and it's about how organizations are moving themselves forward with data. Going back to that Amazon example. So I think from us, yes, we just released 2.0, we've got a very active blog, come by unifisoftware.com and visit it. But it's also going to be around what our customers are doing, and that's really what we're going to try to promote.
I mean, if you remember, this was also something, for all the years I've worked with you guys, I've been very much... You always have to make sure that the customer has agreed to be cited. It's nice when you can name them and reference them, and we're working on our customer references, because that's what I think is the most powerful in this day and age. Because again, going back to what I said before, this goes throughout organizations now. People don't necessarily care about the technology infrastructure, but they care about what's being done with it. And so, being able to tell those customer stories, I think that's what you're going to probably see and hear the most from us. But we'll talk about our product as much as you let us as well. >> Great. It reminds me of when Wikibon was founded; it was really about IT practice, users being able to share with their peers. Now in the software economy today, they're doing things in software that can often be leveraged by their peers, and that flywheel that they're creating. Just like when Salesforce first rolled out: they make one change and then everybody else has that option. We're starting to see that more and more as we deploy SaaS and cloud; it's not shrink-wrapped software anymore. >> I think to that point, you know, I was at a conference earlier this year, and it was an IT conference, but I was really sort of floored, because what the enlightened IT folks, and there are more and more enlightened IT folks these days, are talking about is the same thing. Right? It's how our business is succeeding by being better at leveraging data. And I think the opportunities for people in IT... But they really have to think outside of the box. It's not about Hadoop and Sqoop and SQL and Java anymore, it's really about business solutions. But if you can start to think that way, I think there are tremendous opportunities, and we're just scratching the surface.
>> Absolutely, we've found that those are really some of the proof points of what digital transformation really is for these companies. Alright, Chris Selland, always a pleasure to catch up with you. Thanks so much for joining us, and thank you for watching theCUBE. >> Chris: Thanks too. (techno music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Chris | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
John Ferrier | PERSON | 0.99+ |
Unifi | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Europe | LOCATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Chris Selland | PERSON | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
Pelion Venture Partners | ORGANIZATION | 0.99+ |
HP | ORGANIZATION | 0.99+ |
Greenplum | ORGANIZATION | 0.99+ |
Peter Burris | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
Vertica | ORGANIZATION | 0.99+ |
Stu | PERSON | 0.99+ |
Unifi Software | ORGANIZATION | 0.99+ |
Whole Foods | ORGANIZATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
General Data Protection Regulation | TITLE | 0.99+ |
Canaan Partners | ORGANIZATION | 0.99+ |
Andy Jassy | PERSON | 0.99+ |
EMC | ORGANIZATION | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
Looker | ORGANIZATION | 0.99+ |
May next year | DATE | 0.99+ |
EU | ORGANIZATION | 0.99+ |
late February | DATE | 0.99+ |
40 people | QUANTITY | 0.99+ |
18 months | QUANTITY | 0.99+ |
MoneyGram | ORGANIZATION | 0.99+ |
Qlik | ORGANIZATION | 0.99+ |
HP/HP | ORGANIZATION | 0.99+ |
Scale Venture Partners | ORGANIZATION | 0.99+ |
360 views | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
MapR | ORGANIZATION | 0.99+ |
GDPR | TITLE | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
early March | DATE | 0.99+ |
Echoes | COMMERCIAL_ITEM | 0.99+ |
Both | QUANTITY | 0.99+ |
Tableau | ORGANIZATION | 0.99+ |
millions of dollars | QUANTITY | 0.99+ |
Boston, Massachusetts | LOCATION | 0.99+ |
both | QUANTITY | 0.98+ |
Wikibon | ORGANIZATION | 0.98+ |
ORGANIZATION | 0.98+ | |
one click | QUANTITY | 0.98+ |
one place | QUANTITY | 0.98+ |
Java | TITLE | 0.98+ |
2007 | DATE | 0.98+ |
over $32 million | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
Spark | TITLE | 0.98+ |
HIPAA | TITLE | 0.98+ |
first time | QUANTITY | 0.98+ |
earlier this year | DATE | 0.98+ |
unifisoftware.com | OTHER | 0.98+ |
10 year | QUANTITY | 0.97+ |
Arun Murthy, Hortonworks | DataWorks Summit 2017
>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Good morning, welcome to theCUBE. We are live at day two of the DataWorks Summit and have had a great day so far, yesterday and today. I'm Lisa Martin with my co-host George Gilbert. George and I are very excited to be joined by a multi-time CUBE alumnus, the co-founder and VP of Engineering at Hortonworks, Arun Murthy. Hey, Arun. >> Thanks for having me, it's good to be back. >> Great to have you back. So yesterday, great energy at the event. You could see and hear behind us great energy this morning. One of the things that was really interesting yesterday, besides the IBM announcement, and we'll dig into that, was that we had your CEO on, as well as Rob Thomas from IBM, and Rob said, you know, one of the interesting things over the last five years was that there have been only 10 companies that have beat the S&P 500, have outperformed, in each of the last five years, and those companies have made big bets on data science and machine learning. And as we heard yesterday, these four meta-trends: IoT, cloud, streaming analytics, and now the fourth big leg, data science. Talk to us about what Hortonworks is doing. You've been here from the beginning; as a co-founder, as I've mentioned, you've been with Hadoop since it was a little baby. How is Hortonworks evolving to become one of those big users making big bets on helping your customers, and yourselves, leverage machine learning to really drive the business forward? >> Absolutely, a great question. So, you know, if you look at some of the history of Hadoop, it started off with this notion of a data lake, and I'm talking about the enterprise side of Hadoop, right? I've been working on Hadoop for about 12 years now, and the last six of it has been as a vendor selling Hadoop to enterprises.
They started off with this notion of a data lake, and as people have adopted that vision of a data lake — you know, you bring all the data in, and now you're starting to get governance and security and all of that — obviously one of the best ways to get value out of the data is the notion of, can you predict what is going to happen in your world, with your customers, with the data that you already have. So that notion... you know, Rob, our CEO, talks about how we're trying to move from a post-transactional world to a pre-transactional world, and doing the analytics and data science is obviously key to that. There are so many applications of it. Something as simple as, you know, we did a demo last year of how we're working with a freight company, and we're starting to show them how to predict which drivers and which routes are going to have issues as they're trying to move, alright? Four years ago we did the same demo, and we would show that this driver had an issue on this route, but now we can actually predict it and let you know to take preventive measures up front. Similarly, internally, you know, you can take things from machine learning and log analytics and so on. We have an internal problem where we have to test two different versions of HDP itself, and as you can imagine, it's a really, really hard problem. We have to support 10 operating systems, seven databases; if you multiply that matrix, it's, you know, tens of thousands of options. So, if you do all that testing, we now use machine learning internally to look through the logs and kind of predict where the failures were, and help our own software engineers understand where the problems were, right?
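The test-matrix problem Arun describes — failures scattered across a 10-OS by 7-database grid — starts with something as simple as aggregating failure rates per configuration dimension so the noisiest axis stands out. A minimal sketch, with an invented log format (this is not HDP's actual tooling):

```python
# Hypothetical sketch: localize test failures to a configuration dimension
# by computing failure rates per OS and per database from test-run records.
from collections import defaultdict

def failure_rates(runs):
    """runs: list of dicts like {"os": ..., "db": ..., "passed": bool}.
    Returns {(dimension, value): failure_rate} across all runs."""
    stats = defaultdict(lambda: [0, 0])  # key -> [failures, total]
    for r in runs:
        for dim in ("os", "db"):
            key = (dim, r[dim])
            stats[key][0] += 0 if r["passed"] else 1
            stats[key][1] += 1
    return {k: fails / total for k, (fails, total) in stats.items()}

# Invented results from a tiny slice of the support matrix.
runs = [
    {"os": "centos7", "db": "mysql",    "passed": True},
    {"os": "centos7", "db": "postgres", "passed": True},
    {"os": "sles12",  "db": "mysql",    "passed": False},
    {"os": "sles12",  "db": "postgres", "passed": False},
]
rates = failure_rates(runs)
print(rates[("os", "sles12")])  # 1.0 -> failures cluster on this OS, not the DBs
```

Real systems would go further (mining log text, learning failure signatures), but the core move is the same: turn tens of thousands of matrix cells into per-dimension signals an engineer can act on.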
An extension of that has been the work we've done in Smartsense, which is a service we offer our enterprise customers. We collect logs from their Hadoop clusters, and then we can actually help them understand where they can either tune their applications or even tune their hardware, right? We have this example I really like, where at a really large enterprise financial services client, they had literally hundreds, you know, thousands of machines on HDP, and using Smartsense we actually found that there were 25 machines which had bad NIC configuration, and we proved to them that by fixing those, they got 30% throughput back on their cluster. At that scale, it's a lot of money, it's a lot of capex, it's a lot of opex. So, as a company, we try this on ourselves as much as we try to help our customers adopt it. Does that make sense? >> Yeah, let's drill down on that even a little more, because it's pretty easy to understand what's the standard telemetry you would want out of hardware, but as you sort of move up the stack, the metrics, I guess, become more custom. So how do you learn, not just from one customer, but from many customers, especially when you can't standardize what you're supposed to pull out of them? >> Yeah, so we're really big believers in, sort of, dogfooding our own stuff, right? So, we talk about the notion of a data lake; we actually run a Smartsense data lake, where we get data across hundreds of our customers, and we can actually do predictive machine learning on that data in our own data lake. Right? And to your point about how we go up the stack, this is kind of where we feel like we have a natural advantage, because we work on all the layers, whether it's the SQL engine, or the storage engine, or, you know, above and beyond down to the hardware. So, as we build these models, we understand that we need more, or different, telemetry, right?
And we put that back into the product, so the next version of HDP will have the metrics that we wanted. And now we've been doing this for a couple of years, which means we've done three, four, five turns of the crank. Obviously it's something we always get better at, but I feel like, compared to where we were a couple of years ago when Smartsense first came out, it's actually matured quite a lot from that perspective. >> So, there are a couple of different paths you can add to this, which is: customers might want, as part of their big data workloads, some non-Hortonworks services or software when it's on-prem, and then can you also extend this management to the cloud if they want a hybrid setup, where, in the not too distant future, the cloud vendor will also be a provider for this type of management? >> So absolutely, in fact it's true today. You know, Microsoft's a great partner of ours. We work with them to enable Smartsense on HDI, which means we can actually get the same telemetry back, whether you're running the data on an on-prem HDP or you're running this on HDI. Similarly, we shipped a version of our cloud product, Hortonworks Data Cloud, on Amazon, and again Smartsense is pre-integrated there, so whether you're on Amazon, or Microsoft, or on-prem, we get the same telemetry, we get the same data back. If you're a customer using many of these products, we can actually give you that telemetry back. Similarly, as you guys probably know, you were probably there at the analyst event when they announced the Flex Support subscription, which means that now you can actually take the support subscription you get from Hortonworks and use it on-prem or in the cloud.
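Collecting the same telemetry from every host, on-prem or cloud, is what makes cross-fleet checks like the bad-NIC hunt mentioned earlier possible: once the settings are in one place, a misconfigured machine is just the one that disagrees with the majority. A minimal sketch of that kind of check (the data shape is invented, not Smartsense's actual format):

```python
# Hypothetical sketch of a Smartsense-style fleet check: flag hosts whose
# NIC speed differs from what the majority of the cluster negotiated.
from collections import Counter

def flag_misconfigured(hosts):
    """hosts: {hostname: nic_speed_mbps}. Returns hosts off the majority."""
    majority, _ = Counter(hosts.values()).most_common(1)[0]
    return sorted(h for h, speed in hosts.items() if speed != majority)

# Invented fleet: one node silently negotiated 1 GbE instead of 10 GbE.
hosts = {"node01": 10000, "node02": 10000, "node03": 1000, "node04": 10000}
print(flag_misconfigured(hosts))  # ['node03']
```

The same majority-vote pattern applies to kernel parameters, JVM flags, or any per-host setting the telemetry captures; the 25-machine NIC story in the interview is exactly this class of anomaly at scale.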
>> So in terms of transforming HDP, for example, just want to make sure I'm understanding this: you're pulling in data from customers to help evolve the product, and that data can be on-prem, it can be in Microsoft Azure, it can be in AWS? >> Exactly. The HDP cluster can be running in any of these; we will actually pull all of that data into our data lake, we do the analytics on it, and then present it back to the customers. So, in our support subscription, the way this works is we do the analytics in our lake, and it pushes it back, in fact, to our support team tickets, our Salesforce, and all the support mechanisms. And the customers get a set of recommendations saying, hey, we know these are the workloads you're running, and we see these are the opportunities for you to do better, whether it's tuning hardware, tuning an application, or tuning the software. We send the recommendations back, and the customer can say, oh, that makes sense, accept it, and we'll, you know, apply the recommendation for them automatically. Or they can say, maybe I don't want to change my kernel parameters, let's have a conversation. And if the customer wants to go through with that, then they can go and change it on their own. We do that, sort of, back and forth with the customer. >> One thing that just pops into my mind is, we talked a lot yesterday about data governance, are there particular... and also yesterday on stage were >> Arun: With IBM. >> Yes exactly. When we think of, you know, really data-intensive industries, retail, financial services, insurance, healthcare, manufacturing, are there particular industries where you're really leveraging this kind of bi-directional flow, because there are no governance restrictions, or maybe I shouldn't say none? Give us a sense of which particular industries are really helping to fuel the evolution of the Hortonworks data lake. >> So, I think healthcare is a great example.
You know, when we started off this open-source project, Apache Atlas, a couple of years ago, we got a lot of traction in the healthcare and insurance industries. Folks like Aetna were actually founding members of that consortium, right? And we're starting to see them get a lot of leverage out of all of this. Similarly now, as we go into Europe and expand there, things like GDPR are really, really important, right? And you guys know GDPR is a really big deal. Like, if you're not compliant by, I think it's March of next year, you pay a portion of your revenue as fines. That's, you know, big money for everybody. So, that's why we're really excited about the partnership with IBM, because we feel like the two of us can help a lot of customers, especially in countries that are more heavily regulated than the United States, to actually leverage our, sort of, giant portfolio of products. And IBM's been a great partner on Atlas; they've adopted it wholesale, as you saw in the announcements yesterday. >> So, you're doing a Keynote tomorrow, so give us maybe the top three things. You're giving the Keynote on Data Lake 3.0; walk us through the evolution. Data Lakes 1.0, 2.0, 3.0: where are you now, and what can folks expect to hear and see in your Keynote? >> Absolutely. So as we've continued to work with customers, we see the maturity model of customers: you know, initially people are standing up a data lake, and then they want, you know, sort of basic security and so on. Now they want governance, and as we go on that journey, clearly our customers are pushing us to help them get more value from the data. It's not just about standing up the data lake, and obviously managing data with governance; it's also about, can you help us do machine learning, can you help us build other apps, and so on.
So, as we look at that, there's a fundamental evolution that, you know, the Hadoop ecosystem had to go through. With the advent of technologies like, you know, Docker, it's really important, first, to help the customers bring more than just the workloads which are native to Hadoop. You know, Hadoop started off with MapReduce, obviously Spark went great, and now we're starting to see technologies like Flink coming, but increasingly, you know, we want to do data science. For mass-market data science, obviously, you know, people want to use Spark, but the mass market is still Python, and R, and so on, right? >> Lisa: Non-native, okay. >> Non-native. Which are not really built for this, you know, these predate Hadoop by a long way, right? So now, as we bring these applications in, having a technology like Docker is really important, because now we can actually containerize these apps. It's not just about running Spark, you know, running Spark with R, or running Spark with Python, which you can do today. The problem is, in a true multi-tenant governed system, you want not just R, but you want a specific set of libraries for R, right? And the libraries, you know, George wants might be completely different than what I want. And, you know, you can't do a multi-tenant system where you install both of them simultaneously. So Docker is a really elegant solution to problems like those. So now we can actually bring those technologies into a Docker container, so George's Docker containers will not, you know, conflict with mine. And you can actually be off to the races doing data science. Which is really key for technologies like DSX, right? Because with DSX, if you see, obviously DSX supports Spark with technologies like, you know, Zeppelin, which is a front-end, but they also have Jupyter, which is going to serve the mass-market users of Python and R, right?
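The multi-tenancy point above — George's R libraries must not conflict with mine — is solved by giving each user their own container spec built from the shared base plus a per-user library list. A purely illustrative sketch follows; the image names, mount paths, and environment variable are invented, not HDP's or DSX's actual mechanism:

```python
# Hypothetical sketch: one container per tenant so per-user library sets
# never conflict, while everyone shares the same read-only data lake mount.

def container_spec(user, runtime, libraries):
    """Build an illustrative docker run command for one tenant's notebook."""
    image = f"datasci-{runtime}:base"      # shared base image per runtime
    pkgs = ",".join(sorted(libraries))     # this user's library set only
    return (
        f"docker run --rm --name dsx-{user} "
        f"-v /data/lake:/data:ro "          # shared, read-only data lake
        f"-e EXTRA_PACKAGES={pkgs} {image}"
    )

# Two tenants, two isolated environments, one data lake.
print(container_spec("george", "r", {"dplyr", "forecast"}))
print(container_spec("arun", "python", {"scikit-learn"}))
```

The design point is that isolation lives in the container, not the cluster: the governed data stays in one place, and only the per-user runtime environment is duplicated.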
So we want to make sure there's no friction, whether it's, sort of, the guys using Spark or the guys using R. And equally importantly, DSX, you know, on the roadmap will also support things like, you know, the classic IBM portfolio, SPSS and so on. So bringing all of those things in together, making sure they run with the data in the data lake, and also the compute in the data lake, is really big for us. >> Wow, so it sounds like your Keynote's going to be very educational for the folks that are attending tomorrow. So, last question for you: one of the themes that occurred in the Keynote this morning was sharing a fun fact about the speakers. What's a fun fact about Arun Murthy? >> Great question. I guess, you know, people have been looking for folks with, you know, 10 years of experience on Hadoop. I'm here finally, right? There are not a lot of people, but, you know, it's fun to be one of those people who've worked on this for about 10 years. Obviously, I look forward to working on this for another 10 or 15 more, but it's been an amazing journey. >> Excellent. Well, we thank you again for sharing time with us on theCUBE. You've been watching theCUBE live on day two of the DataWorks Summit, hashtag DWS17, for my co-host George Gilbert. I am Lisa Martin; stick around, we've got great content coming your way.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
George Gilbert | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Rob | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
George | PERSON | 0.99+ |
Lisa | PERSON | 0.99+ |
30% | QUANTITY | 0.99+ |
San Jose | LOCATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
25 machines | QUANTITY | 0.99+ |
10 operating systems | QUANTITY | 0.99+ |
hundreds | QUANTITY | 0.99+ |
Arun Murthy | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
two | QUANTITY | 0.99+ |
Aetna | ORGANIZATION | 0.99+ |
10 years | QUANTITY | 0.99+ |
Arun | PERSON | 0.99+ |
today | DATE | 0.99+ |
Spark | TITLE | 0.99+ |
yesterday | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
both | QUANTITY | 0.99+ |
Python | TITLE | 0.99+ |
last year | DATE | 0.99+ |
Four years ago | DATE | 0.99+ |
15 | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
CUBE | ORGANIZATION | 0.99+ |
three | QUANTITY | 0.99+ |
DataWorks Summit | EVENT | 0.99+ |
seven databases | QUANTITY | 0.98+ |
four | QUANTITY | 0.98+ |
DataWorks Summit 2017 | EVENT | 0.98+ |
United States | LOCATION | 0.98+ |
Dataworks Summit | EVENT | 0.98+ |
10 | QUANTITY | 0.98+ |
Europe | LOCATION | 0.97+ |
10 companies | QUANTITY | 0.97+ |
One | QUANTITY | 0.97+ |
one customer | QUANTITY | 0.97+ |
thousands of machines | QUANTITY | 0.97+ |
about 10 years | QUANTITY | 0.96+ |
GDPR | TITLE | 0.96+ |
Docker | TITLE | 0.96+ |
Smartsense | ORGANIZATION | 0.96+ |
about 12 years | QUANTITY | 0.95+ |
this morning | DATE | 0.95+ |
each | QUANTITY | 0.95+ |
two different versions | QUANTITY | 0.95+ |
five turns | QUANTITY | 0.94+ |
R | TITLE | 0.93+ |
four meta-trains | QUANTITY | 0.92+ |
day 2 | QUANTITY | 0.92+ |
Data Lakes 1.0 | COMMERCIAL_ITEM | 0.92+ |
Flink | ORGANIZATION | 0.91+ |
first | QUANTITY | 0.91+ |
HDP | ORGANIZATION | 0.91+ |