Chris Grusz, AWS | AWS Marketplace Seller Conference 2022

>>Hello, and welcome back to theCUBE's live coverage here in Seattle for theCUBE's coverage of AWS Marketplace Seller Conference. Now part of a really big move and news: Amazon Partner Network combines with AWS Marketplace to form one organization, the Amazon Partner Organization, APO, where the efficiencies, the next iteration, as they say in Amazon language, where they make things better, simpler, faster for customers, is happening. We're here with Chris Grusz, who's the general manager, worldwide leader of ISV alliances and marketplace, which includes all the channel partners and the buyer and seller relationships, all now under one partner organization, bringing together years of work. Yes. If you work with AWS and are a partner and/or sell with them, it's all kind of coming together in a new way for the next generation. Chris, congratulations on the new role and the reorg.
>>Thank you. Yeah, it's very exciting. We think it invents and simplifies the process on how we work with our partners, and we're really optimistic. So far, the feedback's been great. And I think it's just gonna get even better as we kind of work out the final details.
>>This is huge news because, one, we've been very close to the partners that we've been working with and talking to. We cover them. We cover the news, from startups, channel partners, big ISVs, big and small, from the dorm room to the boardroom. You guys have great relationships. So check. Marketplace, the future of procurement, how software will be bought, implemented and deployed, is also changing. So you've got the confluence of two worlds coming together, growth in the ecosystem. Yep. Next-gen cloud on the horizon for AWS and the customers as digital transformation goes from lift and shift to refactoring businesses. Yep. This is really a seminal moment. Can you share what you talked about on the keynote stage here, around why this is happening now? Yeah. What's the guiding principle?
What's the north star? What's the big news?
>>Yeah. And so, you know, there are a lot of reasons why we pulled the two teams together, but a lot of it kind of gets centered around co-sell. If you take a look at marketplace, where we started off, it was really a machine image business. It was a great self-service model, and we were working with ISVs that wanted to have this new delivery mechanism, which at the time was Amazon Machine Images. Fast forward, and we started adding more product types like SaaS and containers. And the experience that we saw was that customers would use marketplace up to a certain limit from a self-service perspective. But then invariably they wanted a quantity discount, they wanted to get an enterprise discount, and we couldn't do that through marketplace. And so they would exit us and go do a direct deal with an ISV.
>>And so to remedy that, we launched private offers, you know, four years ago. And private offers now allowed ISVs to do these larger deals, but do 'em all through marketplace. And so they could start off doing self-service business, and then as a customer graduated up to buying for a full department or an organization, they can now use private offers to execute that larger agreement. And as we started to do more and more private offers, it really kind of coincided with a lot of the initiatives that were going on within Amazon Partner Network at the time around co-sell. And so we started to launch programs like ISV Accelerate that really focused on our co-sell relationship with ISVs. And what we found was that marketplace private offers became this awesome way to automate how we co-sell with ISVs. And so we kinda had these two organizations that were parallel. We said, you know what, this is gonna be better together.
If we put them together, it's gonna invent and simplify, and we can use marketplace private offers as part of that co-sell experience and really feed that automation layer for all of our ISVs as they interacted with native discussions.
>>Well, I gotta give you props. You and Mona were on stage. You guys did a great job, and it reminds me of the humble nature of AWS and Amazon. I used to talk to Andy Jassy about this all the time. This reminds me of 2013 here right now, because you're in that mode where Amazon re:Invent was in 2013. Yeah. Where you knew it was breaking out. Yeah. It was kind of small, like, we haven't made it yet. Yeah. But you guys are doing billions of dollars in transactions. Yeah. But this event is really, I think, the beginning of what we're seeing as the changeover in procuring and deploying applications in the cloud, because there are a lot of nuanced things I want to get your reaction on. One, I heard making your product as an ISV more native to AWS's stack. That was one major callout. The other one I heard was, hey, if you're a channel partner, you can play too. And by the way, there's more choice. There's a lot going on here that's about to kind of explode in a good way for customers. Yeah. Buyers get more access to assemble their solutions. Yeah. And you got all kinds of business logic, compensation, integration, and scale. Yeah. This is unprecedented.
>>Yeah. It's exciting to see what's going on. I mean, I think we kind of saw the tipping point probably about two years ago. Prior to that, you know, we would be working with ISVs and customers and it was really much more of an evangelism role, where we were just getting people to try it. Just list a product; we think this is gonna be a good idea. And if you're a buyer, it's like, just try out a private offer, try out a self-service subscription. And what's happened now is there's no longer a lot of that convincing that needs to happen.
It's really become accepted. And so a lot of the conversations I have now with ISVs, it's not about, should I do marketplace? It's, how do I do it better? And how do I really leverage marketplace as part of my co-sell initiatives, as part of my go-to-market strategy?
>>And so you've really kind of passed this tipping point where marketplaces are now becoming very accepted ways to buy third-party software. And so that's really exciting. And we see that we can really enhance that experience. What we saw on the machine image side is we had this awesome integrated experience where you would buy it, it was tied right into the EC2 control plane, and you could go from buying to deploying in one single motion. SaaS is a little bit different. We can do all the buying in a very simple motion, but then deploying it, there's a whole bunch of other stuff that our customers have to do. And so we see all kinds of ways that we can simplify that. You know, recently we launched the ability to put third-party solutions out of marketplace into Control Tower, which is how we deploy all of our landing zones for AWS. And now, instead of having to go wire that up as you're adding new AWS environments, why not just use that third-party solution that you've already integrated, and have it there as you're spinning up those landing zones through Control Tower?
>>Again, back to humble nature, you guys have dominated the infrastructure as a service layer. You kind of mentioned it. You didn't really highlight it other than saying you're doing pretty good. Yeah. On the IaaS, or the technology partners as you call it, or infrastructure as you guys call it. Okay. I can see how the control panel is great for those customers. But outside that, when you get into CRM, you mentioned ERP, these business apps, these horizontals and verticals have data. They're gonna have SageMaker, they're gonna have edge.
They might have, you know, other services that are coming online from Amazon. How do I, as an ISV, get my stuff in there? Yeah. And how do I succeed? And what are you doing to make that better? 'Cause I know it's kind of new, but not new.
>>Yeah, no, it's not. I mean, that's one of the things that we've really invested in: how do we make it really easy to list in marketplace? And, you know, again, when we first started, it was a big, huge spreadsheet that you had to fill out. It was very cumbersome, and we've really automated all those aspects. So now we've exposed an API, as an example. So you can go straight out of your own build process. You might have your own CI/CD pipeline, and then you have a build step at the end. And now you can have that execute a marketplace update from your build script, right across that API, all the way over to AWS Marketplace. So it's effectively taking that CI/CD pipeline from an ISV and extending it all the way to AWS, and then eventually to a customer, because now it's just an automated supply chain for that software coming into their environment. And we see that being super powerful. There are no more manual steps along the way.
>>Yeah. I wanna dig into that, because you made a comment and I want you to clarify it here on theCUBE. Some have said, even us on theCUBE, oh, marketplace, it's just a website, a catalog. Yeah. Feels old school. Yeah. Feels like a 1995 database. I'm just, you know, saying, no offense. And now you're saying you're looking at this and implementing more of an API-based approach. Why is that relevant? I know the answer. You already set it up with APIs, but explain the transition from the mindset of, it's a website, yeah, buy stuff in a catalog, to a full-blown API layer. Yeah. Services.
>>Absolutely.
Well, when you look at all AWS services, you know, our customers will interface with them through a console initially, but when they're using them in production, it's all about APIs. And marketplace, as you mentioned, did start off as a website. And so we've kind of taken the opposite approach. We've got this great website experience, which is great for demand gen and, you know, highlighting those listings. But what we want to do is really have this API service layer that you're interfacing with, so that an ISV effectively is not even in our marketplace. They're interfacing over APIs to do a variety of their high-value functions, whether it's listings or private offers. We now have that all available through APIs, and the same thing on the buyer side. So it's integrating directly into their AWS environment, and then they can view all their third-party spend within things like our cost management suite. They can look at things like Cost Explorer, see third-party software right next to first-party software, and have that all integrated, nice and seamless for the customer.
>>That's a nice cloud native experience. I think that's a huge advantage. I'm gonna track that closer. We're gonna follow that. I think that's gonna be the killer feature. All right. Now let's get to the killer feature and the business logic. Okay. Yeah. All partners wanna know what's in it for me. Yeah. How do I make more cash? Yeah. How do I compensate my salespeople? Yeah. You guys don't compete with me? Give me leads. Yeah. Can I get MDF, market development funds? Yeah. So take me through how you're thinking about supporting the partners that are leaning in, so that, you know, the parachute will open when they jump out of the plane. Yeah. They're gonna land safely with you. Yeah. MDF, marketing, leads. What are you doing to support the partners to help them serve their customers?
>>It's interesting. Marketplace has become much more of an accepted way to buy. You know, our customers are really defaulting to that as the way to go get that third-party software. So we've had some industry analysts do some studies, and they interviewed a whole cohort of ISVs across various categories within marketplace, whether it was security or network or even line of business software. And what they've found is that on average, our ISVs will see a 24% increased close rate by using marketplace. Right. So when I go talk to a CRO and say, do you want to close more deals? Yes. Right. And we've got data to show that. We're also finding that on average, when an ISV sells through marketplace, they're seeing an 80% uplift in the actual deal size. And so if your ASP is a hundred K, 180K is a heck of a lot better, right?
>>So we're seeing increased deal sizes by going through marketplace. And then the third thing that we've seen that's a value prop for ISVs is speed of closure. On average, what we're finding is that our ISVs are closing deals 40% faster by using marketplace. So if you've got a 10 month sales cycle, shaving four months off of a sales cycle means you're bringing deals in in an earlier calendar year, an earlier quarter. And for ISVs, getting that cash flow early is very important. So those are great metrics that we're seeing. And, you know, we think that they're only gonna improve.
>>And from startups, who don't have a lot of cash, to ISVs that are rich and doing well. Yeah. They have good go-to-market funding. Yeah. You got the range of partners, and you know, the next startup could be the next Figma, could be in that batch of startups. Exactly. Yeah. You don't know. The game is changing. Yeah. The next brand could be one of that batch of startups. Yeah. What's the message to the startup community? Yeah.
>>I mean, marketplace in a lot of ways becomes a leveling effect, right.
Because, you know, if you look at pre-marketplace, if you were a startup, you were having to go generate sales, have a sales force, go compete, you know, kind of hand to hand with these largest ISVs. Marketplace is really kind of leveling that, because now you can both list in marketplace. You have the same advantage of putting that directly in the AWS bill, taking advantage of all the management and governance features that we offer, all the automation that we bring to the table. And so...
>>A lot of joint selling.
>>And joint selling, right? When it goes through marketplace, you know, it's gonna feed into a number of our APN programs like ISV Accelerate. Our sales teams are gonna get recognized for those deals. And so, you know, it brings nice co-sell behavior to how we work with our field sales teams together. It brings nice automation that, you know, pre-marketplace, they would have to go build all that. And that was a heavy lift that really now becomes just kind of table stakes for any kind of ISV selling to any AWS customer.
>>Well, you know, I'm a big fan of the marketplace. I always have been. Even from the early days, I saw this as a procurement game changer. It makes total sense. It's so obvious. Yeah. Not obvious to everyone, but there are a lot of moving parts behind the scenes, behind the curtain, so to speak, that you're handling. Yeah. What's your message to the audience out there, both the buyers and the sellers? Yeah. About what your mission is, what you wake up every day thinking about. Yeah. And what's your promise to them, and what are you gonna work on? 'Cause it's not easy. You're building an operating model. That's not a website. It's a full-on cloud service. Yeah. What's your promise, and what's your goals?
>>No. And like, you know, ultimately what we're trying to do from an AWS Marketplace perspective is provide that selection experience to the AWS customer, right?
There's the infamous flywheel that Jeff put together that had the concepts of why Amazon is successful. And one of the concepts he points to is the concept of selection. And what we mean by that is, if you come to Amazon, it's effectively that everything store. And when you come across to AWS, marketplace becomes that selection experience. And so that's what we're trying to do: provide whatever our AWS customers wanna buy. Whatever form factor, whatever software type, whatever data type, it's gonna be available in AWS Marketplace for consumption. And that ultimately helps our customers, because now they can get whatever technologies they need to use alongside AWS.
>>And I wanna give you props too. You answered the hard question on stage. I asked Andy Jassy this on theCUBE when he was the CEO, and Adam Selipsky last year. I asked him the same question, and the answer has been consistent. We have some solutions that people want AWS end to end, but with your ecosystem, you want people to compete, yes, and build a product. And mostly you point to things like Snowflake, New Relic. Yeah. Other people that compete with Amazon services. Yeah. You guys want that. You encourage that. Yeah. You're ratifying that same statement.
>>Absolutely. Right. Again, it feeds into that selection experience. Right. If a customer wants something, we wanna make sure it's gonna be a great experience. Right. And so a lot of these ISVs are building on top of AWS. We wanna make sure that they're successful. And, you know, while we have a number of our first-party services, we have a variety of third-party technologies that run very well on AWS. And ultimately the customer's gonna make their decision. We're customer obsessed. And if they want to go with a third-party product, we're absolutely gonna support them in every way we can and make sure that's a successful experience for our customers.
>>I know you referenced two studies. Check out the website; it's got buyer and seller surveys on there from Forrester. Yeah. I don't want to get into that. I want to just end on one, yeah, kind of final note. You got a lot of successful buyers and a lot of successful sellers. The word billions, yes, with an S, was on the slide. Can you say the number, how many billions are sold, yeah, through the marketplace? Yeah. And the buyer experience future. What's those two things?
>>Yeah. So we went on record at re:Invent last year, so it's approaching its birthday, but it was the first year in our 10 year history that we've announced how much was actually being sold through the marketplace. And, you know, we are now selling billions of dollars through our marketplace, and that's with an S, so you can assume at least it's two. But it's a large number and it's growing very quickly.
>>Yeah. Can't disclose, you know.
>>But it's been a very healthy part of our business. And you know, we look at this, the experience that we saw...
>>There's a lot of headroom. I mean, oh yeah, you have infrastructure nailed down. That's long, you get better, but you have basically growth upside with these other categories. What's the hot categories?
>>You know, we started off with infrastructure related products and we've kind of hit critical mass there. Right? There are very few ISVs left in that infrastructure related space that are not in our marketplace. And what's happened now is our customers are saying, well, I've been buying infrastructure products for years. I'm gonna buy everything. I wanna buy my line of business software. I wanna buy my vertical solutions. I wanna buy my data, and I wanna buy all my services alongside of that. And so there's tons of upside. We're seeing all of these either horizontal business applications coming to our marketplace or vertical specific solutions. Yeah.
Which, you know, when we first designed our marketplace, we weren't sure if that would ever happen. We're starting to see that actually really accelerate, because customers are now just defaulting to buying everything through the marketplace.
>>Chris, thanks for coming on theCUBE. I know we went a little extra long there. Wanted to get that clarification on the new role. Yeah. New organization. Great reorg. It makes a lot of sense. Next level, next gen. Thanks for coming on theCUBE. Okay.
>>Thank you for the opportunity.
>>All right. We're here covering the big news of AWS Marketplace and the AWS Partner Network coming together under one coherent organization, serving buyers and sellers. Billions sold. The future of how people are gonna be buying software, deploying it, managing it, operating it, it's all happening in the marketplace. This is the big trend. It's theCUBE here in Seattle with more coverage here at AWS Marketplace Seller Conference after the short break.

Published Date : Sep 21 2022



The New Data Equation: Leveraging Cloud-Scale Data to Innovate in AI, CyberSecurity, & Life Sciences

>> Hi, I'm Natalie Ehrlich and welcome to the AWS Startup Showcase presented by theCUBE. We have an amazing lineup of great guests who will share their insights on the latest innovations and solutions in leveraging cloud-scale data in AI, security and life sciences. And now we're joined by the co-founders and co-CEOs of theCUBE, Dave Vellante and John Furrier. Thank you gentlemen for joining me.
>> Hey Natalie.
>> Hey Natalie.
>> How are you doing? Hey John.
>> Well, I'd love to get your insights here. Let's kick it off: what are you looking forward to?
>> Dave, I think one of the things that we've been doing on theCUBE for 11 years is looking at the signal in the marketplace. I wanted to focus on this because AI is cutting across all industries. So we're seeing that with cybersecurity and life sciences. It's the first time we've had a life sciences track in the showcase, which is amazing because it shows the growth of cloud scale. So I'm super excited by that. And I think that's going to showcase some new business models. And of course the keynote is Ali Ghodsi, who's the CEO of Databricks, pushing a billion dollars in revenue, clear validation that startups can go from zero to a billion dollars in revenue. So that should be really interesting. And of course the top venture capitalists are coming in to talk about what the enterprise dynamics are all about. And what about you, Dave?
>> You know, I thought it was an interesting mix and choice of startups. When you think about, you know, AI, security and healthcare, and I've been thinking about that. Healthcare is the perfect industry; it is ripe for disruption. If you think about healthcare, you know, we all complain how expensive it is, how it's not transparent. There's a lot of discussion about, you know, can everybody have equal access. And certainly with COVID, the staff is burned out.
There's a real divergence and diversity in the quality of healthcare and, you know, it all results in patients not being happy. I mean, if you had to do an NPS score on patients and healthcare, it would be pretty low, John, you know. So when I think about, you know, AI and security in the context of healthcare in cloud, I ask questions like, when are machines going to be able to meet or make better diagnoses than doctors? And that's starting. I mean, it's really an assistant being put into play today. But I think when you think about cheaper and more accurate image analysis, when you think about the overall patient experience and trust and personalized medicine, self-service, you know, remote medicine that we've seen during the COVID pandemic, disease tracking, language translation, I mean, there are so many things where the cloud and data and AI can help. And then at the end of it, it's all about, okay, how do I authenticate? How do I deal with privacy and personal information and tamper resistance? And that's where the security play comes in. So it's a very interesting mix of startups that I'm really looking forward to hearing from...
>> You know, Natalie, one of the things we talked about, some of these companies, Dave, we've talked to a lot of these companies, and to me the business model innovations are coming out of two factors. The pandemic is kind of coming to an end, so that accelerated things and really showed who had the right stuff, in my opinion. So you were either on the wrong side or the right side of history when it comes to the pandemic, and as we look back, as we come out of it, there's clear growth in certain companies, and certain companies that adopted, let's say, cloud. And the other one is cloud scale. So the focus of these startup showcases is really to focus on how startups can align with the enterprise buyers and create the new kind of refactored business models to go from, you know, a re-pivot or refactoring to more value.
And the other thing that's interesting is that the business model isn't just for the good guys. If you look at, say, ransomware, for instance, the business model of hackers has gone completely amazing too. They're kicking butt in terms of revenue. They're well-funded machines on how to extort cash from companies. So there are a lot of security issues around the business model as well. So to me, the business model innovation with cloud-scale tech, with the pandemic as a forcing function, you've seen a lot of new kinds of decision-making in enterprises. You're seeing how enterprise buyers are changing their decision criteria, and frankly their existing suppliers. So if you're an old guard supplier, you're going to be potentially out, because if you didn't deliver during the pandemic, this is the issue that everyone's talking about. And it's kind of not publicized in the press very much, but this is actually happening.
>> Well, thank you both very much for joining me to kick off our AWS Startup Showcase. Now we're going to go to our very special guest Ali Ghodsi, and John Furrier will sit with him for a fireside chat, and Dave and I will see you on the other side.
>> Okay, Ali, great to see you. Thanks for coming on our AWS Startup Showcase, our second edition, second batch, season two, whatever we want to call it. It's our second version of this new series where we feature, you know, the hottest startups coming out of the AWS ecosystem. And you're one of them. I've been there, but you're not a startup anymore; you're here pushing serious success on the revenue side and company. Congratulations and great to see you.
>> Likewise. Thank you so much, good to see you again.
>> You know, I remember the first time we chatted on theCUBE, you weren't really doing much software revenue, you were really talking about the new revolution in data. And you were all in on cloud.
And I will say that from day one, you were always adamant that it was cloud, cloud scale, before anyone was really talking about it. And at that time it was on premises with Hadoop and those kinds of things. You saw that early. I remember that conversation. Boy, that bet paid out great. So congratulations.
>> Thank you so much.
>> So I've got to ask you to jump right in. Enterprises are making decisions differently now, and you are an example of a company that has gone from literally zero software sales to pushing a billion dollars, as it's being reported. Certainly the success of Databricks has been written about, but what's not written about is the success of how you guys align with the changing criteria for the enterprise customer. Take us through that. These companies here are aligning the same way, and enterprises want to change. They want to be on the right side of history. What's the success formula?
>> Yeah. I mean, basically what we always did was look a few years out and ask, how can we help these enterprises future-proof what they're trying to achieve, right? They have, you know, 30 years of legacy software and, you know, baggage, and they have compliance and regulations. How do we help them move to the future? So we try to identify those kinds of secular trends, where maybe you see them a little bit right now, cloud was one of them, but it gets more and more and more. So we identified those, and there were sort of three or four of those that we kind of latched onto. And then every year that passes, we're a little bit more right, 'cause it's a secular trend in the market. And then eventually it becomes a force that you can't kind of fight anymore.
>> Yeah. And I just want to put in a plug for your Clubhouse talks with Andreessen Horowitz. You're always on Clubhouse talking about, you know, I won't say the killer instinct, but being a CEO in a time where there's so much change going on, you're constantly under pressure.
It's a lonely job at the top, I know that, but you've made some good calls. What were some of the key moments that you can point to, where you were like, okay, the wave is coming in now, we'd better get on it? What were some of those key decisions? 'Cause a lot of these startups want to be in your position, and a lot of buyers want to take advantage of the technology that's coming. They've got to figure it out. What were some of those key inflection points for you?
>> So if you're just listening to what everybody's saying, you're going to miss those trends. Then you're just going with the stream. So, one, you mentioned, was cloud. Cloud was a thing at the time; we thought it was going to be the thing that takes over everything. Today it's actually multi-cloud. So multi-cloud is a thing. More and more people are thinking, wow, I'm paying a lot to the cloud vendors, do I want to buy more from them, or do I want to have some optionality? So that's one. Two, open. They're worried about lock-in. You know, lock-in has happened for many, many decades. So they want open architectures, open source, open standards. So that's the second one that we bet on. The third one, which, you know, initially wasn't super obvious, was AI and machine learning. Now it's super obvious; everybody's talking about it. But when we started, it was kind of called artificial intelligence, which referred to robotics, and machine learning wasn't a term that people really knew about. Today, everybody's doing machine learning and AI. So betting on those future trends, those secular trends as we call them, is super critical.
>> And one of the things that I want to get your thoughts on is this idea of re-platforming versus refactoring. You see a lot being talked about in some of these. What does that even mean? People are trying to figure that out. Re-platforming, I get the cloud scale.
But as you look at the cloud benefits, what do you say to customers out there and enterprises that are trying to use the benefits of the cloud? Say data, for instance, is in the middle of it, how could they be thinking about refactoring? And how can they make a better selection on suppliers? I mean, how do you know? It used to be an RFP, you deliver these speeds and feeds and you get selected. Now I think there's a little bit different science and methodology behind it. What are your thoughts on this refactoring? As a buyer, what do I got to do? >> Well, I mean, let's start with, you said RFP and so on. Times have changed. Back in the day, you had to kind of sign up for something and then much later you were going to get it. So then you had to go through this arduous process. In the cloud, with the pay-as-you-go model, elasticity and so on, you can kind of try your way to it. You can try before you buy. And you can use more and more. You can go gradually, you don't need to go all in and, you know, say we commit to 50,000,000 and six months later find out that, wow, this stuff is shelfware, it doesn't work. So that's one thing that has changed, and it's beneficial. But the second thing is, don't just mimic what you had on prem in the cloud. So that's what this refactoring is about. If you had, you know, a Hadoop data lake, now you're just going to have an S3 data lake. If you had an on-prem data warehouse, now you're just going to have a cloud data warehouse. You're just repeating what you did on prem in the cloud. Architect it for the future. And you know, for us, the most important thing that we say is that this lakehouse paradigm is a cloud native way of organizing your data. That's different from how you would do things on premises. So think through what's the right way of doing it in the cloud. Don't just try to copy paste what you had on premises in the cloud. >> It's interesting, one of the things that we're observing, and I'd love to get your reaction to this. 
Dave Vellante and I have been reporting on it, is two personas in the enterprise are changing their organization. One is what I call IT ops, or there's an SRE role developing. And the data teams are being dismantled and being kind of sprinkled through into other teams, and there's this notion of data pipelining being part of workflows, not just the department. Are you seeing organizational shifts in how people are organizing their resources, their human resources, to take advantage of, say, the data problems that need to be solved with machine learning and whatnot and cloud scale? >> Yeah, absolutely. So you're right. SRE became a thing, lots of DevOps people. It was because when the cloud vendors launched their infrastructure as a service, to stitch all these things together and get it all working you needed a lot of DevOps people. But now things are maturing. So, you know, with vendors like Databricks and other multi-cloud vendors, you can actually get much higher level services where you don't need to necessarily have lots and lots of DevOps people that are themselves trying to stitch together lots of services to make this work. So that's one trend. But secondly, you're seeing more data teams being sort of completely ubiquitous in these organizations. Before, it used to be you have one data team and then we'll have data and AI and we'll be done. It's a one and done. But that's not how it works. That's not how Google, Facebook, Twitter did it, they had data throughout the organization. Every BU was empowered. It's sales, it's marketing, it's finance, it's engineering. So how do you embed all those data teams and make them actually run fast? And you know, there's this concept of a data mesh, which is super important, where you can actually decentralize and enable all these teams to focus on their domains and run super fast. And that's really enabled by this lakehouse paradigm in the cloud that we're talking about. Where you're open, you're basing it on open standards. 
You have flexibility in the data types and how they're going to store their data. So you kind of provide a lot of that flexibility, but at the same time, you have sort of centralized governance for it. So absolutely things are changing in the market. >> Well, you're just the professor, the masterclass right here is amazing. Thanks for sharing that insight. You're always got to go out of date and that's why we have you on here. You're amazing, a great resource for the community. Ransomware is a huge problem, it's now the government's focus. We're being attacked and we don't know where it's coming from. There's business models around cyber that are expanding rapidly. There's real revenue behind it. There's a data problem. It's not just a security problem. So one of the themes in all of these startup showcases is data is ubiquitous in the value propositions. One of them is ransomware. What are your thoughts on ransomware? Is it a data problem? Does cloud help? Some are saying that cloud's got better security with ransomware than, say, on premises. What's your vision of how you see this ransomware problem being addressed, besides the government taking over? >> Yeah, that's a great question. Let me start by saying, you know, we're a data company, right? And if you say you're a data company, you might as well have just said, we're a privacy company, right? It's like, some people say, well, what do you think about privacy? Do you guys even do privacy? We're a data company. So yeah, we're a privacy company as well. Like, you can't talk about data without talking about privacy. With every customer, with every enterprise. So that's obviously top of mind for us. 
I do think that in the cloud, security is much better because, you know, vendors like us, we're investing so many resources into security and making sure that we harden the infrastructure and, you know, by actually having all of this infrastructure, we can monitor it, detect if something is, you know, if an attack is happening, and we can immediately sort of stop it. So that's different from when it's on prem, where you have kind of like the separated duties where the software vendor, which would have been us, doesn't really see what's happening in the data center. So, you know, there's an IT team that didn't develop the software that is responsible for the security. So I think things are much better now. I think we're much better set up, but of course, things like cryptocurrencies and so on are making it easier for people to sort of hide. There are decentralized networks. So, you know, the attackers are getting more and more sophisticated as well. So that's definitely something that's super important. It's super top of mind. We're all investing heavily into security and privacy because, you know, that's going to be super critical going forward. >> Yeah, we got to move that red line, and figure that out and get more intelligence. The decentralized trend's not going away, it's going to be more of that, less of the centralized. But centralized does come into play with data. It's a mix, it's not mutually exclusive. And I'd like to get your thoughts on this. An architectural question, with, you know, 5G and the edge coming. Amazon's got that Outposts stringent, the Wavelength, you're seeing Mobile World Congress coming up this month. The focus on processing data at the edge is a huge issue. And enterprises are now going to be a commercial part of that. So architecture decisions are being made in enterprises right now. And this is a big issue. So you mentioned multi-cloud, so tools versus platforms. Now I'm an enterprise buyer and there's no more RFPs. 
I got all these new choices for startups and growing companies to choose from that are cloud native. I got all kinds of new challenges and opportunities. How do I build my architecture so I don't foreclose a future opportunity? >> Yeah, as I said, look, you're actually right. Cloud is becoming even more and more something that everybody's adopting, but at the same time, there is this thing that the edge is also more and more important. And the connectivity between those two, and making sure that you can really do that efficiently. My ask from enterprises, and I think this is top of mind for all the enterprise architects, is choose open, because that way you can avoid locking yourself in. So that's one thing that's really, really important. In the past, you know, all these vendors that locked you in, and then you try to move off of them, they were highly innovative back in the day. In the 80's and the 90's, they were the best companies. You gave them all your data and it was fantastic. But then because you were locked in, they didn't need to innovate anymore. And you know, they focused on margins instead. And then over time, the innovation stopped and now you were kind of locked in. So I think openness is really important. I think preserving optionality with multi-cloud, because we see the different clouds have different strengths and weaknesses and it changes over time. All right. Early on AWS was the only game, then Azure showed up with much better security, Active Directory, and so on. Now Google with AI capabilities. Which one's going to win, which one's going to be better? Actually, probably all three are going to be around. So having that optionality that you can pick between the three. And then artificial intelligence. I think that's going to be the key to the future. You know, you asked about security earlier. That's how people detect zero day attacks, right? You ask about the edge, same thing there, that's where the predictions are going to happen. 
So make sure that you invest in AI and artificial intelligence very early on, because it's not something you can just bolt on later on and have a little data team somewhere and then now you have AI and it's one and done. >> All right. Great insight. I've got to ask you, the folks may or may not know, but you're a professor at Berkeley as well, done a lot of great work. That's where you kind of came out of when Databricks was formed. And Berkeley basically invented distributed computing back in the 80's. I remember I was breaking in when Unix was proprietary, when software wasn't open, you actually had to deal under the table to get code. Now it's all open. It's the internet now, with distributed computing and how interconnects are happening. I mean, the internet didn't break during the pandemic, which proves the benefit of the internet. And that's a positive. But as you start seeing edge, it's essentially distributed computing. So I got to ask you, from a computer science standpoint, what do you see as the key learnings, or connect the dots for how this distributed model will work? I see hybrid, clearly, hybrid cloud is clearly the operating model, but if you take it to the next level of distributed computing, what are some of the key things that you look for in the next five years as this starts to be completely interoperable? Obviously software is going to drive a lot of it. What's your vision on that? >> Yeah, I mean, you know, so Berkeley, you're right, for the geeks, you know, there was the NOW project 20, 30 years ago that basically is how we do things. There was a project on how you search, very early on, with Inktomi, that became how Google and everybody else do search today. So Berkeley was super, super early, sometimes way too early. And that was actually the mistake. It was that they were so early that people said that that stuff doesn't work. And then 20 years later it was reinvented. 
So I think in 2009, Berkeley published "Above the Clouds," saying the cloud is the future. At that time, most industry leaders said, that's just, you know, that doesn't work. Today, recently, they published a research paper called "Sky Computing." So sky computing is what you get above the clouds, right? So we have the cloud as the future, the next level after that is the sky. That's one layer on top of them. That's what multi-cloud is. So that's a lot of the research at Berkeley, you know, in the distributed systems labs, is about this. And we're excited about that. And we're one of the sky computing vendors out there. So I think you're going to see much more innovation happening at the sky level than at the compute level, where you needed all those DevOps and SRE people to like, you know, build everything manually themselves. >> I can just see the memes now coming, Ali. Skynet, Star Trek. You've got space too, by the way, space is another frontier that is seeing a lot of action going on because now the surface area of data with satellites is huge. So again, I know you guys are doing a lot of business with folks in that vertical where you're starting to see real time data acquisition coming from these satellites. What's your take on the whole space as the, not the final frontier, but certainly as a new congested and contested space for data? >> Well, I mean, as a data vendor, we see a lot of, you know, alternative data sources coming in, and people are using machine learning and AI to eke signal out of the, you know, massive amounts of imagery that's coming out of these satellites. So that's actually pretty common in FinTech, which is a vertical for us. And also sort of in the public sector, lots of, lots of, lots of satellite imagery data that's coming. And these are massive volumes. I mean, it's like huge data sets and it's super, super exciting what they can do. 
Like, you know, extracting signal from the satellite imagery, and you know, being able to handle that amount of data, it's a challenge for all the companies that we work with. So we're excited about that too. I mean, definitely that's a trend that's going to continue. >> All right. I'm super excited for you. And thanks for coming on The Cube here for our keynote. I got to ask you a final question. As you think about the future, I see your company has achieved great success in a very short time, and again, you guys have done the work, I've been following your company, as you know. We've been breaking that Databricks story for a long time. I've been excited by it, but now what's changed? You got to start thinking about the next 20 mile stare when you look at, you know, the sky computing, you're thinking about these new architectures. As the CEO, your job is to, one, not run out of money, which you don't have to worry about anymore, so hiring. And then, you got to figure out that next 20 mile stare as a company. What's going on in your mind? Take us through your mindset of what's next. And what do you see out in that landscape? >> Yeah, so what I mentioned around sky computing, optionality around multi-cloud, you're going to see a lot of capabilities around that. Like, how do you get multi-cloud disaster recovery? How do you leverage the best of all the clouds while at the same time not having to just pick one? So there's a lot of innovation there that, you know, we haven't announced yet, but you're going to see a lot of it over the next many years. Things that you can do when you have the optionality across the different parts. And the second thing that's really exciting for us is bringing AI to the masses. Democratizing data and AI. So how can you actually apply machine learning to machine learning? How can you automate machine learning? Today machine learning is still quite complicated and it's pretty advanced. 
It's not going to be that way 10 years from now. It's going to be very simple. Everybody's going to have it at their fingertips. So how do we apply machine learning to machine learning? It's called AutoML, automatic, you know, machine learning. So that's an area, and that's not something that we're done with, right? But the goal is to eventually be able to automate away the whole machine learning engineer and the machine learning data scientist altogether. >> You know, what's really fun in talking with you is that, you know, for years we've been talking about this inside the ropes, inside the industry, around the future. Now people are starting to get some visibility, the pandemic's forced that. You're seeing the bad projects being exposed. It's like the tide pulled out and you see all the scabs and bad projects that were justified by old guard technologies. If you get it right you're on a good wave. And this is clearly what we're seeing. And you guys are an example of that. So as enterprises realize this, that they're going to have to double down on the right projects and probably trash the bad projects, new criteria, how should people be thinking about buying? Because again, we talked about the RFP before. I want to kind of circle back because this is something that people are trying to figure out. You're seeing, you know, organic, you come in, freemium models, as cloud scale becomes the advantage, and the lock-in frankly seems to be the value proposition. The more value you provide, the more lock-in you get. Which sounds like that's the way it should be, versus proprietary, you know, protocols. The protocol is value. How should enterprises organize their teams? Is it end to end workflows? And how should they evaluate the criteria for these technologies that they want to buy? >> Yeah, that's a great question. So, you know, it's very simple, try to future-proof your decision-making. Make sure that whatever you're doing is not locking you in. 
So whatever decision you're making, what if the world changes in five years? Make sure that if you're making a mistake now, that's not going to bite you five years later. So how do you do that? Well, open source is great. If you're leveraging open source, you can try it out already. You don't even need to talk to any vendor. Your teams can already download it and try it out and get some value out of it. If you're in the cloud, there's these pay-as-you-go models, you don't have to do a big RFP and commit big. You can try it, pay the vendor, pay as you go, $10, $15. It doesn't need to be a million dollar contract, and slowly grow as you're providing value. And then make sure that you're not just locking yourself in to one cloud or, you know, one particular vendor. As much as possible preserve your optionality, because then that's not a one-way door. If it turns out later you want to do something else, you can, you know, pick other things as well. You're not locked in. So that's what I would say. Keep that top of mind, that you're not locking yourself into a particular decision that you made today, that you might regret in five years. >> I really appreciate you coming on and sharing your insights with our community and The Cube. And as always, great to see you. I really enjoy your Clubhouse talks, and I really appreciate how you give back to the community. And I want to thank you for coming on and taking the time with us today. >> Thanks John, always appreciate talking to you. >> Okay, Ali Ghodsi, CEO of Databricks, a success story that proves the validation of cloud scale, open, and create value, value's the new lock-in. So Natalie, back to you for continuing coverage. >> That was a terrific interview John, but I'd love to get Dave's insights first. What were your takeaways, Dave? 
>> Well, if we have more time I'll tell you how Databricks got to where they are today, but I'll say this, the most important thing to me that Ali said was he conveyed a very clear understanding of what data companies are and are getting right. He talked about four things. There's not one data team, there's many data teams. And he talked about how data is decentralized, and data has to have context, and that context lives in the business. He said, look, think about it. The way that the data companies get it right, they get data in teams in sales and marketing and finance and engineering. They all have their own data and data teams. And he referred to that as a data mesh. That's a term that Zhamak Dehghani coined, and the warehouse or the data lake, it's merely a node in that global mesh. The mesh is discoverable, he talked about federated governance, and Databricks, they're breaking the model of shoving everything into a single repository and trying to make that the so-called single version of the truth. Rather what they're doing, which is right on, is putting data in the hands of the business owners. And that's how true data companies do. And the last thing he talked about was sky computing, which I loved, it's that future layer, we talked about multi-cloud a lot, that abstracts the underlying complexity of the technical details of the cloud and creates additional value on top. I always say that the cloud players like Amazon have given the gift to the world of 100 billion dollars a year they spend in CapEx. Thank you. Now we're going to innovate on top of it. Yeah. And I think the refactoring... >> Hope by John. >> That was great insight and I totally agree. The refactoring piece too was key, he brought that home. But to me, I think Databricks, what Ali shared there, and why he's been open and sharing a lot of his insights in the community. But what he's not saying, 'cause he's humble and polite, is they cracked the code on the enterprise, Dave. 
And to Dave's point, exactly the reason why they did it, they saw an opportunity to make it easier. At that time Hadoop was the rage, and they just made it easier. They were smart, they made good bets, they had a good formula and they cracked the code with the enterprise. They brought it in and they brought value. And see, that's the key to the cloud, as Dave pointed out. You replatform with the cloud, then you refactor. And I think he pointed out the multi-cloud, and that really kind of teases out the whole future and landscape, which is essentially distributed computing. And I think, you know, companies are starting to figure that out with hybrid and this on premises and now super edge, I call it, with 5G coming. So it's just pretty incredible. >> Yeah. Databricks' IPO is coming and people should know. I mean, they created Spark, as you know, John, and what everybody thought they were going to do is mimic Red Hat and sell subscriptions and support. They didn't, they developed a managed service and they embedded AI tools to simplify data science. So to your point, enterprises could buy instead of build, we know this. Enterprises will spend money to make things simpler. They don't have the resources, and so what they got right was really embedding that, building a managed service, not mimicking the kind of the Red Hat model, but actually creating a new value layer there. And that's a big part of their success. >> If I could just add one thing, Natalie, to that, what Dave's saying is really right on. And as an enterprise buyer, if we go to the other side of the equation, it used to be that you had to be a known company, get PR, you'd fill out RFPs, you had to meet all the speeds. It's like going to the airport and getting a swab test, and getting a COVID test, and all kinds of mechanisms to like block you and filter you. Most of the biggest success stories that have created the most value for enterprises have been the companies that nobody's understood. 
And Andy Jassy's famous quote of, you know, being misunderstood is actually a good thing. Databricks was very misunderstood at the beginning and no one kind of knew who they were, but they did it right. And so the enterprise buyers out there, don't be afraid to test the startups, because you know the next Databricks is out there. And I think that's where I see the psychology changing from the old IT buyers, Dave. It's like, okay, let's test this company. And there's plenty of ways to do that. He illuminated those: freemium, small pilots, you don't need to go on these big things. So I think that is going to be a shift in how companies are going to evaluate startups. >> Yeah. Think about it this way. Why should the large banks and insurance companies and big manufacturers and pharma companies, governments, why should they burn resources managing containers and figuring out data science tools if they can just tap into solutions like Databricks, which is an AI platform in the cloud, and let the experts manage all that stuff? Think about how much money and time that saves enterprises. >> Yeah, I mean, we've got 15 companies here we're showcasing in this batch and this season, if you call it that. That episode we are going to call it? They're awesome. Right? And the next 15 will be the same. And these companies could be the next billion dollar revenue generator, because the cloud enables that, Dave. I think that's the exciting part. >> Well, thank you both so much for these insights. Really appreciate it. AWS startup showcase highlights the innovation that helps startups succeed. And no one knows that better than our very next guest, Jeff Barr. Welcome to the show, and I will send this interview now to Dave and John and see you in just a bit. >> Okay, hey Jeff, great to see you. Thanks for coming on again. >> Great to be back. >> So this is a regular community segment with Jeff Barr, who's a legend in the industry. Everyone knows your name. Everyone knows that. 
Congratulations on your recent blog posts we've been reading. Tons of news, I want to get your update because 5G has been all over the news, Mobile World Congress is right around the corner. I know Bill Vass was a keynote out there, virtual keynote. There's a lot of Amazon discussion around the edge with Wavelength. Specifically, this is the Outposts piece. And I know there is news I want to get to, but the top of mind is there's massive Amazon expansion and the cloud is going to the edge, it's here. What's up with Wavelength? Take us through the, I call it the power edge, the super edge. >> Well, I'm really excited about this, mostly because it gives a lot more choice and flexibility and options to our customers. This idea that with Wavelength we announced quite some time ago, at least quite some time ago if we think in cloud years. We announced that we would be working with 5G providers all over the world to basically put AWS in the telecom providers' data centers or telecom centers, so that as their customers build apps, those apps would take advantage of the low latency, the high bandwidth, the reliability of 5G, and be able to get to some compute and storage services that are incredibly close geographically and latency wise. That compute and storage is just going to give customers this new power and say, well, what are the cool things we can build? >> Do you see any correlation between Wavelength and some of the early Amazon services? Because to me, my gut feels like there's so much headroom there. I mean, I was just riffing on the notion of low latency packets. I mean, just think about the applications, gaming and VR, and metaverse kind of cool stuff like that where having the edge be that how much power there. It just feels like a new, it feels like a new AWS. I mean, what's your take? You've seen the evolutions and the growth of a lot of the key services. Like EC2 and S3. >> So welcome to my life. 
And so to me, the way I always think about this is it's like when I go to a home improvement store and I wander through the aisles, and I often wander through with no particular thing that I actually need, but I just go there and say, wow, they've got this and they've got this, they've got this other interesting thing. And I just let my creativity run wild. And instead of trying to solve a problem, I'm saying, well, if I had these different parts, well, what could I actually build with them? And I really think that with this breadth of different services and locations and options and communication technologies, I suspect a lot of our customers and customers-to-be are in this same mode where they're saying, I've got all this awesomeness at my fingertips, what might I be able to do with it? >> It reminds me of when Fry's was around in Palo Alto, that store is no longer here, but it used to be, back in the day when it was good. It was, you go in and just kind of spend hours and then next thing you know, you built a computer. Like, what? I just came in here to get some cables. Now I got a motherboard. >> I clearly remember Fry's, and before that there was the Weird Stuff Warehouse, which was another really cool place to hang out, if you remember that. >> Yeah I do. >> I wonder if I could jump in, and you guys talking about the edge, and Jeff, I wanted to ask you about something that, I think people are starting to really understand and appreciate what you did with the Annapurna acquisition, what you do with Nitro and Graviton, and really driving costs down, driving performance up. I mean, there's like a compute renaissance. And I wonder if you could talk about the importance of that at the edge, because it's got to be low power, it has to be low cost. You got to be doing processing at the edge. What's your take on how that's evolving? 
>> Certainly, so you're totally right that we started working with and then ultimately acquired Annapurna Labs in Israel a couple of years ago. I've worked directly with those folks and it's really awesome to see what they've been able to do. Just really saying, let's look at all of these different aspects of building the cloud that were once effectively kind of somewhat software intensive and say, where does it make sense to actually design, build, fabricate, deploy custom silicon? So from booting up the system to doing all kinds of additional kinds of security checks, to running local IO devices, running the NVMe as fast as possible to support the EBS. Each of those things has been a contributing factor to not just the power of the hardware itself, but what I'm seeing and have seen for the last probably two or three years at this point is the pace of innovation on instance types just continues to get faster and faster. And it's not just cranking out new instance types because we can, it's because our awesomely diverse base of customers keeps coming to us and saying, well, we're happy with what we have so far, but here's this really interesting new use case. And we need a different ratio of memory to CPU, or we need more cores based on the amount of memory, or we need a lot of IO bandwidth. And having that Nitro as the base lets us really, I don't want to say plug and play, 'cause I haven't actually built this myself, but it seems like they can actually put the different elements together very, very quickly and then come up with new instance types where our customers just say, yeah, that's exactly what I asked for, and be able to just do this entire range from like micro and nano sized all the way up to incredibly large, with incredible, just to me like, when we talk about terabytes of memory that are just like actually just RAM memory. It's like, that's just an inconceivably large number by the standards of where I started out in my career. 
So it's all putting this power in customer hands. >> You used the term plug and play, but it does give you, that Nitro gives you that optionality. And the other thing that to me is really exciting is the way in which ISVs are writing to whatever's underneath. So you're making that, you know, transparent to the users so I can choose, as a customer, the best price performance for my workload, and that's just going to grow that ISV portfolio. >> I think it's really important to be accurate and detailed and as thorough as possible as we launch each one of these new instance types, with like, what kind of processor is in there and what clock speed does it run at? What kind of, you know, how much memory do we have? What are the, just the ins and outs, and is it Intel or Arm or AMD based? It's such an interesting contrast to me. I can still remember back in the very, very early days, you know, going back almost 15 years at this point, and effectively everybody said, well, not everybody. A few people looked and said, yeah, we kind of get the value here. Some people said, this just sounds like a bunch of generic hardware, just kind of generic hardware in a rack. And even back then it was something that we were very careful with, to design and optimize for use cases. But this idea that it's generic is so, so, so incredibly inaccurate that I think people are now getting this. And it's okay. It's fine-tuned, not just for the cloud, but for very specific kinds of workloads and use cases. >> And you guys have announced obviously the performance improvements on Lambda, that's getting faster, you got the per-second billing on Windows and SQL Server on EC2. So I mean, obviously everyone kind of gets that, that's been your DNA, keep making it faster, cheaper, better, easier to use. But the other area I want to get your thoughts on, because this is also more on the footprint side, is the regions and local regions. 
So you've got more region news. Take us through the update on the expansion of the AWS footprint, because, you know, a startup can come in, and these 15 companies that are here, they're global with AWS, right? So this is a major benefit for customers around the world. And you know, Ali from Databricks mentioned privacy. Everyone's a privacy company now. So it's a huge issue. Take us through the news on the regions. >> Sure, so the two most recent regions that we announced are in the UAE and in Israel. And we generally like to pre-announce these anywhere from six months to two years ahead of time, because we do know that customers want to start making longer-term plans about where they can do their computing and where they can store their data. I think at this point we now have seven regions under construction. And again, it's all about customer choice. Sometimes it's because they have very specific reasons where, based on local laws or national laws, they must compute and store within a particular geographic area. Other times we say, well, a lot of our customers are in this part of the world. Why don't we pick a region that is as close to that part of the world as possible? And one really important thing that I always like to remind our customers and my audience of is, anything that you choose to put in a region stays in that region unless you very explicitly take an action that says, I'd like to replicate it somewhere else. So if someone says, I want to store data in the US, or I want to store it in Frankfurt, or I want to store it in Sao Paulo, or I want to store it in Tokyo or Osaka, they get to make that very specific choice. We give them a lot of tools to help copy and replicate and do cross-region operations of various sorts. But at the heart, the customer gets to choose those locations. 
And in the early days I think there was this weird sense that you'd put things in the cloud and they would just mysteriously kind of propagate all over the world. That's never been true, and we're very, very clear on that. And I just always like to reinforce that point. >> That's great stuff, Jeff. Great to have you on again as a regular update here. Just for the folks watching who don't know Jeff, he's been blogging and sharing. He's been the one-man media band for Amazon since its early days. Now he's got departments, he's got people doing videos. It's a media franchise in and of itself, but without your rough days we wouldn't have gotten all the great news we subscribe to. We watch all the blog posts. It's essentially the flow coming out of AWS, which is just a tsunami of new announcements. Always great to read, a must read. Jeff, thanks for coming on, really appreciate it. That's great. >> Thank you John, great to catch up as always. >> Jeff Barr with AWS again, and follow his stuff. He's got a great audience and community. They talk back, they collaborate and they're highly engaged. So check out Jeff's blog and his social presence. All right, Natalie, back to you for more coverage. >> Terrific. Well, did you guys know that Jeff took a three-week AWS road trip across 15 cities in America to meet with cloud computing enthusiasts? 5,500 miles he drove. Really incredible, I didn't realize that. Let's unpack that interview though. What stood out to you, John? >> I think Jeff Barr's an example of what I call a direct-to-audience business model. He's been doing it from the beginning and I've been following his career. I remember back in the day when Amazon was started, he was always building stuff. He's a builder, he's classic. And he's been there from the beginning. At the beginning he was just the blog and it became a huge audience. It's now morphed; he was power blogging so hard. He now has support and he still does it. 
It's basically the conduit for information coming out of Amazon. I think Jeff has single-handedly made Amazon so successful at the community developer level, and that's where the startup action happened, and that got them going. And I think he deserves a lot of the credit for AWS's success. >> And Dave, how about you? What is your reaction? >> Well I think, you know, everybody knows about the cloud and CapEx to OpEx and agility, and you know, eliminating the undifferentiated heavy lifting and all that stuff. And one of the things that's often overlooked, which is why I'm excited to be part of this program, is the innovation. And the innovation comes from startups, and startups start in the cloud. And so I think that's part of the flywheel effect. You just don't see a lot of startups these days saying, okay, I'm going to do something that's outside of the cloud. There are some, but for the most part, you know, if you start in software, you're starting in the cloud; it's so capital efficient. I think that's one thing. Throughout my career, I've been obsessed with every part of the stack, whether it's, you know, close to the business process with the applications. And right now I'm really obsessed with the plumbing, which is why I was excited to talk about, you know, the Annapurna acquisition. Amazon bought Annapurna for $350 million, it's reported, you know, maybe a little bit more, but that is an amazing acquisition. And the reason why that's so important is because Amazon is continuing to drive costs down and drive performance up, and in my opinion, leaving a lot of the traditional players in their dust, especially when it comes to power and cooling. Those are often overlooked things. And the other piece of the interview was that Amazon is actually getting ISVs to write to these new platforms so that you don't have to worry about, does the software run on this chip or that chip, or x86 or Arm or whatever it is. It runs. 
And so I can choose the best price performance. And that's where people, they misunderstand. You always say it, John, you just said that people are misunderstood. I think they misunderstand, they confuse, you know, the price of the cloud with the cost of the cloud. They ignore all the labor costs that are associated with that. And so, you know, there's a lot of discussion now about the cloud tax. I just think the pace is accelerating. The gap is not closing, it's widening. >> If you look at the one question I asked him about Wavelength, and I had a follow-up there when I said, you know, we riff on it, and you see, he lit up, like he was beaming, because he said something interesting. It's not that there's a problem to solve, it's an opportunity. And he conveyed it, like I said, as walking through Fry's. Like, you go into a store, and he's a builder, so he sees opportunity. And this comes back down to the Martin Casado paradox post he wrote about: do you optimize for CapEx or future revenue? And I think the tell sign is that the Wavelength edge piece is going to be so creative, and that's going to open up massive opportunities. I think that's the place to watch. That's the place I'm watching. And I think startups are going to come out of the woodwork, because that's where the action will be. And that's just Amazon at the edge, I mean, that's just cloud at the edge. I think that is going to be very effective. And that's a little tell sign; he kind of revealed a little bit there, a lot there, with that comment. >> Well, that's a to-be-continued conversation. >> Indeed. I would love to introduce our next guest. We actually have Soma on the line. He's the managing director at Madrona Venture Group. Thank you, Soma, very much for coming on our keynote program. >> Thank you Natalie, and it's great to be here and to have the opportunity to spend some time with you all. >> Well, you have a long-tenured history in the enterprise. 
How would you define the modern enterprise, also known as cloud scale? >> Yeah, so I would say, first of all, like, you know, we've all heard this now for the last, you know, say 10 years or so: software is eating the world. Okay. Put another way, we think about it like, hey, every enterprise is a software company first and foremost. Okay. And companies that truly internalize that, that truly think about that, and truly act that way are going to sort of continue running well, and those that don't internalize that and don't do that are going to be left behind sooner rather than later. Right. And in the last few years you sort of take that to the next level and talk about, like, now every enterprise is going through a digital transformation. Okay. So when you sort of think about the world from that lens, okay, the modern enterprise has to think, like, hey, I am first and foremost a technology company. I may be in the business of making a car or, you know, manufacturing paper, or, you know, manufacturing some healthcare products or what have you out there. But technology and software is what is going to give me a unique, differentiated advantage that's going to let me do what I need to do for my customers in the best possible way [Indistinct]. So that sort of level of focus, level of execution, has to be there in a modern enterprise. The other thing is, like, now every modern enterprise needs to think about it like, I'm competing for talent not anymore with my peers in my industry. I'm competing for technology talent and software talent with the top five technology companies in the world, whether it is Amazon or Facebook or Microsoft or Google, or what have you, kind of thing, right? So you really have to have that mindset, and then everything flows from that. >> So I've got to ask you on the enterprise side again, you've seen many waves of innovation. You've, you know, been in the industry for many, many years. 
The old way was enterprises want the best proven product and the startups want that lucrative contract. Right? Yeah. And get that beachhead. And it used to be, and we addressed this in our earlier keynote with Ali and how it's changing, the buyers are changing because the cloud has enabled this new kind of execution. I call it agile, call it what you want. Developers are driving modern applications, so enterprises are still, there's no, the playbook's evolving. Right? So we saw that with the pandemic: people had needs, urgent needs, and they tried new stuff and it worked. The parachute opened, as they say. So how do you look at this as you look at the startups you're investing in and coaching? What's the playbook? What's the secret sauce of how to crack the enterprise code today? And if you're an enterprise buyer, what do I need to do? I want to be more agile. Is there a clear path? Is there a TSA to let stuff go through faster? I mean, what is the modern playbook for buying and being a supplier? >> That's a fantastic question, John, because I think that sort of playbook is changing even as we speak here. A couple of key things to understand, first of all, is, like, you know, decision-making inside an enterprise is getting more and more decentralized. Particularly decisions around what technology to use and what solutions to use to be able to do what people need to do. That decision making is no longer sort of, you know, all done in, like, the CEO's office or the CTO's office kind of thing. Developers are more and more, like you rightly said, sort of at the center of the workflow and the decision-making process. So it behooves both the enterprises as well as the startups to really understand that. So what does it mean now from a startup perspective? From a startup perspective, it means, like, in addition to thinking about, hey, do I go create an enterprise sales force, do I sell to the enterprise like I might have done in the past? 
Is that the best way of moving forward, or should I be thinking about a product-led growth go-to-market initiative? You know, build a product that is easy to use, where self-serve really works, you know, get the developers to start using it, to see the value, to fall in love with the product, and then you think about, like, hey, how do I go translate that into a contract with the enterprise. Right? And more and more, particularly, you know, startups and technology companies that are focused on the developer audience are thinking about, like, you know, how do I have a bottom-up go-to-market motion? And sometimes I may sort of, you know, overlap that with the top-down enterprise sales motion that we know has been going on for many, many years or decades kind of thing. But really this product-led growth, bottom-up go-to-market motion is something that we are seeing on the rise. I would say more than half the startups that we come across today have that in some way, shape or form. And so the enterprise also needs to understand this. The CIO or the CTO needs to know that, like, hey, now decision-making is getting decentralized. I need to empower my engineers and my engineering managers and my engineering leaders to be able to make the right decision, and trust them. I'm going to give them some guardrails so that I don't find myself in a soup, you know, sometime down the road. But once I give them the guardrails, I'm going to enable people to make the decisions, people who are closer to the problem, to make the right decision. >> Well Soma, what are some of the ways that startups can accelerate their enterprise penetration? >> I think that's another good question. First of all, you need to think about, like, hey, what are enterprises wanting to do? Okay. If you sort of take like two steps back and think about what the enterprise is really thinking about: I'm a software company, but I'm really manufacturing paper. What do I do? 
Right? The core thing that most enterprises care about is, like, hey, how do I better engage with my customers? How do I better serve my customers? And how do I do it in the most optimal way? At the end of the day that's what most enterprises really care about. So startups need to understand: what are the problems that the enterprise is trying to solve? What kind of tools and platform technologies and infrastructure support, and, you know, everything else do they need to be able to do what they need to do, and what only they can do, in the most optimal way? Right? So to the extent you are providing either a tool or platform or some technology that is going to enable your enterprise to make progress on what they want to do, you're going to get more traction within the enterprise. In other words, stop thinking about technology, and start thinking about the customer problem that they want to solve. And the more you anchor your company, and the more you anchor your conversation with the customer around that, the more the enterprise is going to get excited about wanting to work with you. >> So I've got to ask you on the enterprise and developer equation, because CSOs and CXOs, depending on who you talk to, have that same answer. Oh yeah, in the '90s and 2000s, we kind of throttled down, we were using the legacy developer tools, and cloud came and then we had to rebuild and we didn't really know what to do. So you're seeing a shift, and this has kind of been going on for at least the past five to eight years, with a lot more developers being hired. I mean, FinTech is clearly a vertical, they always had developers, and everyone had developers, but there's a fast ramp-up of developers now and the role of open source has changed. Just looking at the participation, they're not just consuming open source; open source is part of the business model for mainstream enterprises. How is this, first of all, do you agree? 
And if so, how has this changed the course of an enterprise's human resource selection? How they're organized? What's your vision on that? >> Yeah. So as I mentioned earlier, John, in my mind the first thing is, and this sort of, you know, like you said, financial services has always been sort of hiring people [Indistinct]. And this is like a five-year-old story. So bear with me, I'll tell you the five-year-old story and then come to the point. I was talking to the CIO of Goldman Sachs. Okay. And this is five years ago, when people were still like, hey, is this cloud thing real, and, you know, is cloud going to take over the world? You know, am I really ready to put my data in the cloud? So there were a lot of questions and conversations, kind of thing. The CIO of Goldman Sachs told me two things that I remember to this day. One is, hey, we've got an internal edict. We made a decision that in the next five years, everything in Goldman Sachs is going to be on the public cloud. And I literally jumped out of the chair and I said, like, you know, are you going to get there? And then he laughed and said, like, you know, it really doesn't matter whether we get there or not. We want to set the tone, set the direction for the organization that, hey, public cloud is here. Public cloud is there. And we need to, like, you know, move as fast as we realistically can and think about all the financial regulations and security and privacy, and all these things that we care about deeply. But given all of that, the world is going towards public cloud and we'd better be on the leading edge as opposed to the lagging edge. And the second thing he said, like, we were talking about, like, hey, how are you hiring, you know, engineers at Goldman Sachs, kind of thing? And he said, like, you know, hey, my team goes out to the top 20 schools in the US. And the people we really compete with are, and he was saying this, hey, we don't compete with JP Morgan or Morgan Stanley, or pick any of your favorite financial institutions. 
We really think about it like, hey, we want to get the best talent into Goldman Sachs out of these schools. And we really compete head to head with Google. We compete head to head with Microsoft. We compete head to head with Facebook. And we know that the caliber of people that we want to get is no different than what these companies want, if you want to continue being a successful, leading-edge, you know, financial services player. That sort of tells you what's going on. You also talked a little bit about, like, hey, open source is here to stay. What does that really mean, kind of thing. In my mind, like, you know, given my pedigree at Microsoft, I can tell you that we were not the first embracers of open source in this world. So I'll say that right off the bat. But having said that, we did turn around and say, like, hey, this open source is real, this open source is going to be great. How can we embrace it and how can we participate? And you fast forward to today, like, you know, Microsoft is probably as good at open source as probably any other large company, I would say. Right? Including, like, the work that the company has done in terms of acquiring GitHub and letting it stay true to its original promise of open source and community, kind of thing, right? I think Microsoft has come a long way, kind of thing. But the thing that, like, you know, all these enterprises need to think about is, you want your developers to have access to the latest and greatest tools, to the latest and greatest that the software can provide. And you really don't want your engineers to be reinventing the wheel all the time. So if there is something available in the open source world, go ahead, please sort of think about whether it makes sense for you to use it. And likewise, if you think there is something you can contribute to the open source world, go ahead and do that. 
So it's really a two-way symbiotic relationship that enterprises need to have, and they need to enable their developers to want to have that symbiotic relationship. >> Soma, fantastic insights. Thank you so much for joining our keynote program. >> Thank you Natalie and thank you John. It's always fun to chat with you guys. Thank you. >> Thank you. >> John, we would love to get your quick insight on that. >> Well I think, first of all, he's a prolific investor, a great one, from Madrona venture partners, which is well known in tech circles. They're in Seattle, which is the hub of what I call cloud city. You've got Amazon and Microsoft there. He'd been at Microsoft and he knows the developer ecosystem. And the reason why I like his perspective is that he understands the value of having developers as a core competency in Microsoft. That's their DNA. You look at Microsoft, their number one thing from day one besides software was developers. That was their army, the thousand centurions that won everything for them. That has shifted. And he brought up open source, and .NET and how they've embraced Linux. But Satya Nadella, before he became CEO, we interviewed him on theCUBE at an Accel Partners event at Stanford. He was open before he was CEO. He was talking about opening up. They opened up a lot of their open source infrastructure projects to the Open Compute Foundation early. So they had already had that going, and since that time, the stock price of Microsoft has skyrocketed because, as Ali said, open always wins. And I think that is what you see here, and as an investor now he's picking startups and investing in them. He's got to read the tea leaves. He's got to be on the right side of history. So he brings a great perspective because he sees the old way and he understands the new way. That is the key for success we've seen in the enterprise and with the startups. The people who get the future and can create the value are going to win. 
>> Yeah, really excellent point. And just really quickly, what do you think were some of our greatest hits on this hour of programming? >> Well first of all I'm really impressed that Ali took the time to come join us because I know he's super busy. I think they're at a $28 billion valuation now, they're pushing a billion dollars in revenue, GAAP revenue. And again, just a few short years ago, they had zero software revenue. So of these 15 companies we're showcasing today, you know, there's a next Databricks in there. They're all going to be successful. They already are successful. And they're all on this rocket ship trajectory. Ali is smart, he's also got the advantage of being part of that Berkeley community, which was early on a lot of things. Now, being early means you're wrong a lot, but you're also right, and you're right big. So Berkeley and Stanford are obviously big areas here in the Bay Area for research. He is smart, he's got a great team and he's really open. So having him share his best practices, I thought that was a great highlight. Of course, Jeff Barr highlighting some of the insights that he brings, and honestly, having the perspective of a VC. And we're going to have Peter Wagner from Wing VC, who's a classic enterprise investor, super smart. So he'll add some insight. Of course, there's the community session, with our influencers coming on at the end, as well as Katie Drucker, another Madrona person, who is going to talk about growth hacking, growth strategies, but yeah, sights Raleigh coming on. >> Terrific, well thank you so much for those insights and thank you to everyone who is watching the first hour of our live coverage of the AWS Startup Showcase. For myself, Natalie Ehrlich, John Furrier, and Dave Vellante, we want to thank you very much for watching, and do stay tuned for more amazing content, as well as a special live segment that John Furrier is going to be hosting. 
It takes place at 12:30 PM Pacific time, and it's called cracking the code, lessons learned on how enterprise buyers evaluate new startups. Don't go anywhere.

Published Date : Jun 24 2021


ENTITIES

Entity	Category	Confidence
Ali Ghodsi	PERSON	0.99+
Natalie Ehrlich	PERSON	0.99+
Dave	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Natalie	PERSON	0.99+
Jeff	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
John	PERSON	0.99+
Google	ORGANIZATION	0.99+
Osaka	LOCATION	0.99+
UAE	LOCATION	0.99+
Allie	PERSON	0.99+
Israel	LOCATION	0.99+
Peter Wagner	PERSON	0.99+
John Furrier	PERSON	0.99+
Facebook	ORGANIZATION	0.99+
Tokyo	LOCATION	0.99+
$10	QUANTITY	0.99+
Sao Paulo	LOCATION	0.99+
Goldman Sachs	ORGANIZATION	0.99+
Frankfurt	LOCATION	0.99+
Berkeley	ORGANIZATION	0.99+
Jeff Barr	PERSON	0.99+
Seattle	LOCATION	0.99+
$28 billion	QUANTITY	0.99+
Katie Drucker	PERSON	0.99+
$15	QUANTITY	0.99+
Morgan Stanley	ORGANIZATION	0.99+
Soma	PERSON	0.99+
2009	DATE	0.99+
Juan	PERSON	0.99+
$350 million	QUANTITY	0.99+
Ali	PERSON	0.99+
11 years	QUANTITY	0.99+

Yaron Haviv, Iguazio | KubeCon + CloudNativeCon NA 2019

>>Live from San Diego, California, it's theCUBE, covering KubeCon + CloudNativeCon, brought to you by Red Hat, the Cloud Native Computing Foundation and its ecosystem partners. >>Welcome back. This is theCUBE's coverage of KubeCon + CloudNativeCon 2019 in San Diego, 12,000 in attendance. I'm Stu Miniman and my cohost is John Troyer. And welcome back to the program, a multi-time CUBE alumni, Yaron Haviv, who is the CTO and cofounder of Iguazio. We've had quite a lot of, you know, founders, CTOs, you know, the big brains at this show, Yaron. So, you know, let's start. There's really a gathering, there's a lot of effort building out, you know, a very complicated ecosystem. Give us first kind of your overall impressions of the show and this ecosystem. Yeah, so we were very early on in this ecosystem. We were one of the first, in the first batch of CNCF members, when there were a few dozen of those, not like a thousand of those. So I've been to all those shows. >>We're part of the CNCF committees for different things. And I think this has become much more mainstream. I told you before, it's sort of the new VMworld. You know, a lot more of all the infrastructure vendors, along with middleware and application vendors, are coming here. All right, so one of the things we like about having you on the program, Yaron, is you don't pull any punches. So we've seen certain waves of technology come with big promise and fall short. You know, big data was going to allow us to leverage everything and, you know, a large percentage of solutions had to stop or be pulled back. Give us, what's the cautionary tale that we should learn to make sure that we don't repeat it? You know, so I've been a CTO for many years in different companies, and what everyone used to say about me is I'm always right. >>I'm only one year off, usually. I'm usually a little more optimistic. 
So, you know, we've been talking about Cloudera and the Hadoop world sort of going down, and Kubernetes and cloud services essentially replacing them. We were talking about it four years ago, and what do you see? That's actually happening. You know, with the collapse of MapR, and Hortonworks going to Cloudera, things are going down. Customers now tell us, guys, we need an equivalent solution for Kubernetes. We're not going to maintain two clusters. So I think in general we've been picking up on many of those trends. We invented serverless before it was even called serverless, with Nuclio, and now we're expanding it further, and now we see the new emerging trends really around machine learning and AI. That's sort of the big thing. No surprise, you know, that's our space, where essentially we're doing a data science platform as a service, fully automated around serverless constructs, so people can develop things really, really quickly. >>And what I see is that, you know, a third of the people I talk to have some relation to machine learning and AI. Yeah. Maybe explain that for our audience a little bit. Because, you know, when Kubernetes first started it was very much an infrastructure discussion, but the last year or two it's very much application specific. We hear many people talking about those data use cases, AI and ML. Early days, but, you know, how does that fit into the overall picture? It's simple. You know, if you're moving to the cloud there are two workloads. There are lift-and-shift workloads and there are new workloads. Okay, lift and shift: why bother moving them to Kubernetes? Okay, so you end up with new workloads. Everyone is trying to be cloud native, serverless, elastic services and all that. Everyone has to feed data and machine learning into those new applications. This is why you see those trends that talk about all the data integration, various frameworks and all that in that space. >>
I think it's, that's because new applications incorporate the intelligence. That's why you hear a lot of the talk about those things. What I loved about the architecture, what you just said is like people don't want to run into another cluster. I don't want to run two versions of Kubernetes, you know, if I'm moving there you, because you, but you're still built on that, that kind of infrastructure framework and, and knowledge of, of how to do serverless and how to make more nodes and fewer nodes and persistent storage and all that sort of good stuff and uh, and, and run TensorFlow and run, you know, all these, all these big data apps. But you can, um, you can talk about that just as a, as a, the advantage to your customer cause you could, it seems like you could, you could run it on top of GKE. >>You could run it on prem. I could run my own Coobernetti's you could, you could just give me a, uh, so >> we, we say Kubernetes is not interesting. I didn't know. I don't want anyone to get offended. Okay. But Kubernetes is not the big deal. The big deal is organizations want to be competitive in this sort of digital world. They need to build new applications. Old ones are sort of in sort of a maintenance mode. And the big point is about delivering new application with elastic scaling because your, your customers may, may be a million people behind some sort of, uh, you know, uh, app. Okay. Um, so that's the key thing and Kubernetes is a way to deliver those microservices. But what we figured out, it's still very complicated for people. Okay. Especially in, in the data science work. Uh, he takes him a few weeks to deliver a model on a Jupiter notebook, whatever. >>And then productizing it is about the year. That's something we've seen between six months to a year to productize things that are relatively simple. Okay. And that's because people think about the container, the TensorFlow, the Kuda driver, whatever, how to scale it, how to make it perform, et cetera. 
So what we came up with is: traditionally there's a notion of serverless, which is abstraction with very slow performance and a very limited set of use cases. We say serverless is about elastic scaling, pay-per-use, full automation around DevOps and all that. Okay. Why can't that apply to other use cases that are really high concurrency, high-speed batch, you know, distributed training, distributed workloads? Because we're coming, if you know my background, you know, I've been a VP at Mellanox and other high-performance companies. So we have a high-performance DNA, so we don't know how to build things that are extremely slow. >>It sort of irritates me. So the point is, how can we apply this notion of abstraction and scaling and all that to a variety of workloads, and this is essentially what it is. It is a combination of high-speed data technology for, like, you know, moving data around between those functions, and an extremely high-speed set of functions that work on the different domains of data collection and ingestion, data analytics, you know, machine learning training, and machine learning model serving. So a customer can come onto our platform, and we have testimonials around that, that, you know, things that they thought about building on Amazon or even on prem for months and months, they built on our platform in a few weeks with fewer people, because the focus is on building the application. The focus is not about tuning your Kubernetes. Now we go to customers, some of them are large banks, et cetera. >>They say, our IT likes Kubernetes, we have our own Kubernetes. So you know what, we don't bother. Initially we used to bring our own Kubernetes, but then, you know, I don't mind. You know, we do struggle sometimes, because our level of expertise in Kubernetes is way more sophisticated than what they have. They say, okay, we've installed Kubernetes, and we come with our software stack. 
No you didn't, you know, you didn't configure the security, you didn't configure ingress, et cetera. So sometimes it's easier for us to bring it, but we don't want to get into this sort of tension with IT. Our focus is to accelerate development of the new applications that are intelligent, you know, move applications from, if you think of the traditional data analytics and data science, it's about reporting and what people want to do. One application we announced this week is an application around real-time cyber collection. It's being used in some different governments, so that you can collect a lot of information: SMS, telephony, video, et cetera. >>And in real time you could detect terrorists. Okay. So those applications require high concurrency, always-on, rolling upgrades, things that weren't there in the traditional BI, Oracle, you know, kind of reporting. So you have this wave of putting intelligence into more highly concurrent online applications. It requires all the DevOps sort of aspects, but also all the data analytics and machine learning aspects to come along. Alright. So speaking of those workloads for machine learning, uh, Kubeflow is a project, uh, moving in that space along. Give us the update there. Yeah. So there is sort of a rising star in the Kubernetes community around how to automate machine learning workflows. That's Kubeflow. Uh, personally, I'm one of the committers in Kubeflow, and what we've done, because it's very complicated, 'cause Google developed Kubeflow as one of the services on GKE. >>Okay. And they tweaked everything. It works great in GKE, even though it's a relatively new technology, and people want to move it around in a more generic way. So one of the things in our platform is a managed Kubeflow that works natively with all the rest of the solutions. And the other thing that we've done is we made it fully. 
So the Kubeflow approach is very, you know, Kubernetes oriented: containers, the YAMLs, all that. Uh, in our flavor of Kubeflow you can just create functions, and you just, like, chain functions, and you click and it runs. You've mentioned a couple of times, uh, how does serverless, as you defined it, fit in with, uh, Kubernetes? Is that working together, just functions on top, or I'm just trying to make here, >> you'll hear different things. I think when most people say serverless, they mean sort of front-end application things that are served at low concurrency, et cetera. You know, uh, when we mean serverless, we have eight different engines, and each one is very good in a different, uh, domain, like distributed deep learning, you know, distributed machine learning, et cetera. >>And we know how to fit the thing into any workload. So for me, uh, we deliver the elastic scaling, the pay-per-use and the ease of use of sort of no DevOps across all the eight workloads that we're addressing. For most people it's like a single-trick pony. And I think really that the future is moving to that. And if you think about serverless, there's another aspect here which is very important for machine learning, and it's reusability. I'm not going to develop any algorithm in the world. Okay. There are a bunch of companies or users or developers that can develop an algorithm, and I can just consume it. So the future in data science, but not just data science, is essentially to have, like, marketplaces of algorithms, premade, or analytic tools, or maybe even vendors licensing their technology through sort of prepackaged solutions. >>So we're great believers in: forget about the infrastructure, focus on the business components, and daisy-chain them into a pipeline, like a Kubeflow pipeline, and run them. And that will allow you the most reusability, that, you know, lowest amount of cost, best performance, et cetera. That's great. 
I just want to double-click on the serverless idea one more time. So you're developing, it's an architectural pattern, uh, and you're developing these concepts yourself. Sometimes the concept gets confused with the implementations of other people's serverless frameworks or things like that. Is that correct? I think there is confusion. I get asked a lot of times, how do you compare your technology to, let's say, you've heard the term Knative, or OpenFaaS, or, yeah. Hold on. Pfizer's a CGIs or Alito. An open community is very nice for hobbies, but if you're an enterprise you need security, LDAP integration, authentication for everything, you need UIs, you need a CLI, you need all of those things. >>So Amazon provides that with Lambda. Can you compare Lambda to Knative? No. Knative is: I need to go from Git and build and all that. Serverless is about taking a function and clicking and deploying. It's not about building. And the problem is that this conference is about IT people, a crowd of people who like to build. So they don't like to get something that works. They want to get the Lego building blocks so they can play. So in our view, serverless is not OpenFaaS or Knative. Okay. It's something that you click and it works, and it has all the enterprise set of features. We've extended it to different orders of magnitude of performance. I'll give you an anecdote. I did a comparison for a customer asking me the same question, not about Knative, but this time Lambda. How do you guys compare with Lambda? >>Nuclio is extremely high performance. You know, we are doing up to 400,000 events on a single process, and the customer said, you know what, I have a use case. I need like 5,000 events per second in total across all my functions. How do you compare against Lambda? 
We went into, you know, the price calculator: 5,000 events per second on Lambda. That's $50,000, okay. $50,000. We do about, let's say, even in a simple function, 60,000 per process. A $500 VM on Amazon, a $500 VM on Amazon with our technology stack, 2000 transactions per second. 5,000 events per second on Lambda, that's 50,000. Okay. 100 times more expensive. So it depends on the design point. We designed our solution to be extremely efficient, high concurrency. If you just need something to do a webhook, use Lambda. You know, if you are trying to build a high-concurrency, efficient application, you know, an enterprise application on a serverless architecture construct, come to us. >>Yeah. So I'll pose this to you, because it reminds me of what you were talking about, the builders here. In the early days of VMware, to get it to work the way I wanted to, people needed to participate and build it, and there's the Ikea effect: if I actually helped build it a little bit, I like it more. To get the vast majority, uh, to adopt those things, it needs to become simplified, and I can't have, you know, all the applications move over to this environment if I have to constantly tweak everything. So that's the trend we've been really seeing this year: some of that simplification needs to get there. There's focus on, you know, the operators, the day-two operations, the applications, so that anybody can get there without having to build it themselves. So we know there's still work to be done. >>Um, but if we've crossed the chasm and we want the majority to now adopt this, it can't be that I have to customize it. It needs to be more turnkey. Yeah. And I think it's a different attitude between what you'll see at Amazon re:Invent in a couple of weeks and what you see here, because there, the focus is: we're building applications, what kind of tools is Andy Jassy going to launch today on the floor? Okay. 
So we can just consume it and build our new application. They're not thinking, how did Andy Jassy build his tools? Okay. And I think it's the opposite here. It's like, how is Istio working inside, underneath? Dude, who cares about Istio? You know, you care about having connectivity between two points and all that. How do you implement it? You know, let someone else take care of it, and then you can apply the few people that you have to solving your business problem, not to infrastructure. >>You know, I just met a guy who came to our booth and saw our demo. Pretty impressive, how we write a function and it scales and does everything automatically. He said, we want to build something like you're doing, you know, not really, like only 10% of what you just showed me. And we have about six people, and for three months we're just, like, scratching our heads. I said, okay, you can use our platform, pay us some software license, and now you'll get, you know, 10 times more functionality, and your six people can do something more useful. He says, right, let's do a POC. So that's our intention, and I think people are starting to get it, because Kubernetes is not easy. Again, people tell me, we installed Kubernetes, now install your stack, and then they haven't installed like 20% of all the things that you need to install. >>So, well, Yaron Haviv, always a pleasure to catch up with you. Thanks for all the updates, and I know we'll catch up with you again soon. Sure. All right. For John Troyer, I'm Stu Miniman. We'll be back with more coverage here from KubeCon CloudNativeCon in San Diego. Thanks for watching theCUBE.

Published Date : Nov 20 2019

ENTITIES

Entity	Category	Confidence
$50,000	QUANTITY	0.99+
John Troyer	PERSON	0.99+
John trier	PERSON	0.99+
$500	QUANTITY	0.99+
Stu Miniman	PERSON	0.99+
Andy	PERSON	0.99+
Nokia	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
three months	QUANTITY	0.99+
10 times	QUANTITY	0.99+
two points	QUANTITY	0.99+
San Diego	LOCATION	0.99+
50,000	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
six months	QUANTITY	0.99+
six people	QUANTITY	0.99+
San Diego, California	LOCATION	0.99+
two minute	QUANTITY	0.99+
Kubernete	TITLE	0.99+
Yaron Haviv	PERSON	0.99+
20%	QUANTITY	0.99+
100 times	QUANTITY	0.99+
Kubernetes	TITLE	0.99+
Lambda	TITLE	0.99+
Iguazio	PERSON	0.99+
one year	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
Pfizer	ORGANIZATION	0.99+
first	QUANTITY	0.99+
four years ago	DATE	0.99+
CNCF	ORGANIZATION	0.99+
two clusters	QUANTITY	0.98+
12,000	QUANTITY	0.98+
KubeCon	EVENT	0.98+
CubeCon	EVENT	0.98+
Jess	PERSON	0.97+
a year	QUANTITY	0.97+
Lego	ORGANIZATION	0.97+
last year	DATE	0.97+
CloudNativeCon	EVENT	0.97+
first batch	QUANTITY	0.97+
each one	QUANTITY	0.97+
today	DATE	0.96+
Desecco	ORGANIZATION	0.96+
weeks	QUANTITY	0.96+
5,000 events per second	QUANTITY	0.96+
Ali	PERSON	0.96+
two versions	QUANTITY	0.96+
one	QUANTITY	0.96+
two workloads	QUANTITY	0.95+
10%	QUANTITY	0.95+
two	QUANTITY	0.94+
Mellanox	ORGANIZATION	0.94+
dozens	QUANTITY	0.94+
Gwoza	ORGANIZATION	0.94+
5,000 events per second	QUANTITY	0.94+
single	QUANTITY	0.93+
third	QUANTITY	0.93+
up to 400,000 events	QUANTITY	0.93+
60,000 per process	QUANTITY	0.92+
this year	DATE	0.91+
this week	DATE	0.91+
a million people	QUANTITY	0.9+
Eve	PERSON	0.9+
5,000 events per second	QUANTITY	0.9+
Denon	ORGANIZATION	0.89+
2000 transactions per second	QUANTITY	0.88+
Alito	ORGANIZATION	0.87+
Aviv	PERSON	0.85+
about six people	QUANTITY	0.85+
Coobernetti	ORGANIZATION	0.85+
eight workloads	QUANTITY	0.84+
red hat	ORGANIZATION	0.83+
Hadoop	TITLE	0.82+
Cloudera	ORGANIZATION	0.81+
thousand	QUANTITY	0.79+
Canadian	LOCATION	0.79+

Colin Mahony, Vertica | MIT CDOIQ 2019

>> From Cambridge, Massachusetts, it's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts everybody, you're watching theCUBE, the leader in tech coverage. My name is Dave Vellante here with my cohost Paul Gillin. This is day one of our two day coverage of the MIT CDOIQ conference. CDO, Chief Data Officer, IQ, information quality. Colin Mahony is here, he's a good friend and long time CUBE alum. I haven't seen you in a while, >> I know >> But thank you so much for taking some time, you're like a special guest here >> Thank you, yeah it's great to be here, thank you. >> Yeah, so, this is not, you know, something that you would normally attend. I caught up with you, invited you in. This conference started as, like, back office governance, information quality, kind of wonky stuff, hidden. And then when the big data meme took off, kind of around the time we met, the Chief Data Officer role emerged, the whole Hadoop thing exploded, and then this conference kind of got bigger and bigger and bigger. Still intimate, but very high level, very senior. It's kind of come full circle as we've been saying, you know, information quality still matters. You have been in this data business forever, so I wanted to invite you in just to get your perspectives, we'll talk about what's new with what's going on in your company, but let's go back a little bit. When we first met and even before, you saw it coming, you kind of invested your whole career into data. So, take us back 10 years, I mean it was so different, remember it was Batch, it was Hadoop, but it was cool. There was a lot of cool >> It's still cool. (laughs) projects going on, and it's still cool. But, take a look back. >> Yeah, so it's changed a lot, look, I got into it a while ago, I've always loved data, I had no idea, the explosion and the three V's of data that we've seen over the last decade. 
But, data's really important, and it's just going to get more and more important. But as I look back I think what's really changed, and even if you just go back a decade I mean, there's an insatiable appetite for data. And that is not slowing down, it hasn't slowed down at all, and I think everybody wants that perfect solution that they can ask any question and get immediate answers to. We went through the Hadoop boom, I'd argue that we're going through the Hadoop bust, but what people actually want is still the same. You know, they want real answers, accurate answers, they want them quickly, and they want it against all their information and all their data. And I think that Hadoop evolved a lot as well, you know, it started as one thing 10 years ago, with MapReduce and I think in the end what it's really been about is disrupting the storage market. But if you really look at what's disrupting storage right now, public clouds, S3, right? That's the new data lake. So there's always a lot of hype cycles, everybody talks about you know, now it's Cloud, everything, for maybe the last 10 years it was a lot of Hadoop, but at the end of the day I think what people want to do with data is still very much the same. And a lot of companies are still struggling with it, hence the role for Chief Data Officers to really figure out how do I monetize data on the one hand and how do I protect that asset on the other hand. >> Well so, and the cool thing is, so this conference is not a tech conference, really. And we love tech, we love talking about this, this is why I love having you on. We kind of have a little Vertica thread that I've created here, so Colin essentially, is the current CEO of Vertica, I know that's not your title, you're GM and Senior Vice President, but you're running Vertica. So, Michael Stonebraker's coming on tomorrow, >> Yeah, excellent. >> Chris Lynch is coming on tomorrow, >> Oh, great, yeah. >> we've got Andy Palmer >> Awesome, yeah. 
>> coming up as well. >> Pretty cool. (laughs) >> So we have this connection, why is that important? It's because, you know, Vertica is a very cool company and is all about data, and it was all about disrupting, sort of the traditional relational database. It's kind of doing more with data, and if you go back to the roots of Vertica, it was like how do you do things faster? How do you really take advantage of data to really drive new business? And that's kind of what it's all about. And the tech behind it is really cool, we did your conference for many, many years. >> It's coming back by the way. >> Is it? >> Yeah, this March, so March 30th. >> Oh, wow, mark that down. >> At Boston, at the new Encore Hotel. >> Well we better have theCUBE there, bro. (laughs) >> Yeah, that's great. And yeah, you've done that conference >> Yep. >> haven't you before? So very cool customers, kind of leading edge, so I want to get to some of that, but let's talk the disruption for a minute. So you guys started with the whole architecture, MPP and so forth. And you talked about Cloud, Cloud really disrupted Hadoop. What are some of the other technology disruptions that you're seeing in the market space? >> I think, I mean, you know, it's hard not to talk about AI machine learning, and what one means versus the other, who knows right? But I think one thing that is definitely happening is people are leveraging the volumes of data and they're trying to use all the processing power and storage power that we have to do things that humans either are too expensive to do or simply can't do at the same speed and scale. And so, I think we're going through a renaissance where a lot more is being automated, certainly on the Vertica roadmap, and our path has always been initially to get the data in and then we want the platform to do a lot more for our customers, lots more analytics, lots more machine-learning in the platform. 
So that's definitely been a lot of the buzz around, but what's really funny is when you talk to a lot of customers they're still struggling with just some basic stuff. Forget about the predictive thing, first you've got to get to what happened in the past. Let's give accurate reporting on what's actually happening. The other big thing I think as a disruption is, I think IOT, for all the hype that it's getting it's very real. And every device is kicking off lots of information, the feedback loop of AB testing or quality testing for predictive maintenance, it's happening almost instantly. And so you're getting massive amounts of new data coming in, it's all this machine sensor type data, you got to figure out what it means really quick, and then you actually have to do something and act on it within seconds. And that's a whole new area for so many people. It's not their traditional enterprise data network warehouse and you know, back to your comment on Stonebraker, he got a lot of this right from the beginning, you know, and I think he looked at the architectures, he took a lot of the best in class designs, we didn't necessarily invent everything, but we put a lot of that together. And then I think the other thing you've got to do is constantly re-invent your platform. We came out with our Eon Mode to run cloud native, we just got rated the best cloud data warehouse from a net promoter score rating perspective, so, but we got to keep going you know, we got to keep re-inventing ourselves, but leverage everything that we've done in the past as well. >> So one of the things that you said, which is kind of relevant for here, Paul, is you're still seeing a real data quality issue that customers are wrestling with, and that's a big theme here, isn't it? >> Absolutely, and the, what goes around comes around, as Dave said earlier, we're still talking about information quality 13 years after this conference began. Have the tools to improve quality improved all that much? 
>> I think the tools have improved, I think that's another area where machine learning, if you look at Tamr, and I know you're going to have Andy here tomorrow, they're leveraging a lot of the augmented things you can do with the processing to make it better. But I think one thing that makes the problem worse now, is it's gotten really easy to pour data in. It's gotten really easy to store data without having to have the right structure, the right quality, you know, 10 years ago, 20 years ago, everything was perfect before it got into the platform. Right, everything was, there was quality, everything was there. What's been happening over the last decade is you're pumping data into these systems, nobody knows if it's redundant data, nobody knows if the quality's any good, and the amount of data is massive. >> And it's cheap to store >> Very cheap to store. >> So people keep pumping it in. >> But I think that creates a lot of issues when it comes to data quality. So, I do think the technology's gotten better, I think there's a lot of companies that are doing a great job with it, but I think the challenge has definitely upped. >> So, go ahead. >> I'm sorry. You mentioned earlier that we're seeing the death of Hadoop, but I'd like you to elaborate on that because (Dave laughs) Hadoop actually came up this morning in the keynote, it's part of what GlaxoSmithKline did. Came up in a conversation I had with the CEO of Experian last week, I mean, it's still out there, why do you think it's in decline? >> I think, I mean first of all if you look at the Hadoop vendors that are out there, they've all been struggling. I mean some of them are shutting down, two of them have merged and they've got killed lately. I think there are some very successful implementations of Hadoop. I think Hadoop as a storage environment is wonderful, I think you can process a lot of data on Hadoop, but the problem with Hadoop is it became the panacea that was going to solve all things data. 
It was going to be the database, it was going to be the data warehouse, it was going to do everything. >> That's usually the kiss of death, isn't it? >> It's the kiss of death. And it, you know, the killer app on Hadoop, ironically, became SQL. I mean, SQL's the killer app on Hadoop. If you want a SQL engine, you don't need Hadoop. But what we did was, in the beginning Mike sort of made fun of it, Stonebraker, and joked a lot about he's heard of MapReduce, it's called Group By, (Dave laughs) and that created a lot of tension between the early Vertica and Hadoop. I think, in the end, we embraced it. We sit next to Hadoop, we sit on top of Hadoop, we sit behind it, we sit in front of it, it's there. But I think what the reality check of the industry has been, certainly by the business folks in these companies is it has not fulfilled all the promises, it has not fulfilled a fraction of the promises that they bet on, and so they need to figure those things out. So I don't think it's going to go away completely, but I think its best success has been disrupting the storage market, and I think there's some much larger disruptions of technologies that frankly are better than HDFS to do that. >> And the Cloud was a gamechanger >> And a lot of them are in the cloud. >> Which is ironic, 'cause you know, Cloudera, (Colin laughs) they didn't really have a cloud strategy, neither did Hortonworks, neither did MapR and, it just so happened Amazon had one, Google had one, and Microsoft has one, so, it's just convenient to-- >> Well, how is that affecting your business? We've seen this massive migration to the cloud (mumbles) >> It's actually been great for us, so one of the things about Vertica is we run everywhere, and we made a decision a while ago, we had our own data warehouse as a service offering. It might have been ahead of its time, never really took off, what we did instead is we pivoted and we said "you know what? 
We're going to invest in that experience so it's a SaaS-like experience, but we're going to let our customers have full control over the cloud. And if they want to go to Amazon they can, if they want to go to Google they can, if they want to go to Azure they can." And we really invested in that and that experience. We're up on the Amazon marketplace, we have lots of customers running up on Amazon Cloud as well as Google and Azure now, and then about two years ago we went down and did this endeavor to completely re-architect our product so that we could separate compute and storage so that our customers could actually take advantage of the cloud economics as well. That's been huge for us, >> So you scale independent-- >> Scale independently, cloud native, add compute, take away compute, and for our existing customers, they're loving the hybrid aspect, they love that they can still run on premise, they love that they can run up on a public cloud, they love that they can run in both places. So we will continue to invest a lot in that. And it is really, really important, and frankly, I think cloud has helped Vertica a lot, because being able to provision hardware quickly, being able to tie in to these public clouds, into our customers' accounts, give them control, has been great and we're going to continue on that path. >> Because Vertica's an ISV, I mean you're a software company. >> We're a software company. >> I know you were a part of HP for a while, and HP wanted to mash that in and run it on its hardware, but software runs great in the cloud. And then to you it's another hardware platform. >> It's another hardware platform, exactly. >> So give us the update on Micro Focus, Micro Focus acquired Vertica as part of the HPE software business, how many years ago now? Two years ago? >> Less than two years ago. >> Okay, so how's that going, >> It's going great. >> Give us the update there. 
>> Yeah, so first of all it is great, HPE and HP were wonderful to Vertica, but it's great being part of a software company. Micro Focus is a software company. And more than just a software company it's a company that has a lot of experience bridging the old and the new. Leveraging all of the investments that you've made but also thinking about cloud and all these other things that are coming down the pike. I think for Vertica it's been really great because, as you've seen Vertica has gotten its identity back again. And that's something that Micro Focus is very good at. You can look at what Micro Focus did with SUSE, the Linux company, which actually you know, now just recently spun out of Micro Focus but, letting organizations like Vertica that have this culture, have this product, have this passion, really focus on our market and our customers and doing the right thing by them has been just really great for us and operating as a software company. The other nice thing is that we do integrate with a lot of other products, some of which came from the HPE side, some of which came from Micro Focus, security products is an example. The other really nice thing is we've been doing this insource thing at Micro Focus where we open up our source code to some of the other teams in Micro Focus and they've been contributing now in amazing ways to the product. In ways that we would just never be able to scale, but with 4,000 engineers strong in Micro Focus, we've got a much larger development organization that can actually contribute to the things that Vertica needs to do. And as we go into the cloud and as we do a lot more operational aspects, the experience that these teams have has been incredible, and security's another great example there. So overall it's been great, we've had four different owners of Vertica, our job is to continue what we do on the innovation side in the culture, but so far Micro Focus has been terrific. 
>> Well, I'd like to say, you're kind of getting that mojo back, because you guys as an independent company were doing your own thing, and then you did for a while inside of HP, >> We did. >> And that obviously changed, 'cause they wanted more integration, but, and Micro Focus, they know what they're doing, they know how to do acquisitions, they've been very successful. >> It's a very well run company, operationally. >> The SUSE piece was really interesting, spinning that out, because now RHEL is part of IBM, so now you've got SUSE as the lone independent. >> Yeah. >> Yeah. >> But I want to ask you, go back to a technology question, is NoSQL the next Hadoop? Are these databases, it seems to be that the hot fad now is NoSQL, it can do anything. Is the promise overblown? >> I think, I mean NoSQL has been out almost as long as Hadoop, and I, we always say not only SQL, right? Mike's said this from day one, best tool for the job. Nothing is going to do every job well, so I think that there are, whether it's key value stores or other types of NoSQL engines, document DB's, now you have some of these DB's that are running on different chips, >> Graph, yeah. >> there's always, yeah, graph DBs, there's always going to be specialty things. I think one of the things about our analytic platform is we can do, time series is a great example. Vertica's a great time series database. We can compete with specialized time series databases. But we also offer a lot of, the other things that you can do with Vertica that you wouldn't be able to do on a database like that. So, I always think there's going to be specialty products, I also think some of these can do a lot more workloads than you might think, but I don't see as much around the NoSQL movement as say I did a few years ago. >> But so, and you mentioned the cloud before as kind of, your position on it I think is a tailwind, not to put words in your mouth, >> Yeah, yeah, it's a great tailwind. 
>> You're in the Amazon marketplace, I mean they have products that are competitive, right? >> They do, they do. >> But, so how are you differentiating there? >> I think the way we differentiate, whether it's Redshift from Amazon, or BigQuery from Google, or even what Azure DB does is, first of all, Vertica, I think from, feature functionality and performance standpoint is ahead. Number one, I think the second thing, and we hear this from a lot of customers, especially at the C-level is they don't want to be locked into these full stacks of the clouds. Having the ability to take a product and run it across multiple clouds is a big thing, because the stack lock-in now, the full stack lock-in of these clouds is scary. It's really easy to develop in their ecosystems but you get very locked into them, and I think a lot of people are concerned about that. So that works really well for Vertica, but I think at the end of the day it's just, it's the robustness of the product, we continue to innovate, when you look at separating compute and storage, believe it or not, a lot of these cloud-native databases don't do that. And so we can actually leverage a lot of the cloud hardware better than the native cloud databases do themselves. So, like I said, we have to keep going, those guys aren't going to stop, and we actually have great relationships with those companies, we work really well with the clouds, they seem to care just as much about their cloud ecosystem as their own database products, and so I think that's going to continue as well. >> Well, Colin, congratulations on all the success >> Yeah, thank you, yeah. >> It's awesome to see you again and really appreciate you coming to >> Oh thank you, it's great, I appreciate the invite, >> MIT. >> it's great to be here. >> All right, keep it right there everybody, Paul and I will be back with our next guest from MIT, you're watching theCUBE. (electronic jingle)

Published Date : Jul 31 2019


ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Andy Palmer	PERSON	0.99+
Paul Gillin	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Colin Mahoney	PERSON	0.99+
Paul	PERSON	0.99+
Colin	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Vertica	ORGANIZATION	0.99+
Chris Lynch	PERSON	0.99+
HPE	ORGANIZATION	0.99+
Michael Stonebraker	PERSON	0.99+
HP	ORGANIZATION	0.99+
Micro Focus	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
Colin Mahony	PERSON	0.99+
last week	DATE	0.99+
Andy	PERSON	0.99+
March 30th	DATE	0.99+
NoSQL	TITLE	0.99+
Mike	PERSON	0.99+
Experian	ORGANIZATION	0.99+
tomorrow	DATE	0.99+
SQL	TITLE	0.99+
two day	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
Boston	LOCATION	0.99+
Cambridge, Massachusetts	LOCATION	0.99+
4,000 engineers	QUANTITY	0.99+
Two years ago	DATE	0.99+
SUSE	TITLE	0.99+
Azure DB	TITLE	0.98+
second thing	QUANTITY	0.98+
20 years ago	DATE	0.98+
10 years ago	DATE	0.98+
one	QUANTITY	0.98+
Vertica	TITLE	0.98+
Hortonworks	ORGANIZATION	0.97+
MapReduce	ORGANIZATION	0.97+
one thing	QUANTITY	0.97+

Ansa Sekharan, Informatica | Informatica World 2019

(upbeat music) >> Live from Las Vegas, it's theCUBE! Covering Informatica World 2019. Brought to you by Informatica. >> Welcome back to theCUBE, everyone. We are in the middle of two days of coverage of Informatica World here in Las Vegas. I'm your host, Rebecca Knight, along with my cohost, John Furrier. We are joined by Ansa Sekharan, he is the Executive Vice President and Chief Customer Officer at Informatica. Thanks so much for coming on theCUBE, Ansa. >> My pleasure to be back on theCUBE. >> Great to see you. >> Thank you. >> So, let's talk about your role as the Chief Customer Officer. Last year you announced this change from a customer service model to a customer success model. How has that been? How have you implemented it and how's it going? >> Now, we have a great opportunity ahead of us. You see a number of enterprises embarking on a data transformation journey. As we offer the best products, it was quite apparent we had to take the services to the next level. We had to take the services and connect them to customers' business values. So we are blurring the lines between the various services functions: support, professional services, university, customer success, we want to abstract them, along with their products, we want to offer the best value to the customers. It's very simple. We sign up a new customer. The first thing we want to do is to work with the customer and define the success plan. What does success mean to them? Success, in two words, business outcomes. It's not about go-lives. Are the business users adopting and realizing value? That's where Informatica is very different from other enterprises, and I think that's going to further fuel our growth in the future. >> Ansa, you've been in the industry a very long time, Informatica many many years, how many years? >> 23 and counting. >> So, I'd consider you a historian of Informatica. (speaks indistinctly) >> I never saw myself as a historian. >> You've seen the transformations. 
Talk about what's going on now because, and certainly going private affords a lot of good things, not being in the public eye anymore in terms of shot clock earnings, being on that treadmill. You guys really did a lot of digging in to innovate. Now four years later, you start to see the fruit coming off that tree in the form of good catalog decision with the catalog, cloud early, AI early, the horizontal scalability of the infrastructure now and one operating model. Interesting kind of tailwinds for you guys. What's going on? How do you talk to customers who have kind of been living in a cave, I don't want to say living in a cave, but they've not been as on the front end as you guys have been. >> I think when you use the word innovation it's not just about products. As a company we have been innovating. Along with the products, we have been innovating on all fronts, be it the services. We have, used to have, a major release every four years on services. We have shortened the cycle to two years. As a company we are now offering all our products on the cloud. What does it mean? What does it mean in customer support? We are having to redefine the entire delivery model end to end. You heard in the conference eight trillion transactions we process in a month. That has grown 3X just in a year. We have so much data. It's all about what is the information we can glean from these transactions. We have over a billion interactions with the customers every year. How can we put these transactions and interactions, package it in the form of we have the best telemetry products? We are leveraging this data to better serve the customers so that we can drive them, accelerate the business outcomes. When I started off we were a one product portfolio company. We had PowerCenter. Now we are the leader in six categories, and our user base is now not only IT but business, it's a great opportunity for us. 
>> The other thing that's a perfect storm, at least for innovation that's also happening, is the absolute validation that SaaS business models have agility benefits, meaning you can take risk using data, understanding data, to get big rewards if scaled properly with cloud, so the role of data in pure SaaS has been proven. Enterprises are recognizing that. Not that easy but still that's the path that people are now seeing clear visibility to. You guys are going after that. What's your take on that? >> I think when it comes to SaaS, I think customers realize they should be focusing more on their business processes, and push the technology side to the vendor. Try to partner with the vendor on how they can leverage on the technology side. That's where Informatica has put in a number of programs around that. Imagine a scenario, I'll give you a quick scenario. There's always this risk of putting this data on the cloud. What if you were to say, and there's upgrades every quarter, we push a lot of features and there's always the worry that something is going to break. We are going to come out with a program that's going to guarantee that we're going to foolproof the upgrades. Your stuff will work better, faster with every upgrade. That's the kind of, what customers expect. >> Guarantee that it won't break, basically? >> That's the kind of programs we're going to offer to our customers. We're going to have them for a day at scale, MDM is coming on the cloud, you saw the demos we showed yesterday. I think we are redefining our model and going to push the envelope further on. >> Are customers asking for that assurance or is it more of you guys going to make that a table stakes because it's an opportunity for you? >> Both. >> Okay. >> Within the company our philosophy is very simple. I'll say an equation, CS equal to IS, customer success is equal to Informatica success. In my humble opinion, we both need each other. >> Just like data and AI. A symbiotic relationship. 
So I want to get back to what you were saying in terms of how you are defining this kind of customer success. We're working together with customers to define the business outcome and then working to see, okay, how do we get there? You have a lot of great customers, many in the Fortune 500, 100. Tell us a little bit about what you've seen over the past year in terms of, maybe without naming names or name names if you want to, but in terms of how these companies have seen a difference since you've changed this model. >> We sell a platform. I think we're the only vendor which offers a platform for data management. There are a number of vendors with poor installations. Informatica is the only vendor which offers late inclusion data platforms. Customers buy into the vision because data is, everyone is looking to leverage the power of data. As they buy this platform, they work with us to see how should they approach. This blueprint needs to evolve. We need to define the building blocks. Should they start with the catalog, should they validate what their assets are? Where are we trying to push the service's frontiers that's not around technology? How can we help on the business processes side, as well? It's a big journey we are going to undertake and I think that's going to pay off big. I can quote a number of examples. I was sitting in a meeting this morning with a large bank and meeting up with the Chief Data Officer, and she kind of laid out her data strategy and we discussed how Informatica is going to be player owned. They are depending on us, and now we are going to keep our commitment, we are going to deliver on that promise we have made to them. >> How many customers do you guys see really thinking about data location storage where on premise versus cloud or are they more thinking differently around knowing that they're probably going to store it everywhere or somewhere? Can you share any insight into what the trends are there with your customers? 
>> Informatica is uniquely positioned: there are future workloads which go to the cloud. It's hard to change systems that are working, there's always going to be data on the premises. That shift, if something is working, customers don't quickly shut it down. So we see future workloads going to the cloud, traditional workloads, even we have a number of large clients still on mainframes. We offer the best products on mainframes as well as, it does not get much press, but-- >> This is the end-to-end benefits that you guys are-- >> Correct. We go all the way, we cover the entire gamut of the data spectrum. >> What's the key enabler to make that happen? Is it the catalog, what's the big-- >> Catalog was the big, I think, last year that was the turning point with the catalog coming in, and now through professional services we offer a lot of workshops at no cost to our customer on how they should put their strategy, as well. >> One of the things that I'm hearing from you is the importance of really understanding the business in addition to the technology. I'm interested to hear how you hire. Obviously we hear so much about the importance of technical talent, and the problem of the skills gap in Silicon Valley and beyond, but you obviously are looking for candidates who also really get the business. So, what are the kinds of things that you're looking for and what kind of problems do you see in terms of the candidates that you're getting for your open roles? >> Customer support could be a hard job. We really want to, we look for people who want to make a difference. And if you have that attitude you get plenty of opportunities to make a difference. Now, with so much talk about AI, service automation, chatbot, robotics, you know at the end of the day employees are still the core of the apple tree. I think the current trainers don't forget the people. 
The technology is not going to replace the people overnight, so I think we have a fabulous team at Informatica of customer support professionals. Our average retention rate is in the mid 90s. So, we hire the best people, and they stay with us because this is a great platform. They move around products, but as long as we can give them that spectrum to grow, over time as they serve customers they build that tribal knowledge, and they can serve them better. And so we look for, I mean, there's a lot of data scientists coming in. We look, we always hire from colleges, groom them. I started off that way, and still with the company 23 years. I want to give that chance for the rest of the team, as well. >> So how many other folks in the company have been there that long? That's a long time. You've been there a very, very long time. >> You'd be surprised at the number of people who have been long-timers at Informatica. It's a great company. >> How do you maintain the startup mentality? You were there when it was three years old, and now it's... >> I think personally what drives me is the fear of failure. Having set the bar high, you have to push, and if you want to keep at the pace you need to have the startup mentality. We have a number of projects in flight, and some, you have to have that mindset, and now we are a distributed team. We have to keep that spirit going throughout. And like I said, coming back to my equation, customer success equals Informatica success. That's what we believe as the company. >> You said CS is IS, customer success is. I mean, right? >> There you go. You made it sound even better. >> So just getting back to that, one of the biggest problems in the technology industry is the skills gap. Are you finding enough people to fill the roles you have? >> We do not have a problem hiring. The ramp up time, we have a good enablement program, which is good. Take the space of big data. 
The whole industry landscape changes every six months, so it's that mindset you need to have. Even I have that mindset today. I come in thinking I'm going to learn something new. Learning never stops. So you've just got to keep learning everyday. And I'm not setting expectations, we're going to groom them. I want people who learn on their own. They have to, they have to keep pace with the current technology. >> Any skills in school, kids in school that might, or parents watching with their kids, in high school or elementary school, what disciplines can they turn up, turn down, you think would make them successful in the future of how the data is going to impact society? There's a lot of new jobs coming out that don't have degrees for. Cal Berkeley just graduated their first inaugural class in data analytics. It's just a tell sign of how early it is, so still, you go back to sixth grade, you go back at the high school. Kids are looking to, they're gamers. They're into tech. They want to dial up some-- >> When I went to high school in 1984 I was the first batch of computer science, and we learned basic programming, things have really changed. My girls don't want to do computers, but it is something which we have to evolve constantly right, but-- >> Any classes right now that jump out at you that think, that's important? >> Data science is hard now, you know? >> A hard one. >> Yeah, it's hard. And with all the emphasis, we have a number of initiatives within support that will leverage AI, ML, as well. And I talked about it in the last year's program, but there could be some skills gap in some pockets, always you fill that that's going to be out of their pocket. You just got to be constantly pushing at it. >> Ansa, thank you so much for coming on theCUBE. >> It's a pleasure being on here, thank you. >> Thank you. >> Thank you, great job. >> I'm Rebecca Knight, for John Furrier, you are watching theCUBE's live coverage of Informatica World. Stay tuned. (upbeat music)

Published Date : May 22 2019


ENTITIES

Entity	Category	Confidence
Rebecca Knight	PERSON	0.99+
John Furrier	PERSON	0.99+
Ansa Sekharan	PERSON	0.99+
Informatica	ORGANIZATION	0.99+
1984	DATE	0.99+
3X	QUANTITY	0.99+
two years	QUANTITY	0.99+
last year	DATE	0.99+
Silicon Valley	LOCATION	0.99+
Las Vegas	LOCATION	0.99+
Cal Berkeley	ORGANIZATION	0.99+
first	QUANTITY	0.99+
Last year	DATE	0.99+
23 years	QUANTITY	0.99+
Ansa	PERSON	0.99+
yesterday	DATE	0.99+
two words	QUANTITY	0.99+
Both	QUANTITY	0.99+
six categories	QUANTITY	0.99+
two days	QUANTITY	0.99+
One	QUANTITY	0.99+
four years later	DATE	0.98+
first batch	QUANTITY	0.98+
both	QUANTITY	0.97+
a day	QUANTITY	0.97+
theCUBE	ORGANIZATION	0.97+
today	DATE	0.96+
sixth grade	QUANTITY	0.96+
one	QUANTITY	0.96+
SAS	ORGANIZATION	0.96+
a month	QUANTITY	0.96+
a year	QUANTITY	0.95+
one product	QUANTITY	0.94+
over a billion interactions	QUANTITY	0.94+
Chadbot	ORGANIZATION	0.94+
eight trillion transactions	QUANTITY	0.93+
mid 90s	DATE	0.93+
this morning	DATE	0.91+
23	QUANTITY	0.91+
Informatica World	ORGANIZATION	0.9+
World	TITLE	0.9+
three years old	QUANTITY	0.87+
2019	DATE	0.86+
Cube	ORGANIZATION	0.84+
six months	QUANTITY	0.83+
first inaugural class	QUANTITY	0.82+
every year	QUANTITY	0.76+
every four years	QUANTITY	0.73+
apple tree	ORGANIZATION	0.69+
100	QUANTITY	0.69+
Informatica World 2019	EVENT	0.65+
each	QUANTITY	0.65+
Ansa	ORGANIZATION	0.62+
past year	DATE	0.62+
Fortune 500	ORGANIZATION	0.54+
model	QUANTITY	0.5+

Ronen Schwartz, Informatica | theCUBE NYC 2018

>> Live from New York, it's theCUBE covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. (techy music) >> Welcome back to the Big Apple, everybody. This is theCUBE, the leader in live tech coverage. My name is Dave Vellante, I'm here with my cohost Peter Burris, and this is our week-long coverage of CUBENYC. It used to be, really, a big data theme. It sort of evolved into data, AI, machine learning. Ronen Schwartz is here, he's the senior vice president and general manager of cloud, big data, and data integration at data integration company Informatica. Great to see you again, Ronen, thanks so much for coming on. >> Thanks for inviting me, it's a good, warm day in New York. >> Yeah, the storm is coming and... Well, speaking of storms, the data center is booming. Data is this, you know, crescendo of storms (chuckles) have occurred, and you guys are at the center of that. It's been a tailwind for your business. Give us the update, how's business these days? >> So, we finished Q2 in a great, great success, the best Q2 that we ever had, and the third quarter looks just as promising, so I think the short answer is that we are seeing the strong demand for data, for technologies that support data. We're seeing more users, new use cases, and definitely a huge growth in need to support... To support data, big data, data in the cloud, and so on, so I think very, very good Q2 and it looks like Q3's going to be just as good, if not better. >> That's great, so there's been a decades-long conversation, of course, about data, the value of data, but more often than not over the history of recent history, when I say recent I mean let's say 20 years on, data's been a problem for people. It's been expensive, how do you manage it, when do you delete it? It's sort of this nasty thing that people have to deal with. Fast forward to 2010, the whole Hadoop movement, all of a sudden data's the new oil, data's... 
You know, which Peter, of course, disagrees with for many reasons. >> No, it's... >> We don't have to get into it. >> It's subtlety. >> It's a subtlety, but you're right about it, and well, maybe if we have time we can talk about that, but the bromide of... But really focused attention on data and the importance of data and the value of data, and that was really a big contribution that Hadoop made. There were a lot of misconceptions. "Oh, we don't need the data warehouse anymore. "Oh, we don't need old," you know, "legacy databases." Of course none of those are true. Those are fundamental components of people's big data strategy, but talk about the importance of data and where Informatica fits. >> In a way, if I look into the same history that you described, and Informatica have definitely been a player through this history. We divide it into three eras. The first one is when data was like this thing that sits below the application, that used the application to feed the data in and if you want to see the data you go through the application, you see the data. We sometimes call that as Data 1.0. Data 2.0 was the time that companies, including Informatica, kind of froze and been able to give you a single view of the data across multiple systems, across your organization, and so on, because we're Informatica we have the ETL with data quality, even with master data management, kind of came into play and allowed an organization to actually build analytics as a system, to build single view as a system, et cetera. I think what is happening, and Hadoop was definitely a trigger, but I would say the cloud is just as big of a trigger as the big data technologies, and definitely everything that's happening right now with Spark and the processing power, et cetera, is contributing to that. This is the time of the Data 3.0 when data is actually in the center. It's not a single application like it was in the Data 2.0. It's not this thing below the application in Data 1.0. 
Data is in the center and everything else basically just has to be connected to the data, and I think it's an amazing time. A big part of digitalization is the fact that the data is actually there. It's the most important asset the organization has. >> Yeah, so I want to follow up on something. So, last night we had a session Peter hosted on the future of AI, and he made the point, I said earlier data's the new oil. I said you disagreed, there's a nuance there. You made the point last night that oil, I can put oil in my car, I can put oil in my house, I can't do both. Data is the new currency, people said, "Well, I can spend a dollar or I can spend a dollar on sports tickets, I can't do both." Data's different in that... >> It doesn't follow the economics of scarcity, and I think that's one of the main drivers here. As you talk about 1.0, 2.0, and 3.0, 1.0 it's locked in the application, 2.0 it's locked in a model, 3.0 now we're opening it up so that the same data can be shared, it can be evolved, it can be copied, it can be easily transformed, but the big issue is we have to sustain overall coherence of it. Security has to remain in place, we have to avoid corruption. Talk to us about some of the new demands given, especially that we've got this, more data but more users of that data. As we think about evidence-based management, where are we going to ensure that all of those new claims from all of those new users against those data sources can be satisfied? >> So, first, I truly like... This is a big nuance, it's not a small one. (laughs) The fact that you have better data actually means that you do a lot of things better. It doesn't mean that you do one thing better and you cannot do the other. >> Right. >> I agree 100%, I actually attribute that to two things. 
One is more users, and the other thing is more ways to use the data, so the fact that you have better data, more data, big data, et cetera, actually means that your analytics is going to be better, right, but it actually means that if you are looking into hyperautomation and AI and machine learning and so on, suddenly this is possible to do because you have this data foundation that is big enough to actually support machine learning processes, and I think we're just in the beginning of that. I think we're going to see data being used for more and more use cases. We're in the integration business and in the data management business, and we're seeing, within what our customers are asking us to support, this huge growth in the number of patterns of how they want the data to be available, how they want to bring data into different places, into different users, so all of that is truly supporting what you just mentioned. I think if you look into the Data 2.0 timeframe, it was the time that a single team that is very, very strong with the right tools can actually handle the organization needs. In what you described, suddenly self-service. Can every group consume the data? Can I get the data in both batch and realtime? Can I get the data in a massive amount as well as in small chunks? These are all becoming very, very central. 
>> And very use case, but also user and context, you know, we think about time, dependent, and one of the biggest challenges that we have is to liberate the data in the context of the multiple different organization uses, and one of the biggest challenges that customers have, or that any enterprise has, and again, evidence-based management, nice trend, a lot of it's going to happen, but the familiarity with data is still something that's not, let's say broadly diffused, and a lot of the tools for ensuring that people can be made familiar, can discover, can reuse, can apply data, are modestly endowed today, so talk about some of these new tools that are going to make it easier to discover, capture, catalog, sustain these data assets? >> Yeah, and I think you're absolutely right, and if this is such a critical asset, and data is, and we're actually looking into more users consuming the data in more ways, it actually automatically creates a bottleneck in how do I find the data, how do I identify the data that I need, and how am I making this available in the right place at the right time? In general, it looks like a problem that is almost unsolvable, like I got more data, more users, more patterns, nobody has their budget tripled or quadrupled just to be able to consume it. How do you address that, and I think Informatica identified this growing need very early, and we have invested in a product that we call the enterprise data catalog, and it's actually... The concept of a catalog or a metadata repository, a place that you can actually identify all the data that exists, is not necessarily a new concept-- >> No, it's been around for years. >> Yes, but doing it in an enterprise-unified way is unique, and I think if you look into what we're trying to basically empower any user to do is basically, you know, we all use Google. You type something and you find it. 
If you're trying to find data in the organization in a similar way, it's a much harder task, and basically the catalog and Informatica unified, enterprise-unified catalog is doing that, leveraging a lot of machine learning and AI behind the scenes to basically make this search possible, make basically the identification of the data possible, the curation of the data possible, and basically empowering every user to find the data that he wants, see recommendation for other data that can work with it, and then basically consume the data in the way that he wants. I totally think that this will change the way IT is functioning. It is actually an amazing bridge between IT and the business. If there is one place that you can search all your data, suddenly the whole interface between IT and the business is changing, and Informatica's actually leading this change. >> So, the catalog gives you line-of-sight on all, (clears throat) all those data sources, what's the challenge in terms of creating a catalog and making it performant and useful? >> I think there are a few levels of the challenge. I chose the word enterprise-unified intelligent catalog deliberately, and I think each one of them is kind of representing a different challenge. The first challenge is the unified. There is technical metadata, this is the mapping and the processes that move data from one place to the other, then there is business metadata. These are the definition the business is using, and then there is the operational metadata as well, as well as the physical location and so on. Unifying all of them so that you can actually connect and see them in one place is a unique challenge that at this stage we have already completely addressed. The second one is enterprise, and when talking about enterprise metadata it means that you want all of your applications, you want application in the cloud, you want your cloud environment, your big data environment. 
You want, actually, your APIs, you want your integration environment. You want to be able to collect all of this metadata across the enterprise, so unified all the types, enterprise is the second one. The third challenge is actually the most exciting one, is how can you leverage intelligence so it's not limited by the human factor, by the amount of people that you have to actually put the data together, right? >> Mm-hm. >> And today we're using a very, very sophisticated, interesting algorithm to run on the metadata and be able to tell you that even though you don't know how the data got from here to here, it actually did get from here to here. >> Mm-hm. >> It's a dotted line, maybe somebody copied it, maybe something else happened, but the data is so similar that we can actually tell you it came from one place. >> So, actually, let me see, because I think there's... I don't think you missed a step, but let me reveal a step that's in there. One of the key issues in the enterprise side of things is to reveal how data's being used. The value of data is tied to its context, and having catalogs that can do, as you said, the unified, but also the metadata becomes part of how it's used makes that opportunity, that ability to then create audit trails and create lineage possible. >> You're absolutely right, and I think it actually is one of the most important things, is to see where the data came from and what steps it went through. >> Right. >> There's also one other very interesting value of lineage that I think sometimes people tend to ignore is who else is using it? >> Right. >> Who else is consuming it, because that is actually, like, a very good indicator of how good the data is or how common the data is. The ability to actually leverage and create this lineage is a mandatory thing. The ability to create lineage that is inferred, and not actually specifically defined, is also very, very interesting, but we're now doing, like, things that are, I think, really exciting. 
For example, let's say that a user is looking into a data field in one source and he is actually identifying that this is a certain, specific ID that his organization is using. Now we're able to actually automatically understand that this field actually exists in 700 places, and actually, leverage the intelligence that he just gave us and actually ask him, "Do you want it to be automatically updated everywhere? "Do you want to do it in a step-by-step, guided way?" And this is how you actually scale to handle the massive amount of data, and this is how organizations are going to learn more and more and get the data to be better and better the more they work with the data. >> Now, Ronan, you have hard news this week, right? Why don't you update us on what you've announced? >> So, I think in the context for our discussion, Informatica announced here, actually today, this morning in Strata, a few very exciting news that are actually helping the customer go into this data journey. The first one is basically supporting data across, big data across multi-clouds. The ability to basically leverage all of these great tools, including the catalog, including the big data management, including data quality, data governance, and so on, on AWS, on Azure, on GCP, basically without any effort needed. We're even going further and we're empowering our user to use it in a serverless mode where we're actually allowing them full control over the resources that are being consumed. This is really, really critical because this is actually allowing them to do more with the data in a lower cost. I think the last part of the news that is really exciting is we added a lot, a lot of functionality around our Spark processing and the capabilities of the things that you can do so that the developers, the AI and machine learning can use their stuff, but at the same time we actually empower business users to do more than they ever did before. 
So, kind of being able to expand the amount of users that can access the data, wanting a more sophisticated way, and wanting a very simple but still very powerful way, I think this is kind of the summary of the news. >> And just a quick followup on that. If I understand it, it's your full complement of functionality across these clouds, is that right? You're not neutering... (chuckles) >> That is absolutely correct, yes, and we are seeing, definitely within our customers, a growing choice to decide to focus their big data efforts in the cloud, it makes a lot of sense. The ability to scale up and down in the cloud is significantly superior, but also the ability to give more users access in the cloud is typically easier, so I think Informatica have chosen as the market we're focusing on enterprise cloud data management. We talked a lot about data management. This is a lot about the cloud, the cloud part of it, and it's basically a very, very focused effort in optimizing things across clouds. >> Cloud is critical, obviously. That's how a lot of people want to do business. They want to do business in a cloud-like fashion, whether it's on-prem or off-prem. A lot of people want things to be off-prem. Cloud's important because it's where innovation is happening, and scale. Ronan, thanks so much for coming on theCUBE today. >> Yeah, thank you very much and I did learn something, oil is not one of the terms that I'm going to use for data in the future. >> Makes you think about that, right? >> I'm going to use something different, yes. >> It's good, and I also... My other takeaway is, in that context, being able to use data in multiple places. Usage is a proportional relationship between usage and value, so thanks for that. >> Excellent. >> Happy to be here. >> And thank you, everybody, for watching. We will be right back right after this short break. You're watching theCUBE at #CUBENYC, we'll be right back. (techy music)
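
Editor's sketch: the inferred lineage described in the interview (a "dotted line" drawn because two data sets are so similar that they probably share an origin) can be illustrated with a simple value-overlap check. This is not Informatica's algorithm, which the interview does not detail; it assumes Jaccard similarity over column values, and the function names, column names, and the 0.8 threshold are all hypothetical.

```python
def jaccard(left, right):
    """Share of distinct values that two columns have in common (0.0 to 1.0)."""
    left, right = set(left), set(right)
    if not left and not right:
        return 0.0
    return len(left & right) / len(left | right)


def infer_lineage(columns, threshold=0.8):
    """Return candidate (column_a, column_b, score) links.

    `columns` maps a column name to its values. A link is only a hint that
    the two columns may share an origin; it carries no direction.
    """
    names = sorted(columns)
    links = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = jaccard(columns[first], columns[second])
            if score >= threshold:
                links.append((first, second, score))
    return links


# Hypothetical sample data: two copies of a customer ID and an unrelated field.
sample = {
    "crm.customer_id": [1, 2, 3, 4, 5],
    "warehouse.cust_id": [1, 2, 3, 4, 5],
    "hr.badge_number": [90, 91],
}
candidate_links = infer_lineage(sample)
```

A real catalog would compare profiles or samples rather than full columns, and would combine this signal with names, types, and usage before proposing a link to a user.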

Published Date : Sep 13 2018

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Ronan	PERSON	0.99+
Ronan Schwartz	PERSON	0.99+
Informatica	ORGANIZATION	0.99+
Peter	PERSON	0.99+
New York	LOCATION	0.99+
100%	QUANTITY	0.99+
Peter Burris	PERSON	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
20 years	QUANTITY	0.99+
Ronen Schwartz	PERSON	0.99+
700 places	QUANTITY	0.99+
2010	DATE	0.99+
third challenge	QUANTITY	0.99+
One	QUANTITY	0.99+
both	QUANTITY	0.99+
two things	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
a dollar	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
one source	QUANTITY	0.99+
first challenge	QUANTITY	0.98+
first one	QUANTITY	0.98+
today	DATE	0.98+
one	QUANTITY	0.98+
last night	DATE	0.98+
first	QUANTITY	0.98+
this week	DATE	0.97+
this morning	DATE	0.97+
second one	QUANTITY	0.97+
one place	QUANTITY	0.97+
Spark	TITLE	0.97+
3.0	OTHER	0.96+
single application	QUANTITY	0.96+
New York City	LOCATION	0.95+
single team	QUANTITY	0.93+
decades	QUANTITY	0.92+
2.0	OTHER	0.91+
each one	QUANTITY	0.91+
Hadoop	TITLE	0.9+
theCUBE	ORGANIZATION	0.89+
1.0	OTHER	0.89+
single view	QUANTITY	0.89+
third quarter	DATE	0.88+
Data 3.0	TITLE	0.85+
Data 2.0	TITLE	0.85+
Data 1.0	OTHER	0.84+
Q2	DATE	0.83+
Data 2.0	OTHER	0.83+
Azure	TITLE	0.82+
both batch	QUANTITY	0.81+
Big Apple	LOCATION	0.81+
NYC	LOCATION	0.78+
one thing	QUANTITY	0.74+
three eras	QUANTITY	0.74+
GCP	TITLE	0.65+
Q3	DATE	0.64+
Hadoop	PERSON	0.64+

Greg Fee, Lyft | Flink Forward 2018

>> Narrator: Live from San Francisco, it's theCUBE covering Flink Forward brought to you by Data Artisans. >> This is George Gilbert. We are at Data Artisans' conference Flink Forward. It is for the Apache Flink community, sponsored by Data Artisans, and all the work they're doing to move Flink Forward, and to surround it with additional value that makes building stream-processing applications accessible to mainstream companies. Right now though, we are not talking to a mainstream company, we're talking to Greg Fee from Lyft. Not Uber. (laughs) And Greg tell us a little bit about what you're doing with Flink. What's the first use case that comes to mind that really exercises its capabilities? >> Sure, yeah, so the process of adopting Flink at Lyft has really started with a use case, which was, we're trying to make machine learning more accessible across all of Lyft. So we already use machine learning in quite a few applications, but we want to make sure that we use machine learning as much as possible, we really think that's the path forward. And one of the fundamental difficulties with that is having consistent feature generation between these offline batch-y training scenarios and the online real-time streaming scenarios. And the unified processing engine of Flink really helps us bridge that gap, so. >> When you say unified processing engine, are you saying that the fact that you can manage code and data, as sort of an application version, and some of the, either code or data, is part of the model, and so you're versioning? >> That's even a step beyond what I'm talking about. >> Okay. >> Just the basic fundamental ability to have one piece of business logic that you can apply at the batch bulk layer, and in the real-time layer. >> George: Yeah. >> So that's sort of like the core of what Flink gives you. >> Are you running both batch and streaming on Flink? >> Yes, that's right. >> And using the, so, you're using the windows? 
Or just periodic execution on a stream to simulate batch? >> That's right. So we have, so feature generation crosses a broad spectrum of possible use cases in Flink. >> George: Yeah. >> And this is where we sort of transition more into what dA platform could give for us. So, we're looking to have thousands of different features across all of our machine learning models. So having a platform that can help us host many of these little programs running, help with the application life-cycle of each of these features, as we version them over time. So, we're very excited about what dA platform can do for us. >> Can you tell us a little more about how the stream processing helps you with the feature selection engineering, and is it that you're using streaming, or simulated batch, or batch using the same programming model to train these models, and you're using, you're picking up different derived data, is that how it's working? >> So, typical life-cycle is, it's going to be a feature engineering stage, so the data scientist is looking at their data, they're trying figure out patterns in the data, and they're going to, how you apply Flink there, is as you come up with potential algorithms for how you generate your feature, can run that through Flink, generate some data, apply machine learning model on top of it, and sort of play around with that data, prototype things. >> So, what you're doing is offline, or out of the platform, you're doing the feature selection and the engineering. >> Man: Right. >> Then you attach a stream to it that has just the relevant, perhaps, the relevant features. >> Man: Right. >> And then that model gets sort of, well maybe not yet, but eventually versioned as part of the application, which includes the application, the rest of the application logic and the data. >> Right. 
So, like some of the stuff that was touched on this morning at the keynotes, the versioning and maintaining machine learning applications, is a much, is a very complex ecosystem there. So being able to say, okay, going from the prototype stage, doing stuff in batch, to doing stuff in production, and real-time, then being able to version those over time, to move to better and better versions of the feature generation, is very important to us. >> I don't know if this is the most politically correct thing, but you just explained it better than everyone else we have talked to. >> Great. (laughs) >> About how it all fits together with the machine learning. So, once you've got that in place, it sounds like you're using the dA platform, as well as, you know, perhaps some extensions for machine learning, to sort of add that as a separate life-cycle, besides the application code. Then, is that going to be the enterprise-wide platform for deploying, developing and deploying, machine learning applications? >> Yes, certainly we think there's probably a broad ecosystem to do machine learning. It's a very, sort of, wide open area. Certainly my agenda is to push it across the company and get as many things running in this system as possible. I think the real-time aspects of it, a unifying aspect, of what Flink can give us, and the platform can give us, in terms of the life-cycles. >> So, are you set up essentially like where you're the, a shared resource, a shared service, which is the platform group? >> Man: Right. >> And then, all the business units, adopt that platform and build their apps on it. >> Right. So my initiative is part of a greater data science platform at Lyft, so, my goal is to have, we have hundreds of data scientists who are going to be looking at this data, giving me little features that they want to do, and we're probably going to end up numbering in the thousands of features, being able to generate all those, maintain all those little programs. 
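
Editor's sketch: the point made above, one piece of feature logic applied both in offline batch training and in the online real-time path, can be illustrated in plain Python. This does not use the Flink API and is not Lyft's code; the feature (a running ride count per user), the event shape, and every function name are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def ride_count_feature(events: Iterable[dict]) -> Iterator[Tuple[str, int]]:
    """The single piece of business logic: running ride count per user.

    Yields (user_id, count_so_far) after every event, so it works on a
    finite list (batch) and on an unbounded iterator (stream) alike.
    """
    counts: Dict[str, int] = {}
    for event in events:
        user = event["user_id"]
        counts[user] = counts.get(user, 0) + 1
        yield user, counts[user]


def batch_features(history: Iterable[dict]) -> Dict[str, int]:
    """Offline path: replay stored events and keep the final value per user."""
    latest: Dict[str, int] = {}
    for user, count in ride_count_feature(history):
        latest[user] = count
    return latest


# Hypothetical events; in production these would arrive from an event hub.
history = [{"user_id": "a"}, {"user_id": "b"}, {"user_id": "a"}]

offline = batch_features(history)                     # training-time view
online = list(ride_count_feature(iter(history)))      # value after each event
```

Because both paths call the same function, the value a model sees at training time matches what it would have seen online at the same point in the event sequence, which is the consistency problem described in the interview.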
>> And when you say generate all those little programs, that's the application logic, and the models specific to that application? >> That's right, well. >> Or is it this? >> There's features that are typically shared across many models. >> Okay. >> So there's like two layers of things happening. >> So you're managing features separately from the models. >> That's right. >> Interesting. Okay, haven't heard that. And is the application manager tooling going to help address that, or is that custom stuff that you have to do? >> So, I think there's, I think there's a potential that that's the way we're going to manage the model stuff as well, but it's still little new over there. >> That you put it on the application platform? >> Right. >> Then that's sort of at the boundary of what you're doing right now, or what you will be doing shortly. >> Right. It's all, it's a matter of use-case, whether it's online or offline, and how it fits best in with the rest of the Lyft engineering system. >> When you're talking about your application landscape, do you have lots of streaming applications that feed other streaming applications, going through a hub. Or, are they sort of more discrete, you know, artifacts, discrete programs, and then when do you keep, stay within the streaming processors, and when do you have it in a shared database? >> That's a, that's a lot of questions, kind of a deep question. So, the goal is to have a central hub, where sort of all of our event data passes through it, and that allows us to decouple. >> So that's to be careful, that's not a database central hub, that's a, like a? >> An event hub. >> Event hub. >> Right. >> Yeah, okay. >> So, an event hub in the middle allows us to decompose the different, sort of smaller programs, which again are probably going to number in the thousands, so that being able to have different parts of the company maintain their own part of the overall system is very important to us. 
I think we'll probably see Flink as a major player, in terms of how those programs run, but we'll probably be shooting things off to other systems like Druid, like Hive, like Presto, like Elasticsearch. >> As derived data? >> As all derived data, from these Flink jobs. And then also, pushing data directly out into some of our production systems to feed into machine learning decisions. >> Okay, this is quite, sounds like the most ambitious infrastructure that we've heard, in that it sounds like pretty ubiquitous. >> We want to be a machine-learning first company. So, it's everywhere. >> So, now help me clarify for me, when? Because this is, you know, for mainstream companies who've programmed with, you know, DBMS, as a shared state manager for decades, help explain to them when you would still use a DBMS for shared state, and when you would start using the distributed state that's embedded in Flink, and the derived data, you know, at the endpoints, at the sinks. >> So I mean, I guess this kind of gets into your exact, your use cases and, you know, your opinions and thoughts about how to use these things best, but. >> George: Your opinion is what we're interested in. >> Right. From where I'm coming, I see basically databases as potential one sink for this data. They do things very well, right? They do structured queries very well. You can have indices built off that, aggregates, really feed into a lot of visualization stuff. >> George: Yeah. >> But, from where I am sitting, like we're really moving away from databases as something that feeds production data. We've got other stores to do that, that are sort of more tailored towards those scenarios. >> When you say to feed production data, this is transaction capture, or data capture. >> Right. So we don't have a lot of atomic transactions, outside the payments at Lyft, most of the stuff is eventually consistent. So we have stores, more like Dynamo or Cassandra, HBase that feed a lot of our production data. 
>> And those databases, are they for like ambient information like influencing an interaction, it doesn't sound like automating a transaction. It would be, it sounds like, context that helps with analytics, but very separate from the OLTP apps. >> That's right. So we have, you can kind of bifurcate the company into the data that's used in production to make decisions that are like facing the user, and then our analytics back end, that really helps business analysts and like the executives make decisions about how we proceed. >> And so that second part, that backend, is more like operational efficiency. >> Man: Right. >> And coding new business processes to support new ways of doing business, but the customer-facing stuff specifically like with payments, that still needs a traditional OLTP. >> Man: Right. >> But they're not, those use cases aren't growing that much. >> That's right. So, basically we have very specific use-cases for like a traditional database, but in terms of capturing the types of scale, and the type of growth, we're looking for at Lyft, we think some of the other storage engines suit those better. >> So in that use-case, would the OLTP DBMS be at the front end, would it be a source, or a sink? It sounds like it's a source. >> So we actually do it both ways. Right, so, it's great to get our transactional data flowing through our streaming system, it's a lot of value in that, but also then pushing it out, back to some of the aggregate results to DBMS, helps with our analytics pipeline. >> Okay, okay. Well this is actually really interesting. So, where do you see the dA platform helping, you know, going forward; is it something you don't really need because you've built all that scaffolding to help with sort of application life-cycle management, or do you see it as something that'll help sort of push Flink sort of enterprise-wide? >> I think the dA platform really helps people sort of adopt Flink at an enterprise level. 
Maintaining the applications is a core part of what it means to run it as a business. And so we're looking at dA platform as a way of managing our applications, and I think, like I'm just talking about one, I'm mostly talking about one application we have for Flink at Lyft. >> Yeah. >> We have many other Flink programs actually running, that are sort of unrelated to my project. >> What about managing non-Flink applications? Do you need an application manager? Is it okay that it's associated with one service, or platform like Flink, or is there a desire you know among bleeding edge customers to have an overall, sort of infrastructure management, application management kind of suite. >> Yes, for sure. You're touching on something I have started to push inside of Lyft, which is the need for an overall application life-cycle management product that's not technology specific. >> Would these sort of plug into the dA platform and whatever the confluent, you know, equivalent is, or is it going to to directly tie to the, you know, operational capabilities, or the functional capabilities, not the management capabilities. In other words would it plug into like core Flink, core Kafka, core Spark, that sort of stuff? >> I think that's sort of largely to be determined. If you go back to sort of how distributed design system works, typically. We have a user plane, which is going to be our data users. Then you end up with the thing we're probably most familiar with, which is our data plane, technologies like Flink and Kafka and Hive, all those guys. What's missing in the middle right now is a control plane. It's a map from the user desire, from the user intention, to what we do with all of that data plane stuff. So launch a new program, maybe you need a new Kafka topic, maybe you need to provision in Kafka. 
Higher, you need to get some Flink programs running, and whether that talks directly talks to Flink, and goes against Kubernetes, or something like that, or whether it talks to a higher level, like more application-specific platform. >> Man: Yeah. >> I think, you know it's certainly a lot easier, if we have some of these platforms in the way. >> Because they give you better abstractions. >> That's right. >> To talk to the platforms. >> That's right. >> That's interesting. Okay, geesh, we learn something really, really interesting with each interview. I'm curious though, if you look out a couple years, how much of your application landscape will be continuous processing, and is that something you can see mainstream enterprises adopting, or has decades of work with, you know, batch and interactive sort of made people too difficult to learn something so radically new? >> I think it's all going to be driven by the business needs, and whether the value is there for people to make that transition 'cause it is quite expensive to invest in new infrastructure. For companies like Lyft, where we're trying to make decisions very quickly, you know, users get down to two seconds makes a difference for the customer, so we're trying to be as, you know, real-time as possible. I used to work at Salesforce. Salespeople are a little less sensitive to these things, and you know it's very, very traditional world. >> That's interesting. (background applauding) >> But even Salesforce is moving towards that style. >> Even Salesforce is moving? >> Is moving toward streaming processing. >> Really? >> George: So like, I think we're going to see it slowly be adopted across the big enterprises. >> George: I imagine that's probably for their analytics. >> That's where they're starting, of course, yeah. >> Okay. So, this was, a little more affirmation on to how we're going to see the control plane evolve, and the interesting use-cases that you're up to. I hope we can see you back next year. 
And you can tell us how far you've proceeded. >> I certainly hope so, yeah. >> This was really interesting. So, Greg Fee from Lyft. We will hopefully see you again. And this is George Gilbert. We're at the Data Artisans Flink Forward conference in San Francisco. We'll be back after this break. (techno music)
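
Editor's sketch: the control plane described in the interview is a map from user intention to actions on the data plane (for example, provisioning a Kafka topic and launching Flink programs). A minimal illustration of that mapping follows. It calls no real Kafka, Flink, or Kubernetes API; the `Intent` class, the step names, and the topic naming scheme are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Intent:
    """What a data user asks for, independent of any specific technology."""
    name: str
    needs_topic: bool = True
    jobs: List[str] = field(default_factory=list)


def plan(intent: Intent) -> List[Tuple[str, str]]:
    """Translate an intent into an ordered list of data-plane steps.

    Each step is (action, target). A real control plane would hand these
    to technology-specific platforms instead of returning them.
    """
    steps: List[Tuple[str, str]] = []
    if intent.needs_topic:
        # The topic must exist before any job that reads from it starts.
        steps.append(("create_topic", f"{intent.name}-events"))
    for job in intent.jobs:
        steps.append(("submit_job", job))
    return steps


steps = plan(Intent(name="pricing", jobs=["pricing-features"]))
```

Keeping the plan as plain data is one way to stay technology-neutral: the same intent could be executed directly against a cluster or delegated to a higher-level application platform, which is the open question raised in the interview.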

Published Date : Apr 12 2018

ENTITIES

Entity	Category	Confidence
George Gilbert	PERSON	0.99+
George	PERSON	0.99+
Greg	PERSON	0.99+
Greg Fee	PERSON	0.99+
Data Artisans	ORGANIZATION	0.99+
San Francisco	LOCATION	0.99+
Lyft	ORGANIZATION	0.99+
thousands	QUANTITY	0.99+
next year	DATE	0.99+
second part	QUANTITY	0.99+
Uber	ORGANIZATION	0.99+
each interview	QUANTITY	0.99+
Dynamo	ORGANIZATION	0.99+
Salesforce	ORGANIZATION	0.99+
Apache	ORGANIZATION	0.98+
Flink	ORGANIZATION	0.98+
one service	QUANTITY	0.98+
two layers	QUANTITY	0.98+
two seconds	QUANTITY	0.98+
each	QUANTITY	0.97+
thousands of features	QUANTITY	0.97+
both ways	QUANTITY	0.97+
Kafka	TITLE	0.93+
first-use case	QUANTITY	0.92+
one application	QUANTITY	0.92+
Druid	TITLE	0.92+
Flink Forward	TITLE	0.92+
decades	QUANTITY	0.91+
Elasticsearch	TITLE	0.89+
Data Artisans Flink Forward	EVENT	0.89+
one	QUANTITY	0.89+
Artisan	EVENT	0.87+
first company	QUANTITY	0.87+
hundreds of data scientists	QUANTITY	0.87+
both batch	QUANTITY	0.84+
one piece	QUANTITY	0.83+
2018	DATE	0.81+
Flink	TITLE	0.8+
Hive	TITLE	0.77+
Presto	TITLE	0.76+
this morning	DATE	0.75+
features	QUANTITY	0.74+
couple	QUANTITY	0.73+
Flink Forward	EVENT	0.69+
Hive	ORGANIZATION	0.65+
Spark	TITLE	0.62+
Kubernetes	ORGANIZATION	0.61+
Data	ORGANIZATION	0.6+
Cassandra HBase	ORGANIZATION	0.57+

Arun Murthy, Hortonworks | BigData NYC 2017

>> Host: Live from mid-town Manhattan, it's theCUBE covering the BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat electronic music) >> Welcome back, everyone. We're here, live, on day two of our three days of coverage of BigData NYC. This is our event that we put on every year. It's our fifth year doing BigData NYC in conjunction with Hadoop World which evolved into Strata Conference, which evolved into Strata Hadoop, now called Strata Data. Probably next year will be called Strata AI, but we're still theCUBE, we'll always be theCUBE and this is our BigData NYC, our eighth year covering the BigData world since Hadoop World. And then as Hortonworks came on we started covering Hortonworks' data summit. >> Arun: DataWorks Summit. >> DataWorks Summit. Arun Murthy, my next guest, Co-Founder and Chief Product Officer of Hortonworks. Great to see you, looking good. >> Likewise, thank you. Thanks for having me. >> Boy, what a journey. Hadoop, years ago, >> 12 years now. >> I still remember, you guys came out of Yahoo, you guys put Hortonworks together and then since, gone public, first to go public, then Cloudera just went public. So, the Hadoop World is pretty much out there, everyone knows where it's at, it's got a nice use case, but the whole world's moved around it. You guys have been, really the first of the Hadoop players, before ever Cloudera, on this notion of data in flight, or, I call, real-time data but I think, you guys call it data-in-motion. Batch, we all know what Batch does, a lot of things to do with Batch, you can optimize it, it's not going anywhere, it's going to grow. Real-time data-in-motion's a huge deal. Give us the update. 
>> Absolutely, you know, we've obviously been in this space, personally, I've been in this for about 12 years now. So, we've had a lot of time to think about it. >> Host: Since you were 12? >> Yeah. (laughs) Almost. Probably look like it. So, back in 2014 and '15 when we, sort of, went public and we started looking around, the thesis always was, yes, Hadoop is important, we're going to allow you to manage lots and lots of data, but a lot of the stuff we've done since the beginning, starting with YARN and so on, was really enable the use cases beyond the whole traditional transactions and analytics. And Rob, our CEO, calls it, his vision's always been we've got to get into a pre-transactional world, if you will, rather than the post-transactional analytics and BI and so on. So that's where it started. And increasingly, the obvious next step was to say, look enterprises want to be able to get insights from data, but they also want, increasingly, they want to get insights and they want to deal with it in real-time. You know while you're in your shopping cart. They want to make sure you don't abandon your shopping cart. If you were sitting at a retailer and you're in an aisle and you're about to walk away from a dress, you want to be able to do something about it. So, this notion of real-time is really important because it helps the enterprise connect with the customer at the point of action, if you will, and provide value right away rather than having to try to do this post-transaction. So, it's been a really important journey. We went and bought this company called Onyara, which is a bunch of geeks like us who started off with the government, built this Apache NiFi thing, huge community. It's just, like, taking off at this point. It's been a fantastic thing to join hands and join the team and keep pushing in the whole streaming data style. >> There's a real, I don't mean to tangent but I do since you brought up community I wanted to bring this up. 
It's been the theme here this week. It's more and more obvious that the community role is becoming central, beyond open-source. We all know open-source, standing on the shoulders before us, you know. And Linux Foundation showing code numbers hitting up from $64 million to billions in the next five, ten years, exponential growth of new code coming in. So open-source certainly blew me. But now community is translating to things you start to see blockchain, very community based. That's a whole new currency market that's changing the financial landscape, ICOs and what-not, that's just one data point. Businesses, marketing communities, you're starting to see data as a fundamental thing around communities. And certainly it's going to change the vendor landscape. So you guys compare to, Cloudera and others have always been community driven. >> Yeah our philosophy has been simple. You know, more eyes and more hands are better than fewer. And it's been one of the cornerstones of our founding thesis, if you will. And you saw how that's gone on over course of six years we've been around. Super-excited to have someone like IBM join hands, it happened at DataWorks Summit in San Jose. That announcement, again, is a reflection of the fact that we've been very, very community driven and very, very ecosystem driven. >> Communities are fundamentally built on trust and partnering. >> Arun: Exactly >> Coding is pretty obvious, you code with your friends. You code with people who are good, they become your friends. There's an honor system among you. You're starting to see that in the corporate deals. So explain the dynamic there and some of the successes that you guys have had on the product side where one plus one equals more than two. One plus one equals five or three. >> You know IBM has been a great example. 
They've decided to focus on their strengths, which are around Watson and machine learning, and for us to focus on our strengths around data management, infrastructure, cloud and so on. So this combination of DSX, which is their Data Science Experience, along with Hortonworks is really powerful. We are seeing that over and over again. Just yesterday we announced the whole Dataplane thing; we were super excited about it. And now to get IBM to say, we'll get in our technologies and our IP, big data, whether it's BigQuality or BigInsights or Big SQL, and the word has been phenomenal. >> Well, the Dataplane announcement, finally. People who know me know that I hate the term data lake. I always said it's always been a data ocean. So I get redemption, because now the data lakes, now it's admitting it's a horrible name, but just saying stitching together the data lakes, which is essentially a data ocean. Data lakes are out there and you can form these data lakes, or data sets, batch, whatever, but connecting them and integrating them is a huge issue, especially with security. >> And a lot of it is also just pragmatism. We started off with this notion of a data lake and said, hey, you've got too many silos inside the enterprise in one data center, you want to put them together. But then increasingly, as Hadoop has become more and more mainstream, I can't remember the last time I had to explain what Hadoop is to somebody. As it has become mainstream, a couple of things have happened. One is, we talked about streaming data. We see it all the time, especially with HDF. We have customers streaming data from autonomous cars. You have customers streaming from security cameras. You can put a small MiNiFi agent in a security camera or smartphone and stream it all the way back. Then you get into physics. You're up against the laws of physics. If you have a security camera in Japan, why would you want to move it all the way to California and process it?
You'd rather do it right there, right? So this notion of a regional data center becomes really important. >> And that talks to the edge as well. >> Exactly, right. So you want to have something in Japan that collects from all of the security cameras in Tokyo, and you do analysis and push what you want back here, right. So that's physics. The other thing we are increasingly seeing is with data sovereignty rules, especially things like GDPR: there are now regulatory reasons why data has to naturally stay in different regions. Customer data from Germany cannot move to France or vice versa, right. >> Data governance is a huge issue, and this is the problem I have with data governance. I am really looking for a solution, so if you can illuminate this it would be great. So there is going to be an Equifax out there again. >> Arun: Oh, for sure. >> And the problem is, is that going to force some regulation change? So what we see is, certainly on the mugi bond side, I see it personally is that, you can almost see that something else will happen that'll force some policy regulation or governance. You don't want to screw up your data. You also don't want to rewrite your applications or rewrite your machine learning algorithms. So there's a lot of waste potential by not structuring the data properly. Can you comment on what's the preferred path? >> Absolutely, and that's why we've been working on things like Dataplane for almost a couple of years now. Which is to say, you have to have data and policies which make sense, given a context. And the context is going to change by application, by usage, by compliance, by law. So now, to manage 20, 30, 50, a hundred data lakes, or would it be better not saying lakes, data ponds, >> Host: Any data. >> Any data. >> Any data pool, stream, river, ocean, whatever. (laughs) >> Jacuzzis. Data jacuzzis, right. So what you want is a holistic fabric. I like the term Forrester uses, you know; they call it the fabric. >> Host: Data fabric.
>> Data fabric, right? You want a fabric over these so you can actually control and maintain governance and security centrally, but apply it with context. Last but not least, you want to do this whether it's on-prem or in the cloud, or multi-cloud. So we've been working with a bank. They were primarily based in Germany, but for GDPR they had to stand up something in France now. They had French customers, but for a bunch of new reasons, regulation reasons, they had to stand up something in France. So they bring their own data center, then they had only the cloud provider, right, who I won't name. And they were great, things are working well. Now they want to expand a similar offering to customers in Asia. It turns out their favorite cloud vendor was not available in Asia, or was not available in a time frame which made sense for the offering. So they had to go with cloud vendor two. So now, although each of the vendors will do their job in terms of giving you all the security and governance and so on, the fact that you have to manage it three ways, one for on-prem and one each for cloud vendors A and B, was really hard, too hard for them. So this notion of a fabric across these things, that is Dataplane. And that, by the way, is backed by all the open source technologies we love, like Atlas and Ranger. By the way, that is also what IBM is betting on and what the entire ecosystem, but it seems like a no-brainer at this point. That was the kind of reason why we foresaw the need for something like a Dataplane, and obviously we couldn't be more excited to have something like that in the market today as a net new service that people can use. >> You get the catalogs, security controls, data integration. >> Arun: Exactly. >> Then you get the cloud, whatever, pick your cloud scenario, you can do that. Killer architecture, I liked it a lot. I guess the question I have for you personally is what's driving the product decisions at Hortonworks?
And the second part of that question is, how does that change your ecosystem engagement? Because you guys have been very friendly in a partnering sense and also very good with the ecosystem. How are you guys deciding the product strategies? Does it bubble up from the community? Is there an ivory tower, let's go take that hill? >> It's both, because what typically happens is, obviously, we've been in the community now for a long time. Working publicly now with well over 1,000 customers not only puts a lot of responsibility on our shoulders, but it's also very nice because it gives us a vantage point which is unique. That's number one. The second one is that, being in the community, we also see the fact that people are starting to solve the problems. So it's another elementary for us. So you have one as the enterprise side, where we see what the enterprises are facing, which is kind of where Dataplane came in, but we also saw in the community that people were starting to ask us, hey, can you do multi-cluster Atlas? Or multi-cluster Ranger? Put two and two together and you say there is a real need. >> So you get some consensus. >> You get some consensus, and you also see that on the enterprise side. Last but not least is when we went to friends like IBM and said, hey, we're doing this. This is where we can position this, right. So we can actually bring in IGSC, you can bring BigQuality and bring all these type, >> Host: So things had clicked with IBM? >> Exactly. >> Rob Thomas was thinking the same thing. Bring in the Power Systems and the horsepower. >> Exactly, yep. We announced something, for example; we have been working with the Power guys and NVIDIA for deep learning, right. That sort of stuff is what clicks if you're in the community long enough. If you have the vantage point of the enterprise long enough, it feels like the two of them click. And that's, frankly, my job. >> Great, and you've got obviously the landscape. The waves are coming in.
So I've got to ask you, the big waves are coming in and you're seeing people starting to get hip to the couple of key things that they've got to get their hands on. They need to have the big surfboards, metaphorically speaking. They've got to have some good products, big emphasis on real value. Don't give me any hype, don't give me a head fake. You know, I buy, okay, AI wash, and people can see right through that. Alright, that's clear. But AI's great. We all cheer for AI, but the reality is, everyone knows that's pretty much b.s., except that core machine learning is on the front edge of innovation. So that's cool, but value. (laughs) Hey, I've got to integrate and operationalize my data, so that's the big wave that's coming. Comment on the community piece, because enterprises now are realizing, as open source becomes the dominant source of value for them, they are now really going to the next level. It used to be like the emerging enterprises that knew open source. The guys will volunteer and they may not go deeper in the community. But now more people in the enterprises are in open source communities, they are recruiting from open source communities, and that's impacting their business. What's your advice for someone who's been in the community of open source? Lessons you've learned, what is the best practice, from your standpoint on philosophy, how to build into the community, how to build a community model. >> Yeah, I mean, at the end of the day, my best advice is to say, look, the community is defined by the people who contribute. So you get a voice if you contribute. That's the fundamental truth. Which means you have to get your legal policies and so on to a point where you can actually start to let your employees contribute. That kicks off a flywheel, where you can actually then go recruit the best talent, because the best talent wants to stand out. GitHub is a resume now. It is not a Word doc.
If you don't allow them to build that resume, they're not going to come by, and it's just a fundamental truth. >> It's self-governing, it's reality. >> It's reality, exactly. Right, and we see that over and over again. It's taken time but, as with things, the flywheel has changed enough. >> A whole new generation's coming online. If you look at the young kids coming in now, it is an amazing environment. You've got TensorFlow, all this cool stuff happening. It's just amazing. >> You know, 20 years ago that wouldn't have happened, because the Googles of the world wouldn't open source it. Now increasingly, >> The secret's out, open source works. >> Yeah, (laughs) shh. >> Tell everybody. You know, they know already, but this is changing some of how HR works and how people collaborate, >> And the policies around it. The legal policies around contribution, so, >> Arun, great to see you. Congratulations. It's been fun to watch the Hortonworks journey. I want to thank you and Rob Bearden for supporting theCUBE here in BigData NYC. If it wasn't for Hortonworks and Rob Bearden and your support, theCUBE would not be part of the Strata Data, which we are not allowed to broadcast into, for the record. O'Reilly Media does not allow theCUBE or our analysts inside their venue. They've excluded us and that's a bummer for them. They're a closed organization. But I want to thank Hortonworks and you guys for supporting us. >> Arun: Likewise. >> We really appreciate it. >> Arun: Thanks for having me back. >> Thanks and shout out to Rob Bearden. Good luck, and CPO, it's a fun job, you know, not the pressure. I got a lot of pressure. A whole lot. >> Arun: Alright, thanks. >> More Cube coverage after this short break. (upbeat electronic music)

Published Date : Sep 28 2017

Yaron Haviv, iguazio | BigData NYC 2017

>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, we're live in New York City, this is theCUBE's coverage of BigData NYC, this is our own event for five years now we've been running it, been at Hadoop World since 2010, it's our eighth year covering the Hadoop World which has evolved into Strata Conference, Strata Hadoop, now called Strata Data, and of course it's bigger than just Strata, it's about big data in NYC, a lot of big players here inside theCUBE, thought leaders, entrepreneurs, and great guests. I'm John Furrier, the cohost this week with Jim Kobielus, who's the lead analyst on our BigData and our Wikibon team. Our next guest is Yaron Haviv, who's with iguazio, he's the founder and CTO, hot startup here at the show, making a lot of waves on their new platform. Welcome to theCUBE, good to see you again, congratulations. >> Yes, thanks, thanks very much. We're happy to be here again. >> You're known in the theCUBE community as the guy on Twitter who's always pinging me and Dave and team, saying, "Hey, you know, you guys got to "get that right." You really are one of the smartest guys on the network in our community, you're super-smart, your team has got great tech chops, and in the middle of all that is the hottest market which is cloud native, cloud native as it relates to the integration of how apps are being built, and essentially new ways of engineering around these solutions, not just repackaging old stuff, it's really about putting things in a true cloud environment, with an application development, with data at the center of it, you got a whole complex platform you've introduced. So really, really want to dig into this. 
So before we get into some of my pointed questions, and I know Jim's got a ton of questions, give us an update on what's going on. You guys have some news here at the show, so let's get to that first. >> So since the last time we spoke, we've had tons of news. We're making revenues, we have customers, we've just recently GA'ed, and we recently got significant investment from major investors. We raised about $33 million recently from companies like Verizon Ventures, Bosch, you know, for IoT, Chicago Mercantile Exchange, which is Dow Jones and other properties, Dell EMC. So pretty broad. >> John: So customers, pretty much. >> Yeah, so that's the interesting thing. Usually, you know, investors are sort of strategic investors or partners or potential buyers, but here it's essentially our customers, for whom it's so strategic to the business, we want to... >> Let's go with the GA of the product, just get into what's shipping, what's available, what's the general availability, what are you now offering? >> So iguazio is trying to, you know, you alluded to cloud native and all that. Usually when you go to events like Strata and BigData, it's nothing to do with cloud native. It's a lot of hard labor, not really continuous development and integration; it's like continuous hard work. And essentially what we did, we created a data platform which is extremely fast and integrated, you know, it has all the different forms of state, streaming and events and documents and tables and all that, in a very unique architecture, which I won't dive into today. And on top of it we've integrated cloud services like Kubernetes and serverless functionality and others, so we can essentially create a hybrid cloud. So some of our customers even deploy portions in an OpEx-based setting in the cloud, and some portions at the edge or in the enterprise, deployed as software or even as a prepackaged appliance. So we're the only ones that provide a full hybrid experience.
>> John: Is this a SaaS product? >> So it's a software stack, and it could be delivered in three different options. One, if you don't want to mess with the hardware, you can just rent it, and it's deployed in an Equinix facility; we have very strong partnerships with them globally. If you want to have something on-prem, you can get a software reference architecture, and you go and deploy it. If you're a telco or an IoT player that wants it in a manufacturing facility, we have a very small 2U box, four servers, four GPUs, all the analytics tech you could think of. You just put it in the factory instead of, like, two racks of Hadoop. >> So you're not general purpose, you're just whatever the customer wants to deploy the stack, the flexibility is on them. >> Yeah. Now it is an appliance >> You have a hosting solution? >> It is an appliance even when you deploy it on-prem. It's a bunch of Docker containers inside that you don't even touch, you don't SSH to the machine. You have APIs and you have UIs, and just like the cloud experience when you go to Amazon, you don't open the kimono, you know, you just use it. So in our experience, that's what we're telling customers. No root access problems, no security problems. It's a hardened system. Give us servers, we'll deploy it, and you go through consoles and UIs, >> You don't host anything for anyone? >> We host for some customers, including >> So you do whatever the customer was interested in doing? >> Yes. (laughs) >> So you're flexible, okay. >> We just want to make money. >> You're pretty good, sticking to the product. So on the GA, so here essentially the big data world, you mentioned that there's data layers, like the data piece. So I've got to ask you the question, so pretend I'm an idiot for a second, right. >> Yaron: Okay. >> Okay, yeah. >> No, you're a smart guy. >> What problem are you solving? So we'll just go to the simple.
I love what you're doing, I assume you guys are super-smart, which I can say you are, but what's the problem you're solving, what's in it for me? >> Okay, so there are two problems. One is the challenge that everyone wants to transform. You know, there is this digital transformation mantra. And it means essentially two things. One is, I want to automate my operation environment so I can cut costs and be more competitive. The other one is, I want to improve my customer engagement. You know, I want to do mobile apps which are smarter, you know, get more direct content to the user, get more targeted functionality, et cetera. These are the two key challenges for every business, any industry, okay? So they go and they deploy Hadoop and Hive and all that stuff, and it takes them two years to productize it. And then they get to the data science bit. And by the time they've finished, they understand that this Hadoop thing can only do one thing. It's queries, and reporting and BI, and data warehousing. How do you get actionable insights from that stuff, okay? 'Cause actionable insights means I get information from the mobile app, and then I translate it into some action. I have to enrich the vectors, the machine learning, all those details. And then I need to respond. Hadoop doesn't know how to do it. So the first generation is people that pulled a lot of stuff into a data lake, and started querying it and generating reports. And the boss said >> Low cost data lake, basically, is what you're saying. >> Yes, and the boss said, "Okay, what are we going to do with this report? "Is it generating any revenue to the business?" No. The only revenue generation if you take this data >> You're fired, exactly. >> No, not all fired, but now >> John: Look at the budget >> Now they're starting to buy our stuff.
So now the point is, okay, how can I put in all this data, and at the same time generate actions, and also deal with the production aspects of, I want to develop in a beta phase, I want to promote it into production. That's cloud native architectures, okay? Hadoop is not cloud. How do I take a Spark, Zeppelin, you know, a notebook, and turn it into production? There's no way to do that. >> By the way, depending on which cloud you go to, they have a different mechanism and elements for each cloud. >> Yeah, so the cloud providers do address that because they are selling the package, >> Spans all the clouds, yeah. >> Yeah, so cloud providers are starting to have their own offerings, which are all proprietary, around this is how you would, you know, forget about HDFS, we'll have S3, and we'll have Redshift for you, and we'll have Athena, and again you're starting to consume that as a service. It still doesn't address the continuous analytics challenge that people have. And if you're looking at what we've done with Grab, which is amazing, they started with using Amazon services, S3, Redshift, you know, Kinesis, all that stuff, and it took them about two hours to generate the insights. Now the problem is they want to do driver incentives in real time. So they want to incent the driver to go and make more rides or other things, so they have to analyze the event of the location of the driver, the event of the location of the customers, and just throw messages back based on analytics. So that's real time analytics, and that's not something that you can do >> They've got to build that from scratch right away. I mean, they can't do that with the existing. >> No, and Uber invested tons of energy around that and they don't get the same functionality.
Another unique feature that we talk about in our PR >> This is for the use case you're talking about, this is the Grab, which is the car >> Grab is the number one ride-sharing in Asia, which is bigger than Uber in Asia, and they're using our platform. By the way, even Uber doesn't really use Hadoop, they use MemSQL for that stuff, so it's not really using open source and all that. But the point is for example, with Uber, when you have a, when they monetize the rides, they do it just based on demand, okay. And with Grab, now what they do, because of the capability that we can intersect tons of data in real time, they can also look at the weather, was there a terror attack or something like that. They don't want to raise the price >> A lot of other data points, could be traffic >> They don't want to raise the price if there was a problem, you know, and all the customers get aggravated. This is actually intersecting data in real time, and no one today can do that in real time beyond what we can do. >> A lot of people have semantic problems with real time, they don't even know what they mean by real time. >> Yaron: Yes. >> The data could be a week old, but they can get it to them in real time. >> But every decision, if you think if you generalize round the problem, okay, and we have slides on that that I explain to customers. Every time I run analytics, I need to look at four types of data. The context, the event, okay, what happened, okay. The second type of data is the previous state. Like I have a car, was it up or down or what's the previous state of that element? The third element is the time aggregation, like, what happened in the last hour, the average temperature, the average, you know, ticker price for the stock, et cetera, okay? And the fourth thing is enriched data, like I have a car ID, but what's the make, what's the model, who's driving it right now. That's secondary data. 
So every time I run a machine learning task or any decision, I have to collect all those four types of data into one vector, it's called a feature vector, and take a decision on that. You take Kafka, it's only the event part, okay; you take MemSQL, it's only the state part; you take Hadoop, it's only, like, the historical stuff. How do you assemble and stitch a feature vector? >> Well, you talked about a complex machine learning pipeline, so clearly, you're talking about a hybrid >> It's a prediction. And actions based on just dumb things, like the car broke and I need to send it to a garage, I don't need machine learning for that. >> So within your environment then, do you enable the machine learning models to execute across the different data platforms of which this hybrid environment is composed, and then do you aggregate the results of those model runs into some larger model that drives the real time decision? >> In our solution, everything is a document, so even a picture is a document, a lot of things. So you can essentially throw in a picture, run TensorFlow, embed more features into the document, and then query those features on another platform. So that's really what makes this continuous analytics extremely flexible, so that's what we give customers. The first thing is simplicity. They can now build applications. You know, we have a tier one automotive customer now, the CIO coming, meeting us. He says, you know, when I have a project, it's one year, I need to hire dozens of people, it's hugely complex, you know. We said, tell us what's the use case, and we'll build a prototype. >> John: All right, well I'm going to >> In one week, we gave them a prototype, and he was amazed how in one week we created an application that analyzed all the streams of data from the cars, did enrichment, did machine learning, and provided predictions. >> Well, we're going to have to come in and test you on this, because I'm skeptical, but here's why.
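The feature-vector idea Yaron describes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_feature_vector`, the car IDs, and the in-memory dictionaries are all hypothetical stand-ins for the event stream, state store, history, and lookup tables he mentions, and none of it is iguazio's API.

```python
from collections import deque
from statistics import mean

# Hypothetical in-memory stand-ins for the separate systems mentioned
# in the interview (event stream, state store, history, lookup tables).
ENRICHMENT = {"car-17": {"make": "Acme", "model": "Roadster"}}
last_state = {}      # previous status per car
recent_temps = {}    # sliding window of recent readings per car


def build_feature_vector(event, window=3):
    """Stitch the four kinds of data into one feature vector."""
    car_id = event["car_id"]
    temps = recent_temps.setdefault(car_id, deque(maxlen=window))
    temps.append(event["temp"])
    vector = {
        "temp": event["temp"],                             # 1. the event itself
        "prev_status": last_state.get(car_id, "unknown"),  # 2. previous state
        "avg_temp": mean(temps),                           # 3. time aggregation
        **ENRICHMENT.get(car_id, {}),                      # 4. enrichment data
    }
    last_state[car_id] = event["status"]  # remember state for the next event
    return vector


first = build_feature_vector({"car_id": "car-17", "temp": 90, "status": "up"})
second = build_feature_vector({"car_id": "car-17", "temp": 100, "status": "down"})
```

In this sketch the second vector carries the first event's status as its previous state and an average over both readings, which is the stitching step that, per the interview, is hard when each kind of data lives in a different system.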
>> We'll get to that. I mean, I'm probably not skeptical, but I kind of am because the history is pretty clear. If you look at some of the big ideas out there, like OpenStack, I mean, that thing just morphed into a beast. Hadoop was a cost of ownership nightmare, as you mentioned early on. So people have been conceptually correct on what they were trying to do, but trying to get it done was always hard, and then it took a long time to kind of figure out the operational model. So how are you different, if I'm going to play the skeptic here? You know, I've heard this before. How are you different than, say, OpenStack or Hadoop clusters, 'cause that was a nightmare, cost of ownership, I couldn't get the type of value I needed, lost my budget. Why aren't you the same? >> Okay, that's interesting. I don't know if you know, but I ran a lot of development for OpenStack and Hadoop when I was at Mellanox, so I patched a lot of those >> So do you agree with what I said? That that was a problem? >> They are extremely complex, yes. And I think one of the things is that, first, OpenStack tried to bite off too much, and it's sort of a huge tent, everyone tries to push his agenda. OpenStack is still an infrastructure layer, okay. And also Hadoop is sort of something in between an infrastructure and an application layer, but it was designed 10 years ago, when the problem that Hadoop tried to solve was how do you do web ranking, okay, on tons of batch data. And then the ecosystem evolved into real time, and streaming and machine learning. >> A data warehousing alternative or whatever. >> So it doesn't fit the original model of batch processing, 'cause if an event comes from the car or an IoT device, and you have to do something with it, you need a table with an index. You can't just go and build a huge Parquet file. >> You know, you're talking about complexity >> John: That's why he's different. >> Go ahead.
>> So what we've done with our team, after knowing OpenStack and all those >> John: All the scar tissue. >> And all the scar tissue, and my role was also working with all the cloud service providers, so I know their internal architecture, and I worked on SAP HANA and Exadata and all those things, so we learned from the bad experiences and said, let's forget about the lower layers, which is what OpenStack is trying to provide, to provide you infrastructure as a service. Let's focus on the application, and build from the application all the way to the flash, and the CPU instruction set, and the adapters and the networking, okay. That's what's different. So what we provide is an application and service experience. We don't provide infrastructure. If you go buy VMware and Nutanix, all those offerings, you get infrastructure. Now you go and build all the stack above with a dozen DevOps guys. You go to Amazon, you get services. It's just that they're not the most optimized in terms of the implementation, because they also have dozens of independent projects where each one takes a VM and starts writing some >> But they're still a good service, but you've got to put it together. >> Yeah, right. But also the way they implement, because in order for them to scale they have a common layer, they found VMs, and then they're starting to build up applications, so it's inefficient. And also a lot of it is built on a 10-year-old baseline architecture. We've designed it for a very modern architecture; it's all parallel CPUs with 30 cores, you know, flash and NVMe. And so we've avoided a lot of the hardware challenges, and serialization, and just provide an abstraction layer pretty much like a cloud on top. >> Now in terms of abstraction layers in the cloud, they're efficient, and provide a simplification experience for developers. Serverless computing is up and coming, it's an important approach, and of course we have the public clouds from AWS and Google and IBM and Microsoft.
There is a growing range of serverless computing frameworks for on-prem deployment. I believe you are behind one. Can you talk about what you're doing at iguazio on serverless frameworks for on-prem or public? >> Yes. First, I'm very active in CNCF, the Cloud Native Computing Foundation. I'm one of the authors of the serverless white paper, which tries to normalize the definitions of all the vendors and come up with a proposal for an interoperable standard. So I spent a lot of energy on that, 'cause we don't want to lock customers to an API. What's unique, by the way, about our solution is that we don't have a single proprietary API. We just emulate all the other guys' stuff. We have all the Amazon APIs for data services, like Kinesis, Dynamo, S3, et cetera. We have the open source APIs, like Kafka. So also on serverless, my agenda is trying to promote that if I'm writing to Azure or AWS or iguazio, I don't need to change my app. I can use any developer tools. So that's my effort there. And recently, a few weeks ago, we launched our open source project, which is a sort of second generation of something we had before, called Nuclio. It's designed for real time >> John: How do you spell that? >> N-U-C-L-I-O. I even have the logo >> He's got a nice slick here. >> It's really fast because it's >> John: Nuclio, so that's open source that you guys just sponsor and it's all code out in the open? >> All the code is in the open, pretty cool, and it has a lot of innovative ideas on how to do stream processing and batch, 'cause the original serverless functionality was designed around web hooks and HTTP, and even many of the open source projects are really designed around HTTP serving. >> I have a question.
I'm doing research for Wikibon in the area of serverless, in fact we've recently published a report on serverless, and in terms of hybrid cloud environments, I'm not seeing yet any hybrid serverless clouds that involve public, you know, serverless like AWS Lambda, and private on-prem deployment of serverless. Do you have any customers who are doing that or interested in hybridizing serverless across public and private? >> Of course, and we have some patents I don't want to go into, but the general idea is, what we've done in Nuclio is also the decoupling of the data from the computation, which means that things can sort of be disjoined. You can run a function on a Raspberry Pi, and the data will be in a different place, and those things can sort of move, okay. >> So the persistence has to happen outside the serverless environment, like in the application itself? >> Outside of the function, the function accesses the persistence layer through APIs, okay. And how this data persistence is materialized, that's a separate thing. So you can actually write the same function that will run against Kafka or Kinesis or Private MQ, or HTTP without modifying the function, and ad hoc, through what we call function bindings, you define what's going to be the thing driving the data, or storing the data. So you can actually write the same function that does an ETL job from table one to table two. You don't need to put the table information in the function, which is not the thing that Lambda does. And it's about a hundred times faster than Lambda, we do 400,000 events per second in Nuclio. So if you write your serverless code in Nuclio, it's faster than writing it yourself, because of all those low-level optimizations. >> Yaron, thanks for coming on theCUBE. We want to do a deeper dive, love to have you out in Palo Alto next time you're in town. Let us know when you're in Silicon Valley for sure, we'll make sure we get you on camera for multiple sessions. 
>> And more information re:Invent. >> Go to re:Invent. We're looking forward to seeing you there. Love the continuous analytics message, I think continuous integration is going through a massive renaissance right now, you're starting to see new approaches, and I think things that you're doing is exactly along the lines of what the world wants, which is alternatives, innovation, and thanks for sharing on theCUBE. >> Great. >> That's very great. >> This is theCUBE coverage of the hot startups here at BigData NYC, live coverage from New York, after this short break. I'm John Furrier, Jim Kobielus, after this short break.
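
The function bindings idea described in the interview, where the same function runs against different sources and sinks without modification, can be illustrated with a minimal Python sketch. This is not Nuclio's API: `transform`, `run`, `table_one`, and `table_two` are hypothetical names chosen for illustration, and in-memory lists stand in for real tables or streams.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def transform(record: Record) -> Record:
    """The 'function': pure logic with no table or stream names inside it."""
    cents = int(round(float(record["amount"]) * 100))
    return {**record, "amount_cents": cents}


def run(handler: Callable[[Record], Record],
        source: Iterable[Record],
        sink: Callable[[Record], None]) -> int:
    """Hypothetical runtime: wires a handler to whatever source and sink are bound."""
    count = 0
    for record in source:
        sink(handler(record))
        count += 1
    return count


# The "bindings" live outside the function. Swapping these two lines for a
# stream consumer or an HTTP client would not require touching transform().
table_one: List[Record] = [{"id": 1, "amount": "2.50"}, {"id": 2, "amount": "0.99"}]
table_two: List[Record] = []

processed = run(transform, source=table_one, sink=table_two.append)
```

The design point is only that the source and destination are passed in rather than hard-coded, which is the property the interview contrasts with putting table information inside the function.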

Published Date : Sep 27 2017


ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Bosch	ORGANIZATION	0.99+
Uber	ORGANIZATION	0.99+
John	PERSON	0.99+
John Furrier	PERSON	0.99+
Verizon Ventures	ORGANIZATION	0.99+
Yaron Haviv	PERSON	0.99+
Asia	LOCATION	0.99+
NYC	LOCATION	0.99+
Google	ORGANIZATION	0.99+
New York City	LOCATION	0.99+
Jim	PERSON	0.99+
Palo Alto	LOCATION	0.99+
30 cores	QUANTITY	0.99+
New York	LOCATION	0.99+
AWS	ORGANIZATION	0.99+
two years	QUANTITY	0.99+
BigData	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
two problems	QUANTITY	0.99+
Dell EMC	ORGANIZATION	0.99+
Yaron	PERSON	0.99+
One	QUANTITY	0.99+
Dave	PERSON	0.99+
Kafka	TITLE	0.99+
third element	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
Dow Jones	ORGANIZATION	0.99+
two things	QUANTITY	0.99+
two racks	QUANTITY	0.99+
today	DATE	0.99+
Grab	ORGANIZATION	0.99+
Nuclio	TITLE	0.99+
two key challenges	QUANTITY	0.99+
Cloud Native Computing Foundation	ORGANIZATION	0.99+
about $33 million	QUANTITY	0.99+
eighth year	QUANTITY	0.99+
Hadoop	TITLE	0.98+
second type	QUANTITY	0.98+
Lambda	TITLE	0.98+
10 years ago	DATE	0.98+
each cloud	QUANTITY	0.98+
Strata Conference	EVENT	0.98+
Equanix	LOCATION	0.98+
10-year-old	QUANTITY	0.98+
first thing	QUANTITY	0.98+
first generation	QUANTITY	0.98+
one	QUANTITY	0.98+
second generation	QUANTITY	0.98+
Hadoop World	EVENT	0.98+
first time	QUANTITY	0.98+
theCUBE	ORGANIZATION	0.97+
Nutanix	ORGANIZATION	0.97+
MemSQL	TITLE	0.97+
each one	QUANTITY	0.97+
2010	DATE	0.97+
Kinesis	TITLE	0.97+
SAS	ORGANIZATION	0.96+
Wikibon	ORGANIZATION	0.96+
Chicago Mercantile Exchange	ORGANIZATION	0.96+
about two hours	QUANTITY	0.96+
this week	DATE	0.96+
one thing	QUANTITY	0.95+
dozen	QUANTITY	0.95+

Elaine Yeung, Holberton School | Open Source Summit 2017

(upbeat music) >> Narrator: Live from Los Angeles it's The Cube covering Open Source Summit North America 2017. Brought to you by the Linux Foundation and Red Hat. >> Welcome back, everyone. Live in Los Angeles for The Cube's exclusive coverage of the Open Source Summit North America. I'm John Furrier, your host, with my co-host, Stu Miniman. Our next guest is Elaine Yeung, @egsy on Twitter, check her out. Student at Holberton School? >> At Holberton School. >> Holberton School. >> And that's in San Francisco? >> I'm like repping the school right here. (laughs) >> Looking good. You look great, so. Open Source is a new generation. It's going to go from 64 million libraries to 400 million by 2026. New developers are coming in. It's a whole new vibe. >> Elaine: Right. >> What's your take on this, looking at this industry right now? Looking at all this old, the old guard, the new guard's coming in, a lot of cool things happening. Apple's new ARKit was announced today. You saw VR and AR booming, multimedia. >> Elaine: Got that newer home button. Right, like I-- >> It's just killer stuff happening. >> Stu: (laughs) >> I mean, one of the reasons why I wanted to go into tech, and this is why I, like, when I told them that I applied to Holberton School, was that I really think at whatever next social revolution we have, technology is going to be somehow integral to it. It's probably not even, like, an existing technology right now. And, as someone who's just, like, social justice-minded, I wanted to be able to contribute in that way, so. >> John: Yeah. >> And develop a skillset that way. >> Well, we saw the keynote, Christine Corbett Moran, was talking really hardcore about code driving culture. This is happening. >> Elaine: Right. >> So this is not, like, you know, maybe going to happen, we're starting to see it. We're starting to see the culture being shaped by code. 
And notions of ruling classes and elites potentially becoming democratized 100% because now software, the guys and gals doing it are acting on it and they have a mindset-- >> Elaine: Right. >> That comes from a community. So this is an interesting dynamic. As you look at that, do you think that's closer to reality? Where in your mind's eye do you see it? 'Cause you're in the front lines. You're young, a student, you're immersed in that, in all the action. I wish I was in your position and all these great AI libraries. You got TensorFlow from Google, you have all this goodness-- >> Elaine: Right. >> Kind of coming in, I mean-- >> So you're, so let me make sure I am hearing your question right. So, you're asking, like, how do I feel about the democratization of, like, educ-- >> John: Yeah, yeah. Do you feel it? Are you there? Is it happening faster? >> Well, I mean, things are happening faster. I mean, I didn't have any idea of, like, how to use a terminal before January. I didn't know, like, I didn't know my way around Linux or GitHub, or how to push a commit, (laughs) until I started at Holberton School, so. In that sense, I'm actually experiencing this democratization of-- >> John: Yeah. >> Of education. The whole, like, reason I'm able to go to this school is because they actually invest in the students first, and we don't have to pay tuition when we enroll. It's only after we are hired or actually, until we have a job, and then we do an income-share agreement. So, like, it's really-- >> John: That's cool. >> It's really cool to have, like, a school where they're basically saying, like, "We trust in the education that we're going to give you so strongly that you're not going to pay up front. >> John: Yeah. >> Because we know you're going to get a solid job and you'll pay us at that point--" >> John: Takes a lot of pressure off, too. >> Yeah. >> John: 'Cause then you don't have to worry about that overhang. >> Exactly! I wrote about that in my essay as well. 
Yeah, just, like because who wants to, like, worry about student debt, like, while you're studying? So, now I can fully focus on learning C, learning Python (laughs) (mumbles) and stuff. >> Alright, what's the coolest thing that you've done, that's cool, that you've gotten, like, motivated on 'cause you're getting your hands dirty, you get the addiction. >> Stu: (laughs) >> Take us through the day in the life of like, "Wow, this is a killer." >> Elaine: I don't know. Normally, (laughs) I'm just kind of a cool person, so I feel like everything I-- no, no. (laughs) >> John: That's a good, that's the best answer we heard. >> (laughs) Okay, so we had a battle, a rap battle of programming languages, at my school. And so, I wrote a rap about Bash scripts and (laughs) that is somewhere on the internet. And, I'm pretty sure that's, like, one of the coolest things. And actually, coming out here, one of my school leaders, Sylvain, he told me, he was like, "You should actually put that, like, pretty, like, front and center on your, like, LinkedIn." Or whatever, my profile. And what was cool, was when I met Linus yesterday, someone who had seen my rap was there and it's almost like it was, like, set up because he was like, "Oh, are you the one that was rapping Bash?" And, I was like, "Well, why yes, that was me." (laughs) >> John: (laughs) >> And then Linus said it was like, what did he say? He was like, "Oh, that's like Weird Al level." Like, just the fact that I would make up a rap about Bash scripts. (laughs) >> John: That's so cool. So, is that on your Twitter handle? Can we find that on your Twitter handle? >> Yes, you can. I will-- >> Okay, E-G-S-Y. >> Yes. >> So, Elaine, you won an award to be able to come to this show. What's your take been on the show so far? What was exciting for you? And, what's your experience been so far? >> To come to the Summit. >> Stu: Yeah. >> Well, so, when I was in education as a dean, we did a lot of backwards planning. 
And so, I think for me, like, that's just sort of (claps hands). I was looking into the future, and I knew that in October I would need to, like, start looking for an internship. And so, one of my hopes coming out here was that I would be able to expand my network. And so, like that has been already, like that has happened like more than I even expected in terms of being able to meet new people, come out here and just, like, learn new things, but also just like hear from all these, everyone's experience in the industry. Everyone's been just super awesome (laughs) and super positive here. >> Yeah. We usually find, especially at the Open Source shows, almost everyone's hiring. You know, there's huge demand for software developers. Maybe tell us a little bit about Holberton School, you know, and how they're helping, you know, ramp people up and be ready for kind of this world? >> Yeah. So, it's a two-year higher education alternative, and it is nine months of programming. So, we do, and that's split up into three months low-level, so we actually did C, where we, you know, programmed our own shell, we programmed printf. Then after that we followed with high-level. So we studied Python, and now we're in our SysAdmin track. So we're finishing out the last three months. And, like, throughout it there's been a little bit, like, intermix. Like, we did binary trees a couple weeks ago, and so that was back in C. And so, I love it when they're, like, throwing, like, C at us when we've been doing Python for a couple weeks, and I'm like, "Dammit, I have to put semicolons (laughs) >> John: (laughs) >> And start compiling. Why do we have to compile this?" Oh, anyway, so, offtrack. Okay, so after those nine months, and then it's a six-month internship, and after that it's nine months of specialization. And so there's different spec-- you can specialize in high-level, low-level, they'll work with you in whatever you, whatever the student, their interests are in. 
And you can do that either as a full-time student or do it part-time. Which most of the students that are in the first batch that started in January 2016, they're, most of them are, like, still working, and then they're doing their nine-month specialization as, like, part-time students. >> Final question for you, Elaine. Share your personal thoughts on, as you're immersed in the coding and learning, you see the community, you meet some great people here, network expanding, what are you excited about going forward? As you look out there, as you finish it up and get involved, what's exciting to you in the world ahead of you? What do you think you're going to jump into? What's popping out and revealing itself to you? >> I think coming to the conference and hearing Jim speak about just how diversity is important and also hearing from multiple speakers and sessions about the importance of collaboration and contributions, I just feel like Linux and Open Source, this whole movement is just a really, it's a step in the right direction, I believe. And it's just, I think the recognition that by being diverse that we are going to be stronger for it, that is super exciting to me. >> John: Yeah. >> Yeah, and I just hope to be able to-- >> John: Yeah (mumbles) >> I mean, I know I'm going to be able to add to that soon. (laughs) >> Well, you certainly are. Thanks for coming on The Cube. Congratulations on your success. Thanks for coming, appreciate it. >> Elaine: Thank you, thank you. >> And this is The Cube coverage, live in LA, for Open Source Summit North America. I'm John Furrier, Stu Miniman. More live coverage after this short break. (upbeat music)

Published Date : Sep 12 2017


ENTITIES

Entity	Category	Confidence
Elaine	PERSON	0.99+
John	PERSON	0.99+
Linus	PERSON	0.99+
Elaine Yeung	PERSON	0.99+
Stu Miniman	PERSON	0.99+
January 2016	DATE	0.99+
Sylvain	PERSON	0.99+
John Furrier	PERSON	0.99+
Red Hat	ORGANIZATION	0.99+
Jim	PERSON	0.99+
San Francisco	LOCATION	0.99+
October	DATE	0.99+
nine months	QUANTITY	0.99+
LA	LOCATION	0.99+
two-year	QUANTITY	0.99+
Christine Corbett Moran	PERSON	0.99+
Apple	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
Linux Foundation	ORGANIZATION	0.99+
six month	QUANTITY	0.99+
Python	TITLE	0.99+
Los Angeles	LOCATION	0.99+
Holberton School	ORGANIZATION	0.99+
GitHub	ORGANIZATION	0.99+
three months	QUANTITY	0.99+
nine month	QUANTITY	0.99+
400 million	QUANTITY	0.99+
Open Source Summit North America	EVENT	0.99+
@egsy	PERSON	0.99+
yesterday	DATE	0.99+
Stu	PERSON	0.99+
Linux	ORGANIZATION	0.98+
2026	DATE	0.98+
Open Source Summit	EVENT	0.98+
Open Source Summit North America 2017	EVENT	0.98+
Google	ORGANIZATION	0.97+
one	QUANTITY	0.96+
Weird Al	PERSON	0.95+
ARKit	TITLE	0.95+
first	QUANTITY	0.95+
first batch	QUANTITY	0.94+
today	DATE	0.93+
Open Source Summit 2017	EVENT	0.92+
64 million libraries	QUANTITY	0.89+
The Cube	ORGANIZATION	0.89+
Bash Scripts	TITLE	0.88+
C	TITLE	0.88+
LinkedIn	ORGANIZATION	0.87+
Open	ORGANIZATION	0.86+
couple weeks ago	DATE	0.85+
North America	LOCATION	0.81+
Holberton school	ORGANIZATION	0.81+
Twitter	ORGANIZATION	0.78+
TensorFlow	TITLE	0.76+
C.	TITLE	0.68+
Bash	TITLE	0.68+
E-G-S-Y	TITLE	0.65+
school	QUANTITY	0.64+
last three months	DATE	0.58+

Dr. Jisheng Wang, Hewlett Packard Enterprise, Spark Summit 2017 - #SparkSummit - #theCUBE

>> Announcer: Live from San Francisco, it's theCUBE covering Spark Summit 2017 brought to you by Databricks. >> You are watching theCUBE at Spark Summit 2017. We continue our coverage here talking with developers, partners, customers, all things Spark, and today we're honored now to have our next guest Dr. Jisheng Wang who's the Senior Director of Data Science at the CTO Office at Hewlett Packard Enterprise. Dr. Wang, welcome to the show. >> Yeah, thanks for having me here. >> All right and also to my right we have Mr. Jim Kobielus who's the Lead Analyst for Data Science at Wikibon. Welcome, Jim. >> Great to be here like always. >> Well let's jump into it. First I want to ask about your background a little bit. We were talking about the organization, maybe you could do a better job (laughs) of telling me where you came from and you just recently joined HPE. >> Yes. I actually joined HPE earlier this year through the Niara acquisition, and now I'm the Senior Director of Data Science in the CTO Office of Aruba. Actually, Aruba you probably know, like two years back, HP acquired Aruba as a wireless networking company, and now Aruba takes charge of the whole enterprise networking business in HP, which is over three billion in annual revenue now. >> Host: That's not confusing at all. I can follow you (laughs). >> Yes, okay. >> Well all I know is you're doing some exciting stuff with Spark, so maybe tell us about this new solution that you're developing. >> Yes, actually most of my experience with Spark goes back to the Niara time, so Niara was a three-and-a-half-year-old startup that reinvented enterprise security using big data and data science. So the problem we tried to solve at Niara is called UEBA, user and entity behavioral analytics. So I'll just try to be very brief here. 
Most of the traditional security solutions focus on detecting attackers from outside, but what if the origin of the attacker is inside the enterprise, say Snowden, what can you do? So you probably heard of many cases today of employees leaving the company and stealing lots of the company's IP and sensitive data. So UEBA is a new solution that tries to monitor the behavioral change of the enterprise users to detect both this kind of malicious insider and also the compromised user. >> Host: Behavioral analytics. >> Yes, so it sounds like it's a native analytics which we run like a product. >> Yeah and Jim you've done a lot of work in the industry on this, so any questions you might have for him around UEBA? >> Yeah, give us a sense for how you're incorporating streaming analytics and machine learning into that UEBA solution and then where Spark fits into the overall approach that you take? >> Right, okay. So actually when we started three and a half years back, when we developed the first version of the data pipeline, we used a mix of Hadoop, YARN, Spark, even Apache Storm for different kinds of stream and batch analytics work. But soon after, with increased maturity and also the momentum from this open source Apache Spark community, we migrated all our stream and batch, you know the ETL and data analytics work, into Spark. And it's not just Spark. It's Spark, Spark Streaming, MLlib, the whole ecosystem of that. So there are at least a couple advantages we have experienced through this kind of a transition. The first thing which really helped us is the simplification of the infrastructure and also the reduction of the DevOps efforts there. >> So simplification around Spark, the whole stack of Spark that you mentioned. >> Yes. >> Okay. >> So for the Niara solution originally, we supported, even here today, we supported both the on-premise and the cloud deployment. For the cloud we also supported the public cloud like AWS, Microsoft Azure, and also private cloud. 
So you can understand, if we have to maintain a stack of different open source tools over this kind of many different deployments, the overhead of doing the DevOps work, monitoring, alarming, debugging this kind of infrastructure over different deployments, is very hard. So Spark provides us a unified platform. We can integrate the streaming, you know batch, real-time, near real-time, or even long-term batch jobs all together. So that heavily reduced both the expertise and also the effort required for the DevOps. This is one of the biggest advantages we experienced, and certainly we also experienced something like the scalability, performance, and also the convenience for developers to develop new applications, all of this, from Spark. >> So are you using the Spark Structured Streaming runtime inside of your application? Is that true? >> We actually use Spark in the streaming processing when the data, so like in the UEBA solutions, the first thing is collecting a lot of the data, different kinds of data sources, network data, cloud application data. So when the data comes in, the first thing is a streaming job for the ETL, to process the data. Then after that, we actually also develop analytics jobs at different frequencies, like one minute, 10 minute, one hour, one day, on top of that. And even recently we have started some early adoption of deep learning into this, how to use deep learning to monitor the user behavior change over time, especially after a user gives notice: is the user going to access more servers or download some of the sensitive data? So all of this requires very complex analytics infrastructure. >> Now there were some announcements today here at Spark Summit by Databricks of adding deep learning support to their core Spark code base. What are your thoughts about the Deep Learning Pipelines API that they announced this morning? 
It's new news, I'll understand if you haven't digested it totally, but you probably have some good thoughts on the topic. >> Yes, actually this is also news for me, so I can just speak from my current experience. How to integrate deep learning into Spark actually was a big challenge so far for us because what we used so far, the deep learning piece, we used TensorFlow. And certainly most of our other stream and data massaging or ETL work is done by Spark. So in this case, there are a couple ways to manage this, too. One is to set up two separate resource pools, one for Spark, the other one for TensorFlow, but among our deployments there are some very small on-premise deployments which have only like a four-node or five-node cluster. It's not efficient to split resources in that way. So we are actually also looking for some closer integration between deep learning and Spark. So one thing we looked at before is called TensorFlowOnSpark, which was open sourced a couple months ago by Yahoo. >> Right. >> So maybe this is certainly more exciting news, for the Spark team to develop this native integration. >> Jim: Very good. >> Okay and we talked about the UEBA solution, but let's go back to a little broader HPE perspective. You have this concept called the intelligent edge, what's that all about? >> So that's a very cool name. Actually, let me go back a little bit. I come from the enterprise background, and enterprise applications actually lag behind consumer applications in terms of the adoption of the new data science technology. So there are some native challenges for that. For example, collecting and storing large amounts of this enterprise sensitive data is a huge concern, especially in European countries. Also, for a similar reason, normally when you develop enterprise applications, you lack a good quantity and quality of training data. 
So these are some native challenges when you develop enterprise applications, but even despite this, HPE and Aruba recently made several acquisitions of analytics companies to accelerate the adoption of analytics into different product lines. Actually that intelligent edge comes from this IoT, which is internet of things, which is expected to be the fastest growing market in the next few years here. >> So are you going to be integrating the UEBA behavioral analytics and Spark capability into your IoT portfolio at HP? Is that a strategy or direction for you? >> Yes. Yes, for the big picture that certainly is. So you can think, I think a Gartner report expected the number of IoT devices to grow to over 20 billion by 2020. Since all of these IoT devices are connected to either intranet or internet, either through wire or wireless, as a networking company we have the advantage of collecting data and even taking some actions in the first place. So the idea of this intelligent edge is we want to turn each of these IoT devices, the small IoT devices like IP cameras, like those motion detectors, all of these small devices, into both a distributed sensor for the data collection and also an inline actor to make some real-time or even close to real-time decisions. For example, the behavior anomaly detection is a very good example here. If an IoT device is compromised, if the IP camera has been compromised and is then used to steal your internal data, we should detect and stop that in the first place. >> Can you tell me about the challenges of putting deep learning algorithms natively on resource constrained endpoints in the IoT? That must be really challenging to get them to perform well considering that there may be just a little bit of memory or flash capacity or whatever on the endpoints. Any thoughts about how that can be done effectively and efficiently? >> Very good question >> And at low cost. >> Yes, very good question. 
So there are two aspects to this. First is this global training of the intelligence, which is not going to be done on each of the devices. In that case, each of the devices is more like the sensor for the data collection. So we are going to collect the data, send it to the cloud, or build all of this giant pool of computing resources to train the classifier, to train the model, but once we train the model, we are going to ship the model, so the inference and the detection of those behavioral anomalies really happen on the endpoint. >> Do the training centrally and then push the trained algorithms down to the edge devices. >> Yes. But even then, the second aspect, like you said, for some of the devices, like say people try to put those small chips in a spoon, in a hospital, to make it more intelligent, you cannot put even just the detection piece there. So we are also looking into some new technology. I know like Caffe recently announced, released some of the lightweight deep learning models. Also there's some, you probably know, there's some of the improvement from the chip industry. >> Jim: Yes. >> How to optimize the chip design for this kind of more analytics driven task there. So we are looking into all these different areas now. >> We have just a couple minutes left, and Jim you get one last question after this, but I got to ask you, what's on your wishlist? What do you wish you could learn or maybe what did you come to Spark Summit hoping to take away? >> I've always treated myself as a technical developer. One thing I am very excited about these days is the emergence of the new technology, like Spark, like TensorFlow, like Caffe, even BigDL which was announced this morning. So this is something like the first goal, when I come to these big advanced industry events, I want to learn the new technology. 
And the second thing is mostly to share our experience about adopting this new technology and also learn from other colleagues from different industries, how people change life, disrupt the old industry by taking advantage of the new technologies here. >> The community's growing fast. I'm sure you're going to receive what you're looking for. And Jim, final question? >> Yeah, I heard you mention DevOps and Spark in the same context, and that's a huge theme we're seeing, more DevOps is being wrapped around the lifecycle of development and training and deployment of machine learning models. If you could have your ideal DevOps tool for Spark developers, what would it look like? What would it do in a nutshell? >> Actually it's still, I'll just share my personal experience. At Niara, we actually developed a lot of in-house DevOps tools, like for example, when you run a lot of different Spark jobs, stream, batch, like a one-minute batch versus a one-day batch job, how do you monitor the status of those workflows? How do you know when the data stops coming? How do you know when the workflow failed? So monitoring is a big thing, and then alarming: when you have a failure or something wrong, how do you raise an alarm? And also debugging is another big challenge. So I certainly see the growing effort from both Databricks and the community on different aspects of that. >> Jim: Very good. >> All right, so I'm going to ask you for kind of a soundbite summary. I'm going to put you on the spot here, you're in an elevator and I want you to answer this one question. Spark has enabled me to do blank better than ever before. >> Certainly, certainly. I think as I explained before, it helped a lot for both the developer, even the start-up trying to disrupt some industry. It helps a lot, and I'm really excited to see this deep learning integration, all the different roadmap reports, you know, down the road. I think they're on the right track. >> All right. Dr. 
Wang, thank you so much for spending some time with us. We appreciate it and go enjoy the rest of your day. >> Yeah, thanks for being here. >> And thank you for watching the Cube. We're here at Spark Summit 2017. We'll be back after the break with another guest. (easygoing electronic music)
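
The monitoring questions raised in the interview (how do you know when the data stops coming, and how do you know when a workflow failed, across jobs with batch intervals as different as one minute and one day) can be illustrated with a small sketch. This is not Niara's in-house tooling, which the interview does not describe; the names `JobStatus` and `check_job`, and the `grace` multiplier, are assumptions made only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobStatus:
    """Hypothetical record of one recurring job's most recent run."""
    name: str
    batch_interval: timedelta   # e.g. one minute or one day
    last_success: datetime      # when the last batch completed successfully
    last_failed: bool = False   # True if the most recent run ended in failure


def check_job(job: JobStatus, now: datetime, grace: float = 2.0) -> list:
    """Return alert messages for one job.

    A job is treated as stale when no batch has succeeded within
    `grace` times its own batch interval (an assumed rule of thumb),
    so a one-day job is not held to a one-minute job's schedule.
    """
    alerts = []
    if job.last_failed:
        alerts.append(f"{job.name}: last run failed")
    if now - job.last_success > job.batch_interval * grace:
        alerts.append(
            f"{job.name}: no successful batch since {job.last_success.isoformat()}"
        )
    return alerts
```

Scaling the staleness threshold by each job's own interval is the design choice here: a single fixed timeout would either flood alerts for daily jobs or miss stalls in per-minute ones.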

Published Date : Jun 6 2017

ENTITIES

Entity	Category	Confidence
Jim	PERSON	0.99+
HPE	ORGANIZATION	0.99+
HP	ORGANIZATION	0.99+
10 minute	QUANTITY	0.99+
one hour	QUANTITY	0.99+
one minute	QUANTITY	0.99+
Wang	PERSON	0.99+
San Francisco	LOCATION	0.99+
Yahoo	ORGANIZATION	0.99+
Jisheng Wang	PERSON	0.99+
Niara	ORGANIZATION	0.99+
first version	QUANTITY	0.99+
one day	QUANTITY	0.99+
two aspects	QUANTITY	0.99+
Jim Kobielus	PERSON	0.99+
Hewlett Packard Enterprise	ORGANIZATION	0.99+
First	QUANTITY	0.99+
Caffe	ORGANIZATION	0.99+
Spark	TITLE	0.99+
Spark	ORGANIZATION	0.99+
one	QUANTITY	0.99+
each	QUANTITY	0.99+
three and a half year	QUANTITY	0.99+
both	QUANTITY	0.99+
Sparks Summit 2017	EVENT	0.99+
first	QUANTITY	0.99+
DevOps	TITLE	0.99+
2020	DATE	0.99+
second thing	QUANTITY	0.99+
Aruba	ORGANIZATION	0.98+
Snowden	PERSON	0.98+
two years back	DATE	0.98+
first thing	QUANTITY	0.98+
one last question	QUANTITY	0.98+
AWS	ORGANIZATION	0.98+
over 20 billion	QUANTITY	0.98+
one question	QUANTITY	0.98+
UEBA	TITLE	0.98+
today	DATE	0.98+
Spark Summit	EVENT	0.97+
Microsoft	ORGANIZATION	0.97+
Spark Summit 2017	EVENT	0.96+
Apache	ORGANIZATION	0.96+
three and a half years back	DATE	0.96+
Databricks	ORGANIZATION	0.96+
one day batch	QUANTITY	0.96+
earlier this year	DATE	0.94+
Aruba	LOCATION	0.94+
One	QUANTITY	0.94+
#SparkSummit	EVENT	0.94+
One thing	QUANTITY	0.94+
one thing	QUANTITY	0.94+
European	LOCATION	0.94+
Gartner	ORGANIZATION	0.93+

Wikibon Big Data Market Update Pt. 1 - Spark Summit East 2017 - #sparksummit - #theCUBE

>> [Announcer] Live from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> We're back, welcome to Boston, everybody, this is a special presentation that George Gilbert and I are going to provide to you now. SiliconANGLE Media is the umbrella brand of our company, and we've got three sub-brands. One of them is Wikibon, it's the research organization that George works in, and then of course, we have theCUBE and then SiliconANGLE, which is the tech publication, and then we extensively, as you may know, use CrowdChat and other social data, but we want to drill down now on the Wikibon, Wikibon research side of things. Wikibon was the first research company ever to do a big data forecast. Many, many years ago, our friend Jeff Kelly produced that for several years, we opensourced it, and it really, I think helped the industry a lot, sort of framing the big data opportunity, and then George last year did the first Spark forecast, really Spark adoption, so what we want to do now is talk about some of the trends in the marketplace, this is going to be done in two parts, today's part one, and we're really going to talk about the overall market trends and the market conditions, and then we're going to go to part two tomorrow, where you're going to release some of the numbers, right? And we'll share some of the numbers today. So, we're going to start on the first slide here, we're going to share with you some slides. The Wikibon forecast review, and George is going to, I'm going to ask you to talk about where we are at with big data apps, everybody's saying it's peaked, big data's now going mainstream, where are we at with big data apps? >> [George] Okay, so, I want to quote, just to provide context, the former CTO of VMware, Steve Herrod. He said, "In the end, it wasn't big data, it was big analytics." 
And what's interesting is that when we start thinking about it, there have been three classes of, there have been traditionally two classes of workloads, one batch, and in the context of analytics, that means running reports in the background, doing offline business intelligence, but then there was also the interactive-type work. What's emerging is something that's continuously happening, and it doesn't mean that all apps are going to be always on, it just means that there are, all apps will have a batch component, an interactive component, like with the user, and then a streaming, or continuous component. >> [Dave] So it's a new type of workload. >> Yes. >> Okay. Anything else you want to point out here? >> Yeah, what's worth mentioning, this is, it's not like it's going to burst fully-formed out of the clouds, and become sort of a new standard, there's two things that has to happen, the technology has to mature, so right now you have some pretty tough trade-offs between integration, which provides simplicity, and choice and optimization, which gives you fragmentation, and then skillset, and both of those need to develop. >> [Dave] Alright, we're going to talk about both of those a little bit later in this segment. Let's go to the next slide, which really talks to some of the high-level forecast that we released last year, so these are last year's numbers, correct? >> Yes, yes. >> [Dave] Okay, so, what's changed? You've got the ogive curve, which is sort of the streaming penetration, Spark/streaming, that's what, was last year, this is now reflective of continuous, you'll be updating that, how is this changing, what do you want us to know here? >> [George] Okay, so the key takeaways here are, first, we took three application patterns, the first being the data lake, which is sort of the original canonical repository of all your data. 
That never goes away, but on top of it, you layer what we were calling last year systems of engagement, which is where you've got the interactive machine learning component helping to anticipate and influence a user's decision, and then on top of that, which was the aqua color, was the self-tuning systems, which is probably more IIoT stuff, where you've got a whole ecosystem of devices and intelligence in the cloud and at the edge, and you don't necessarily need a human in the loop. But, these now, when you look at them, you can break them down as having three types of workloads, the batch, the interactive, and the continuous. >> Okay, and that is sort of a new workload here, and this is a real big theme of your research now is, we all remember, no, we don't all remember, I remember punch cards, that's the ultimate batch, and then of course, the terminals were interactive, and you think of that as closer to real time, but now, this notion of continuous, if you go to the next slide, Patrick, we can take a look at how workloads are changing, so George, take us through that dynamic. >> [George] Okay so, to understand where we're going, sometimes it helps to look at where we've come from, and the traditional workloads, if we talk about applications, they were divided into, now, we talked about sort of batch versus interactive, but now, they were also divided into online transaction processing, operational application, systems of record, and then there was the analytic side, which was reporting on it, but this was sort of backward-looking reporting, and we begin to see some convergence between the two with web and mobile apps, where a user was interacting both with the analytics that informed an interaction that they might have. That's looking backwards, and we're going to take a quick look at some of the new technologies that augmented those older application patterns. Then we're going to go look at the emergent workloads and what they look like. 
>> Okay so, let's have a quick conversation about this before we go on to the next segment. Hadoop obviously was batch. It really was a way, as we've talked about today and many other dates in theCUBE, a way to reduce the expense of doing data warehousing and business intelligence, I remember we were interviewing Jeff Hammerbacher, and he said, "When I was at Facebook, my mission was to break the dependency and the container, the storage container." So he really wanted to, needed to reduce costs, he saw that infrastructure needed to change, so if you look at the next slide, which is really sort of talking to Hadoop doing batch in traditional BI, take us through that, and then we'll sort of evolve to the future. >> Okay, so this is an example of traditional workloads, batch business intelligence, because Hadoop has not really gotten to the maturity point of view where you can really do interactive business intelligence. It's going to take a little more work. But here, you've basically put in a repository more data than you could possibly ever fit in a data warehouse, and the key is, this environment was very fragmented, there were many different engines involved, and so there was a high developer complexity, and a high operational complexity, and we're getting to the point where we can do somewhat better on the integration, and we're getting to the point where we might be able to do interactive business intelligence and start doing a little bit of advanced analytics like machine learning. >> Okay. Let's talk a little bit about why we're here, we're here 'cause it's Spark Summit, Spark was designed to simplify big data, simplify a lot of the complexity in Hadoop, so on the next slide, you've got this red line of Spark, so what is Spark's role, what does that red line represent? >> Okay, so the key takeaway from this slide is, couple things. 
One, it's interesting, but when you listen to Matei Zaharia, who is the creator of Spark, he said, "I built this to be a better MapReduce than MapReduce," which was the old crufty heart of Hadoop. And of course, they've stretched it far beyond their original intentions, but it's not the panacea yet, and if you put it in the context of a data lake, it can help you with what a data engineer does with exploring and munging the data, and what a data scientist might do in terms of processing the data and getting it ready for more advanced analytics, but it doesn't give you an end-to-end solution, not even within the data lake. The point of explaining this is important, because we want to explain how, even in the newer workloads, Spark isn't yet mature to handle the end-to-end integration, and by making that point, we'll show where it needs still more work, and where you have to substitute other products. >> Okay, so let's have a quick discussion about those workloads. Workloads really kind of drive everything, a lot of decisions for organizations, where to put things, and how to protect data, where the value is, so in this next slide you've got, you're juxtaposing traditional workloads with emerging workloads, so let's talk about these new continuous apps. >> Okay, so, this tees it up well, 'cause we focused on the traditional workloads. The emerging ones are where data is always coming in. You could take a big flow of data and sort of end it and bucket it, and turn it into a batch process, but now that we have the capability to keep processing it, and you want answers from it very near real time, you don't want to stop it from flowing, so the first one that took off like this was collecting telemetry about the operation and performance of your apps and your infrastructure, and Splunk sort of conquered that workload first. 
And then the second one, the one that everyone's talking about now is sort of Internet of Things, but more accurately, the Industrial Internet of Things, and that stream of data is, again, something you'll want to analyze and act on with as little delay as possible. The third one is interesting, asynchronous microservices. This is difficult, because this doesn't necessarily require a lot of new technology, so much as a new skillset for developers, and that's going to mean it takes off fairly slowly. Maybe new developers coming out of school will adopt it whole cloth, but this is where you don't rely on a big central database, this is where you break things into little pieces, and each piece manages itself. >> So you say the components of these arrows that you're showing, ingest, explore, process, serve, these are all sort of discrete elements of the data flow that you have to then integrate as a customer? >> [George] Yes, frankly, these are all steps that could be an end-to-end integrative process, but it's not yet mature enough really to do it end-to-end. For example, we don't even have a data store that can go all the way from ingest to serve, and by ingest, I mean taking the millions, potentially millions or more, events per second coming in from your Internet of Things devices, the explore would be in that same data store, letting you visualize what's there, and process doing the analysis, and serving then is, from that same data store, letting your industrial devices, or your business intelligence workloads get real-time updates. For this to work as one whole, we need a data store, for example, that can go from end-to-end, in addition to the compute and analytic capabilities that go end-to-end. The point of this is, for continuous workloads, we do want to get to this integrated point somehow, sometime, but we're not there yet. 
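
The four steps named above (ingest, explore, process, serve) running against one store can be pictured with a toy sketch. It is only an in-memory illustration under stated assumptions: the class `InMemoryStore`, its `window` parameter, and the event shape are hypothetical, and nothing here approaches the millions of events per second or the real products discussed in the presentation.

```python
from collections import deque
from statistics import mean


class InMemoryStore:
    """Toy stand-in for a single store that handles all four steps."""

    def __init__(self, window: int = 5):
        # Keep only the most recent `window` events, as a continuous
        # workload would rather than accumulating an unbounded batch.
        self.events = deque(maxlen=window)
        self.latest = {}

    def ingest(self, event: dict) -> None:
        """Accept one incoming event, e.g. a device reading."""
        self.events.append(event)

    def explore(self) -> list:
        """Let a person look at what is currently in the store."""
        return list(self.events)

    def process(self) -> dict:
        """Run the analysis step over the current window."""
        readings = [e["value"] for e in self.events]
        self.latest = (
            {"count": len(readings), "mean": mean(readings)} if readings else {}
        )
        return self.latest

    def serve(self) -> dict:
        """Hand the most recent result to a device or dashboard."""
        return dict(self.latest)
```

A short usage example: ingest four readings with a window of three, reprocessing after each one, so the served result always reflects the newest data.

```python
store = InMemoryStore(window=3)
for value in [1.0, 2.0, 3.0, 4.0]:
    store.ingest({"device": "d1", "value": value})
    store.process()
print(store.serve())
```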
>> Okay, let's go deeper, and take a look at the next slide, you've got this data feedback loop, and you've got this prediction on top of this, what does all that mean, let's double-click on that. >> Okay, so now we're unpacking the slide we just looked at, in that we're unpacking it into two different elements, one is what you're doing when you're running the system, and the next one will be what you're doing when you're designing it. And so for this one, what you're doing when you're running the system, I've grayed out the where's the data coming from and where's it going to, just to focus on how we're operating on the data, and again, to repeat the green part, which is storage, we don't have an end-to-end integrated store that could cost-effectively, scalably handle this whole chain of steps, but what we do have is that in the runtime, you're going to ingest the data, you're going to process it and make it ready for prediction, then there's a step that's called devops for data science, we know devops for developers, but devops for data science, as we're going to see, actually unpacks a whole 'nother level of complexity, but this devops for data science, this is where you get the prediction, of, okay, so, if this turbine is vibrating and has a heat spike, it means shut it down because something's going to fail. That's the prediction component, and the serve part then takes that prediction, and makes sure that that device gets it fast. >> So you're putting that capability in the hands of the data science component so they can effect that outcome virtually instantaneously? >> Yes, but in this case, the data scientist will have done that at design time. We're still at run time, so this is, once the data scientist has built that model, here, it's the engineer who's keeping it running. >> Yeah, but it's designed into the process, that's the devops analogy. 
Okay great, well let's go to that sort of next piece, which is design, so how does this all affect design, what are the implications there? >> So now, before we had ingest process, then prediction with devops for data science, and then serving, now when you're at design time, you ingest the data, and there's a whole unpacking of steps, which requires a handful, or two fistfuls of tools right now to make operate. This is to acquire the data, explore it, prepare it, model it, assess it, distribute it, all those things are today handled by a collection of tools that you have to stitch together, and then you have process, which could be typically done in Spark, where you do the analysis, and then serving it, Spark isn't ready to serve, that's typically a high-speed database, one that either has tons of data for history, or gets very, very fast updates, like a Redis that's almost like a cache. So the point of this is, we can't yet take Spark as gospel from end to end. >> Okay so, there's a lot of complexity here. >> [George] Right, that's the trade-off. >> So let's take a look at the next slide, which talks to where that complexity comes from, let's look at it first from the developer side, and then we'll look at the admin, so, so on the next slide, we're looking at the complexity from the dev perspective, explain the axes here. >> Okay, okay. So, there's two axes. If you look at the x-axis at the bottom, there's ingest, explore, process, serve. Those were the steps at a high level that we said a developer has to master, and it's going to be in separate products, because we don't have the maturity today. 
Then on the y-axis, we have some, but not all, this is not an exhaustive list of all the different things a developer has to deal with, with each product, so the complexity is multiplying all the steps on the y-axis, data model, addressing, programming model, persistence, all the stuff's on the y-axis, by all the products he needs on the x-axis, it's a mess, which is why it's very, very hard to build these types of systems today. >> Well, and why everybody's pushing on this whole unified integration, that was a major thing that we heard throughout the day today. What about from the admin's side, let's take a look at the next slide, which is our last slide, in terms of the operational complexity, take us through that. >> [George] Okay, so, the admin is when the system's running, and reading out the complexity, or inferring the complexity, follows the same process. On the y-axis, there's a separate set of tasks. These are admin-related. Governance, scheduling and orchestration, a high availability, all the different types of security, resource isolation, each of these is done differently for each product, and the products are on the x-axis, ingest, explore, process, serve, so that when you multiply those out, and again, this isn't exhaustive, you get, again, essentially a mess of complexity. >> Okay, so we got the message, if you're a practitioner of these so-called big data technologies, you're going to be dealing with more complexity, despite the industry's pace of trying to address that, but you're seeing new projects pop up, but nonetheless, it feels like the complexity curve is growing faster than customer's ability to absorb that complexity. Okay, well, is there hope? >> Yes. But here's where we've had this conundrum. 
The Apache opensource community has been the most amazing source of innovation I think we've ever seen in the industry, but the problem is, going back to the amazing book, The Cathedral and the Bazaar, about opensource innovation versus top-down, the cathedral has this central architecture that makes everything fit together harmoniously, and beautifully, with simplicity. But the bazaar is so much faster, 'cause it's sort of this free market of innovation. The Apache ecosystem is the bazaar, and the burden is on the developer and the administrator to make it work together, and it was most appropriate for the big internet companies that had the skills to do that. Now, the companies that are distributing these Apache opensource components are doing a Herculean job of putting them together, but they weren't designed to fit together. On the other hand, you've got the cloud service providers, who are building, to some extent, services that have standard APIs that might've been supported by some of the Apache products, but they have proprietary implementations, so you have lock-in, but they have more of the cathedral-type architecture that-- >> And they're delivering 'em their services, even though actually, many of those data services are discrete APIs, as you point out, are proprietary. Okay, so, very useful, George, thank you, if you have questions on this presentation, you can hit Wikibon.com and fire off a question to us, we'll make sure it gets to George and gets answered. This is part one, part two tomorrow is we're going to dig into some of the numbers, right? So if you care about where the trends are, what the numbers look like, what the market size looks like, we'll be sharing that with you tomorrow, all this stuff, of course, will be available on-demand, we'll be doing CrowdChats on this, George, excellent job, thank you very much for taking us through this. 
Thanks for watching today, it is a wrap of day one, Spark Summit East, we'll be back live tomorrow from Boston, this is theCUBE, so check out siliconangle.com for a review of all the action today, all the news, check out Wikibon.com for all the research, siliconangle.tv is where we house all these videos, check that out, we start again tomorrow at 11 o'clock east coast time, right after the keynotes, this is theCUBE, we're at Spark Summit, #SparkSummit, we're out, see you tomorrow. (electronic music jingle)

Published Date : Feb 8 2017

ENTITIES

Entity	Category	Confidence
George Gilbert	PERSON	0.99+
Patrick	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Jeff Hammerbacher	PERSON	0.99+
Steve Herrod	PERSON	0.99+
Jeff Kelly	PERSON	0.99+
George	PERSON	0.99+
Matei Zaharia	PERSON	0.99+
Boston	LOCATION	0.99+
last year	DATE	0.99+
Wikibon	ORGANIZATION	0.99+
SiliconANGLE	ORGANIZATION	0.99+
tomorrow	DATE	0.99+
millions	QUANTITY	0.99+
VMware	ORGANIZATION	0.99+
Spark	TITLE	0.99+
Gorge	ORGANIZATION	0.99+
one batch	QUANTITY	0.99+
Boston, Massachusetts	LOCATION	0.99+
two classes	QUANTITY	0.99+
Dave	PERSON	0.99+
three classes	QUANTITY	0.99+
first	QUANTITY	0.99+
two parts	QUANTITY	0.99+
each	QUANTITY	0.99+
second one	QUANTITY	0.99+
two different elements	QUANTITY	0.99+
first slide	QUANTITY	0.99+
two	QUANTITY	0.99+
The Cathedral and the Bazaar	TITLE	0.99+
each product	QUANTITY	0.99+
each piece	QUANTITY	0.99+
third one	QUANTITY	0.99+
One	QUANTITY	0.99+
Databricks	ORGANIZATION	0.99+
today	DATE	0.98+
Facebook	ORGANIZATION	0.98+
first one	QUANTITY	0.98+
both	QUANTITY	0.98+
Apache	ORGANIZATION	0.98+
SiliconANGLE Media	ORGANIZATION	0.98+
first research	QUANTITY	0.98+
Spark Summit East 2017	EVENT	0.97+
Hadoop	TITLE	0.97+
two things	QUANTITY	0.97+
two fistfuls of tools	QUANTITY	0.96+
theCUBE	ORGANIZATION	0.96+
one	QUANTITY	0.96+
day one	QUANTITY	0.95+
#SparkSummit	EVENT	0.93+
siliconangle.com	OTHER	0.93+
two axes	QUANTITY	0.92+
