Yaron Haviv, iguazio - DockerCon 2017 - #theCUBE - #DockerCon

>> Narrator: From Austin, Texas, it's theCUBE, covering DockerCon 2017. Brought to you by Docker and support from its ecosystem partners.
>> Hi, I'm Stu Miniman, with my co-host, James Kobielus, who's been digging into all the application development angles. Happy to welcome back to the program, here at DockerCon, Yaron Haviv, who is the co-founder and CTO of iguazio. Yaron, great to see you.
>> Thanks.
>> How have you been?
>> Great, great, been busy traveling a lot.
>> We talked about how some of us celebrated Passover recently; I had brisket at home. We had Franklin's Barbecue brisket here. Anthony Bourdain said the only two people that know how to do brisket well are Franklin's and the Jews. (all laugh)
>> So we had Passover, a lot of good food, but also a lot of traveling. I was also at a Kubernetes conference in Europe, and here. Prior to that, a big data show, so it's a lot of traveling.
>> Kubernetes, Docker, the ecosystem. You've been watching this, your company is involved in it. What's your take on the state of the ecosystem, and what do you think of the announcements this week?
>> You know, I have also been to the Kubernetes conference, and you see those are still relatively small shows, and mostly developer focused. What we see is that Kubernetes is taking a lot of share from the others, because most of the guys that adopt are not enterprises yet. It's people that have a large enough infrastructure that they want to use it internally, and Kubernetes is a little more flexible. And on the other end, you see Docker trying to create a VMware-like, shrink-wrapped version of container infrastructure. So we see those two, and there's obviously the public cloud with their fully integrated stack. Now, what I notice here at the show, and also a couple of weeks ago at the Kubernetes conference: think about the stack. It has, let's say, 20 components.
So someone like Amazon brings the entire 20 components, and it's fully integrated, with security and networking and storage and data services and everything. And here, what you'll see is a lot of vendors: this guy has those four components, the other guys have those five components, and in some cases they actually overlap. So this guy will have three unique components, and two other components, et cetera. And it's very hard to assemble a full-blown solution. So as a buyer, how do you decide which components to choose? That's part of the challenge, and it also helps serve the cloud guys.
>> I remember when I first joined Wikibon, we talked about how the hyperscale model was that you take your team of PhDs and you just architect your application and software. If you're the enterprise, though, you don't have that talent. So you will spend money to buy that packaged solution. I want to buy it as a service, I want to buy it easy. Where do you see the maturity of this market? What can the enterprise consume, and how do they do it? Or do they just go to platforms?
>> So this is why our positioning was that it is a platform. We are not a component. We are a fully integrated system. We have multi-tenancy, we have security, we have data lifecycle management. We integrate with applications, we have our own UI. But it's focused more on the data services. So if you take a dozen Amazon data services, Dynamo and others, and object and file, we basically pack all of them, because data is the biggest challenge, as you know. High availability, versioning, reliability, security. The biggest and toughest challenge is the data. And once you solve that one, the applications all become stateless, and that's much easier. There still needs to be a bigger ecosystem around it, which is why we are doing a lot more work with CNCF, trying to create standards for the different interactions between those components.
So when a buyer goes and buys a certain component from one vendor, they are not necessarily locked in to that. They can just go and change it in the future. I think once you solve the data problem, the persistency, which is sort of the toughest challenge in this environment, the rest of it becomes simpler.
>> One of the questions James has been asking this week is where analytics fits in. I look at your real-time continuous analytics piece; it's not an application that I've heard talked about too much. Maybe we can get your viewpoint on it?
>> And the relevance is, of course, that much of the application development that is going on, the hot stuff, is related to artificial intelligence and streaming analytics, clearly continuous.
>> Which is where we focus. One of the things that I try to explain when I work with different communities is that right now we have a bifurcation: we have the Apache ecosystem, and we have the Docker ecosystem, totally separate ecosystems. And by the way, you know that cloud is where most analytics happen.
>> James: Yes.
>> So basically, analytics and cloud technology have to converge. This is what we have been trying to pitch: why use YARN as a scheduler when I can use Kubernetes, which is more generic, because I can schedule any type of work? So this is something that we are trying to push. And there's all this notion of continuous integration: when we say continuous analytics, it's not just about the real-time aspect, it's also about continuous development and integration.
>> James: Yes.
>> So you actually want this notion of serverless functions, which is one of the things I like. Also immutable code and infrastructure; you want to adopt those notions, because analytics is going to go into real time more and more. So let's say I have my connected-car pipeline, where I get streams, I process them, and I generate insights. What happens if I find a bug in my application, or I just want to enhance it and create another feature?
So I want to be able to just push a new version of my analytics code into some platform, hopefully ours.
>> You also want to train that new algorithm as well, to make sure it's fit for whatever specific...
>> Yeah, but you have to have this notion of continuity, which means all the integrations we did have to be different; it has to be a lot more atomic.
>> Yeah.
>> It has to be checkpointed. All those things, so that I can basically knock down my analytics process and relaunch it, and it seamlessly continues. And that's not the Apache model; to play around at bootcamp enough, it's a lot more of a legacy kind of approach, which I don't connect to too much.
>> Yaron, maybe complete out the stack that you're building. How does serverless fit into this also?
>> Okay, so basically, we are building all the data engines: we are doing streaming, we are doing objects, files, NoSQL, SQL. For us it's all integrated into the same very high-performance engine. We also have built-in analytics, so we can do things like joins and aggregations and all of the computations on the data as it is ingested, and it can basically present itself as many different things. Now, one of the things we get asked by customers, and we demonstrated that at Strata: let's assume I'm throwing an image into this thing. I want to be able to immediately analyze the image and say if there is a face, or if there is something suspicious about the picture, or maybe even do simple things like extract metadata, such as the geolocation of the picture, so I can do something with it. So we had to develop internally an event-driven process, we didn't call it serverless internally, where you throw in data, and it immediately triggers and launches a process, which is a Docker container-based process. It has high-speed message bus integration into our data platform, which immediately invokes it and processes the data in a very elastic fashion.
So if you throw in thousands of objects, it elastically generates multiple workers to work on them. And that's also how we design things like DR and backup internally in our platform to be very flexible, so we can build DR to S3. How do we do it? We basically have serverless functions that know how to convert the updates into a continuous stream of updates, and then there is a small piece of code that says "Go write to S3". And that allows me a lot of flexibility to develop new features. So all this notion of data lifecycle management, with every advance in our product, is actually based on serverless functions; we just didn't call it serverless. One of the things that we're working on with the community is trying to detach that portion from our product and contribute it as an open-source project, because it's much faster and much more optimized than what you'll see elsewhere, including the IBM Whisk or Amazon Lambda implementations of that.
>> Are you working with the Apache... Are you working in the context of the Apache framework to expose, for example, machine learning pipeline functions as serverless functions?
>> So again, Apache is not necessarily the right place to do that.
>> You can do them in Spark.
>> I do them in Spark and all that, but we do want the Kubernetes environment to deal with all the orchestration requirements for that thing. The way that we do, for example, TensorFlow integration is that we expose a file to TensorFlow on one end, so it is able to look at the image, and at the same time the metadata updates, so what the image contains, are exposed to TensorFlow as sort of a key-value store, or document store. It just updates attributes on the same image. So the way that we work now with healthcare: an MRI image lands, and something looks at the MRI image and senses cancer. Basically, you can tag the same image with records, with fields that say it contains cancer, by this guy, take picture of this guy.
And then, when you want to run a query and say, you know what, give me all the MRI pictures that contain cancer, it now flips and acts like a database, and you just pull all those images. It's a different approach to how to do those things.
>> Yaron, we've talked about Docker containers, Kubernetes, serverless. How do virtual machines fit into the environment?
>> I had some interesting conversations at the Kubernetes conference with some friends that are highly ranked in this industry, without disclosing names: do you really need OpenStack in between bare metal and containers? Because the traditional approach is: okay, we have bare metal, we need to put in a virtualization layer for isolation, and then we need to put Kubernetes or Docker on top. And we figured out that there is actually very little risk in putting containers directly on bare metal, especially with the new security things around containers and image signing, and what we do, which is authenticating the container, not the infrastructure, on data access, plus network isolation. With all those things you can eventually collapse and eliminate virtualization, but not for every application. For some applications, the more traditional legacy ones, the application may still require VMs, because it's quite a different philosophy to develop microservices and to develop for VMs. Part of what I see here at the show is that not everyone internalizes that. People still think in the notion of: here's my lightweight VM, which happens to be called a Docker container, and I'm going to give it a volume, and I'm going to create snapshots on that volume, and all that stuff. But if you think about it, what are microservices really about? It's about allowing this elasticity, so the same workload can spawn multiple workers; it's the ability to go and create updated versions; it's the ability to knock down this container anytime I want, and just kill it and launch it in a different place. You know how Google works, or Amazon or eBay, or all those guys: they're basically killing containers on purpose to test their systems.
All this notion that my configuration and my logs and all that stuff sit inside the container is not cloud native, and it doesn't allow the elasticity that you want if you're building a Netflix or an eBay, or a modern enterprise infrastructure. So I think we need to keep those two things apart. You have legacy applications: keep them in the VMs. You have new workloads: you need to think of data, and data integration, and microservices differently, on something which is entirely stateless. The image of the container builds from Git, okay? You create a Docker image, and if you want to go to a different image, you just go and recreate the same image from source. The data for that image needs to be stored in a data facility like a database or an object store or something like that.
>> Yaron, the final question I have for you: talk a little about the customers you're interacting with, and the people that are here. As you said, there's a spectrum of how far along they are in their thinking. You're pretty advanced in some of your architectural thoughts and opinionated as to where you're going. Where are the customers today? How many of them are ready for the future versus sticking to what they have got?
>> So, as you mentioned before, part of the key challenge for enterprises is that they all want to move into the digital transformation, they all want to be competitive, because some have existential threats. Think about even banks today: Apple comes with Apple Pay, and it kills a lot of the margins they are making from all those small transactions. And now, no one really cares how many branches a bank has, because all of Generation Y just goes to the mobile app. Someone like a bank has to immediately transition and be able to offer premium services, offer better experiences in the mobile application, be able to analyze user behavior, things that are more strategic.
The traditional things that IT deals with, like Exchange server management, SAP, all those legacy things, will move to the cloud, because there's no real value there. And what you see is more and more enterprises thinking about how to generate differentiation, which is more about analyzing data and being able to provide better service to the customers. And the biggest challenge is they don't know how to do it. Because what the industry tells them is: go to Apache, take a dozen projects, and now integrate those and figure out the security problem. And you know what, you want to add Kubernetes? That's from a different story, but let's try and glue this together. And that's extremely complicated. So what we are trying to do is go to those customers and say, you know what, we're building a full-blown solution, fully integrated, security is baked in, with all the different data services, and it integrates with things like Kubernetes natively. We actually go the extra mile: we build Spark and TensorFlow images that contain everything, including support for us, so that you can just launch Spark and it connects and works. We want to make life easier for those enterprises, to solve the key challenges that they are working on. And this is working extremely well for us. Actually, the challenge we have is that we only have, I think, two sales guys, and we have a huge pipeline, and we can't really deliver for most of those projects.
>> Good challenges to have sometimes. Talk about scaling, which has been one of the themes of the week here. Yaron Haviv, great to catch up with you as always. We'll be back with two days of our coverage here at DockerCon 2017. You're watching theCUBE. (electronic music)

Published Date : Apr 19 2017
