Evan Kaplan, InfluxData | AWS re:Invent 2022


 

>>Hey everyone, welcome to Las Vegas. The Cube is here, live at the Venetian Expo Center for AWS re:Invent 2022. Amazing attendance. This is day one of our coverage. Lisa Martin here with Dave Vellante. Dave, it's great to see so many people back. We've been having great conversations already, and we have wall-to-wall coverage for the next three and a half days. When we talk to companies and customers, every company has to be a data company. And one of the things I think we learned in the pandemic is that access to real-time data and real-time analytics is no longer a nice-to-have; it's a differentiator and a competitive advantage. >>It's all about data. I mean, I love the topic. It's got so many dimensions and such texture; I can't get enough of data.

I know, and we have a great guest joining us. One of our alumni is back: Evan Kaplan, the CEO of InfluxData. Evan, thank you so much for joining us. Welcome back to the Cube.

Thanks for having me. It's great to be here.

So here we are, day one. I was telling you before we went live, we're nice and fresh hosts. Talk to us about what's new at InfluxData since the last time we saw you at re:Invent.

That's great. So first of all, we should acknowledge what's going on here — this is pretty exciting. I know there was a show last year, but this feels like the first post-COVID show: a lot of energy, a lot of attention, despite a difficult economy. In terms of what you guys were saying in the lead-in about big data: if we were to talk about big data five, six years ago, what would we be talking about? We'd be talking about Hadoop, about Cloudera, about Hortonworks, about big data lakes and data stores. What's happened since is this interesting dynamic of — let's call it, if you will — the secularization of data, in which it breaks into different fields, almost a taxonomy. You've got search data, you've got observability data, you've got graph data, you've got document data, and now you have time series data.

And what you're seeing in the market, mostly driven by open source, is this incredible capability of developers to assemble data platforms that aren't unicellular — that aren't just built on Hadoop or Oracle or Postgres or MySQL — but in fact represent different data types. So for us, what we care about is time series. We care about anything that happens in time, where time is the primary measurement — which, if you think about it, is a huge proportion of real data. Because when you think about what drives AI, you think about what happened, what happened, what happened, and what's going to happen. That's the functional thing, and "what happened" is always defined by a period, a measurement, a time. And so what's new for us is we've developed a new open source engine called IOx. It's basically a refresh of the whole database: a columnar database that uses Apache Arrow, Parquet, and DataFusion, and turns it into a super powerful real-time analytics platform. It was already pretty real-time before, but it's increasingly so now, and it adds SQL capability and infinite cardinality. So it handles bigger data sets — but importantly, not just bigger, faster. Faster data. That's primarily what we're talking about at the show.

So how does that affect where you can play in the marketplace?
How does it affect your total addressable market, your customer opportunities?

Great question. It's a really interesting market, in that you've got all of these different approaches to database. You take data warehouses from Snowflake — or arguably Databricks also — and you take these individual database companies like Mongo, Influx, Neo4j, Elastic, and people like that. The commonality you see across the board is that many of them, if not all of them, are based on some sort of open source dynamic. I think that's an intractable trend that will continue on. But in terms of the broader database market and our total addressable market, lots of these things are coming together in interesting ways. And the wave we want to ride — because it's all big data, and it's all increasingly fast data, and it's all machine learning and AI — is really around that measurement issue, that instrumentation: the idea that if you're going to build any sophisticated system, it starts with instrumentation, and the journey is defined by instrumentation. So we view ourselves as the instrumentation tooling for understanding complex systems.

A quick follow-up: why did you say "arguably" Databricks? I mean, the open source ethos?

Well, I was saying arguably Databricks because of Spark. It's a great company, and it's based on Spark, but there's quite a gap between Spark and what Databricks is today. In some ways Databricks, from the outside looking in, looks a lot like Snowflake to me — a really sophisticated data warehouse with a lot of post-processing capabilities —

And with open source less than a core database.

Yeah. Right, right. I totally agree. Okay, thank you for that.

Not "arguably" as in they're not a good company —

No, no, they've got great momentum. I was just curious. Absolutely.

So talk a little bit about IOx, and what it's enabling you to achieve from a competitive advantage perspective. The key differentiators — give us the scoop.

So our old storage engine was called TSM, also open source, and IOx is open source as well. The old storage engine was really built around time series measurements — particularly metrics, lots of metrics — handling those at scale and making it super easy for developers to use. But the old data engine only supported either a custom graphical UI that you'd build yourself on top of it, or a dashboarding tool like Grafana or Chronograf. With IOx, two or three interventions were important. One is that we now support things like Tableau and Microsoft Power BI, so the same data that was available for instrumentation is now used for business intelligence too. That became super important, and it answers your question about the expanded market — it expands the market. The second thing is, when you're dealing with time series data, you're dealing with this concept of cardinality: the idea that it's a multiplication of measurements in a table. The more measurements you want across the more series you have, you get this really expanding, exponential set that can choke a database off.
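As an illustrative aside, the multiplication being described here is easy to see in a toy model. This is a minimal sketch with hypothetical tag names — it mirrors the room/locations/days example that follows, not InfluxDB's internal representation:

```python
from itertools import product

# Hypothetical tag values: each unique combination of tag values
# defines one distinct time series.
tags = {
    "location": [f"sensor-{i}" for i in range(400)],     # 400 locations
    "day":      [f"day-{d:02d}" for d in range(1, 26)],  # 25 days
    "metric":   ["noise_db", "temperature_c"],           # 2 measurements
}

# Series cardinality is the product of distinct values per tag.
cardinality = 1
for values in tags.values():
    cardinality *= len(values)
print(cardinality)  # 400 * 25 * 2 = 20,000 series from just three tags

# Materializing every combination shows how quickly this explodes.
assert len(list(product(*tags.values()))) == cardinality
```

Every additional tag multiplies the count again, which is why engines that must index every series key up front start to choke as cardinality grows.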
And the way we've designed IOx is to handle what we call infinite cardinality, where you don't even have to think about it from a design point of view. And then lastly, query performance is just dramatically better. So it's pretty exciting.

So with the unlimited cardinality, basically you can identify relationships between data in different databases — is that right?

Within the same database, but across different measurements, different tables. Yeah. So you could say: I want to look at how noise levels behaved in this room across 400 different locations, on 25 different days, over seven months of the year. Each one of those is a measurement, and each one adds to cardinality. And you can say: I want to search on Tuesdays in December, what the noise level is at 2:21 PM — and you get a very quick response. That kind of instrumentation is critical to smarter systems.

How are you able to process that data at a performance level that doesn't bring the database to its knees? What's the secret sauce behind that?

It's a columnar database, built on Parquet and Apache Arrow. It's hard to do it justice without a much longer conversation, but it's an architecture that's really built for pulling exactly that kind of data. If you know the data is time series and you're looking for a time measurement, you already have the ability to optimize pretty dramatically.

So it's that purpose-built aspect of it.

It's the purpose-built aspect. You couldn't take Postgres and do the same thing.

Right — because a lot of vendors say, "oh yeah, we have time series now."

They do, yeah. But the founding of the company came about because Paul Dix was working on Wall Street, building time series databases on HBase, on MySQL, on other platforms, and realized that every time we do it, we have to rewrite the code and build a bunch of application logic to handle all of this. We have customers that are adding hundreds of millions to billions of points a second. Think about all those data points — you're talking about an ingest level that databases just aren't designed for. And it's not just us; our competitors also build good time series databases. The category is really emergent.

Sure. Talk about a favorite customer story that you think really articulates the value of what Influx is doing, especially with IOx.

Yeah, sure. And I love this story because, you know, Tesla may not be in favor given the latest Elon Musk escapades, but we've had about a four-year relationship with Tesla. They built their Powerwall technology around recording — seeing your device, seeing the charging on your car. It's all captured in Influx databases reporting from Powerwalls and Megapacks all over the world. They report to a central place at Tesla's headquarters, and it reports out to your phone so you can see it. What's really cool about this to me is that I've got two Tesla cars and Tesla solar roof tiles, so I watch this data all the time. So it's a great customer story. And actually, if you go on our website, you can see I did an hour interview with the engineer who designed the system, because the system is super impressive and I just think it's really cool.
Plus, it's all the good green stuff that we really appreciate — supporting sustainability, right?

Right, right. Talk about the what's-in-it-for-me as a customer. With the change to IOx, what are some of the key features, and the key values in it for customers like Tesla and other industry customers as well?

Well, it's relatively new — it just arrived in our cloud product, so Tesla's not using it today; we have a first set of customers starting to use it. And it's in open source; it's a very popular project in the open source world. But the key issues are really the things we've covered here. There's a broad SQL environment, so it reaches all those SQL developers: the same people who code against Snowflake's data warehouse, or Databricks, or Postgres, can now code against Influx, which opens up the BI market. There's the cardinality, and there's the performance. It's really an architecture — the next generation of everything we've learned over six years about making time series super performant. And that's only relevant because more and more things are becoming real-time as we develop smarter and smarter systems. The journey is pretty clear: you instrument the system, you let it run, you watch for anomalies, you correct those anomalies, and you re-instrument the system. You do that four billion times, you have a self-driving car. You do that 55 times, you have a better podcast that's handling its audio better. Everything is on that journey of getting smarter and smarter.

You guys are the big committers to IOx, right?

Yes.

Talk about how you support the surrounding developer community — how you get that flywheel effect going.

It's actually, let's call it, more art than science. First of all, you come up with an architecture that really resonates with developers, and Paul Dix, our founder, really is a developer's developer. So he started talking in the community about an architecture that uses Apache Arrow and Parquet — which is now becoming the standard for file formats — that uses Arrow for directing queries and things like that, and that uses DataFusion. And he said: what this thing needs is a columnar database that sits behind all of this stuff and integrates it. He started talking about it two years ago, and then he started publishing the IOx commits on GitHub. And slowly, over time, on Hacker News and elsewhere, people went: yeah, this is fundamentally right.

It addresses the problems that people have with things like ClickHouse or plain databases, and they go: okay, this is the right architecture at the right time. No different from the original Influx, no different from what Elastic hit on, no different from what Confluent hit on with Kafka. Over time you build an audience of people who are committed to understanding this kind of stuff; they become committers, they become the core, and you build out from there. And we chose an MIT open source license — not some restrictive secondary license. Competitors can use it, and competitors can use it against us.

One of the things I know InfluxData talks about is the "time to awesome," which I love. But what does that mean? What is the time to awesome
for a developer?

It comes from that original story, where Paul would have to write six months of application logic and such to build a time-series-based application. Paul's notion — and this was based on the original Mongo, which was very successful because it was so easy to use relative to most databases — was a commitment, an idea that I quickly joined onto: it should be relatively quick for a developer to build something of import, to solve a problem. So it's got a schemaless background: you don't have to know the schema beforehand. It does some things that make it really easy to feel powerful as a developer quickly. And if you think about that journey — if you feel powerful with a tool quickly, you'll go deeper and deeper, and pretty soon you're taking that tool with you wherever you go. It becomes the tool of choice as you go to the next job or the next application. So that's a fundamental way we think about it. To be honest with you, we haven't always delivered perfectly on that, but it's in our DNA. We do pretty well, but I always feel like we can do better.

So if you were to put a bumper sticker on one of your Teslas about InfluxData, what would it say?

By the way, I'm not rich — it just happens that we have two Teslas; we committed to that a while ago. So, ask the question again?

A bumper sticker on InfluxData. What would it say?

It would be that phrase: "time to awesome."

Love that.

Yeah, I'd love it.

Excellent — time to awesome. Evan, thank you so much for joining Dave and me on the program.

It's been really fun. Great being on.

Great to have you back, talking about what you guys are doing and helping organizations like Tesla and others really transform their businesses — which is all about business transformation these days. We appreciate your insights.

That's great. Thank you.

For our guest and Dave Vellante, I'm Lisa Martin. You're watching the Cube, the leader in emerging and enterprise tech coverage. We'll be right back with our next guest.
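To ground that "time to awesome" idea in something concrete, here is a minimal first write using the open-source influxdb-client package for Python against a local InfluxDB 2.x instance. The URL, token, org, bucket, and measurement names are placeholders; the point is that no schema is declared anywhere — the measurement, tag, and field spring into existence on first write.

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection details for a local InfluxDB 2.x instance.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# No CREATE TABLE step: the schema materializes from the point itself.
point = (
    Point("noise_level")           # measurement
    .tag("room", "expo-hall-d")    # tag: an indexed dimension (adds cardinality)
    .field("db", 72.4)             # field: the measured value
)
write_api.write(bucket="sensors", record=point)
client.close()
```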

Published Date : Nov 29 2022

Mai Lan Tomsen Bukovec & Wayne Duso, AWS | AWS re:Invent 2021


 

>>Hi everybody, welcome back to theCube's coverage of AWS re:Invent 2021. You're watching the Cube, and I'm really excited — we're going to go outside the storage box, I like to say — with Mai Lan Tomsen Bukovec, who's the vice president of block and object storage, and Wayne Duso, who's the VP of storage, edge, and data governance. Guys, great to see you again. We saw you at Storage Day, the 15-year anniversary of AWS's first product service ever. So awesome to be here, isn't it?

So much energy in the room. It's so great to see customers learning from each other, learning from AWS, learning from the things that you're observing as well.

A lot of companies decided not to do physical events. I think you guys are on the right side of history. You weren't exactly positive how many people were going to show up — and everybody showed. I mean, it's a packed house here.

Yeah.

All right, so let's get right into it. News of the week — so much to say. Wayne, do you want to kick this off?

We had a great set of announcements that Mai Lan talked about yesterday in her talk, and a couple of them are in the file space — specifically, a new member of the FSx family. If you remember, Amazon FSx is for customers who want to run fully managed versions of third-party and open source file systems on AWS. And so yesterday we announced a new member: FSx for OpenZFS.

Okay, cool. And there's more —

Well, there's more. One of the great things about the new managed file service for OpenZFS is that it's powered by Graviton.

It is powered by Graviton, and by all of the capabilities that AWS brings in terms of networking, storage, and compute to our customers.

So this is really important; I want the audience to understand this. I've talked on the Cube about how a large proportion — let's call it 30% — of CPU cycles are kind of wasted, really, on things like offloads, and we could be much more efficient. So Graviton: much more efficient, lower power, better price performance, lower cost. Amazon is now on a new curve — cycles are faster for processors — and you can take advantage of that in storage, because storage uses compute.

That's right. In fact, you had that big launch as well for Lustre with Graviton.

We did. In fact, alongside FSx for OpenZFS we also announced the next-gen Lustre offering, and both of these offerings provide a 5x improvement in performance. For example, with Lustre, customers can now drive up to one terabyte per second of throughput, which is simply amazing. And with OpenZFS, right out of the box at GA: a million IOPS at sub-millisecond latencies, taking advantage of Graviton and of our storage and networking capabilities.

Well, I guess that's for HPC workloads, but what's the difference these days between HPC, big data, data-intensive, and all the AI stuff?

There's a lot of intersection between all of those different types of workloads, as you said, and it all depends — it all matters. And this is the reason why having the suite of capabilities — the members of the family, if you will — is so important to our customers.

We've talked a lot about how you really can't think about traditional storage as traditional storage anymore. Certainly your world's not a box — it's really a data platform. But maybe you could give us your point of view on that.
Yeah, if we take a step back and think about how AWS does storage, we think along multiple dimensions. There's the dimension Wayne's talking about, where you bring together the power of compute and storage for these managed file services that are so popular. You and I talked about NetApp ONTAP — we went into some detail on that with you as well — and that's been enormously popular. That whole dimension of managed file services is all about: where is the customer today, and how can we help them get to the cloud? But then you think about the other things we're imagining — and re-imagining — around how customers want to grow those applications and scale them. A great example here at re:Invent: take the concept of archive. So many people, when they think about archive, think about taking a piece of data and putting it away on tape, putting it away in a closet somewhere, never pulling it out. We don't think about archive like that. Archive just happens to be data that you aren't using at the moment — but when you need it, you need it right away. And that's why we built a new storage class that we launched just yesterday, Dave. It's called Glacier Instant Retrieval: it has retrieval in milliseconds, just like an S3 storage class, with the same four-tenths-of-a-cent pricing as Glacier archive.

So, what's interesting: at the analyst event today, Adam got a question — somebody was poking at him; you know, analysts can be snarky sometimes about price declines and so forth. And he said that one of the things that doesn't always show up, and that you don't always get credit for, is lowering prices by lowering costs. And archive and deep archive are examples of that. Maybe you could explain that point of view.

Yeah. The way we look at it is that when our customers talk to us about the cost of storage, they talk about the total cost of the storage — not just storing the data, but retrieving it and using it. And so we have done an amazing amount across the whole portfolio around reducing costs. We have Glacier Instant Retrieval, which is 68% cheaper than Standard-Infrequent Access — that's a big cost reduction. We have EBS Snapshots Archive, which we introduced yesterday: 75% cheaper to archive a snapshot. These are the kinds of things that transform the total cost. And in some cases we just eliminate costs: bulk retrievals of data from the Glacier storage class — five to 12 hours — are now free of charge. You don't even have to think about it. We didn't reduce that cost; we eliminated it.

And additive to what Mai Lan said around archiving: if you look at what we've done throughout the entire year — an interesting statistic that was brought up yesterday — over the course of 2021, between our respective teams, we've launched over 105 capabilities for our customers. Some examples: on the file side, for EFS, we launched One Zone, which reduced customer costs by 47% — you can now achieve on EFS a cost of roughly 4.3 cents per gigabyte-month. On FSx, we've reduced costs by up to 92% on Lustre and FSx for Windows, and with the introduction of ONTAP and OpenZFS we carry those forward, including customers' ability to compress and dedupe against those costs.
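For readers who want the SDK view, here is a hedged boto3 sketch of the two announcements just described. The bucket, key, and snapshot ID are placeholders, and the storage-class and tier identifiers are the ones documented at launch, so treat this as illustrative rather than canonical:

```python
import boto3

s3 = boto3.client("s3")
ec2 = boto3.client("ec2")

# Store an object directly in Glacier Instant Retrieval: archive-level
# pricing, but milliseconds retrieval like other S3 storage classes.
s3.put_object(
    Bucket="my-archive-bucket",            # placeholder bucket
    Key="reports/2021-q4.parquet",
    Body=b"...archived report bytes...",
    StorageClass="GLACIER_IR",
)

# Move an existing EBS snapshot into the new archive tier.
ec2.modify_snapshot_tier(
    SnapshotId="snap-0123456789abcdef0",   # placeholder snapshot ID
    StorageTier="archive",
)
```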
So customers end up seeing considerable savings, even over what our standard low prices are.

A hundred-plus — what do I call them, releases? How do you categorize those? Are they features?

They range from major services, like what we've launched with OpenZFS, to major features — and really, 95 of those were launched before re:Invent. So what you have, between the different teams that work in storage, is this relentless drive to improve all the storage platforms, and we do it all across the course of the year. In some cases the benefit shows up at no cost at all to the customer.

It seems like you're on an accelerated pace — S3, EBS, and then hundreds of services. I guess the question is: how come it took so long, and how is it accelerating now? Was it that there was so much focus on compute before, and you had to get that in place, but now it's rapidly accelerating?

I'll tell you, Dave: we took the time to count this year, and so we came to you with this number of 106. But that acceleration has been in place for many years — we just didn't take the time to count. This has been happening for years and years. Wayne and I have been with AWS for a long time now, 10-plus years, and really, that velocity we're talking about right now has been happening every single year. That's where you have storage today. And I've got to tell you, innovation is in our DNA, and we are not going to stop now.

So, 10 years. Okay — so were the first five years kind of slow, and then —

I don't think that's true at all. If you look at the services we have, we have the most complete portfolio of any cloud provider when it comes to storage and data. Over the years we've added to the foundations, which are S3 and EBS. We've come out with a number of storage services in the file space; now you have an entire suite of persistent data stores within AWS, and the teams behind them are able to accelerate that pace. Just to give you an example: when I joined 10 years ago, AWS launched, within that year, roughly a hundred and twenty-eight services and features. Our teams together this year have launched almost that many just in this space. So AWS continues to accelerate, the storage teams continue to accelerate — and as Mai Lan said, we just started counting.

And if you think about those first five years, that was laying the baseline: launching S3, launching EBS, getting that foundation in place, getting lifecycle policies in place. Really, I think you're just going to see an even faster acceleration — that number's going up.

That's what I'm saying — it does appear that way. And you had to build teams and put them in place, so that's part of the equation. But again, I come back to this: I don't even think of it as storage anymore; it's data. The data lake is here to stay — you might not like the term; we always joke about a "data ocean" — but the data lake is here to stay. We heard this morning — I think it was Adam... no, it was Swami — that there are over 200,000 data lakes in your customer base now. And people are adding value to that data in new ways, injecting machine intelligence — SageMaker is a big piece of that — tying it in.
I know a lot of customers are using Glue as their catalog, and I'm like, wow, is Glue a catalog? I mean, it's just so flexible. So what are you seeing customers do with that base of data now to drive new business value? Because I've said the last decade-plus has been about IT transformation, and now we're seeing business transformation. Maybe you could talk about that a little bit.

Well, the base of every data lake is going to be S3, and S3 now has over 200 trillion objects, Dave. Think about that: if you took every person on the planet, each of those people would have 26,000 S3 objects. It's gotten that big. And with a base of 200-trillion-plus objects, the opportunity for innovation is limitless. A great example of how it's not just business value but brand-new customer experiences is the NFL. They have an application called Digital Athlete, where they started off with 10,000 labeled images — they're up to 20,000 labeled images now — and they're using them to drive machine learning models that help predict and support the players on the field when things start to unfold that might cause injury. That is a brand-new experience, and it's only possible with vast amounts of data.

Additive to what Mai Lan said: you talk about business transformation — we are in the age of data. We represent storage services, but what we really represent is one of our customers' most valuable assets: their data. That set of data is only growing, and the ability to use it — to leverage it for value, whether it's ML training or analytics — is only accelerating. This is the feedback we get from our customers, and it's where these features and new capabilities come from. That's what's really accelerating our pace.

Guys, I wish we had more time — I'd have to have you back, because we're on a tight clock here. But it was so great to see you both, especially live. I hope we get to do more of this in 2022; I'm an optimist. Okay, keep it right there, everybody. This is Dave Vellante for the Cube, your leader in live tech coverage. We'll be right back.
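Circling back to the FSx for OpenZFS launch discussed at the top of this segment, here is a boto3 sketch of creating a minimal file system. The subnet ID and sizing are placeholders, and the single-AZ configuration shown is a best-guess minimal setup for illustration, not a production recommendation:

```python
import boto3

fsx = boto3.client("fsx")

# Minimal single-AZ OpenZFS file system; all values are placeholders.
response = fsx.create_file_system(
    FileSystemType="OPENZFS",
    StorageCapacity=64,                        # GiB
    SubnetIds=["subnet-0123456789abcdef0"],
    OpenZFSConfiguration={
        "DeploymentType": "SINGLE_AZ_1",
        "ThroughputCapacity": 64,              # MB/s
    },
)
print(response["FileSystem"]["FileSystemId"])
```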

Published Date : Dec 2 2021

Sabita Davis and Patrick Zeimet | Io-Tahoe Adaptive Data Governance


 

>>From around the globe, it's the Cube, presenting Adaptive Data Governance, brought to you by Io-Tahoe. In this next segment, we're going to be talking to you about getting to know your data, and specifically, you're going to hear from two folks at Io-Tahoe. We've got enterprise account executive Sabita Davis here, as well as enterprise data engineer Patrick Zeimet. They're going to be sharing insights and tips and tricks for how you can get to know your data — and quickly. We also want to encourage you to engage with Sabita and Patrick: use the chat feature to the right, send comments, questions, or feedback, so you can participate. All right, Sabita and Patrick, take it away.

Thanks, Lisa. Great to be here. As Lisa mentioned, I'm an enterprise account executive here at Io-Tahoe. Pat?

Hey, everyone — so great to be here. As Sabita said, my name's Patrick Zeimet, and I'm an enterprise data engineer here at Io-Tahoe. We're so excited to be here and talk about this topic, as one thing we're really trying to perpetuate is that data is everyone's business.

I couldn't agree more, Pat. So, Pat and I have each had multiple discussions with clients from different organizations and with different roles — we've spoken with both technical and non-technical audiences. While they were interested in different aspects of our platform, we found that what they had in common was that they wanted to make data easy to understand and usable. That comes back to Pat's point about data being everybody's business, because no matter your role, we're all dependent on data. So what Pat and I wanted to do today is walk you through some of those client questions — pain points — that we're hearing from different industries and different roles, and demo how our platform here at Io-Tahoe is used for automating those data-related tasks. With that said, are you ready for the first one, Pat?

Yeah, let's do it.

Great. I'm going to put my technical hat on for this one. So: I'm a data practitioner, and I just started my job at ABC Bank. I have over 100 different data sources — data kept in data lakes, legacy data sources, even the cloud. My issue is that I don't know what those data sources hold, I don't know what data is sensitive, and I don't even understand how that data is connected. So how can Io-Tahoe help?

Yeah, I think that's a very common experience, and definitely something I've encountered in my past. Typically, the first step is to catalog the data and then start mapping the relationships between your various data stores. More often than not, this is tackled through numerous meetings and a combination of Excel and something similar to Visio — two great tools in their own right, but very difficult to maintain, just due to the rate at which we create data in the modern world. It begs for an approach that can scale with your business needs, and this is where a platform like Io-Tahoe becomes so appealing. You can see here a visualization of the data relationships created by the Io-Tahoe service. What's fantastic about this is that it's not only laid out in a very human, digestible format — in the same action of creating this view, the data catalog was constructed.

So, is the data catalog automatically populated?

Correct.

Okay, so what I'm getting with Io-Tahoe is this complete, unified, automated platform — without the added cost, of course.

Exactly. And that's at the heart of Io-Tahoe.
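Io-Tahoe's relationship-discovery models aren't public, so purely as an illustration of the idea behind automatically mapped (and predicted) relationships: a small pandas sketch that scores candidate join relationships by the overlap of distinct values between columns in different tables. Table names, threshold, and scoring are all invented for this example:

```python
import pandas as pd

def jaccard(a: pd.Series, b: pd.Series) -> float:
    """Overlap between the distinct values of two columns."""
    left, right = set(a.dropna()), set(b.dropna())
    if not left or not right:
        return 0.0
    return len(left & right) / len(left | right)

def predict_relationships(tables, threshold=0.6):
    """Yield (table.col, table.col, score) pairs likely to be join keys."""
    cols = [
        (tname, cname, df[cname])
        for tname, df in tables.items()
        for cname in df.columns
    ]
    for i, (t1, c1, s1) in enumerate(cols):
        for t2, c2, s2 in cols[i + 1:]:
            if t1 == t2:
                continue  # only consider cross-table candidates
            score = jaccard(s1, s2)
            if score >= threshold:
                yield f"{t1}.{c1}", f"{t2}.{c2}", round(score, 2)

# Example: customer_id should surface as a predicted relationship.
orders = pd.DataFrame({"customer_id": [1, 2, 3, 4], "total": [10, 20, 15, 5]})
customers = pd.DataFrame({"customer_id": [1, 2, 3, 4, 5], "name": list("abcde")})
print(list(predict_relationships({"orders": orders, "customers": customers})))
# [('orders.customer_id', 'customers.customer_id', 0.8)]
```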
A great feature of that data catalog is that Io-Tahoe will also profile your data as it creates the catalog, assigning some meaning to those pesky "column_1"s and "custom_variable_10"s that are always such a joy to deal with. Now, by leveraging this interface, we can start to answer the first part of your question and understand where the core relationships within our data exist. Personally, I'm a big fan of this view, as it helps the eye jump naturally to the focal points that coincide with key columns. Following that train of thought, let's examine the customer ID column, which seems to be at the center of a lot of these relationships. We can see it's a fairly important column, as it maintains relationships with at least three other tables. You'll notice all these connectors are blue — those are system-defined relationships. But Io-Tahoe goes the extra mile and also creates the orange-colored connectors: relationships that our machine learning algorithms have predicted, which you can leverage to make new and powerful connections within your data. So I hope that answers the first part of your question.

This is really cool, and I can see how it could be leveraged quickly. Now, what if I add new data sources, or have multiple data sources, and need to identify what data is sensitive? Can Io-Tahoe detect that?

Yeah, definitely. Within the Io-Tahoe platform there are already over 300 predefined policies, such as HIPAA, FERPA, CCPA, and the like. One can choose which of these policies to run against their data, allowing for flexibility and efficiency in running the policies that affect your organization.

Okay, 300 is an exceptional number, I'll give you that. But what about internal policies that apply only to my organization? Is there any ability for me to write custom policies?

Yeah, that's no issue, and it's something clients leverage fairly often. To utilize this function, one simply has to write a regex, which our team has helped many deploy. After that, the custom policy is stored for future use. To profile sensitive data, one then selects the data sources they're interested in and the policies that meet their particular needs. The interface will automatically tag your data according to the policies' detections, after which you can review the discoveries, confirming or rejecting the tagging. All of these insights are easily exported through the interface, so you can work them into the action items in your project management systems. And I think this lends itself to collaboration: a team can work through the discoveries simultaneously, and as each item is confirmed or rejected, they can see it near-instantaneously. All of this translates to confidence that, with Io-Tahoe, you can be sure you're in compliance.

I'm glad you mentioned compliance, because that's extremely important to my organization. So what you're saying is that using the Io-Tahoe automated platform, we'd be 90% more compliant than we would be relying on manual, human-driven processes?

Yeah, definitely. The collaboration and documentation that the Io-Tahoe interface lends itself to can really help you build that confidence that your compliance is sound. Does that answer your question about sensitive data?

Definitely. So, Pat, I have the next question for you. We're planning a migration, and I have a set of reports I need to migrate.
But what I need to know is which data sources those reports depend on, and what's feeding those tables.

Yeah, that's a fantastic question. Identifying critical data elements and the interdependencies within your various databases can be a time-consuming but vital part of a migration initiative. Luckily, Io-Tahoe has an answer, and again, it's presented in a very visual format.

So what I'm looking at here is my entire data landscape?

Yes, exactly.

So let's say I add another data source — can I still see that unified 360 view?

Yeah. One feature that is particularly helpful is the ability to add data sources after the data lineage discovery has finished, allowing for the flexibility and scope necessary for any data migration project. Whether you need to select just a few databases or your entire estate, this service will provide the answers you're looking for. This visual representation of the connectivity makes identifying critical data elements a simple matter. The connections are driven both by system-defined flows and by those predicted by our algorithms — and the confidence thresholds can be customized to meet the needs of the initiative you have in place. It also provides tabular output, in case you need it for your own internal documentation or your action items, which we can see right here. In this interface you can also confirm or deny the predicted relationships and directions, to make sure the data is as accurate as possible. Does that help with your data lineage needs?

Definitely. So, Pat, my next big question: now that I know a little bit about my data, how do I know I can trust it? What I really want to know is whether it's in a fit state for me to use. Is it accurate? Does it conform to the right format?

Yeah, that's a great question, and I think it's a pain point felt across the board, by data practitioners and data consumers alike. Another service Io-Tahoe provides is the ability to write custom data quality rules and understand how well the data conforms to them. This dashboard gives a unified view of the strength of those rules and of your data's overall quality.

Okay, so on those accuracy scores: if my marketing team needs to run a campaign, can they depend on the accuracy scores to know which tables have quality data to use?

Yeah. This view allows you to understand your overall accuracy, as well as dive into the minutiae to see which data elements are of the highest quality. For that marketing campaign, if you need everything in strong form, you'll see it very quickly from these high-level numbers; but if you're only dependent on a few columns to get that information out the door, you can find that within this view too. So you no longer have to rely on reports about reports — instead, you come to this one platform to help drive conversations between stakeholders and data practitioners. I hope that helps answer your questions about data quality.

Oh, definitely. So I have another one for you, Pat. I now get the value Io-Tahoe brings by automatically capturing all that technical metadata from sources. But how do we match that with the business glossary?

Yeah — within the same data quality service that we just reviewed,
one can actually add business rules detailing the definitions and the business domains they fall into. What's more, the data quality rules we were just looking at can be tied into those definitions, allowing insight into the strength of the business rules. It's this service that empowers stakeholders across the business to be involved with the data life cycle and take ownership of the rules that fall within their domain.

Okay — and can those custom rules be applied across data sources?

Yeah, you can bring in as many data sources as you need, so long as you can tie them to that unified definition.

Okay, great. Thanks so much, Pat. And we just want to quickly say to everyone working in data: we understand your pain, so please feel free to reach out to us through our website, and let's get a conversation started on how Io-Tahoe can help you automate all those manual tasks, to save you time and money. Thank you.

Thank you.

Pat, if I could ask you one quick question: how do you advise customers to get started? You two just walked through this great banking example — how should they begin?

Yeah, I think the number one thing customers can do to get started with our platform is to run the tag discovery and build up that data catalog. It lends itself very quickly to the other needs you might have, such as those quality rules, as well as identifying the kinds of tricky columns that might exist in your data — those "custom_variable_10"s I mentioned before.

Last question, Sabita: anything to add to what Pat just described as a starting place?

No, I think Pat summed that up pretty well. Just by automating all those manual tasks, you can save your company a lot of time and money, so we encourage you to reach out to us and get that conversation started.

Excellent. Sabita and Pat, thank you so much. We hope you've learned a lot from these folks about how to get to know your data and make sure it's quality data, so that you can maximize its value. Thanks for watching.
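As a shape-of-the-idea sketch of the custom regex policies described in this segment — Io-Tahoe's engine is proprietary, so the policy names, patterns, and threshold here are invented for illustration — this scanner samples column values and tags columns whose match rate crosses a threshold, leaving final confirmation to a human reviewer:

```python
import re
import pandas as pd

# Hypothetical custom policies: name -> compiled pattern.
POLICIES = {
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "EMAIL":  re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
}

def tag_sensitive_columns(df: pd.DataFrame, min_match_rate: float = 0.8):
    """Return {column: [policy names]} for columns that look sensitive."""
    findings = {}
    for col in df.columns:
        sample = df[col].dropna().astype(str).head(1000)  # sample for speed
        if sample.empty:
            continue
        for name, pattern in POLICIES.items():
            rate = sample.map(lambda v: bool(pattern.match(v))).mean()
            if rate >= min_match_rate:
                findings.setdefault(col, []).append(name)
    return findings  # a human reviewer confirms or rejects each tag

df = pd.DataFrame({
    "contact": ["a@x.com", "b@y.org", "c@z.net"],
    "notes":   ["call back", "vip", "n/a"],
})
print(tag_sensitive_columns(df))  # {'contact': ['EMAIL']}
```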

Published Date : Dec 10 2020

Yusef Khan, Io-Tahoe | Enterprise Data Automation


 

>>From around the globe, it's the Cube, with digital coverage of Enterprise Data Automation, an event series brought to you by Io-Tahoe. Everybody, we're back, and we're talking about enterprise data automation. The hashtag is #DataAutomated, and we're going to really dig into data migrations. Data migrations are risky, they're time consuming, and they're expensive. Yusef Khan is here — he's the head of partnerships and alliances at Io-Tahoe — coming again from London. Hey, good to see you, Yusef.

Thanks very much.

So your role is interesting. We're talking about data migrations, and you're head of partnerships. What is your role specifically, and how is it relevant to what we're going to talk about today?

I work with various businesses such as cloud companies, systems integrators, and companies that sell operating systems or middleware — all of whom are often quite well embedded within a company's IT infrastructure and have existing relationships. Because what we do fundamentally makes migrating to the cloud easier, and data migration easier, there are a lot of businesses that are interested in partnering with us, and that we're interested in partnering with.

So let's set up the problem a little bit, and then I want to get into some of the data. I said that migrations are risky, time consuming, and expensive — and they're oftentimes a blocker for organizations trying to get real value out of data. Why is that?

I think all migrations have to start with knowing the facts about your data. You can try to do this manually, but an organization that has been going for decades or longer will probably have a pretty large legacy data estate: everything from on-premise mainframes, to some things already in the cloud, with hundreds if not thousands of applications and potentially hundreds of different data stores. Their understanding of what they have is often quite limited, because you can try to draw manual maps, but they're outdated very quickly — every time the data changes, the manual map is out of date — and people leave organizations over time, so that tribal knowledge gets limited as well. You can try to tackle it manually: you might need a DBA or a business analyst to go in and explore the data for you. But doing that manually is very, very time consuming — it can take teams of people months and months. Or you can use automation, as Webster Bank did with Io-Tahoe, and do it with a relatively small team in a timeframe of days.

Yeah, we talked to Paul from Webster Bank — awesome discussion. So I want to dig into this migration. Let's pull up a graphic that shows what a typical migration project looks like. What you see here is very detailed — I know it's a bit of an eye test — but let me call your attention to some of the key aspects, and then, Yusef, I want you to chime in. At the top, you see that area graph: that's operational risk for a typical migration project, and you can see the timeline and the milestones. The blue bars show the time each phase takes, and you can see the second step, data analysis, taking 24 weeks — very time consuming. We won't dig into all the fine print in the middle, though there's some real good detail there. But go down to the bottom:
That's labor intensity at the bottom, and you can see high is that sort of brown, and you can see a number of them: data analysis, data staging, data prep, the trial, the implementation, post-implementation fixes, and the transition to BAU, which is business as usual. Those are all very labor intensive. So what are your takeaways from this typical migration project? What do we need to know, Yusef? >> I mean, I think the key thing is, when you don't understand your data upfront, it's very difficult to scope and to set up a project, because you go to business stakeholders and decision makers and you say, okay, we want to migrate these data stores, we want to put them in the cloud, most often. But actually, you probably don't know how much data is there, you don't necessarily know how many applications it relates to, you don't know the relationships between the data, and you don't know the flow of the data, the direction in which the data is going between different data stores and tables. So you start from a position of pretty high risk, and to alleviate that risk, you end up stacking the project team with lots and lots of people to do the next phase, which is analysis. And so you set up a project which has a pretty high cost: a big project, more people, heavy on governance, obviously. And then you're in the phase where you're trying to do lots and lots of manual analysis, which, in a sense, as we all know, is the job of trying to relate data that's in different data stores, relating individual tables and columns. Very, very time consuming, and expensive if you're hiring in resource from consultants or systems integrators externally; you might also need to buy third-party tools. And as I said earlier, the people who understood some of those systems may have left a while ago. So you're in a high-risk, high-cost situation from the off, and the same issues dog you all the way through the project. What do we do differently at Io Tahoe? We're able to automate a lot of this process from the very beginning, because we can do the initial data discovery run, for example, automatically. You very quickly have an automatically validated data map, and the data flow is generated automatically: much less time and effort, and dramatically less cost. >> Okay, so I want to bring back that first chart, and I want to call your attention again to the area graph, the blue bars, and then down below that, the labor intensity. And now let's bring up the same chart, but with an automation injection in here. So you now see, let's call it "Accelerated by Io Tahoe." Okay, great, and we're going to talk about this. But look what happens to the operational risk: a dramatic reduction in that graph. And then look at the bars, those blue bars: data analysis went from 24 weeks down to four weeks. And then look at the labor intensity: all of these were high, data analysis, data staging, data prep, trial, post-implementation fixes, and transition to BAU. All of those went from high labor intensity, and we've now attacked that and gone to low labor intensity. Explain how that magic happened. >> Take the example of a data catalog. Every large enterprise wants to have some kind of repository where they put all their understanding about their data, a catalog of the data estate, if you like. Imagine trying to do that manually. You need to go into every individual data store.
You need a DBA or a business analyst for each data store. They need to extract the data tables individually, and cross-reference them with other data stores, schemas, and tables. You'd probably end up with the mother of all Excel spreadsheets; it would be a very, very difficult exercise to do. In fact, one of our reflections, as we automate lots of these things, is that it accelerates the ability to automate more, but in some cases it also makes the exercise possible at all for enterprise customers with legacy systems. Take banks, for example. They quite often end up staying on mainframe systems that they've had in place for decades, not migrating away from them, because they're not able to actually do the work of understanding the data, de-duplicating the data, deleting the data that isn't relevant, and then confidently going forward to migrate. So they stay where they are, with all the attendant problems of systems that are out of support. Go back to the data catalog example. Whatever you discover in data discovery has to persist in a tool like a data catalog. And so we automate the population of data catalogs; it can be others' catalogs, but we have our own. The only alternative to this kind of automation is to build out a very large project team of business analysts, DBAs, project managers, and process analysts, together with the data SMEs, to make sure the process of gathering data is correct, to put it in the repository, to validate it, and so on. We've gone into organizations and we've seen them ramp up teams of 20 to 30 people, at costs of £2 to £4 million a year, on timeframes of 15 to 20 years, just to try and get a data catalog done. That's something that we can typically do in a timeframe of months, if not weeks, and the difference is using automation. If you do what I've just described in the manual way, you make migrations to the cloud prohibitively expensive: whatever saving you might make from shutting down your legacy data stores will get eaten up by the cost of doing it. Unless you go with the more automated approach. >> Okay, so the automated approach reduces risk because you're going to stay on project plan. Ideally, it's all these out-of-scope expectations that come up with the manual processes that kill you, in the rework. And then that data catalog: people are afraid that their family jewels, their data, is not going to make it through to the other side. So that's something that you're addressing, and then you're also not boiling the ocean. You're really taking the pieces that are critical, and the stuff you don't need, you don't have to pay for. >> It's a very good point. One of the other things that we do, and we have specific features to do it, is to automatically analyze data for duplication at a row or record level, and for redundancy at a column level. So, as you say, before you go into a migration process, you can understand: actually, this stuff is replicated, we don't need it. Quite often, if you put data in the cloud, you're paying, obviously, for the storage and for compute time, and the more data you have in there that's duplicated, that is pure cost you should take out before you migrate. Again, if you try to do that process of understanding what's duplicated manually, across tens or hundreds of data stores, it could take 20 months, if not years. Using machine learning to do that in an automatic way is much, much quicker.
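To make that concrete, here is a minimal sketch of the two checks being described, record-level duplication and column-level redundancy, written in Python with pandas. Io Tahoe's actual platform is proprietary and ML-driven; the function and the sample table below are purely illustrative.

```python
# A minimal, illustrative sketch of the two checks described above:
# exact duplicate records, and columns that carry identical data.
# A production tool would add fuzzy matching and ML-based profiling.
import pandas as pd

def profile_duplication(df: pd.DataFrame) -> dict:
    """Report row-level duplicates and column pairs holding the same data."""
    report = {
        # Rows that are exact copies of an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
        "redundant_column_pairs": [],
    }
    cols = list(df.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            # Columns whose values match row-for-row are pure storage cost.
            if df[a].equals(df[b]):
                report["redundant_column_pairs"].append((a, b))
    return report

# Hypothetical customer table with one repeated row and one copied column.
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "email":       ["a@x.com", "b@x.com", "b@x.com", "c@x.com"],
    "email_copy":  ["a@x.com", "b@x.com", "b@x.com", "c@x.com"],
})
print(profile_duplication(customers))
# {'duplicate_rows': 1, 'redundant_column_pairs': [('email', 'email_copy')]}
```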
And when you think about the costs and benefits of Io Tahoe: every organization we work with has a lot of existing sunk cost in their IT, so they have ERP systems like Oracle, or Data Lakes, which they've spent good time and money investing in. What we do, by enabling them to transition everything to their strategic future repositories, is accelerate the value of that investment, and the time to value of that investment. So we're trying to help people get value out of their existing investments and data estate, and close down the things that they don't need, to enable them to go to a kind of brighter future. >> And I think as well, you know, once you're able to, and this is a journey, we know that, but once you're able to go live, you're infusing sort of a data mindset, a data-oriented culture. I know it's somewhat a buzzword, but when you see it in organizations, you know it's real. And what happens is you dramatically reduce the cycle time of going from data to actual insights. Data's plentiful, but insights aren't, and that is what's going to drive competitive advantage over the next decade and beyond. >> Yeah, definitely. And you can only really do that if you get your data estate cleaned up in the first place. I've worked with and managed teams of data scientists, data engineers, business analysts, people who are pushing out dashboards and trying to build machine learning applications. The biggest frustration for lots of them, and the thing they spend far too much time doing, is trying to work out what the right data is, and cleaning data, which really you don't want highly paid data scientists doing with their time. But if you sort out your data estate in the first place, get rid of the duplication, and then migrate to a cloud store where things are really accessible and it's easy to build connections and use native machine learning tools, you're well on the way up the data maturity curve, and you can start to use some of those more advanced applications. >> You said it. What are some of the prerequisites, maybe the top two or three, that I need to understand as a customer to really be successful here? Is it skill sets? Is it mindset, leadership buy-in? What do I absolutely need to have to make this successful? >> Well, I think leadership is obviously key, just to set the vision for people. One of the great things about Io Tahoe, though, is you can use your existing staff to do this work. If you use an automation platform, there's no need to hire expensive people: ours is a no-code solution, it works out of the box. You just connect to the source, and your existing staff can use it; it's very intuitive, and it has an easy-to-use user interface. There's no need to invest vast amounts with large consultants, who may well charge the earth. And you're already at a bit of an advantage if you have existing staff who are close to the data, subject matter experts or users, because they can very easily learn how to use the tool, and then they can go in and write their own data quality rules, and they can really make a contribution from day one. When we go into organizations, one of the great things about the whole experience is that we can get tangible results back within the day, usually within an hour or two. We're able to say, okay, we've started to map relationships, here's the data map of the data that we've analyzed.
Here are insights on where the sensitive data is, because it's automated, because it's running algorithms on the data straight away. That's more than they typically expect. >> And you know this because you're dealing with the ecosystem: we're entering a new era of data, and many organizations, to your point, just don't have the resources to do what Google and Amazon and Facebook and Microsoft did over the past decade to become data-dominant, trillion-dollar market cap companies. Incumbents need to rely on technology companies to bring that automation, that machine intelligence, to them so they can apply it. They don't want to be AI inventors; they want to apply AI to their businesses. And that's what was really so difficult in the early days of so-called big data: there was just too much complexity out there. And now companies like Io Tahoe are bringing the tooling and platforms that are allowing companies to really become data driven. Your final thoughts, please, Yusef. >> That's a great point, Dave. In a way, it brings us back to where we began, in terms of partnerships and alliances. I completely agree. We're at a really exciting point, where we can take platforms like Io Tahoe's, go into enterprises, and help them really leverage the value of these kinds of machine learning algorithms. And we work with all the major cloud providers: AWS, Microsoft Azure, Google Cloud Platform, IBM and Red Hat, and others. For us, the key thing is that we want to be the best in the world at enterprise data automation. We don't aspire to be a cloud provider, or even a workflow provider. What we want to do is really help customers with their data, with our automated data functionality, in partnership with some of those other businesses, so we can leverage the great work they've done in the cloud, the great work they've done on workflows, on virtual assistants, and in other areas. We help customers leverage those investments as well. But at heart, we're really targeted at just being the best enterprise data automation business in the world. >> Massive opportunities, not only for technology companies, but for those organizations that can apply technology for business advantage. Yusef Khan, thanks so much for coming on theCube. >> Appreciate it. >> All right, and thank you for watching, everybody. We'll be right back after this short break.

Published Date : Jun 23 2020


Dave Brown, Amazon | AWS Summit Online 2020


 

>> Narrator: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE conversation. >> Everyone, welcome to theCUBE's special coverage of AWS Summit San Francisco, and the AWS Summits happening all over the world, including Asia-Pacific. #AWSSummit is the hashtag. This is part of theCUBE Virtual program, where we're going to be covering Amazon Summits throughout the year. I'm John Furrier, host of theCUBE. And of course, we're not at the events; we're here in the Palo Alto studios, with our COVID-19 quarantine crew. And we've got a great guest here from AWS, Dave Brown, Vice President of EC2, who leads the team on elastic compute and its business: where it's evolving, and most importantly, what it means for the customers and the industry. Dave, thanks for spending the time to come on theCUBE virtual program. >> Hey John, it's really great to be here, thanks for having me. >> So we've got the summit going down. It's a new format because of the shelter in place: events are going virtual, or digital, the virtualization of events. And I want to have a session with you on EC2 and some of the new things that are going on. I think the story is important, because certainly around the pandemic, and certainly at large scale, SaaS business models are turning out to be quite impactful, in a positive way, with people sheltering in place. What is the role of data in all this? And also, there's a lot of pressure financially: we've had the payroll loan programs from the government, and companies are really looking at their bottom lines. So, two major highlights going on in the world that are directly impacted, and you have some products and news around this. I want to do a deep dive on that. One is AppFlow, which is a new integration service by AWS that's really about taking the scale and value of AWS services and integrating them with SaaS applications. And the other is the Migration Acceleration Program for Windows, which has a storied history: for many, many years, you guys have been powering most of the Windows workloads; ironic, since you guys are not Microsoft, but you've certainly had success there. Let's start with AppFlow. This was recently announced on the 22nd of April. This is a new service. Can you take us through why this is important? What is the service? Why now, what was the main driver behind AppFlow? >> Yeah, absolutely. So with the launch of AppFlow, what we're really trying to do is make it easy for organizations and enterprises to control the flow of their data between the number of different applications that they use on premise and on AWS. And so the problem we started to see was, enterprises just had this data all over the place, and they wanted to do something useful with it. Right, we see many organizations running Data Lakes, large scale analytics, and big machine learning on AWS, but before you can do all of that, you have to have access to the data. And if that data is sitting in an application, either on-premise or elsewhere in AWS, it's very difficult to get it out of that application and into S3, or Redshift, or one of those services, before you can manipulate it. That was the challenge. And so the journey kind of started a few years ago, when we launched a service on the EC2 networking side called Private Link. It provided organizations with a very secure way to transfer network data, both between VPCs, and between VPCs and on-prem networks.
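As an aside, for readers who want to see the shape of that building block: the snippet below is a hedged sketch of creating an interface VPC endpoint with boto3, which is the Private Link pattern Dave is describing. All of the resource IDs are placeholders, and the service name is just one example of an endpoint you might reach privately.

```python
# A sketch of the Private Link pattern: an interface VPC endpoint keeps
# traffic to a service on the AWS network rather than the public internet.
# Every ID below is a placeholder for your own resources.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",              # your VPC
    # The service to reach privately: an AWS service, or a partner's
    # endpoint service (the mechanism providers like Snowflake expose).
    ServiceName="com.amazonaws.us-east-1.execute-api",
    SubnetIds=["subnet-0123456789abcdef0"],     # one ENI per subnet/AZ
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```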
And what this highlighted to us is, organizations would say, that's great, but I don't actually have the technical ability, or the team, to do the work that's required to transform the data from, say, Salesforce or SAP, and actually move it over Private Link to AWS. And so we realized that while Private Link was useful, we needed another layer of service that actually provided this. And one of the key requirements was that an organization must be able to do this with no code at all; so basically, no developer required. I want to be able to transfer data from my Salesforce database, put that in Redshift together with some other data, and then perform some function on that. And so that's what AppFlow is all about. We came up with the idea a little bit more than a year ago; that was the first time I sat down and actually reviewed the content for what this was going to be. The team's been hard at work, and we launched on the 22nd of April. And we actually launched with 14 partners as well, that provide what we call connectors, which allow us to access these various services, from companies like Salesforce, ServiceNow, Slack, and Snowflake, to name a few. >> Well, certainly you guys have a great ecosystem of SaaS partners, and it's well documented in the industry that you guys are not going to be competing directly with a lot of these big SaaS players, although you do have a few services for customers who want end to end; Jassy continues to pound that home on my Cube interviews. But I think this >> Absolutely. >> is notable, and I want to get your thoughts on this, because this seems to be the key to unlocking the value of SaaS and Cloud: data traversal, data transfer. There are costs involved, and moving traffic over the internet is insecure and unreliable. So a couple of questions I wanted to just ask you directly. One is, did AppFlow come out of the AWS Private Link piece of it? And two, is it one-directional or bi-directional? How is that working? Because I'm guessing that Private Link became successful because no one wants to move data on the open internet; they wanted direct connects. Was there something inadequate about that service? Was there more headroom there? And is it bi-directional for the customer? >> So let me take the second one: it's absolutely bi-directional. You can transfer that data between an on-premise application and AWS, or AWS and the on-premise application. Really, anything that has a connector can support the data flow in both directions, and with transformations: data in one data source may need to be transformed before it's actually useful in a second data source. AppFlow takes care of all that transformation as well, in both directions, and again, with no requirement for any code on behalf of the customer. Which really unlocks it for a lot of the more business-focused parts of an organization, who maybe don't have immediate access to developers. They can use it immediately, literally with a few transformations via the console, and it's working for you. In terms of what you mentioned about the flow of data over the internet, and the need for security of data: it's critically important. And if we look at what we do as a company ourselves, we have very, very strict requirements around the flow of data, what services we can use internally, and where any of our data is going to be going. And I think it's a good example of how many enterprises are thinking about data today.
They don't even want to trust HTTPS and encryption of data on the internet; I'd rather just be in a world where my data never, ever traverses the internet, and I just never have to deal with that. And so the journey all started with Private Link there, and it was an interesting feature, because it really changed the way that we asked our customers to think about networking. Nothing like Private Link had ever existed in the sort of standard networking that an enterprise would normally have; it's really only possible because of what VPC allows you to do, and what the software-defined network on AWS gives you. And so we built Private Link, and as I said, customers started to adopt it. They loved the idea of being able to transfer data, either between VPCs, or between a VPC and on-premise, or between their own VPC and maybe a third-party provider. Snowflake, for example, has been a very big adopter of Private Link, and they have many customers using it to get access to Snowflake databases in a very secure way. And so that's where it all started, and in those discussions with customers, we started to see that they wanted us to up-level a little bit. They said, "We can use Private Link, it's great, but one of the problems we have is just the flow of data: how do we move data in a very secure, highly available way, with no bottlenecks in the system?" And so we thought Private Link was a great underlying technology that empowered all of this, but we had to build the system on top of that, which is AppFlow, that says: we're going to take care of all the complexity. And then we had to go to the ecosystem and say to all these providers, "Can you guys build connectors?" Because everybody realized it's super important that data can be shared, so that organizations can really extract the value from that data. And so the 14 of them at launch, with many, many more down the road, have come to the party with connectors and full support of what AppFlow provides. >> Yeah, us DevOps purists are always pounding the fist on the table, now a virtual table: APIs and connectors. This is the model, so people are integrating. And I want to get your thoughts on this. I think you said low code, or no code, on the developer simplicity side. Is it no code, or low code? Can you just explain quickly and clarify that point? >> It's no code for getting started, literally, for the basic to medium complexity use cases. There's no code, and for a lot of customers we spoke to, that was a bottleneck. Right, they needed something from the data. It might have been the finance organization, or human resources, or somebody else in the organization that needed it. They don't typically have a developer who helps them, and so we found that they would wait many, many months, or maybe never get the project done at all, just because they never had access to that data, or to a developer to do the work required for the transformation. And so it's no code for almost all use cases, where it literally is: select your data source, select the connector, and then select the transformations. Some basic transformations, renaming of fields, transforming data in simple ways; that's more than sufficient for the vast majority of use cases. And then obviously through to the destination, with the connector on the other side, to do the final transformation to the final data source that you want to migrate the data to.
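The same flow can also be driven programmatically. Below is a hedged sketch of the boto3 equivalent of those console steps; the connector profile, bucket, and Salesforce field names are hypothetical, and the exact task shapes each connector expects should be checked against the AppFlow documentation.

```python
# A sketch of creating and running a flow with boto3, mirroring the console
# steps: pick a source, pick a destination, and describe the transformations.
# Profile name, bucket, and fields below are hypothetical placeholders.
import boto3

appflow = boto3.client("appflow", region_name="us-east-1")

appflow.create_flow(
    flowName="salesforce-accounts-to-s3",
    triggerConfig={"triggerType": "OnDemand"},  # runs when start_flow is called
    sourceFlowConfig={
        "connectorType": "Salesforce",
        "connectorProfileName": "my-salesforce-profile",  # created beforehand
        "sourceConnectorProperties": {"Salesforce": {"object": "Account"}},
    },
    destinationFlowConfigList=[{
        "connectorType": "S3",
        "destinationConnectorProperties": {
            "S3": {"bucketName": "my-data-lake", "bucketPrefix": "salesforce/"}
        },
    }],
    tasks=[
        # Project just the fields we care about out of the Account object...
        {
            "taskType": "Filter",
            "sourceFields": ["Id", "Name"],
            "connectorOperator": {"Salesforce": "PROJECTION"},
        },
        # ...then map each field straight through to the destination.
        {
            "taskType": "Map",
            "sourceFields": ["Id"],
            "destinationField": "Id",
            "connectorOperator": {"Salesforce": "NO_OP"},
        },
        {
            "taskType": "Map",
            "sourceFields": ["Name"],
            "destinationField": "Name",
            "connectorOperator": {"Salesforce": "NO_OP"},
        },
    ],
)

appflow.start_flow(flowName="salesforce-accounts-to-s3")
```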
>> You know, you have an interesting background; I was looking at your history, and you've essentially been a web services kind of guy all your life, from a code standpoint, a software environment. And now, I'll say, EC2 is the crown jewel of AWS, and you're doing more and more with S3. But what's interesting is, as you build more of these layered services in there, there's more flexibility. So right now, in most customer environments, there's a debate around: do I build something monolithic, or decoupled? And I think there's a world where they're not mutually exclusive; I mean, you have a mainframe, you have a big monolithic thing, and it does something. But generally people would agree that a decoupled environment is more flexible and more agile. So I want to get to the customer use case, 'cause I can really see this being powerful: AppFlow with Private Link, where you mentioned Snowflake. I mean, Snowflake is built on AWS, and they're doing extremely, extremely well, like any other company that builds on AWS, whether it's theCUBE Cloud or it's Snowflake. As we tap those services, customers, we might have people who want to build on our platform, on top of AWS. So I know a bunch of startups that are building within the Snowflake ecosystem, a customer of yours. >> Yeah. >> So they're technically a customer of Amazon, but they're also in the ecosystem of, say, Snowflake. >> Yes. >> So this brings up an interesting kind of computer science problem, which is, architecturally, how do I think about that? Is this something where AppFlow could help me? Because I certainly want to enable people to build on a platform that I build, if I'm doing that, if I'm not going to be a pure SaaS turnkey application. But if I'm going to bring partners in and do integrations, using the benefits and goodness of an API- or connector-driven architecture, I need that. So explain to me how this helps me, or doesn't help me. Is this something that makes sense to you? Does this question make sense? How do you react to that? >> I think so; the question is pretty broad, but I think there's an element in which I can help. So firstly, you talk about decoupled applications, right? And I think that is certainly the way that we've gone at Amazon, and it's been very, very successful for us. I think we started that journey back in 2003, when we decoupled the monolithic application that was amazon.com. And that's when our service journey started, and a lot of that inspired AWS, and how we built what we built today. And we see a lot of our customers doing that, moving to smaller applications. It just works better, it's easier to debug, and there's ownership at a very controlled level, so you can get all your engineering teams to have very clear and crisp ownership. And it just drives innovation, right? Because each little component can innovate without the burden of the rest of the ecosystem. And so that's what we really enjoy. I think the other thing that's important when you think about design is to see how much of the ecosystem you can leverage. And so whether you're building on Snowflake, or directly on top of AWS, or on top of one of our other customers and partners: if you can use something that solves the problem for you, versus building it yourself, well, that just leaves you with more time to actually go and focus on the stuff that you need to be solving, right? The product you need to be building.
And so in the case of AppFlow, if there's a need for transfer of data between, for example, Snowflake and some data warehouse that you as an organization are trying to build on Snowflake infrastructure, AppFlow is something you could potentially look at. It's certainly not something you could use for just anything; it's very specific and focused on the flow of data between services, from a data analytics point of view. It's not really something you could use from an API point of view, or for messaging between services. It's really just facilitating that flow of data, and the transformation of data, to get it into a place where you can do something useful with it. >> But like any of our services-- (speakers talk over each other) it can be used at any layer in the stack. >> Yes, it's a level of integration, right? There's no code, or low code, depending on how you look at it. Cool. Customer use cases: you mentioned large scale analytics, I thought I heard you say machine learning, Data Lakes. I mean, basically, anyone who's using data is going to want to tap some sort of data repository, and figure out how to scale data when appropriate. There's also contextual, relevant data that might be specific to, say, an industry vertical or a database. And obviously, AI becomes the application for all this. If I'm a customer, how does AppFlow relate to that? How does that help me, and what's the bottom line? >> So I think there are two parts to that journey, depending on where customers are. We have millions of customers today that are running applications on AWS. Over the last few years, we've seen the emergence of Data Lakes: really just the storage of a large amount of data, typically in S3, that companies want to extract value out of and use in certain ways. Obviously, we have many, many tools today, from Redshift to Athena, that allow you to utilize these Data Lakes and run queries against this information; things like EMR, one of our oldest services in the space. So there's doing some sort of large scale analytics, and more recently, services like SageMaker are allowing us to do machine learning, being able to run machine learning across an enormous amount of data that we have stored in AWS. And there's some stuff in the IoT workload space as well that's emerging, and many customers are using it. There are obviously many customers today that aren't yet on AWS, potential customers for us, that are looking to do something useful with data. And so one part of the journey is standing up all of that infrastructure, and we have a lot of services that make it really easy to do machine learning and analytics and that sort of thing. And then the other side of the problem, which is what AppFlow is addressing, is: how do I get that data to S3, or to Redshift, to actually go and run that machine learning workload? That's what it's really unlocking for customers. And it's not just a one-time transfer of data; the other thing that AppFlow actually supports is the continuous updating of data. So if you decide that you want a view of your data in S3, in a Data Lake for example, that's kept up to date within a few minutes, within an hour, you can actually configure AppFlow to do that. The data source could be Salesforce, it could be Slack, it could be whatever data source you want to bring in, and you continuously have that flow of data between those systems.
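That continuous-update behavior maps to a scheduled trigger on the flow. As a hedged illustration, swapping something like the following into the create_flow sketch shown earlier would pull fresh data on a fixed cadence; the exact schedule-expression syntax should be checked against the AppFlow documentation.

```python
# Illustrative scheduled trigger: refresh the destination on a cadence,
# pulling only records that changed since the last run.
trigger_config = {
    "triggerType": "Scheduled",
    "triggerProperties": {
        "Scheduled": {
            "scheduleExpression": "rate(1hour)",  # cadence; syntax is illustrative
            "dataPullMode": "Incremental",        # only changed records
        }
    },
}
```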
And so when you go to run your machine learning workload, or your analytics, it's all continuously up to date, and you don't have this problem of "let me go get the data," right? When I think about some of the data jobs that I've run in my time, back in the day as an engineer on early EC2, a small part of it was actually running the job on the data; a large part of it was, how do I actually get that data, and is it up to date? >> Up-to-date data is critical. I think that's the big feature there: this idea of having the data connectors really makes the data fresh, because if we go through the modeling and realize we missed a big patch of data, the machine learning's not effective. >> Exactly. >> I mean, it's only-- >> Exactly, and the other thing is, it's very easy to bring in new data sources, right? You think about how many companies today have an enormous amount of data just stored in silos, and they haven't done anything with it. Often it'll be a conversation somewhere, right, around the coffee machine: "Hey, we could do this, and we could do this." But they haven't had the developers to help them, haven't had access to the data, and haven't been able to move the data and put it in a useful place. And so I think what we're seeing here with AppFlow is really an unlocking of that. Because going from that initial conversation to actually having something running literally requires no code: log into the AWS console, configure a few connectors, and it's up and running, and you're ready to go. And you can do the same thing with SageMaker, or any of the other services we have on the other side, that make it really simple to run some of these ideas that historically have just been too complicated. >> Alright, so take me through that console piece. Just walk me through: I'm in, you sold me on this, I just came out of a meeting with my company, and I said, "Hey, you know what? We're blowing up this siloed approach. We want to create this horizontal data model, where we can mix and match connectors based upon our needs." So what do I do? I'm using SageMaker, using some data, I've got S3, I've got an application. What do I do? I'm connecting what, S3? >> Yeah, well-- >> To the app? >> So the simplest place to find this, actually, is on Jeff Barr's blog, which he did for the release; Jeff always does a great job in demonstrating how to use our various products. But it literally is: go into the standard AWS console, the console that we use for all of our services (I think we have 200 of them now, so it is getting kind of challenging to find them all in that console as we continue to grow), and find AppFlow. AppFlow is a top-level service, so you'll see it in the console. And the first thing you've got to do is configure your source connector: where's the data coming from? As I said, we have the 14 partners; you'll be able to see those connectors there, and see what's supported. And obviously, there's the connectivity: do you have access to that data, and where is the data running? AppFlow runs within AWS, so you need to have either VPN or Direct Connect back to the organization if the data source is on-premise. If the data source happens to be in AWS, it'll obviously be in a VPC, and you just need to configure some of that connectivity functionality. >> So it's no code if the connectors are there, but what if I want to build my own connector?
>> So building your own connector is something that we're working on with third parties right now. I could be corrected, and I'm not 100% sure whether that's available yet, but it's certainly something I think we would allow customers to do: to extend either the existing connectors, or to add additional transformations as well. So you'd be able to do that. But the transformations that the vast majority of our customers are using are literally just in the console, the basic transformations. >> So it becomes about the bigger apps that people have, and just building those connectors. How does a partner get involved? You've got 14 partners now; how do you extend the partner base? Contact an Amazon partner manager, or send an email to someone? How does someone get involved? What are you recommending? >> So there are a couple of ways, right? We have an extensive partner ecosystem that the vast majority of these ISVs are already integrated with. And so we have the 14 we launched with, and we also pre-announced SAP, which is going to be a very critical one for the vast majority of our customers: having deep integration with SAP data, and being able to bring that seamlessly into AWS. That'll be launching soon. And then there's a long list of other ones that we're currently working on, and that they're currently working on themselves. And then the other route is going to be, like with most things at Amazon, feedback from customers. With what we hear from customers, very often you'll hear from third-party partners as well, who'll come and say, "Hey, my customers are asking me to integrate with AppFlow, what do I need to do?" So just reach out to AWS and let them know that you'd be interested in integrating, if you're not part of the partner program. The team would be happy to engage and bring you on board. >> (mumbles) The playbook: get the top use cases nailed down, listen to customers, and figure it out. >> Exactly. >> Great stuff, Dave, we really appreciate it. I'm looking forward to digging into AppFlow, and I'll check out Jeff Barr's blog; April 22 was the launch day, so it's probably up there. One of the things I want to jump into, now moving into the next topic, is the cost structure. There's a lot of pressure on costs. This is where I think this Migration Acceleration Program for Windows is interesting. Andy Jassy always likes to boast on stage at re:Invent about the number of Windows workloads running on Amazon Web Services. This has been a big part of the customer base, I think, for over 10 years; I can think of him talking about this. What is this about? Are you still seeing uptake on Windows workloads? Or, I mean-- >> Absolutely. >> Azure has got some market share-- >> Absolutely. >> but it doesn't really square in my mind; what's going on here? Tell us about this migration service. >> Yeah, absolutely, on the migration side. So we still believe AWS is the best place to run a Windows workload, and we have many, many happy Windows customers today. It's a very big, very fast-growing part of our business. I was part of the original team back in 2008 that launched, I think it was Windows 2008 back then, on EC2. I remember working out all the details of how to do all the virtualization with Windows; obviously, back then we'd done Linux, and we were getting Windows up and running, and working through some of the challenges that Windows had as an operating system in the early days.
And it was October 2008 that we actually launched Windows as an operating system, and we've had many, many happy Windows customers since then. >> Why is Amazon so, peaked, to run Windows workloads so effectively? >> Well, I think, sorry, what did you say, peaked? >> Why is Amazon so well positioned to run the Windows workloads? >> Well, firstly, I mean, I think Windows is really just the operating system, right? And so if you think about that as the very last little bit of your virtualization stack, there to support your applications, what you really have to think about is everything below that, both in terms of the compute, the performance you're going to get, and the price performance you're going to get. With our Nitro hypervisor and the Nitro System, which we launched in 2018, we really are able to provide you with the best price performance, and have the very least overhead from a hypervisor point of view. And what that means is you're getting more out of your machine for the price that you pay. And then you think about the rest of the ecosystem, right? Think about all the other services and all the features, and just the breadth and the extensiveness of AWS. And that's critically important for all of our Windows customers as well. So you're going to have things like Active Directory, and these sorts of things that are very Windows-specific, and we can absolutely support all of those natively, and in the Windows operating system as well. We have things like various agents that you can run inside the Windows box to do more maintenance and management. So I think we've done a really good job in bringing Windows into the larger and broader ecosystem of AWS, and it really is just a case of making sure that Windows runs smoothly; that's just the last little bit on top of all of that. And so many customers, enterprises, run Windows today. When I started out my career, I was developing software in the banking industry, and it was very much a Windows environment; they were running critical applications. And so we see it's critically important for customers who run Windows today to be able to bring those Windows workloads to AWS. >> Yeah, and that's certainly-- >> We are seeing a trend. Yeah, sorry, go ahead. >> Well, they're certainly out there from a market share standpoint. But this is a cost driver, you guys are saying, and I want you to just give an example, or just illustrate why it costs less. How is it a cost savings? Is it just services, cycle times on EC2? I mean, what's the cost savings? I'm a customer, like, "Okay, so I'm going to go to Amazon with my workloads." Why is it a cost saving? >> I think there are a few things. The one I was referring to in my previous comment was the price performance, right? If I'm running on a system where the hypervisor is using a significant portion of the physical CPU that I want to use as well, well, there's an overhead to that. And so from a price performance point of view: if I go and benchmark a CPU, and I look at how much I pay per unit of that benchmark, it's better on AWS, because with our Nitro System we're able to give you 100% of the server, and so you get that performance gain. So the first thing is price performance, which is different from just price, but there's a saving there as well. The other one is a big part of the migration program as well.
A large part of what we do with our customers when they come to AWS is take a long look at their license strategy. What licenses do they have? A key part of bringing Windows workloads to AWS is license optimization: what can we do to help you optimize the licenses that you're using today, for Windows, for SQL Server, and really try to find efficiencies in that. And so we're able to secure significant savings for many of our customers by doing that, and we have a number of tools that they use as part of the migration program to do it. So that helps save there. And then finally, we have a lot of customers doing what we call modernization of their applications, to really embrace Cloud and some of the benefits that you get from Cloud, especially elasticity: being able to scale for demand. It's very difficult to do that when you're bound by licenses for your operating system, because for every box you run, you have to have a license for it. So if you turn auto scaling on, you've got to make sure you have enough licenses for all these Windows boxes you spin up. And so with the push the Cloud's bringing, we've seen a lot of customers move applications from Windows to Linux, or move from SQL Server to SQL Server on Linux, or to another database platform, and do a modernization there that allows them to benefit from the elasticity that Cloud provides, without having to constantly worry about licenses. >> So final question on this point: migration service implies migration from somewhere else. How do they get involved? What's the onboarding process? Can you give a quick detail on that? >> Absolutely. So we've been helping customers with migrations for years. We launched a migration program, the Migration Acceleration Program, MAP; we launched it, I think, around 2016, and 2017 was the first full part of that. It was really just a bringing together of the things we'd learned, the tools we'd built, and the best strategies to do a migration. And we said, how do we help customers looking to migrate to the Cloud? So that's what MAP's all about. It's a three-phase program: we'll help you assess the migration, we'll help you do a lot of planning, and then ultimately we'll help you actually do the migration. We partner with a number of external partners, ISVs and GSIs, who also work very closely with us to help customers do migrations. And what we launched in April of this year, with the Windows migration program, is really just more support for Windows workloads as part of the broader Migration Acceleration Program. And there are benefits to customers: it's a smoother migration, it's a faster migration in almost all cases, we're doing license assessments, so there's cost reduction in that as well, and ultimately there are other benefits that we offer them if they partner with us in bringing the workloads to AWS. And so getting involved is really just reaching out to one of our AWS sales folks, or one of your account managers if you have an account manager, and talking to them about the workloads that you'd like to bring in. We even go as far as helping you identify which applications are easiest to migrate, so that you can get going with some of the easier ones while we help you with some of the more difficult ones, and strategize about removing those roadblocks to bring your services to AWS.
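As a small illustration of the end state of such a migration, here is a hedged sketch of launching a Windows Server instance on EC2 with boto3. The SSM public parameter used to resolve the AMI is a documented convention; the instance type and tag values are placeholders.

```python
# A minimal sketch of standing up a Windows workload on EC2.
# The SSM public parameter below is a documented way to find the latest
# Windows Server AMI; instance type and names are placeholders.
import boto3

region = "us-east-1"
ssm = boto3.client("ssm", region_name=region)
ec2 = boto3.client("ec2", region_name=region)

# Resolve the current Windows Server 2019 base AMI for this region.
ami_id = ssm.get_parameter(
    Name="/aws/service/ami-windows-latest/Windows_Server-2019-English-Full-Base"
)["Parameter"]["Value"]

instance = ec2.run_instances(
    ImageId=ami_id,
    InstanceType="m5.large",   # license-included Windows pricing applies
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "windows-migration-test"}],
    }],
)["Instances"][0]
print(instance["InstanceId"])
```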
>> Takes the blockers away. Dave Brown, Vice President of EC2, the crown jewel of AWS, breaking down AppFlow and the Migration Acceleration Program for Windows. Great insights, appreciate the time. >> Thanks. >> We're here with Dave Brown, VP of EC2, as part of the virtual Cube coverage. Dave, I want to get your thoughts on an industry topic. Given what you've done with EC2 and its success, and with COVID-19, you're seeing that scale problem play out on the world stage for the entire global population. This is now turning non-believers into believers of DevOps, web services, real time. I mean, this is now a moment in history; with the challenges that we have, even when we come out of this, whether it's six months or 12 months, the world won't be the same. And I believe that there's going to be a Cambrian explosion of applications, and an architecture that's going to look a lot like Cloud, Cloud-native. You've been doing this for many, many years, as a key architect of EC2 with your team. How do you see this playing out? Because a lot of people are going to be huddling in rooms when this comes back; they're video conferencing now, but when they have meetings, they're going to look out the window to the future, and they're going to be exposed to what's failed, saying, "We need to double down on that; we have to fix this." So there are going to be winners and losers coming out of this pandemic, really quickly. And I think this is going to be a major opportunity for everyone to rally around this moment, to reset. And I think it's going to look a lot like this decoupled, distributed computing environment, leveraging all the things that we've talked about in the past. So what's your advice, and how do you see this evolving? >> Yeah, I completely agree. I mean, just the speed at which it happened, and the way in which organizations, both internally and externally, had to reinvent themselves very, very quickly, right? We've been very fortunate within Amazon; moving to working from home was relatively simple for the vast majority of us. Obviously, we have a number of employees who work in data centers and fulfillment centers, who have been on the front lines and doing a great job. But for the rest of us, it's been virtual video conferencing: all about meetings, and being able to use all of our networking tools securely, either over the VPN or the non-VPN infrastructure that we have. And many organizations had to do that. And so I think there are a number of different things that have impacted us right now. Obviously, virtual desktops have been a significant growth point, right? Folks don't have access to their physical machine anymore; they're now all having to work remotely. And so a service like WorkSpaces, which runs on EC2 as well, has been a critical service to support many of our largest customers. Our Client VPN service, which we have within EC2 on the networking side, has also been critical for many large organizations as they see more of their staff working remotely every day, and it has been able to support a lot of customers there. Just more broadly, what we've seen with COVID-19 is that some industries really struggle; obviously the travel industry, people just aren't traveling anymore. And so there's been immediate impact to some of those industries.
There have been other industries, those that support functions like video conferencing, or the entertainment side of the house, that have seen a bit of growth over the last couple of months. And education has been an interesting one for us as well, with schools moving online. Behind the scenes in AWS, and on EC2, we've been working really hard to make sure that our supply chains are not interrupted in any way. The last thing we want is for any of our customers to not be able to get EC2 capacity when they desperately need it. And so we've made sure that capacity has been fully available, all the way through the pandemic. We've even been able to support customers like the one who told me that, the next day, they were going to have more than a hundred thousand students coming online, and they suddenly had to grow their business by some crazy number. We were able to support them and give them the capacity, which is way outside of any sort of normal demand--
>> I was joking with my son, I said, "This world is interesting." Amazon really wins: stuff's getting delivered to my house, I want to play video games and watch Twitch, and I want to build applications and write software; now I can do all of that from my home. So Amazon's all around. But all kidding aside, this is an opportunity to define agility, so I want to get your thoughts. Because, as everyone knows, I'm a big fan of Amazon, and as other Clouds try to level up, they're moving in the same direction, which is good for everybody: good competition and all. But S3 and EC2 have been the crown jewels, and building more services around those, creating these abstraction layers and new sets of services to make things easier, I know has been a top priority for AWS. So can you share your vision on how you're going to make EC2 and all these services easier for me? If I'm a coder, I want literally no code, low code, infrastructure as code; I need Amazon to be more programmable and easier. Can you share your vision, as we talk about the virtual summits, as we cover the show: what's your take on making Amazon easier to consume and use? >> It's something we've thought about a lot over the years, right? When we started out, we were very simple; in the early days of EC2, there wasn't that rich a feature set. It's been an interesting journey for us. We've obviously grown a lot, and we've launched a lot of features, which naturally brings more complexity to the platform. We launched things like Lightsail over the years; Lightsail is a hosting environment that gives you that EC2-like experience, but it's a lot simpler, and it's also integrated with a number of other services, like RDS, and basic load balancing functionality like ELB. And we've seen some really good growth there. But what we've also learned is that customers enjoy the richness of what EC2 provides, and what the full ecosystem provides, and being able to use the pieces that they really need to build their applications, from an S3 point of view, from a broad ecosystem point of view. It's about providing customers with the features and functionality that they really need to be successful. On the compute side of the house, we've done some things. Obviously, Containers have really taken off, and the frameworks, whether it's EKS, our Kubernetes service, or the Docker-based ECS, have made that a lot simpler for developers. And then obviously, in the serverless space, Lambda is a great way of consuming EC2, right? I know it's serverless, but there's still an EC2 instance under the hood, and being able to bring a basic function and run it serverlessly, a lot of customers are enjoying that. The other complexity we're going after is on the networking side of the house. I find that a lot of developers out there are more than happy to write the code, more than happy to bring their application to AWS, but they struggle a little bit more on the networking side; they really do not want to have to worry about whether they have a route to an internet gateway, and whether their subnets are defined correctly to actually make the application work. And so we have services like App Mesh, and the whole service mesh space is developing a lot, to really make that a lot simpler, where you can just bring your application, and call another application simply by using service discovery. And so those higher-level services are definitely helping.
In terms of no code, I think that App Mesh, sorry, not App Mesh, AppFlow, is one of the examples where we're already giving organizations something at that level that says: I can do something with no code. I'm sure there's a lot of work happening in other areas. It's not something I'm actively thinking about right now in my role leading EC2, but as the use cases come from customers, I'm sure you'll see more from us in those areas. They'll likely be more specific, though, because as soon as you take code out of the picture, you have to get pretty specific in the use case to deliver the depth and functionality that customers will need. >> Well, it's been super awesome to have your valuable time here on the virtual Cube, covering the Amazon Summit virtual digital event, which will be going on throughout the year. Really appreciate the insight. And I think it's right on the money: in six to twelve months the world is going to see a surge in resetting, reinventing, and growing. I think a lot of companies who are smart are going to reset, reinvent, and set a new growth trajectory, because it's a cloud-native world, it's cloud computing, this is now a reality, and there are proof points now. The whole world's experiencing it, not just the insiders in the industry, and it's going to be an interesting time. So really appreciate you coming on. >> Thank you very much for having me. It's been good. >> I'm John Furrier, here inside theCUBE Virtual, our virtual Cube coverage of AWS Summit 2020. We're going to have ongoing Amazon Summit virtual Cube coverage. We can't be on the show floor, so we'll be on the virtual show floor, covering and talking to the people behind the stories, and of course, the most important stories on SiliconANGLE and thecube.net. Thanks for watching. (upbeat music)

Published Date : May 13 2020


Colin Mahony, Vertica at Micro Focus | Virtual Vertica BDC 2020


 

>>It's theCUBE, covering the Virtual Vertica Big Data Conference 2020. Brought to you by Vertica. >>Hello, everybody. Welcome to the new normal. You're watching the Cube, and this is remote coverage of the Vertica Big Data Conference, gone digital, gone virtual. My name is Dave Volante, and I'm here with Colin Mahony, who's a senior vice president at Micro Focus and the GM of Vertica. Colin, well, strange times, but the show goes on. Great to see you again. >>Good to see you too, Dave. Yeah, strange times indeed. Obviously, safety first for everyone, so we made a decision to go virtual. I think it was absolutely the right call, made in advance of how things have transpired, but we're making the best of it, and we appreciate your time here, going virtual with us. >>Well, Dave, we're super excited to be here. As you know, the Cube has been at every single BDC since its inception. It's a great event. You just presented the keynote to your audience. You know, it was remote, you didn't have that live vibe, and you have a lot of fans in the Vertica community. But could you feel the love? >>Yeah, you know, it's hard to feel the love virtually, but I'll tell you what, the silver lining in all this is that the reach we have for this event now is much broader than it would have been. As you know, we brought this event back; it's been a few years since we've done it, and we were super excited to do it, obviously, in Boston, where it was supposed to be on location. But the silver lining in all of this is that there are a lot of participants who otherwise would not have been able to participate, both live as well as through the assets that we're going to have available. So the love is out there. We've got amazing customers and practitioners with Vertica. We've got so many who have been with us for a long time, and of course we have a lot of new customers as well that we're welcoming, so it's exciting. >>Well, it's been a while since you've had the BDC event, and a lot has transpired. You're now part of Micro Focus, but I know you and I know the Vertica team; you guys have not stopped. You've kept the innovation going. We've been following the announcements, but bridge the gap between the last time we had coverage of this event and where we are today. A lot has changed. >>Oh, yeah, a lot has changed. I mean, you know, it's the software industry, right? So nothing stays the same. We constantly have to keep going. Probably the only thing that stays the same is the name Vertica. And we're now shipping Vertica 10, which is just a phenomenal release for us. So, overall, the organization continues to grow. The dedication and commitment to this great platform continues: every single release we do, as you know, and this hasn't changed, is always about performance and scale and adding a whole bunch of new capabilities on that front. But it's also about our main roadmap and the direction we're going, and I think one of the things that's been great is that we've stayed true to that from day one. We haven't tried to deviate too much and get into things that are far outside our box. But we've really done, I think, a great job of extending Vertica into places where people need a lot of help.
And with Vertica 10, which we know we're going to talk more about, we've done a lot of that. It's super exciting for our customers, and all of this, of course, is driven by our customers. But back to the Big Data Conference. Everybody has been saying this for years: it was one of the best conferences we've been to, just so real. It's developers giving tech talks, it's customers giving talks. And we had more customers that wanted to give talks than we had slots to fill this year, which is another benefit, a little bit, of going virtual; we could accommodate a little more, though it's obviously still a tight schedule. But it really was an opportunity for our community to come together and talk about not just Vertica, but how to deal with data. We know the volumes aren't slowing down. We know the complexity isn't slowing down. The things that people want to do with AI and machine learning are moving forward at a rapid pace as well. There's a lot to talk about and share, and that's really a huge part of what we try to do with it. >>Well, let's get into some of that. Your customers are making bets. Micro Focus is actually making a bet on Vertica. I want to get your perspective on the waves that you're riding and where you are placing your bets. >>Yeah, no, it's great. So one of the waves that we've been riding for a long time: obviously Vertica started out as a SQL platform for analytics, a SQL database engine. But we always knew that was just table stakes for what we wanted to do. People were going to trust us to put enormous amounts of data in our platform, and what we owe everyone else is lots of analytics to take advantage of that data, and lots of tools and capabilities to shape that data, to get it into the right format, for operational reporting but also, in this day and age, for machine learning and some pretty advanced regressions and other techniques. So a huge part of Vertica 10 is just doubling down on that commitment to what we call in-database machine learning and AI. And to do that, we know that we're not going to come up with the world's best algorithms, nor is that our focus. Our advantage is that we have this massively parallel platform to ingest, store, manage, and analyze the data. So we made some announcements about incorporating PMML models into the product. We continue to deepen our Python integration, building off a new open source project we started; Uber has been a great customer and partner on this, and it's one of our great talks here at the event. So we're continuing to do that, and it turns out that when it comes to anything analytics, machine learning certainly, so much of what you have to do is actually prepare the data, shape the data, get the data in the right format, apply the model, fit the model, test the model, operationalize the model, and Vertica is a great platform to do that. So that's a huge bet that we're continuing to ride on and take advantage of. And then there are some of the other things we've been seeing continue. I'll take object storage as an example. I think Hadoop, and what HDFS ultimately pointed to, was a huge part of this, but there's just a massive disruption going on in the world around object storage. You know, we made several bets on S3 early, and we created Vertica Eon mode, which separates compute and storage.
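As a rough illustration of the PMML direction described above, here is a sketch using the open source vertica-python client. The connection details, file path, model name, and table are hypothetical, and while the function names follow Vertica's documented IMPORT_MODELS / PREDICT_PMML pattern, treat the exact signatures as assumptions that may vary by version.

```python
# Sketch: importing a PMML model into Vertica and scoring in-database.
# Connection details, file path, model and table names are hypothetical.
import vertica_python

conn_info = {
    "host": "vertica.example.com",
    "port": 5433,
    "user": "dbadmin",
    "password": "...",
    "database": "analytics",
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()

    # Import a model trained elsewhere (e.g., scikit-learn exported to PMML).
    cur.execute(
        "SELECT IMPORT_MODELS('/models/churn.pmml' "
        "USING PARAMETERS category='PMML')"
    )

    # Score rows where the data lives, instead of moving data to the model.
    cur.execute(
        "SELECT customer_id, "
        "       PREDICT_PMML(tenure, monthly_spend "
        "                    USING PARAMETERS model_name='churn') "
        "FROM customers"
    )
    for row in cur.fetchall():
        print(row)
```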
And so for us that separation is not just about being able to take advantage of cloud economics, as we do, or the economics of object storage. It's also about being able to truly isolate workloads and start to set up the sort of platform where the database can do very autonomous things, where the database can actually start self-analyzing without impacting many operational workloads. And so that continues with our partnership with Pure Storage on premise; we just announced that we're supporting Eon mode on Google Cloud now, in addition to Amazon, and we've got HDFS now being supported by Eon mode as well. So we continue to ride on that megatrend too, just the clouds in general. Whether it's a public cloud, a private cloud, or on premise, giving our customers the flexibility and choice to run wherever it makes sense for them is something we are very committed to. From a flexibility standpoint, there are a lot of lock-in products out there, and there are a lot of cloud-only products, now more than ever. We're hearing from our customers that they want the flexibility to be able to run anywhere, and they want the ease of use and simplicity of native cloud experiences, which we're giving them as well. >>I want to stay on that architectural component for a minute. Separating compute from storage is not just about economics. I mean, part of it is that you can granularly scale compute separate from storage, as opposed to in chunks, so it's more efficient. But you're saying there are other advantages around operational and workload specificity. What is unique about Vertica in this regard? After all, many others separate compute from storage. What's different about Vertica? >>Yeah, I think there are a lot of differences in how we do it. It's one thing if you're a cloud-native company and you do it with a shared catalog, a key-value store that all of your customers are using and are on the same one; frankly, that's probably more of a security concern than anything. But it's another thing when you give that capability to each customer on their own, so they're fully protected and not sharing it with any other customers. And that's something we hear a lot from our customers: they want to be able to separate compute and storage, but they want to do it in their own environment, so they know that no one else is sharing their data catalog and there's no single point of failure. So that's one huge advantage that we have, and frankly, I think it just comes from being a company that operates on premise and up in the cloud. I think another huge advantage for us is that we don't know which object storage platform is going to win, nor do we necessarily have to. We designed Eon mode so that it's an SDK. We started with S3, but it could be anything: HDFS, S3, who knows which object storage formats are going to be out there. And then finally, beyond just the object storage, we're really one of the only database companies that actually allows our customers to natively operate on data in very different formats, like Parquet and ORC, if you're familiar with those from the Hadoop community. So we not only embrace this kind of object storage disruption, but we really embrace the different data formats. And what that means is that our customers that have data pipelines, fully automated, putting this information in different places, benefit directly.
They don't have to completely reload everything to take advantage of Vertica's analytics. We can go where the data is, connect into it, and we offer them a lot of different ways to take advantage of those analytics. So there are a couple of unique differences with Vertica, and again, I think a real advantage, in many ways, of not being a cloud-only platform is that we're very good at operating in different environments, with different formats, and with formats that change over time. And I don't think a lot of the other companies out there can say that. I think many, particularly many of the SaaS companies, are scrambling; they even have challenges moving from, say, an Amazon environment to a Microsoft Azure environment with their offering, because they've got so much unique Band-Aid, excuse me, in the background just holding the system up that is native to any one of those. >>Good. I'm going to summarize what I'm hearing from you: you're the Ferrari of databases, as we've always known; you're object store agnostic; and it's the cloud experience that you can bring on-prem or to virtually any cloud, all the popular clouds, hybrid, AWS, Azure, now Google, or on-prem, and in a variety of different data formats. And that combination, I think, is unique in the marketplace. Before we get into the news, I want to ask you about data silos. You mentioned HDFS, where you and I met back in the early days of big data. In some respects, Hadoop helped break down the silos by distributing the data and leaving it in place, and in other respects it created data lakes, which became silos. And so we have all these silos, and people are trying to get to a digital transformation, putting data at their core, virtually, obviously, and leaving it in place. What are your thoughts on that, in terms of Vertica being a silo buster? How does Vertica play there? >>Yeah, and you're absolutely right. If you look at Hadoop, for all the new data that got into Hadoop, in many ways it created yet another large island of data that many organizations are struggling with, because it's separate from their core traditional data warehouse, and it's separate from some of the operational systems they have. So there might be a lot of data in there, but they're still struggling with how to break it out of that large silo, or combine it. I think some of the things Vertica does, and part of the announcement in Vertica 10, is migration tools to make it really easy if you do want to move data from one platform to another, into Vertica. But you don't have to move it: you can actually take advantage of a lot of the data where it resides with Vertica, especially in the Hadoop world, with our external table storage and our ability to operate on Parquet and ORC natively. So we're very pragmatic about how our customers go about this. Very few customers, and many of them tried it with Hadoop and realized it didn't work, want to do a wholesale "we're going to throw everything out, get rid of our data warehouse, hit the pause button and go from there." It's just not possible to do that. So we've spent a lot of time investing in the product to really work with them to go where the data is, and then seamlessly migrate when it makes sense to migrate. You mentioned the performance of Vertica, and you talked about the variety. It definitely is.
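To make the "go where the data is" point concrete, here is a hedged sketch of defining a Vertica external table over Parquet files in S3 and querying it in place. The bucket, path, and schema are hypothetical, and the sketch assumes S3 credentials have already been configured for the database; the DDL follows Vertica's documented external-table-over-Parquet pattern, but treat the details as assumptions.

```python
# Sketch: querying Parquet data in place on S3 via a Vertica external table.
# The bucket, path, and schema below are hypothetical; S3 access is assumed
# to be configured on the Vertica side already.
import vertica_python

DDL = """
CREATE EXTERNAL TABLE sales_ext (
    sale_date DATE,
    store_id  INT,
    amount    FLOAT
) AS COPY FROM 's3://my-bucket/sales/*.parquet' PARQUET
"""

with vertica_python.connect(host="vertica.example.com", port=5433,
                            user="dbadmin", password="...",
                            database="analytics") as conn:
    cur = conn.cursor()
    cur.execute(DDL)
    # No reload: Vertica reads the Parquet files at query time.
    cur.execute("SELECT store_id, SUM(amount) FROM sales_ext GROUP BY store_id")
    print(cur.fetchall())
```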
And one other thing that we're really proud of is that it's actually not a gas guzzler either. One of the things we're seeing with a lot of the other cloud databases: pound for pound, on a tenth of the hardware, Vertica running up there gets over 10x the performance. We're seeing that a lot. So it's not just about the performance, it's about the efficiency as well. And I think that efficiency is really important when it comes to silos, because there's only so much horsepower out there, and it's easy for companies to throw lots of servers at a problem in a cloud environment. For so many organizations in the cloud, frankly, looking at the bills they're getting from these running workloads, they're really conscious of that. >>Yeah, the big energy companies love the gas guzzlers, and a lot of the cloud providers do too. But let's get into the news: Vertica 10. You shared the highlights with your audience in your keynote. What do we need to know? >>Yeah, so again, doubling down on these megatrends, I'll start with machine learning and AI. We've done a lot of work to integrate so that you can take native PMML models, bring them into Vertica, run them massively parallel, and help shape your data and prepare it, all the work that we know is required for true machine learning. For all the hype that there is around it, people want to do a lot of unsupervised machine learning, whether it's for healthcare, fraud detection, or financial services. So we've doubled down on that. We now also support things like TensorFlow, and, as I mentioned, we're not going to come up with the best algorithms; our job is really to ensure that the algorithms people come up with can be incorporated, and that we can run them against massive data sets super efficiently. So that's number one. Number two, on object storage: we continue to support more object storage platforms for Eon mode. In the cloud, we're expanding to Google GCP, beyond just Amazon, and we're now also supporting HDFS with Eon mode. Of course, we continue to have a great relationship with our partner Pure Storage on premise, and we continue to invest in Eon mode especially. I'm not going to go through all the different things here, but it's not just a matter of "hey, you support this and then you move on." There are so many things we learn about API calls, about how to save our customers money, and tricks on performance. And the third area: we definitely continue to build on that flexibility of deployment, which is related to Eon mode as I described, but it's also about simplicity, and about some of the migration tools we've announced to make it easy to go from one platform to another. We have a great roadmap on ease of use, on security, on performance and scale. For us, those are the things we're working on in every single release. We probably don't talk about them as much as we need to, but obviously they're critically important. And so we constantly look at every component in this product. Version 10 is a huge release for any product, especially an analytic database platform, and so we're constantly revisiting some of the code base and figuring out how we can do it in new and better ways. And that's a big part of 10 as well.
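For a flavor of what in-database machine learning looks like in practice, here is a sketch using Vertica's built-in training and prediction functions. The table and column names are hypothetical, and this shows the general pattern rather than the exact Vertica 10 feature set.

```python
# Sketch: training and scoring a regression model inside Vertica, so the
# data never leaves the database. All names below are hypothetical.
import vertica_python

with vertica_python.connect(host="vertica.example.com", port=5433,
                            user="dbadmin", password="...",
                            database="analytics") as conn:
    cur = conn.cursor()

    # Train on a table of historical observations.
    cur.execute(
        "SELECT LINEAR_REG('demand_model', 'demand_history', "
        "                  'units_sold', 'price, promo_flag')"
    )

    # Apply the model with a predict function in ordinary SQL.
    cur.execute(
        "SELECT price, promo_flag, "
        "       PREDICT_LINEAR_REG(price, promo_flag "
        "                          USING PARAMETERS model_name='demand_model') "
        "FROM demand_forecast_inputs"
    )
    print(cur.fetchmany(5))
```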
>>I'm glad you brought up the machine intelligence, the machine learning and AI piece, because we would agree, and one of the things we've noticed is that the new innovation cocktail is not being driven by Moore's law anymore. It's really a combination: you've collected all this data over the last ten years through Hadoop and other data stores, object stores, et cetera, and now you're applying machine intelligence to that, and then you've got the cloud for scale. And of course, we talked about you bringing the cloud experience, whether it's on-prem or hybrid, et cetera. The reason why I think this is important, and I wanted to get your take on this, is because you do see a lot of emerging analytic databases, cloud native. Yes, they do suck up a lot of compute, but they also add a lot of value. And I really wanted to understand how you guys play in that new trend: the sort of cloud database, high performance, bringing in machine learning and AI and ML tools, and then turning data into insights. From what I'm hearing, you play directly in that, and your differentiation is a lot of the things we talked about, including the ability to do that on-prem and in the cloud and across clouds. >>Yeah, I mean, I think that's a great point. We are a great cloud database. We run very well on the three major clouds, and you could argue some of the other clouds as well, in other parts of the world. If you talk to our customers, and we have hundreds of customers running Vertica in the cloud, the experience is very good. I think it can always be better. We've invested a lot in taking advantage of the native cloud ecosystem, so that provisioning and managing Vertica is seamless when you're in that environment, and we'll continue to do that. But Vertica, excuse me, as a cloud platform is phenomenal. And there's a lot of confusion out there. I think there are a lot of marketing dollars spent, and I won't name many of the companies here, you know who they are, on "the cloud-native data warehouse." And it's true, they're software as a service. But if you talk to a lot of our customers, they're getting very good and very similar experiences with Vertica. We stop short of saying we're software as a service, because ultimately our customers have that control and flexibility: they're putting Vertica on whichever cloud they want to run it on, and managing it. Stay tuned on that; I think you'll hear more from us about taking that even further. But we do really well in the cloud, and so much of Eon, and this has really been a two-and-a-half-year endeavor for us, was designed around the cloud. It was designed around cloud data lakes, S3, separation of compute and storage. And if you look at the work we're doing around containerization and a lot of these other elements, it just takes that to the next level. And there's a lot of great work there, so I think we're going to continue to get better at cloud, but I would argue that we're already, and have been for some time, very good at being a cloud analytic data platform. >>Well, since you opened the door, I've got to ask you. I hear you from a performance and architectural perspective, but you're also alluding to, I think, something else. I don't know what you can share with us. You said stay tuned on that.
But I think you're talking about optionality, maybe different consumption models. Am I getting that right? What can you share? >>You're getting that right, and actually, I'm glad you brought it up. I think a huge part of cloud also has nothing to do with the technology; I think it's how you consume the product. Some companies want to rent the product, and they want to rent it for a certain period of time, and so we allow our customers to do that. We have incredibly flexible models for how you provision and purchase our product, and I think that helps a lot. You know, I am opening the door a little bit, but look, we have customers that ask us to offer them more: we can offer them platforms, all-in, and we've had customers come to us and say, please take over our systems and offer this as a managed distribution. As I said, though, I think one thing we've been really good at is focusing on what is our core and where we really offer value. But I can tell you that we introduced something called the Vertica Advisor Tool this year. One of the things the Advisor Tool does is collect information from our customer environments, on premise or in the cloud, and we run it through our own machine learning: we analyze the customer's environment and make some recommendations automatically. And a lot of our customers have said to us, you know, it's funny: we've tried managed services, we've tried SaaS offerings, and you guys blow them away in terms of your ability to help us automatically manage the Vertica environment and the system. Why don't you just take this product and convert it into a SaaS offering? So I won't go much further than that, but you can imagine there's a lot of innovation and a lot of thought going into how we can do that. And there's no reason we have to wait: being able to offer our on-premise customers that same sort of experience, from a managed capability, is something we spend a lot of time thinking about as well. So again, it's back to the automation, the ease of use, the going above and beyond. It's really exciting to have an analytic platform, because we can do so much automation on ourselves. Just like we're doing with the Advisor Tool, we're drinking our own Kool-Aid, or champagne, however you want to say it, to tune up and solve some optimization for our customers automatically. I think you're going to see that continue, and I think that could work really well in a bunch of different models. >>Colin, just on a personal note, I've always enjoyed our conversations. I've learned a lot from you over the years. I'm bummed that we can't hang out in Boston, but hopefully soon this will blow over. I loved last summer when we got together: we had the Vertica throwback, we had Stonebraker, Palmer, Lynch, and Mahony. We did a great series, and that was a lot of fun. So it's really been a pleasure, and thanks so much. Stay safe out there, and we'll talk to you soon. >>Yeah, you too, Dave, stay safe. I really appreciate it. And, you know, this is what it's all about; it's a lot of fun. I know we're going to see each other in person soon, and it's the people in the community that really make this happen. So looking forward to that, and I really appreciate it. >>Alright, and thank you, everybody, for watching. This is the Cube's coverage of the Vertica Big Data Conference, gone virtual, going digital. I'm Dave Volante.
We'll be right back right after this short break.

Published Date : Mar 31 2020


Itamar Ankorion, Qlik | CUBE Conversation, April 2019


 

>> From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now here's your host, Stu Miniman. >> Hi, I'm Stu Miniman, and this is a CUBE Conversation from our Boston area studio. We spend a lot of time talking about digital transformation, and of course, at the center of that digital transformation is data. In this segment we're going to be talking about the data integration platform. Joining me for that segment is Itamar Ankorion, who's the senior vice president of enterprise data integration with Qlik. Thanks so much for joining me. >> Thanks for having me here. >> All right, so as I just said, you know, digital information: when you talk to any user, there are some that might say, oh, there's a little bit of hype, I don't understand it, but really, when it comes to leveraging that data, there are very few places where that is not core to what they need to do, and if they're not doing it, their competition will. So can you bring us inside a little bit? The customers you're talking to, where does that fit into their business needs, and how does the data integration platform help them solve that issue? >> Absolutely. So as you mentioned, the digital transformation is driving a lot of innovation and a lot of effort by corporations, and virtually any organization we're talking to sees data as a core component of enabling the digital transformation. The data creates new analytics, and the analytics power the digital transformation, whether it's in making better decisions, or in embedding the analytics and the intelligence into business processes and custom applications to enrich the experience and make it better. So data becomes key, and the more data you can make available through the process, and the faster you can make it available, the faster you can adapt your process to accommodate the changes, and the better off you will be. So we're seeing organizations, virtually all of them, looking to modernize their data strategy and their data platforms in order to accommodate these needs. >> Yeah, it's such a complex issue. We've been at chief data officer events, and we talk about data initiatives. You know, we worry a little bit that the C-suite sometimes hears "data is the new oil" and comes in saying, according to the magazine I read, we need to have a data strategy, give me the value of data. But where is the rubber hitting the road? What are some of the steps that they're taking? How do I get my arms around the data, and how do I make sure it can move along that spectrum from kind of the raw to real value? >> I think you made a great point talking about the raw to value, or, as we refer to it, raw to ready. And part of the whole innovation we're seeing is the modernization of the platform, where organizations are looking to tap into the tremendous amount of data that is available today. So a couple of things have happened in the last decade. First of all, we have significantly more data available than ever before, because of the digitization of data and because new sources have become available. But beyond that, we have the technologies and the platforms that can both store and process large amounts of data. So we have foundations.
But in the end, to make it happen, we need to get all the data to where we want to analyze it, and find a way to put it together and turn it from raw material into ready material, data products that can be consumed. And that's really where the challenge is. We're seeing a lot of organizations, especially the CIOs and CDOs, the enterprise architecture and data architecture teams, on a journey to understand how to put together these kinds of architectures and data systems. And that's where, with our data integration platform, we've focused on accommodating the new challenges they have encountered in trying to make that happen. >> Yeah, help us unpack that a little bit. You know, everything should work together, but when I rolled out the industry's leading CRM in our company, it's like, oh, I've got hundreds of data sources and hundreds of tools I could put together, and it should be really easy for me to just allow my data to flow and get to the right place. But I often find it's not that easy; I've been having a hard time finding that. >> That's a good point. And if you take a step back and look at what's at the core of the challenges, or the new needs that we're seeing: we talk about the transformation, and modern analytics fueled by data is part of it. Modern analytics created new types of challenges that didn't exist before, and therefore traditional data integration tools didn't do the job; they didn't meet those modern needs. Let me touch on a few of those. First of all, when customers are implementing modern analytics, many times what they're trying to do is AI and machine learning. We'll use those terms, and machine learning and AI get smarter the more data you give them. So it's all about the scale of data, and what we're seeing with customers is that where, in the past, a data warehouse system typically had five, ten, twenty data sources going into it, we're now seeing a hundred X that number of sources. We have customers that work with five hundred, six hundred, some with over two thousand sources of data feeding the data analytics system. So scale becomes a critical need, and when we talk about scale, you need the ability to bring data from hundreds or thousands of source systems efficiently, with very low impact, and ideally with fewer resources, because again, you need to scale. The second challenge has to do with the fact that modern analytics, for many organizations, means real-time analytics or streaming analytics. They want to be able to process data in real time and respond to it. To do that, you need a way to move data, capture it in real time, make it available, and do all of that in a very economic fashion. And then the third one: in order to deal with the scale, and in order to deal with the agility that customers want, the question is where they are doing the analytics. Many of them are adopting the cloud, and we've been seeing multi-cloud adoption. So in order to get data to the cloud, now you're dealing with the challenge of efficiency. I have limited network bandwidth, I have a lot of data that I need to move around: how can I move all of that, and do it more efficiently?
And the only thing I would add to that is that beyond the mechanics of how you move the data, with scale, with efficiency, even in real time, there's also how you approach the process: which architectures and operations you can implement and accommodate, because the platforms you choose, and the way things are done, will change over time. So I need the ability to be agile and flexible. >> Yeah, well, a lot to unpack there, because I just made the comment: if you talk about us humans, giving us more data doesn't mean I'm actually going to get better. I need to have the tooling in there to take that data and help give me the insights, which I can then act on; otherwise, we understand, for most people, if I have to make decisions or choices and I get more thrown at me, there's less and less likelihood that I can act on it. And boy, the data lakes. I remember the first time I heard "data lakes"; it was when we talked about what infrastructure we're building, and now, in the last couple of years, the cloud, public cloud, tends to be a big piece of it, even though we know data is going to live everywhere: not just public and private cloud, but edge gets to be a piece of it too. So with the data integration platform, how easy is it for customers to get started? With that diversity and everything else, where do they start? Give me a little bit of a customer journey, if you would, and maybe even a customer example; that would be a great way to illustrate it. >> Absolutely. So first of all, it's a journey, and I think that journey started quite a few years ago. I mean, Hadoop is now over ten years old, and we're actually seeing a big change, a shift in the market from what was initially the Hadoop ecosystem into a much broader set of technologies, especially with the cloud, in order to store and process large scales of data. So in the journey customers went through over the past few years, early on it was very experimental: customers were trying it on for size, they were trying to understand how to build the processes around it, and the solutions then were very batch oriented, with MapReduce back in the early days. But when you look at it today, it's already evolved significantly, and you're seeing these big data systems needing to support different and diverse types of workloads. Some of them are machine learning and data science, some of them are streaming analytics, some of them are serving data for microservices to power agile applications. So there's a lot of need for the data in the journey, and what we're seeing is that customers, as they move through this journey, sometimes need to pivot, and as new technologies come out, they need the ability to accommodate, adapt, and adopt those new technologies as they go. So that's the journey we have worked through with our customers, and as they evolved, once they figured it out, the scale came along. So it's very common to see a customer start with a smaller project and then scale it up. For many of the customers we worked with, that's how it worked out. And you asked for an example.
So one of our customers is one of the world's largest automotive companies, and they decided on a strategy to turn what they believe is a huge asset they have, which is data, into value. But the data is in a lot of silos, across manufacturing facilities, supply facilities, inventory and others, so they wanted to bring it all together into one place, combine it with data coming from the car itself, and by having all the data in one place, be able to derive new insights and information that they can use, as well as potentially sell or monetize in other ways. As they got started, they initially rolled it out to a set of their data centers and their sources of information in manufacturing facilities. So they started small, but very quickly, once they figured out they could do it fast, they figured out the process to scale it. Today they have over five hundred systems on board, with over two hundred billion changes in data being fed daily into their data lake. So it's a very, very large scale system. If you like, we can talk about what it takes to put together something so big. >> Yeah, please do. Take the next step; that would be perfect. >> Okay. So I think one of the key things customers have to understand, and you were mentioning the enterprise architecture teams, is that when you need to scale, you need to change the way you think about things. At the end of the day, there are two fundamental differences in the approach, and in the underlying technology that enables it. We talked earlier about the challenges; now I'm going to focus on just two things that should be easy to take away. The first is the move from batch to real time, or from batch to the delta, to the changes. Traditionally, data integration was done in a batch process: you reload the data. Today, if you want to scale, if you want to work in real time, you need to work based on the delta, on the change. The fundamental technology behind it is called change data capture, and it's both a technology and an approach. It allows you to find and identify only the changes in the enterprise data systems, and imagine all the innovation you can get by capturing and propagating only the changes. First of all, you have very little impact on the source systems, so you can scale, because you're moving less data. It's very efficient as you move the data around, because it's only a fraction of the data, and it can be real time, because you're capturing the data as it changes. So the move from batch to real time, or to streaming data based on changes: change data capture is fundamental in creating a modern data integration environment.
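To illustrate the batch-versus-delta distinction, here is a toy sketch of the change data capture idea in Python. It is conceptual only; real CDC tools, including Attunity's, typically read database transaction logs rather than diffing snapshots, and all names here are made up.

```python
# Toy illustration of the CDC idea: ship only the deltas, not full reloads.
# Real log-based CDC reads transaction logs; this toy version just shows
# why applying deltas is so much cheaper than reloading everything.

def full_reload(target: dict, source: dict) -> int:
    """Batch approach: rewrite everything. Cost grows with table size."""
    target.clear()
    target.update(source)
    return len(source)  # rows moved


def apply_deltas(target: dict, changes: list) -> int:
    """CDC approach: apply only inserts/updates/deletes since last sync."""
    for op, key, value in changes:
        if op in ("insert", "update"):
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
    return len(changes)  # rows moved


source = {i: f"row-{i}" for i in range(100_000)}
target = dict(source)  # the one-time initial load has already happened

# Two changes happened since the last sync: one update, one insert.
changes = [("update", 42, "row-42-v2"), ("insert", 100_000, "new-row")]

print("batch moves:", full_reload({}, source))        # 100,000 rows
print("cdc moves:  ", apply_deltas(target, changes))  # 2 rows
```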
So again, the move from developer toe toe configuration based automation based products or what we've done opportunity is First, we have been one of the pioneers in the innovators in change that I capture technology. So the platform that now it's part of the clique that integration plan from brings with it okay over fifteen years off innovation and optimization change their capture with the broader set of data sources that our support there, with lots of optimization ranging from data sources like sickle server and Oracle, the mainstream toe mainframes and to escape system. And then one of the key focus with the head is how do we take complex processes and ultimatum. So from a user perspective, you can click a few buttons, then few knobs, and you have the optimize solution available for making data moving data across that they're very sets off systems. So through moving on to the Delta and the automation, you allow this cape. >> So a lot of the systems I'm familiar with it's the metadata you know, comes in the system. I don't have to as an admin or somebody's setting that up. I don't have to do all of this or even if you think about you know, the way I think of photos these days. It used to be. I took photos and trying to sort them was, you know, ridiculous. Now, my, you know, my apple or Google, you know, normally facial recognition, but timestamp location, all those things I can sort it and find it. You know, it's built into the system >> absolutely in the metadata is critical to us to the whole process. First of all, because when you bring data from one system to another system, somebody's to understand their data. And the process of getting data into a lake and into a data warehouse is becoming a multi step day the pipeline, and in order to trust the data and understanding that you need to understand all the steps that they went through. And we also see different teams taking part in this process. So for it seemed to be able to pick up the data and work on it, it needs to understand its meta data. By the way, this is also where the click their integration platform bring together the unity software. Together with Click the catalyst, we'LL provide unique value proposition for you that because you have the ability to capture changed data as it changes, deliver that data virtually anywhere. Any data lake, any cloud platform, any analytic platform. And then we find the data to generate analytic ready data sets and together with the click data Catalyst, create derivative data sets and publish all of their for a catalogue that makes it really easy to understand which data exists and how to use it. So we have an end to end solution for streaming data pipelines that generate analytic data that data sets for the end of the day, wrote to ready an accelerated fashion. >> So, Itamar, your customers of the world that out, How did they measures Casesa? Their critical KP eyes is there You know some, you know, journey, you know, math that they help go along. You know what? What? What are some commonalities? >> So it's a great question. And naturally, for many organizations, it's about an arrow. I It's about total cost of ownership. It seeing result, as I mentioned earlier, agility and the timeto value is really changing. Customers are looking to get results within a matter of, if very few month and even sometimes weeks versus what it used to be, which is many months and sometimes even years. So again, the whole point is to do with much, much faster. 
So from a metric for success, what we're seeing his customers that buy our solution toe enable again large scale strategic initiatives where they have dozens to hundreds of data sources. One of the key metrics is how many data sources heavy onboard that heavy, made available. How many in the end of the data sets that already analytic ready have we published or made available Torrey Tor users and I'LL give you an example. Another example from one of for customers, very large corporation in the United States in the opportunity of after trying to move to the cloud and build a cloud Data Lake and analytic platform. In the two years they're able to move to two three data sets to the cloud after they try, they knew they'd integration platform okay, there. But they moved thirty day The sits within three months, so completely different result. And the other thing that they pointed out and actually talk about their solution is that unlike traditional data integration software, and they took an example of one of those traditional PTL platforms and they pointed out it takes seven months to get a new person skilled on that platform. Okay, with our data integration platform, they could do that in a matter of hours to a few days. So again, the ability to get results much faster is completely different. When you have that kind of software that goes back to a dimension about automation versus development based mouth now, >> it really seems like the industry's going through another step function, just as we saw from traditional data warehouses. Tto win. Who? Duke rolled out that just the order of magnitude, how long it took and the business value return Seems like we're we're going through yet another step function there. So final thing. Yeah, You know what? Some of the first things that people usually get started with any final takeaways you want to share? >> Sure. First, for what people are starting to work with. Is there usually selecting a platform of choice where they're gonna get started in respect of whether Iran analytics and the one take a way I'LL give customers is don't assume that the platform you chose is we're going to end up because new technologies come to market, a new options come. Customers are having mergers, acquisitions, so things change all the time. And as you plan, make sure you have the right infrastructure toe allow you two kind of people support and make changes as you move through the throw. These are innovation. So they may be key key takeaway. And the other one is make sure that you're feeling the right infrastructure that can accommodate speed in terms of real time accomodate scale. Okay, in terms of both enabling data legs, letting cloud data stores having the right efficiency to scale, and then anything agility in respect to being able to deploy solution much, much faster. Yeah, >> well, tomorrow I think that. That's some real important things to say. Well, we know that the only constant Internet industry is change on DH. Therefore, we need to have solutions that can help keep up with that on and be able to manage those environments. And, you know, the the role of is to be able to respond to those needs of the business fast. Because if I don't choose the right thing, the business will go elsewhere. Tara trying to fuck with Angelo. Thank you so much for sharing all the latest on the immigration data platforms. Thank you. Alright, Uh, always lots more on the cube dot Net comes to minimum is always thanks for watching.

Published Date : May 16 2019


Itamar Ankorion & Drew Clarke, Qlik | CUBE Conversation, April 2019


 

>> From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now here's your host, Stu Miniman. >> Hi, I'm Stu Miniman, and welcome to a special edition of CUBE Conversations here in our Boston area studio. First of all, to my right is a first-time guest on the program, Drew Clarke, who's the chief strategy officer at Qlik, and welcome back to the program Itamar Ankorion, who's a senior vice president of enterprise data integration, now with Qlik through the acquisition of Attunity. So thanks so much for joining us, gentlemen. >> Great to be here. >> All right, Drew, you know, Attunity we've had on the program, and anytime we have Qlik on the program as well, but maybe for our audience just give us a quick level set on Qlik. And, you know, the acquisition is some exciting news, so let's start there and we'll get into it. >> Sure, thanks, Stu. So Qlik, we're a twenty-five-year-old company in the business analytics space. A lot of people know about our products, QlikView and Qlik Sense. We have fifty thousand customers around the world, from large companies to kind of small organizations. >> Yeah. All right, so we talk a lot about data on our program. I looked through some of the Qlik documentation, and it resonated with me a bit, because when we talk about digital transformation on our program, the key thing that differentiates the most between the old way of doing things and the modern is: I need to be data driven, I need to make my decisions on the data, the analytics piece of that. So, Itamar, let's start there, and talk about, other than the logo on your card changing, what's the same and what's different going forward for you? >> Well, first, we're excited about this merger and the opportunity that we see in the market, because there's a huge demand for data, primarily for doing new types of analytics and business intelligence. Data is fueling the transformation, and part of the main challenge customers and organizations have is making more data available faster, and putting it in the hands of the people who need it. So on our part, coming from Attunity, we spent the last few years innovating and creating technology that helps organizations modernize how they create new data architectures, to support faster data and more agility in terms of enabling data for analytics. And now, together with Qlik, we can continue to expand that, and at the end of the day, provide more data out to more people. >> So, Drew, it's interesting. There's been no shortage of data out there; for decades we've been talking about the data growth. But actually getting access to and storing data: it's in silos more than ever, it's spread out all over the place. We say the challenge of our time is really building distributed architectures, and data is going to live everywhere. And customers, their stats are all over the place too: how much is searchable, how much is available, how much is usable. So explain a little bit the challenge you're facing, and how you're helping move customers along that journey. >> Well, what you bring up, Stu, is the idea of data and analytics for decision making, and really, it's about making that decision making go faster, and getting the right kind of information to the right individuals.
And we really believe in this concept of data literacy, and data literacy was defined, I think, well, by two professors who co-authored a white paper. One professor was from MIT, the other one's from Emerson College, a communication school. Data literacy is kind of the ability to read, understand, analyze, and argue with data. And the more you can actually get that working inside an organization, the better you are at decision making, and the better competitive advantage you have, whether you're going to win or you're going to accomplish a mission. And now, with what you said, the proliferation of data, it gets harder. And where do you find it? And you need it in real time, and that's where the acquisition of Attunity comes in. >> Okay, I need to ask a follow-up on that. So one of the favorite events I ever did was with two other MIT professors, yes, we're a Boston area show, we're putting a lot of the MIT professors here, but Andy McAfee and Erik Brynjolfsson talked about racing with the machine, because, you know, it's so great, you know? You know who's the best chess player out there? Was it, you know, the human grandmaster, or was it the computer? And, you know, the studies showed actually that if you put the grandmaster with the computer, they could actually beat either the best computer or the best person. So when you talk about, you know, the data and analytics, everybody's looking at, you know, the AI and the ML pieces, and it's like, OK, you know, how do these pieces go together? How does that fit into the data literacy piece? You know, the people and, you know, the machine learning. >> Well, what you bring up is the idea of kind of augmenting the human, and we believe very much around a cognitive kind of interface of the technology, the software, with a person at that decision making point. And so what you'll see around our own kind of perspective is that we were part of a second generation of BI, of like self-service, and we've moved rapidly into this third generation, which is the cognitive kind of augmentation of the decision maker, right? And so you say this data literacy is arguing with data. Well, how do you argue and actually have the updated machine learning kind of recommendations? But it's still the human making that decision. And that's an important kind of component of our own kind of technology that we bring to the table. But with Attunity, that's the data side; it needs to be there faster and more effective. >> Yeah. So, Itamar, please fill us in on that. That data is the, you know, in big data we talk about the three V's. So, you know, where are we today? How do we, you know, get in and leverage all of that data? >> So that's exactly where we've been focused over the last few years, and worked with customers that were focused on building new data lakes, new data warehouses, looking at the cloud, building basically more of the new foundations for enabling the organization to use way more data than ever before. So it goes back to the volume, at least one V out of the three you mentioned. And the other one, of course, is the velocity, and how fast it is, and I've actually come to see that there are, in a sense, two dimensions of velocity that come together. One is how timely is the data you're using. And one of the big changes we're seeing in the market is that the user expectation and the business need for real-time data is becoming ever more critical.
We used to talk to customers about real-time data, because when they ask for data, they get a response very quickly. But it's last week's data. Well, that doesn't cut it. So what we're seeing is that, first of all, one dimension is getting data that is real-time data, that represents the data as it currently is; the second one is how quickly you can actually make that happen. Because business dynamics change much faster now, this speed of change in the industry accelerates. Customers need the ability to put solutions together, to make data available to answer business questions really fast. They cannot do it on the order of months and years. They need to do it on the order of days, sometimes even hours. And that's where our solutions come in. >> Yeah, it's interesting. You know, my background's on the infrastructure side, I spent a lot of time in the cloud world. And, you know, you talk about what we need for real time. Well, you know, it used to be you rolled out a server, that took me a week or a month, and a VM reduced that time. Now we're, you know, in a containerized, Kubernetes world, and we're now talking a much shorter time frame, and it's like, oh, if you show me the way something was an hour ago, oh my gosh, that's not the way the world is. And I think, you know, for years in the Hadoop world we talked about, what is real time, and how do I really define that? And the answer we usually came up with is, it is getting the right information to the right place, to the right person. Or from the sales standpoint, it's like, I need that information to save that client, so they get what they need. So some of those terms, you know, scale and real time, sort of require context. But where does that fit into your customer discussions? >> Well, >> two parts to what you bring up. You know, I think what you're saying is absolutely still true, right? Right data, right person, right time. It gets harder, though, with just the volumes of data. Where is it? How do you find it? How do you make sure that it's the right pieces to the right place? And you brought up the evolution of just the compute infrastructure, and analytics likes to be close to the data. But if you have data everywhere, how do you make sure that part works? And we've been investing in a lot of our own cloud analytics infrastructure; it's now done on a microservices basis. So it's running on Kubernetes clusters; it can work in whatever cloud compute infrastructure you want, be it Amazon or Azure or Google, or kind of your local kind of platform data centers. But you need that kind of small piece tied to the right kind of data on the side. And so that's where you see a great match between the two solutions. And then the second part is, the response from our customers after the acquisition was announced was tremendous. I have one customer who works in the manufacturing space who said, I think this is exactly what I was looking to do from an analytics space; I needed more data in real time, and I was looking at a variety of solutions. She said, thank you very much, you made my kind of life a little easier, I can narrow down to one particular platform. So we have manufacturing companies, we have military kind of units and organizations, to healthcare organizations. I've had just countless kind of feedback coming in along those same kind of questions. >> All right, Itamar, you know, for the Attunity
customers, what does this mean for them coming into the Qlik family? >> Well, first of all, it means for them that we have a much broader opportunity to serve them. Qlik is a much, much bigger company. We have more resources we can bring to bear, to both continue to enhance the Attunity offering as well as create integrations with other products, such as the Qlik Data Catalyst, which Qlik acquired several months ago. And there's a great synergy between those products, Attunity and the Qlik Data Catalyst, to provide a much more comprehensive enterprise data integration platform, and then beyond that to create synergies with other Qlik analytics products. So again, while the Qlik data integration platform, consisting of Attunity and the Qlik Data Catalyst, will be independent and provide solutions for any data platform, analytics platform, or cloud platform, as it already does today, we'll continue to invest. There are also opportunities to create unique synergies with some of Qlik's technologies, such as the Associative Big Data Index and some others, to provide more value, especially at scale. >> All right, so Drew, please expand on that a little bit if you can. There are so many pieces; I know we're going to spend a little time going deeper on some of the other ones. But when you talk to your customers, when you talk to your partners, what do you want to make sure their key takeaways are? >> Right. So there are a couple of important points, Itamar, you made on the data integration platform, and so that's a combination of the Attunity products plus the Data Catalyst, which was, you know, acquired through Podium Data. Both of those kind of components are available and will continue to be available for our customers to use on whatever analytics platform. So we have customers who use the data for data science, and they want to work in R and Python and their own kind of machine learning, or work with platforms like DataRobot, and they'll be able to continue to do that with that same speed. They also could be using another kind of analytics or visualization tool, and, you know, we actually have a number of customers that do that, and we'll continue to support that. So that's the first point, and I think the one you made up front, which is the important one. The second is, while we do think there is some value in using Qlik Sense with the platform, we've been investing in a platform called the Associative Big Data Index, and that sounds like a very complicated piece, but what we've done is taken our kind of unique value proposition as an analytics company, which is the ability to work with data and ask questions of it and have the answers come to you very quickly, and taken that same associative experience that people use in our products and brought it down to the data lake. And that's where you start to see that same kind of thing people love about QlikView and Qlik Sense brought into the data lake, and that's what Itamar was bringing up from a scale kind of perspective. So you have both kind of opportunities, >> Drew, and I really appreciate you sharing the importance of these coming together. We're going to spend some more time digging into the individual pieces there. I might be able to say, OK, are we past the data lakes? Has it gotten to a data swamp or a data ocean?
Because, you know, there are lots of sources of data, and, you know, the lake, I always say, seems a little bit more pristine than the average environment. So thank you both so much, and we look forward to having more conversations with you. And be sure to check out theCUBE.net for all our videos. I'm Stu Miniman. Thanks so much for watching
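The "days, sometimes even hours" point Itamar makes rests on change data capture, the technique Attunity is best known for: rather than bulk-reloading a target on a monthly cycle, a replication process continuously applies only the changes recorded at the source. Below is a minimal sketch of that pattern in Python; the in-memory SQLite databases, table names, and change-log format are invented for illustration and are not Attunity's implementation.

```python
# Minimal change-data-capture sketch: poll an append-only change log on the
# source and apply only the new entries to the target, so the target tracks
# the source within seconds instead of being reloaded in bulk.
import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute(
    "CREATE TABLE change_log (id INTEGER PRIMARY KEY, op TEXT, key TEXT, value TEXT)"
)
target.execute("CREATE TABLE accounts (key TEXT PRIMARY KEY, value TEXT)")

last_applied = 0  # high-water mark: id of the last change already replicated

def replicate_once():
    """Ship every change newer than the high-water mark to the target."""
    global last_applied
    changes = source.execute(
        "SELECT id, op, key, value FROM change_log WHERE id > ? ORDER BY id",
        (last_applied,),
    ).fetchall()
    for change_id, op, key, value in changes:
        if op == "upsert":
            target.execute("INSERT OR REPLACE INTO accounts VALUES (?, ?)", (key, value))
        elif op == "delete":
            target.execute("DELETE FROM accounts WHERE key = ?", (key,))
        last_applied = change_id
    target.commit()

# Simulate source activity, then run one replication cycle.
source.execute("INSERT INTO change_log (op, key, value) VALUES ('upsert', 'acct-1', '100')")
source.commit()
replicate_once()
print(target.execute("SELECT * FROM accounts").fetchall())  # [('acct-1', '100')]
```

Running the poller on a short interval (or wiring it to the database's transaction log, as production CDC tools do) is what turns a weeks-stale copy into one that lags by seconds.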

Published Date : May 16 2019

SUMMARY :

It's theCUBE. First of all, to my right, a first time guest on the program Drew And you know the acquisition, A lot of people know about our products. Itamar, let's start there and talk about, you know, other than you know, is making more data available faster and putting it in the hands of the people who need it. really all over the place and, you know, customers. And the more you can actually get that working So one of the favorite events I ever did with two other MIT You know, the people and, you know, the machine learning And so you say this data literacy is arguing with data. That data is the, you know, looking at the cloud, building basically more of the new foundations for enabling the organization to use way more It is getting the right information, you know, in the right place, And so that's where you see a great match between the two solutions right, Itamar, you know, for the Attunity And there's a great synergy between those products, Attunity and the Qlik Data Catalyst, to provide a But when you talk to your customers when you talk to your partners, what do you want to make sure their key the answers come to you very quickly is to be able to take that same associative experience, you know, there are lots of sources of data and you know the lake I always say seems

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Steve | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Steve Manly | PERSON | 0.99+
Sanjay | PERSON | 0.99+
Rick | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Verizon | ORGANIZATION | 0.99+
David | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Fernando Castillo | PERSON | 0.99+
John | PERSON | 0.99+
Dave Balanta | PERSON | 0.99+
Erin | PERSON | 0.99+
Aaron Kelly | PERSON | 0.99+
Jim | PERSON | 0.99+
Fernando | PERSON | 0.99+
Phil Bollinger | PERSON | 0.99+
Doug Young | PERSON | 0.99+
1983 | DATE | 0.99+
Eric Herzog | PERSON | 0.99+
Lisa | PERSON | 0.99+
Deloitte | ORGANIZATION | 0.99+
Yahoo | ORGANIZATION | 0.99+
Spain | LOCATION | 0.99+
25 | QUANTITY | 0.99+
Pat Gelsing | PERSON | 0.99+
Data Torrent | ORGANIZATION | 0.99+
EMC | ORGANIZATION | 0.99+
Aaron | PERSON | 0.99+
Dave | PERSON | 0.99+
Pat | PERSON | 0.99+
AWS Partner Network | ORGANIZATION | 0.99+
Maurizio Carli | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Drew Clark | PERSON | 0.99+
March | DATE | 0.99+
John Troyer | PERSON | 0.99+
Rich Steeves | PERSON | 0.99+
Europe | LOCATION | 0.99+
BMW | ORGANIZATION | 0.99+
VMware | ORGANIZATION | 0.99+
three years | QUANTITY | 0.99+
85% | QUANTITY | 0.99+
Phu Hoang | PERSON | 0.99+
Volkswagen | ORGANIZATION | 0.99+
1 | QUANTITY | 0.99+
Cook Industries | ORGANIZATION | 0.99+
100% | QUANTITY | 0.99+
Dave Valata | PERSON | 0.99+
Red Hat | ORGANIZATION | 0.99+
Peter Burris | PERSON | 0.99+
Boston | LOCATION | 0.99+
Stephen Jones | PERSON | 0.99+
UK | LOCATION | 0.99+
Barcelona | LOCATION | 0.99+
Better Cybercrime Metrics Act | TITLE | 0.99+
2007 | DATE | 0.99+
John Furrier | PERSON | 0.99+

Dave McDonnell, IBM | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE (relaxing music) covering DataWorks Summit Europe 2018. (relaxing music) Brought to you by Hortonworks. (quieting music) >> Well, hello and welcome to theCUBE. We're here at DataWorks Summit 2018 in Berlin, Germany, and it's been a great show. Who we have now is IBM. Specifically we have Dave McDonnell of IBM, and we're going to be talkin' with him for the next 10 minutes or so about... Dave, you explain. You are in storage for IBM, and IBM of course is a partner of Hortonworks, who are of course the host of this show. So Dave, now that you've been introduced, give us your capacity or role at IBM, discuss the partnership with Hortonworks, and really, what's your perspective on the market for storage systems for Big Data right now and going forward? And what kind of workloads and what kind of requirements are customers coming to you with for storage systems now? >> Okay, sure, so I lead alliances for the storage business unit, and Hortonworks, we actually partner with Hortonworks not just in our storage business unit but also with our analytics counterparts, our Power counterparts, and we're in discussions with many others, right? Our partner organization, services, and so forth. So the nature of our relationship is quite broad compared to many of our others. We're working with them in the analytics space, so these are a lot of these big data data lakes, BDDNA a lot of people will use as an acronym. These are the types of workloads that customers are using us both for. >> Mm-hmm. >> And it's not new anymore, you know, by now they're well past their first half dozen applications. We've got customers running hundreds of applications. These are production applications now, so it's all about, "How can I be more efficient? How can I grow this? How can I get the best performance and scalability and ease of management to deploy these in a way that's manageable?" 'Cause if I have 400 production applications, that's not off in any corner anymore. So that's how I'd describe it in a nutshell. >> One of the trends that we're seeing at Wikibon, of course I'm the lead analyst for Big Data Analytics at Wikibon under SiliconANGLE Media, we're seeing a trend in the marketplace towards, I wouldn't call them appliances, but what I would call workload-optimized hardware/software platforms that combine storage with compute and are optimized for AI and machine learning and so forth. Is that something that you're hearing from customers, that they require those built-out, AI-optimized storage systems, or is that far in the future? Give me a sense for whether IBM is doing anything in that area and whether that's on your horizon. >> If you were to define all of IBM in five words or less, you would say "artificial intelligence and cloud computing," so this is something >> Yeah. >> that gets a lot of thought and mindshare. So absolutely we hear about it a lot. It's a very broad market with a lot of diverse requirements. So we hear people asking for the converged infrastructure, for appliance solutions. There's of course hyperconverged. We actually have, either directly or with partners, answers to all of those. Now we do think one of the things customers want to do, as they scale and grow in these environments, is to take a software-defined strategy so they're not limited, they're not limited by hardware blocks. You know, they don't want to have to buy processing power and spend all that money on it when really all they need is more data.
>> Yeah. >> There's pros and cons to the different (mumbles). >> You have PowerAI systems, I know that, so that's where they're probably heading, yeah. >> Yes, yes, yes. So of course, we have packages that we've modeled in AI. They feed off of some of the Hortonworks data lakes that we're building. Of course we see a lot of people putting these on new pieces of infrastructure, because they don't want to put this on their production applications, so they're extracting data from maybe Hortonworks data lake number one, Hortonworks data lake number two, some of the EDWs, some external data, and putting that into the AI infrastructure. >> As customers move their cloud infrastructures towards more edge-facing environments, or edge applications, how are storage requirements changing or evolving in the move to edge computing? Can you give us a sense for any sort of trends you're seeing in that area? >> Well, if we're going to the world of AI and cognitive applications, all that data that I might have thrown in the cloud five years ago, I now, I'm educated enough, 'cause I've been paying bills for a few years, on just how expensive it is, and if I'm going to be bringing that data back, some of which I don't even know I'm going to be bringing back, it gets extremely expensive. So we see a pendulum shift coming back, where now a lot of data is going to be on host, ah sorry, on premise, but it's not going to stay there. They need the flexibility to move it here, there, or everywhere. So if it's going to come back, how can we bring customers some of that flexibility that they liked about the cloud, the speed, the ease of deployment, even a consumption-based model? These are very big changes for a traditional storage manufacturer like ourselves, right? So that's requiring a lot of development in software, it's requiring a lot of development in our business model, and one of the biggest things you hear us talk about this year is IBM Cloud Private, which does exactly that, >> Right. >> and it gives them something they can work with that's flexible, it's agile, and allows you to take containerized applications and move them back and forth as you please. >> Yeah. So containerized applications. So if you can define it for our audience, what is a containerized application? You talk about Docker and orchestrating it through Kubernetes and so forth. So you mentioned Cloud Private. Can you bring us up to speed on what exactly Cloud Private is, in terms of the storage requirements or storage architecture within that portfolio? >> Oh yes, absolutely. So this is a set of infrastructure that's optimized for on-premise deployment that gives you multi-cloud access, not just IBM Cloud, Amazon Web Services, Microsoft Azure, et cetera, and then it also gives you multiple architectural choices, basically wrapped by software to allow you to move those containers around and put them where you want them at the right time, at the right place, given the business requirement at that hour. >> Now is the data storage persisted in the container itself? I know that's fairly difficult to do in a Docker environment. How do you handle persistence of data for containerized applications within your architecture? >> Okay, some of those are going to be application specific. It's the question of designing the right data management layer depending on the application. So we have software intelligence, some of it from open source, some of which we add on top of open source, to bring some of the enterprise resilience and performance needed.
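One common way that persistence question gets answered in a Kubernetes environment is to keep state outside the container in a PersistentVolumeClaim that pods mount. The sketch below uses the community Kubernetes Python client; the claim name, size, and namespace are illustrative, and this shows the generic Kubernetes pattern rather than anything specific to IBM Cloud Private.

```python
# Sketch: give a containerized app durable storage by creating a Kubernetes
# PersistentVolumeClaim; data then outlives any individual container.
from kubernetes import client, config

config.load_kube_config()  # reads your local kubeconfig

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="app-data"),  # illustrative name
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
# A pod spec then mounts the claim by name, e.g.:
#   volumes:
#   - name: data
#     persistentVolumeClaim:
#       claimName: app-data
```

Because the claim, not the container, owns the volume, the data survives container restarts and can follow the workload as it is rescheduled, which is the flexibility being described here.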
And of course, you have to be very careful if the biggest trend in the world is unstructured data. Well, okay fine, it's a lot of sensor data. That's still fairly easy to move around. But once we get into things like medical images, lots of video, you know, HD video, 4K video, those are the things for which you have to give a lot of thought to how to do that. And that's why we have lots of new partners that we work with that help us with edge cloud, which gives that on-premise-like performance in really a cloud-like setup. >> Here's a question out of left field, and you may not have the answer, but I would like to hear your thoughts on this. How has blockchain, and IBM's been making significant investments in blockchain technology, database technology, how is blockchain changing the face of the storage industry in terms of customers' requirements for storage systems to manage data in distributed blockchains? Is that something you're hearing coming from customers as a requirement? I'm just trying to get a sense for whether, you know, it's moving customers towards more flash, towards more distributed, edge-oriented or edge-deployed storage systems? >> Okay, so yes, yes, and yes. >> Okay. >> So all of a sudden, if you're doing things like a blockchain application, things become even more important than they are today. >> Yeah. >> Okay, so you can't lose a transaction. You can't have a storage system going down. So there's a lot more care and thought put into the resiliency of the infrastructure. If I'm, you know, buying a diamond from you, I can't accept the excuse that my $100,000 diamond, maybe that's a little optimistic, my $10,000 diamond or yours, you know, the transaction's corrupted because the data's not proper. >> Right. >> Or if I want my privacy, I need to be assured that there's good data governance around that transaction, and that that will be protected for a good 10, 20, and 30 years. So it's elevating the importance of all the infrastructure to a whole different level. >> Switching our focus slightly, so we're here at DataWorks Summit in Berlin. Where are the largest growth markets right now for cloud storage systems? Is it APAC, is it North America, or where are the growth markets, in terms of regions, in terms of vertical industries, right now in the marketplace for enterprise-grade storage systems for big data in the cloud? >> That's a great question, 'cause we certainly have these conversations globally. I'd say the place where we're seeing the most activity would be the Americas; we see it in China. We have a lot of interesting engagements and people reaching out to us. I would say by market, you can also point to financial services in more than those two regions. Financial services, healthcare, retail, these are probably the top verticals. I think it's probably safe to assume, and we see the federal governments also have a lot of stringent requirements and, you know, new applications around the space as well. >> Right. GDPR, how is that impacting your customers' storage requirements, the requirement for GDPR compliance? Is that moving the needle in terms of their requirement for consolidated storage of the data that they need to maintain? I mean, obviously there's security, but is the sheer amount of data leading to consolidation or centralization of storage of customer data? That would seem to make it easier to control and monitor usage of the data. Is it making a difference at all? >> It's making a big difference.
Not many people encrypt data today, so there's a whole new level of interest in encryption at many different levels, data at rest, data in motion. There's new levels of focus and attention on performance, on the ability for customers to get their arms around disparate islands of data, because now GDPR is not only a legal requirement that requires you to be able to have it, but you've also got timelines in which you're expected to act on a request from a customer to have their data removed. And most of those will have a baseline of 30 days. So you can't fool around now. It's not just a nice-to-have. It's an actual core part of a business requirement that, if you don't have a good strategy for it, you could be spending tens of millions of dollars in liability if you're not ready for it. >> Well Dave, thank you very much. We're at the end of our time. This has been Dave McDonnell of IBM, talking about system storage, and of course a big Hortonworks partner. We are here on day two of the DataWorks Summit, and I'm James Kobielus of Wikibon SiliconANGLE Media, and have a good day. (upbeat music)
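The encryption-at-rest and 30-day erasure requirements Dave describes are often met together with crypto-shredding: keep each customer's data encrypted under its own key, and honor an erasure request by destroying that key. A minimal sketch using the Python cryptography package follows; the in-memory dictionaries are illustrative stand-ins for a real key-management service and storage backend.

```python
# Crypto-shredding sketch: per-customer encryption at rest means a GDPR
# erasure request can be honored by deleting one key, not scrubbing disks.
from cryptography.fernet import Fernet

keys = {}     # per-customer keys (in practice, a key-management service or HSM)
records = {}  # ciphertext at rest, keyed by customer

def store(customer_id, plaintext):
    key = keys.setdefault(customer_id, Fernet.generate_key())
    records.setdefault(customer_id, []).append(Fernet(key).encrypt(plaintext))

def read(customer_id):
    return [Fernet(keys[customer_id]).decrypt(c) for c in records[customer_id]]

def erase(customer_id):
    """Right-to-erasure: drop the key and the ciphertext becomes unreadable."""
    del keys[customer_id]

store("cust-42", b"address: 10 Main St")
print(read("cust-42"))  # [b'address: 10 Main St']
erase("cust-42")        # stored ciphertext for cust-42 can no longer be decrypted
```

The appeal of this design is exactly the deadline pressure mentioned above: deleting one key is fast and auditable, even when copies of the ciphertext are spread across disparate islands of storage.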

Published Date : Apr 19 2018

SUMMARY :

Brought to you by Hortonworks. are customers coming to you with for storage systems now? So the nature of our relationship is quite broad "and ease of management to deploy these One of the trends that we're seeing at Wikibon, and spend all that money on it to the different (mumbles). so that's where they're probably heading, yeah. and putting that into the AI infrastructure. in terms of in the move to edge computing. and one of the biggest thing you hear us and allows you to take containerized based applications and in terms of the storage requirements and put them where you want them at the right time in the container itself? And of course, you have to be very careful and you may not have the answer, and yes. So all of a sudden, Okay, so you can't So it's elevating the importance of all the infrastructure for big data in the cloud? and people reaching out to us. is that moving the needle in terms of their requirement on the ability for customers to get their arms around and of course a big Hortonworks partner.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Nicola | PERSON | 0.99+
Michael | PERSON | 0.99+
David | PERSON | 0.99+
Josh | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Jeremy Burton | PERSON | 0.99+
Paul Gillon | PERSON | 0.99+
GM | ORGANIZATION | 0.99+
Bob Stefanski | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Dave McDonnell | PERSON | 0.99+
amazon | ORGANIZATION | 0.99+
John | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
Keith | PERSON | 0.99+
Paul O'Farrell | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Keith Townsend | PERSON | 0.99+
BMW | ORGANIZATION | 0.99+
Ford | ORGANIZATION | 0.99+
David Siegel | PERSON | 0.99+
Cisco | ORGANIZATION | 0.99+
Sandy | PERSON | 0.99+
Nicola Acutt | PERSON | 0.99+
Paul | PERSON | 0.99+
David Lantz | PERSON | 0.99+
Stu Miniman | PERSON | 0.99+
three | QUANTITY | 0.99+
Lisa | PERSON | 0.99+
Lithuania | LOCATION | 0.99+
Michigan | LOCATION | 0.99+
AWS | ORGANIZATION | 0.99+
General Motors | ORGANIZATION | 0.99+
Apple | ORGANIZATION | 0.99+
America | LOCATION | 0.99+
Charlie | PERSON | 0.99+
Europe | LOCATION | 0.99+
Pat Gelsing | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Bobby | PERSON | 0.99+
London | LOCATION | 0.99+
Palo Alto | LOCATION | 0.99+
Dante | PERSON | 0.99+
Switzerland | LOCATION | 0.99+
six-week | QUANTITY | 0.99+
VMware | ORGANIZATION | 0.99+
Seattle | LOCATION | 0.99+
Bob | PERSON | 0.99+
Amazon Web Services | ORGANIZATION | 0.99+
100 | QUANTITY | 0.99+
Michael Dell | PERSON | 0.99+
John Walls | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
California | LOCATION | 0.99+
Sandy Carter | PERSON | 0.99+

Aaron Kalb, Alation | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we are here live in New York City, in Manhattan, for BigData NYC, our event we've been doing for five years in conjunction with Strata Data, which is formerly Strata Hadoop, which was formerly Strata Conference, formerly Hadoop World. We've been covering the big data space going on ten years now. This is theCUBE. I'm here with Aaron Kalb, who's Head of Product and co-founder at Alation. Welcome to theCUBE. >> Aaron Kalb: Thank you so much for having me. >> Great to have you on, so co-founder, head of product, love these conversations because you're also co-founder, so it's your company, you got a lot of equity interest in that, but also as head of product you get to have the 20 mile stare on what the future looks like, while inventing it today, bringing it to market. So you guys have an interesting take on the collaboration of data. Talk about what that means, what's the motivation behind that positioning, what's the core thesis around Alation? >> Totally, so the thing we've observed is a lot of people working in the data space are concerned about the data itself: how can we make it cheaper to store, faster to process. And we're really concerned with the human side of it. Data's only valuable if it's used by people; how do we help people find the data, understand the data, trust in the data? And that involves a mix of algorithmic approaches and also human collaboration, both human to human and human to computer, to get that all organized. >> John Furrier: It's interesting, you have a symbolic systems background from Stanford, worked at Apple, involved in Siri, all this kind of futuristic stuff. You can't go a day without hearing about Alexa and voice-activated interfaces, you've got Siri. AI is taking a really big part of this. Obviously all of the hype right now, but what it means is the software is going to play a key role as an interface. And this symbolic systems almost brings on this neural network kind of vibe, where objects, data, plays a critical role. >> Oh, absolutely, yeah, and in the early days when we were co-founding the company, we talked about what is Siri for the enterprise? Right, I was, you know, very excited to work on Siri, and it's really a kind of fun gimmick, and it's really useful when you're in the car, your hands are covered in cookie dough, but if you could answer questions like what was revenue last quarter in the UK and get the right answer fast, and have that dialogue, oh, do you mean fiscal quarter or calendar quarter? Do you mean UK including Ireland, or whatever it is? That would really enable better decisions and a better outcome. >> I was worried that Siri might do something here. Hey Siri, oh there it is, okay, be careful, I don't want it to answer and take over my job. >> (laughs) >> Automation will take away the job, maybe Siri will be doing interviews. Okay, let's take a step back. You guys are doing well as a startup, you've got some great funding, great investors. How are you guys doing on the product? Give us a quick highlight on where you guys are. Obviously this is BigData NYC, a lot going on, it's Manhattan, you've got financial services, big industry here. You've got the Strata Data event, which is the classic Hadoop industry that's morphed into data, which really is overlapping with cloud, IoT, application development, all kind of coming together.
How do you guys fit into that world? >> Yeah, absolutely, so the idea of the data lake is kind of interesting. Psychologically it's sort of a hoarder mentality: oh, everything I've ever had I want to keep in the attic, because I might need it one day. It's a great opportunity to evolve these new streams of data, with IoT and what not, but just 'cause you can get to it physically doesn't mean it's easy to find the thing you want, the needle in all that big haystack, and to distinguish, from among all the different assets that are available, which is the one that is actually trustworthy for your need. So we find that all these trends make the need for a catalog, to kind of organize that information and get what you want, all the more valuable. >> This has come up a lot. I want to get into the integration piece and how you're dealing with your partnerships, but the data lake integration has been huge, and having the catalog has come up, has been the buzz. Foundationally, if you will, saying catalog is important. Why is it important to do the catalog work up front with a lot of the data strategies? >> It's a great question, so, we see data cataloging as step zero. Before you can prep the data in a tool like Trifacta, Paxata, or Kylo. Before you can visualize it in a tool like Tableau or MicroStrategy. Before you can do some sort of cool prediction of what's going to happen in the future with a data science engine, before any of that. These are all garbage-in, garbage-out processes. The step zero is find the relevant data. Understand it so you can get it in the right format. Trust that it's good, and then you can do whatever comes next. >> And governance has become a key thing here; we've heard of the regulations, GDPR outside of the United States, but also that's going to have an arm's-length reach over into the United States, an impact. So these little decisions, and there's going to be an Equifax someday out there; another one's probably going to come around the corner. How does the policy injection change the catalog equation? A lot of people are building machine learning algorithms on top of catalogs, and they're worried they might have to rewrite everything. How do you balance the trade-off between good catalog design and flexibility on the algorithm side? >> Totally, yes, it's a complicated thing with governance and consumption, right? There's people who are concerned with keeping the data safe, and there are people concerned with turning that data into real value, and these can seem to be at odds. What we find is actually a catalog is a foundation for both, and they are not as opposed as they seem. What Alation fundamentally does is we make a map of where the data is, who's using what data, when, how. And that can actually be helpful if your goal is to say, let's follow in the footsteps of the best analysts and get more insights generated, or if you want to say, hey, this data is being used a lot, let's make sure it's being used correctly. >> And by the right people. >> And by the right people, exactly. >> Equifax, they were fishing that pond dry months, months before it actually happened. With good tools like this they might have seen this, right? Am I getting it right? >> That's exactly right: how can you observe what's going on to make sure it's compliant, and that the answers are correct, and that it's happening quickly and driving results? >> So in a way you're taking the collective intelligence of the user behavior and using that to understand what to do with the data modeling? >> That's exactly right.
We want to make each person in your organization as knowledgeable as all of their peers combined. >> So the benefit then for the customer would be, if you see something that's developing, you can double down on it. And if the users are using a lot of data, then you can provision more technology, more software. >> Absolutely, absolutely. It's sort of like when I was going to Stanford, there was a place where the grass was all dead, because people were riding their bikes diagonally across it. And then somebody smart was like, we're going to put a real gravel path there. So the infrastructure should follow the usage, instead of being something you try to enforce on people. >> It's a classic design meme that goes around: the good design is the planned path, the more effective design is the worn path. >> Exactly. >> So let's get into the integration. So one of the hot topics here this year, obviously besides cloud and AI, with cloud really being more the driver, the tailwind for the growth, AI being more the futuristic headroom, is integration. You guys have some partnerships that you announced with integration; what are some of the key ones, and why are they important? >> Absolutely, so, there have been attempts in the past to centralize all the data in one place, have one warehouse or one lake, have one BI tool. And those generally fail, for different reasons; different teams pick different stacks that work for them. What we think is important is the single source of reference: one hub with spokes out to all those different points. If you think about it, it's like Google: it's one index of the whole web, even though the web is distributed all over the place. To make that happen it's very important that we have partnerships to get data in from various sources. So we have partnerships with database vendors, with Cloudera and Hortonworks, with different BI tools. What's new are a few things. One is with Cloudera Navigator; they have great technical metadata around security and lineage over HDFS, and that's a way to bolster our catalog, to go even deeper into what's happening in the files before things get surfaced, beyond the places where we have a deeper offering today. >> So it's almost a connector to them in a way, you kind of share data. >> That's exactly right; we've got a lot of different connectors, and this is one new one that we have. Another, go ahead. >> I was going to go ahead, continue. >> I was just going to say, another place that is exciting is data prep tools, so Trifacta and Paxata are both places where you can find and understand in Alation and then begin to manipulate in those tools. We announced with Paxata yesterday the ability to click to profile, so if you want to actually see what's in some raw compressed Avro file, you can see that in one click. >> It's interesting, Paxata has really been almost lapping Trifacta, because they were the leader in my mind, but now you've got like a Nascar race going on between the two firms, because data wrangling is a huge issue. Data prep is where everyone is stuck right now; they just want to do the data science, it's interesting. >> They are both amazing companies, and I'm happy to partner with both. And actually Trifacta and Alation have a lot of joint customers we're psyched to work with as well. I think what's interesting is that data prep, and this is beginning to happen with analyst definitions of that field.
It isn't just preparing the data to be used, getting it cleaned and shaped; it's also preparing the humans to use the data, giving them the confidence, the tools, the knowledge to know how to manipulate it. >> And it's great progress. So the question I wanted to ask is now the other big trend here is, I mean it's kind of a subtext in this show, it's not really front and center, but we've been seeing it kind of emerge as a concept, we see in the cloud world, on premise vs cloud. On premise a lot of people bring the dev ops model in, and say I may move to the cloud for bursting and some native applications, but at the end of the day there is a lot of work going on on premise. A lot of companies are kind of cleaning house, retooling, replatforming, whatever you want to call it, resetting. They are kind of getting their house in order to do on-prem cloud ops, meaning a business model of cloud operations on site. A lot of people doing that, that will impact the story, it's going to impact some of the server modeling, that's a hot trend. How do you guys deal with the on premise cloud dynamic? >> Totally, so we just want to do what's right for the customer, so we deploy both on prem and in the cloud, and then from wherever the Alation server is, it will point to usually a mix of sources, some that are in the cloud, like Redshift or S3, often with Amazon today, and also sources that are on prem. I do think I'm seeing a trend more and more toward the cloud, and people migrating from HDFS to S3 is one thing we hear a lot about at Strata, with the sort of Hadoop interest. But I think what's happening is people are realizing, as each Equifax in turn happens, that this old wild west model of, oh, you surround your bank with people on horseback and it's physically in one place, with data it isn't like that; most people are saying I'd rather have the A+ teams at Salesforce or Amazon or Google be responsible for my security than the people I can get over in the midwest. >> And the Paxata guys have loved the term Data Democracy, because that is really democratization, making the data free but also having the governance thing. So tell me about the data lake governance, because I've never loved the term data lake, I think it's more of a data ocean, but now you see data lake, data lake, data lake. Are they just silos of data lakes happening now? Are people trying to connect them? That's key, so that's been a key trend here. How do you handle the governance across multiple data lakes? >> That's right, so the key is to have that single source of reference, so that regardless of which lake or warehouse, or little siloed SQL Server somewhere, you can search in a single portal and find that thing no matter where it is. >> John: Can you guys do that? >> We can do that, yeah. I think the metaphor for people who haven't seen it really is Google: if you think about it, you don't even know what physical server a webpage is hosted from. >> Data lakes should just be invisible. >> Exactly. >> So you're interfacing with multiple data lakes, that's a value proposition for you. >> That's right, so it could be on prem or in the cloud, multi-cloud. >> Can you share an example of a customer that uses that and kind of how it's laid out? >> Absolutely, so one great example of an interesting data environment is eBay. They have the biggest Teradata warehouse in the world.
They also have, I believe, two huge data lakes; they have Hive on top of that, and Presto is used to sort of virtualize it across a mixture of Teradata and Hive, and then direct Presto queries. It gets very complicated, and they are a very data driven organization, so they have people who are product owners, who are in jobs where data isn't in their job title, and they know how to look at Excel and look at numbers and make choices, but they aren't real data people. Alation provides that accessibility so that they can understand it. >> We used to call the Hadoop world the car show for the data world, where for a long time it was about the engine, what was doing what, and then it became, what's the car, and now, how's it drive. Seeing that same evolution now where all that stuff has to get done under the hood. >> Aaron: Exactly. >> But there are still people who care about that, right. They are the mechanics, they are the plumbers, whatever you want to call them, but then the data scientists are the guys really driving things, and now end users potentially, and even applications, bots, or what not. It seems to evolve; that's where we're kind of seeing the show change a little bit, and that's kind of where you see some of the AI things. I want to get your thoughts on how you or your guys are using AI, how you see AI, if it's AI at all, if it's just machine learning as a baby step into AI; we all know what AI could be, but it's really just machine learning now. How do you guys use, quote, AI, and how has it evolved? >> It's a really insightful question and a great metaphor that I love. If you think about it, it used to be how do you build the car, and now I can drive the car even though I couldn't build it or even fix it, and soon I don't even have to drive the car, the car will just drive me; all I have to know is where I want to go. That's sort of the progression that we see as well. There's a lot of talk about deep learning, all these different approaches, and it's super interesting and exciting. But I think even more interesting than the algorithms are the applications. And so for us it's like, today, how do we get that turn-by-turn directions where we say turn left at the light if you want to get there, and eventually, you know, maybe the computer can do it for you. The thing that is also interesting is, to make these algorithms work, no matter how good your algorithm is, it's all based on the quality of your training data. >> John: Which is historical data. Historical data in essence; the more historical data you have, you need that to train the data. >> Exactly right, and we call this behavior IO: how do we look at all the prior human behavior to drive better behavior in the future. And I think the key for us is we don't want to have a bunch of unpaid >> John: You can actually get that URL, behavioral IO. >> We should do it before it's too late. (Both laugh) >> We're live right now, go register that, Patrick. >> Yeah, so the goal is, we don't want to have a bunch of unpaid interns trying to manually attack things; that's error prone and that's slow. I look at things like Luis von Ahn over at CMU; he does a thing where, as you're writing in a CAPTCHA to get an email account, you're also helping Google recognize a hard-to-read address or a piece of text from books.
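The "behavior IO" idea above, mining prior human behavior so the catalog can surface the worn paths, reduces at its simplest to tallying query logs. The following sketch is a toy illustration in Python, not Alation's implementation; the log format and table names are made up for the example.

```python
# Toy "behavior IO": tally who queries which tables so hot, well-trodden data
# can be surfaced (and reviewed for correct use) like a worn path in grass.
from collections import Counter, defaultdict

query_log = [  # (user, table) pairs, e.g. parsed from warehouse audit logs
    ("ana", "sales.revenue_daily"),
    ("ana", "sales.revenue_daily"),
    ("raj", "sales.revenue_daily"),
    ("raj", "hr.headcount"),
]

hits = Counter(table for _, table in query_log)
users = defaultdict(set)
for user, table in query_log:
    users[table].add(user)

# Heavily used tables are candidates for more capacity and closer governance.
for table, count in hits.most_common():
    print(f"{table}: {count} queries by {sorted(users[table])}")
```

The same tally serves both audiences from the interview: consumers see which data the best analysts actually rely on, and governance teams see where heavy usage deserves a compliance review.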
>> John: If you shoot the arrow forward, you just take this kind of forward, you almost think augmented reality is a pretext to what we might see for what you're talking about, and ultimately VR; are you seeing some of the use cases for virtual reality be very enterprise oriented, or even end consumer? I mean, Tom Brady, the best quarterback of all time, he uses virtual reality to play the offense virtually before every game, he's a power user; in pharma you see them using virtual reality to do data mining without being in the lab, so lab tests. So you're seeing augmentation coming in to this turn-by-turn direction analogy. >> Exactly, I think it's the other half of it. So we use AI, we use techniques to get great data from people, and then we do extra work watching their behavior to learn what's right, and to figure out if there are recommendations. But then you serve those recommendations; whether it's Google Glass where it appears right there in your field of view, we just have to figure out how we make sure, in the moment you're making a dashboard, or you're making a choice, that you have that information right on hand. >> So since you're a technical geek, and a lot of folks would love to talk about this, I'll ask you a tough question, 'cause this is something everyone is trying to chase for the holy grail. How do you get the right piece of data at the right place at the right time, given that you have all these legacy silos, latencies, and network issues as well? So you've got a data warehouse, you've got stuff in cold storage, and I've got an app and I'm doing something; there could be any points of data in the world that could be, in milliseconds potentially, on my phone or in my device, my internet of things wearable. How do you make that happen? Because that's the struggle, and at the same time you have to keep all the compliance and all the overhead involved. Is it more compute, is it an architectural challenge, how do you view that? Because this is the big challenge of our time. >> Yeah, again, I actually think it's the human challenge more than the technology challenge. It is true that there is data all over the place kind of gathering dust, but again, if you think about Google, billions of web pages; I only care about the one I'm about to use. So for us it's really about being in that moment of writing a query, building a chart; how do we say, in that moment, hey, you're using an out-of-date definition of profit. Or hey, the database you chose to use, the one thing you chose out of the millions, is actually broken and stale. And we have interventions to do that with our partners and through our own first-party apps that actually change how decisions get made at companies. >> So to make that happen, if I imagine it, you'd have to need access to the data, and then write software that is contextually aware, to then run, compute, in context to the user interaction. >> That's exactly right; back to the turn-by-turn directions concept, you have to know both where you're trying to go and where you are. And so for us that can be, from where I'm writing a SQL statement, after a join we can suggest the table most commonly joined with that, but also overlay onto that the fact that the most commonly joined table was deprecated by a data steward or data curator. So that's the moment that we can change the behavior from bad to good.
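That join suggestion can be pictured as a frequency count over historical queries with a deprecation flag overlaid. Here is a minimal sketch, assuming a pre-parsed join history; the table names and the deprecation set are invented for the example, and this illustrates the pattern rather than Alation's engine.

```python
# Toy join "turn-by-turn": after the user types JOIN, suggest the partner
# table seen most often in past queries, flagging any deprecated choice.
from collections import Counter

join_history = [  # (left_table, right_table) pairs mined from historical SQL
    ("orders", "customers"),
    ("orders", "customers"),
    ("orders", "products"),
]
deprecated = {"products"}  # marked by a data steward or curator

def suggest_joins(table):
    partners = Counter(right for left, right in join_history if left == table)
    for partner, _ in partners.most_common():
        warning = " (warning: deprecated)" if partner in deprecated else ""
        yield f"JOIN {partner}{warning}"

print(list(suggest_joins("orders")))
# ['JOIN customers', 'JOIN products (warning: deprecated)']
```

The design choice mirrors the interview's framing: the popularity signal says where people usually go, and the steward's flag corrects the route when the popular path has gone stale.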
>> So a chief data officer out there, we've got to wrap up, but I wanted to ask one final question. There's a chief data officer out there; they might be empowered, or they might be just a CFO assistant that's managing compliance. Either way, someone's going to be empowered in an organization to drive data science and data value forward, because there is so much proof that data science works. From military to play, you're seeing examples where being data driven actually has benefits. So everyone is trying to get there. How do you explain the vision of Alation to that prospect? Because they have so much to select from, there's so much noise, there's, like, we call it the tool shed out there: there's like a zillion tools out there, there's like a zillion platforms, some tools are trying to turn into something else, a hammer is trying to be a lawnmower. So they've got to be careful on who they select. So what's the vision of Alation to that chief data officer, or that person in charge of analytics, to scale operational analytics? >> Absolutely, so we say to the CDO, we have a shared vision for this place where your company is making decisions based on data, instead of based on gut, or expensive consultants, months too late. And the way we get there, the reason Alation adds value, is we're sort of the last tool you have to buy, because with this lake mentality, you've got your tool shed with all the tools, you've got your library with all the books, but they're just in a pile on the floor. If you had a tool that had everything organized, so you just said, hey robot, I need a hammer and this size nail and this textbook on this set of information, and it could just come to you, and it would be correct and it would be quick, then you could actually get value out of all the expense you've already put into this infrastructure; that's especially true on the lake. >> And also tools describe the way the work's done, so in that model, tools can be in the tool shed; no one needs to know it's in there. >> Aaron: Exactly. >> You guys can help scale that. Well, congratulations, and just how far along are you guys, in terms of number of employees, how many customers do you have? If you can share that; I don't know if that's confidential or what not. >> Absolutely, so we're small but growing very fast, planning to double in the next year, and in terms of customers, we've got 85 customers, including some really big names. I mentioned eBay, Pfizer, Safeway Albertsons, Tesco, Meijer. >> And what are they saying to you guys? Why are they buying, why are they happy? >> They share that same vision of a more data driven enterprise, where humans are empowered to find out, understand, and trust data to make more informed choices for the business, and that's why they come and come back. >> And that's the product roadmap ethos for you guys, that's the guiding principle? >> Yeah, the ultimate goal is to empower humans with information. >> All right, Aaron, thanks for coming on theCUBE. Aaron Kalb, co-founder and head of product for Alation, here in New York City for BigData NYC and also Strata Data. I'm John Furrier, thanks for watching. We'll be right back with more after this short break.

Published Date : Sep 28 2017

SUMMARY :

Brought to you by This is the Cube. Great to have you on, so co-founder head of product, Totally so the thing we've observed is a lot Obviously all of the hype right now, and get the right answer fast, and have that dialogue, I don't want it to answer and take over my job. How are you guys doing on the product? doesn't mean it's easy to find the thing you want, and having the catalog has come up with, has been the buzz. Understand it so you can get it in the right format. and flexibility on the algorithm side? and make more insights generated or if you want to say, Am I getting it right? That's exactly right, how can you observe what's going on We want to make each person in your organization So the benefit then for the customer would be So the infrastructure should follow the usage, Good design is here, the more effective design is the path. You guys have some partnerships that you announced it's one index of the whole web So it's almost a connector to them in a way, this is one new one that we have. the ability to click to profile, going on between the two firms, It isn't just preparing the data to be used, but at the end of the day there is a lot of work for the customer, so we deploy both on prem and in the cloud because that is really democratization, making the data free That's right so the key is to have that single source really is Google, if you think about it, So your interfacing with multiple data lakes, on prem or in the cloud, multi-cloud. They have the biggest teradata warehouse in the world. the car show for the data world, where for a long time and that's kind of where you see some of the AI things. and now I can drive the car even though I couldn't build it Historical data in essence the more historical data you have to drive better behavior in the future. Yeah so the goal is and ultimately VR are you seeing some of the use cases but then you serve those recommendations, and all the overhead involved, is it more compute, the one thing you chose out of the millions So to make that happen, if I imagine it, back to the turn by turn directions concept you have to know How do you explain the vision of Alation to that prospect? And the way we get there, no one needs to know it's in there. If you can share that, I don't know if that's confidential planning to double in the next year, for the business, and that's why they come and come back. Yeah the ultimate goal is Alright Aaron thanks for coming on the Cube.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Luis von Ahn | PERSON | 0.99+
eBay | ORGANIZATION | 0.99+
Aaron Kalb | PERSON | 0.99+
Pfizer | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Aaron | PERSON | 0.99+
Tesco | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
Safeway Albertsons | ORGANIZATION | 0.99+
Siri | TITLE | 0.99+
Google | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
New York City | LOCATION | 0.99+
UK | LOCATION | 0.99+
20 mile | QUANTITY | 0.99+
Hortonworks | ORGANIZATION | 0.99+
BigData | ORGANIZATION | 0.99+
five years | QUANTITY | 0.99+
Equifax | ORGANIZATION | 0.99+
two firms | QUANTITY | 0.99+
Apple | ORGANIZATION | 0.99+
Meijer | ORGANIZATION | 0.99+
ten years | QUANTITY | 0.99+
Cloudera | ORGANIZATION | 0.99+
Trifacta | ORGANIZATION | 0.99+
85 customers | QUANTITY | 0.99+
Alation | ORGANIZATION | 0.99+
Patrick | PERSON | 0.99+
both | QUANTITY | 0.99+
Strata Data | ORGANIZATION | 0.99+
millions | QUANTITY | 0.99+
United States | LOCATION | 0.99+
Paxata | ORGANIZATION | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
excel | TITLE | 0.99+
Manhattan | LOCATION | 0.99+
last quarter | DATE | 0.99+
Ireland | LOCATION | 0.99+
GDPR | TITLE | 0.99+
Tom Brady | PERSON | 0.99+
each person | QUANTITY | 0.99+
Salesforce | ORGANIZATION | 0.98+
next year | DATE | 0.98+
NYC | LOCATION | 0.98+
one | QUANTITY | 0.98+
this year | DATE | 0.98+
yesterday | DATE | 0.98+
today | DATE | 0.97+
one lake | QUANTITY | 0.97+
Nascar | ORGANIZATION | 0.97+
one warehouse | QUANTITY | 0.97+
Strata Data | EVENT | 0.96+
Tableau | TITLE | 0.96+
One | QUANTITY | 0.96+
Both laugh | QUANTITY | 0.96+
billions of web pages | QUANTITY | 0.96+
single portal | QUANTITY | 0.95+

Christian Rodatus, Datameer | BigData NYC 2017


 

>> Announcer: Live from Midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome to theCUBE's coverage in New York City for Big Data NYC; the hashtag is BigDataNYC. This is our fifth year doing our own event in conjunction with Strata Hadoop, now called Strata Data, used to be Hadoop World, our eighth year covering the industry; we've been there from the beginning in 2010, the beginning of this revolution. I'm John Furrier, the co-host, with Jim Kobielus, our lead analyst at Wikibon. Our next guest is Christian Rodatus, who is the CEO of Datameer. Datameer, obviously one of the startups now evolving on the, I think, eighth year or so, roughly seven or eight years old. Great customer base, been successful blocking and tackling, just doing good business. Your shirt says show me the data. Welcome to theCUBE, Christian, appreciate it. >> So well established, I barely think of you as a startup anymore. >> It's kind of true, and actually a couple of months ago, after I took on the job, I met Mike Olson, and Datameer and Cloudera were sort of founded the same year, I believe late 2009, early 2010. Then, he told me, there were two open source projects with MapReduce and Hadoop, basically, and Datameer was founded to actually enable customers to do something with it, as an entry platform to help get data in, create the data, and do something with it. And now, if you walk the show floor, it's a completely different landscape now. >> We've had you guys on before; the founder, Stefan, has been on. Interesting migration, we've seen you guys grow from a customer base standpoint. You've come on as the CEO to kind of take it to the next level. Give us an update on what's going on at Datameer. Obviously the shirt says "Show me the data." Show me the money kind of play there, I get that. That's where the money is, the data is where the action is. Real solutions, not pie in the sky; we're now in our eighth year of this market, so there's not a lot of tolerance for hype, even though there's a lot of AI washing going on. What's going on with you guys? >> I would say, interestingly enough, I met with a customer, a prospective customer, this morning, and this was a very typical organization. So, this is a customer that was an insurance company, and they're just about to spin up their first Hadoop cluster to actually work on customer management applications. And they are overwhelmed with what the market offers now. There's 27 open source projects, there's dozens and dozens of other different tools that try, basically, best-of-breed approaches at certain layers of the stack for specific applications, and they don't really know how to stitch this all together. And if I reflect on a customer meeting at a Canadian bank recently, one that has very successfully deployed applications on the data lake, like fraud management and compliance applications and things like this, they still struggle to basically replicate the same performance and the service level agreements that they were used to from their old EDW that they still have in production. And so, everybody's now going out there and trying to figure out how to get value out of the data lake for the business users, right? There's a lot of approaches that these companies are trying. There's SQL-on-Hadoop that supposedly doesn't perform properly.
There are other solutions, like OLAP on Hadoop, that try to emulate what they've been used to from the EDWs, and we believe these are the wrong approaches, so we want to stay true to the stack and be native to the stack, and offer a platform that really operates end-to-end, from ingesting the data into the data lake, to creating and preparing the data, and ultimately building the data pipelines for the business users, and this is certainly something--
>> So this is more of a play for the business users now, not the data scientists and statistical modelers. I thought the data scientists were your core market. Is that not true?
>> So, our primary user base at Datameer used to be, until last week, the data engineers in the companies, or basically the people that built the data lake, that curated the data and built these data pipelines for the business user community, no matter what tool they were using.
>> Jim, I want to get your thoughts on this for Christian's benefit. While these guys fix his microphone, his earpiece there, I want to get a question to Christian, and I'll redirect it through you. Gartner, another analyst firm.
>> Jim: I've heard of 'em.
>> Not a big fan personally, but you know.
>> Jim: They're still in business?
>> The magic quadrant, they use that tool. Anyway, they had a good intro stat. Last year, they predicted that through 2017, 60% of big data projects will fail. So, the question for both you guys is, did that actually happen? I don't think it did; I'm not hearing that 60% have failed, but we are seeing the struggle around analytics and scaling analytics in a way that's like a dev ops mentality. So, thoughts on this "60% of data projects fail."
>> I don't know whether it's 60%; there was another statistic that said there's only 14% of Hadoop deployments in production, or something--
>> They said 60, six zero.
>> Or whatever.
>> Define failure. I mean, you've built a data lake, and maybe you're not using it immediately for any particular application. Does that mean you've failed, or does it simply mean you haven't found the killer application yet for it? I don't know, your thoughts.
>> I agree with you, it's probably not a failure to that extent. So they dump the data into it, right, they build the infrastructure; now it's about the next step, data lake 2.0, to figure out how do I get value out of the data, how do I go after the right applications, how do I build a platform and tools that basically promote the use of that data throughout the business community in a meaningful way.
>> Okay, so what's going on with you guys from a product standpoint? You guys have some announcements. Let's get to some of the latest and greatest.
>> Absolutely. I think we were very strong in data curation, data preparation and the entire data governance around it, and as a user interface we are using this spreadsheet-like user interface called a workbook. It really looks like Excel, but it's not; it operates at a completely different scale. It's basically an Excel spreadsheet on steroids. Our customers build data pipelines, so this is the data engineers that we discussed before, but we also have a relatively small power user community in our client base that use that spreadsheet for deep data exploration.
Now, we are lifting this to the next level, and we put a visualization layer on top of it that runs natively in the stack, and what you get is basically a visual experience not only in the data curation process but also in deep data exploration, and this is combined with two platform technologies that we use: it's based on highly scalable distributed search in the backend engine of our product, number one, and we have also adopted a columnar data store, Parquet, for our file system now. With this combination, the data exploration capabilities we bring to the market will allow power analysts to really dig deep into the data, so there's literally no limit in terms of the breadth and the depth of the data. It could be billions of rows, it could be thousands of different attributes and columns that you are looking at, and you will get sub-second response times, as we create indices on demand as we run this through the analytic process.
>> With these fast queries and visualization, do you also have the ability to do semantic data virtualization roll-ups across multi-cloud or multi-cluster?
>> Yeah, absolutely. Also, there's a second trend that we discussed right before we started the live transmission here. Things are also moving into the cloud, so what we are seeing right now is that the EDW is not going away, the on-prem data lakes prevail, right, and now they are thinking about moving certain workload types into the cloud, and we understand ourselves as a platform play that builds a data fabric that really ties all these data assets together, and it enables the business.
>> On the trends, we weren't on camera, we'll bring it up here: the impact of cloud on the data world. You've seen this movie before, you have extensive experience in this space going back to the origination, your days at Teradata, when it was the classic, old-school data warehouse. And then, great purpose, great growth, massive value creation. Enter the Hadoop kind of disruption. Hadoop evolved from batch to do ranking stuff, and then it was a hammer that turned into a lawnmower, right? Then they started going down the path, and really, it wasn't workable for what people were looking at, but everyone was still trying to be the Teradata of whatever. Fast forward: things have evolved and things are starting to shake out, same picture of data warehouse-like stuff, now you've got cloud. It seems to be changing the nature of what it will become in the future. What's your perspective on that evolution? What's different about now, and what's the same about now, from the old days? What are the similarities with the old school, and what's different that people are missing?
>> I think it's a lot related to cloud, just in general. It is extremely important for fast adoption throughout the organization, for performance, and for service-level agreements with our customers. This is where we clearly can help, and we give them a user experience that is meaningful and that resembles what they were used to from the old EDW world, right? That's number one. Number two, and this comes back to the question of why the 60% is failing or working: I think there are a lot of really interesting projects out there, and our customers are betting big time on the data lake projects, whether it be on premise or in the cloud. And we work with HSBC, for instance, in the United Kingdom. They've got 32 data lake projects throughout the organization, and I spoke to one of these--
>> Not 32 data lakes, 32 projects that involve tapping into the data lake.
>> 32 projects that involve various data lakes.
>> Okay. (chuckling)
>> And I spoke to one of the chief data officers there, and they said their data center infrastructure, just by having kick-started these projects, will explode. And they're not in the business of operating all the hardware and things like this, and so a major bank like them, they made an announcement recently, a public announcement, you can read about it, started moving their data assets into the cloud. This is clearly happening at a rapid pace, and it will change the paradigm in terms of elasticity and being able to satisfy peak workload requirements as they come up, when you run a compliance report at quarter end or something like this, so this will certainly help with adoption and creating business value for our customers.
>> We talk all the time about real-time, and there are so many examples of how data science has changed the game. I mean, from a cyber perspective, how data science helped capture Bin Laden, to how I can get increased sales, to better user experience on devices. Having real-time access to data, and you put in some quick data science around things, really helps things at the edge. What's your view on real-time? Obviously, that's super important; you've got to kind of get your house in order in terms of base data hygiene and foundational work, building blocks. At the end of the day, real-time seems to be super hot right now.
>> Real-time is a relative term, right? There are certainly applications, like IoT applications, or machine data that you analyze, that require real-time access. I would call it right-time: what's the increment of data load that is required for certain applications? We are certainly not a real-time application yet. We can possibly load data through Kafka and stream data through Kafka, but in general, we are still a batch-oriented platform. We can do--
>> Which, by the way, is not going away any time soon. It's like super important.
>> No, it's not going away at all, right. It can do many batches at relatively frequent increments, which is usually enough for what our customers demand from our platform today, but we're certainly looking at more streaming types of capability as we move this forward.
>> What do the customer architectures look like? Because you brought up the good point, we talk about this all the time, batch versus real-time. They're not mutually exclusive; obviously, good architectures would argue that you decouple them, and obviously you'll have good software elements all through the life cycle of the data.
>> Through the stack.
>> And have the stack, and the stack's only going to get more robust. Your customers, what's the main value that you guys provide them, the problem that you're solving today and the benefits to them?
>> Absolutely, so our true value is that there are no breakages in the stack. We can basically satisfy all requirements, from ingesting the data, to blending and integrating the data, preparing the data, building the data pipelines, and analyzing the data. And all this we do in a highly secure and governed environment, so if you stitch it together: as a customer, the customer this morning asked me, "Whom do you compete with?" I keep getting this question all the time, and we really compete with two things. We compete with build-your-own, which customers still opt to do nowadays, while our things are really point-and-click and highly automated, and we compete with a combination of different products.
You need to have at least three to four different products to be able to do what we do, but then you get security breaks, you get a lack of data lineage and data governance through the process, and this is the biggest value that we can bring to the table. And secondly now, with visual exploration, we offer a capability that literally nobody has in the marketplace, where we give power users the capability to explore, with blazing-fast response times, billions of rows of data in a very free-form type of exploration process.
>> Are there more power users now than there were when you started as a company? It seems like tools like Datameer have brought people into the sort of power-user camp, simply by virtue of having access to your tool. What are your thoughts there?
>> Absolutely, it's definitely growing, and you also see different companies exploiting the capability in different ways. You might find insurance or financial services customers that have a very sophisticated capability built in that area, and you might see 1,000 to 2,000 users that do deep data exploration, and other companies are starting out with a couple of dozen and then evolving it as they go.
>> Christian, I've got to ask you as the new CEO of Datameer, obviously taking it to the next level, you guys have been successful. We were commenting yesterday on theCUBE: we've been covering this for eight years in depth in terms of CUBE coverage, we've seen the waves of hype come and go, but now there's not a lot of tolerance for hype. You guys are one of the companies, I will say, that stuck to your knitting, you didn't overplay your hand. You've certainly ridden the hype like everyone else did, but your solution is very specific on value, and so you didn't overplay your hand; the company didn't really overplay its hand, in my opinion. But now, the hand really is value.
>> Absolutely.
>> As the new CEO, you've got to kind of put a little shiny new toy on there, you know, keep the car looking shiny and everything looking good with cutting-edge stuff, and at the same time scale up what's been working. The question is, what are you doubling down on, and what are you investing in to keep that innovation going?
>> There are really three things, and you're very much right, so this has become a mature company. We've grown with our customer base, our enterprise features and capabilities are second to none in the marketplace, this is what our customers appreciate, and now the three investment areas where we are doubling down are these. First, really, visual exploration, as I outlined before. Number two, hybrid cloud architectures: we don't believe customers will move their entire stack right into the cloud. There are a few that are going to do this and that are looking into these things, but we believe they will still have their EDW, their on-premise data lake, and some workload capabilities in the cloud, which will be growing, so this is investment area number two. Number three is the entire concept of data curation for machine learning. This is something where we've released a plug-in earlier in the year for TensorFlow, where we can basically build data pipelines for machine learning applications. This is still very small. We see some interest from customers, but it's growing interest.
>> It's a directionally correct kind of vector: you're looking and saying, it's a good sign, let's kick the tires on that and play around.
>> Absolutely.
>> 'Cause machine learning's got to learn, too.
You've got to learn from somewhere.
>> And quite frankly, deep learning and machine learning tools for the rest of us: there aren't really all that many for the rest of us power users. They're going to have to come along and get really super visual in terms of enabling visual modular development and tuning of these models. What are your thoughts there, in terms of going forward, about a visualization layer to make machine learning and deep learning developers more productive?
>> That is an area where we will not engage, in a way. We will stick with our platform play, where we focus on building the data pipelines into those tools.
>> Jim: Gotcha.
>> The last area where we invest is ecosystem integration, so we think our Visual Explorer backend, which is built on search and on a Parquet file format, or columnar store, is really a key differentiator in feeding or building data pipelines into the incumbent BI ecosystems and accelerating those as well. We currently have prototypes running where we can basically give the same performance and depth of analytic capability to some of the existing BI tools that are out there.
>> What are some of the ecosystem partners you guys have? I know partnering is a big part of what you guys have done. Can you name a few?
>> I mean, the biggest one--
>> Everybody, Switzerland.
>> No, not really. We are focused on staying true to our stack and how we can provide value to our customers, so we work actively, and very importantly, on our cloud strategy with Microsoft and Amazon AWS in evolving our cloud strategy. We've started working with various BI vendors that you know about, right, and we definitely have a play also with some of the big SIs, and IBM is a more popular one.
>> So, BI guys mostly on the tool and visualization side. You said you were a pipeline.
>> On the tool and visualization side, right. We have very effective integration for our data pipelines into the BI tools; today we support TDE for Tableau, we have a native integration.
>> Why compete there? Just be a service provider.
>> Absolutely, and we have more and better technology coming up to even accelerate those tools as well on our big data stack.
>> You're focused, you're scaling. Final word, I'll give to you for the segment. Share with the folks that are a Datameer customer or have not yet become a customer: what's the outlook, what does the new Datameer look like under your leadership? What should they expect?
>> Yeah, absolutely. I think they can expect utmost predictability in the way we roll out the vision and how we build our product in the next couple of releases. The next five, six months are critical for us. We have launched Visual Explorer here at the conference. We're going to launch our native cloud solution probably middle of November to the customer base. So, these are the big milestones that will help us for our next fiscal year and provide really great value to our customers, and that's what they can expect: predictability, a very solid product, all the enterprise-grade features they need and require for what they do. And if you look at it, we are really an enterprise play, and the customer base that we have is very demanding and challenging, and we want to keep up and deliver a capability that is relevant for them and helps them create value from their data lakes.
>> Christian Rodatus, technology enthusiast, passionate, now CEO of Datameer. Great to have you on theCUBE, thanks for sharing.
>> Thanks so much.
>> And we'll be following your progress.
Datameer here inside theCUBE live coverage, hashtag BigDataNYC, our fifth year doing our own event here in conjunction with Strata Data, formerly Strata Hadoop, Hadoop World, eight years covering this space. I'm John Furrier with Jim Kobielus here inside theCUBE. More after this short break.
>> Christian: Thank you. (upbeat electronic music)
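[Editor's note: Christian mentions that Visual Explorer sits on a distributed search backend plus Parquet, a columnar file format. Below is a minimal, hypothetical sketch of why columnar storage keeps exploratory queries fast, using the open-source pyarrow library; this is not Datameer's engine, and the file and column names are invented. The point is that a columnar reader touches only the columns and row groups a query needs, so even very wide tables stay interactive.]

    # Minimal sketch of columnar exploration with Apache Parquet (pyarrow).
    # Illustrates the general technique discussed above, not Datameer's
    # implementation; file name and column names are hypothetical.
    import pyarrow.parquet as pq

    # Read only two columns out of a potentially very wide table:
    # a columnar format skips the bytes of every other column entirely.
    table = pq.read_table(
        "transactions.parquet",              # hypothetical file
        columns=["customer_id", "amount"],   # column pruning
        filters=[("amount", ">", 1000)],     # predicate pushdown to row groups
    )

    # Aggregate in memory; at billions of rows this step would be
    # distributed, but the I/O savings from pruning are the point here.
    print(table.group_by("customer_id").aggregate([("amount", "sum")]))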

Published Date : Sep 27 2017


Itamar Ankorion, Attunity & Arvind Rajagopalan, Verizon - #DataWorks - #theCUBE


 

>> Narrator: Live from San Jose in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017, brought to you by Hortonworks.
>> Hey, welcome back to theCUBE, live from the DataWorks Summit, day 2. We've been here for a day and a half talking with fantastic leaders and innovators, learning a lot about what's happening in the world of big data, the convergence with Internet of Things, machine learning, artificial intelligence, I could go on and on. I'm Lisa Martin, my co-host is George Gilbert, and we are joined by a couple of guys: one is a CUBE alum, Itamar Ankorion, CMO of Attunity. Welcome back to theCUBE.
>> Thank you very much, good to be here. Thank you, Lisa and George.
>> Lisa: Great to have you.
>> And Arvind Rajagopalan, the Director of Technology Services for Verizon. Welcome to theCUBE.
>> Thank you.
>> So we were chatting before we went on, and Verizon, you're actually going to be presenting tomorrow at the DataWorks Summit. Tell us about the journey that Verizon has been on building a data lake.
>> Verizon, over the last 20 years, has been a large corporation made up of a lot of different acquisitions and mergers; that's how it was formed, 20 years back. And as we've gone through the journey of the mergers and the acquisitions over the years, we had data from different companies come together and form a lot of different data silos. So the reason we kind of started looking at this is when our CFO started asking questions around being able to answer One Verizon questions. It's as simple as having Days Payable, or Working Capital Analysis, across all the lines of businesses. And since we have a three-major-ERP footprint, it is extremely hard to get that data out, and there were a lot of manual data prep activities that went into bringing together those One Verizon views. So that's really what was the catalyst to get the journey started for us.
>> And it was driven by your CFO, you said?
>> Arvind: That's right.
>> Ah, very interesting, okay. So what are some of the things that people are going to hear tomorrow from your breakout session?
>> Arvind: I'm sorry, say that again?
>> Sorry, what are some of the things that the attendees of your breakout session are going to learn about the steps and the journey?
>> So I'm going to primarily be talking about the challenges that we ran into, and share some around that, and also talk about some of the factors, such as the catalysts and what drew us to sort of moving in that direction, as well as getting into some architectural components from a high-level standpoint, talk about certain partners that we work with, the choices we made from an architecture perspective and the tools, as well as to kind of close the loop on user adoption and what users are seeing in terms of business value, as we start centralizing all of the data at Verizon from a back-office Finance and Supply Chain standpoint. So that's kind of what I'm looking at talking about tomorrow.
>> Arvind, it's interesting to hear you talk about sort of collecting data from essentially back-office operational systems in a data lake. I assume that the data is sort of more refined and easily structured than the typical stories we hear about data lakes. Were there challenges in making it available for exploration and visualization, or were all the early use cases really just production reporting?
>> So standard reporting across the ERP systems is very mature, and those capabilities are there, but then you look across ERP systems, and we have three major ERP systems for each of the lines of businesses: when you want to look at combining all of the data, it's very hard. And to add to that, you touched on self-service discovery and visualization across all three data sets; that's even more challenging, because it takes a lot of heavy lifting to normalize all of the data and bring it into one centralized platform. We started off the journey with Oracle, and then we had SAP HANA; we were trying to bring all the data together, but then when we looked at bringing the data from our non-SAP ERP systems into an SAP kind of footprint, one, the cost was tremendously high, and also there was a lot of heavy lifting and challenges in terms of manually having to normalize the data and bring it into the same kind of data models. And even after all of that was done, it was not very self-service oriented for our users in Finance and Supply Chain.
>> Let me drill into two of those things. So it sounds like the ETL process of converting it into a consumable format was very complex, and then it sounds like also the discoverability, where a tool perhaps like Alation might help, which is very, very immature right now, or maybe not immature, it's still young... Is that what was missing, or why was the ETL process so much more heavyweight than with a traditional data warehouse?
>> There's a lot of heavy lifting involved in the ETL processes because of the proprietary data structures of the ERP systems, especially SAP's. The data structures, and how the data is used across cluster and pool tables, are very proprietary. And on top of that, you're bringing in the data formats and structures from a PeopleSoft ERP system which is supporting different lines of businesses, so there is a lot of customization that's gone into place; there are specific things that we use in the ERPs in terms of the modules, and how the processes are modeled in each of the lines of businesses complicates things a lot. And then you try and bring all these three different ERPs, and the nuances they have built up over the years, together, and it actually makes it very complex.
>> So tell us then, help us understand how the data lake made that easier. Was it because you didn't have to do all the refinement before it got there? And tell us how Attunity helped make that possible.
>> Oh absolutely. I think that's one of the big reasons why we picked Hortonworks as one of our key partners in terms of building out the data lake: it's schema-on-read, so you aren't necessarily worried about doing a whole lot of ETL before you bring the data in, and it also provides, with the tools and technologies from a lot of other partners that have a lot of maturity now, better self-service discovery capabilities for ad hoc analysis and reporting. So this is helpful to the users, because now they don't have to wait for prolonged IT development cycles to model the data, do the ETL, and build reports for them to consume, which sometimes could take weeks and months.
Now, in a matter of days, they're able to see the data they're looking for and they're able to start the analysis, and once they start the analysis and the data is accessible, it's a matter of minutes and seconds: looking at the different tools, how they want to look at it, how they want to model it. So it's actually been a huge value from the perspective of the users and what they're looking to do.
>> Speaking of value, one of the things that was kind of thematic yesterday: we see enterprises are now embracing big data, they're embracing Hadoop, it's got to coexist within their ecosystem, and it's got to interoperate. But just putting data in a data lake or Hadoop, that's not the value there; it's being able to analyze that data, in motion, at rest, structured, unstructured, and start being able to glean or take actionable insights. From your CFO's perspective, where are you now in answering some of the questions that he or she had, from an insights perspective, with the data lake that you have in place?
>> Yeah, before I address that, I wanted to quickly touch upon and wrap up George's question, if you don't mind, because one of the key challenges, and I do talk about how Attunity helped; I was just about to answer the question before we moved on, so I just want to close the loop on that a little bit. In terms of bringing the data in, the data acquisition or ingestion is a key aspect of it, and again, dealing with the proprietary data structures from the ERP systems is very complex, and involves a multi-step process to bring the data into a staging environment, and be able to put it in the swamp before bringing it into the lake. And what Attunity has been able to help us with is this: it has the intelligence to look at and understand the proprietary data structures of the ERPs, and it is able to bring all the data from the ERP source systems directly into Hadoop, without any stops or staging databases along the way. So it's been a huge value from that standpoint; I'll get into more details around that. And to answer your question around how it's helping from a CFO standpoint, and the users in Finance: as I said, now all the data is available in one place, so it's very easy for them to consume the data and be able to do ad hoc analysis. So if somebody's looking to, like I said earlier, calculate Days Payable, as an example, or they want to look at working capital, we are actually moving data using the Attunity CDC Replicate product; we're getting data in real time into the data lake. So now they're able to turn things around and do that kind of analysis in a matter of hours, versus overnight or in a matter of days, which was the previous environment.
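[Editor's note: a rough sketch of the change-data-capture (CDC) apply pattern Arvind describes. This is illustration code only, not Attunity Replicate's API; the event format and field names are assumptions. Real CDC tools read the source database's transaction log and write to HDFS/Hive, but the apply step has this general shape.]

    # Hedged sketch of applying CDC events to a keyed target table.
    # Not Attunity's API -- the event structure below is an assumption.

    def apply_cdc_event(target: dict, event: dict) -> None:
        """Apply one change event to an in-memory keyed table."""
        key = event["key"]   # primary key of the source row
        op = event["op"]     # 'insert' | 'update' | 'delete'
        if op in ("insert", "update"):
            target[key] = event["row"]   # upsert the latest row image
        elif op == "delete":
            target.pop(key, None)

    # A toy stream: the kind of low-latency feed that lets finance teams
    # compute metrics like Days Payable in hours instead of days.
    events = [
        {"op": "insert", "key": 1, "row": {"invoice": 1, "amount": 500}},
        {"op": "update", "key": 1, "row": {"invoice": 1, "amount": 450}},
        {"op": "delete", "key": 1, "row": None},
    ]

    table: dict = {}
    for e in events:
        apply_cdc_event(table, e)
    print(table)  # {} -- the row was inserted, updated, then deleted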
>> Absolutely, that's the beauty of the near real-time, and the CDC architecture, we're able to get data in, very easily and quickly, and Attunity also provides a lot of visibility as the data is in flight, we're able to see what's happening in the source system, how many packets are flowing through, and to a point, my developers are so excited to work with a product, because they don't have to worry about the changes happening in the source systems in terms of DDL and those changes are automatically understood by the product and pushed to the destination of Hadoop. So it's been a game-changer, because we have not had any downtime, because when there are things changing on the source system side, historically we had to take downtime, to change those configurations and the scripts, and publish it across environments, so that's been huge from that standpoint as well. >> Absolutely. >> Itamar, maybe, help us understand where Attunity can... It sounds like there's greatly reduced latency in the pipeline between the operational systems and the analytic system, but it also sounds like you still need to essentially reformat the data, so that it's consumable. So it sounds like there's an ETL pipeline that's just much, much faster, but at the same time, when it's like, replicate, it sounds like that goes without transformations. So help us sort of understand that nuance. >> Yeah, that's a great question, George. And indeed in the past few years, customers have been focused predominantly on getting the data to the Lake. I actually think it's one of the changes in the fame, we're hearing here in the show and the last few months is, how do we move to start using the data, the great applications on the data. So we're kind of moving to the next step, in the last few years we focused a lot on innovating and creating the solutions that facilitate and accelerate the process of getting data to the Lake, from a large scope of systems, including complex ones like SAP, and also making the process of doing that easier, providing real-time data that can both feed streaming architectures as well as batch ones. So once we got that covered, to your question, is what happens next, and one of the things we found, I think Verizon is also looking at it now and are being concomitant later. What we're seeing is, when you bring data in, and you want to adopt the streaming, or a continuous incremental type of data ingestion process, you're inherently building an architecture that takes what was originally a database, but you're kind of, in a sense, breaking it apart to partitions, as you're loading it over time. So when you land the data, and Arvind was referring to a swamp, or some customers refer to it as a landing zone, you bring the data into your Lake environment, but at the first stage that data is not structured, to your point, George, in a manner that's easily consumable. Alright, so the next step is, how do we facilitate the next step of the process, which today is still very manual-driven, has custom development and dealing with complex structures. 
So we actually are very excited: at the show here, we announced a new product by Attunity, Compose for Hive, which extends our data lake solutions, and what Compose for Hive is exactly designed to do is address part of the problem you just described. When the data comes in and is partitioned, what Compose for Hive does is reassemble these partitions, and it then creates analytic-ready data sets back in Hive. It can create operational data stores, it can create historical data stores, so then the data becomes formatted in a manner that's more easily accessible for users who want to use analytic tools, BI tools, Tableau, Qlik, any type of tool that can easily access a database.
>> Would there be, as a next step, whether led by Verizon's requirements or Attunity's anticipation of broader customer requirements, something where there's, if not near real-time, then a very low-latency landing and transformation, so that data that is time-sensitive can join the historical data?
>> Absolutely, absolutely. So what we've done is focus on real-time availability of data. When we feed the data into the data lake, we feed it in two ways: one is directly into Hive, but we also go through a streaming architecture like Kafka; in the case of Hortonworks, it can also fit very well into HDF. So then the next step in the process is producing those analytic data sets, or data stores, out of it, which we enable, and what we do is design it together with our partners and with our customers. So again, when we worked on Replicate, and then when we worked on Compose, we worked very closely with Fortune companies trying to deal with these challenges, so we could design a product. In the case of Compose for Hive, for example, we have done a lot of collaboration at a product engineering level with Hortonworks, to leverage the latest and greatest in Hive 2.2, Hive LLAP, to be able to push down transformations, so those can be done faster, including in real time, so those datasets can be updated on a frequent basis.
>> You talked about kind of customer requirements, either those specific or not, and obviously we're talking to a telecommunications company: are you seeing, Itamar, from Attunity's perspective, more of this need to... Alright, the data's in the lake, or first it comes to the swamp, now it's in the lake, to start partitioning it: are you seeing this need driven in specific industries, or is this really pretty horizontal?
>> That's a good question, and this is definitely a horizontal need; it's part of the infrastructure needs. So Verizon is a great customer, and we've worked similarly in telecommunications, and we've been working with other customers in other industries, from manufacturing to retail to health care to automotive and others, and in all of those cases, at a foundation level, there are very similar architectural challenges. You need to ingest the data, you want to do it fast, you want to do it incrementally or continuously, even if you're loading directly into Hadoop. Naturally, when you're loading the data through a Kafka or streaming architecture, it's in a continuous fashion, and then you partition the data. So the partitioning of the data is kind of inherent to the architecture, and then you need to help deal with the data for the next step in the process.
And we're doing it both with Compose for Hive, but also for customers using streaming architectures like Kafka: we provide the mechanisms, from supporting or facilitating things like schema evolution and schema decoding, to be able to facilitate the downstream process of processing those partitions of data, so we can make the data available. That works both for analytics and streaming analytics, as well as for scenarios like microservices, where the way in which you partition the data, or deliver the data, allows each microservice to pick up the data it needs from the relevant partition.
>> Well guys, this has been a really informative conversation. Congratulations, Itamar, on the new announcement that you guys made today.
>> Thank you very much.
>> Lisa: Arvind, great to hear the use case and how Verizon really sounds quite pioneering in what you're doing; we wish you continued success there, and we look forward to hearing what's next for Verizon. We want to thank you for watching theCUBE. We are again live, day two of the DataWorks Summit, #DWS17, with my co-host George Gilbert. I am Lisa Martin. Stick around, we'll be right back. (relaxed techno music)
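[Editor's note: to make the "reassemble partitions into analytic-ready datasets" idea concrete, here is a hedged sketch using PySpark SQL against Hive. Compose for Hive is a product, not this query, and the table and column names are assumptions. The pattern: change data lands partitioned by arrival time, and a window function collapses it to the latest row per business key, yielding an operational data store that BI tools can query directly.]

    # Hedged sketch: collapse time-partitioned change data in Hive into a
    # "latest row per key" table. Table and column names are assumptions.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("compose-style-reassembly")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE TABLE IF NOT EXISTS ods_orders AS
        SELECT order_id, status, amount, updated_at
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY order_id
                                      ORDER BY updated_at DESC) AS rn
            FROM landing_orders   -- raw, partitioned landing-zone table
        ) t
        WHERE rn = 1              -- keep only the latest image of each row
    """)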

Published Date : Jun 14 2017


Arun Murthy, Hortonworks | DataWorks Summit 2017


 

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017. Brought to you by Hortonworks.
>> Good morning, welcome to theCUBE. We are live at day 2 of the DataWorks Summit, and have had a great day so far, yesterday and today. I'm Lisa Martin with my co-host George Gilbert. George and I are very excited to be joined by a multiple-time CUBE alum, the co-founder and VP of Engineering at Hortonworks, Arun Murthy. Hey, Arun.
>> Thanks for having me, it's good to be back.
>> Great to have you back. So yesterday, great energy at the event. You could see and hear behind us great energy this morning. One of the things that was really interesting yesterday, besides the IBM announcement, and we'll dig into that, was that we had your CEO on, as well as Rob Thomas from IBM, and Rob said, you know, one of the interesting things over the last five years was that there have been only 10 companies that have beat the S&P 500, have outperformed, in each of the last five years, and those companies have made big bets on data science and machine learning. And as we heard yesterday, these four mega-trends: IoT, cloud, streaming analytics, and now the fourth big leg, data science. Talk to us about what Hortonworks is doing. You've been here from the beginning, as a co-founder, as I've mentioned; you've been with Hadoop since it was a little baby. How is Hortonworks evolving to become one of those big users making big bets on helping your customers, and yourselves, leverage machine learning to really drive the business forward?
>> Absolutely, a great question. So, you know, if you look at some of the history of Hadoop, it started off with this notion of a data lake, and I'm talking about the enterprise side of Hadoop, right? I've been working on Hadoop for about 12 years now; the last six of it has been as a vendor selling Hadoop to enterprises. They started off with this notion of a data lake, and as people have adopted that vision of a data lake, you know, you bring all the data in, and now you're starting to get governance and security and all of that. Obviously, one of the best ways to get value out of the data is the notion of, can you sort of predict what is going to happen in your world, with your customers, and whatever it is, with the data that you already have. So that notion: Rob, our CEO, talks about how we're trying to move from a post-transactional world to a pre-transactional world, and doing the analytics and data science will obviously be key to that. There are so many applications of it. Something as simple as, you know, we did a demo last year of how we're working with a freight company, and we're starting to show them how to predict which drivers and which routes are going to have issues as they're trying to move, alright? Four years ago we did the same demo, and we would show that this driver had an issue on this route, but now we can actually predict it and let you know to take preventive measures up front. Similarly internally, you know, you can take things from machine learning and log analytics and so on: we have an internal problem where we have to test two different versions of HDP itself, and as you can imagine, it's a really, really hard problem.
We have to support 10 operating systems, seven databases; like, if you multiply that matrix, it's, you know, tens of thousands of options. So, to do all that testing, we now use machine learning internally to look through the logs and kind of predict where the failures were, and help our own software engineers understand where the problems were, right? An extension of that has been the work we've done in Smartsense, which is a service we offer our enterprise customers. We collect logs from their Hadoop clusters, and then we can actually help them understand where they can either tune their applications or even tune their hardware, right? We have this example I really like: at a really large enterprise financial services client, they had literally hundreds, thousands of machines on HDP, and using Smartsense, we actually found that there were 25 machines which had a bad NIC configuration, and we proved to them that by fixing those, we got 30% of throughput back on their cluster. At that scale, it's a lot of money, it's a lot of CapEx, it's a lot of OpEx. So, as a company, we try it on ourselves as much as we try to help our customers adopt it. Does that make sense?
>> Yeah, let's drill down on that even a little more, 'cause it's pretty easy to understand what's the standard telemetry you would want out of hardware, but as you sort of move up the stack, the metrics, I guess, become more custom. So how do you learn, not just from one customer, but from many customers, especially when you can't standardize what you're supposed to pull out of them?
Similarly, we shipped a version of our Cloud product, our Hortonworks Data Cloud, on Amazon and again Smartsense preplanned there, so whether you're on an Amazon, or a Microsoft, or on-prem, we get the same telemetry, we get the same data back. We can actually, if you're a customer using many of these products, we can actually give you that telemetry back. Similarly, if you guys probably know this we have, you were probably there in an analyst when they announced the Flex Support subscription, which means that now we can actually take the support subscription you have to get from Hortonworks, and you can actually use it on-prem or on the Cloud. >> So in terms of transforming, HDP for example, just want to make sure I'm understanding this, you're pulling in data from customers to help evolve the product, and that data can be on-prem, it can be in a Microsoft lesur, it can be an AWS? >> Exactly. The HDP can be running in any of these, we will actually pull all of them to our data lake, and they actually do the analytics for us and then present it back to the customers. So, in our support subscription, the way this works is we do the analytics in our lake, and it pushes it back, in fact to our support team tickets, and our sales force, and all the support mechanisms. And they get a set of recommendations saying Hey, we know this is the work loads you're running, we see these are the opportunities for you to do better, whether it's tuning a hardware, tuning an application, tuning the software, we sort of send the recommendations back, and the customer can go and say Oh, that makes sense, the accept that and we'll, you know, we'll update the recommendation for you automatically. Then you can have, or you can say Maybe I don't want to change my kernel pedometers, let's have a conversation. And if the customer, you know, is going through with that, then they can go and change it on their own. We do that, sort of, back and forth with the customer. >> One thing that just pops into my mind is, we talked a lot yesterday about data governance, are there particular, and also yesterday on stage were >> Arun: With IBM >> Yes exactly, when we think of, you know, really data-intensive industries, retail, financial services, insurance, healthcare, manufacturing, are there particular industries where you're really leveraging this, kind of, bi-directional, because there's no governance restrictions, or maybe I shouldn't say none, but. Give us a sense of which particular industries are really helping to fuel the evolution of Hortonworks data lake. >> So, I think healthcare is a great example. You know, when we started off, sort of this open-source project, or an atlas, you know, a couple of years ago, we got a lot of traction in the healthcare sort of insurance industry. You know, folks like Aetna were actually founding members of that, you know, sort of consortium of doing this, right? And, we're starting to see them get a lot of leverage, all of this. Similarly now as we go into, you know, Europe and expand there, things like GDPR, are really, really being pardoned, right? And, you guys know GDPR is a really big deal. Like, you pay, if you're not compliant by, I think it's like March of next year, you pay a portion of your revenue as fines. That's, you know, big money for everybody. 
So, I think that's what we're really excited about the portion with IBM, because we feel like the two of us can help a lot of customers, especially in countries where they're significantly, highly regulated, than the United States, to actually get leverage our, sort of, giant portfolio of products. And IBM's been a great company to atlas, they've adopted wholesale as you saw, you know, in the announcements yesterday. >> So, you're doing a Keynote tomorrow, so give us maybe the top three things, you're giving the Keynote on Data Lake 3.0, walk us through the evolution. Data Lakes 1.0, 2.0, 3.0, where you are now, and what folks can expect to hear and see in your Keynote. >> Absolutely. So as we've, kind of, continued to work with customers and we see the maturity model of customers, you know, initially people are staying up a data lake, and then they'd want, you know, sort of security, basic security what it covers, and so on. Now, they want governance, and as we're starting to go to that journey clearly, our customers are pushing us to help them get more value from the data. It's not just about putting the data lake, and obviously managing data with governance, it's also about Can you help us, you know, do mission-learning, Can you help us build other apps, and so on. So, as we look to there's a fundamental evolution that, you know, Hadoop legal system had to go through was with advance of technologies like, you know, a Docker, it's really important first to help the customers bring more than just workloads, which are sort of native to Hadoop. You know, Hadoop started off with MapReduce, obviously Spark's went great, and now we're starting to see technologies like Flink coming, but increasingly, you know, we want to do data science. To mass market data science is obviously, you know, people, like, want to use Spark, but the mass market is still Python, and R, and so on, right? >> Lisa: Non-native, okay. >> Non-native. Which are not really built, you know, these predate Hadoop by a long way, right. So now as we bring these applications in, having technology like Docker is really important, because now we can actually containerize these apps. It's not just about running Spark, you know, running Spark with R, or running Spark with Python, which you can do today. The problem is, in a true multi-tenant governed system, you want, not just R, but you want specifics of a libraries for R, right. And the libraries, you know, George wants might be completely different than what I want. And, you know, you can't do a multi-tenant system where you install both of them simultaneously. So Docker is a really elegant solution to problems like those. So now we can actually bring those technologies into a Docker container, so George's Docker containers will not, you know, conflict with mine. And you can actually go to the races, you know after the races, we're doing data signs. Which is really key for technologies like DSX, right? Because with DSX if you see, obviously DSX supports Spark with technologies like, you know, Zeppelin which is a front-end, but they also have Jupiter, which is going to work the mass market users for Python and R, right? So we want to make sure there's no friction whether it's, sort of, the guys using Spark, or the guys using R, and equally importantly DSX, you know, in the short map will also support things like, you know, the classic IBM portfolio, SBSS and so on. 
So bringing all of those things together, making sure they run with the data in the data lake, and also the compute in the data lake, is really big for us.
>> Wow, so it sounds like your keynote's going to be very educational for the folks that are attending tomorrow. So, last question for you: one of the themes that occurred in the keynote this morning was sharing a fun fact about the speakers. What's a fun fact about Arun Murthy?
>> Great question. I guess, you know, people have been looking for folks with, you know, 10 years of experience on Hadoop. I'm here, finally, right? There are not a lot of people, but, you know, it's fun to be one of those people who've worked on this for about 10 years. Obviously, I look forward to working on this for another 10 or 15 more, but it's been an amazing journey.
>> Excellent. Well, we thank you for sharing some time again with us on theCUBE. You've been watching theCUBE live on day 2 of the DataWorks Summit, hashtag DWS17, with my co-host George Gilbert. I am Lisa Martin. Stick around, we've got great content coming your way.
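[Editor's note: a small sketch of the per-tenant container isolation Arun describes, using the open-source Docker SDK for Python (docker-py). The image names and job script are hypothetical; the point is that each tenant ships its own image, so conflicting R or Python library versions can coexist on the same cluster.]

    # Hedged sketch: per-tenant library isolation with Docker containers.
    # Uses the real docker-py SDK; image names below are hypothetical.
    import docker

    client = docker.from_env()

    # Each tenant's image bundles its own R/Python library versions,
    # so one user's dependencies can never conflict with another's.
    tenant_images = {
        "george": "analytics-r:3.4-forecast1.8",   # hypothetical image
        "arun": "analytics-r:3.4-forecast2.0",     # same R, different lib
    }

    for tenant, image in tenant_images.items():
        client.containers.run(
            image,
            command="Rscript /jobs/score.R",   # hypothetical job script
            detach=True,
            name=f"datasci-{tenant}",
        )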

Published Date : Jun 14 2017

SUMMARY :

Brought to you by Hortonworks. We are live at day 2 of the DataWorks Summit, and Rob said, you know, one of the interesting and we're starting to show them, you know, when you can't standardize what you're or the storage engine, or, you know, some non-Hortonworks, you know, services when, you know, we work with, you know, And if the customer, you know, Yes exactly, when we think of, you know, Similarly now as we go into, you know, Data Lakes 1.0, 2.0, 3.0, where you are now, with advance of technologies like, you know, And the libraries, you know, George wants One of the themes that occurred in the Keynote this morning There's not a lot of people but, you know, Well, we thank you again for sharing time again

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
George Gilbert | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Rob | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Rob Thomas | PERSON | 0.99+
George | PERSON | 0.99+
Lisa | PERSON | 0.99+
30% | QUANTITY | 0.99+
San Jose | LOCATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
25 machines | QUANTITY | 0.99+
10 operating systems | QUANTITY | 0.99+
hundreds | QUANTITY | 0.99+
Arun Murthy | PERSON | 0.99+
Silicon Valley | LOCATION | 0.99+
two | QUANTITY | 0.99+
Aetna | ORGANIZATION | 0.99+
10 years | QUANTITY | 0.99+
Arun | PERSON | 0.99+
today | DATE | 0.99+
Spark | TITLE | 0.99+
yesterday | DATE | 0.99+
AWS | ORGANIZATION | 0.99+
both | QUANTITY | 0.99+
Python | TITLE | 0.99+
last year | DATE | 0.99+
Four years ago | DATE | 0.99+
15 | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
CUBE | ORGANIZATION | 0.99+
three | QUANTITY | 0.99+
DataWorks Summit | EVENT | 0.99+
seven databases | QUANTITY | 0.98+
four | QUANTITY | 0.98+
DataWorks Summit 2017 | EVENT | 0.98+
United States | LOCATION | 0.98+
Dataworks Summit | EVENT | 0.98+
10 | QUANTITY | 0.98+
Europe | LOCATION | 0.97+
10 companies | QUANTITY | 0.97+
One | QUANTITY | 0.97+
one customer | QUANTITY | 0.97+
thousands of machines | QUANTITY | 0.97+
about 10 years | QUANTITY | 0.96+
GDPR | TITLE | 0.96+
Docker | TITLE | 0.96+
Smartsense | ORGANIZATION | 0.96+
about 12 years | QUANTITY | 0.95+
this morning | DATE | 0.95+
each | QUANTITY | 0.95+
two different versions | QUANTITY | 0.95+
five turns | QUANTITY | 0.94+
R | TITLE | 0.93+
four meta-trains | QUANTITY | 0.92+
day 2 | QUANTITY | 0.92+
Data Lakes 1.0 | COMMERCIAL_ITEM | 0.92+
Flink | ORGANIZATION | 0.91+
first | QUANTITY | 0.91+
HDP | ORGANIZATION | 0.91+

Day 1 Wrap Up | AWS Public Sector Summit 2017


 

>> Narrator: Live from Washington DC, it's theCube, covering AWS Public Sector Summit 2017. Brought to you by Amazon Web Services and its partner ecosystem. >> Welcome back here to Washington, D.C. You're watching Cube Live here at Silicon Angle TV, the flagship broadcast of Silicon Angle. We are at AWS Public Sector Summit 2017, wrapping up day one coverage here in the Walter Washington Convention Center. Along with John Furrier, we are now joined by our esteemed colleague Jeff Frick, who's been alongside all day handling all the machinations behind the scenes. >> Behind the scenes, John. >> John: Doing an admirable job of that, Jeff. >> So what do you think, our first ever visit to your town. >> John: I love it, I love it. >> I sense something like the Opry here. The Opry's the other big convention center, here, or Graceland. >> International Harbor. >> It's the same company. >> National Harbor, MGM. >> You're a D.C. guy. >> Gaylord. >> Gaylord, thank you. >> What's the connection? So are we going to get some tickets for the Nationals game? >> We got the Nats game tonight; Strasburg pitched last night, did not pitch well, but who knows? Maybe we'll get Gio tonight. >> Well, the action certainly is Amazon Web Services. >> Yeah, let's talk about what we have going on here today, Jeff. >> Well, I mean, you and I did some great interviews. Intel came on, which is obviously a bellwether in the tech business. Jeff, a former Intel employee, knows what it's like to march to the cadence of Moore's law, and Intel is continuing to do well as a platinum sponsor or diamond sponsor here at the event. Look, the chips are getting smarter and smarter, security at the silicon level, powering 5G network transmission, a lot of the plumbing that's going on in cloud and in cars and devices and companies; it's all going to be connected. So it's a connected world we're living in, and Intel's going to be a key part of that, so they're highly interested and motivated by all the people that are popping up in the cloud. >> We were just talking, and Jeff, I know you were able to listen in on the last interview that we did, but a point that you raised: about four years ago, when the CIA deal came down, AWS is on one side and IBM's on the other, and AWS wins that battle. You called it the shot heard round the cloud. And that, now four years later, has turned out to be a hugely pivotal moment. >> Yeah, I mean this is like moments in time history here, again, documenting it on the Cube for the first time. I don't think anything was written about this, so I'll say it, since we're going to be analyzing it. The shot heard around the cloud was 2013, when AWS public sector, under Teresa Carlson's team and her leadership, beat IBM for the Central Intelligence Agency, CIA, contract. The spec practically guaranteed it for IBM. Amazon comes out of the woodwork and wins it. And they won it because essentially the sales motion and the power of IBM had this thing locked in. But at that time the marketplace was booming with what we call Shadow IT, where you could put your credit card down, go into the Amazon cloud, and get some instances. What happened was someone actually cut a little prototype, showed their boss, and they said, "I like that better than that, let's do a bake-off." So what happened was, at the last minute, a new opportunity comes in and then they do what they call a bake-off. Bake-offs and RFPs come in, and they won. Went to court, and the judge in the ruling actually said Amazon has a better product.
So they ruled in favor of Amazon Web Services. That was what I called the shot heard around the cloud. From that point on, the cloud has become more legitimate every single day, not only for startups and enterprises, but now public sector as well. So, shot heard around the cloud; fast forward to today, and this show's on a trajectory to take on the pace of re:Invent, their core Amazon Web Services show, which of course is why we're here chronicling this moment in history. This is where we believe, Jeff, you and I talked about this, and Dave Alante and I talked about it with the research team, this is where the inflection point kicks up. This is a new growth pillar unpredicted by Wall Street, a new revenue driver for Amazon, and they're already a cash machine. They're already looking like a hockey stick this way. You add on public sector, it's going to be phenomenal. So, a lot of people are seeing it, but this is just growing like a weed. >> Jeff, follow up on that. >> I was going to say, the two mega trends, John, that we've talked about time and time again, and Teresa Carlson and team have done a terrific job here in the public sector, but I always go back to the James Hamilton talk, Tuesday night at re:Invent, and if you've never gone you've got to go, and he talks about just all these big iron infrastructure investments that Amazon continues to make because they have such scale behind them. Whether it's in chips, whether it's in networking, whether it's in new fibers that they're running across the oceans. They can invest so much money to the benefit of their customers, whether it be security, you know, in all the areas of compute; that is fascinating to me. The other thing we always hear about cloud, right, is that at some point it's cheaper to own rather than rent. We just keep coming back to Netflix; like at nighttime, I think Netflix owns whatever the number is, 45 percent of all internet traffic in the evening. They're still on Amazon. So, it's not necessarily better to rent than buy. You have to know what you're doing, and we were at another show the other day, it was Gannet, the newspaper company. When they're using a lot of servers, they use hundreds, but he said there are sometimes, using AWS, that they actually turn all the servers off. You cannot do that in a standard infrastructure world. You can't turn everything off and then on. Which again, you've got to manage it. You don't want the expensive bill. But to me, being able to leverage such scale to the benefit of every customer, whether it's Netflix or a startup, is pretty tough to beat. >> And this is the secret, and this is something, again, shared with the Cube audience here; it's not new to us, but we're going to re-amplify it, because people make a mistake with the cloud in one area: they don't match the business model to their variable cost expenses. If you get into the cloud business and you can actually ratchet your revenue coming in, and then manage that cost delta, red line, black line, know where those lines are, as long as you're in the black on revenue, and you have the variable cost step up with your revenue, that is the magical formula. It's not that hard, it's back of an envelope. >> Right, right. >> Red line cost, black line revenue. >> The other great story, it was from the summit, actually, in San Francisco earlier this year. At the keynote they had Nextdoor; everybody knows Nextdoor, it's the social media for your mom, my mom.
They love it, right, people are losing dogs and looking for a plumber, but the guy from Nextdoor talked. >> John: Don't knock Nextdoor. >> I don't knock Nextdoor. The Nextdoor CEO gets up, and, well, I laugh because the Nextdoor guy's mom didn't know what he did until he did Nextdoor. Anyway, he said, you know, we have the entire production system for Nextdoor, and then we would build production plus one on a completely separate group of hardware inside of Amazon. When that was tested out and ready to run, guess what, we just turned off the first one. You can't do that in an owned infrastructure world. You can't build N and N plus one and N plus two and turn off N, you just can't do that. >> Well, the Fugue CEO, Josh, everyone should check out on Youtube.com/siliconangle, he was awesome. He basically sees a throwaway infrastructure mindset, to your point about Nextdoor. You build it up, and then when you bring your new stuff in, you digitally throw the old away. >> Right, right. >> That's the future. And this is the business model aspect. And public sector, we were joking, look, let's just be honest with ourselves, it is a glacier: antiquated old systems, people trying hard, you know, government servants, employees of the government, not appointees. They don't have a lot of budget and they're always under scrutiny for cost. So the cost benefit is always there, and they have old systems. So they want new systems. So the demand is there. The question is, can they pull it off? >> So, talk about the government mindset, or the shift. We've heard a little bit about that today. About how, to the point that you just made there, John, you know, there's been some reluctance, some foot-dragging going on; that's historical, that's what happens. But now, maybe with the CIA deal, whatever it was, we hit that tipping point, and all of a sudden the minds are opening, and some people are embracing, or being more engaging, with new mousetraps, with better ways to do things. >> We've got the speakers coming on here, so we should wrap it up real quick. Final thoughts from Day One. >> I was just going to say, the other thing is that before, there was so much fat, not only in government in general, but in infrastructure purchasing, 'cause you had to; you'd better not run out of hardware in Q3 when you're running the numbers. So everything was so over-provisioned, so much expense in over-provisioning. With Amazon you don't need to over-provision. You can tap it when you need it and turn it off, so there's a huge amount of budget that should actually be released. >> I want to ask you guys, we'll wrap up here, final, since you're emceeing, final thoughts. What is your impression of day one? I'll start here and you guys can have time to think of an answer. My takeaway for public sector is Teresa Carlson has risen up as a prime executive for Amazon Web Services. She went from knocking on doors eight years ago to a full-blown growth strategy for Amazon. And it's very clear they're not there yet. They only have 10,000 people here, so the conference isn't that massive. But it's on its way to becoming massive. Here's their issue: they have to start getting the cadence of re:Invent launches into the public sector. And that's the big story here. They are quickly shortening the cycles between what they launch at Amazon re:Invent and what they roll out to the public sector. The question is how fast can they do that? And that's what we're going to be watching.
And then the customer behavior is starting to shift on procurement. So greenlight for Amazon. But they've got to get those release cycles. Stuff gets released at Amazon re:Invent; they've got to roll it out to government, shorten that down to almost zero, and they'll win. >> Yeah, my quick impression is, I like to look at the booth action, because we've all had booth duty, right. What's going on in the booths? Did the people that paid for a booth here feel like they got their money's worth? And the traffic in the booths has been good; they've been three deep, four deep. So the people that are here are curious, they're interested, they're spending time going booth to booth to booth, and that's a very good sign. >> This is a learning conference. Alright, your thoughts. >> I would say the only thing that is, I wouldn't say a red light by any means, but it's like a caution light, is budgets. You know, when you run government, you are vulnerable to somebody else's budget decision. Whether it's Congress, whether it's a city council, whether it's a state legislature, whatever it is, that's always just kind of a little hangup you have to deal with, because you might have the best mousetrap in the world, but if somebody says nah, you can't write that check this year, maybe next year, we're going to put our money somewhere else. That's the only thing. >> I got my Trump joke in, I don't know if you heard that, but my Trump joke is, I'll say it at the end: there's a lot of data lakes in D.C., and they've turned into data swamps. So Amazon's here to drain the data swamp. >> Jeff: He got it in. He's been practicing that all week. >> I've heard it three times, are you kidding? Funny every time. >> Well, you know our Cube, you know we talk about data swamps. I hate the word data lake, as everyone knows; I just hate that word. >> Well, there is value in that swamp. >> Hated the word data lake. >> For Jeff Rick, John Furrier, I'm John Walsh. Thank you for joining us here at the AWS Public Sector Summit 2017. Back tomorrow with more coverage, live here on the Cube.
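John's back-of-envelope formula is easy to sketch in code. Here is a minimal version with made-up unit economics; every number below is an illustrative assumption, not anyone's actual figures.

```python
# Back-of-envelope red-line/black-line check: the variable cloud cost
# steps up with revenue, and the business is healthy whenever the black
# line (revenue) sits above the red line (cost). Numbers are illustrative.

def monthly_margin(units, price_per_unit, cloud_cost_per_unit, fixed_cost):
    """Black line minus red line for one month."""
    revenue = units * price_per_unit                 # black line
    cost = fixed_cost + units * cloud_cost_per_unit  # red line
    return revenue - cost

# Because price per unit exceeds the variable cloud cost per unit,
# margin turns positive once volume covers the fixed base.
for units in (1_000, 10_000, 100_000):
    margin = monthly_margin(units, price_per_unit=0.50,
                            cloud_cost_per_unit=0.20, fixed_cost=5_000)
    print(f"{units:>7} units -> margin ${margin:,.0f}")
```

The point of the exercise is exactly what the conversation says: in the cloud, the cost line is variable, so you can keep it stepping up just behind revenue instead of over-provisioning ahead of it.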

Published Date : Jun 13 2017

SUMMARY :

John Furrier, Jeff Frick, and John Walsh wrap up day one of AWS Public Sector Summit 2017 at the Walter Washington Convention Center, revisiting the 2013 CIA contract win as the shot heard round the cloud, Teresa Carlson's rise within Amazon's public sector business, the economics of matching variable cloud cost to revenue, and the budget caution that comes with selling into government.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jeff | PERSON | 0.99+
Jeff Frick | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
IBM | ORGANIZATION | 0.99+
Dave Alante | PERSON | 0.99+
Amazon Web Services | ORGANIZATION | 0.99+
Jeff Rick | PERSON | 0.99+
Nextdoor | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
CIA | ORGANIZATION | 0.99+
2013 | DATE | 0.99+
Josh | PERSON | 0.99+
Teresa Carlson | PERSON | 0.99+
Central Intelligence Agency | ORGANIZATION | 0.99+
Trump | PERSON | 0.99+
John Walsh | PERSON | 0.99+
Netflix | ORGANIZATION | 0.99+
John | PERSON | 0.99+
John Furrier | PERSON | 0.99+
Gannet | ORGANIZATION | 0.99+
San Francisco | LOCATION | 0.99+
next year | DATE | 0.99+
Teresa Carlton | PERSON | 0.99+
Washington, D.C. | LOCATION | 0.99+
Congress | ORGANIZATION | 0.99+
D.C. | LOCATION | 0.99+
Intel | ORGANIZATION | 0.99+
45 percent | QUANTITY | 0.99+
hundreds | QUANTITY | 0.99+
two | QUANTITY | 0.99+
one | QUANTITY | 0.99+
Tuesday night | DATE | 0.99+
MGM | ORGANIZATION | 0.99+
eight years ago | DATE | 0.99+
10,000 people | QUANTITY | 0.99+
three times | QUANTITY | 0.99+
today | DATE | 0.99+
Walter Washington Convention Center | LOCATION | 0.99+
four years later | DATE | 0.99+
first time | QUANTITY | 0.99+
this year | DATE | 0.99+
last night | DATE | 0.99+
tomorrow | DATE | 0.99+
tonight | DATE | 0.99+
Washington DC | LOCATION | 0.98+
earlier this year | DATE | 0.98+
Moore | PERSON | 0.98+
Graceland | LOCATION | 0.97+
day one | QUANTITY | 0.97+

Stephanie McReynolds, Alation & Lee Paries, Think Big Analytics - #BigDataSV - #theCUBE


 

>> Voiceover: San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. (techno music) >> Hey, welcome back everyone. Live in Silicon Valley for Big Data SV. This is theCUBE coverage in conjunction with Strata + Hadoop. I'm John Furrier with George Gilbert at Wikibon. Two great guests. We have Stephanie McReynolds, Vice President at startup Alation, and Lee Paries, who is the VP of Think Big Analytics. Thanks for coming back. Both of you have been on theCUBE before, but Think Big has been on many times. Good to see you. What's new, what are you guys up to? >> Yeah, excited to be here and to be here with Lee. Lee and I have a personal relationship that goes back quite a ways in the industry. And what we're talking about today is the integration between Kylo, which was recently announced as an open source project from Think Big, and Alation's capability to sit on top of Kylo, and together to increase the velocity of data lake initiatives, kind of going from zero to 60 in a pretty short amount of time to get both technical value from Kylo and business value from Alation. >> So talk about Alation's traction, because you guys have been an interesting startup, a lot of great press. George is a big fan. He's going to jump in with some questions, but there's some good product fit with the market. What's the update? What's the status on traction in terms of the company and customers and whatnot? >> Yeah, we've been growing pretty rapidly for a startup. We've doubled our production customer count from the last time we talked. Some great brand names. Munich Reinsurance this morning was talking about their implementation. So they have 600 users of Alation in their organization. We've entered Europe, not only with Munich Reinsurance, but Tesco is a large account of ours in Europe now. And here in the States we've seen broad adoption across a wide range of industries, everyone from Pfizer in the healthcare space to eBay, who's been our longest-standing customer. They have about 1,000 weekly users on Alation. So not only a great increase in the number of logos, but also organic growth internally at many of these companies across data scientists, data analysts, business analysts, a wide range of users of the product as well. >> It's been interesting. What I like about your approach, and we talked with Think Big about it before, every guest that's come in so far in the same area is talking about metadata layers, and so this is interesting: there's metadata addressability, if you will, for lack of a better description, but it has to be human-usable, integrated into human processes, whether it's virtualization, or any kind of real-time app, or anything. So you're seeing this convergence between "I need to get the data into an app, whether it's IoT data or something else, really really fast," so really the discovery piece is now the interesting layer. How competitive is it, and what are the different solutions that you guys see in this market? >> Yeah, I think it's interesting, because metadata has kind of had a revival, right? Everyone is talking about the importance of metadata and open integration with metadata. Our angle at Alation is that having open transfer of technical metadata is very important as the foundation of analytics, but what really brings that technical metadata to life is also understanding the business context of what's happening technically in the system. What's the business context of data?
What's the behavioral context of how that data has been used that might inform me as an analyst? >> And what's your unique approach to that? Because that's like the Holy Grail. It's like translating geek metadata, indexing stuff, into usable business outcomes. It's been a cliche for years, you know. >> The approach is really based on machine learning and AI technology to make recommendations to business users about what might be interesting to them. We're at a state in the market where there is so much data available that you can access, either in Hadoop as a data lake or in a data warehouse in a database like Teradata, that today what you need as state of the art is a system that starts to recommend to you what might be interesting data for you to use as a data scientist or an analyst, and not just what data you could use, but how accurate is that data, how trustworthy is it? I think there's a whole other theme of governance that's rising that's tied to that metadata discussion, which is that it's not enough to just shove bits and bytes between different systems anymore. You really need to understand how this data has been manipulated and used, and how that influences my security considerations, my privacy considerations, the value I'm going to be able to get out of that data set. >> What's your take on this, 'cause you guys have a relationship. How is Think Big doing? Then talk about the partnership you guys have with Alation. >> Sure, so when you look at what we've done specifically with an open source project, it's the first one that Teradata has fully sponsored and released under Apache 2.0, called Kylo. It's really about the enablement of the full data lake platform and framework, everything from ingest, to securing it, to governing it, and part of that process is collecting the basic technical and business metadata, so later you can hand it over to the users so they can sample and profile the data, find it, and search it in a Google-like manner, and then you can enable the organization with that data. So when you look at it from the standpoint of partnering together, it's really about collecting that data specifically within Hadoop to enable it, yet with the ability then to hand it off to a more enterprise-wide solution like Alation through API connections, and then they enrich it with the social collaboration and the business context to extend it from there. >> So that's the accelerant then. So you're accelerating the open source project through this integration with Alation. So you're still going to rock and roll with the open source. >> Very much going to rock and roll with the open source. It's really been based on five years of Think Big's work in the marketplace, over about 150 data lakes, the IP we've built around that to do things repeatedly and consistently, and then releasing that, in the last two years, as dedicated development based on Apache Spark and NiFi to stand that out. >> Great work by the way. Open source continues to be more relevant. But I've got to get your perspective on a meme that's been floating around day one here, and maybe it's because of the election, but someone said, "We got to drain the data swamp, "and make data great again."
And not a play on Trump, but the data lake is going through a transition, and saying, "Okay, we've got data lakes," but now this year there's been a focus on making that much more active and cleaner, and making sure it doesn't become a swamp, if you will. So there's been a focus on taking data lake content and getting it into real time, and IoT has, I think, been a forcing function. But do you guys have a perspective on where data lakes are going? Certainly it's been a trending conversation here at the show. >> Yeah, I think IoT has been part of draining that data swamp, but I think also now you have a mass of business analysts that are starting to get access to that data in the lake. These Hadoop implementations are maturing to the stage where you have-- >> John: Value coming out of it. >> Yeah, and people are trying to wring value out of that lake, and sometimes finding that it is harder than they expected because the data hasn't been pre-prepared for them. This old world where IT would pre-prepare the data, and I got a single metric or a couple of metrics to choose from, is now turned on its head. People are taking a more exploratory, discovery-oriented approach to navigating through their data, and finding that the nuances of data really matter when trying to develop an insight. So the literacy in these organizations, and their awareness of some of the challenges of a lake, are coming to the forefront, and I think that's a healthy conversation for us all to have. If you're going to have a data-driven organization, you have to really understand the nuances of your data to know where to apply it appropriately to decision making. >> So (mumbles), actually going back quite a few years when he started at Microsoft, said Internet software changed the paradigm so much in that we have this new set of actions: discover, learn, try, buy, recommend. And it sounds like as a consumer of data in a data lake we've added, or prepended, this discovery step. Whereas in a well-curated data warehouse it was learn; you had your X dimensions that were curated and refined, and you don't have that as much with the data lake. I guess I'm wondering, as we were talking to the last team with AtScale about moving OLAP to be something you consume on a data lake the way you consume it on a data warehouse, it's almost like Alation and a smart catalog is as much a requirement as a visualization tool is by itself on a data warehouse? >> I think what we're seeing is this notion of data needing to be curated, and including many brains and many different perspectives in that curation process is something that's defining the future of analytics and how people use technical metadata. And what does it mean for the devops organization to get involved in draining that swamp? That means not only looking at the elements of the data that are coming in from a technical perspective, but then collaborating with the business to curate the value on top of that data. >> So in other words it's not just to help the user, the business analyst, navigate, but it's also to help the operational folks do a better job of curating once they find out who's using the data and how. >> That's right. They kind of need to know how this data is going to be used in the organization. The volumes are so high that they couldn't possibly curate every bit and byte that is stored in the data lake.
So by looking at how different individuals and groups in the organization are trying to access that data, that gives an early signal to where we should be spending more or less time in processing this data, and helps the organization really get to their end goals of usage. >> Lee, I want to ask you a question. On your blog post, as was pointed out to me earlier, you guys quote a Gartner stat, which is pretty doom and gloom, which said, "70% of Hadoop deployments in 2017 will either fail or fall short of their estimated cost savings or predicted revenue." And then it says, "That's a dim view, but not shared by the Kylo community." How are you guys going to make the Kylo data lake software work well? What are your thoughts on that? Because I think that's the number one question, again, that I highlighted earlier: okay, I don't want a swamp. So that's the fear, whether they get one or not, and they worry about data cleansing and all these things. So what's Kylo doing that's going to accelerate success, or lower that number of fails, in the data lake world? >> Yeah sure, so again, a lot of it's through experience of going out there and seeing what's done. A lot of people have been doing a lot of different things within their data lakes, but when you go in there, there are certain things they're not doing, and then when you're doing them, it's about doing them consistently over and over and continually improving upon that. And that's what Kylo is: it's really a framework that we keep adding to, and as the community grows and other projects come in that can enhance it, we bring the value. But a lot of times when we go in, it's basically that end users can't get to the data, either because they're not allowed to, maybe because it's not secure and reliable enough to turn it over to them and let them drive with it, or because they don't know the data is there, which goes back to collecting the basic metadata and data (mumbles) to know it's there to leverage it. So a lot of times it's going back and looking at and leveraging what we have to build that solid foundation, so IT and operations can feel like they can hand that over in a template format so business users can get to the data and start acting off of that. >> You just lost your mic there, but Stephanie, I've got to ask you a question. Just on a point of clarification, are you guys supporting Kylo? Is that the relationship, or how does that work? >> So we're integrated with Kylo. So Kylo will ingest data into the lake, manage that data lake from a security perspective, giving folks permissions, enable some wrangling on that data, and what Alation is receiving then from Kylo is the technical metadata that's being created along that entire path. >> So you're certified with Kylo? How does that all work from the customer standpoint? >> It's very much an integration partnership that we're working on together. >> So from a customer standpoint it's clean, and you then provide the benefits on the other side? >> Correct. >> Yeah, absolutely. We've been working with data lake implementations for some time, since our founding really, and I think this is an extension of our philosophy that data lakes are going to play an important role, complementing databases, analytics tools, business intelligence tools, and the analytics environment, and that open source is part of the future of how folks are building these environments. So we're excited to support the Kylo initiative.
We've had a longstanding relationship with Teradata as a partner, so it's a great way to work together. >> Thanks for coming on theCUBE. Really appreciate it. What do you think of the show so far, you guys? What's the current vibe of the show? >> Oh, it's been good so far. I mean, it's one day into it, but a very good vibe so far. Different topics and different things-- >> AI, machine learning. You couldn't be happier with that machine learning-- >> Great to see machine learning taking the forefront, people really digging into the details around what it means when you apply it. >> Stephanie, thanks for coming on theCUBE, really appreciate it. More CUBE coverage after the show break. Live from Silicon Valley, I'm John Furrier with George Gilbert. We'll be right back after this short break. (techno music)
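As a rough illustration of the usage-driven recommendation and curation ideas discussed above, here is a minimal sketch that mines query logs for tables commonly used together; the co-occurrence heuristic is an illustrative stand-in, not Alation's actual machine learning pipeline, and the table names are made up.

```python
# Minimal sketch of usage-based dataset recommendation from query logs.
# The co-occurrence scoring below is an illustrative stand-in for the
# machine-learning approach described in the interview.
from collections import Counter, defaultdict
from itertools import combinations

# (user, table) pairs extracted from query logs -- toy data.
query_log = [
    ("george", "sales.orders"), ("george", "sales.customers"),
    ("lisa", "sales.orders"), ("lisa", "sales.customers"),
    ("lisa", "web.clicks"), ("arun", "sales.orders"),
]

tables_by_user = defaultdict(set)
for user, table in query_log:
    tables_by_user[user].add(table)

# Count how often each pair of tables is used by the same analyst.
co_use = Counter()
for tables in tables_by_user.values():
    for a, b in combinations(sorted(tables), 2):
        co_use[(a, b)] += 1

def recommend(table, top_n=3):
    """Tables most often used alongside `table` across all analysts."""
    scores = Counter()
    for (a, b), n in co_use.items():
        if a == table:
            scores[b] += n
        elif b == table:
            scores[a] += n
    return scores.most_common(top_n)

print(recommend("sales.orders"))
# -> [('sales.customers', 2), ('web.clicks', 1)]
```

The same usage signal is what the guests describe feeding back to the devops side: tables that many analysts touch together are the ones worth curating first.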

Published Date : Mar 15 2017

SUMMARY :

Stephanie McReynolds of Alation and Lee Paries of Think Big Analytics join theCUBE at Big Data SV 2017 to discuss the integration of Alation's machine-learning-driven data catalog with Kylo, Teradata's newly open-sourced data lake framework, and how usage-driven curation and ingest-time metadata collection help keep data lakes from becoming swamps.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Stephanie McReynolds | PERSON | 0.99+
George Gilbert | PERSON | 0.99+
Europe | LOCATION | 0.99+
Stephanie | PERSON | 0.99+
Lee | PERSON | 0.99+
Tesco | ORGANIZATION | 0.99+
Lee Paries | PERSON | 0.99+
George | PERSON | 0.99+
Trump | PERSON | 0.99+
2017 | DATE | 0.99+
John | PERSON | 0.99+
Pfizer | ORGANIZATION | 0.99+
five years | QUANTITY | 0.99+
Microsoft | ORGANIZATION | 0.99+
Think Big | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
70% | QUANTITY | 0.99+
San Jose, California | LOCATION | 0.99+
Alation | ORGANIZATION | 0.99+
Teradata | ORGANIZATION | 0.99+
Think Big Analytics | ORGANIZATION | 0.99+
Silicon Valley | LOCATION | 0.99+
Gartner | ORGANIZATION | 0.99+
zero | QUANTITY | 0.99+
Kylo | ORGANIZATION | 0.99+
60 | QUANTITY | 0.99+
600 users | QUANTITY | 0.98+
AtScale | ORGANIZATION | 0.98+
eBay | ORGANIZATION | 0.98+
Google | ORGANIZATION | 0.98+
today | DATE | 0.98+
first one | QUANTITY | 0.98+
Hadoop | TITLE | 0.98+
Both | QUANTITY | 0.98+
both | QUANTITY | 0.97+
Two great guests | QUANTITY | 0.97+
this year | DATE | 0.97+
about 1,000 weekly users | QUANTITY | 0.97+
one day | QUANTITY | 0.95+
single metric | QUANTITY | 0.95+
Apache Spark | ORGANIZATION | 0.94+
Kylo | TITLE | 0.93+
Wikibon | ORGANIZATION | 0.93+
NiFi | ORGANIZATION | 0.92+
about 150 data lakes | QUANTITY | 0.92+
Apache 2.0 | TITLE | 0.89+
this morning | DATE | 0.88+
couple | QUANTITY | 0.86+
Big Data Silicon Valley 2017 | EVENT | 0.84+
day one | QUANTITY | 0.83+
Vice President | PERSON | 0.81+
Strata | TITLE | 0.77+
Kylo | PERSON | 0.77+
#theCUBE | ORGANIZATION | 0.76+
Big Data | ORGANIZATION | 0.75+
last two years | DATE | 0.71+
one | QUANTITY | 0.7+
Munich Reinsurance | ORGANIZATION | 0.62+
CUBE | ORGANIZATION | 0.52+