Lie 2, An Open Source Based Platform Cannot Give You Performance and Control | Starburst
>> We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get into lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant, and its stack has evolved and matured over the years. Why is it not the default platform for data?

>> Yeah, well, I think that's become a lie over time. If we go back 10 or 12 years ago, with the advent of the first data lakes really around Hadoop, it probably was true that you couldn't get the performance you needed to run fast, interactive SQL queries in a data lake. Now, a lot's changed in 10 or 12 years. I remember in the very early days, people would say you'll never get performance because you need to store data in a columnar format. And then columnar formats were introduced to the data lake: you have the Parquet, ORC, and Avro file formats that were created to ultimately deliver performance out of it. So, okay, we got largely over the performance hurdle. More recently, people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from Curt Monash many years ago where he said it takes six or seven years to build a functional database. I think that's right, and now we've had almost a decade go by. So these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that don't have the traditional data warehouse at all; they do all of their analytics in a data lake. So I think we've proven that it's very much possible today.

>> Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. Open is a moving target; I remember when Unix used to be "open systems," so it's an evolving spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system?

>> I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is that it evolves in a direction slightly different to what people expect, and what you don't want is to have backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage of it; your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data, and that gives us the ability to remain relevant and innovate on our data storage.
And we have bought our way out of any performance concerns, because we can use cloud scale infrastructure to scale up and scale down as we need. So we don't have the concern that we don't have enough hardware today to process what we want to achieve; we can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge.

>> So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and obviously her vision is that data mesh is open source, open source tooling; it's not proprietary. You're not gonna buy a data mesh, you're gonna build it with open source tooling, and vendors like you are gonna support it. But come back to today: you can get to market with a proprietary solution faster. I'm gonna make that statement, you tell me if it's a lie. And then a vendor can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective?

>> Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and those cloud data warehouses that reach beyond their own proprietary storage drop all the performance that they were able to provide. It reminds me of, again, going back 10 or 12 years ago, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was a connector was not the same as running workloads in Hadoop back then. And I think similarly, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, always going to be incentivized, to get that data ingested into the data warehouse, cuz that's where they have control. And the bottom line is the database industry has really been built around vendor lock-in, from the start. How many people love Oracle today but are customers nonetheless? I think lock-in is part of this industry, and that's really what we're trying to change with open data formats.

>> Well, it's interesting. It reminds me of when I see the posted gas price: I drive up, and then I say, oh, that's the cash price; with a credit card, I gotta pay 20 cents more. But okay. So let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you? You mentioned Oracle, you mentioned Teradata; by implication, you're saying that's where Snowflake customers are headed.

>> Yeah, absolutely. I think this is a movie that we've all seen before.
At least those of us who've been in the industry long enough to see this movie play over a couple of times. So I do think that's the future. And I loved what Richard said; I actually wrote it down, because I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model, and you wanna use Starburst to query via SQL, that's totally cool; they can both work off the same exact data sets. By contrast, if you're focused on a proprietary model, then you're kind of locked in, again, to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape that a proprietary approach closes you off from and locks you into.

>> So I would say this, Richard, and I'd love to get your thoughts on it, cuz I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they'll admit, yeah, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them data warehouse systems or proprietary systems, deliver a greater ROI sooner? And is that an allure that customers are attracted to, or can open platforms deliver as fast an ROI?

>> I think the answer to that is, it can depend a bit on your business's skillset. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability, and we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today: we can back very tight SLAs with them, and we can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for analytics workloads, and increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based, data mesh type technologies. So it's not a simple answer; no one option will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies.

>> Yeah, Richard, stay with you. You mentioned some things before that strike me. The Databricks-Snowflake thing is always a lot of fun for analysts like me. You've got Databricks coming at it, and Richard, you mentioned you have a lot of rockstar data engineers: Databricks coming at it from a data engineering heritage, Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan have said, you know what, I think it's actually harder to play in the data engineering world; i.e., it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future?
What's your advice to those young people?

>> So I think I'd probably fall back on general programming skill sets. The advice that I saw years ago was if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense: I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and innovate us beyond our competitors. So my advice to people who are starting here, or trying to build teams to capitalize on data assets, is begin with open-license, free capabilities, because they're very cheap to experiment with, they generate a lot of interest from people who want to join you as a business, and you can make them very successful early doors with your analytics journey.

>> It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years, and in the world of Oracle, normally it's the staff that's the biggest nut in total cost of ownership. Not with Oracle: it's the license cost that is by far the biggest component of the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open versus closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world?

>> Yeah. So our view on how the future ultimately unfolds is that we think data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately, storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that centralization of all your data assets is just an impossible endeavor, and so you wanna be able to access data that lives outside of the lake as well. So we think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via the creation of data products as well.

>> Great. Okay, thanks guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging, the so-called modern data stack, is really modern or is it the same wine in a new bottle. When it comes to data architectures, you're watching theCUBE, the leader in enterprise and emerging tech coverage.
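To make Justin's interoperability point concrete, here is a minimal sketch, not taken from the panel itself, of one engine writing an open format to the lake while Presto reads the very same files over SQL. It assumes a Hive metastore-backed catalog over S3 plus the pyspark and presto-python-client packages; every bucket, host, schema, and table name below is hypothetical.

```python
# Sketch: Spark writes Parquet (an open format) to S3 and registers the
# table in a shared Hive metastore; Presto then queries the same files
# via SQL -- no copy or ingestion into a proprietary store.
from pyspark.sql import SparkSession
import prestodb  # presto-python-client

spark = (
    SparkSession.builder
    .appName("feature-build")
    .enableHiveSupport()  # shared metastore, so Presto can see the table
    .getOrCreate()
)

# Read raw events and derive a small feature set.
events = spark.read.json("s3a://example-lake/raw/events/")  # hypothetical bucket
features = events.select("user_id", "event_type", "ts")

# Write an external Parquet table: the files live in S3, the table
# definition lives in the metastore, and both are engine-neutral.
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
(
    features.write.mode("overwrite")
    .option("path", "s3a://example-lake/analytics/features/")
    .saveAsTable("analytics.features")
)

# The same data is immediately queryable from Presto over plain SQL.
conn = prestodb.dbapi.connect(
    host="presto.example.internal",  # hypothetical coordinator
    port=8080,
    user="analyst",
    catalog="hive",
    schema="analytics",
)
cur = conn.cursor()
cur.execute("SELECT event_type, count(*) FROM features GROUP BY event_type")
for event_type, n in cur.fetchall():
    print(event_type, n)
```

Because both the Parquet files and the metastore entry are engine-neutral, either side can be swapped for another tool without migrating the data, which is exactly the escape hatch from lock-in described above.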
Steven Mih, Ahana and Sachin Nayyar, Securonix | AWS Startup Showcase
>> Voiceover: From theCUBE's studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is theCUBE Conversation.

>> Welcome back to theCUBE's coverage of the AWS Startup Showcase: the Next Big Thing in AI, Security and Life Sciences, featuring Ahana for the AI track. I'm your host, John Furrier. Today we're joined by two great guests, Steven Mih, Ahana CEO, and Sachin Nayyar, Securonix CEO. Gentlemen, thanks for coming on theCUBE. We're talking about the next-gen technologies on AI, open data lakes, et cetera. Thanks for coming on.

>> Thanks for having us, John.

>> Thanks, John.

>> What a great lineup here.

>> Sachin: Thanks, Steven.

>> Great, great stuff. Sachin, let's get in and talk about your company, Securonix. What do you guys do? Take us through, I know you've got a slide to help us through this. I want to introduce your stuff first, then jump in with Steven.

>> Absolutely. Thanks again, Steven and the Ahana team, for having us on the show. So Securonix, we started the company in 2010. We are the leader in security analytics and response capability for the cyber market. Basically, this is a category of solutions called SIEM: Security Information and Event Management. We are the Gartner Magic Quadrant leaders, we now have about 500 customers today, and we have been plugging away since 2010. We started the company really focused on analytics, using machine learning and advanced analytics to find the needle in the haystack, then moved from there to the needle in the needle stack, using more algorithms, analysis of analysis. And then we evolved the company to run on cloud and become sort of the biggest security data lake on cloud, providing all the analytics to help companies with their insider threats, cyber threats, cloud and application threats, emerging internally and externally, and then response. And we have a great partnership with Ahana as well as with AWS. So looking forward to this session, thank you.

>> Awesome. I can't wait to hear the news on that next-gen SIEM leadership. Steven, Ahana, talk about what's going on with you guys, give us the update, a lot of stuff happening.

>> Yeah, great to be here, and thanks for that, Sachin; we appreciate the partnership as well with both Securonix and AWS. Ahana is the open source company based on PrestoDB, which is a project that came out of Facebook and is widely used, one of the fastest growing projects in data analytics today. And we make a managed service for Presto, easily run on AWS, all cloud native. We'll be talking about that more during the show. Really excited to be here. We believe in open source, and we believe in solving the challenges of having data in the cloud and making it easy to use. So thanks for having us again.

>> And looking forward to digging into that managed service and why that's been so successful. Let's get into the Securonix next-gen SIEM leadership first. Share the journey towards what you guys are doing here. The open data lake on AWS has been a hot topic, and the success of data in the cloud, no doubt, is on everyone's mind, especially with the edge coming. It's just incredible growth. Take us through, Sachin: what do you guys got going on?

>> Absolutely. Thanks, John. We are hearing about cyber threats every day, no question about it. So in the past, what we have done as enterprises is put all of our eggs in the basket of solutions that were evaluating network data.
With cloud, obviously, there is no more network data. Now we have moved into focusing on EDR, the right thing to do on endpoint detection, but with that, we also need security analytics across on-premise and cloud, and your other sources like your OT, IoT, your mobile, bringing it all together into a security data lake and then running purpose-built analytics on top of that, and then having a response capability, so we can prevent some of these things from happening or detect them in real time, versus investigating for hours or weeks or months, which is obviously too late. With some of the recent events happening around Colonial and others, we all know cybersecurity is on top of everybody's mind. First and foremost, I also want to.

>> Steven: (indistinct) slide one, and that's all based on top of the data lake, right?

>> Sachin: Yes, absolutely. So before we go further into Securonix, I also want to congratulate everyone on the new cyber initiatives with our government; I'm really excited to see some of the things the government is doing in this space to have stronger regulation and bring together the government and the private sector. From a Securonix perspective, today, one third of the Fortune 500 companies use our technology. In addition, there are hundreds of small and medium sized companies that rely on Securonix for their cyber protection. So what we do, again, is run the solution on cloud, and that is very important. It is not just important for hosting: in the space of cybersecurity, you need a solution where we can update the threat models and use the intelligence, the intel, that we gather from our customers, partners, and industry experts, and roll it out to our customers within seconds and minutes, because the game is real time in cybersecurity. That you can only do in cloud, where you have the complete telemetry and access to these environments. When we go on-premise, traditionally, what you will see is customers even thinking about pushing the threat models through their standard dev-test lifecycle management, which is just completely defeating the purpose. So in any event, Securonix on the cloud brings together all the data, then runs purpose-built analytics on it. We are today pulling in several million events per second from our customers, and we surface just a very small handful of events, reducing the false positives so that the security command center can focus on them and then configure response actions on top of that. So we can take action for known issues and have intelligence in all the layers. That's what Securonix is focused on.

>> Steven, he just brought up probably the most important story in technology right now, and that's ransomware; well, cybersecurity in general, but ransomware in particular. He mentioned some of the government efforts. Some are saying that the ransomware marketplace is bigger than some nation state governments. There's a business model behind it, it's highly active, it's dominating the scene, and it's a real threat. This is the new world we're living in; cloud creates the refactoring capabilities, and we're hearing that story here with Securonix. How do Presto and Securonix work together? Because I'm connecting the dots here in real time, and I think you're going to go there. So take us through, because this is like the most important topic happening.

>> Yeah.
So as Sachin said, there's all this data that needs to go into the cloud, and it's all moving to the cloud: massive amounts of data, hundreds of terabytes, petabytes of data moving into the data lakes. Those are the S3-based data lakes, which are the easiest, cheapest, commodified place to put all this data. But in order to deliver the results that Sachin's company is driving, which is intelligence on when there's a ransomware possibility, you need to have analytics on that data. And so Presto is the open source SQL query engine for data lakes and other data sources. It was created at Facebook and is now governed as part of the Linux Foundation, under something called the Presto Foundation, and it was built to replace the complicated Hadoop stack in order to drive lightning fast analytic queries on very large sets of data. And so Presto fits in with this open data lake analytics movement, which has made Presto one of the fastest growing projects out there.

>> What is an open data lake? Real quick, for the audience who wants to learn what it means: does it mean it's open source in the Linux Foundation, or open, meaning it's open to multiple applications? What does that even mean?

>> Yeah, open data lake analytics means that, first of all, your data lake has open formats. So it is made up of, say, something called ORC or Parquet, and these are formats that any engine can be used against. That's really great, instead of having locked-in data types. Data lakes can have all different types of data: unstructured and semi-structured data, not just the structured data which is typically in your data warehouses. There's a lot more data going into the open data lake. And then, based on what workload you're looking to get benefit from, the insights come from that, and slide two covers this pictorially. If you look on the left here on slide two, the open data lake is where all the data is pooling, and Presto is the layer in between that and the insights, which are driven by the visualization, reporting, dashboarding, and BI tools, or applications like in Securonix's case. And so analytics are now being driven by every company, not just in the security industry; it's every industry out there: retail, e-commerce, healthcare, financials, you name it. All are looking at driving more analytics for their SaaSified applications, as well as for their own internal analysts, data scientists, and folks that are trying to be more data-driven.

>> All right, let's talk about the relationship now, with where Presto fits in with Securonix, because I get the open data layer, I see value in that, and I also get what we're talking about with the cloud and being faster with the data sets. So how do Sachin's Securonix and Ahana fit together?

>> Yeah, great question. So I'll tell you, I'll give you an example. We have two Fortune 10 customers: one has moved most of their operations to the cloud, and another is in the process, at an early stage. The amount of data that we are getting from the customer who's moved fully to the cloud is 20 times, 20 times more than from the customer who's in the early stages of moving to the cloud. That is because of the ability to add this level of telemetry in the cloud; in this case, it happens to be AWS, Office 365, Salesforce, and several others, across several other cloud technologies.
But the level of logging, the telemetry we are able to get, is unbelievable. What it does is allow us to analyze more and protect the customers better, and in real time, but there is a cost and scale factor to that. So like I said, when you are trying to pull in billions of events per day from a customer, what the customers are looking for is: all of that data goes in, all of that data gets enriched so that it makes sense to a normal analyst, and all of that data is available for search, sometimes for 90 days, sometimes for 12 months. And then all of that data is available to be brought back into a searchable format for up to seven years. So think about the amount of data we are dealing with here, and we have to provide a solution for this problem at a price point that a medium-sized company as well as a large organization can afford. So after a lot of analysis, and again, Securonix is focused on cyber, on bringing in the data and analyzing it, we zeroed in on S3 as the core bucket where this data needs to be stored, because of the price point, the reliability, and all the other functions available on top of it. And with S3, we've created a great partnership with AWS, as well as with Snowflake, which provides the bigger enterprise data lake perspective. So now, for us to be able to provide customers the ability to search that data: data comes in, we are enriching it, and we are putting it in S3 in real time. This is where Presto comes in. In our research, Presto came out as the best search engine to sit on top of S3. The engine is supported by companies like Facebook and Uber, and it is open source. So, open source, like you asked about: for companies like us, we cannot depend on a very small technology company to offer mission critical capabilities, because what if that company gets acquired, et cetera. In the case of open source, we are able to adopt it, we know there is a community behind it, it will be available for us to use, and we will be able to contribute to it for the long term. Number two, from an open source perspective, we have a strong belief that customers own their own data. Traditionally, like Steven said, lock-in is a key term: customers have been locked into proprietary formats in the past, and those days are over. You own the data, and you should be able to use it with us and with other systems of your choice. So now you get a data search engine like Presto, which scales independently of the storage. And then, when we started looking at Presto, we came across Ahana. For every open source system, you definitely need a sort of for-profit company that invests in the community and takes the community forward, because without a company like this, the community will die. So we are very excited about the partnership with Presto and Ahana. Ahana provides us the ability to take Presto and cloudify it, make the cloud operations work, plus be our conduit to the Presto community: help us speed up certain items on the roadmap, and help our team contribute to the community as well. And then you have to take a solution like Presto, put it in the cloud, make it scale, and put it on Kubernetes, the standard things that you need to do in today's world to offer it as sort of a microservice in our architecture.
So in all of those areas, that's where our partnership is with Ahana and Presto and S3, and we think this is the search solution for the future. With something like this, very soon we will be able to offer our customers 12 months of data, searchable at extremely fast speeds, at very reasonable price points, and you will own your own data. So it has very significant business benefits for our customers, with the technology partnership that we have set up here. Very excited about this.

>> Sachin, it's very inspiring, a couple of things there. One, decentralized, you own your own data, having it democratized: that piece is killer. Open source, great point.

>> Absolutely.

>> Company goes out of business, you don't want to lose the source code, or it gets acquired or whatever; that's a key enabler. And then three, a fast managed service that has commercial backing behind it. And by the way, Snowflake wasn't around a couple of years ago, so this is what we're talking about. This is the cloud scale. Steven, take us home with this point, because this is what innovation looks like. Could you share why it's working? What are some of the things that people can walk away with and learn from, as the new architecture for the next-gen cloud is here? Share how this works.

>> That's right. As you heard from Sachin, every company is becoming data-driven, and analytics are central to their business. There's more data, it needs to be analyzed at lower cost without the lock-in, and people want that flexibility. Slide three talks about what Ahana Cloud for Presto does. It's the best Presto out of the box, and it's very easy to use for your operations team, so it can be one or two people just managing this, and they can get up to speed very quickly: in 30 minutes, they're up and running. That jump-starts their movement into an open data lake analytics architecture. That architecture is the one that is at Facebook, Uber, Twitter, and other large web scale, internet scale companies, and with the amount of data that's occurring, it's now becoming the standard architecture for everyone else in the future. And so, just to wrap, we're really excited about making that easy and giving an open source solution, because the open source data stack based on data lake analytics is really happening.

>> I got to ask you, you've seen many waves in the industry. Certainly, you've been through the big data waves, Steven. Sachin, you're on the cutting edge, and billions of signals from one client alone is pretty amazing scale; refactoring that value proposition is super important. What's different from 10 years ago, from Hadoop? You mentioned Hadoop earlier, which is RIP; obviously the cloud killed it, we all know that. But what's different now? Skeptics might say, I don't believe you, there's no way it works, S3 costs way too much. Why is this now so much more of an attractive proposition? What do you say to the naysayers out there? Steven, we'll start with you, and then Sachin, I want you to weigh in too.

>> Yeah. Well, if you think about the Hadoop era, and if you look at slide three, it was a very complicated system that was done mainly on-prem.
You'd have to go and set up a big data team, rack and stack a bunch of servers, and then try to put all this stuff together, and candidly, the results and the outcomes of that were very hard to get unless you had the best possible teams and invested a lot of money. What you see in this slide, on the right hand side, is the stack. Now you have separate compute, based off of Intel-based instances in the cloud; we run the best in that, and we're part of the Presto Foundation. And that sits on data lakes. The distributed compute engines are the ones that have become very much easier. So the big difference I see is that it's no longer called big data; it's just called data analytics, because it's now become commodified as being easy, and the bar is much, much lower, so everyone can get the benefit of this across industries and organizations. That's good for the world: it reduces the security threats, the ransomware, in the case of Securonix and Sachin here, but every company can benefit from this.

>> Sachin, this is really an example, in my mind, and you can comment too on whether you believe it or not: replatforming with the cloud, that's a no-brainer. People do that, they did it. But the value is refactoring in the cloud: thinking differently with the assets you have and making sure you're using the right pieces. If it costs more money to stand something up than to get value out of something that's operating at scale, that's a much easier equation. What are your thoughts on this? Go back 10 years and compare to where we are now: what's different? Replatforming, refactoring, all kinds of things happening. What's your take on all this?

>> Agreed, John. So we have been in business now for about 10 to 11 years, and when we started, my hair was all black. Okay.

>> John: You're so silly.

>> Okay. So everything that has happened here is the transition from Hadoop to cloud. This is what the result has been, and people can see it for themselves. We started off with deep partnerships with the Hadoop providers, and again, Hadoop is the foundation, which has now become EMR and everything else that AWS and other companies have picked up. But it started with some basic premises. First, the racking and stacking of hardware: companies having to project their entire data volume upfront, bringing in the servers, and having 50, 100, 500 servers sitting in their data centers. And then, when there are spikes in data, or, like I said, as you move to the cloud, your data volume will increase between five and 20x, projecting for that, and then the agility: it will take you three to six months to bring in new servers and bring them into the architecture. So, big issue. The number two big issue is that the backend was built for HDFS. Hadoop, in my mind, was built to ingest large amounts of data in batches and then perform some Spark jobs on it, some analytics. But in security, we are talking about real-time, high-velocity, high-variety data, which has to be available in real time. It wasn't built for that, to be honest.
So what was happening is, even if you look at the Hadoop companies today, as they have defined their next generation, they have moved from HDFS to a cloud based platform capability and discarded the traditional HDFS architecture, because it just wasn't scaling, wasn't searching fast enough for hundreds of analysts at the same time, and then, obviously, the servers, et cetera, weren't working. When we worked with the Hadoop companies, they were always two to three versions behind for the individual services that they had brought together, and again, when you're talking about this kind of volume, you need to always be on the cutting edge of the technologies underneath. So even while we were working with them, we had to support our own versions of Kafka, Solr, ZooKeeper, et cetera, to really bring it together and provide our customers this capability. Now that we have moved to the cloud, with solutions like EMR behind us, AWS has invested in solutions like EMR to make them scale up and scale out, which traditional Hadoop did not provide, because they missed the cloud wave. And then, on top of that, rather than throwing data into that traditional, older HDFS format, we are now taking the same Parquet format that it supports, putting it in S3, and making it available. Using all those capabilities, like you said, the refactoring of that is critical: rather than having servers and redundancies on-prem, with S3 we get built-in redundancy, built-in lifecycle management, and a high degree of data reliability. And then we get all this innovation from groups like Presto and companies like Ahana sitting on top of that S3. The last item I would say is that in the cloud we are now able to have multiple resilient options. For example, we still have some premium searching going on with solutions like Solr and Elasticsearch; then you have Presto and Ahana providing the majority of our searching; but we still have Athena as a backup in case something goes down in the architecture. Our queries will spin back up to Athena, AWS's service based on Presto, and customers will still get served. And Athena doesn't cost us anything if we don't use it. All of these options are not available on-prem. So in my mind, it's a whole new world we are living in. It is a world where we have made it possible for enterprises to even think about having true security data lakes which are useful, and having real-time analytics. From my perspective, I wouldn't even sign up a large enterprise today that wants to build a data lake on-prem, because I know that is going to be a very difficult project to make successful. So we've come a long way; there are several details around this that we've endured through the process, but I'm very excited about where we are today.

>> Well, we'll certainly follow up with theCUBE on all your endeavors. Quickly, on Ahana: why them, why their solution? In your words, what would be the advice you'd give me if I'm like, okay, I'm looking at this, why do I want to use it, and what's your experience?

>> Right. So, it's the standard SQL query engine for data lake analytics. More and more people have more data and want something that's based on open source and open formats, that gives you flexibility, pay as you go.
>> Well, we'll certainly follow up with theCUBE on all your endeavors. Quickly, on Ahana: why them, why their solution? In your words, what would be the advice you'd give me if I'm like, okay, I'm looking at this, why do I want to use it, and what's your experience? >> Right. So Presto is the standard SQL query engine for data lake analytics. More and more people have more data and want something that's based on open source and open formats, that gives you that flexibility, pay as you go: you only pay for what you use. And so it proved to be the best option for Securonix to create a self-service system that has all the speed and performance and scalability they need, which is based off of the innovation from large companies like Facebook, Uber, Twitter. They've all invested heavily. We contribute to the open source project. It's a vibrant community, and we encourage people to join it; even Securonix will be having engineers contributing to the project as well. Is that right, Sachin? Maybe you could share a little bit about your thoughts on being part of the community. >> Yeah. So also, why we chose Ahana, like John said. The first reason is, you see, Steven is always smiling. Okay. >> That's for sure. >> That is very important. I mean, jokes apart, you need a great partner. You need a partner with a great attitude, because this is not a sprint, this is a marathon. So the Ahana founders, Steven, the whole team, they're world-class. The depth that the CTO has, his experience; the depth that Dipti has, who's running the cloud solution: these guys are world-class. They are very involved in the community; we evaluated them from a community perspective, and they are very involved. They have the depth to really commercialize an open source solution without making it too commercial, the right balance, where the founding companies like Facebook and Uber, and hopefully Securonix in the future as we contribute more and more, will have our say, and they act like the right stewards in this journey and contribute as well. And then they have chosen the right niche: rather than taking portions of the product and making them proprietary, they have put the effort into the cloud infrastructure, making the product easily available on the cloud. So it was sort of a no-brainer from our side. Once we chose Presto, Ahana was the no-brainer, and the partnership so far has been very exciting. I'm looking forward to great things together. >> Likewise, Sachin, thanks so much for that. And we've found your team world-class as well, and we look forward to working together in the community, also in the Presto Foundation. So thanks for that. >> Guys, great partnership, great insight, and really, this is a great example of cloud scale, of the cloud value proposition as it unlocks new benefits: open source, managed services, refactoring the opportunities to create more value. Steven, Sachin, thank you so much for sharing your story here on open data lakes. Open always wins, in my mind. This is theCUBE, we're always open, and we're showcasing all the hot startups coming out of the AWS ecosystem for the AWS Startup Showcase. I'm John Furrier, your host. Thanks for watching. (bright music)
Krish Prasad, VMware & Paul Turner, VMware | CUBE Conversation, April 2020
>> Hello and welcome to theCUBE's Palo Alto studios. I'm John Furrier. We're here for a special CUBE Conversation and special report: big news from VMware on the launch of the availability of vSphere 7. I'm here with Krish Prasad, SVP and general manager of the vSphere business and cloud platform business unit, and Paul Turner, VP of product management. Guys, thanks for coming in and talking about the big news. >> Thank you for having us. >> You guys announced some interesting things back in March around containers, Kubernetes and vSphere. Krish, on the hard news, what's being announced today? >> We are announcing the general availability of vSphere 7, John. It's by far the biggest release that we have done in the last 10 years. We previewed it as Project Pacific a few months ago. With this release, we are putting Kubernetes native support into the vSphere platform. What that allows us to do is give customers the ability to run both modern applications based on Kubernetes and containers, as well as traditional VM-based applications, on the same platform. And it also allows IT departments to provide their developers a cloud operating model using the VMware Cloud Foundation that is powered by this release. This is a key part of our Tanzu portfolio of solutions and products that we announced this year, and it is targeted fully at the developers of modern applications. >> And the specific news is vSphere 7 is generally available? >> Generally available, yes. >> Okay. So let's get to the trend line here. What's the big trend line that this is riding? Obviously we saw the announcements at VMworld last year and throughout the year; there's a lot of buzz. Pat Gelsinger says there's a big wave here with Kubernetes. What does this announcement mean for you guys with the marketplace trend? >> Yeah, so what Kubernetes is really about is people trying to have an agile operation. They're trying to modernize their IT applications, and the best way to do that is to build off your current platform, expand it and make it an innovative, agile platform for you to run Kubernetes applications and VM applications together. And not just that: customers are also looking at being able to manage a hybrid cloud environment, both on-prem and public cloud together. So they want to be able to evolve and modernize their application stack, but also modernize their infrastructure stack, which means hybrid cloud operations with innovative applications, Kubernetes or container-based applications, and VMs. >> What's exciting about this trend, Krish? We were talking with you at VMworld last year, and we've had many conversations around cloud native, but you're seeing cloud native becoming the operating model for modern business. I mean, this is really the move to the cloud. If you look at the successful enterprises, and even the suppliers, if the on-premises piece hasn't moved to cloud native marketplace technologies, the on-premise isn't effective. So it's not so much on-premises going away, we know it's not, but it's turning into cloud native. This is the move to the cloud, generally. This is a big wave. >> Yeah, absolutely. I mean, John, if you think about it, on-premise we have significant market share; we are by far the leader in the market. And so what we are trying to do with this is to allow customers to use the current platform they are using, but bring modern application development on top of the same platform. Today, customers tend to set up stacks which are different, right? So you have a Kubernetes stack, you have a stack for the traditional applications, you have operators and administrators who are specialized in Kubernetes on one side, and you have the traditional VM operators on the other side. With this move, what we are saying is that you can be on the same common platform. You can have the same administrators who are used to administering the environment you already had, and at the same time offer the developers what they like, which is a Kubernetes dial-tone that they can come and deploy their applications on, on the same platform you use for traditional applications. >> Yeah. Paul, Pat said Kubernetes is going to be the dial tone of the internet. Most Millennials might not even know what dial tone is, but what he meant is that it's the key fabric that's going to orchestrate. And we've heard over the years, skills gap, skills gap, not a lot of skills out there. But when you look at the reality of the skills gap, it's really about skills gaps and shortages, not enough people. Most CIOs and chief information security officers that we talk to say, I don't want to fork my development teams. I don't want to have three separate teams, so I don't have to. I want to have automation. I want an operating model that's not going to be fragmented. This kind of speaks to this whole idea of interoperability and multi-cloud. This seems to be the next big wave behind hybrid. >> I think it is the next big wave. The thing that customers are looking for is a cloud operating model. They like the ability for developers to be able to invoke new services on demand in a very agile way, and we want to bring that cloud operating model to on-prem, to Google Cloud, to Amazon Cloud, to Microsoft Cloud, to any of our VCPP partners. You get the same cloud operating experience, and it's all driven by a Kubernetes-based dial-tone that's effective and available within this platform. So by bringing a single infrastructure platform that can run in this hybrid manner and give you the cloud operating agility that developers are looking for, that's what's key in version seven. >> So Pat Gelsinger, when he says dial tone of the internet, Kubernetes, does he mean always on, or what does he mean specifically? Just that it's always available? What's the meaning behind that phrase? >> The first thing he means is that developers can come to the infrastructure, which is the VMware Cloud Foundation, and be able to work with a set of APIs that are Kubernetes APIs. Developers understand that; they're looking for that. They understand that dial tone, right? And you come to our VMware Cloud Foundation, and across all these clouds, you get the same API set that you can use to deploy your application. >> Okay, so let's get into the value here of vSphere 7. How does VMware vSphere 7 specifically help customers? Isn't it just bolting Kubernetes onto vSphere? Some will say it's that simple. You're running product management; is it that easy? >> No, it's not that easy. Some people say, hey, just bolt Kubernetes onto vSphere; it's not that easy. If anybody has actually tried deploying Kubernetes, first, it's highly complicated. So you could call it a bolt-on, but it's certainly not like that; we are making it incredibly simple. You talked about IT operational shortages. Customers want to be able to deploy Kubernetes environments in a very simple way, and the easiest way you can do that is to take your existing environment, which runs ninety percent of IT, and just turn on the Kubernetes dial tone. It is as simple as that. Now, it's much more than that; in version 7 we're also bringing in a couple of things that are very important. You have to be able to manage at scale, just like you would in the cloud. You want infrastructure to almost self-manage, to upgrade and lifecycle-manage itself, and so we're bringing in a new way of managing infrastructure so that you can manage large-scale environments, both on-premise and public cloud, at scale. And associated with that, you must make it secure. So there's a lot of enhancements we're building into the platform around what we call intrinsic security, which is how we actually build a truly trusted platform for your developers and IT.
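The "same API set across all these clouds" claim has a concrete developer-facing shape: once each environment exposes a Kubernetes endpoint, the same client code and manifests work against any of them, with only the kubeconfig context changing. A minimal sketch, assuming the official `kubernetes` Python client and hypothetical context names for an on-prem VMware Cloud Foundation cluster and a cloud-hosted one:

```python
from kubernetes import client, config

# Hypothetical kubeconfig context names; in practice these would point at a
# vSphere 7 cluster on-prem and a cluster running in a public cloud.
CONTEXTS = ["vcf-onprem", "vmc-on-aws"]

for ctx in CONTEXTS:
    # Same client, same API calls -- only the target context changes.
    config.load_kube_config(context=ctx)
    v1 = client.CoreV1Api()
    pods = v1.list_pod_for_all_namespaces(limit=5)
    print(f"{ctx}: sampled {len(pods.items)} pods")
```

That interchangeability is what "dial tone" amounts to in practice: the developer's tooling doesn't change when the underlying cloud does.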
>> Yeah, I was just going to touch on your point about the shortage of IT staff, and how we are addressing that here. The way we are addressing it is that the IT administrators who are used to administering vSphere can continue to administer this enhanced platform with Kubernetes the same way they administered the older releases. So they don't have to learn anything new; they are just working the same way. We are not changing any tools, processes or technologies. >> Same as it was before? >> Same as it was before, but more capable, and developers can come in and see new capabilities around Kubernetes. So it's the best of both worlds. >> And what was the pain point that you guys are solving? Obviously the ease of use is critical, especially operationally, I get that. As you look at the cloud native developer side, infrastructure as code means the app developers on the other side are taking advantage of it. What's the real pain point that you guys are solving with vSphere 7? >> I think it's multiple factors. First, we've talked about agility a few times, right? DevOps is a real trend inside IT organizations. They need to be able to build and deliver applications much quicker; they need to be able to respond to the business. And to do that, they need infrastructure that is on demand. So what we're really doing in the core Kubernetes enablement is allowing that on-demand fulfillment of infrastructure, so you get the agility you need. But it's not just tied to modern applications; it's also all of your existing business applications and your modern applications on one platform, which means you've got a very simple and low-cost way of managing large-scale IT infrastructure. That's a huge piece as well. And then I do want to emphasize a couple of other things. We're also bringing in new capabilities for AI and ML applications, and for SAP HANA databases, where we can actually scale to some of the largest business applications out there. And you have capabilities like the GPU awareness and FPGA awareness that we built into the platform, so that you can truly run this as the fastest, accelerated platform for your most extreme applications. So you've got the ability to run those applications as well as your Kubernetes and container-based applications. >> That's the accelerate-application-innovation piece of the announcement, right? >> That's right. It's quite powerful that we've actually brought new hardware awareness into the product and exposed it to your developers, whether that's through containers or through VMs. >> Krish, I want to get your thoughts on the ecosystem and then the community, but I want to just dig into one feature you mentioned. I get the lifecycle improvement, I get the application acceleration innovation, but the intrinsic security is interesting. Could you take a minute to explain what that is? >> Yeah, so there are a few different aspects. One is looking at how we can actually provide a trusted environment, and that means you need a way of doing key management where even your administrator is not able to get the keys to the kingdom, as we would call it. You want to have a controlled environment; some of the worst security challenges inside companies have come from internal IT staff. So you've got to have a way to run a trusted environment. We've got vSphere Trust Authority, which we released in version 7, that actually gives you a secure environment for managing your keys to the kingdom, effectively your certificates. So you've got this continuous runtime. Not only that, we've taken our Carbon Black capabilities and we're building in full support for Carbon Black into the platform, so that you've got native security for even your application ecosystem. >> Yeah, that's been coming up a lot in conversations, the Carbon Black and the security piece. Krish, obviously vSphere is everywhere. Having that operating model makes a lot of sense, but you have a lot of touch points: you've got cloud, hyperscalers, the edge, you've got partners. >> So we have the dominant market share in private cloud, and we are on Amazon, as you well know, on Azure, Google, IBM Cloud, Oracle Cloud. So on all the major clouds there is a vSphere stack running. It allows customers, if you think about it, to have the same operating model irrespective of where their workload is residing. They can set policies, compliance, security; they set it once and it applies to all their environments across this hybrid cloud, and it's all supported by our VMware Cloud Foundation, which is powered by vSphere 7. >> Yeah, I think having the cloud be API-based, having connection points and having that reliable, easy-to-use operating model is critical. All right, guys, let's summarize the announcement. What do you take away from this vSphere 7? What is the bottom line? What does it really mean? >> I think, if we look at it for developers, we are democratizing Kubernetes. We are already in ninety percent of IT environments out there that are running vSphere. We are bringing to every one of those vSphere environments, and all of the virtual infrastructure administrators, the ability to manage Kubernetes environments; you can manage it by simply upgrading your environment. That's a really nice position, rather than having independent environments you need to manage. So I think that is one of the key things in here. The other thing, though, is that I don't think there's any other platform out there, other than vSphere, that can run in your data center, in Google's, in Amazon's, in Microsoft's, in thousands of VCPP partners. You have one hybrid platform that you can run with, and that's got operational benefits, efficiency benefits, agility benefits. >> Yeah, I'll just add to that and say, look, we want to meet customers where they are in their journey, and we want to enable them to make business decisions without technology getting in the way. And I think the announcement we made today with vSphere 7 is going to help them accelerate their digital transformation journey without making trade-offs on people, process and technology. And there's more to come. We're laser-focused on making our platform the best in the industry for running all kinds of applications, and the best platform for hybrid and multi-cloud, so you'll see more capabilities coming in the future. Stay tuned. >> Well, one final question on this news announcement, which is this awesome vSphere core product for you guys: if I'm the customer, tell me why it's going to be important five years from now. >> Because of what I just said. It is the only platform that is going to be running across all the public clouds, which will allow you to have an operational model that is consistent across the clouds. Think about it: if you go to Amazon native, and then you have workloads in Azure, you're going to have different tools, different processes, different people trained to work with those clouds. But when you come to VMware and you use our Cloud Foundation, you have one operating model across all these environments, and that's going to be game-changing. >> Great stuff, great stuff. Thanks for unpacking that for us. Congratulations on the launch. >> Thank you. >> vSphere 7 news, a special report here inside theCUBE Conversation. I'm John Furrier. Thanks for watching. [Music]
vSphere Online Launch Event
>> Welcome back, everybody. Jeff Frick here with theCUBE. We are having a very special CUBE Conversation and kind of the ongoing unveil, if you will, of the new VMware vSphere 7. We're going to get a little bit more of a technical deep dive here today. We're excited to have a longtime CUBE alumni, Kit Colbert; he is the VP and CTO of cloud platform at VMware. Kit, great to see you. And new to theCUBE, Jared Rosoff; he's a senior director of product management at VMware and, I'm guessing, had a whole lot to do with this build. So Jared, first off, congratulations on birthing this new release, and great to have you on board. All right, so let's just jump into it. From kind of a technical aspect, what is so different about vSphere 7? >> Yeah, great. So vSphere 7 bakes Kubernetes right into the virtualization platform. This means that as a developer, I can now use Kubernetes to actually provision and control workloads inside of my vSphere environment. And it means that as an IT admin, I'm actually able to deliver Kubernetes and containers to my developers really easily, right on top of the platform I already run. >> So I think we had kind of a sneaking suspicion that that might be coming, with the acquisition of the Heptio team. Really exciting news, and I think you teased it out quite a bit at VMworld last year about really enabling customers to deploy workloads across environments, regardless of whether that's on-prem, public cloud, this public cloud, that public cloud.
So this really is the realization of that vision? >> Yes, yeah. So we talked at VMworld about Project Pacific, right, this technology preview. And as Jared mentioned, what that was, was how do we take Kubernetes and really build it into vSphere. As you know, we've had a hybrid cloud vision for quite a while now: how do we proliferate vSphere to as many different locations as possible, now part of the broader VMware Cloud Foundation portfolio. And as we've gotten more and more of these instances in the cloud, on-premises, at the edge, with service providers, there's a secondary question: how do we actually evolve that platform so it can support not just the existing workloads, but modern workloads as well. >> All right, so I think you brought some pictures for us, a little demo. Why don't we dive in there and see what it looks like? You guys can cue up the demo. >> Yeah, so we're going to start off looking at a developer actually working with the new VMware Cloud Foundation and vSphere 7. What you're seeing here is the developer actually using Kubernetes to deploy Kubernetes: the self-eating watermelon, right? So the developer uses this Kubernetes declarative syntax, where they can describe a whole Kubernetes cluster, and the whole developer experience now is driven by Kubernetes. They can use the kubectl tool and all of the ecosystem of Kubernetes APIs and tool chains to provision workloads right into vSphere. And it's not just provisioning workloads; this is also key to the developer being able to explore the things they've already deployed. So, go look at, hey, what's the IP address that got allocated to that, or what's the CPU load on this workload I just deployed. On top of Kubernetes, we've integrated a container registry into vSphere, so here we see a developer pushing and pulling container images. And one of the amazing things about this is that, from an infrastructure-as-code standpoint, now the developer's infrastructure as well as their software is all unified in source control. I can check in not just my code, but also the description of the Kubernetes environment, and storage, and networking, and all the things that are required to run that app. So now we're looking at a sort of side-by-side view, where on the right-hand side the developer is continuing to deploy some pieces of their application, and on the left-hand side we see vCenter. And what's key here is that as the developer deploys new things through Kubernetes, those show up right inside the vCenter console. So the developer and IT are seeing exactly the same things, with the same names. This means that when a developer calls their IT department and says, hey, I've got a problem with my database, we don't spend the next hour trying to figure out which VM they're talking about. They've got the same name; they see the same information.
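As a rough illustration of the declarative flow Jared walks through, the snippet below uses the Kubernetes Python client to apply a small Deployment; against a vSphere 7 environment this would be the same call a developer makes against any other Kubernetes endpoint. The namespace, image and registry names are hypothetical, and this is a generic Kubernetes sketch rather than the exact demo being shown.

```python
from kubernetes import client, config

config.load_kube_config()  # picks up the developer's current kubeconfig context

# Declarative description of the workload -- the same shape you would check
# into source control alongside the application code.
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="demo-db", namespace="team-apps"),  # hypothetical namespace
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "demo-db"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "demo-db"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="db",
                    image="registry.example.internal/demo/db:1.0",  # hypothetical registry
                    # The app declares the resources it needs to operate.
                    resources=client.V1ResourceRequirements(
                        requests={"cpu": "500m", "memory": "1Gi"},
                    ),
                ),
            ]),
        ),
    ),
)

apps = client.AppsV1Api()
apps.create_namespaced_deployment(namespace="team-apps", body=deployment)
```

Because the deployment lands in a vSphere namespace, the resulting objects then show up in the vCenter inventory under that application, which is the side-by-side view in the demo.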
So what we're going to do now is push the developer screen aside and start digging into the vSphere experience. And what you'll see here is that vCenter is the vCenter you've already known and loved, but what's different is that now it's much more application-focused. Here we see a new screen inside of vCenter: vSphere Namespaces. These vSphere namespaces represent whole logical applications, like a whole distributed system, now as a single object inside of vCenter. And when I click into one of these apps, this is a managed object inside of vSphere. I can click on permissions and decide which developers have permission to deploy or read the configuration of one of these namespaces. I can hook this into my Active Directory infrastructure, so I can use the same corporate credentials to access the system. I can tap into all my existing storage: this platform works with all of the existing vSphere storage providers, and I can use storage-policy-based management to provide storage for Kubernetes. And it's hooked in with things like DRS, right? So I can define quotas and limits for CPU and memory, and all of that is going to be enforced by DRS inside the cluster. And again, as an admin, I'm just using vSphere, but to the developer, they're getting a whole Kubernetes experience out of this platform. Now, vSphere also pulls in all this information from the Kubernetes environment, so besides seeing the VMs and things the developers have deployed, I can see all of the desired-state specifications, all the different Kubernetes objects the developers have created: the compute, network and storage objects. They're all integrated right inside the vCenter console. And so, once again, from a diagnostics and troubleshooting perspective, this data is invaluable. It often saves hours just in trying to figure out what we're even talking about when we're trying to resolve an issue. So, as you can see, this is all baked right into vCenter. The vCenter experience isn't transformed a lot; we get a lot of VI admins who look at this and say, where's the Kubernetes? And they're surprised to find they've been managing Kubernetes all this time. It just looks like the vSphere experience they've already got. But all those Kubernetes objects, the pods and containers, Kubernetes clusters, load balancers, storage, are all represented right there, natively, in the vCenter UI. And so we're able to take all of that and make it work for your existing VI admins.
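On the admin side, the quota-and-limits idea Jared describes has a natural Kubernetes-level counterpart. Here is a hedged sketch, again with the generic Python client and a hypothetical namespace name, of setting a CPU and memory quota that workloads in an application's namespace must live within. This shows only the generic Kubernetes ResourceQuota shape; in vSphere 7, as Jared notes, the enforcement behind it is done by DRS inside the cluster.

```python
from kubernetes import client, config

config.load_kube_config()  # admin's kubeconfig context

quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="team-apps-quota", namespace="team-apps"),
    spec=client.V1ResourceQuotaSpec(
        hard={
            "requests.cpu": "16",       # total CPU the application may request
            "requests.memory": "64Gi",  # total memory it may request
            "pods": "50",               # cap on pod count in the namespace
        }
    ),
)

core = client.CoreV1Api()
core.create_namespaced_resource_quota(namespace="team-apps", body=quota)
```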
>> Well, that's pretty wild. It really builds off the vision that you outlined at VMworld, which was: IT still sees vSphere, which is what they want to see, what they're used to seeing, but devs see Kubernetes, and really bringing those together in a unified environment, so that depending on what your job is and what you're working on, that's what you're going to see. >> Yeah, as the demo showed, it is still vSphere at the center, but now there are two different experiences you can have interacting with vSphere: the Kubernetes-based one, which is of course great for developers and DevOps-type folks, and the traditional vSphere interfaces and APIs, which are great for VI admins and IT operations. >> Right, and it was interesting, you teased that out a lot at VMworld; that was a good little preview for people who knew what they were watching. You talked about the cloud journey, and kind of this bifurcation of classic apps running in their classic VMs and the modern, cloud native applications built on Kubernetes. And you outlined a really interesting thing: people often talk about the two ends of the spectrum, and getting from one to the other, but not really about the messy middle, if you will. And this is really enabling people to pick where along that spectrum they can move their workloads or move their apps. >> Yeah, we think a lot about it like that. We talk to customers, and all of them have very clear visions on where they want to go, their future-state architecture, and that involves embracing cloud and modernizing applications. And as you mentioned, it's challenging for them, because what a lot of customers see is these two extremes: either you're here, in kind of the current world, or you've got the bright nirvana future on the far end, and they believe the only way to get there is to make a leap from one side to the other, to change everything out from underneath you. That's obviously very expensive, very time-consuming and very error-prone as well; there are a lot of things that can go wrong. So what we're doing differently at VMware is really, to your point, as you call it the messy middle, I would say it's more like: how do we offer stepping stones along that journey? Rather than making one giant leap where you have to invest all this time and resources, how do we enable people to make smaller, incremental steps, each of which has a lot of business value but doesn't have a huge amount of cost? >> Right, and it's really enabling this next-gen application, where a lot of things are different. One of the fundamental ones is that now the application defines the resources it needs to operate, versus the resources defining the capabilities of what the application can do. And that's where everybody is moving as quickly as makes sense. You said not all applications need to make that move, but most of them should, and most of them are at least making that journey. Do you see that? >> Yeah, definitely. I think that's certainly one of the big evolutions we're making in vSphere: from looking historically at how we managed infrastructure, to one of the things we enable in vSphere 7, which is how we manage applications. So a lot of the things you would do in infrastructure management, setting up security rules or encryption settings or your resource allocation, you would do in terms of your physical and virtual infrastructure. You'd talk about it in terms of, this VM is going to be encrypted, or this VM is going to have this firewall rule. What we do in vSphere 7 is elevate all of that to application-centric management, so you actually look at an application and say, I want this application to be constrained to this much CPU, or I want this application to have these security rules on it. And that shifts the focus of management really up to the application level. >> Yeah, and to zoom back a little bit there: one thing we did before was something like vSAN. Before that, people had to put policies on a LUN, an actual storage LUN in a storage array, and then by virtue of a workload being placed on that array, it inherited certain policies. vSAN turned that around: it allows you to put the policy on the VM. But what Jared's talking about now is that a modern workload is not a single VM; it's a collection of different things. You've got some containers in there, some VMs, probably distributed, maybe even some on-prem and some in the cloud. So how do you start managing that more holistically? This notion of really having an application as a first-class entity that you can now manage inside of vSphere is really powerful, and a very simplifying one.
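For contrast, this is roughly what the per-VM style of management Jared and Kit just described looks like today with the vSphere Python SDK (pyvmomi): walking every VM that belongs to an application and inspecting each one individually. It's a hedged sketch: the application's VMs are approximated by a hypothetical name prefix rather than a real application object, only one illustrative property is checked, and the vCenter address and credentials are placeholders. The point is the loop that application-level management removes.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Lab-style connection; a production setup would verify certificates.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.internal",  # hypothetical vCenter
                  user="administrator@vsphere.local",
                  pwd="secret",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)

    # "The app" approximated by a name prefix -- the per-VM drudgery that a
    # vSphere 7 application object replaces with one setting.
    app_vms = [vm for vm in view.view if vm.name.startswith("billing-")]
    for vm in app_vms:
        tools_ok = vm.guest.toolsRunningStatus == "guestToolsRunning"
        print(f"{vm.name}: powered={vm.runtime.powerState}, tools_running={tools_ok}")
    view.Destroy()
finally:
    Disconnect(si)
```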
>> And why this is important is because it's this application-centric point of view that enables the digital transformation people are talking about all the time. That's a nice big word, but where the rubber hits the road is: how do you execute and deliver applications, and more importantly, how do you continue to evolve them and change them, based on customer demands, competitive demands, or just changes in the marketplace? >> Yeah. When you look at something like a modern app that maybe has a hundred VMs that are part of it, and you take something like compliance: today, if I want to check whether this app is compliant, I've got to go look at every individual VM and make sure it's locked down and hardened and secured the right way. But now, instead, I can just look at that one application object inside of vCenter, set the right security settings on it, and be assured that all the different objects inside of it are going to inherit them. So it really simplifies that. It also means the admin can handle much larger applications. If you think about vCenter today, you might log in and see a thousand VMs in your inventory. When you log in with vSphere 7, what you see is a few dozen applications. So a single admin can manage a much larger pool of infrastructure, many more applications than they could before, because we automate so much of that operation. >> And it's not just the scale part, which is obviously really important; it's also the rate of change, and this notion of how we enable developers to get what they want to get done, done, i.e. building applications, while at the same time enabling the IT operations teams to put the right sort of guardrails in place around compliance, security, performance concerns, these sorts of elements. By being able to have the IT operations team manage that logical application at that more abstract level, and then have the developer push in new containers or new VMs or whatever they need inside of that abstraction, it actually allows those two teams to work together, and work together better. They're not stepping over each other; in fact, now they can both get what they need to get done, done, and do so as quickly as possible, while also being safe and in compliance. >> Right. So there's a lot more to this; it's a very significant release. Again, there was a lot of foreshadowing if you went out and read the tea leaves; it's a pretty significant re-architecture of many parts of vSphere. So beyond the Kubernetes, what are some of the other things coming out in this very significant release? >> Yeah, it's a great question, because we tend to talk a lot about Kubernetes, what was Project Pacific but is now just part of vSphere, and certainly that is a very large aspect of it. But to your point, vSphere 7 is a massive release with all sorts of other features. So instead of a demo here, let's pull up some slides and look at what's there. Outside of Kubernetes, there are three main categories that we think about when we look at vSphere 7. The first one is simplified lifecycle management; the second is a real focus on security; and then applications as well, including both the cloud native apps that don't fit in the Kubernetes bucket and others. So on the first one, there's a ton of stuff we're doing around simplifying lifecycle. Let's go to the next slide, where we can dive in a little bit more on the specifics. We have this new technology, vSphere Lifecycle Management, vLCM, and the idea here is: how do we dramatically simplify upgrades and lifecycle management of the ESX clusters and ESX hosts? How do we make them more declarative, with a single image you can now specify for an entire cluster? We find that a lot of our vSphere admins, especially at larger scales, have a really tough time doing this; there's a lot of in and out today, and it's somewhat tricky to do. So we want to make it really simple and really easy to automate as well.
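To make "declarative, with a single image for an entire cluster" slightly more concrete, here is a hypothetical sketch of what such a desired-state cluster image could look like, expressed as a plain Python dict with a trivial drift check. The field names are illustrative only, not the actual vLCM schema.

```python
# Hypothetical desired-state image for an ESXi cluster, in the spirit of the
# vLCM idea described above: one declarative spec, applied to every host.
cluster_image = {
    "base_image": {"version": "7.0.0"},           # ESXi build for all hosts
    "vendor_addon": {"name": "oem-addon", "version": "1.2"},
    "components": {
        "example-nic-driver": "2.3.4",            # hypothetical driver versions
        "example-storage-plugin": "5.6.7",
    },
}

def hosts_out_of_compliance(hosts, image):
    """Remediation loop: report hosts whose state drifts from the image."""
    return [h["name"] for h in hosts
            if h["esxi_version"] != image["base_image"]["version"]]

print(hosts_out_of_compliance(
    [{"name": "esx-01", "esxi_version": "6.7.0"},
     {"name": "esx-02", "esxi_version": "7.0.0"}],
    cluster_image,
))  # -> ['esx-01']
```

The design point is that the admin edits one spec and the system converges every host toward it, rather than the admin upgrading hosts one by one.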
>> So if you're doing Kubernetes on Kubernetes, I suppose you're going to have automation on automation, right? Because upgrading to the sevens is probably not an inconsequential task. >> Yes. And going forward, as we start delivering a lot of this great vSphere functionality at a more rapid clip, how do we enable our customers to take advantage of all those great things we're putting out there as well? >> Right. The next big thing you talk about is security. >> Yep. >> We just got back from RSA; thank goodness we got that show in before all the badness started. Everyone always talks about how security has to be baked in from the bottom to the top. Talk about the changes in security. >> So we've done a lot of things around security: things around identity federation, things around simplifying certificate management, dramatic simplifications there across the board. The one I want to focus on here, on the next slide, is what we call vSphere Trust Authority. With that one, we're looking at how we reduce the potential attack surfaces and really ensure there's a trusted computing base. When we talk to customers, what we find is that they're nervous about a lot of different threats, including even internal ones. How do they know all the folks that work for them can be fully trusted? Obviously, if you're hiring someone, you somewhat trust them, but how do you implement that, the concept of least privilege? >> Right, or zero trust. >> Exactly. So the idea with Trust Authority is that you can specify a small number of physical ESX hosts that you can really lock down and ensure are fully secure. Those can be managed by a special vCenter Server, which is in turn very locked down; only a few people have access to it. And then those hosts and that vCenter can manage other hosts that are untrusted, and can use attestation to actually prove that, okay, these untrusted hosts haven't been modified. We know they're okay, so they're okay to actually run workloads on, okay to put data on, and that sort of thing. >> So it's kind of a building-block approach, to ensure that businesses can have a very small trust base off of which they can build to include their entire vSphere environment. Right. And then the third leg of the stool is better leveraging a more complex hardware ecosystem: things like FPGAs and GPUs, all of the various components that power these different applications, where now the application can draw the appropriate resources as needed. You've done a lot of work here as well. >> Yeah, there's a ton of innovation happening in the hardware space. As you mentioned, all sorts of accelerators are coming out. We all know about GPUs, and obviously what they can do for machine learning and AI type use cases, not to mention 3D rendering. But there are FPGAs, and all sorts of other things coming down the pike as well.
>> And then the third leg of the stool is better leveraging a more complex asset ecosystem: things like FPGAs and GPUs, all of the various components that power these different applications, where the application can draw the appropriate resources as needed. You've done a lot of work here as well.
>> Yeah, there's a ton of innovation happening in the hardware space. As you mentioned, all sorts of accelerators are coming out. We all know about GPUs and obviously what they can do for machine learning and AI type use cases, not to mention 3D rendering, but there are FPGAs and all sorts of other things coming down the pike as well. And what we found is that as customers try to roll these out, they have a lot of the same problems that we saw in the very early days of virtualization, i.e. silos of specialized hardware that different teams were using. And what you find is all the things we found before: very low utilization rates, an inability to automate it, an inability to manage it well, to put security and compliance around it, and so forth. This is really the reality that we see at most customers, and it's funny, because you'd think, well, shouldn't we be past this as an industry? Shouldn't we have solved this already? We did this with virtualization. But as it turns out, the virtualization we did was for compute, and then storage and network, and now we really need to virtualize all these accelerators. That's where this Bitfusion technology that we're including now with vSphere really comes to the forefront. So if you see in the current slide we're showing here, the challenge is just these separate pools of infrastructure: how do you manage all that? And if we go to the next slide, what we see is that with Bitfusion, you can do the same thing that we saw with compute virtualization. You can now pool all these different silos of infrastructure together, so they become one big pool of GPUs, of infrastructure, that anyone in an organization can use. We can have multiple people sharing a GPU, we can do it very dynamically, and the great part of it is that it's really easy for these folks to use. They don't even need to think about it; in fact, it integrates seamlessly with their existing workflows.
>> That's pretty slick, because the classifications of the assets now are much larger, much more varied and much more workload-specific.
>> Right, that's really the opportunity; they're diverse.
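Here is a rough Python sketch of the pooling idea just described: one shared pool of accelerators, fractional allocations, dynamic assignment. The first-fit policy and every name in it are assumptions made up for illustration; real remote GPU sharing involves drivers, transport and isolation that this toy scheduler ignores.

```python
# Toy accelerator pool: not Bitfusion, just the sharing concept it describes.

class GPUPool:
    def __init__(self, gpus):
        # Free capacity per GPU, where 1.0 means a whole, unused device.
        self.free = {gpu: 1.0 for gpu in gpus}

    def allocate(self, job, fraction):
        """First-fit: give the job a slice of the first GPU with enough room."""
        for gpu, capacity in self.free.items():
            if capacity >= fraction:
                self.free[gpu] = round(capacity - fraction, 2)
                return gpu
        raise RuntimeError(f"no GPU has {fraction} free for {job}")

    def release(self, gpu, fraction):
        """Return a slice to the pool when the job finishes."""
        self.free[gpu] = round(self.free[gpu] + fraction, 2)

pool = GPUPool(["gpu-a", "gpu-b"])
print(pool.allocate("training-job", 0.5))   # gpu-a
print(pool.allocate("notebook", 0.5))       # gpu-a again, now fully shared
print(pool.allocate("render-job", 0.75))    # gpu-b
```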
>> Yeah, and a couple of other things. I don't have a slide on it, but there are things we're doing to our base capabilities, things around DRS and vMotion. Really massive evolutions there as well, to support a lot of these bigger workloads. So you look at some of the massive SAP HANA or Oracle databases: how do we ensure that vMotion can scale to handle those without impacting their performance or anything else? We're making DRS smarter about how it does load balancing, and so forth. So a lot of this stuff is not just the brand new, cool accelerator stuff; it's also about how we ensure the core platform that people have already been running for many years continues to keep up with the innovation and scale as well.
>> All right, so I'll give you the last word. You've been working on this for a while. There's a whole bunch of admins that have to sit and punch keys. What do you tell them? What should they be excited about? What are you excited about for them in this new release?
>> I think what I'm excited about is how IT can really be an enabler of the transformation to modern apps. I think today you look at a lot of these organizations, and what ends up happening is the app team ends up sort of building their own infrastructure on top of the IT infrastructure. And so now I think we can shift that story around. I think there's an interesting conversation that a lot of IT departments and appdev teams are going to be having over the next couple of years about how we really offload some of these infrastructure tasks from the dev team, make you more productive, and give you better performance, availability, disaster recovery and those kinds of capabilities.
>> Awesome. Well, Jared, congratulations to both of you for getting the release out. I'm sure it was a heavy lift, and it's always good to get it out in the world and let people play with it. And thanks for sharing a little bit more of a technical deep dive. I'm sure there are a ton more resources for people who want to go down into the weeds. So thanks for stopping by.
>> Thank you.
>> Thank you.
>> All right. He's Jared, he's Kit, I'm Jeff. You're watching theCUBE. We're in the Palo Alto studios. Thanks for watching, we'll see you next time.
[Music]
>> Hi, and welcome to a special CUBE Conversation. I'm Stu Miniman, and we're digging into the VMware vSphere 7 announcement. We've had conversations with some of the executives and some of the technical people, but we know that there's no better way to really understand a technology than to talk to some of the practitioners that are using it. So I'm really happy to have joining me for the program Phil Buckley-Miller, who is an infrastructure designer with British Telecom, joining me digitally from across the pond. Phil, thanks so much for joining us.
>> Nice to be here.
>> All right, so, Phil, let's start. Of course, British Telecom, I think most people know what BT is. It's a really sprawling company. Tell us a little bit about your group, your role and what's your mandate.
>> Okay. So my group is called Service Platforms. It's the bit of BT that services all of our multi-millions of customers. So we have broadband, we have TV, we have mobile, we have DNS and email systems, and so on, and it's all about our customers. It's not the B2B part of BT, if you're with me; we specifically focus on those multi-million-customer services that we've got. In particular, my group does infrastructure, so we really go from the data center all the way up to about boot time or so, just past boot time, and the application developers look after that stage and above.
>> Okay, great. We're definitely going to want to dig in and talk about that boundary between the infrastructure teams and the application teams, but let's talk a little bit first about VMware. How long has your organization been doing VMware, and tell us what you see with the announcement that VMware is making for vSphere 7.
>> Sure. Well, we've had a really great relationship with VMware for about twelve, thirteen years, something like that, and it's an absolutely key part of our infrastructure. It's woven throughout BT, really, in every part of our operations, design and development, and the whole ethos of the company is based around a lot of VMware products. And so one of the challenges that we've got right now is that application architectures are changing quite significantly at the moment, as you know, in particular with serverless and with containers and a whole bunch of other things like that. We're very comfortable with our ability to manage VMs, and have been for a while. We currently use, extensively, vSphere, NSX-T, vROps, Log Insight, Network Insight and a whole bunch of other VMware constellation applications, and our operations teams know how to use them. They know how to optimize, they know how to capacity plan and troubleshoot. So that's great, and it's been like that for half a decade at least. We've been really, really confident in our ability to deal with VMware environments. And along came containers, and, like I say, multi-cloud as well, and what we were struggling with was the inability to have a single pane of glass, really, on all of that, and to use the same people and the same processes to manage the different kinds of technology.
So we've been working pretty closely with VMware on a number of different containerization products for several years now. I worked really closely with the vSphere Integrated Containers guys in particular, and now with the Project Pacific guys, with really the idea that when we bring in version 7 and the containerization aspects of version 7, we'll be in a position to have that single pane of glass, to allow our operations team to barely differentiate between what's a VM and what's a container. That's really the holy grail, right? So we'll be able to allow our developers to develop, our operations team to deploy and operate, and our designers to see the same infrastructure, whether that's on premises, in the cloud, or off premises, and be able to manage the whole piece that way.
>> Okay. So, Phil, really interesting things you walked through here. You've been using containers in a virtualized environment for a number of years. I want to understand the organizational piece just a little bit, because it sounds like you manage all of the environment, but containers are a little bit different than VMs. If I think back, from an application standpoint, it was, let's stick it in a VM, I don't need to change it, and once I spin up a VM, often that's going to sit there for months, if not years. As opposed to a containerization environment, where it's, I really want a pool of resources, I'm going to create and destroy things all the time. So bring us inside that organizational piece. How much more interaction, or change in policies, will there need to be between your infrastructure team and your app dev team?
>> Well, yes, you're absolutely right. The nature and the timescales that we're talking about between VMs and containers are wildly different. As you say, we certainly have VMs in place now that were in place in 2018, certainly, and I imagine some that haven't really been touched since. Whereas with containers, as you say, a lot of people talk about spinning them up all the time. There are parts of our architecture that require that, in particular the very client-facing, bursty stuff; it does require spinning up and spinning down pretty quickly. But some of our smaller containers do sit around for weeks, if not months; it really just depends on the development cycle aspects. But the hard bit that we've really had was just visualizing it all. There are a number of different products out there that allow you to see the behavior of your containers, and understand the resource requirements that they have at any given moment, and allow you to troubleshoot and so on, but they are new products, new things that we would have to get used to, and it also seems that there's an awful lot of competing products, quite a Venn diagram in terms of functionality and usability. So again, coming back to being able to manage through vSphere, to be able to have a list of VMs and, alongside it, a list of containers, and to be able to use policies to define how they behave in terms of their networking, to be able to essentially put our deployments on rails by using, in particular, tag-based policies, means that we can take the onus of security, performance management and capacity management away from the developers, who don't really care about that a lot of the time, and they can just get on with their job, which is to develop new functionality and help our customers.
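A toy model of those tag-based policies, assuming a made-up policy table: this is not the vSphere or NSX policy API, only an illustration of how tagging lets operations own placement and network rules while developers simply tag their workloads.

```python
# Illustrative only: operations defines the policy table, developers just tag.

POLICIES = {
    "data-protected": {"placement": "on-premises", "network": "isolated"},
    "bursty":         {"placement": "public-cloud", "network": "standard"},
    "internal":       {"placement": "on-premises", "network": "standard"},
}

def resolve(workload, tags):
    """Merge the policies for a workload's tags; restrictive settings win."""
    merged = {"placement": "public-cloud", "network": "standard"}
    for tag in tags:
        policy = POLICIES[tag]
        if policy["placement"] == "on-premises":
            merged["placement"] = "on-premises"
        if policy["network"] == "isolated":
            merged["network"] = "isolated"
    return workload, merged

print(resolve("order-journey", ["bursty"]))
print(resolve("billing", ["data-protected", "internal"]))
```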
So that then means that we have to be really responsible about defining those policies and making sure that they're adhered to. But again, we know how to do that with VMs in vSphere, so the fact that we can apply that straight away, just to a slightly different compute unit, which is really what we're talking about here, is ideal. And then to be able to extend that into multiple clouds as well, because we do use multiple clouds, we're AWS and Azure customers, and moving between them is an opportunity that we can't help but be excited about.
>> Yeah, Phil, I really like how you described the changing roles that are happening there in your organization. We need to understand, right, there are things that developers care about, they want to move fast, they want to be able to build new things, and there are things that they shouldn't have to worry about. And we talked about some of this new world, and it's like, oh, can the platform underneath this take care of it? Well, there are some things the platform takes care of, and there are some things that the software, or your team, is going to need to understand. So maybe if you could dig in a little bit: what are the drivers from your application portfolio? What is the business asking of your organization that's driving this change, and being one of those tailwinds pushing you towards Kubernetes and the vSphere 7 technologies?
>> Well, it all comes down to the customers, right? Our customers want new functionality. They want new integrations, they want new content, and they want better stability and better performance, and our ability to extend or contract capacity as needed as well. They're the real driver; ultimately, we want to give our customers the best possible experience of our products and services, so we have to address that. Really, from a development perspective, it's our developers that have the responsibility to design and deploy those services, so in infrastructure, we have to act as a firm foundation underneath all of that, one that allows them to know that what they spend their time developing, and want to push out to our customers, is something that can be trusted, is performant, that we understand where their capacity requirements are coming from in the short term and the long term, and that it's secure as well, obviously; that's a big aspect of it. So really, we're just providing our developers with the best possible chance of giving our customers what will hopefully make them delighted.
>> Great. Phil, you've mentioned a couple of times that you're using public clouds as well as your VMware farm. I want to make sure I understand a couple of things. Number one, when it comes to your team, especially your infrastructure team, how much are they involved with setting up some of the basic pieces, or managing things like performance, in the public cloud? And secondly, when you look at your applications, are some of your applications hybrid, going between the data center and the public cloud? I haven't talked to too many customers that are doing applications that just live in any cloud and move things around. So maybe if you could clarify those pieces, as to what cloud really means to your organization and your applications.
>> Sure. Well, to us, cloud allows us to accelerate development, which is nice.
It means we don't have to do on-premises capacity lifts for new pieces of functionality; we can initially build in the cloud and test in the cloud. But very often, applications really make better sense running inside our organization, especially in the TV environment, where people watch TV all the time. I mean, yes, there are peak hours and lighter hours of TV watching, and the same goes for broadband, really, but generally we're well more than an eight-hour application profile. So what that allows us to do, then, is, where it makes sense, we run them inside our organization, and where we have to run them in our organization, for data protection reasons or whatever, we can do that as well. But where, say, for instance, we have a boxing match on, and we're going to see an enormous spike in the number of customers that want to sign up through our order journey to gain access to that, well, why would you spend a lot of money on servers just for that level of additional capacity? So we do absolutely have hybrid applications, sorry, hybrid blocks. We have blocks of separate applications, dozens of them, really, to support our platform, and what you would see is that if you were to look at our full application structure for one of the platforms, as I mentioned, some of those application blocks have to run inside, and some can run outside. What we want to be able to do is to allow our operations team to define that, again by policy, as to where they run, and to have a system that allows us to transparently see where they're running, how they're running and the implications of those decisions, so that we can tune those, maybe, in the future as well. And that way, we best serve our customers; we get to give our customers what they need.
>> All right, great. Phil, final question I have for you. You've been through a few iterations of looking at VMs, containers and public cloud. What advice would you give your peers, with the announcement of vSphere 7, about how they can look at things today, in 2020, versus what they might have looked at, say, a year or two ago?
>> Well, I'll be honest, I was a little bit surprised by vSphere 7. We knew that VMware were working on trying to make containers on the same level, both from a management and deployment perspective, as VMs. I mean, they're called VMware, after all; we knew that they were looking at it. It's no surprise. But just quite how quickly they've managed to almost completely reinvent the application, really, if you look at the whole Tanzu stuff and the Mission Control stuff, I think a lot of people were blown away by just how happy VMware were to reinvent themselves from an application perspective, and to really leap forward. And this is just between version six and seven. I've been following these since version three, at least, and it's an absolutely revolutionary change in terms of the overall architecture and the aims of what they want to achieve with the application. And luckily, the nice thing is that if you're used to version six, it's not that big a deal to move forward at all; it's not such a big change to process and training and things like that. But my word, there's an awful lot of work underneath the covers, and I'm really excited. I think other people in my position should really just take it as an opportunity to revisit what they can achieve, in particular with vSphere, and in combination with NSX-T.
It's quite hard to appreciate, unless you've seen the slides about it and unless you've seen the product, just how revolutionary version 7 is compared to previous revisions, which have kind of evolved over a couple of years. So, yeah, I'm really excited to run it, and I know a lot of my peers at other companies that I speak with quite often are very excited about seven as well. So, yeah, I'm really excited about the whole thing.
>> Well, Phil, thank you so much. Absolutely, no doubt, this is a huge move for VMware, the entire company and their ecosystem rallying around to help move to the next phase of where application developers and infrastructure need to go. Phil Buckley, joining us from British Telecom. I'm Stu Miniman. Thank you so much for watching theCUBE.
Vertica Big Data Conference Keynote
>> Joy: Welcome to the Virtual Big Data Conference. Vertica is so excited to host this event. I'm Joy King, and I'll be your host for today's Big Data Conference Keynote Session. It's my honor and my genuine pleasure to lead Vertica's product and go-to-market strategy. And I'm so lucky to have a passionate and committed team who turned our Vertica BDC event into a virtual event in a very short amount of time. I want to thank the thousands of people (and yes, that's our true number) who have registered to attend this virtual event. We were determined to balance your health, safety and your peace of mind with the excitement of the Vertica BDC. This is a very unique event, because, as I hope you all know, we focus on engineering and architecture, best practice sharing and customer stories that will educate and inspire everyone. I also want to thank our top sponsors for the virtual BDC, Arrow and Pure Storage. Our partnerships are so important to us and to everyone in the audience, because together, we get things done faster and better. Now, for today's keynote, you'll hear from three very important and energizing speakers. First, Colin Mahony, our SVP and General Manager for Vertica, will talk about the market trends that Vertica is betting on to win for our customers, and he'll share the exciting news about our Vertica 10 announcement and how it will benefit our customers. Then you'll hear from Amy Fowler, VP of Strategy and Solutions for FlashBlade at Pure Storage. Our partnership with Pure Storage is truly unique in the industry, because together, modern infrastructure from Pure powers modern analytics from Vertica. And then you'll hear from John Yovanovich, Director of IT at AT&T, who will tell you about the Pure Vertica Symphony that plays live every day at AT&T. Here we go, Colin, over to you.
If I were to summarize these three areas, this really is the core focus for us right now. We know that there's massive data growth. And if we can unify the data silos so that people can really take advantage of that data, we can make a huge difference. We know that public clouds offer tremendous advantages, but we also know that balance and flexibility is critical. And we all need the benefit that machine learning for all the types up to the end data science. We all need the benefits that they can bring to every single use case, but only if it can really be operationalized at scale, accurate and in real time. And the power of Vertica is, of course, how we're able to bring so many of these things together. Let me talk a little bit more about some of these trends. So one of the first industry trends that we've all been following probably now for over the last decade, is Hadoop and specifically HDFS. So many companies have invested, time, money, more importantly, people in leveraging the opportunity that HDFS brought to the market. HDFS is really part of a much broader storage disruption that we'll talk a little bit more about, more broadly than HDFS. But HDFS itself was really designed for petabytes of data, leveraging low cost commodity hardware and the ability to capture a wide variety of data formats, from a wide variety of data sources and applications. And I think what people really wanted, was to store that data before having to define exactly what structures they should go into. So over the last decade or so, the focus for most organizations is figuring out how to capture, store and frankly manage that data. And as a platform to do that, I think, Hadoop was pretty good. It certainly changed the way that a lot of enterprises think about their data and where it's locked up. In parallel with Hadoop, particularly over the last five years, Cloud Object Storage has also given every organization another option for collecting, storing and managing even more data. That has led to a huge growth in data storage, obviously, up on public clouds like Amazon and their S3, Google Cloud Storage and Azure Blob Storage just to name a few. And then when you consider regional and local object storage offered by cloud vendors all over the world, the explosion of that data, in leveraging this type of object storage is very real. And I think, as I mentioned, it's just part of this broader storage disruption that's been going on. But with all this growth in the data, in all these new places to put this data, every organization we talk to is facing even more challenges now around the data silo. Sure the data silos certainly getting bigger. And hopefully they're getting cheaper per bit. But as I said, the focus has really been on collecting, storing and managing the data. But between the new data lakes and many different cloud object storage combined with all sorts of data types from the complexity of managing all this, getting that business value has been very limited. This actually takes me to big bet number one for Team Vertica, which is to unify the data. Our goal, and some of the announcements we have made today plus roadmap announcements I'll share with you throughout this presentation. Our goal is to ensure that all the time, money and effort that has gone into storing that data, all the data turns into business value. So how are we going to do that? 
This is something that Vertica has always been committed to, and you'll see in some of our announcements today, we're just doubling down on that commitment. Let's talk a little bit more about the public cloud. This is certainly the second trend. It's the second wave, maybe, of data disruption, with object storage. And there are a lot of advantages when it comes to the public cloud. There's no question that the public clouds give rapid access to compute and storage, with the added benefit of eliminating the data center maintenance that so many companies want to get out of themselves. But maybe the biggest advantage that I see is the architectural innovation. The public clouds have introduced so many methodologies around how to provision quickly, separating compute and storage and really dialing in the exact needs on demand, as you change workloads. When public clouds began, it made a lot of sense for the cloud providers and their customers to charge and pay for compute and storage in the ratio that each use case demanded. And I think you're seeing that trend proliferate all over the place, not just up in the public cloud. That architecture itself is really becoming the next generation architecture for on-premise data centers as well. But there are a lot of concerns. I think we're all aware of them; they're out there. Many times, for different workloads, there are higher costs, especially for some of the workloads that are being run through analytics, which tend to run all the time. Just like some of the silo challenges that companies are facing with HDFS, data lakes and cloud storage, the public clouds have similar types of siloed challenges as well. Initially, there was a belief that they were cheaper than data centers, and when you added in all the costs, it looked that way. And again, for certain elastic workloads, that is the case. I don't think that's true across the board overall, even to the point where a lot of the cloud vendors aren't just charging lower costs anymore. We hear from a lot of customers that they don't really want to tether themselves to any one cloud because of some of those uncertainties. Of course, security and privacy are a concern. We hear a lot of concerns with regards to cloud, and even some SaaS vendors, around shared data catalogs across all the customers and not enough separation. But security concerns are out there; you can read about them. I'm not going to jump on that bandwagon, but we hear about them. And then, of course, I think one of the things we hear the most from our customers is that each cloud stack is starting to feel even more locked in than the traditional data warehouse appliance. And as everybody knows, the industry has been running away from appliances as fast as it can, so they're not eager to get locked into another, quote unquote, virtual appliance, if you will, up in the cloud. They really want to make sure they have flexibility in which clouds they go to today, tomorrow and in the future. And frankly, we hear from a lot of our customers that they're very interested in eventually mixing and matching compute from one cloud with, say, storage from another cloud, which I think is something that we'll hear a lot more about.
And so for us, that's why we've got our big bet number two: we love the cloud. We love the public cloud. We love the private clouds on-premise, and other hosting providers. But our passion and commitment is for Vertica to be able to run in any of the clouds that our customers choose, and to make it portable across those clouds. We have supported on-premises and all public clouds for years. And today, we have announced even more support for Vertica in Eon Mode, the deployment option that leverages the separation of compute from storage, with even more deployment choices, which I'm going to touch more on as we go. So, super excited about our big bet number two. And finally, as I mentioned, for all the hype that there is around machine learning, I actually think that, most importantly, this third trend that Team Vertica is determined to address is the need to bring business-critical analytics, machine learning and data science projects into production. For so many years, there just wasn't enough data available to justify the investment in machine learning. Also, processing power was expensive, and storage was prohibitively expensive, so to train and score and evaluate all the different models to unlock the full power of predictive analytics was tough. Today, you have those massive data volumes. You have the relatively cheap processing power and storage to make that dream a reality. And if you think about this, with all the data that's available to every company, the real need is to operationalize the speed and the scale of machine learning so that these organizations can actually take advantage of it where they need to. I mean, we've seen this for years with Vertica, going back to some of the most advanced gaming companies in the early days; they were incorporating this with live data directly into their gaming experiences. Well, every organization wants to do that now. And accuracy, repeatability and real-time action are all key to separating the leaders from the rest of the pack in every industry when it comes to machine learning. But if you look at a lot of these projects, the reality is that there's a ton of buzz, there's a ton of hype spanning every acronym that you can imagine, but most companies are struggling, due to separate teams, different tools, silos and the limitations that many platforms are facing: driving down-sampling to get a small subset of the data to try to create a model that then doesn't apply, or compromising accuracy and making it virtually impossible to replicate models and understand decisions. And if there's one thing that we've learned when it comes to data, it's the value of prescriptive data at the atomic level: being able to show an N of one, as we refer to it, meaning individually tailored data. No matter what it is, healthcare, entertainment experiences like gaming or others, being able to get at the granular data and make these decisions, and make that scoring work, applies to machine learning just as much as it applies to giving somebody a next-best-offer. The opportunity has never been greater. The need to integrate this end-to-end workflow and support the right tools without compromising on that accuracy, think about it as no down-sampling, using all the data, really is key to machine learning success. Which should be no surprise, then, why the third big bet from Vertica is one that we've actually been working on for years. And we're so proud to be where we are today, helping the data disruptors across the world operationalize machine learning.
This big bet has the potential to truly unlock the promise of machine learning. And today, we're announcing some very important new capabilities specifically focused on unifying the work being done by the data science community, with their preferred tools and platforms, and the volume of data and performance at scale available in Vertica. Our strategy has been very consistent over the last several years. As I said in the beginning, we haven't deviated from our strategy. Of course, there are always things that we add. Most of the time, it's customer driven; it's based on what our customers are asking us to do. But I think we've also done a great job not trying to be all things to all people. Especially as these hype cycles flare up around us, we absolutely love participating in these different areas without getting completely distracted. I mean, there's a variety of query tools and data warehouses and analytics platforms in the market. We all know that. There are tools and platforms that are offered by the public cloud vendors, and by other vendors that support one or two specific clouds. There are appliance vendors, who I was referring to earlier, who can deliver packaged data warehouse offerings for private data centers. And there's a ton of popular machine learning tools, languages and other kits. But Vertica is the only advanced analytics platform that can do all this, that can bring it together. We can analyze the data wherever it is: in HDFS, in S3 object storage, or in Vertica itself. Natively, we support multiple clouds and on-premise deployments. And maybe most importantly, we offer that choice of deployment modes to allow our customers to choose the architecture that works for them right now, while still giving them the option to change, move and evolve over time. And Vertica is the only analytics database with end-to-end machine learning that can truly operationalize ML at scale. I know it's a mouthful, but it is not easy to do all these things. It is one of the things that highly differentiates Vertica from the rest of the pack. It is also why our customers, all of you, continue to bet on us and see the value that we are delivering and will continue to deliver. Here are a couple of examples of some of our customers who are powered by Vertica. It's the scale of data. It's the millisecond response times. Performance and scale have always been a huge part of what we are about, though not the only thing; I think the functionality, all the capabilities that we add to the platform, the ease of use and the flexibility, obviously with the deployment, matter just as much. But look at some of the numbers under these customers on this slide. I've shared a lot of different stories about these customers, and, by the way, it still amazes me every time I talk to one and get the updates; you can see the power and the difference that Vertica is making. Equally important, if you look at a lot of these customers, they are the epitome of being able to deploy Vertica in a lot of different environments. Many of the customers on this slide are not using Vertica just on-premise or just in the cloud. They're using it in a hybrid way. They're using it in multiple different clouds. And again, we've been with them on that journey throughout, which is what has made this product and, frankly, our roadmap and our vision exactly what they are. It's been quite a journey, and that journey continues now with the Vertica 10 release. The Vertica 10 release is obviously a massive release for us.
But if you look back, you can see that we're building on that native columnar architecture that started a long time ago, obviously, with the C-Store paper. We built it to leverage commodity hardware, because it was an architecture that was never tightly integrated with any specific underlying infrastructure. I still remember hearing the initial pitch from Mike Stonebraker about the vision of Vertica as a software-only solution, and the importance of separating the company from hardware innovation. At the time, Mike basically said to me, "There's so much R&D and innovation that's going to happen in hardware, we shouldn't bake hardware into our solution. We should do it in software, and we'll be able to take advantage of that hardware." And that is exactly what has happened. But one of the most recent innovations that we embraced with hardware is certainly that separation of compute and storage. As I said previously, the public cloud providers offered this next generation architecture really to ensure that they could provide customers exactly what they needed, more compute or more storage, and charge for each, respectively. The separation of compute from storage is a major milestone in data center architectures. If you think about it, it's really not only a public cloud innovation, though. It fundamentally redefines the next generation data architecture for on-premise and for pretty much every way people are thinking about computing today. And that goes for software, too. Object storage is an example of a cost-effective means for storing data. And even more importantly, separating compute from storage for analytic workloads has a lot of advantages, including the opportunity to manage much more dynamic, flexible workloads and, more importantly, to truly isolate those workloads from others. And by the way, once you have something that can truly isolate workloads, then you can have the conversations around autonomic computing, around setting up some nodes, some compute resources on the data, that won't affect any of the other workloads, to do some things on their own, maybe some self-analytics by the system, etc. A lot of things that many of you know we've already been exploring in terms of our own system data in the product. But it was May 2018, believe it or not, it seems like a long time ago, when we first announced Eon Mode. And I want to make something very clear about Eon Mode: it's a mode, a deployment option for Vertica customers. And I think this is another huge benefit that we don't talk about enough. Unlike a lot of vendors in the market, who will ding you and charge you for every single add-on, you name it, you get this with the Vertica product. If you continue to pay support and maintenance, this comes with the upgrade; it comes as part of the new release. So any customer who owns or buys Vertica has the ability to set up either Enterprise Mode or Eon Mode, which is a question I know comes up sometimes. Our first announcement of Eon was obviously for AWS customers, including The Trade Desk and AT&T, most of whom will be speaking here later at the Virtual Big Data Conference. They saw a huge opportunity. Eon Mode not only allowed Vertica to scale elastically, with the specific compute and storage that was needed, but it really dramatically simplified database operations, including things like workload balancing, node recovery, compute provisioning, etc.
So one of the most popular functions is that ability to isolate the workloads and really allocate those resources without negatively affecting others. And even though traditional data warehouses, including Vertica in Enterprise Mode, have been able to do lots of different kinds of workload isolation, it's never been as strong as in Eon Mode. Well, it certainly didn't take long for our customers to see that value across the board with Eon Mode, and not just up in the cloud. In partnership with one of our most valued partners, and a platinum sponsor here, as Joy mentioned at the beginning, we announced Vertica in Eon Mode for Pure Storage FlashBlade in September 2019. And again, just to be clear, this is not a new product. It's one Vertica, with yet more deployment options. With Pure Storage, Vertica in Eon Mode is not limited in any way by variable cloud network latency. The performance is actually amazing when you take the benefits of separating compute from storage and you run it with a Pure environment on-premise. Vertica in Eon Mode has a super smart cache layer that we call the depot. It's a big part of our secret sauce around Eon Mode. And combined with the power and performance of Pure's FlashBlade, Vertica became the industry's first advanced analytics platform that actually separates compute and storage for on-premises data centers, something that a lot of our customers are already benefiting from, and we're super excited about it. But as I said, this is a journey. We don't stop; we're not going to stop. Our customers need the flexibility of multiple public clouds. So today, with Vertica 10, we're super proud and excited to announce support for Vertica in Eon Mode on Google Cloud. This gives our customers the ability to use their Vertica licenses on Amazon AWS, on-premise with Pure Storage, and on Google Cloud. Now, we were talking about HDFS, and a lot of our customers who have invested quite a bit in HDFS, especially as a place to store data, have been pushing us to support Eon Mode with HDFS. So as part of Vertica 10, we are also announcing support for Vertica in Eon Mode using HDFS as the communal storage. Vertica's own ROS format data can be stored in HDFS, and the full functionality of Vertica, its complete analytics, geospatial, pattern matching, time series, machine learning, everything that we have in there, can be applied to that data. And on the same HDFS nodes, Vertica can also analyze data in ORC or Parquet format, using external tables. We can even execute joins between the ROS data and the data the external tables hold, which powers a much more comprehensive view. So again, it's that flexibility, to be able to support our customers wherever they need us to support them, on whatever platform they have. Vertica 10 gives us a lot more ways that we can deploy Eon Mode in various environments for our customers. It allows them to take advantage of Vertica in Eon Mode, and the power that it brings with that separation, with that workload isolation, on whichever platform they are most comfortable with.
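Here is a minimal sketch of that kind of join, again with vertica-python. The paths, schemas and table names are placeholders, and the exact syntax should be verified against the Vertica 10 documentation.

```python
# Sketch: join Vertica-managed (ROS) data against an external table over
# Parquet files already sitting in HDFS, in a single statement.
import vertica_python

with vertica_python.connect(host="vertica.example.com", port=5433,
                            user="dbadmin", password="...",
                            database="analytics") as conn:
    cur = conn.cursor()
    cur.execute("""
        CREATE EXTERNAL TABLE hdfs_clicks (
            user_id INT,
            ts      TIMESTAMP
        ) AS COPY FROM 'hdfs:///data/clicks/*.parquet' PARQUET
    """)
    cur.execute("""
        SELECT u.segment, COUNT(*) AS clicks
        FROM users u                        -- native Vertica table
        JOIN hdfs_clicks c ON c.user_id = u.user_id
        GROUP BY u.segment
    """)
    print(cur.fetchall())
```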
Now, there's a lot that has come in Vertica 10, and I'm definitely not going to be able to cover everything, but we also introduced complex data types, as an example. And complex data types fit very well into Eon and into this separation. They significantly simplify the data pipeline and reduce the cost of moving data between systems, they give much better support for unstructured data, which a lot of our customers have mixed with structured data, of course, and they leverage a lot of the columnar execution that Vertica provides. So you get complex data types in Vertica now, a lot more data, stronger performance. It goes great with the announcement that we made around the broader Eon Mode. Let's talk a little bit more about machine learning. We've actually been doing work in and around machine learning, with various regressions and a whole bunch of other algorithms, for several years. We saw the huge advantage that MPP offered, not just as a SQL engine, as a database, but for ML as well. It didn't take long to realize that there's a lot more to operationalizing machine learning than just those algorithms. It's data preparation, it's model training, it's the scoring, the shaping, the evaluation. That is so much of what machine learning and, frankly, data science is about. You know, everybody always wants to jump to the sexy algorithm, and we handle those tasks very, very well. It makes Vertica a terrific platform to do that. But a lot of work in data science and machine learning is done in other tools. I mentioned that there are just so many tools out there, and we want people to be able to take advantage of all of them. We never believed we were going to be the best algorithm company, or come up with the best models for people to use. So with Vertica 10, we support PMML. We can now import and export PMML models. It's a huge step for us in operationalizing machine learning projects for our customers, allowing models to be built outside of Vertica, yet imported in and then applied to that full scale of data, with all the performance that you would expect from Vertica. We are also integrating more tightly with Python. As many of you know, we've been doing a lot of open source projects with the community, driven by many of our customers, like Uber. And now, with Python, we've integrated with TensorFlow, allowing data scientists to build models in their preferred language, to take advantage of TensorFlow, but again, to store and deploy those models at scale with Vertica. I think both these announcements are proof of our big bet number three, and really our commitment to supporting innovation throughout the community by operationalizing ML with the accuracy, performance and scale of Vertica for our customers.
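To make the PMML workflow concrete, here is a hedged sketch: a model trained elsewhere is imported into Vertica and scored in place. The file path, model name and columns are placeholders, and while IMPORT_MODELS and PREDICT_PMML reflect the Vertica 10 machine learning functions as announced, the exact parameters should be checked in the documentation.

```python
# Sketch of import-then-score: no sampling, no data movement.
import vertica_python

with vertica_python.connect(host="vertica.example.com", port=5433,
                            user="dbadmin", password="...",
                            database="analytics") as conn:
    cur = conn.cursor()
    # Bring in a PMML model built in an outside tool (scikit-learn, Spark, ...).
    cur.execute("""
        SELECT IMPORT_MODELS('/models/churn_logistic.pmml'
                             USING PARAMETERS category='PMML')
    """)
    # Score every row where the data lives.
    cur.execute("""
        SELECT customer_id,
               PREDICT_PMML(tenure, monthly_spend
                            USING PARAMETERS model_name='churn_logistic')
        FROM customers
    """)
    print(cur.fetchmany(5))
```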
Again, there are a lot of steps when it comes to the workflow of machine learning. These are some of them that you can see on the slide, and it's definitely not linear, either. We see this as a circle, and companies that do it well just continue to learn: they continue to rescore, they continue to redeploy, and they want to operationalize all of that within a single platform that can take advantage of all those capabilities. And that is the platform, with a very robust ecosystem, that Vertica has always been committed to as an organization, and will continue to be. This graphic, many of you have seen it evolve over the years. Frankly, if we put everything and everyone on here, it wouldn't fit on a slide. But it will absolutely continue to evolve and grow as we support our customers where they need the support most. So, again: being able to deploy everywhere, being able to take advantage of Vertica, not just as a business analyst or a business user, but as a data scientist or as an operational or BI person. We want Vertica to be leveraged and used by the broader organization. So I think it's fair to say, and I encourage everybody to learn more about Vertica 10, because I'm just highlighting some of the bigger aspects of it, that we've delivered on those three market trends: the need to unify the silos, the need for hybrid and multiple cloud deployment options, and the need to operationalize business-critical machine learning projects. Vertica 10 has absolutely delivered on those. But again, we are not going to stop. It is our job not to, and this is how Team Vertica thrives. I always joke that the next release is the best release, and, of course, even after Vertica 10, that is also true, although Vertica 10 is pretty awesome. From the first line of code, we've always been focused on performance and scale, right? And like any really strong data platform, the optimizer and the execution engine are the two core pieces of that. Beyond Vertica 10, some of the big things that we're already working on include the next generation execution engine, and we're already actually seeing incredible early performance from it. This is just one example of how important it is for an organization like Vertica to constantly go back and re-innovate. Every single release, we do the sit-ups and crunches on our performance and scale. How do we improve? There are so many parts of the core server, and so many parts of our broader ecosystem; we are constantly looking at how we can go back to all the code lines that we have and make them better in the current environment. And it's not an easy thing to do when you're doing that while also expanding into new environments to take advantage of the different deployments, which is a great segue to this slide. Because if you think about today, we're obviously already available with Eon Mode on Amazon AWS, on Pure, and actually on MinIO as well. As I talked about, in Vertica 10 we're adding Google and HDFS, and coming next, obviously, Microsoft Azure and Alibaba Cloud. So being able to expand into more of these environments is really important for the Vertica team and how we go forward. And it's not just about running in these clouds. For us, we want it to be a SaaS-like experience in all these clouds. We want you to be able to deploy Vertica in 15 minutes or less on these clouds. You can also consume Vertica in a lot of different ways on these clouds, as an example, in Amazon, Vertica by the Hour. So for us, it's not just about running; it's about taking advantage of the ecosystems that all these cloud providers offer, and really optimizing the Vertica experience as part of them: optimization around automation, around self-service capabilities, extending our management console. We now have products like the Vertica Advisor Tool, which our Customer Success Team has created to actually use our own smarts in Vertica to take data that customers give to us and help them automatically tune their environment. You can imagine that we're taking that to the next level, in a lot of different endeavors that we're pursuing, around how Vertica as a product can actually be smarter, because we all know that simplicity is key. There just aren't enough people in the world who are good at managing data and taking it to the next level. And of course, there are other things that we all hear about, whether it's Kubernetes and containerization; you can imagine that that probably works very well with Eon Mode and separating compute and storage. But innovation happens everywhere. We innovate around our community documentation. Many of you have taken advantage of the Vertica Academy, and the numbers there are through the roof in terms of the number of people coming in and certifying on it.
So there are a lot of things within the core product, and a lot of activity and action beyond the core product, that we're taking advantage of. And let's not forget why we're here, right? It's easy to talk about a platform, a data platform; it's easy to jump into all the functionality, the analytics, the flexibility, how we can offer it. But at the end of the day, somebody, a person, she's got to take advantage of this data. She's got to be able to take this data and use this information to make a critical business decision. And that doesn't happen unless we explore lots of different and, frankly, new ways to get that predictive analytics UI and interface, beyond just the standard BI tools, in front of her at the right time. And so there's a lot of activity, I'll tease you with that, going on in this organization right now about how we can do that and deliver it for our customers. We're in a great position, being able to see exactly how this data is consumed and used, to start with this core platform that we have and go out from there. Look, I know the plan wasn't to do this as a virtual BDC, but I really appreciate you tuning in. I really appreciate your support. I think if there's any silver lining to us maybe not being able to do this in person, it's the fact that the reach has actually gone significantly higher than what we would have been able to do in person in Boston. We're certainly looking forward to doing a Big Data Conference in the future. But if I could leave you with anything, know this: since that first release of Vertica, and our very first customers, we have been very consistent. We respect all the innovation around us, whether it's open source or not. We understand the market trends. We embrace those new ideas and technologies. And for us, true north, the most important thing, is: what does our customer need to do? What problem are they trying to solve? And how do we use the advantages that we have, without disrupting our customers, knowing that you depend on us to deliver that unified analytics strategy? It will deliver that performance and scale, not only today, but tomorrow and for years to come. We've added a lot of great features to Vertica. I think we've said no to a lot of things, frankly, that we just knew we wouldn't be the best company to deliver on. When we say we're going to do things, we do them. Vertica 10 is a perfect example of so many of those things that we, from you, our customers, have heard loud and clear, and we have delivered. I am incredibly proud of this team across the board. I think the culture of Vertica, a customer-first culture, jumping in to help our customers win no matter what, is also something that sets us massively apart. I hear horror stories about support experiences with other organizations, and people always seem to be amazed at Team Vertica's willingness to jump in, their aptitude for certain technical capabilities, and their understanding of the business. And I think sometimes we take that for granted, but that is the team that we have as Team Vertica. We are incredibly excited about Vertica 10. I think you're going to love the Virtual Big Data Conference this year, and I encourage you to tune in. Maybe one other benefit is, I know some people were worried about not being able to see different sessions because they were going to overlap with each other. Well, now, even if you can't attend live, you'll be able to watch those sessions on demand. Please enjoy the Vertica Big Data Conference here in 2020.
Please, you and your families and your co-workers, be safe during these times. I know we will get through it, and analytics is probably going to help with a lot of that; we already know it is helping in many different ways. So believe in the data, believe in data's ability to change the world for the better, and thank you for your time. And with that, I am delighted to now introduce Micro Focus CEO Stephen Murdoch to the Vertica Big Data Virtual Conference. Thank you, Stephen. >> Stephen: Hi, everyone. My name is Stephen Murdoch, and I have the pleasure and privilege of being the Chief Executive Officer here at Micro Focus. Please let me add my welcome to the Big Data Conference, and also my thanks for your support as we've had to pivot to this being a virtual rather than a physical conference. It's amazing how quickly we all reset to a new normal; I certainly didn't expect to be addressing you from my study. Vertica is an incredibly important part of the Micro Focus family. It's key to our goal of trying to enable and help customers become much more data-driven across all of their IT operations. Vertica 10 is a huge step forward, we believe. It allows for multi-cloud innovation and genuinely hybrid deployments, lets you begin to leverage machine learning properly in the enterprise, and also offers the opportunity to unify currently siloed lakes of information. We operate in a very noisy, very competitive market, and there are people in that market who can do some of those things. The reason we are so excited about Vertica is that we genuinely believe we are the best at doing all of those things. And that's why we've announced publicly, and are executing internally, incremental investment into Vertica. That investment is targeted at accelerating the roadmaps that already exist, and getting that innovation into your hands faster. This idea of speed is key. It's not a question of if companies have to become data-driven organizations, it's a question of when, so that speed now is really important. And that's why we believe that the Big Data Conference gives you a great opportunity to accelerate your own plans. You will have the opportunity to talk to some of our best architects, some of the best development brains that we have. But more importantly, you'll also get to hear from some of our phenomenal customers. You'll hear from Uber, from The Trade Desk, from Philips, and from AT&T, as well as many, many others. And just hearing how those customers are using the power of Vertica to accelerate their own organizations, I think, is the highlight, and I encourage you to use this opportunity to its full. Let me close by again saying thank you. We genuinely hope that you get as much from this virtual conference as you could have from a physical conference, and we look forward to your engagement, and we look forward to hearing your feedback. With that, thank you very much. >> Joy: Thank you so much, Stephen, for joining us for the Vertica Big Data Conference. Your support and enthusiasm for Vertica are so clear, and it makes a big difference. Now, I'm delighted to introduce Amy Fowler, the VP of Strategy and Solutions for FlashBlade at Pure Storage, which is one of our BDC platinum sponsors and one of our most valued partners. It was a proud moment for me when we announced Vertica in Eon Mode for Pure Storage FlashBlade, and we became the first analytics data warehouse that separates compute from storage for on-premise data centers. Thank you so much, Amy, for joining us. Let's get started.
>> Amy: Well, thank you, Joy, so much for having us. And thank you all for joining us today, virtually, as we all may be. So, as we just heard from Colin Mahony, there are some really interesting trends happening right now in the big data analytics market: the end of the Hadoop hype cycle, the new cloud reality, and even the opportunity to help the many data science and machine learning projects move from labs to production. So let's talk about these trends in the context of infrastructure, and in particular, look at why a modern storage platform is relevant as organizations take on the challenges and opportunities associated with these trends. The first is that the Hadoop hype cycle left a lot of data in HDFS data lakes, or reservoirs, or swamps, depending upon the level of data hygiene, but without the ability to get the value that was promised from Hadoop as a platform rather than a distributed file store. And when we combine that data with the massive volume of data in cloud object storage, we find ourselves with a lot of data and a lot of silos, but without a way to unify that data and find value in it. Now, when you look at the infrastructure data lakes are traditionally built on, it is often direct-attached storage, or DAS. The approach that Hadoop took when it entered the market was primarily bound by the limits of networking and storage technologies: one-gig ethernet and slower spinning disk. But today, those barriers do not exist. All-flash storage has fundamentally transformed how data is accessed, managed, and leveraged. The need for local data storage for significant volumes of data has been largely mitigated by the performance increases afforded by all-flash. At the same time, organizations can achieve superior economies of scale with the segregation of compute and storage, because compute and storage don't always scale in lockstep. Would you want to add an engine to the train every time you add another boxcar? Probably not. From a Pure Storage perspective, FlashBlade is uniquely architected to allow customers to achieve better resource utilization for compute and storage, while at the same time reducing the complexity that has arisen from the siloed nature of the original big data solutions. The second and equally important recent trend we see is something I'll call cloud reality. The public clouds made a lot of promises, and some of those promises were delivered. But cloud economics, especially usage-based and elastic scaling without the control that many companies need to manage the financial impact, is causing a lot of issues. In addition, the risk of vendor lock-in, from data egress charges to integrated software stacks that can't be moved or deployed on-premise, is causing a lot of organizations to back off the all-in cloud strategy and move toward hybrid deployments. Which is kind of funny, in a way, because it wasn't that long ago that there was a lot of talk about no more data centers. For example, one large retailer, I won't name them, but I'll admit they are my favorite, told us several years ago that they were completely done with on-prem storage infrastructure, because they were going 100% to the cloud. But they just deployed FlashBlade for their data pipelines, because they need predictable performance at scale, and the all-cloud TCO just didn't add up. Now, that being said, while there are certainly challenges with the public cloud, it has also brought some things to the table that we see most organizations wanting.
First of all, in a lot of cases, applications have been built to leverage object storage platforms like S3. So they need that object protocol, but they may also need it to be fast. Fast object may have been an oxymoron only a few years ago, and this is an area of the market where Pure and FlashBlade have really taken a leadership position. Second, regardless of where the data is physically stored, organizations want the best elements of a cloud experience, and for us, that means two main things. Number one is simplicity and ease of use: if you need a bunch of storage experts to run the system, that should be considered a bug. The other big one is the consumption model: the ability to pay for what you need when you need it, and to seamlessly grow your environment over time, totally non-disruptively. This is actually pretty huge, and something that a lot of vendors try to solve for with finance programs. But no finance program can address the pain of a forklift upgrade when you need to move to next-gen hardware. To scale non-disruptively over long periods of time, five to 10 years plus, crucial architectural decisions need to be made at the outset. Plus, you need the ability to pay as you use it, and we offer something for FlashBlade called Pure as a Service, which delivers exactly that. The third cloud characteristic that many organizations want is the option for hybrid, even if that is just a DR site in the cloud. In our case, that means supporting replication, via S3, to AWS. And the final trend, which to me represents the biggest opportunity for all of us, is the need to help the many data science and machine learning projects move from labs to production. This means bringing all the machine learning functions and model training to the data, rather than moving samples or segments of data to separate platforms. As we all know, machine learning needs a ton of data for accuracy, and there is just too much data to retrieve from the cloud for every training job. At the same time, predictive analytics without accuracy is not going to deliver the business advantage that everyone is seeking. You can visualize data analytics, as it is traditionally deployed, as being on a continuum, with the thing we've been doing the longest, data warehousing, on one end, and AI on the other end. But the way this manifests in most environments is a series of silos that get built up, so data is duplicated across all kinds of bespoke analytics and AI environments and infrastructure. This creates an expensive and complex environment. Historically, there was no other way to do it, because some level of performance is always table stakes, and each of these parts of the data pipeline has a different workload profile. A single platform to deliver the multi-dimensional performance this diverse set of applications requires didn't exist three years ago, and that's why the application vendors pointed you towards bespoke things like the DAS environments we talked about earlier. The fact that better options exist today is why we're seeing them move towards supporting this disaggregation of compute and storage. And when it comes to a platform that is a better option, one with a modern architecture that can address the diverse performance requirements of this continuum and allow organizations to bring the model to the data instead of creating separate silos, that's exactly what FlashBlade is built for: small files, large files, high throughput, low latency, and scale to petabytes in a single namespace.
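To make that object-protocol point concrete, here is a minimal sketch (the endpoint, bucket, and credentials are hypothetical, and FlashBlade specifics vary): an application written against the S3 API with boto3 can be repointed at an on-prem object store just by changing the endpoint.

```python
import boto3

# Hypothetical endpoint and credentials, for illustration only: the same
# S3-protocol code path works against AWS or an on-prem object store
# (such as a FlashBlade data VIP); only the endpoint changes.
def make_s3_client(on_prem: bool):
    if on_prem:
        return boto3.client(
            "s3",
            endpoint_url="https://objects.example.internal",
            aws_access_key_id="ACCESS_KEY",
            aws_secret_access_key="SECRET_KEY",
        )
    return boto3.client("s3")  # default AWS endpoint

s3 = make_s3_client(on_prem=True)
s3.upload_file("training_batch.parquet", "analytics-bucket",
               "batches/part-0.parquet")
```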
And that single namespace, importantly, is what we're focused on delivering for our customers. At Pure, we talk about it in the context of the modern data experience, because at the end of the day, that's what it's really all about: the experience for your teams in your organization. And together, Pure Storage and Vertica have delivered that experience to a wide range of customers: from a SaaS analytics company, which uses Vertica on FlashBlade to authenticate the quality of digital media in real time; to a multinational car company, which uses Vertica on FlashBlade to make thousands of decisions per second for autonomous cars; to a healthcare organization, which uses Vertica on FlashBlade to enable healthcare providers to make real-time decisions that impact lives. And I'm sure you're all looking forward to hearing from John Yovanovich from AT&T, to hear how he's been doing this with Vertica and FlashBlade as well. He's coming up soon. We have been really excited to build this partnership with Vertica, and we're proud to provide the only on-premise storage platform validated with Vertica Eon Mode, and to deliver this modern data experience to our customers together. Thank you all so much for joining us today. >> Joy: Amy, thank you so much for your time and your insights. Modern infrastructure is key to modern analytics, especially as organizations leverage next-generation data center architectures and object storage for their on-premise data centers. Now, I'm delighted to introduce our last speaker in our Vertica Big Data Conference keynote, John Yovanovich, Director of IT for AT&T. Vertica is so proud to serve AT&T, and especially proud of the harmonious impact we are having in partnership with Pure Storage. John, welcome to the Virtual Vertica BDC. >> John: Thank you, Joy. It's a pleasure to be here, and I'm excited to go through this presentation today, and in a unique fashion, because as I was thinking through how I wanted to present the partnership that we have formed together between Pure Storage, Vertica, and AT&T, I wanted to emphasize how well we all work together, and how these three components have really driven home my desire for a harmonious, to use your word, relationship. So, I'm going to move forward here. The theme of today's presentation is the Pure Vertica Symphony, live at AT&T, and if anybody is a Westworld fan, you can appreciate the sheet music on the right-hand side. What I'm going to highlight here, in a musical fashion, is how we at AT&T leverage these technologies to save money, to deliver a more efficient platform, and, really, to make our customers happier overall. So as we look back, as early as just a few years ago here at AT&T, I realized that we had many musicians to help the company. Or maybe you might want to call them data scientists or data analysts; for the theme, we'll stay with musicians. None of them were singing or playing from the same hymn book or sheet music, and so what we had was many organizations chasing a similar dream, but not exactly the same dream. The best way to describe that, and I think this might resonate with a lot of people in your organizations: how many organizations are chasing a customer-360 view in your company? Well, I can tell you that I have at least four in my company, and I'm sure there are many that I don't know of. That is our problem, because what we see is a repetitive sourcing of data. We see a repetitive copying of data.
And there's just so much money being spent. This is where I asked Pure Storage and Vertica to help me solve that problem with their technologies. What I also noticed was that there was no coordination between these departments. In fact, if you look here, nobody really wants to play with finance. Sales, marketing, and care, sure, they all copied each other's data, but they didn't actually communicate with each other as they were copying the data. So the data became replicated and out of sync. This is a challenge throughout not just my company, but all companies across the world. And that is: the more we replicate the data, the more problems we have at chasing, or conquering, the goal of a single version of truth. In fact, I kid that at AT&T we have actually adopted the multiple-versions-of-truth theory, which is not where we want to be, but this is where we are. But we are conquering that with the synergies between Pure Storage and Vertica. This is what it leaves us with, and this is where we were challenged: each one of our siloed business units had their own storage, their own dedicated storage, and some of them had more money than others, so they bought more storage. Some of them anticipated storing more data than they really did; others are running out of space but can't add any more, because their budgets aren't being replenished. So if you look at it from this side view here, we have a limited amount of compute, a fixed compute, dedicated to each one of these silos, and that's because of the wanting-to-own-your-own mentality. The other part is that you are limited on space, or wasting space, depending on where you are in the organization. So the synergies aren't just about the data, but actually about the compute and the storage as well, and I wanted to tackle that challenge too. So I was tackling the data, I was tackling the storage, and I was tackling the compute, all at the same time. My ask across the company was: can we all just please play together, okay? And to do that, I knew that I wasn't going to tackle this by getting everybody in the same room and getting them to agree that we needed one account table, because they would argue about whose account table is the best account table. But I knew that if I brought the account tables together, they would soon see that they had so much redundancy that I could start retiring data sources. I also knew that if I brought all the compute together, they would all be happy, but I didn't want them to trample over each other. In fact, one of the things that all business units really enjoy is the silo of having their own compute and, more or less, being able to control their own destiny. Well, Vertica's subclustering allows just that. This is exactly what I was hoping for, and I'm glad they've brought it through. And finally, how did I solve the problem of the single account table? Well, you don't need dedicated storage when you can separate compute and storage, as Vertica in Eon Mode does, and we store the data on FlashBlades, which you see on the left and right-hand sides of our container, which I'll describe in a moment. Okay, so what we have here is a container full of compute, with all the Vertica nodes sitting in the middle, and two loader subclusters, we'll call them, sitting on the sides, which are dedicated to just putting data onto the FlashBlades, which are sitting on both ends of the container.
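As a rough illustration of the subcluster pattern John describes (hostnames, credentials, and table names below are hypothetical, and subclusters themselves are typically created with Vertica's admin tooling rather than from client code), separate business units can point their clients at separate subclusters while querying the same shared tables on communal storage:

```python
import vertica_python

# Hypothetical connection details, for illustration only. Each business
# unit connects to the nodes of its own subcluster; every subcluster
# reads the same shared account table from communal storage (the
# FlashBlades, in this setup), so there is one copy of the data.
FINANCE = {"host": "finance-sc.example.internal", "port": 5433,
           "user": "finance_app", "password": "...", "database": "analytics"}
MARKETING = dict(FINANCE, host="marketing-sc.example.internal",
                 user="marketing_app")

def top_accounts(conn_info):
    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()
        # Same shared table, regardless of which subcluster runs the query.
        cur.execute("SELECT account_id, revenue FROM shared.account "
                    "ORDER BY revenue DESC LIMIT 10")
        return cur.fetchall()

finance_view = top_accounts(FINANCE)      # isolated compute
marketing_view = top_accounts(MARKETING)  # same single source of truth
```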
Now, today I have two dedicated, or, dedicated might not be the right word, two storage racks, one on the left and one on the right, and I treat them as separate storage racks. They could be one, but I created them separately for disaster recovery purposes, so that things keep working in case one rack were to go down. That being said, I'll probably add a couple more of them here in the future, so I can have, say, a five-to-10-petabyte storage setup, and I'll have my DR elsewhere, because the DR shouldn't be in the same container; I'll do DR outside of this container. So: I got them all together, I leveraged subclustering, I leveraged the separation of storage and compute. I was able to convince many of my clients that they didn't need their own account table, that they were better off having one. I reduced latency, I reduced our ticketing, I reduced our data quality issues, AKA ticketing, okay. I was able to expand as workloads grew, and I was able to leverage elasticity within this cluster. As you can see, there are racks and racks of compute. We set up what we'll call the fixed capacity that each of the business units needed, and then I'm able to ramp up and release the compute that's necessary for each one of my clients, based on their workloads, throughout the day. And while some of the compute is, more or less, dedicated, the rest of those seats are free for anybody to use. So in essence, what I have is a concert hall with a lot of seats available. If I want to run a 10-chair symphony or an 80-chair symphony, I'm able to do that. And all the while, I can do the same with my loader nodes: I can expand my loader nodes to have a symphony all to themselves, and not compete with any of the workloads of the other subclusters. What does that change for our organization? Well, it really changes the way our database administrators do their jobs. This has been a big transformation for them. They have actually become data conductors, maybe you might even call them composers, which is interesting, because what I've asked them to do is morph into less technology and more workload analysis. And in doing so, we're able to write auto-detect scripts that watch the queues and watch the workloads, so that we can ramp up and trim down the cluster and subclusters as necessary. It has been an exciting transformation for our DBAs, whom I now need to classify as something like DCAs; I don't know, I'll have to work with HR on that, but I think it's an exciting future for their careers. And if we bring it all together, our clusters start looking like this, where everything is moving in harmony, we have lots of seats open for extra musicians, and we are able to emulate a cloud experience on-prem. And so, I want you to sit back and enjoy the Pure Vertica Symphony, live at AT&T. (soft music) >> Joy: Thank you so much, John, for an informative and very creative look at the benefits that AT&T is getting from its Pure Vertica symphony. I do really like the idea of engaging HR to change the title to Data Conductor; that's fantastic. I've always believed that music brings people together, and now it's clear that analytics at AT&T is part of that musical advantage. So, now it's time for a short break, and we'll be back for our breakout sessions, beginning at 12 pm Eastern Daylight Time.
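Stepping back to those auto-detect scripts for a moment, here is a rough sketch of the idea (the monitoring view, thresholds, and scaling hook are assumptions for illustration, not AT&T's actual tooling): poll for queued queries, then grow or trim a subcluster accordingly.

```python
import time
import vertica_python

CONN = {"host": "vertica.example.internal", "port": 5433,
        "user": "dbadmin", "password": "...", "database": "analytics"}

def queued_requests(cur):
    # Assumption: a monitoring view exposing queries waiting in resource
    # pools; adjust to whatever your Vertica version actually provides.
    cur.execute("SELECT count(*) FROM v_monitor.resource_queues")
    return cur.fetchone()[0]

def scale_subcluster(name, delta):
    # Placeholder hook: in practice this might drive admintools or a
    # provisioning API to add or remove nodes in the named subcluster.
    print(f"scaling {name} by {delta} nodes")

while True:
    with vertica_python.connect(**CONN) as conn:
        depth = queued_requests(conn.cursor())
    if depth > 20:        # queries are queueing: ramp up
        scale_subcluster("adhoc_bi", +2)
    elif depth == 0:      # queue stays empty: trim back
        scale_subcluster("adhoc_bi", -1)
    time.sleep(60)
```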
We have some really exciting sessions planned for later today, and then again, as you can see, on Wednesday. Now, because all of you are already logged in and listening to this keynote, you already know the steps to continue to participate in the sessions listed here and on the previous slide. In addition, everyone received an email yesterday and today, and you'll get another one tomorrow, outlining the simple steps to register, log in, and choose your sessions. If you have any questions, check out the emails or go to www.vertica.com/bdc2020 for the logistics information. There are a lot of choices, and that's always a good thing. Don't worry if you want to attend more than one, or if you can't listen to the live sessions due to your timezone: all the sessions, including the Q&A sections, will be available on demand, and everyone will have access to the recordings, as well as even more pre-recorded sessions that we'll post to the BDC website. Now, I do want to leave you with two other important sites. First, our Vertica Academy. Vertica Academy is available to everyone, and there's a variety of very technical, self-paced, on-demand training, virtual instructor-led workshops, and Vertica Essentials Certification. And it's all free, because we believe that Vertica expertise helps everyone accelerate their Vertica projects and the advantage that those projects deliver. And if you have questions or want to engage with our Vertica engineering team now, we're waiting for you on the Vertica forum; we'll answer any questions or discuss any ideas that you might have. Thank you again for joining the Vertica Big Data Conference keynote session. Enjoy the rest of the BDC, because there's a lot more to come.
Josh Klahr & Prashanthi Paty | DataWorks Summit 2017
>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hey, welcome back to theCUBE. Day two of the DataWorks Summit, I'm Lisa Martin with my cohost, George Gilbert. We've had a great day and a half so far, learning a ton in this hyper-growth big data world, where big data meets IoT, machine learning, and data science. George and I are excited to welcome our next guests. We have Josh Klahr, the VP of Product Management from AtScale. Welcome, Josh, welcome back. >> Thank you. >> And we have Prashanthi Paty, the Head of Data Engineering for GoDaddy. Welcome to theCUBE. >> Thank you. >> Great to have you guys here. So, I wanted to talk to you guys about, one, how you are working together, but two, also some of the trends that you're seeing. As we've talked about, in the tech industry it's two degrees of Kevin Bacon, right; you guys worked together back in the day at Yahoo. Talk to us about what you both visualized and experienced in terms of the Hadoop adoption maturity cycle. >> Sure. >> You want to start, Josh? >> Yeah, I'll start, and you can chime in and correct me. But yeah, as you mentioned, Prashanthi and I worked together at Yahoo, it feels like a long time ago, in our central data group. And we had two main jobs. The first job was to collect all of the data from our ad systems and our audience systems, and stick that data into a Hadoop cluster. At the time, we were kind of doing it while Hadoop was still being developed. And the other thing we did was support a bunch of BI consumers. So we built cubes, we built data marts, we used MicroStrategy and Tableau, and I would say the experience there with Hadoop was great in terms of the ability to have low-cost storage and scale-out data processing of what were really billions and billions, tens of billions, of events a day. But when it came to BI, it felt like we were doing stuff the old way: we were moving data off cluster and making it small. In fact, you did a lot of that. >> Well, yeah, at the end of the day, we were using Hadoop as a staging layer. We would process a whole bunch of data there, and then we would scale it back and move it into relational stores or cubes, because basically we couldn't afford to give our BI tools, or our end users, accessibility directly on Hadoop. So while we surely did large-scale data processing in the Hadoop layer, we failed to turn on the insights right there. >> Lisa: Okay. >> Maybe there's a lesson in there for folks who are getting slightly more mature versions of Hadoop now, but can learn from some of the experiences you've had. Were there issues in terms of having clean and curated data? Were there issues for BI with performance and the lack of proper file formats like Parquet? What was it, where did you hit the wall? >> It was both. You have to remember, we were probably one of the first teams to put a data warehouse on Hadoop. So we were dealing with Pig versions of, like, 0.5, 0.6, and we were putting a lot of demand on the tooling and the infrastructure. Hadoop was still in a very nascent stage at that time. That was one. And I think a lot of the focus was on, hey, now we have the ability to do clickstream analytics at scale, right? So we did a lot of the backend stuff. But the presentation is where I think we struggled.
>> So would that mean, the idea is that you could do full resolution without sampling on the backend, and then you would extract and presumably denormalize, so that you could essentially run data marts for subject-matter interests? >> Yeah, and that's exactly what we did: we took all of this big data, but to make it work for BI, which meant two things. One was performance: can you really get interactive query response times? And the other thing was the interface: can a Tableau user connect and understand what they're looking at? You had to make the data small again. And that was actually the genesis of AtScale, which is where I am today: we were frustrated with this big data platform and having to then make the data small again in order to support BI. >> That's a great transition, Josh. Let's actually talk about AtScale. You guys saw BI on Hadoop as this big white space. How have you succeeded there? And then let's talk about what GoDaddy is doing with AtScale and big data. >> Yeah, we definitely took the learnings from our experience at Yahoo, and we really thought about: if we were to start from scratch and solve the problem the way we wanted it to be solved, what would that system look like? And it was a few things. One was an interface that worked for BI. I don't want to date myself, but my experience in the software space started with OLAP, and I can tell you OLAP isn't dead. When you go and talk to an enterprise, a Fortune 1000 enterprise, and you talk about OLAP, that's how they think: they think in terms of measures and dimensions and hierarchies. So one important thing for us was to project an OLAP interface on top of data that's Hadoop-native: Hive tables, Parquet, ORC, all of the mess that may sit underneath the covers. So one thing was projecting that interface; the other thing was delivering performance. We've invested a lot in using the Hadoop cluster natively to deliver performant queries. We do this by creating aggregate tables and summary tables, and by being smart about how we route queries. But we've done it in a way that makes a Hadoop admin very happy: you don't have to buy a bunch of AtScale servers in addition to your Hadoop cluster. We scale the way the Hadoop cluster scales, and we don't require separate technology, so we fit really nicely into that Hadoop ecosystem. >> So, making the Hadoop admin happy is a good thing. How do you make the business user happy, who needs now, as we heard here yesterday, to merge more with the data science folks, to be able to understand, or even have the chance to articulate, "These are the business outcomes we want to look for and we want to see"? How do you guys, maybe under the hood, if you will, make the business guys and gals happy? >> I'll share my opinion, and then Prashanthi can comment on her experience. As I've mentioned before, business users want an interface that's simple to use. So that's one thing we do: we give them the ability to just look at measures and dimensions. If I'm a business analyst, I grew up using Excel to do my analysis, and the thing I like most as an analyst is a big, fat, wide table. So that's what we do: we make an underlying Hadoop cluster, and what could be tens or hundreds of tables, look like a single big, fat, wide table for a data analyst. You talk to a data scientist, you talk to a business analyst: that's the way they want to view the world. So that's one thing we do.
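To make the aggregate-routing idea concrete, here is a toy sketch (table and column names are invented, and AtScale's actual engine is of course far more sophisticated): if a pre-built summary table covers every column a query needs, serve it from the aggregate; otherwise fall back to the base fact table.

```python
# Toy aggregate navigation: route a query to a summary table when it
# covers the requested columns, otherwise to the raw fact table.
AGGREGATES = {
    # hypothetical pre-built rollup: daily clicks by country
    "agg_clicks_daily_country": {"event_date", "country", "clicks"},
}
BASE_TABLE = "clickstream_events"  # raw events, billions of rows

def route(columns):
    needed = set(columns)
    for table, covered in AGGREGATES.items():
        if needed <= covered:
            return table      # small rollup: fast, interactive query
    return BASE_TABLE         # full-resolution scan as the fallback

print(route(["event_date", "clicks"]))             # agg_clicks_daily_country
print(route(["event_date", "user_id", "clicks"]))  # clickstream_events
```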
And then we give them response times that are fast. We give them interactivity, so that you can really quickly start to get a sense of the shape of the data. >> And allowing them to get that time to value. >> Yes. >> I can imagine. >> Just a follow-up on that. When you have to prepare the aggregates, essentially like the cubes, instead of the old BI tools running on a data mart, what is the additional latency required to get from data coming fresh into the data lake to something that's consumption-ready for the business user? >> Yeah, I think I can take that. So again, if you look at the last 10 years, in the initial period, certainly at Yahoo, we just threw engineering resources at that problem, right? We had teams dedicated to building these aggregates. But the whole premise of Hadoop was the ability to do unstructured optimizations, and by having a team find the new data coming in and then integrate that into your pipeline, we were adding a lot of latency. And so we needed to figure out how we could do this in a more seamless way, in a more real-time way, and get the real premise of Hadoop into the hands of our business users. I think that's where AtScale is doing a lot of good work, in terms of dynamically being able to create aggregates based on the design that you put in the cube. So we are starting to work with them on our implementation, and we're looking forward to the results. >> Tell us a little bit more about what you're looking to achieve. So GoDaddy is a customer of AtScale. Tell us a little bit more about that: what are you looking to build together, and where are you in your journey right now? >> Yeah, so the main goal for us is to move beyond predefined models, dashboards, and reports. We want to be more agile with our schema changes. Time to market is one thing, and performance, the ability to put BI tools directly on top of Hadoop, is another. And we also want to push as much of the semantics as possible down into the Hadoop layer. So those are the things we're looking to do. >> So that sounds like a classic business intelligence component, but rethought for a big data era. >> I love that quote, and I feel it. >> Prashanthi: Yes. >> Josh: Yes. (laughing) >> That's exactly what we're trying to do. >> But some of the things you mentioned are non-trivial. Time goes into the pre-processing of data so that it's consumable, but you also want it to be dynamic, which is sort of a trade-off, because, you know, that takes time. So is that a set of requirements, a wishlist, for AtScale, or is that something that you're building on your own? >> I think there's a lot happening in that space. They are one of the first people to come out with a product which is solving a real problem that we tried to solve for a long time. And I think as we start using them more and more, we'll surely be pushing them to bring in more features. The algorithm that they have to dynamically generate aggregates is something we're giving them quite a lot of feedback on. >> Our last guest, from Pentaho, was talking about, there was, in her keynote today, a quote from, I think, a McKinsey report that said, "40% of machine learning data is either not fully exploited or not used at all." So, tell us, kind of, where is GoDaddy regarding machine learning? What are you seeing?
What are you seeing at AtScale, and how are you guys going to work together to maybe venture into that frontier? >> Yeah, I mean, I think one of the key requirements we're placing on our data scientists is that not only do you have to be very good at your data science job, you have to be a very good programmer, too, to make use of the big data technologies. And we're seeing some interesting developments, like very workload-specific engines coming into the market now, for search, for graph, for machine learning as well, which are supposed to put the tools right into the hands of data scientists. I personally haven't worked with them enough to be able to comment. But I do think that the next realm of big data is these workload-specific engines coming in on top of Hadoop, and realizing more of the insights for the end users. >> Curious, can you elaborate a little more on those workload-specific engines? That sounds rather intriguing. >> Well, for interacting with Hadoop on a real-time basis, we see search-based engines like Elasticsearch and Solr, and there is also Druid. At Yahoo, we were quite a big shop of Druid, actually, and we were using it as an interactive query layer directly between our BI applications, these are our JavaScript-based BI applications, and Hadoop. So I think there are quite a few means to realize insights from Hadoop now, and that's the space where I see workload-specific engines coming in. >> And you mentioned, earlier, before we started, that you were using Mahout, presumably for machine learning. And I guess I thought the center of gravity for that type of analytics has moved to Spark, and you haven't mentioned Spark yet. >> We are not using Mahout, though; I mentioned it as something that's in that space. But yeah, I mean, Spark is pretty interesting. Spark SQL, doing ETL with Spark, as well as using Spark SQL for queries, is something that looks very, very promising lately. >> Quick question for you, from a business perspective. So you're the Head of Data Engineering at GoDaddy. How do you interact with your business users? The C-suite, for example: where data science and machine learning are concerned, they understand they're embracing Hadoop more and more; they need to really be embracing big data and leveraging Hadoop as an enabler. What's the conversation like, or maybe even the influence, of the GoDaddy business C-suite on engineering? How do you guys work collaboratively? >> So we do have very regular stakeholder meetings, and these are business stakeholders. We have representatives from our marketing teams, finance, product teams, and the data science team. We consider data science one of our customers. We take requirements from them, we give them a peek into the work we're doing, and we also let them be part of our agile team, so that when we have something released, they're the first ones looking at it and testing it. So they're very much part of the process. I don't think we can afford to just sit back, work on this monolithic data warehouse, and at the end of the day say, "Hey, here is what we have," and ask them to go get the insights from it. So it's a very agile process, and they're very much part of it. >> One last question for you, sorry, George: you guys mentioned you are sort of early in your partnership, unless I misunderstood. What has AtScale helped GoDaddy achieve so far, and what are your expectations, say, over the next six months? >> We want the world. (laughing) >> Lisa: Just that.
>> Yeah, but the premise is, I mean, Josh and I were part of the same team at Yahoo, where we faced the problems that AtScale is trying to solve. So the premise of being able to solve those problems, which is, like their name, basically delivering data at scale, that's the promise that I'm very much looking forward to from them. >> Well, excellent. We want to thank you both for joining us on theCUBE. We wish you the best of luck in attaining the world. (all laughing) >> Josh: There we go, thank you. >> Excellent, guys. Josh Klahr, thank you so much. >> My pleasure. >> Prashanthi, thank you for being on theCUBE for the first time. >> No problem. >> You've been watching theCUBE, live at day two of the DataWorks Summit. For my cohost George Gilbert, I am Lisa Martin. Stick around, guys, we'll be right back. (jingle)
Jean-Pierre Dijcks, Oracle - On the Ground - #theCUBE
>> Narrator: The Cube presents, On the Ground. (techno music) >> Hi, I'm Peter Burris. Welcome to an On the Ground here at Oracle headquarters, with SiliconANGLE Media's theCUBE. Today we're talking to JP Dijcks, who is one of the master product managers inside Oracle's big data product group. Welcome, JP. >> Thank you, Peter. >> Well, we're going to talk about how developers get access to this plethora, this miasma, this unbelievable complexity of data that's being made possible by IoT, traditional applications, and other sources. How are developers going to get access to this data? >> That's a good question, Peter. I still think that one of the key aspects of getting access to that data is SQL, and so that's one of the things we are driving: can we get the Oracle SQL engine, and all the richness of SQL analytics, enabled on all of that data, no matter what the format is or where it lives? How can I enable those SQL analytics on that? And then, obviously, we've all seen the shift in APIs and languages: people don't necessarily always want to speak SQL and write SQL queries. So how do we then enable things like R, how do we enable Perl, how do we enable Python, all sorts of things like that? How do we do that? And so the thought we had was: can we use SQL as the common meta-data interface, and the common structure around some of this, and enable all of these languages on top of that through the database? So that's kind of the baseline of what we're thinking of for enabling this to developers and large communities of users. >> So that's SQL as an access method. Do you also envision that SQL will be a data creation language, as we think about how big data comes together from a modeling perspective? >> So I think, from a modeling perspective, the meta-data part we certainly look at as a creation, or definition, language; that's probably the better word. How do I do structured queries, 'cause that's what SQL stands for? How do I do that on JSON documents? How do I do that on IoT data, as you said? How do I get that done? And so we certainly want to create the meta-data in a very traditional database catalog, or, if you compare it to a Hive catalog, very much like that. The execution is very different: it uses the mechanisms under the covers that NoSQL databases have, or that Hadoop HDFS offers, and we certainly have no real interest in doing inserts into Hadoop, 'cause the transaction mechanisms work very, very differently. So it's really focused on the meta-data areas: how do I expose that, how do I classify and categorize that data in ways people know and have seen for years. >> So the data manipulation will be handled by native tools, and some of the creation, some of the generation, some of the modeling will now be handled inside SQL, and there are a lot of SQL folks out there that have a pretty good affinity for how to work with data. >> That's absolutely correct. >> So that's what it is; now, how does it work? Tell us a bit about how this Big Data SQL is going to work in a practical world. >> Okay. So we talked about the modeling already. The first step is that we extend the Oracle database and the catalog to understand things like Hive objects, or HDFS, kind of, where does stuff live. So we expanded that, and we found a way to classify the meta-data first and foremost.
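For flavor, here is a hedged sketch of what extending the catalog can look like in practice (the table, schema, and connection details are invented, and the exact access parameters vary by Big Data SQL release): a Hive table is exposed to Oracle as an external table, after which ordinary SQL can join it with Oracle-resident data.

```python
import datetime
import cx_Oracle  # assumes Oracle client libraries are installed

# Hypothetical DDL, for illustration: the Oracle catalog is taught about
# a Hive table so that SQL can span Oracle- and Hadoop-resident data.
DDL = """
CREATE TABLE web_clicks (
  click_time TIMESTAMP,
  user_id    NUMBER,
  url        VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS (com.oracle.bigdata.tablename: logs.web_clicks)
)
REJECT LIMIT UNLIMITED
"""

QUERY = """
SELECT o.order_id, w.url
FROM   orders o                -- Oracle transactional data
JOIN   web_clicks w            -- lives in Hadoop/Hive
       ON w.user_id = o.customer_id
WHERE  w.click_time > :cutoff
"""

with cx_Oracle.connect("app_user", "password", "dbhost/orclpdb") as conn:
    cur = conn.cursor()
    cur.execute(DDL)
    for row in cur.execute(QUERY, cutoff=datetime.datetime(2016, 1, 1)):
        print(row)
```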
The real magic is leveraging the Hadoop stack. So you ask a BI question, and you want to join data in Oracle, transactions, finance information, let's say, with IoT data, which you'd reach out to HDFS for. Big Data SQL runs on the Hadoop nodes, so it's local processing of that data, and it works exactly as HDFS and Hadoop work. In other words, I'm going to do processing locally, I'm going to ask the NameNode which blocks I'm supposed to read, that'll get run, we generate that query, and we push it down to the Hadoop nodes. And that's where some of the magic of SQL kicks in, which is really focused on performance. It's performance, performance, performance; that's always the problem with federated data: how do I get it to perform across the board? >> Predictably. >> Predictably, that's an interesting one, predictable performance, 'cause sometimes it works, sometimes it doesn't. So what we did is we took the Exadata storage server software, with all the magic of how to get performance out of a file system, out of IO, and we put that on the Hadoop nodes, and then we push the queries all the way down to that software. It does filtering, it does predicate pushdown, it leverages features like Parquet and ORC on the HDFS side, and at the end of the day, it takes the IO request, which is what a SQL query gives, feeds it to the Hadoop nodes, runs it locally, and then sends the results back to the database. And so we filter out a lot of the gunk we don't need, 'cause you said, oh, I only need yesterday's data, or whatever the predicates are. And that's how we think we can get an architecture ready that allows for global optimization, 'cause we can see the entire ecosystem in its totality, IoT, Oracle, all of it combined. We optimize the queries, push everything down as far as we can, algorithms to data, not data to algorithms, and that's how we're going to get this predictable performance on all of these pieces of data. >> So we end up with, if I got this right, let me recap: we've got this notion that for data creation and data modeling, we can now use SQL, understood by a lot of people. It doesn't preclude us from using native tools, but at least that's one place where we can see how it all comes together. We continue to use local tools for the actual manipulation elements. >> Absolutely. >> We are now using Exadata-like structures so we can push the algorithm down to the data, so we're moving a small amount of data to a large amount of data, 'cause it cuts cost down and improves predictability. But at the same time, we've got meta-data objects that allow us to anticipate, with some degree of predictability, how this whole thing will run and how it all comes together. Got that right? >> Got that right. >> Alright, so, the next question is: what's the impact of doing it this way? Talk a bit, if you can, about how it's helping folks who run data, who build applications, and who are actually trying to get business value out of this whole process. >> So if we start with the business value, I think the biggest thing we bring to the table is simplicity and standardization. If I have to understand how this object is represented in NoSQL, how in HDFS, how somebody put a JSON file in here, I now have to spend time literally digging through that, and then, does it conform, do I have to modify it, what do I do? So I think the business value comes out of the SQL layer on top of it. It all looks exactly the same.
It's well known, it's well understood, and it's far quicker to get from "I've got a bunch of data" to actually building a BI report, building a dashboard, building KPIs, and integrating that data. There's nothing new to learn; it's a level of abstraction we put on top of this, whether you use an API or, in this case, SQL, 'cause that's the most common analytics language. So that's one part of how it will impact things. The second, and I think that's where the architecture is completely unique, is that we keep complete control of the query execution, through the meta-data we just talked about, and that enables us to do global optimization. And if you think this through a little bit and go, oh, global optimization sounds really cool, what does that mean? I can now actually start pushing processing, I can move data, and it's what we've done in the Exadata platform for years: data lives on disk; oh, Peter likes to query it very frequently, let's move it up to flash, let's move it up to in-memory, let's twist the data around. So all of a sudden we've got control: we understand what gets queried, we understand where data lives, and we can start to optimize exactly for the usage pattern the customer has, and that's always the performance aspect. And that goes to the old saying of: how can I get data to a customer as quickly as possible when he really needs it? That's what this does, right; how can I optimize this? I've got thousands of people querying certain elements: move them up in the stack and get the performance, and all these queries come back in, like, seconds. Regulatory stuff that needs to go through, like, five years of data: let's put it in cheap areas, and let's optimize for that. And so the impact is cheaper and faster at the end of the day, and all because there's almost a singular entity that governs the data, governs the queries, and governs the usage patterns. That's what we uniquely bring to the table with this architecture. >> So I want to build on the notion of governance, because one of the interesting things you said was the idea that if it's all under a common sort of interface, then you have greater visibility: where the data is, who owns it, et cetera. One of the biggest challenges that businesses are having is the global sense of how you govern your data. If you do this right, are you that much closer to having competent overall data governance? >> I think we were able to take a big step forward on it. It sounds very simple, but we now have a central catalog that actually understands what your data is and where it lives, in kind of like a well-known way. And again, it sounds very simple, but if you look at silos, that's the biggest problem: you have multiple silos, multiple things are in there, and nobody really knows what's in there. So here we start to publish this in a common structural layer, we have all the technical meta-data, we track who queries what, who does all those things, so that's a tremendous help in governance. The other side, of course, because we still use native tools to, let's say, manipulate some data, or augment or add new data, is that we are now going to tie a lot of the meta-data that comes from, say, the Hadoop ecosystem into this catalog as well. Although we're probably not there yet today on end-to-end governance, everything kind of out of the box, here we go.
>> And we probably never will, you're right, and I think we set a major step forward with just consolidating it, and exposing people to all the data the have, and you can run all the other tools like, crawl my data and check box anything that says SSN, or looks like a social security number, all of those tools are are still relevant. We just have a consolidated view, dramatically improved governance. >> So I'm going to throw you a curve ball. >> Sure. >> Not all data I want to use is inside my business, or is being generated by sensors that I control, how does big data SQL and related technologies play a role in the actual contracting for additional data sources, and sustaining those relationships that are very very fundamental, how data's shared across organizations. Do you see this information being brought in under this umbrella? Do you see Oracle facilitating those types of relationships, introducing standards for data sharing across partnerships becomes even easier? >> I'm not convinced that big data SQL as a technology is going to solve all the problems we see there, I'm absolutely convinced that Oracle is going to work towards that, you see it in so many acquisitions we've done, you see it in the efforts of making data as a service available to people, and to some extent big data SQL will be a foundation layer to make BI queries run smoother across more and more and more pillars of data. If we can integrate database, Hadoop, and NoSQL, there's nothing that says, oh and by the way, storage cloud. >> And we have relatively common physical governance, that I have the same physical governance, and you have the same physical governance, now its easier for us to show how we can introduce governance across our instances. >> Absolutely, and today we focus a lot on HDFS or Hadoop as the next data pillar, storage cloud, ground to cloud, all of those are on the roadmap for big data SQL to catch up with that, and so if you have data as a service, let's declare that cloud for a second, and I have data in my database in my Hadoop cluster, again, all now becomes part of the same ecosystem of data, and it all looks the same to me from a BI query perspective, from an analytics perspective. And then the, how do I get the data sharing standards set up and all that, part of that is driving a lot of it into cloud, and making it all as a service, 'cause again you put a level of abstraction on top of it, that makes it easier to consume, understand where it came from, and capture the meta-data. >> So JP one last question. >> Sure. >> Oracle opens worlds on the horizon, what are you looking for, or what will your customers be looking for as it pertains to this big data SQL and related technologies? >> I think specifically from a big data SQL perspective, is we're going to drive the possible adoption scope much much further, today we work with HDFS an we work with Oracle database, we're going to announce certain things like exadata, Hadoop will be supportive, we hold down super cluster support, we're going to dramatically expand the footprint big data SQL will run on, people who come for big data SQL or analytics sessions you'll see a lot of the roadmap looking far more forward. 
I already mentioned some things like ground to cloud, how can I run big data SQL when my exadata is on Premis, and then the rest of my HDFS data is in the cloud, we're going to be talking about how we're going to do that, and what do we think the evolution of big data SQL is going to be, I think that's going to be a very fun session to go to. >> JP Dijcks, a master product manager inside the Oracle big data product group, thank you very much for joining us here On the Ground, at Oracle headquarters, this is The Cube.