Image Title

Search Results for Presto:

Breaking Analysis: CIOs in a holding pattern but ready to strike at monetization


 

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Recent conversations with IT decision makers show a stark contrast between exiting 2023 versus the mindset when we were leaving 2022. CIOs are generally funding new initiatives by pushing off or cutting lower priority items, while security efforts are still being funded. Those that enable business initiatives that generate revenue or taking priority over cleaning up legacy technical debt. The bottom line is, for the moment, at least, the mindset is not cut everything, rather, it's put a pause on cleaning up legacy hairballs and fund monetization. Hello, and welcome to this week's Wikibon Cube Insights powered by ETR. In this breaking analysis, we tap recent discussions from two primary sources, year-end ETR roundtables with IT decision makers, and CUBE conversations with data, cloud, and IT architecture practitioners. The sources of data for this breaking analysis come from the following areas. Eric Bradley's recent ETR year end panel featured a financial services DevOps and SRE manager, a CSO in a large hospitality firm, a director of IT for a big tech company, the head of IT infrastructure for a financial firm, and a CTO for global travel enterprise, and for our upcoming Supercloud2 conference on January 17th, which you can register free by the way, at supercloud.world, we've had CUBE conversations with data and cloud practitioners, specifically, heads of data in retail and financial services, a cloud architect and a biotech firm, the director of cloud and data at a large media firm, and the director of engineering at a financial services company. Now we've curated commentary from these sources and now we share them with you today as anecdotal evidence supporting what we've been reporting on in the marketplace for these last couple of quarters. On this program, we've likened the economy to the slingshot effect when you're driving, when you're cruising along at full speed on the highway, and suddenly you see red brake lights up ahead, so, you tap your own brakes and then you speed up again, and traffic is moving along at full speed, so, you think nothing of it, and then, all of a sudden, the same thing happens. You slow down to a crawl and you start wondering, "What the heck is happening?" And you become a lot more cautious about the rate of acceleration when you start moving again. Well, that's the trend in IT spend right now. Back in June, we reported that despite the macro headwinds, CIOs were still expecting 6% to 7% spending growth for 2022. Now that was down from 8%, which we reported at the beginning of 2022. That was before Ukraine, and Fed tightening, but given those two factors, you know that that seemed pretty robust, but throughout the fall, we began reporting consistently declining expectations where CIOs are now saying Q4 will come in at around 3% growth relative to last year, and they're expecting, or should we say hoping that it pops back up in 2023 to 4% to 5%. The recent ETR panelists, when they heard this, are saying based on their businesses and discussions with their peers, they could see low single digit growth for 2023, so, 1%, 2%, 3%, so, this sort of slingshotting, or sometimes we call it a seesaw economy, has caught everyone off guard. Amazon is a good example of this, and there are others, but Amazon entered the pandemic with around 800,000 employees. It doubled that workforce during the pandemic. Now, right before Thanksgiving in 2022, Amazon announced that it was laying off 10,000 employees, and, Jassy, the CEO of Amazon, just last week announced that number is now going to grow to 18,000. Now look, this is a rounding error at Amazon from a headcount standpoint and their headcount remains far above 2019 levels. Its stock price, however, does not and it's back down to 2019 levels. The point is that visibility is very poor right now and it's reflected in that uncertainty. We've seen a lot of layoffs, obviously, the stock market's choppy, et cetera. Now importantly, not everything is on hold, and this downturn is different from previous tech pullbacks in that the speed at which new initiatives can be rolled out is much greater thanks to the cloud, and if you can show a fast return, you're going to get funding. Organizations are pausing on the cleanup of technical debt, unless it's driving fast business value. They're holding off on modernization projects. Those business enablement initiatives are still getting funded. CIOs are finding the money by consolidating redundant vendors, and they're stealing from other pockets of budget, so, it's not surprising that cybersecurity remains the number one technology priority in 2023. We've been reporting that for quite some time now. It's specifically cloud, cloud native security container and API security. That's where all the action is, because there's still holes to plug from that forced march to digital that occurred during COVID. Cloud migration, kind of showing here on number two on this chart, still a high priority, while optimizing cloud spend is definitely a strategy that organizations are taking to cut costs. It's behind consolidating redundant vendors by a long shot. There's very little evidence that cloud repatriation, i.e., moving workloads back on prem is a major cost cutting trend. The data just doesn't show it. What is a trend is getting more real time with analytics, so, companies can do faster and more accurate customer targeting, and they're really prioritizing that, obviously, in this down economy. Real time, we sometimes lose it, what's real time? Real time, we sometimes define as before you lose the customer. Now in the hiring front, customers tell us they're still having a hard time finding qualified site reliability engineers, SREs, Kubernetes expertise, and deep analytics pros. These job markets remain very tight. Let's stay with security for just a moment. We said many times that, prior to COVID, zero trust was this undefined buzzword, and the joke, of course, is, if you ask three people, "What is zero trust?" You're going to get three different answers, but the truth is that virtually every security company that was resisting taking a position on zero trust in an attempt to avoid... They didn't want to get caught up in the buzzword vortex, but they're now really being forced to go there by CISOs, so, there are some good quotes here on cyber that we want to share that came out of the recent conversations that we cited up front. The first one, "Zero trust is the highest ROI, because it enables business transformation." In other words, if I can have good security, I can move fast, it's not a blocker anymore. Second quote here, "ZTA," zero trust architecture, "Is more than securing the perimeter. It encompasses strong authentication and multiple identity layers. It requires taking a software approach to security instead of a hardware focus." The next one, "I'd love to have a security data lake that I could apply to asset management, vulnerability management, incident management, incident response, and all aspects for my security team. I see huge promise in that space," and the last one, I see NLP, natural language processing, as the foundation for email security, so, instead of searching for IP addresses, you can now read emails at light speed and identify phishing threats, so, look at, this is a small snapshot of the mindset around security, but I'll add, when you talk to the likes of CrowdStrike, and Zscaler, and Okta, and Palo Alto Networks, and many other security firms, they're listening to these narratives around zero trust. I'm confident they're working hard on skating to this puck, if you will. A good example is this idea of a security data lake and using analytics to improve security. We're hearing a lot about that. We're hearing architectures, there's acquisitions in that regard, and so, that's becoming real, and there are many other examples, because data is at the heart of digital business. This is the next area that we want to talk about. It's obvious that data, as a topic, gets a lot of mind share amongst practitioners, but getting data right is still really hard. It's a challenge for most organizations to get ROI and expected return out of data. Most companies still put data at the periphery of their businesses. It's not at the core. Data lives within silos or different business units, different clouds, it's on-prem, and increasingly it's at the edge, and it seems like the problem is getting worse before it gets better, so, here are some instructive comments from our recent conversations. The first one, "We're publishing events onto Kafka, having those events be processed by Dataproc." Dataproc is a Google managed service to run Hadoop, and Spark, and Flank, and Presto, and a bunch of other open source tools. We're putting them into the appropriate storage models within Google, and then normalize the data into BigQuery, and only then can you take advantage of tools like ThoughtSpot, so, here's a company like ThoughtSpot, and they're all about simplifying data, democratizing data, but to get there, you have to go through some pretty complex processes, so, this is a good example. All right, another comment. "In order to use Google's AI tools, we have to put the data into BigQuery. They haven't integrated in the way AWS and Snowflake have with SageMaker. Moving the data is too expensive, time consuming, and risky," so, I'll just say this, sharing data is a killer super cloud use case, and firms like Snowflake are on top of it, but it's still not pretty across clouds, and Google's posture seems to be, "We're going to let our database product competitiveness drive the strategy first, and the ecosystem is going to take a backseat." Now, in a way, I get it, owning the database is critical, and Google doesn't want to capitulate on that front. Look, BigQuery is really good and competitive, but you can't help but roll your eyes when a CEO stands up, and look, I'm not calling out Thomas Kurian, every CEO does this, and talks about how important their customers are, and they'll do whatever is right by the customer, so, look, I'm telling you, I'm rolling my eyes on that. Now let me also comment, AWS has figured this out. They're killing it in database. If you take Redshift for example, it's still growing, as is Aurora, really fast growing services and other data stores, but AWS realizes it can make more money in the long-term partnering with the Snowflakes and Databricks of the world, and other ecosystem vendors versus sub optimizing their relationships with partners and customers in order to sell more of their own homegrown tools. I get it. It's hard not to feature your own product. IBM chose OS/2 over Windows, and tried for years to popularize it. It failed. Lotus, go back way back to Lotus 1, 2, and 3, they refused to run on Windows when it first came out. They were running on DEC VAX. Many of you young people in the United States have never even heard of DEC VAX. IBM wanted to run every everything only in its cloud, the same with Oracle, originally. VMware, as you might recall, tried to build its own cloud, but, eventually, when the market speaks and reveals what seems to be obvious to analysts, years before, the vendors come around, they face reality, and they stop wasting money, fighting a losing battle. "The trend is your friend," as the saying goes. All right, last pull quote on data, "The hardest part is transformations, moving traditional Informatica, Teradata, or Oracle infrastructure to something more modern and real time, and that's why people still run apps in COBOL. In IT, we rarely get rid of stuff, rather we add on another coat of paint until the wood rots out or the roof is going to cave in. All right, the last key finding we want to highlight is going to bring us back to the cloud repatriation myth. Followers of this program know it's a real sore spot with us. We've heard the stories about repatriation, we've read the thoughtful articles from VCs on the subject, we've been whispered to by vendors that you should investigate this trend. It's really happening, but the data simply doesn't support it. Here's the question that was posed to these practitioners. If you had unlimited budget and the economy miraculously flipped, what initiatives would you tackle first? Where would you really lean into? The first answer, "I'd rip out legacy on-prem infrastructure and move to the cloud even faster," so, the thing here is, look, maybe renting infrastructure is more expensive than owning, maybe, but if I can optimize my rental with better utilization, turn off compute, use things like serverless, get on a steeper and higher performance over time, and lower cost Silicon curve with things like Graviton, tap best of breed tools in AI, and other areas that make my business more competitive. Move faster, fail faster, experiment more quickly, and cheaply, what's that worth? Even the most hard-o CFOs understand the business benefits far outweigh the possible added cost per gigabyte, and, again, I stress "possible." Okay, other interesting comments from practitioners. "I'd hire 50 more data engineers and accelerate our real-time data capabilities to better target customers." Real-time is becoming a thing. AI is being injected into data and apps to make faster decisions, perhaps, with less or even no human involvement. That's on the rise. Next quote, "I'd like to focus on resolving the concerns around cloud data compliance," so, again, despite the risks of data being spread out in different clouds, organizations realize cloud is a given, and they want to find ways to make it work better, not move away from it. The same thing in the next one, "I would automate the data analytics pipeline and focus on a safer way to share data across the states without moving it," and, finally, "The way I'm addressing complexity is to standardize on a single cloud." MonoCloud is actually a thing. We're hearing this more and more. Yes, my company has multiple clouds, but in my group, we've standardized on a single cloud to simplify things, and this is a somewhat dangerous trend, because it's creating even more silos and it's an opportunity that needs to be addressed, and that's why we've been talking so much about supercloud is a cross-cloud, unifying, architectural framework, or, perhaps, it's a platform. In fact, that's a question that we will be exploring later this month at Supercloud2 live from our Palo Alto Studios. Is supercloud an architecture or is it a platform? And in this program, we're featuring technologists, analysts, practitioners to explore the intersection between data and cloud and the future of cloud computing, so, you don't want to miss this opportunity. Go to supercloud.world. You can register for free and participate in the event directly. All right, thanks for listening. That's a wrap. I'd like to thank Alex Myerson, who's on production and manages our podcast, Ken Schiffman as well, Kristen Martin and Cheryl Knight, they helped get the word out on social media, and in our newsletters, and Rob Hof is our editor-in-chief over at siliconangle.com. He does some great editing. Thank you, all. Remember, all these episodes are available as podcasts wherever you listen. All you've got to do is search "breaking analysis podcasts." I publish each week on wikibon.com and siliconangle.com where you can email me directly at david.vellante@siliconangle.com or DM me, @Dante, or comment on our LinkedIn posts. By all means, check out etr.ai. They get the best survey data in the enterprise tech business. We'll be doing our annual predictions post in a few weeks, once the data comes out from the January survey. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, everybody, and we'll see you next time on "Breaking Analysis." (upbeat music)

Published Date : Jan 7 2023

SUMMARY :

This is "Breaking Analysis" and the director of engineering

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Alex MyersonPERSON

0.99+

AWSORGANIZATION

0.99+

Ken SchiffmanPERSON

0.99+

Dave VellantePERSON

0.99+

AmazonORGANIZATION

0.99+

JassyPERSON

0.99+

Cheryl KnightPERSON

0.99+

Eric BradleyPERSON

0.99+

Rob HofPERSON

0.99+

OktaORGANIZATION

0.99+

Kristen MartinPERSON

0.99+

ZscalerORGANIZATION

0.99+

GoogleORGANIZATION

0.99+

Thomas KurianPERSON

0.99+

6%QUANTITY

0.99+

IBMORGANIZATION

0.99+

2023DATE

0.99+

18,000QUANTITY

0.99+

Palo Alto NetworksORGANIZATION

0.99+

10,000 employeesQUANTITY

0.99+

CrowdStrikeORGANIZATION

0.99+

JanuaryDATE

0.99+

2022DATE

0.99+

January 17thDATE

0.99+

BostonLOCATION

0.99+

Lotus 1TITLE

0.99+

2019DATE

0.99+

JuneDATE

0.99+

8%QUANTITY

0.99+

United StatesLOCATION

0.99+

david.vellante@siliconangle.comOTHER

0.99+

SnowflakesORGANIZATION

0.99+

Palo AltoLOCATION

0.99+

LotusTITLE

0.99+

two factorsQUANTITY

0.99+

OracleORGANIZATION

0.99+

DataprocORGANIZATION

0.99+

three peopleQUANTITY

0.99+

last weekDATE

0.99+

Supercloud2EVENT

0.99+

TeradataORGANIZATION

0.99+

1%QUANTITY

0.99+

3TITLE

0.99+

WindowsTITLE

0.99+

5%QUANTITY

0.99+

3%QUANTITY

0.99+

BigQueryTITLE

0.99+

Second quoteQUANTITY

0.99+

4%QUANTITY

0.99+

DEC VAXTITLE

0.99+

ThanksgivingEVENT

0.98+

OS/2TITLE

0.98+

7%QUANTITY

0.98+

last yearDATE

0.98+

two primary sourcesQUANTITY

0.98+

each weekQUANTITY

0.98+

InformaticaORGANIZATION

0.98+

pandemicEVENT

0.98+

first oneQUANTITY

0.98+

siliconangle.comOTHER

0.97+

first answerQUANTITY

0.97+

2%QUANTITY

0.97+

around 800,000 employeesQUANTITY

0.97+

50 more data engineersQUANTITY

0.97+

zero trustQUANTITY

0.97+

SnowflakeORGANIZATION

0.96+

single cloudQUANTITY

0.96+

2TITLE

0.96+

todayDATE

0.95+

ETRORGANIZATION

0.95+

single cloudQUANTITY

0.95+

LinkedInORGANIZATION

0.94+

later this monthDATE

0.94+

Mitesh Shah, Alation & Ash Naseer, Warner Bros Discovery | Snowflake Summit 2022


 

(upbeat music) >> Welcome back to theCUBE's continuing coverage of Snowflake Summit '22 live from Caesar's Forum in Las Vegas. I'm Lisa Martin, my cohost Dave Vellante, we've been here the last day and a half unpacking a lot of news, a lot of announcements, talking with customers and partners, and we have another great session coming for you next. We've got a customer and a partner talking tech and data mash. Please welcome Mitesh Shah, VP in market strategy at Elation. >> Great to be here. >> and Ash Naseer great, to have you, senior director of data engineering at Warner Brothers Discovery. Welcome guys. >> Thank you for having me. >> It's great to be back in person and to be able to really get to see and feel and touch this technology, isn't it? >> Yeah, it is. I mean two years or so. Yeah. Great to feel the energy in the conference center. >> Yeah. >> Snowflake was virtual, I think for two years and now it's great to kind of see the excitement firsthand. So it's wonderful. >> Th excitement, but also the boom and the number of customers and partners and people attending. They were saying the first, or the summit in 2019 had about 1900 attendees. And this is around 10,000. So a huge jump in a short time period. Talk a little bit about the Elation-Snowflake partnership and probably some of the acceleration that you guys have been experiencing as a Snowflake partner. >> Yeah. As a snowflake partner. I mean, Snowflake is an investor of us in Elation early last year, and we've been a partner for, for longer than that. And good news. We have been awarded Snowflake partner of the year for data governance, just earlier this week. And that's in fact, our second year in a row for winning that award. So, great news on that front as well. >> Repeat, congratulations. >> Repeat. Absolutely. And we're going to hope to make it a three-peat as well. And we've also been awarded industry competency badges in five different industries, those being financial services, healthcare, retail technology, and Median Telcom. >> Excellent. Okay. Going to right get into it. Data mesh. You guys actually have a data mesh and you've presented at the conference. So, take us back to the beginning. Why did you decide that you needed to implement something like data mesh? What was the impetus? >> Yeah. So when people think of Warner brothers, you always think of like the movie studio, but we're more than that, right? I mean, you think of HBO, you think of TNT, you think of CNN, we have 30 plus brands in our portfolio and each have their own needs. So the idea of a data mesh really helps us because what we can do is we can federate access across the company so that, you know, CNN can work at their own pace. You know, when there's election season, they can ingest their own data and they don't have to, you know, bump up against as an example, HBO, if Game of Thrones is going on. >> So, okay. So the, the impetus was to serve those lines of business better. Actually, given that you've got these different brands, it was probably easier than most companies. Cause if you're, let's say you're a big financial services company, and now you have to decide who owns what. CNN owns its own data products, HBO. Now, do they decide within those different brands, how to distribute even further? Or is it really, how deep have you gone in that decentralization? >> That's a great question. It's a very close partnership, because there are a number of data sets, which are used by all the brands, right? You think about people browsing websites, right? You know, CNN has a website, Warner brothers has a website. So for us to ingest that data for each of the brands to ingest that data separately, that means five different ways of doing things and you know, a big environment, right? So that is where our team comes into play. We ingest a lot of the common data sets, but like I said, any unique data sets, data sets regarding theatrical as an example, you know, Warner brothers does it themselves, you know, for streaming, HBO Max, does it themselves. So we kind of operate in partnership. >> So do you have a centralized data team and also decentralized data teams, right? >> That's right. >> So I love this conversation because that was heresy 10 years ago, five years ago, even, cause that's inefficient. But you've, I presume you've found that it's actually more productive in terms of the business output, explain that dynamic. >> You know, you bring up such a good point. So I, you know, I consider myself as one of the dinosaurs who started like 20 plus years ago in this industry. And back then, we were all taught to think of the data warehouse as like a monolithic thing. And the reason for that is the technology wasn't there. The technology didn't catch up. Now, 20 years later, the technology is way ahead, right? But like, our mindset's still the same because we think of data warehouses and data platforms still as a monolithic thing. But if you really sort of remove that sort of mental barrier, if you will, and if you start thinking about, well, how do I sort of, you know, federate everything and make sure that you let folks who are building, or are closest to the customer or are building their products, let them own that data and have a partnership. The results have been amazing. And if we were only sort of doing it as a centralized team, we would not be able to do a 10th of what we do today. So it's that massive scale in, in our company as well. >> And I should have clarified, when we talk about data mesh are we talking about the implementing in practice, the octagon sort of framework, or is this sort of your own sort of terminology? >> Well, so the interesting part is four years ago, we didn't have- >> It didn't exist. >> Yeah. It didn't exist. And, and so we, our principle was very simple, right? When we started out, we said, we want to make sure that our brands are able to operate independently with some oversight and guidance from our technology teams, right? That's what we set out to do. We did that with Snowflake by design because Snowflake allows us to, you know, separate those, those brands into different accounts. So that was done by design. And then the, the magic, I think, is the Snowflake data sharing where, which allows us to sort of bring data in here once, and then share it with whoever needs it. So think about HBO Max. On HBO Max, You not only have HBO Max content, but content from CNN, from Cartoon Network, from Warner Brothers, right? All the movies, right? So to see how The Batman movie did in theaters and then on streaming, you don't need, you know, Warner brothers doesn't need to ingest the same streaming data. HBO Max does it. HBO Max shares it with Warner brothers, you know, store once, share many times, and everyone works at their own pace. >> So they're building data products. Those data products are discoverable APIs, I presume, or I guess maybe just, I guess the Snowflake cloud, but very importantly, they're governed. And that's correct, where Elation comes in? >> That's precisely where Elation comes in, is where sort of this central flexible foundation for data governance. You know, you mentioned data mesh. I think what's interesting is that it's really an answer to the bottlenecks created by centralized IT, right? There's this notion of decentralizing that the data engineers and making the data domain owners, the people that know the data the best, have them be in control of publishing the data to the data consumers. There are other popular concepts actually happening right now, as we speak, around modern data stack. Around data fabric that are also in many ways underpinned by this notion of decentralization, right? These are concepts that are underpinned by decentralization and as the pendulum swings, sort of between decentralization and centralization, as we go back and forth in the world of IT and data, there are certain constants that need to be centralized over time. And one of those I believe is very much a centralized platform for data governance. And that's certainly, I think where we come in. Would love to hear more about how you use Elation. >> Yeah. So, I mean, elation helps us sort of, as you guys say, sort of, map, the treasure map of the data, right? So for consumers to find where their data is, that's where Elation helps us. It helps us with the data cataloging, you know, storing all the metadata and, you know, users can go in, they can sort of find, you know, the data that they need and they can also find how others are using data. So it's, there's a little bit of a crowdsourcing aspect that Elation helps us to do whereby you know, you can see, okay, my peer in the other group, well, that's how they use this piece of data. So I'm not going to spend hours trying to figure this out. You're going to use the query that they use. So yeah. >> So you have a master catalog, I presume. And then each of the brands has their own sub catalogs, is that correct? >> Well, for the most part, we have that master catalog and then the brands sort of use it, you know, separately themselves. The key here is all that catalog, that catalog isn't maintained by a centralized group as well, right? It's again, maintained by the individual teams and not only in the individual teams, but the folks that are responsible for the data, right? So I talked about the concept of crowdsourcing, whoever sort of puts the data in, has to make sure that they update the catalog and make sure that the definitions are there and everything sort of in line. >> So HBO, CNN, and each have their own, sort of access to their catalog, but they feed into the master catalog. Is that the right way to think about it? >> Yeah. >> Okay. And they have their own virtual data warehouses, right? They have ownership over that? They can spin 'em up, spin 'em down as they see fit? Right? And they're governed. >> They're governed. And what's interesting is it's not just governed, right? Governance is a, is a big word. It's a bit nebulous, but what's really being enabled here is this notion of self-service as well, right? There's two big sort of rockets that need to happen at the same time in any given organization. There's this notion that you want to put trustworthy data in the hands of data consumers, while at the same time mitigating risk. And that's precisely what Elation does. >> So I want to clarify this for the audience. So there's four principles of database. This came after you guys did it. And I wonder how it aligns. Domain ownership, give data, as you were saying to the, to the domain owners who have context, data as product, you guys are building data products, and that creates two problems. How do you give people self-service infrastructure and how do you automate governance? So the first two, great. But then it creates these other problems. Does that align with your philosophy? Where's alignment? What's different? >> Yeah. Data products is exactly where we're going. And that sort of, that domain based design, that's really key as well. In our business, you think about who the customer is, as an example, right? Depending on who you ask, it's going to be, the answer might be different, you know, to the movie business, it's probably going to be the person who watches a movie in a theater. To the streaming business, to HBO Max, it's the streamer, right? To others, someone watching live CNN on their TV, right? There's yet another group. Think about all the franchising we do. So you see Batman action figures and T-shirts, and Warner brothers branded stuff in stores, that's yet another business unit. But at the end of the day, it's not a different person, it's you and me, right? We do all these things. So the domain concept, make sure that you ingest data and you bring data relevant to the context, however, not sort of making it so stringent where it cannot integrate, and then you integrate it at a higher level to create that 360. >> And it's discoverable. So the point is, I don't have to go tap Ash on the shoulder, say, how do I get this data? Is it governed? Do I have access to it? Give me the rules of it. Just, I go grab it, right? And the system computationally automates whether or not I have access to it. And it's, as you say, self-service. >> In this case, exactly right. It enables people to just search for data and know that when they find the data, whether it's trustworthy or not, through trust flags, and the like, it's doing both of those things at the same time. >> How is it an enabler of solving some of the big challenges that the media and entertainment industry is going through? We've seen so much change the last couple of years. The rising consumer expectations aren't going to go back down. They're only going to come up. We want you to serve us up content that's relevant, that's personalized, that makes sense. I'd love to understand from your perspective, Mitesh, from an industry challenges perspective, how does this technology help customers like Warner Brothers Discovery, meet business customers, where they are and reduce the volume on those challenges? >> It's a great question. And as I mentioned earlier, we had five industry competency badges that were awarded to us by Snowflake. And one of those four, Median Telcom. And the reason for that is we're helping media companies understand their audiences better, and ultimately serve up better experiences for their audiences. But we've got Ash right here that can tell us how that's happening in practice. >> Yeah, tell us. >> So I'll share a story. I always like to tell stories, right? Once once upon a time before we had Elation in place, it was like, who you knew was how you got access to the data. So if I knew you and I knew you had access to a certain kind of data and your access to the right kind of data was based on the network you had at the company- >> I had to trust you. >> Yeah. >> I might not want to give up my data. >> That's it. And so that's where Elation sort of helps us democratize it, but, you know, puts the governance and controls, right? There are certain sensitive things as well, such as viewership, such as subscriber accounts, which are very important. So making sure that the right people have access to it, that's the other problem that Elation helps us solve. >> That's precisely part of our integration with Snowflake in particular, being able to define and manage policies within Elation. Saying, you know, certain people should have access to certain rows, doing column level masking. And having those policies actually enforced at the Snowflake data layer is precisely part of our value product. >> And that's automated. >> And all that's automated. Exactly. >> Right. So I don't have to think about it. I don't have to go through the tap on their shoulder. What has been the impact, Ash, on data quality as you've pushed it down into the domains? >> That's a great question. So it has definitely improved, but data quality is a very interesting subject, because back to my example of, you know, when we started doing things, we, you know, the centralized IT team always said, well, it has to be like this, Right? And if it doesn't fit in this, then it's bad quality. Well, sometimes context changes. Businesses change, right? You have to be able to react to it quickly. So making sure that a lot of that quality is managed at the decentralized level, at the place where you have that business context, that ensures you have the most up to date quality. We're talking about media industry changing so quickly. I mean, would we have thought three years ago that people would watch a lot of these major movies on streaming services? But here's the reality, right? You have to react and, you know, having it at that level just helps you react faster. >> So data, if I play that back, data quality is not a static framework. It's flexible based on the business context and the business owners can make those adjustments, cause they own the data. >> That's it. That's exactly it. >> That's awesome. Wow. That's amazing progress that you guys have made. >> In quality, if I could just add, it also just changes depending on where you are in your data pipeline stage, right? Data, quality data observability, this is a very fast evolving space at the moment, and if I look to my left right now, I bet you I can probably see a half-dozen quality observability vendors right now. And so given that and given the fact that Elation still is sort of a central hub to find trustworthy data, we've actually announced an open data quality initiative, allowing for best-of-breed data quality vendors to integrate with the platform. So whoever they are, whatever tool folks want to use, they can use that particular tool of choice. >> And this all runs in the cloud, or is it a hybrid sort of? >> Everything is in the cloud. We're all in the cloud. And you know, again, helps us go faster. >> Let me ask you a question. I could go on forever in this topic. One of the concepts that was put forth is whether it's a Snowflake data warehouse or a data bricks, data lake, or an Oracle data warehouse, they should all be inclusive. They should just be a node on the mesh. Like, wow, that sounds good. But I haven't seen it yet. Right? I'm guessing that Snowflake and Elation enable all the self-serve, all this automated governance, and that including those other items, it's got to be a one-off at this point in time. Do you ever see you expanding that scope or is it better off to just kind of leave it into the, the Snowflake data cloud? >> It's a good question. You know, I feel like where we're at today, especially in terms of sort of technology giving us so many options, I don't think there's a one size fits all. Right? Even though we are very heavily invested in Snowflake and we use Snowflake consistently across the organization, but you could, theoretically, could have an architecture that blends those two, right? Have different types of data platforms like a teradata or an Oracle and sort of bring it all together today. We have the technology, you know, that and all sorts of things that can make sure that you query on different databases. So I don't think the technology is the problem, I think it's the organizational mindset. I think that that's what gets in the way. >> Oh, interesting. So I was going to ask you, will hybrid tables help you solve that problem? And, maybe not, what you're saying, it's the organization that owns the Oracle database saying, Hey, we have our system. It processes, it works, you know, go away. >> Yeah. Well, you know, hybrid tables I think, is a great sort of next step in Snowflake's evolution. I think it's, in my opinion, I, think it's a game changer, but yeah. I mean, they can still exist. You could do hybrid tables right on Snowflake, or you could, you know, you could kind of coexist as well. >> Yeah. But, do you have a thought on this? >> Yeah, I do. I mean, we're always going to live in a time where you've got data distributed in throughout the organization and around the globe. And that could be even if you're all in on Snowflake, you could have data in Snowflake here, you could have data in Snowflake in EMEA and Europe somewhere. It could be anywhere. By the same token you might be using. Every organization is using on-premises systems. They have data, they naturally have data everywhere. And so, you know, this one solution to this is really centralizing, as I mentioned, not just governance, but also metadata about all of the data in your organization so that you can enable people to search and find and discover trustworthy data no matter where it is in your organization. >> Yeah. That's a great point. I mean, if you have the data about the data, then you can, you can treat these independent nodes. That's just that. Right? And maybe there's some advantages of putting it all in the Snowflake cloud, but to your point, organizationally, that's just not feasible. The whole, unfortunately, sorry, Snowflake, all the world's data is not going to go into Snowflake, but they play a key role in accelerating, what I'm hearing, your vision of data mesh. >> Yeah, absolutely. I think going forward in the future, we have to start thinking about data platforms as just one place where you sort of dump all the data. That's where the mesh concept comes in. It is going to be a mesh. It's going to be distributed and organizations have to be okay with that. And they have to embrace the tools. I mean, you know, Facebook developed a tool called Presto many years ago that that helps them solve exactly the same problem. So I think the technology is there. I think the organizational mindset needs to evolve. >> Yeah. Definitely. >> Culture. Culture is one of the hardest things to change. >> Exactly. >> Guys, this was a masterclass in data mesh, I think. Thank you so much for coming on talking. >> We appreciate it. Thank you so much. >> Of course. What Elation is doing with Snowflake and with Warner Brothers Discovery, Keep that content coming. I got a lot of stuff I got to catch up on watching. >> Sounds good. Thank you for having us. >> Thanks guys. >> Thanks, you guys. >> For Dave Vellante, I'm Lisa Martin. You're watching theCUBE live from Snowflake Summit '22. We'll be back after a short break. (upbeat music)

Published Date : Jun 30 2022

SUMMARY :

session coming for you next. and Ash Naseer great, to have you, in the conference center. and now it's great to kind of see the acceleration that you guys have of the year for data And we've also been awarded Why did you decide that you So the idea of a data mesh Or is it really, how deep have you gone the brands to ingest that data separately, terms of the business and make sure that you let allows us to, you know, separate those, guess the Snowflake cloud, of decentralizing that the data engineers the data cataloging, you know, storing all So you have a master that are responsible for the data, right? Is that the right way to think about it? And they're governed. that need to happen at the So the first two, great. the answer might be different, you know, So the point is, It enables people to just search that the media and entertainment And the reason for that is So if I knew you and I knew that the right people have access to it, Saying, you know, certain And all that's automated. I don't have to go through You have to react and, you know, It's flexible based on the That's exactly it. that you guys have made. and given the fact that Elation still And you know, again, helps us go faster. a node on the mesh. We have the technology, you that owns the Oracle database saying, you know, you could have a thought on this? And so, you know, this one solution I mean, if you have the I mean, you know, the hardest things to change. Thank you so much for coming on talking. Thank you so much. of stuff I got to catch up on watching. Thank you for having us. from Snowflake Summit '22.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

Lisa MartinPERSON

0.99+

CNNORGANIZATION

0.99+

HBOORGANIZATION

0.99+

Mitesh ShahPERSON

0.99+

Ash NaseerPERSON

0.99+

EuropeLOCATION

0.99+

FacebookORGANIZATION

0.99+

MiteshPERSON

0.99+

ElationORGANIZATION

0.99+

TNTORGANIZATION

0.99+

Warner brothersORGANIZATION

0.99+

EMEALOCATION

0.99+

second yearQUANTITY

0.99+

OracleORGANIZATION

0.99+

2019DATE

0.99+

two yearsQUANTITY

0.99+

oneQUANTITY

0.99+

Cartoon NetworkORGANIZATION

0.99+

Game of ThronesTITLE

0.99+

two problemsQUANTITY

0.99+

twoQUANTITY

0.99+

Warner BrothersORGANIZATION

0.99+

10thQUANTITY

0.99+

firstQUANTITY

0.99+

SnowflakeORGANIZATION

0.99+

Snowflake Summit '22EVENT

0.99+

Warner brothersORGANIZATION

0.99+

eachQUANTITY

0.99+

fourQUANTITY

0.99+

Las VegasLOCATION

0.99+

Median TelcomORGANIZATION

0.99+

20 years laterDATE

0.98+

bothQUANTITY

0.98+

five different industriesQUANTITY

0.98+

10 years agoDATE

0.98+

30 plus brandsQUANTITY

0.98+

AlationPERSON

0.98+

four years agoDATE

0.98+

todayDATE

0.98+

20 plus years agoDATE

0.97+

Warner Brothers DiscoveryORGANIZATION

0.97+

OneQUANTITY

0.97+

five years agoDATE

0.97+

Snowflake Summit 2022EVENT

0.97+

three years agoDATE

0.97+

five different waysQUANTITY

0.96+

earlier this weekDATE

0.96+

SnowflakeTITLE

0.96+

MaxTITLE

0.96+

early last yearDATE

0.95+

about 1900 attendeesQUANTITY

0.95+

SnowflakeEVENT

0.94+

AshPERSON

0.94+

three-peatQUANTITY

0.94+

around 10,000QUANTITY

0.93+

Mitesh Shah, Alation & Ash Naseer, Warner Bros Discovery | Snowflake Summit 2022


 

(upbeat music) >> Welcome back to theCUBE's continuing coverage of Snowflake Summit '22 live from Caesar's Forum in Las Vegas. I'm Lisa Martin, my cohost Dave Vellante, we've been here the last day and a half unpacking a lot of news, a lot of announcements, talking with customers and partners, and we have another great session coming for you next. We've got a customer and a partner talking tech and data mash. Please welcome Mitesh Shah, VP in market strategy at Elation. >> Great to be here. >> and Ash Naseer great, to have you, senior director of data engineering at Warner Brothers Discovery. Welcome guys. >> Thank you for having me. >> It's great to be back in person and to be able to really get to see and feel and touch this technology, isn't it? >> Yeah, it is. I mean two years or so. Yeah. Great to feel the energy in the conference center. >> Yeah. >> Snowflake was virtual, I think for two years and now it's great to kind of see the excitement firsthand. So it's wonderful. >> Th excitement, but also the boom and the number of customers and partners and people attending. They were saying the first, or the summit in 2019 had about 1900 attendees. And this is around 10,000. So a huge jump in a short time period. Talk a little bit about the Elation-Snowflake partnership and probably some of the acceleration that you guys have been experiencing as a Snowflake partner. >> Yeah. As a snowflake partner. I mean, Snowflake is an investor of us in Elation early last year, and we've been a partner for, for longer than that. And good news. We have been awarded Snowflake partner of the year for data governance, just earlier this week. And that's in fact, our second year in a row for winning that award. So, great news on that front as well. >> Repeat, congratulations. >> Repeat. Absolutely. And we're going to hope to make it a three-peat as well. And we've also been awarded industry competency badges in five different industries, those being financial services, healthcare, retail technology, and Median Telcom. >> Excellent. Okay. Going to right get into it. Data mesh. You guys actually have a data mesh and you've presented at the conference. So, take us back to the beginning. Why did you decide that you needed to implement something like data mesh? What was the impetus? >> Yeah. So when people think of Warner brothers, you always think of like the movie studio, but we're more than that, right? I mean, you think of HBO, you think of TNT, you think of CNN, we have 30 plus brands in our portfolio and each have their own needs. So the idea of a data mesh really helps us because what we can do is we can federate access across the company so that, you know, CNN can work at their own pace. You know, when there's election season, they can ingest their own data and they don't have to, you know, bump up against as an example, HBO, if Game of Thrones is going on. >> So, okay. So the, the impetus was to serve those lines of business better. Actually, given that you've got these different brands, it was probably easier than most companies. Cause if you're, let's say you're a big financial services company, and now you have to decide who owns what. CNN owns its own data products, HBO. Now, do they decide within those different brands, how to distribute even further? Or is it really, how deep have you gone in that decentralization? >> That's a great question. It's a very close partnership, because there are a number of data sets, which are used by all the brands, right? You think about people browsing websites, right? You know, CNN has a website, Warner brothers has a website. So for us to ingest that data for each of the brands to ingest that data separately, that means five different ways of doing things and you know, a big environment, right? So that is where our team comes into play. We ingest a lot of the common data sets, but like I said, any unique data sets, data sets regarding theatrical as an example, you know, Warner brothers does it themselves, you know, for streaming, HBO Max, does it themselves. So we kind of operate in partnership. >> So do you have a centralized data team and also decentralized data teams, right? >> That's right. >> So I love this conversation because that was heresy 10 years ago, five years ago, even, cause that's inefficient. But you've, I presume you've found that it's actually more productive in terms of the business output, explain that dynamic. >> You know, you bring up such a good point. So I, you know, I consider myself as one of the dinosaurs who started like 20 plus years ago in this industry. And back then, we were all taught to think of the data warehouse as like a monolithic thing. And the reason for that is the technology wasn't there. The technology didn't catch up. Now, 20 years later, the technology is way ahead, right? But like, our mindset's still the same because we think of data warehouses and data platforms still as a monolithic thing. But if you really sort of remove that sort of mental barrier, if you will, and if you start thinking about, well, how do I sort of, you know, federate everything and make sure that you let folks who are building, or are closest to the customer or are building their products, let them own that data and have a partnership. The results have been amazing. And if we were only sort of doing it as a centralized team, we would not be able to do a 10th of what we do today. So it's that massive scale in, in our company as well. >> And I should have clarified, when we talk about data mesh are we talking about the implementing in practice, the octagon sort of framework, or is this sort of your own sort of terminology? >> Well, so the interesting part is four years ago, we didn't have- >> It didn't exist. >> Yeah. It didn't exist. And, and so we, our principle was very simple, right? When we started out, we said, we want to make sure that our brands are able to operate independently with some oversight and guidance from our technology teams, right? That's what we set out to do. We did that with Snowflake by design because Snowflake allows us to, you know, separate those, those brands into different accounts. So that was done by design. And then the, the magic, I think, is the Snowflake data sharing where, which allows us to sort of bring data in here once, and then share it with whoever needs it. So think about HBO Max. On HBO Max, You not only have HBO Max content, but content from CNN, from Cartoon Network, from Warner Brothers, right? All the movies, right? So to see how The Batman movie did in theaters and then on streaming, you don't need, you know, Warner brothers doesn't need to ingest the same streaming data. HBO Max does it. HBO Max shares it with Warner brothers, you know, store once, share many times, and everyone works at their own pace. >> So they're building data products. Those data products are discoverable APIs, I presume, or I guess maybe just, I guess the Snowflake cloud, but very importantly, they're governed. And that's correct, where Elation comes in? >> That's precisely where Elation comes in, is where sort of this central flexible foundation for data governance. You know, you mentioned data mesh. I think what's interesting is that it's really an answer to the bottlenecks created by centralized IT, right? There's this notion of decentralizing that the data engineers and making the data domain owners, the people that know the data the best, have them be in control of publishing the data to the data consumers. There are other popular concepts actually happening right now, as we speak, around modern data stack. Around data fabric that are also in many ways underpinned by this notion of decentralization, right? These are concepts that are underpinned by decentralization and as the pendulum swings, sort of between decentralization and centralization, as we go back and forth in the world of IT and data, there are certain constants that need to be centralized over time. And one of those I believe is very much a centralized platform for data governance. And that's certainly, I think where we come in. Would love to hear more about how you use Elation. >> Yeah. So, I mean, elation helps us sort of, as you guys say, sort of, map, the treasure map of the data, right? So for consumers to find where their data is, that's where Elation helps us. It helps us with the data cataloging, you know, storing all the metadata and, you know, users can go in, they can sort of find, you know, the data that they need and they can also find how others are using data. So it's, there's a little bit of a crowdsourcing aspect that Elation helps us to do whereby you know, you can see, okay, my peer in the other group, well, that's how they use this piece of data. So I'm not going to spend hours trying to figure this out. You're going to use the query that they use. So yeah. >> So you have a master catalog, I presume. And then each of the brands has their own sub catalogs, is that correct? >> Well, for the most part, we have that master catalog and then the brands sort of use it, you know, separately themselves. The key here is all that catalog, that catalog isn't maintained by a centralized group as well, right? It's again, maintained by the individual teams and not only in the individual teams, but the folks that are responsible for the data, right? So I talked about the concept of crowdsourcing, whoever sort of puts the data in, has to make sure that they update the catalog and make sure that the definitions are there and everything sort of in line. >> So HBO, CNN, and each have their own, sort of access to their catalog, but they feed into the master catalog. Is that the right way to think about it? >> Yeah. >> Okay. And they have their own virtual data warehouses, right? They have ownership over that? They can spin 'em up, spin 'em down as they see fit? Right? And they're governed. >> They're governed. And what's interesting is it's not just governed, right? Governance is a, is a big word. It's a bit nebulous, but what's really being enabled here is this notion of self-service as well, right? There's two big sort of rockets that need to happen at the same time in any given organization. There's this notion that you want to put trustworthy data in the hands of data consumers, while at the same time mitigating risk. And that's precisely what Elation does. >> So I want to clarify this for the audience. So there's four principles of database. This came after you guys did it. And I wonder how it aligns. Domain ownership, give data, as you were saying to the, to the domain owners who have context, data as product, you guys are building data products, and that creates two problems. How do you give people self-service infrastructure and how do you automate governance? So the first two, great. But then it creates these other problems. Does that align with your philosophy? Where's alignment? What's different? >> Yeah. Data products is exactly where we're going. And that sort of, that domain based design, that's really key as well. In our business, you think about who the customer is, as an example, right? Depending on who you ask, it's going to be, the answer might be different, you know, to the movie business, it's probably going to be the person who watches a movie in a theater. To the streaming business, to HBO Max, it's the streamer, right? To others, someone watching live CNN on their TV, right? There's yet another group. Think about all the franchising we do. So you see Batman action figures and T-shirts, and Warner brothers branded stuff in stores, that's yet another business unit. But at the end of the day, it's not a different person, it's you and me, right? We do all these things. So the domain concept, make sure that you ingest data and you bring data relevant to the context, however, not sort of making it so stringent where it cannot integrate, and then you integrate it at a higher level to create that 360. >> And it's discoverable. So the point is, I don't have to go tap Ash on the shoulder, say, how do I get this data? Is it governed? Do I have access to it? Give me the rules of it. Just, I go grab it, right? And the system computationally automates whether or not I have access to it. And it's, as you say, self-service. >> In this case, exactly right. It enables people to just search for data and know that when they find the data, whether it's trustworthy or not, through trust flags, and the like, it's doing both of those things at the same time. >> How is it an enabler of solving some of the big challenges that the media and entertainment industry is going through? We've seen so much change the last couple of years. The rising consumer expectations aren't going to go back down. They're only going to come up. We want you to serve us up content that's relevant, that's personalized, that makes sense. I'd love to understand from your perspective, Mitesh, from an industry challenges perspective, how does this technology help customers like Warner Brothers Discovery, meet business customers, where they are and reduce the volume on those challenges? >> It's a great question. And as I mentioned earlier, we had five industry competency badges that were awarded to us by Snowflake. And one of those four, Median Telcom. And the reason for that is we're helping media companies understand their audiences better, and ultimately serve up better experiences for their audiences. But we've got Ash right here that can tell us how that's happening in practice. >> Yeah, tell us. >> So I'll share a story. I always like to tell stories, right? Once once upon a time before we had Elation in place, it was like, who you knew was how you got access to the data. So if I knew you and I knew you had access to a certain kind of data and your access to the right kind of data was based on the network you had at the company- >> I had to trust you. >> Yeah. >> I might not want to give up my data. >> That's it. And so that's where Elation sort of helps us democratize it, but, you know, puts the governance and controls, right? There are certain sensitive things as well, such as viewership, such as subscriber accounts, which are very important. So making sure that the right people have access to it, that's the other problem that Elation helps us solve. >> That's precisely part of our integration with Snowflake in particular, being able to define and manage policies within Elation. Saying, you know, certain people should have access to certain rows, doing column level masking. And having those policies actually enforced at the Snowflake data layer is precisely part of our value product. >> And that's automated. >> And all that's automated. Exactly. >> Right. So I don't have to think about it. I don't have to go through the tap on their shoulder. What has been the impact, Ash, on data quality as you've pushed it down into the domains? >> That's a great question. So it has definitely improved, but data quality is a very interesting subject, because back to my example of, you know, when we started doing things, we, you know, the centralized IT team always said, well, it has to be like this, Right? And if it doesn't fit in this, then it's bad quality. Well, sometimes context changes. Businesses change, right? You have to be able to react to it quickly. So making sure that a lot of that quality is managed at the decentralized level, at the place where you have that business context, that ensures you have the most up to date quality. We're talking about media industry changing so quickly. I mean, would we have thought three years ago that people would watch a lot of these major movies on streaming services? But here's the reality, right? You have to react and, you know, having it at that level just helps you react faster. >> So data, if I play that back, data quality is not a static framework. It's flexible based on the business context and the business owners can make those adjustments, cause they own the data. >> That's it. That's exactly it. >> That's awesome. Wow. That's amazing progress that you guys have made. >> In quality, if I could just add, it also just changes depending on where you are in your data pipeline stage, right? Data, quality data observability, this is a very fast evolving space at the moment, and if I look to my left right now, I bet you I can probably see a half-dozen quality observability vendors right now. And so given that and given the fact that Elation still is sort of a central hub to find trustworthy data, we've actually announced an open data quality initiative, allowing for best-of-breed data quality vendors to integrate with the platform. So whoever they are, whatever tool folks want to use, they can use that particular tool of choice. >> And this all runs in the cloud, or is it a hybrid sort of? >> Everything is in the cloud. We're all in the cloud. And you know, again, helps us go faster. >> Let me ask you a question. I could go on forever in this topic. One of the concepts that was put forth is whether it's a Snowflake data warehouse or a data bricks, data lake, or an Oracle data warehouse, they should all be inclusive. They should just be a node on the mesh. Like, wow, that sounds good. But I haven't seen it yet. Right? I'm guessing that Snowflake and Elation enable all the self-serve, all this automated governance, and that including those other items, it's got to be a one-off at this point in time. Do you ever see you expanding that scope or is it better off to just kind of leave it into the, the Snowflake data cloud? >> It's a good question. You know, I feel like where we're at today, especially in terms of sort of technology giving us so many options, I don't think there's a one size fits all. Right? Even though we are very heavily invested in Snowflake and we use Snowflake consistently across the organization, but you could, theoretically, could have an architecture that blends those two, right? Have different types of data platforms like a teradata or an Oracle and sort of bring it all together today. We have the technology, you know, that and all sorts of things that can make sure that you query on different databases. So I don't think the technology is the problem, I think it's the organizational mindset. I think that that's what gets in the way. >> Oh, interesting. So I was going to ask you, will hybrid tables help you solve that problem? And, maybe not, what you're saying, it's the organization that owns the Oracle database saying, Hey, we have our system. It processes, it works, you know, go away. >> Yeah. Well, you know, hybrid tables I think, is a great sort of next step in Snowflake's evolution. I think it's, in my opinion, I, think it's a game changer, but yeah. I mean, they can still exist. You could do hybrid tables right on Snowflake, or you could, you know, you could kind of coexist as well. >> Yeah. But, do you have a thought on this? >> Yeah, I do. I mean, we're always going to live in a time where you've got data distributed in throughout the organization and around the globe. And that could be even if you're all in on Snowflake, you could have data in Snowflake here, you could have data in Snowflake in EMEA and Europe somewhere. It could be anywhere. By the same token you might be using. Every organization is using on-premises systems. They have data, they naturally have data everywhere. And so, you know, this one solution to this is really centralizing, as I mentioned, not just governance, but also metadata about all of the data in your organization so that you can enable people to search and find and discover trustworthy data no matter where it is in your organization. >> Yeah. That's a great point. I mean, if you have the data about the data, then you can, you can treat these independent nodes. That's just that. Right? And maybe there's some advantages of putting it all in the Snowflake cloud, but to your point, organizationally, that's just not feasible. The whole, unfortunately, sorry, Snowflake, all the world's data is not going to go into Snowflake, but they play a key role in accelerating, what I'm hearing, your vision of data mesh. >> Yeah, absolutely. I think going forward in the future, we have to start thinking about data platforms as just one place where you sort of dump all the data. That's where the mesh concept comes in. It is going to be a mesh. It's going to be distributed and organizations have to be okay with that. And they have to embrace the tools. I mean, you know, Facebook developed a tool called Presto many years ago that that helps them solve exactly the same problem. So I think the technology is there. I think the organizational mindset needs to evolve. >> Yeah. Definitely. >> Culture. Culture is one of the hardest things to change. >> Exactly. >> Guys, this was a masterclass in data mesh, I think. Thank you so much for coming on talking. >> We appreciate it. Thank you so much. >> Of course. What Elation is doing with Snowflake and with Warner Brothers Discovery, Keep that content coming. I got a lot of stuff I got to catch up on watching. >> Sounds good. Thank you for having us. >> Thanks guys. >> Thanks, you guys. >> For Dave Vellante, I'm Lisa Martin. You're watching theCUBE live from Snowflake Summit '22. We'll be back after a short break. (upbeat music)

Published Date : Jun 15 2022

SUMMARY :

session coming for you next. and Ash Naseer great, to have you, in the conference center. and now it's great to kind of see the acceleration that you guys have of the year for data And we've also been awarded Why did you decide that you So the idea of a data mesh Or is it really, how deep have you gone the brands to ingest that data separately, terms of the business and make sure that you let allows us to, you know, separate those, guess the Snowflake cloud, of decentralizing that the data engineers the data cataloging, you know, storing all So you have a master that are responsible for the data, right? Is that the right way to think about it? And they're governed. that need to happen at the So the first two, great. the answer might be different, you know, So the point is, It enables people to just search that the media and entertainment And the reason for that is So if I knew you and I knew that the right people have access to it, Saying, you know, certain And all that's automated. I don't have to go through You have to react and, you know, It's flexible based on the That's exactly it. that you guys have made. and given the fact that Elation still And you know, again, helps us go faster. a node on the mesh. We have the technology, you that owns the Oracle database saying, you know, you could have a thought on this? And so, you know, this one solution I mean, if you have the I mean, you know, the hardest things to change. Thank you so much for coming on talking. Thank you so much. of stuff I got to catch up on watching. Thank you for having us. from Snowflake Summit '22.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

Lisa MartinPERSON

0.99+

CNNORGANIZATION

0.99+

HBOORGANIZATION

0.99+

Mitesh ShahPERSON

0.99+

Ash NaseerPERSON

0.99+

EuropeLOCATION

0.99+

FacebookORGANIZATION

0.99+

MiteshPERSON

0.99+

ElationORGANIZATION

0.99+

TNTORGANIZATION

0.99+

Warner brothersORGANIZATION

0.99+

EMEALOCATION

0.99+

second yearQUANTITY

0.99+

OracleORGANIZATION

0.99+

2019DATE

0.99+

two yearsQUANTITY

0.99+

oneQUANTITY

0.99+

Cartoon NetworkORGANIZATION

0.99+

Game of ThronesTITLE

0.99+

two problemsQUANTITY

0.99+

twoQUANTITY

0.99+

Warner BrothersORGANIZATION

0.99+

10thQUANTITY

0.99+

firstQUANTITY

0.99+

SnowflakeORGANIZATION

0.99+

Snowflake Summit '22EVENT

0.99+

Warner brothersORGANIZATION

0.99+

eachQUANTITY

0.99+

fourQUANTITY

0.99+

Las VegasLOCATION

0.99+

Median TelcomORGANIZATION

0.99+

20 years laterDATE

0.98+

bothQUANTITY

0.98+

five different industriesQUANTITY

0.98+

10 years agoDATE

0.98+

30 plus brandsQUANTITY

0.98+

AlationPERSON

0.98+

four years agoDATE

0.98+

todayDATE

0.98+

20 plus years agoDATE

0.97+

Warner Brothers DiscoveryORGANIZATION

0.97+

OneQUANTITY

0.97+

five years agoDATE

0.97+

Snowflake Summit 2022EVENT

0.97+

three years agoDATE

0.97+

five different waysQUANTITY

0.96+

earlier this weekDATE

0.96+

SnowflakeTITLE

0.96+

MaxTITLE

0.96+

early last yearDATE

0.95+

about 1900 attendeesQUANTITY

0.95+

SnowflakeEVENT

0.94+

AshPERSON

0.94+

three-peatQUANTITY

0.94+

around 10,000QUANTITY

0.93+

Wrap with Stu Miniman | Red Hat Summit 2022


 

(bright music) >> Okay, we're back in theCUBE. We said we were signing off for the night, but during the hallway track, we ran into old friend Stu Miniman who was the Director of Market Insights at Red Hat. Stu, friend of theCUBE done the thousands of CUBE interviews. >> Dave, it's great to be here. Thanks for pulling me on, you and I hosted Red Hat Summit before. It's great to see Paul here. I was actually, I was talking to some of the Red Hatters walking around Boston. It's great to have an event here. Boston's got strong presence and I understand, I think was either first or second year, they had it over... What's the building they're tearing down right down the road here. Was that the World Trade Center? I think that's where they actually held it, the first time they were here. We hosted theCUBE >> So they moved up. >> at the Hines Convention Center. We did theCUBE for summit at the BCEC next door. And of course, with the pandemic being what it was, we're a little smaller, nice intimate event here. It's great to be able to room the hall, see a whole bunch of people and lots watching online. >> It's great, it's around the same size as those, remember those Vertica Big Data events that we used to have here. And I like that you were commenting out at the theater and the around this morning for the keynotes, that was good. And the keynotes being compressed, I think, is real value for the attendees, you know? 'Cause people come to these events, they want to see each other, you know? They want to... It's like the band getting back together. And so when you're stuck in the keynote room, it's like, "Oh, it's okay, it's time to go." >> I don't know that any of us used to sitting at home where I could just click to another tab or pause it or run for, do something for the family, or a quick bio break. It's the three-hour keynote I hope has been retired. >> But it's an interesting point though, that the virtual event really is driving the physical and this, the way Red Hat marketed this event was very much around the virtual attendee. Physical was almost an afterthought, so. >> Right, this is an invite only for in-person. So you're absolutely right. It's optimizing the things that are being streamed, the online audience is the big audience. And we just happy to be in here to clap and do some things see around what you're doing. >> Wonderful see that becoming the norm. >> I think like virtual Stu, you know this well when virtual first came in, nobody had a clue with what they were doing. It was really hard. They tried different things, they tried to take the physical and just jam it into the virtual. That didn't work, they tried doing fun things. They would bring in a famous person or a comedian. And that kind of worked, I guess, but everybody showed up for that and then left. And I think they're trying to figure it out what this hybrid thing is. I've seen it both ways. I've seen situations like this, where they're really sensitive to the virtual. I've seen others where that's the FOMO of the physical, people want physical. So, yeah, I think it depends. I mean, reinvent last year was heavy physical. >> Yeah, with 15,000 people there. >> Pretty long keynotes, you know? So maybe Amazon can get away with it, but I think most companies aren't going to be able to. So what is the market telling you? What are these insights? >> So Dave just talking about Amazon, obviously, the world I live in cloud and that discussion of cloud, the journey that customers are going on is where we're spending a lot of the discussions. So, it was great to hear in the keynote, talked about our deep partnerships with the cloud providers and what we're doing to help people with, you like to call it super cloud, some call it hybrid, or multi-cloud... >> New name. (crosstalk) Meta-Cloud, come on. >> All right, you know if Che's my executive, so it's wonderful. >> Love it. >> But we'll see, if I could put on my VR Goggles and that will help me move things. But I love like the partnership announcement with General Motors today because not every company has the needs of software driven electric vehicles all over the place. But the technology that we build for them actually has ramifications everywhere. We've working to take Kubernetes and make it smaller over time. So things that we do at the edge benefit the cloud, benefit what we do in the data center, it's that advancement of science and technology just lifts all boats. >> So what's your take on all this? The EV and software on wheels. I mean, Tesla obviously has a huge lead. It's kind of like the Amazon of vehicles, right? It's sort of inspired a whole new wave of innovation. Now you've got every automobile manufacturer kind of go and after. That is the future of vehicles is something you followed or something you have an opinion on Stu? >> Absolutely. It's driving innovation in some ways, the way the DOS drove innovation on the desktop, if you remember the 64K DOS limit, for years, that was... The software developers came up with some amazing ways to work within that 64K limit. Then when it was gone, we got bloatware, but it actually does enforce a level of discipline on you to try to figure out how to make software run better, run more efficiently. And that has upstream impacts on the enterprise products. >> Well, right. So following your analogy, you talk about the enablement to the desktop, Linux was a huge influence on allowing the individual person to write code and write software, and what's happening in the EV, it's software platform. All of these innovations that we're seeing across industries, it's how is software transforming things. We go back to the mark end reasons, software's eating the world, open source is the way that software is developed. Who's at the intersection of all those? We think we have a nice part to play in that. I loved tha- Dave, I don't know if you caught at the end of the keynote, Matt Hicks basically said, "Our mission isn't just to write enterprise software. "Our mission is based off of open source because open source unlocks innovation for the world." And that's one of the things that drew me to Red Hat, it's not just tech in good places, but allowing underrepresented, different countries to participate in what's happening with software. And we can all move that ball forward. >> Well, can we declare victory for open source because it's not just open source products, but everything that's developed today, whether proprietary or open has open source in it. >> Paul, I agree. Open source is the development model period, today. Are there some places that there's proprietary? Absolutely. But I had a discussion with Deepak Singh who's been on theCUBE many times. He said like, our default is, we start with open source code. I mean, even Amazon when you start talking about that. >> I said this, the $70 billion business on open source. >> Exactly. >> Necessarily give it back, but that say, Hey, this is... All's fair in tech and more. >> It is interesting how the managed service model has sort of rescued open source, open source companies, that were trying to do the Red Hat model. No one's ever really successfully duplicated the Red Hat model. A lot of companies were floundering and failing. And then the managed service option came along. And so now they're all cloud service providers. >> So the only thing I'd say is that there are some other peers we have in the industry that are built off open source they're doing okay. The recent example, GitLab and Hashicorp, both went public. Hashi is doing some managed services, but it's not the majority of their product. Look at a company like Mongo, they've heavily pivoted toward the managed service. It is where we see the largest growth in our area. The products that we have again with Amazon, with Microsoft, huge growth, lots of interest. It's one of the things I spend most of my time talking on. >> I think Databricks is another interesting example 'cause Cloudera was the now company and they had the sort of open core, and then they had the proprietary piece, and they've obviously didn't work. Databricks when they developed Spark out of Berkeley, everybody thought they were going to do kind of a similar model. Instead, they went for all in managed services. And it's really worked well, I think they were ahead of that curve and you're seeing it now is it's what customers want. >> Well, I mean, Dave, you cover the database market pretty heavily. How many different open source database options are there today? And that's one of the things we're solving. When you look at what is Red Hat doing in the cloud? Okay, I've got lots of databases. Well, we have something called, it's Red Hat Open Database Access, which is from a developer, I don't want to have to think about, I've got six different databases, which one, where's the repository? How does all that happen? We give that consistency, it's tied into OpenShift, so it can help abstract some of those pieces. we've got same Kafka streaming and we've got APIs. So it's frameworks and enablers to help bridge that gap between the complexity that's out there, in the cloud and for the developer tool chain. >> That's really important role you guys play though because you had this proliferation, you mentioned Mongo. So many others, Presto and Starbursts, et cetera, so many other open source options out there now. And companies, developers want to work with multiple databases within the same application. And you have a role in making that easy. >> Yeah, so and that is, if you talk about the question I get all the time is, what's next for Kubernetes? Dave, you and I did a preview for KubeCon and it's automation and simplicity that we need to be. It's not enough to just say, "Hey, we've got APIs." It's like Dave, we used to say, "We've got standards? Great." Everybody's implementation was a little bit different. So we have API Sprawl today. So it's building that ecosystem. You've been talking to a number of our partners. We are very active in the community and trying to do things that can lift up the community, help the developers, help that cloud native ecosystem, help our customers move faster. >> Yeah API's better than scripts, but they got to be managed, right? So, and that's really what you guys are doing that's different. You're not trying to own everything, right? It's sort of antithetical to how billions and trillions are made in the IT industry. >> I remember a few years ago we talked here, and you look at the size that Red Hat is. And the question is, could Red Hat have monetized more if the model was a little different? It's like, well maybe, but that's not the why. I love that they actually had Simon Sinek come in and work with Red Hat and that open, unlocks the world. Like that's the core, it's the why. When I join, they're like, here's a book of Red Hat, you can get it online and that why of what we do, so we never have to think of how do we get there. We did an acquisition in the security space a year ago, StackRox, took us a year, it's open source. Stackrox.io, it's community driven, open source project there because we could have said, "Oh, well, yeah, it's kind of open source and there's pieces that are open source, but we want it to be fully open source." You just talked to Gunnar about how he's RHEL nine, based off CentOS stream, and now developing out in the open with that model, so. >> Well, you were always a big fan of Whitehurst culture book, right? It makes a difference. >> The open organization and right, Red Hat? That culture is special. It's definitely interesting. So first of all, most companies are built with the hierarchy in mind. Had a friend of mine that when he joined Red Hat, he's like, I don't understand, it's almost like you have like lots of individual contractors, all doing their things 'cause Red Hat works on thousands of projects. But I remember talking to Rackspace years ago when OpenStack was a thing and they're like, "How do you figure out what to work on?" "Oh, well we hired great people and they work on what's important to them." And I'm like, "That doesn't sound like a business." And he is like, "Well, we struggle sometimes to that balance." Red Hat has found that balance because we work on a lot of different projects and there are people inside Red Hat that are, you know, they care more about the project than they do the business, but there's the overall view as to where we participate and where we productize because we're not creating IP because it's all an open source. So it's the monetizations, the relationships we have our customers, the ecosystems that we build. And so that is special. And I'll tell you that my line has been Red Hat on the inside is even more Red Hat. The debates and the discussions are brutal. I mean, technical people tearing things apart, questioning things and you can't be thin skinned. And the other thing is, what's great is new people. I've talked to so many people that started at Red Hat as interns and will stay for seven, eight years. And they come there and they have as much of a seat at the table, and when I talk to new people, your job, is if you don't understand something or you think we might be able to do it differently, you better speak up because we want your opinion and we'll take that, everybody takes that into consideration. It's not like, does the decision go all the way up to this executive? And it's like, no, it's done more at the team. >> The cultural contrast between that and your parent, IBM, couldn't be more dramatic. And we talked earlier with Paul Cormier about has IBM really walked the walk when it comes to leaving Red Hat alone. Naturally he said, "Yes." Well what's your perspective. >> Yeah, are there some big blue people across the street or something I heard that did this event, but look, do we interact with IBM? Of course. One of the reasons that IBM and IBM Services, both products and services should be able to help get us breadth in the marketplace. There are times that we go arm and arm into customer meetings and there are times that customers tell us, "I like Red Hat, I don't like IBM." And there's other ones that have been like, "Well, I'm a long time IBM, I'm not sure about Red Hat." And we have to be able to meet all of those customers where they are. But from my standpoint, I've got a Red Hat badge, I've got a Red Hat email, I've got Red Hat benefits. So we are fiercely independent. And you know, Paul, we've done blogs and there's lots of articles been written is, Red Hat will stay Red Hat. I didn't happen to catch Arvin I know was on CNBC today and talking at their event, but I'm sure Red Hat got mentioned, but... >> Well, he talks about Red Hat all time. >> But in his call he's talking backwards. >> It's interesting that he's not here, greeting this audience, right? It's again, almost by design, right? >> But maybe that's supposed to be... >> Hundreds of yards away. >> And one of the questions being in the cloud group is I'm not out pitching IBM Cloud, you know? If a customer comes to me and asks about, we have a deep partnership and IBM will be happy to tell you about our integrations, as opposed to, I'm happy to go into a deep discussion of what we're doing with Google, Amazon, and Microsoft. So that's how we do it. It's very different Dave, from you and I watch really closely the VMware-EMC, VMware-Dell, and how that relationship. This one is different. We are owned by IBM, but we mostly, it does IBM fund initiatives and have certain strategic things that are done, absolutely. But we maintain Red Hat. >> But there are similarities. I mean, VMware crowd didn't want to talk about EMC, but they had to, they were kind of forced to. Whereas, you're not being forced to. >> And then once Dell came in there, it was joint product development. >> I always thought a spin in. Would've been the more effective, of course, Michael Dell and Egon wouldn't have gotten their $40 billion out. But I think a spin in was more natural based on where they were going. And it would've been, I think, a more dominant position in the marketplace. They would've had more software, but again, financially it wouldn't have made as much sense, but that whole dynamic is different. I mean, but people said they were going to look at VMware as a model and it's been largely different because remember, VMware of course was a separate company, now is a fully separate company. Red Hat was integrated, we thought, okay, are they going to get blue washed? We're watching and watching, and watching, you had said, well, if the Red Hat culture isn't permeating IBM, then it's a failure. And I don't know if that's happening, but it's definitely... >> I think a long time for that. >> It's definitely been preserved. >> I mean, Dave, I know I read one article at the beginning of the year is, can Arvin make IBM, Microsoft Junior? Follow the same turnaround that Satya Nadella drove over there. IBM I think making some progress, I mean, I read and watch what you and the team are all writing about it. And I'll withhold judgment on IBM. Obviously, there's certain financial things that we'd love to see IBM succeed. We worry about our business. We do our thing and IBM shares our results and they've been solid, so. >> Microsoft had such massive cash flow that even bomber couldn't screw it up. Well, I mean, this is true, right? I mean, you think about how were relevant Microsoft was in the conversation during his tenure and yet they never got really... They maintained a position so that when the Nadella came in, they were able to reascend and now are becoming that dominant player. I mean, IBM just doesn't have that cash flow and that luxury, but I mean, if he pulls it off, he'll be the CEO of the decade. >> You mentioned partners earlier, big concern when the acquisition was first announced, was that the Dells and the HP's and the such wouldn't want to work with Red Hat anymore, you've sort of been here through that transition. Is that an issue? >> Not that I've seen, no. I mean, the hardware suppliers, the ISVs, the GSIs are all very important. It was great to see, I think you had Accenture on theCUBE today, obviously very important partner as we go to the cloud. IBM's another important partner, not only for IBM Cloud, but IBM Services, deep partnership with Azure and AWS. So those partners and from a technology standpoint, the cloud native ecosystem, we talked about, it's not just a Red Hat product. I constantly have to talk about, look, we have a lot of pieces, but your developers are going to have other tools that they're going to use and the security space. There is no such thing as a silver bullet. So I've been having some great conversations here already this week with some of our partners that are helping us to round out that whole solution, help our customers because it has to be, it's an ecosystem. And we're one of the drivers to help that move forward. >> Well, I mean, we were at Dell Tech World last week, and there's a lot of talk about DevSecOps and DevOps and Dell being more developer friendly. Obviously they got a long way to go, but you can't have that take that posture and not have a relationship with Red Hat. If all you got is Pivotal and VMware, and Tansu >> I was thrilled to hear the OpenShift mention in the keynote when they talked about what they were doing. >> How could you not, how could you have any credibility if you're just like, Oh, Pivotal, Pivotal, Pivotal, Tansu, Tansu. Tansu is doing its thing. And they smart strategy. >> VMware is also a partner of ours, but that we would hope that with VMware being independent, that does open the door for us to do more with them. >> Yeah, because you guys have had a weird relationship with them, under ownership of EMC and then Dell, right? And then the whole IBM thing. But it's just a different world now. Ecosystems are forming and reforming, and Dell's building out its own cloud and it's got to have... Look at Amazon, I wrote about this. I said, "Can you envision the day where Dell actually offers competitive products in its suite, in its service offering?" I mean, it's hard to see, they're not there yet. They're not even close. And they have this high say/do ratio, or really it's a low say/do, they say high say/do, but look at what they did with Nutanix. You look over- (chuckles) would tell if it's the Cisco relationship. So it's got to get better at that. And it will, I really do believe. That's new thinking and same thing with HPE. And, I don't know about Lenovo that not as much of an ecosystem play, but certainly Dell and HPE. >> Absolutely. Michael Dell would always love to poke at HPE and HP really went very far down the path of their own products. They went away from their services organization that used to be more like IBM, that would offer lots of different offerings and very much, it was HP Invent. Well, if we didn't invent it, you're not getting it from us. So Dell, we'll see, as you said, the ecosystems are definitely forming, converging and going in lots of different directions. >> But your position is, Hey, we're here, we're here to help. >> Yeah, we're here. We have customers, one of the best proof points I have is the solution that we have with Amazon. Amazon doesn't do the engineering work to make us a native offering if they didn't have the customer demand because Amazon's driven off of data. So they came to us, they worked with us. It's a lot of work to be able to make that happen, but you want to make it frictionless for customers so that they can adopt that. That's a long path. >> All right, so evening event, there's a customer event this evening upstairs in the lobby. Microsoft is having a little shin dig, and then serves a lot of customer dinners going on. So Stu, we'll see you out there tonight. >> All right, thanks you. >> Were watching a brewing somewhere. >> Keynotes tomorrow, a lot of good sessions and enablement, and yeah, it's great to be in person to be able to bump some people, meet some people and, Hey, I'm still a year and a half in still meeting a lot of my peers in person for the first time. >> Yeah, and that's kind of weird, isn't it? Imagine. And then we kick off tomorrow at 10:00 AM. Actually, Stephanie Chiras is coming on. There she is in the background. She's always a great guest and maybe do a little kickoff and have some fun tomorrow. So this is Dave Vellante for Stu Miniman, Paul Gillin, who's my co-host. You're watching theCUBEs coverage of Red Hat Summit 2022. We'll see you tomorrow. (bright music)

Published Date : May 11 2022

SUMMARY :

but during the hallway track, Was that the World Trade Center? at the Hines Convention Center. And I like that you were It's the three-hour keynote that the virtual event really It's optimizing the things becoming the norm. and just jam it into the virtual. aren't going to be able to. a lot of the discussions. Meta-Cloud, come on. All right, you know But the technology that we build for them It's kind of like the innovation on the desktop, And that's one of the things Well, can we declare I mean, even Amazon when you start talking the $70 billion business on open source. but that say, Hey, this is... the managed service model but it's not the majority and then they had the proprietary piece, And that's one of the And you have a role in making that easy. I get all the time is, are made in the IT industry. And the question is, Well, you were always a big fan the relationships we have our customers, And we talked earlier One of the reasons that But in his call he's talking that's supposed to be... And one of the questions I mean, VMware crowd didn't And then once Dell came in there, Would've been the more I think a long time It's definitely been at the beginning of the year is, and that luxury, the HP's and the such I mean, the hardware suppliers, the ISVs, and not have a relationship with Red Hat. the OpenShift mention in the keynote And they smart strategy. that does open the door for us and it's got to have... the ecosystems are definitely forming, But your position is, Hey, is the solution that we have with Amazon. So Stu, we'll see you out there tonight. Were watching a brewing person for the first time. There she is in the background.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
GoogleORGANIZATION

0.99+

PaulPERSON

0.99+

Dave VellantePERSON

0.99+

AmazonORGANIZATION

0.99+

IBMORGANIZATION

0.99+

Stu MinimanPERSON

0.99+

General MotorsORGANIZATION

0.99+

Paul GillinPERSON

0.99+

DavePERSON

0.99+

sevenQUANTITY

0.99+

MicrosoftORGANIZATION

0.99+

Stephanie ChirasPERSON

0.99+

HPORGANIZATION

0.99+

Matt HicksPERSON

0.99+

DellORGANIZATION

0.99+

GunnarPERSON

0.99+

Paul CormierPERSON

0.99+

Deepak SinghPERSON

0.99+

$40 billionQUANTITY

0.99+

BostonLOCATION

0.99+

DatabricksORGANIZATION

0.99+

BerkeleyLOCATION

0.99+

AWSORGANIZATION

0.99+

Satya NadellaPERSON

0.99+

HPEORGANIZATION

0.99+

$70 billionQUANTITY

0.99+

CiscoORGANIZATION

0.99+

tomorrowDATE

0.99+

Simon SinekPERSON

0.99+

StuPERSON

0.99+

last weekDATE

0.99+

HashicorpORGANIZATION

0.99+

GitLabORGANIZATION

0.99+

DellsORGANIZATION

0.99+

LenovoORGANIZATION

0.99+

TeslaORGANIZATION

0.99+

Red HatORGANIZATION

0.99+

MongoORGANIZATION

0.99+

EMCORGANIZATION

0.99+

15,000 peopleQUANTITY

0.99+

Red HatTITLE

0.99+

Michael DellPERSON

0.99+

64KQUANTITY

0.99+

last yearDATE

0.99+

ArvinPERSON

0.99+

VMwareORGANIZATION

0.99+

Red HatORGANIZATION

0.99+

Rahul Pathak Opening Session | AWS Startup Showcase S2 E2


 

>>Hello, everyone. Welcome to the cubes presentation of the 80 minutes startup showcase. Season two, episode two, the theme is data as code, the future of analytics. I'm John furry, your host. We had a great day lineup for you. Fast growing startups, great lineup of companies, founders, and stories around data as code. And we're going to kick it off here with our opening keynote with Rahul Pathak VP of analytics at AWS cube alumni. Right? We'll thank you for coming on and being the opening keynote for this awesome event. >>Yeah. And it's great to see you, and it's great to be part of this event, uh, excited to, um, to help showcase some of the great innovation that startups are doing on top of AWS. >>Yeah. We last spoke at AWS reinvent and, uh, a lot's happened there, service loss of serverless as the center of the, of the action, but all these start-ups rock set Dremio Cribble monks next Liccardo, a HANA imply all doing great stuff. Data as code has a lot of traction. So a lot of still momentum going on in the marketplace. Uh, pretty exciting. >>No, it's, uh, it's awesome. I mean, I think there's so much innovation happening and you know, the, the wonderful part of working with data is that the demand for services and products that help customers drive insight from data is just skyrocketing and has no sign of no sign of slowing down. And so it's a great time to be in the data business. >>It's interesting to see the theme of the show getting traction, because you start to see data being treated almost like how developers write software, taking things out of branches, working on them, putting them back in, uh, machine learnings, uh, getting iterated on you, seeing more models, being trained differently with better insights, action ones that all kind of like working like code. And this is a whole nother way. People are reinventing their businesses. This has been a big, huge wave. What's your reaction to that? >>Uh, I think it's spot on, I mean, I think the idea of data's code and bringing some of the repeatability of processes from software development into how people built it, applications is absolutely fundamental and especially so in machine learning where you need to think about the explainability of a model, what version of the world was it trained on? When you build a better model, you need to be able to explain and reproduce it. So I think your insights are spot on and these ideas are showing up in all stages of the data work flow from ingestion to analytics to I'm out >>This next way is about modernization and going to the next level with cloud-scale. Uh, thank you so much for coming on and being the keynote presenter here for this great event. Um, I'll let you take it away. Reinventing businesses, uh, with ads analytics, right? We'll take it away. >>Okay, perfect. Well, folks, we're going to talk about, uh, um, reinventing your business with, uh, data. And if you think about it, the first wave of reinvention was really driven by the cloud. As customers were able to really transform how they thought about technology and that's well on her way. Although if you stop and think about it, I think we're only about five to 10% of the way done in terms of it span being on the cloud. So lots of work to do there, but we're seeing another wave of reinvention, which is companies reinventing their businesses with data and really using data to transform what they're doing to look for new opportunities and look for ways to operate more efficiently. And I think the past couple of years of the pandemic, it really only accelerated that trend. And so what we're seeing is, uh, you know, it's really about the survival of the most informed folks for the best data are able to react more quickly to what's happening. >>Uh, we've seen customers being able to scale up if they're in, say the delivery business or scale down, if they were in the travel business at the beginning of all of this, and then using data to be able to find new opportunities and new ways to serve customers. And so it's really foundational and we're seeing this across the board. And so, um, you know, it's great to see the innovation that's happening to help customers make sense of all of this. And our customers are really looking at ways to put data to work. It's about making better decisions, finding new efficiencies and really finding new opportunities to succeed and scale. And, um, you know, when it comes to, uh, good examples of this FINRA is a great one. You may not have heard of them, but that the U S equities regulators, all trading that happens in equities, they keep track of they're look at about 250 billion records per day. >>Uh, the examiner, I was only EMR, which is our spark and Hadoop service, and they're processing 20 terabytes of data running across tens of thousands of nodes. And they're looking for fraud and bad actors in the market. So, um, you know, huge, uh, transformation journey for FINRA over the years of customer I've gotten to work with personally since really 2013 onward. So it's been amazing to see their journey, uh, Pinterest, not a great customer. I'm sure everyone's familiar with, but, um, you know, they're about visual search and discovery and commerce, and, um, they're able to scale their daily lot searches, um, really a factor of three X or more, uh, drive down their costs. And they're using the Amazon Opus search service. And really what we're trying to do at AWS is give our customers the most comprehensive set of services for the end-to-end journey around, uh, data from ingestion to analytics and machine learning. And we will want to provide a comprehensive set of capabilities for ingestion, cataloging analytics, and then machine learning. And all of these are things that our partners and the startups that are run on us have available to them to build on as they build and deliver value for their customers. >>And, you know, the way we think about this is we want customers to be able to modernize what they're doing and their infrastructure. And we provide services for that. It's about unifying data, wherever it lives, connecting it. So the customers can build a complete picture of their customers and business. And then it's about innovation and really using machine learning to bring all of this unified data, to bear on driving new innovation and new opportunities for customers. And what we're trying to do AWS is really provide a scalable and secure cloud platform that customers and partners can build on a unifying is about connecting data. And it's also about providing well-governed access to data. So one of the big trends that we see is customers looking for the ability to make self-service data available to that customer there and use. And the key to that is good foundational governance. >>Once you can define good access controls, you then are more comfortable setting data free. And, um, uh, the other part of it is, uh, data lakes play a huge role because you need to be able to think about structured and unstructured data. In fact, about 80% of the data being generated today, uh, is unstructured. And you want to be able to connect data that's in data lakes with data that's in purpose-built data stores, whether that's databases on AWS databases, outside SAS products, uh, as well as things like data warehouses and machine learning systems, but really connecting data as key. Uh, and then, uh, innovation, uh, how can we bring to bear? And we imagine all processes with new technologies like AI and machine learning, and AI is also key to unlocking a lot of the value that's in unstructured data. If you can figure out what's in an imagine the sentiment of audio and do that in real-time that lets you then personalize and dynamically tailor experiences, all of which are super important to getting an edge, um, in, uh, in the modern marketplace. And so at AWS, we, when we think about connecting the dots across sources of data, allowing customers to use data, lakes, databases, analytics, and machine learning, we want to provide a common catalog and governance and then use these to help drive new experiences for customers and their apps and their devices. And then this, you know, in an ideal world, we'll create a closed loop. So you create a new experience. You observe our customers interact with it, that generates more data, which is a data source that feeds into the system. >>And, uh, you know, on AWS, uh, thinking about a modern data strategy, uh, really at the core is a data lakes built on us three. And I'll talk more about that in a second. Then you've got services like Athena included, lake formation for managing that data, cataloging it and querying it in place. And then you have the ability to use the right tool for the right job. And so we're big believers in purpose-built services for data because that's where you can avoid compromising on performance functionality or scale. Uh, and then as I mentioned, unification and inter interconnecting, all of that data. So if you need to move data between these systems, uh, there's well-trodden pathways that allow you to do that, and then features built into services that enable that. >>And, um, you know, some of the core ideas that guide the work that we do, um, scalable data lakes at key, um, and you know, this is really about providing arbitrarily scalable high throughput systems. It's about open format data for future-proofing. Uh, then we talk about purpose-built systems at the best possible functionality, performance, and cost. Uh, and then from a serverless perspective, this has been another big trend for us. We announced a bunch of serverless services and reinvented the goal here is to really take away the need to manage infrastructure from customers. They can really focus about driving differentiated business value, integrated governance, and then machine learning pervasively, um, not just as an end product for data scientists, but also machine learning built into data, warehouses, visualization and a database. >>And so it's scalable data lakes. Uh, data three is really the foundation for this. One of our, um, original services that AWS really the backbone of so much of what we do, uh, really unmatched your ability, availability, and scale, a huge portfolio of analytics services, uh, both that we offer, but also that our partners and customers offer and really arbitrary skin. We've got individual customers and estimator in the expert range, many in the hundreds of petabytes. And that's just growing. You know, as I mentioned, we see roughly a 10 X increase in data volume every five years. So that's a exponential increase in data volumes, Uh, from a purpose-built perspective, it's the right tool for the right job, the red shift and data warehousing Athena for querying all your data. Uh, EMR is our managed sparking to do, uh, open search for log analytics and search, and then Kinesis and Amex care for CAFCA and streaming. And that's been another big trend is, uh, real time. Data has been exploding and customers wanting to make sense of that data in real time, uh, is another big deal. >>Uh, some examples of how we're able to achieve differentiated performance and purpose-built systems. So with Redshift, um, using managed storage and it's led us and since types, uh, the three X better price performance, and what's out there available to all our customers and partners in EMR, uh, with things like spark, we're able to deliver two X performance of open source with a hundred percent compatibility, uh, almost three X and Presto, uh, with on two, which is our, um, uh, new Silicon chips on AWS, better price performance, about 10 to 12% better price performance, and 20% lower costs. And then, uh, all compatible source. So drop your jobs, then have them run faster and cheaper. And that translates to customer benefits for better margins for partners, uh, from a serverless perspective, this is about simplifying operations, reducing total cost of ownership and freeing customers from the need to think about capacity management. If we invent, we, uh, announced serverless redshifts EMR, uh, serverless, uh, Kinesis and Kafka, um, and these are all game changes for customers in terms of freeing our customers and partners from having to think about infrastructure and allowing them to focus on data. >>And, um, you know, when it comes to several assumptions in analytics, we've really got a very full and complete set. So, uh, whether that's around data warehousing, big data processing streaming, or cataloging or governance or visualization, we want all of our customers to have an option to run something struggles as well as if they have specialized needs, uh, uh, instances are available as well. And so, uh, really providing a comprehensive deployment model, uh, based on the customer's use cases, uh, from a governance perspective, uh, you know, like information is about easy build and management of data lakes. Uh, and this is what enables data sharing and self service. And, um, you know, with you get very granular access controls. So rule level security, uh, simple data sharing, and you can tag data. So you can tag a group of analysts in the year when you can say those only have access to the new data that's been tagged with the new tags, and it allows you to very, scaleably provide different secure views onto the same data without having to make multiple copies, another big win for customers and partners, uh, support transactions on data lakes. >>So updates and deletes. And time-travel, uh, you know, John talked about data as code and with time travel, you can look at, um, querying on different versions of data. So that's, uh, a big enabler for those types of strategies. And with blue, you're able to connect data in multiple places. So, uh, whether that's accessing data on premises in other SAS providers or, uh, clouds, uh, as well as data that's on AWS and all of this is, uh, serverless and interconnected. And, um, and really it's about plugging all of your data into the AWS ecosystem and into our partner ecosystem. So this API is all available for integration as well, but then from an AML perspective, what we're really trying to do is bring machine learning closer to data. And so with our databases and warehouses and lakes and BI tools, um, you know, we've infused machine learning throughout our, by, um, the state of the art machine running that we offer through SageMaker. >>And so you've got a ML in Aurora and Neptune for broths. Uh, you can train machine learning models from SQL, directly from Redshift and a female. You can use free inference, and then QuickSight has built in forecasting built in natural language, querying all powered by machine learning, same with anomaly detection. And here are the ideas, you know, how can we up our systems get smarter at the surface, the right insights for our customers so that they don't have to always rely on smart people asking the right questions, um, and you know, uh, really it's about bringing data back together and making it available for innovation. And, uh, thank you very much. I appreciate your attention. >>Okay. Well done reinventing the business with AWS analytics rural. That was great. Thanks for walking through that. That was awesome. I have to ask you some questions on the end-to-end view of the data. That seems to be a theme serverless, uh, in there, uh, Mel integration. Um, but then you also mentioned picking the right tool for the job. So then you've got like all these things moving on, simplify it for me right now. So from a business standpoint, how do they modernize? What's the steps that the clients are taking with analytics, what's the best practice? How do they, what's the what's the high order bit here? >>Uh, so the basic hierarchy is, you know, historically legacy systems are rigid and inflexible, and they weren't really designed for the scale of modern data or the variety of it. And so what customers are finding is they're moving to the cloud. They're moving from legacy systems with punitive licensing into more flexible, more systems. And that allows them to really think about building a decoupled, scalable future proof architecture. And so you've got the ability to combine data lakes and databases and data warehouses and connect them using common KPIs and common data protection. And that sets you up to deal with arbitrary scale and arbitrary types. And it allows you to evolve as the future changes since it makes it easy to add in a new type of engine, as we invent a better one a few years from now. Uh, and then, uh, once you've kind of got your data in a cloud and interconnected in this way, you can now build complete pictures of what's going on. You can understand all your touch points with customers. You can understand your complete supply chain, and once you can build that complete picture of your business, you can start to use analytics and machine learning to find new opportunities. So, uh, think about modernizing, moving to the cloud, setting up for the future, connecting data end to end, and then figuring out how to use that to your advantage. >>I know as you mentioned, modern data strategy gives you the best of both worlds. And you've mentioned, um, briefly, I want to get a little bit more, uh, insight from you on this. You mentioned open, open formats. One of the themes that's come out of some of the interviews, these companies we're going to be hearing from today is open source. The role opens playing. Um, how do you see that integrating in? Because again, this is just like software, right? Open, uh, open source software, open source data. It seems to be a trend. What does open look like to you? How do you see that progressing? >>Uh, it's a great question. Uh, open operates on multiple dimensions, John, as you point out, there's open data formats. These are things like JSI and our care for analytics. This allows multiple engines tend to operate on data and it'll, it, it creates option value for customers. If you're going to data in an open format, you can use it with multiple technologies and that'll be future-proofed. You don't have to migrate your data. Now, if you're thinking about using a different technology. So that's one piece now that sort of software, um, also, um, really a big enabler for innovation and for customers. And you've got things like squat arc and Presto, which are popular. And I know some of the startups, um, you know, that we're talking about as part of the showcase and use these technologies, and this allows for really the world to contribute, to innovating and these engines and moving them forward together. And we're big believers in that we've got open source services. We contribute to open-source, we support open source projects, and that's another big part of what we do. And then there's open API is things like SQL or Python. Uh, again, uh, common ways of interacting with data that are broadly adopted. And this one, again, create standardization. It makes it easier for customers to inter-operate and be flexible. And so open is really present all the way through. And it's a big part, I think, of, uh, the present and the future. >>Yeah. It's going to be fun to watch and see how that grows. It seems to be a lot of traction there. I want to ask you about, um, the other comment I thought was cool. You had the architectural slides out there. One was data lakes built on S3, and you had a theme, the glue in lake formation kind of around S3. And then you had the constellation of, you know, Kinesis SageMaker and other things around it. And you said, you know, pick the tool for the right job. And then you had the other slide on the analytics at the center and you had Redshift and all the other, other, other services around it around serverless. So one was more about the data lake with Athena glue and lake formation. The other one's about serverless. Explain that a little bit more for me, because I'm trying to understand where that fits. I get the data lake piece. Okay. Athena glue and lake formation enables it, and then you can pick and choose what you need on the serverless side. What does analytics in the center mean? >>So the idea there is that really, we wanted to talk about the fact that if you zoom into the analytics use case within analytics, everything that we offer, uh, has a serverless option for our customers. So, um, you could look at the bucket of analytics across things like Redshift or EMR or Athena, or, um, glue and league permission. You have the option to use instances or containers, but also to just not worry about infrastructure and just think declaratively about the data that you want to. >>Oh, so basically you're saying the analytics is going serverless everywhere. Talking about volumes, you mentioned 10 X volumes. Um, what are other stats? Can you share in terms of volumes? What are people seeing velocity I've seen data warehouses can't move as fast as what we're seeing in the cloud with some of your customers and how they're using data. How does the volume and velocity community have any kind of other kind of insights into those numbers? >>Yeah, I mean, I think from a stats perspective, um, you know, take Redshift, for example, customers are processing. So reading and writing, um, multiple exabytes of data there across from each shift. And, uh, you know, one of the things that we've seen in, uh, as time has progressed as, as data volumes have gone up and did a tapes have exploded, uh, you've seen data warehouses get more flexible. So we've added things like the ability to put semi-structured data and arbitrary, nested data into Redshift. Uh, we've also seen the seamless integration of data warehouses and data lakes. So, um, actually Redshift was one of the first to enable a straightforward acquiring of data. That's sitting in locally and drives as well as feed and that's managed on a stream and, uh, you know, those trends will continue. I think you'll kind of continue to see this, um, need to query data wherever it lives and, um, and, uh, allow, uh, leaks and warehouses and purpose-built stores to interconnect. >>You know, one of the things I liked about your presentation was, you know, kind of had the theme of, you know, modernize, unify, innovate, um, and we've been covering a lot of companies that have been, I won't say stumbling, but like getting to the future, some go faster than others, but they all kind of get stuck in an area that seems to be the same spot. It's the silos, breaking down the silos and get in the data lakes and kind of blending that purpose built data store. And they get stuck there because they're so used to silos and their teams, and that's kind of holding back the machine learning side of it because the machine learning can't do its job if they don't have access to all the data. And that's where we're seeing machine learning kind of being this new iterative model where the models are coming in faster. And so the silo brake busting is an issue. So what's your take on this part of the equation? >>Uh, so there's a few things I plan it. So you're absolutely right. I think that transition from some old data to interconnected data is always straightforward and it operates on a number of levels. You want to have the right technology. So, um, you know, we enable things like queries that can span multiple stores. You want to have good governance, you can connect across multiple ones. Uh, then you need to be able to get data in and out of these things and blue plays that role. So there's that interconnection on the technical side, but the other piece is also, um, you know, you want to think through, um, organizationally, how do you organize, how do you define it once data when they share it? And one of the asylees for enabling that sharing and, um, think about, um, some of the processes that need to get put in place and create the right incentives in your company to enable that data sharing. And then the foundational piece is good guardrails. You know, it's, uh, it can be scary to open data up. And, uh, the key to that is to put good governance in place where you can ensure that data can be shared and distributed while remaining protected and adhering to the privacy and compliance and security regulations that you have for that. And once you can assert that level of protection, then you can set that data free. And that's when, uh, customers really start to see the benefits of connecting all of it together, >>Right? And then we have a batch of startups here on this episode that are doing a lot of different things. Uh, some have, you know, new lake new lakes are forming observability lakes. You have CQL innovation on the front end data, tiering innovation at the data tier side, just a ton of innovation around this new data as code. How do you see as executive at AWS? You're enabling all this, um, where's the action going? Where are the white spaces? Where are the opportunities as this architecture continues to grow, um, and get traction because of the relevance of machine learning and AI and the apps are embedding data in there now as code where's the opportunities for these startups and how can they continue to grow? >>Yeah, the, I mean, the opportunity is it's amazing, John, you know, we talked a little bit about this at the beginning, but the, there is no slow down insight for the volume of data that we're generating pretty much everything that we have, whether it's a watch or a phone or the systems that we interact with are generating data and, uh, you know, customers, uh, you know, we talk a lot about the things that'll stay the same over time. And so, you know, the data volumes will continue to go up. Customers are gonna want to keep analyzing that data to make sense of it. They're going to want to be able to do it faster and more cheaply than they were yesterday. And then we're going to want to be able to make decisions and innovate, uh, in a shorter cycle and run more experiments than they were able to do. >>And so I think as long as, and they're always going to want this data to be secure and well-protected, and so I think as long as we, and the startups that we work with can continue to push on making these things better. Can I deal with more data? Can I deal with it more cheaply? Can I make it easier to get insight? And can I maintain a super high bar in security investments in these areas will just be off. Um, because, uh, the demand side of this equation is just in a great place, given what we're seeing in terms of theater and the architect for forum. >>I also love your comment about, uh, ML integration being the last leg of the equation here or less likely the journey, but you've got that enablement of the AIP solves a lot of problems. People can see benefits from good machine learning and AI is creating opportunities. Um, and also you also have mentioned the end to end with security piece. So data and security are kind of going hand in hand these days, not just the governments and the compliance stuff we're talking about security. So machine learning integration kind of connects all of this. Um, what's it all mean for the customers, >>For customers. That means that with machine learning and really enabling themselves to use machine learning, to make sense of data, they're able to find patterns that can represent new opportunities, um, quicker than ever before. And they're able to do it, uh, dynamically. So, you know, in a prior version of the world, we'd have little bit of systems and they would be relatively rigid and then we'd have to improve them. Um, with machine learning, this can be dynamic and near real time and you can customize them. So, uh, that just represents an opportunity to deepen relationships with customers and create more value and to find more efficiency in how businesses are run. So that piece is there. Um, and you know, your ideas around, uh, data's code really come into play because machine learning needs to be repeatable and explainable. And that means versioning, uh, keeping track of everything that you've done from a code and data and learning and training perspective >>And data sets are updating the machine learning. You got data sets growing, they become code modules that can be reused and, uh, interrogated, um, security okay. Is a big as a big theme data, really important security is seen as one of our top use cases. Certainly now in this day and age, we're getting a lot of, a lot of breaches and hacks coming in, being defended. It brings up the open, brings up the data as code security is a good proxy for kind of where this is going. What's your what's take on that and your reaction to that. >>So I'm, I'm security. You can, we can never invest enough. And I think one of the things that we, um, you know, guide us in AWS is security, availability, durability sort of jobs, you know, 1, 2, 3, and, um, and it operates at multiple levels. You need to protect data and rest with encryption, good key management and good practices though. You need to protect data on the wire. You need to have a good sense of what data is allowed to be seen by whom. And then you need to keep track of who did what and be able to verify and come back and prove that, uh, you know, uh, only the things that were allowed to happen actually happened. And you can actually then use machine learning on top of all of this apparatus to say, uh, you know, can I detect things that are happening that shouldn't be happening in near real time so they could put a stop to them. So I don't think any of us can ever invest enough in securing and protecting my data and our systems, and it is really fundamental or adding customer trust and it's just good business. So I think it is absolutely crucial. And we think about it all the time and are always looking for ways to raise >>Well, I really appreciate you taking the time to give the keynote final word here for the folks watching a lot of these startups that are presenting, they're doing well. Business wise, they're being used by large enterprises and people buying their products and using their services for customers are implementing more and more of the hot startups products they're relevant. What's your advice to the customer out there as they go on this journey, this new data as code this new future of analytics, what's your recommendation. >>So for customers who are out there, uh, recommend you take a look at, um, what, uh, the startups on AWS are building. I think there's tremendous innovation and energy, uh, and, um, there's really great technology being built on top of a rock solid platform. And so I encourage customers thinking about it to lean forward, to think about new technology and to embrace, uh, move to the cloud suite, modernized, you know, build a single picture of our data and, and figure out how to innovate and when >>Well, thanks for coming on. Appreciate your keynote. Thanks for the insight. And thanks for the conversation. Let's hand it off to the show. Let the show begin. >>Thank you, John pleasure, as always.

Published Date : Apr 5 2022

SUMMARY :

And we're going to kick it off here with our opening keynote with um, to help showcase some of the great innovation that startups are doing on top of AWS. service loss of serverless as the center of the, of the action, but all these start-ups rock set Dremio And so it's a great time to be in the data business. It's interesting to see the theme of the show getting traction, because you start to see data being treated and especially so in machine learning where you need to think about the explainability of a model, Uh, thank you so much for coming on and being the keynote presenter here for this great event. And so what we're seeing is, uh, you know, it's really about the survival And so, um, you know, it's great to see the innovation that's happening to help customers make So, um, you know, huge, uh, transformation journey for FINRA over the years of customer And the key to that is good foundational governance. And you want to be able to connect data that's in data lakes with data And then you have the ability to use the right tool for the right job. And, um, you know, some of the core ideas that guide the work that we do, um, scalable data lakes at And that's been another big trend is, uh, real time. and freeing customers from the need to think about capacity management. those only have access to the new data that's been tagged with the new tags, and it allows you to And time-travel, uh, you know, John talked about data as code And here are the ideas, you know, how can we up our systems get smarter at the surface, I have to ask you some questions on the end-to-end Uh, so the basic hierarchy is, you know, historically legacy systems are I know as you mentioned, modern data strategy gives you the best of both worlds. And I know some of the startups, um, you know, that we're talking about as part of the showcase And then you had the other slide on the analytics at the center and you had Redshift and all the other, So the idea there is that really, we wanted to talk about the fact that if you zoom about volumes, you mentioned 10 X volumes. And, uh, you know, one of the things that we've seen And so the silo brake busting is an issue. side, but the other piece is also, um, you know, you want to think through, Uh, some have, you know, new lake new lakes are forming observability lakes. And so, you know, the data volumes will continue to go up. And so I think as long as, and they're always going to want this data to be secure and well-protected, Um, and also you also have mentioned the end to end with security piece. And they're able to do it, uh, that can be reused and, uh, interrogated, um, security okay. And then you need to keep track of who did what and be able Well, I really appreciate you taking the time to give the keynote final word here for the folks watching a And so I encourage customers thinking about it to lean forward, And thanks for the conversation.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Rahul PathakPERSON

0.99+

JohnPERSON

0.99+

20 terabytesQUANTITY

0.99+

AWSORGANIZATION

0.99+

2013DATE

0.99+

20%QUANTITY

0.99+

yesterdayDATE

0.99+

twoQUANTITY

0.99+

S3TITLE

0.99+

PythonTITLE

0.99+

FINRAORGANIZATION

0.99+

10 XQUANTITY

0.99+

AmazonORGANIZATION

0.99+

hundred percentQUANTITY

0.99+

SQLTITLE

0.98+

bothQUANTITY

0.98+

OneQUANTITY

0.98+

80 minutesQUANTITY

0.98+

each shiftQUANTITY

0.98+

one pieceQUANTITY

0.98+

about 80%QUANTITY

0.98+

NeptuneLOCATION

0.98+

oneQUANTITY

0.98+

PinterestORGANIZATION

0.98+

todayDATE

0.97+

QuickSightORGANIZATION

0.97+

threeQUANTITY

0.97+

RedshiftTITLE

0.97+

wave of reinventionEVENT

0.97+

firstEVENT

0.96+

hundreds of petabytesQUANTITY

0.96+

HANATITLE

0.96+

firstQUANTITY

0.95+

both worldsQUANTITY

0.95+

AuroraLOCATION

0.94+

AmexORGANIZATION

0.94+

SASORGANIZATION

0.94+

pandemicEVENT

0.94+

12%QUANTITY

0.93+

about 10QUANTITY

0.93+

past couple of yearsDATE

0.92+

KafkaTITLE

0.92+

KinesisORGANIZATION

0.92+

LiccardoTITLE

0.91+

EMRTITLE

0.91+

about fiveQUANTITY

0.89+

tens of thousands of nodesQUANTITY

0.88+

KinesisTITLE

0.88+

10%QUANTITY

0.87+

three XQUANTITY

0.86+

AthenaORGANIZATION

0.86+

about 250 billion records perQUANTITY

0.85+

U SORGANIZATION

0.85+

CAFCAORGANIZATION

0.84+

SiliconORGANIZATION

0.83+

every five yearsQUANTITY

0.82+

Season twoQUANTITY

0.82+

AthenaOTHER

0.78+

single pictureQUANTITY

0.74+

Wen Phan, Ahana & Satyam Krishna, Blinkit & Akshay Agarwal, Blinkit | AWS Startup Showcase S2 E2


 

(gentle music) >> Welcome everyone to theCUBE's presentation of the AWS Startup Showcase. The theme is Data as Code; The Future of Enterprise Data and Analytics. This is the season two, episode two of the ongoing series of covering the exciting startups in the AWS ecosystem around data analytics and cloud computing. I'm your host, John Furrier. Today we're joined by great guests here. Three guests. Wen Phan, who's a Director of Product Management at Ahana, Satyam Krishna, Engineering Manager at Blinkit, and we have Akshay Agarwal, Senior Engineer at Blinkit as well. We're going to get into the relationship there. Let's get into. We're going to talk about how Blinkit's using open data lake, data house with Presto on AWS. Gentlemen, thanks for joining us. >> Thanks for having us. >> So we're going to get into the deep dive on the open data lake, but I want to just quickly get your thoughts on what it is for the folks out there. Set the table. What is the open data lakehouse? Why it is important? What's in it for the customers? Why are we seeing adoption around this because this is a big story. >> Sure. Yeah, the open data lakehouse is really being able to run a gamut of analytics, whether it be BI, SQL, machine learning, data science, on top of the data lake, which is based on inexpensive, low cost, scalable storage. And more importantly, it's also on top of open formats. And this to the end customer really offers a tremendous range of flexibility. They can run a bunch of use cases on the same storage and great price performance. >> You guys have any other thoughts on what's your reaction to the lakehouse? What is your experience with it? What's going on with Blinkit? >> No, I think for us also, it has been the primary driver of how as a company we have shifted our completely delivery model from us delivering in one day to someone who is delivering in 10 minutes, right? And a lot of this was made possible by having this kind of architecture in place, which helps us to be more open-source, more... where the tools are open-source, we have an open table format which helps us be very modular in nature, meaning we can pick solutions which works best for us, right? And that is the kind of architecture that we want to be in. >> Awesome. Wen, you know last time we chat with Ahana, we had a great conversation around Presto, data. The theme of this episode is Data as Code, which is interesting because in all the conversations in these episodes all around developers, which administrators are turning into developers, there's a developer vibe with data. And with opensource, it's software. Now you've got data taking a similar trajectory as how software development was with code, but the people running data they're not developers, they're administrators, they're operators. Now they're turning into DataOps. So it's kind of a similar vibe going on with branches and taking stuff out of and putting it back in, and testing it. Datasets becoming much more stable, iterating on machine learning algorithm. This is a movement. What's your guys reaction before we get into the relationships here with you guys. But, what's your reaction to this Data as Code movement? >> Yeah, so I think the folks at Blinkit are doing a great job there. I mean, they have a pretty compact data engineering team and they have some pretty stringent SLAs, as well as in terms of time to value and reliability. And what that ultimately translates for them is not only flexibility but reliability. So they've done some very fantastic work on a lot of automation, a lot of integration with code, and their data pipelines. And I'm sure they can give the details on that. >> Yes. Satyam and Akshay, you guys are engineers' software, but this is becoming a whole another paradigm where the frontline coding and or work or engineer data engineering is implementing the operations as well. It's kind of like DevOps for data. >> For sure. Right. And I think whenever you're working, even as a software engineer, the understanding of business is equally important. You cannot be working on something and be away from business, right? And that's where, like I mentioned earlier, when we realized that we have to completely move our stack and start giving analytics at 10 minutes, right. Because when you're delivering in 10 minutes, your leaders want to take decisions in your real-time. That means you need to move with them. You need to move with business. And when you do that, the kind of flexibility these softwares give is what enables the businesses at the end of the day. >> Awesome. This is the really kind of like, is there going to be a book called agile data warehouses? I don't think so. >> I think so. (laughing) >> The agile cloud data. This is cool. So let's get into what you guys do. What is Blinkit up to? What do you guys do? Can you take a minute to explain the company and your product? >> Sure. I'll take that. So Blinkit is India's biggest 10 minute delivery platform. It pioneered the delivery model in the country with over 10 million Indian shopping on our platform, ranging from everything: grocery staples, vegetables, emergency services, electronics, and much more, right. It currently delivers over 200,000 orders every day, and is in a hurry to bring the future of farmers to everyone in India. >> What's the relationship with Ahana and Blinkit? Wen, what's the tie in? >> Yeah, so Blinkit had a pretty well formed stack. They needed a little bit more flexibility and control. They thought a managed service was the way to go. And here at Ahana, we provide a SaaS managed service for Presto. So they engaged us and they evaluated our offering. And more importantly, we're able to partner. As a early stage startup, we really rely on very strong partners with great use cases that are willing to collaborate. And the folks at Blinkit have been really great in helping us push our product, develop our product. And we've been very happy about the value that we've been able to deliver to them as well. >> Okay. So let's unpack the open data lakehouse. What is it? What's under the covers? Let's get into it. >> Sure. So if bring up a slide. Like I said before, it's really a paradigm on being able to run a gamut of analytics on top of the open data lake. So what does that mean? How did it come about? So on the left hand side of the slide, we are coming out of this world where for the last several decades, the primary workhorse for SQL based processing and reporting and dashboarding use cases was really the data warehouse. And what we're seeing is a shift due to the trends in inexpensive scalable storage, cloud storage. The proliferation of open formats to facilitate using this storage to get certain amounts of reliability and performance, and the adoption of frameworks that can operate on top of this cloud data lake. So while here at Ahana, we're primarily focused on SQL workloads and Presto, this architecture really allows for other types of frameworks. And you see the ML and AI side. And like to Satyam's point earlier, offers a great amount of flexibility modularity for many use cases in the cloud. So really, that's really the lakehouse, and people like it for the performance, the openness, and the price performance. >> How's the open-source open side of it playing in the open-source? It's kind of open formats. What is the open-source angle on this because there's a lot of different approaches. I'm hearing open formats. You know, you have data stores which are a big part of seeing that. You got SQL, you mentioned SQL. There's got a mishmash of opportunities. Is it all coexisting? Is it one tool to rule the world or is it interchangeable? What's the open-source angle? >> There's multiple angles and I'll let definitely Satyam add to what I'm saying. This was definitely a big piece for Blinkit. So on one hand, you have the open formats. And what really the open formats enable is multiple compute engines to work on that data. And that's very huge. 'Cause it's open, you're not locked in. I think the other part of open that is important and I think it was important to Blinkit was the governance around that. So in particular Presto is governed by the Linux Foundation. And so, as a customer of open-source technology, they want some assurances for things like how's it governed? Is the license going to change? So there's that aspect of openness that I think is very important. >> Yeah. Blinkit, what's the data strategy here with lakehouse and you guys? Why are you adopting this type of architecture? >> So adding to what... Yeah, I think adding to Wen said, right. When we are thinking in terms of all these OpenStacks, you have got these open table formats, everything which is deployed over cloud, the primary reason there is modularity. It's as simple as that, right. You can plug and play so many different table formats from one thing to another based on the use case that you're trying to serve, so that you get the most value out of data. Right? I'll give you a very simple example. So for us we use... not even use one single table format. It's not that one thing solves for everything, right? We use both Hudi and Iceberg to solve for different use cases. One is good for when you're working for a certain data site. Icebergs works well when you're in the SQL kind of interface, right. Hudi's still trying to reach there. It's going to go there very soon. So having the ability to plug and play different formats based on the use case helps you to grow faster, helps you to take decisions faster because you now you're not stuck on one thing. They will have to implement it. Right. So I think that's what it is great about this data lake strategy. Keeping yourself cost effective. Yeah, please. >> So the enablement is basically use case driven. You don't have to be rearchitecturing for use cases. You can simply plug can play based on what you need for the use case. >> Yeah. You can... and again, you can focus on your business use case. You can figure out what your business users need and not worry about these things because that's where Presto comes in, helps you stitch that data together with multiple data formats, give you the performance that you need and it works out the best there. And that's something that you don't get to with traditional warehouse these days. Right? The kind of thing that we need, you don't get that. >> I do want to add. This is just to riff on what Satyam said. I think it's pretty interesting. So, it really allowed him to take the best-of-breed of what he was seeing in the community, right? So in the case of table formats, you've got Delta, you've got Hudi, you've got Iceberg, and they all have got their own roadmap and it's kind of organic of how these different communities want to evolve, and I think that's great, but you have these end consumers like Blinkit who have different maybe use cases overlapping, and they're not forced to pick one. When you have an open architecture, they can really put together best-of-breed. And as these projects evolve, they can continue to monitor it and then make decisions and continue to remain agile based on the landscape and how it's evolving. >> So the agility is a key point. Flexibility and agility, and time to valuing with your data. >> Yeah. >> All right. Wen, I got to get in to why the Presto is important here. Where does that fit in? Why is Presto important? >> Yeah. For me, it all comes down to the use cases and the needs. And reporting and dashboarding is not going to go away anytime soon. It's a very common use case. Many of our customers like Blinkit come to us for that use case. The difference now is today, people want to do that particular use case on top of the modern data lake, on top of scalable, inexpensive, low cost storage. Right? In addition to that, there's a need for this low latency interactive ability to engage with the data. This is often arises when you need to do things in a ad hoc basis or you're in the developmental phase of building things up. So if that's what your need is. And latency's important and getting your arms around the problems, very important. You have a certain SLA, I need to deliver something. That puts some requirements in the technology. And Presto is a perfect for that ideal use case. It's ideal for that use case. It's distributed, it's scalable, it's in memory. And so it's able to really provide that. I think the other benefit for Presto and why we're bidding on Presto is it works well on the data lakes, but you have to think about how are these organizations maturing with this technology. So it's not necessarily an all or nothing. You have organizations that have maybe the data lake and it's augmented with other analytical data stores like Snowflake or Redshift. So Presto also... a core aspect is its ability to federate or connect and query across different data sources. So this can be a permanent thing. This could also be a transitionary thing. We have some customers that are moving and slowly shifting their data portfolio from maybe all data warehouse into 80% data lake. But it gives that optionality, it gives that ability to transition over a timeframe. But for all those reasons, the latency, the scalability, the federation, is why Presto for this particular use case. >> And you can connect with other databases. It can be purpose built database, could be whatever. Right? >> Sure. Yes, yes. Presto has a very pluggable architecture. >> Okay. Here's the question for the Blinkit team? Why did you choose Presto and what led you to Ahana? >> So I'll take this better, over this what Presto sits well in that reach is, is how it is designed. Like basically, Presto decouples your storage with the compute. Basically like, people can use any storage and Presto just works as a query engine for them. So basically, it has a constant connectors where you can connect with a real-time databases like Pinot or a Druid, along with your warehouses like Redshift, along with your data lake that's like based on Hudi or Iceberg. So it's like a very landscape that you can use with the Presto. And consumers like the analytics doesn't need to learn the SQL or different paradigms of the querying for different sources. They just need to learn a single source. And, they get a single place to consume from. They get a single consumer on their single destination to write on also. So, it's a homologous architecture, which allows you to put a central security like which Presto integrates. So it's also based on open architecture, that's Apache engine. And it has also certain innovative features that you can see based on caching, which reduces a lot of the cost. And since you have further decoupled your storage with the compute, you can further reduce your cost, because now the biggest part of our tradition warehouse is a storage. And the cost goes massively upwards with the amount of data that you've added. Like basically, each time that you add more data, you require more storage, and warehouses ask you to write the data in their own format. Over here since we have decoupled that, the storage cost have gone down. It's literally that your cost that you are writing, and you just pay for the compute, and you can scale in scale out based on the requirements. If you have high traffic, you scale out. If you have low traffic, you scale in. So all those. >> So huge cost savings. >> Yeah. >> Yeah. Cost effectiveness, for sure. >> Cost effectiveness and you get a very good price value out of it. Like for each query, you can estimate what's the cost for you based on that tracking and all those things. >> I mean, if you think about the other classic Iceberg and what's under the water you don't know, it's the hidden cost. You think about the tooling, right, and also, time it takes to do stuff. So if you have flexibility on choice, when we were riffing on this last time we chatted with you guys and you brought it up earlier around, you can have the open formats to have different use cases in different tools or different platforms to work on it. Redshift, you can use Redshift here, or use something over there. You don't have to get locking >> Absolutely. >> Satyam & Akshay: Yeah. >> Locking is a huge problem. How do you guys see that 'cause sounds like here there's not a lot of locking. You got the open formats, and you got choice. >> Yeah. So you get best of the both worlds. Like you get with Ahana or with the Presto, you can get the best of the both worlds. Since it's cloud native, you can easily deploy your clusters very easily within like five minutes. Your cluster is up, you can start working on it. You can deploy multiple clusters for multiple teams. You get also flexibility of adding new connectors since it's open and further it's also much more secure since it's based on cloud native. So basically, you can control your security endpoints very well. So all those things comes in together with this architecture. So you can definitely go more on the lakehouse architecture than warehousing when you want to deliver data value faster. And basically, you get the much more high value out of your data in a sorted template. >> So Satyam, it sounds like the old warehousing was like the application person, not a lot of usage, old, a lot of latency. Okay. Here and there. But now you got more speed to deploy clusters, scale up scale down. Application developers are as everyone. It's not one person. It's not one group. It's whenever you want. So, you got speed. You got more diversity in the data opportunities, and your coding. >> Yeah. I think data warehouses are a way to start for every organization who is getting into data. I don't think data warehousing is still a solution and will be a solution for a lot of teams which are still getting into data. But as soon as you start scaling, as you start seeing the cost going up, as you start seeing the number of use cases adding up, having an open format definitely helps. So, I would say that's where we are also heading into and that's how our journey as well started with Presto as well, why we even thought about Ahana, right. >> (John chuckles) >> So, like you mentioned, one of the things that happened was as we were moving to the lakehouse and the open table format, I think Ahana is one of the first ones in the market to have Hudi as a first class citizen completely supported with all the things which are not even present at the time of... even with Presto, right. So we see Ahana working behind the scenes, improving even some of the things already over the open-source ecosystem. And that's where we get the most value out of Ahana as well. >> This is the convergence of open-source magic and commercialization. Wen, because you think about Data as Code, reminds me, I hear, "Data warehouse, it's not going to go away." But you got cloud scale or scale. It reminds me of the old, "Oh yeah, I have a data center." Well, here comes the cloud. So, doesn't really kill the data center, although Amazon would say that the data center's going to be eliminated. No, you just use it for whatever you need it for. You use it for specific use cases, but everyone, all the action goes to the cloud for scale. The same things happen with data, and look at the open-source community. It's kind of coming together. Data as Code is coming together. >> Yeah, absolutely. >> Absolutely. >> I do want to again to connect on another dot in terms of cost and that. You know, we've been talking a little bit about price performance, but there's an implicit cost, and I think this was also very important to Blinkit, and also why we're offering a managed service. So one piece of it. And it really revolves around the people, right? So outside of the technology, the performance. One thing that Akshay brought up and it's another important piece that I should have highlighted a little bit more is, Presto exposes the ability to interact your data in a widely adopted way, which is basically ANSI SQL. So the ability for your practitioners to use this technology is huge. That's just regular Presto. In terms of a managed service, the guys at Blinkit are a great high performing team, but they have to be very efficient with their time and what they manage. And what we're trying to do is provide leverage for them. So take a lot of the heavy lifting away, but at the same time, figuring out the right things to expose so that they have that same flexibility. And that's been the balancing point that we've been trying to balance at Ahana, but that goes back to cost. How do I total cost of ownership? And that not doesn't include just the actual querying processing time, but the ability for the organization to go ahead and absorb the solution. And what does it cost in terms of the people involved? >> Yeah. Great conversation. I mean, this brings up the question of back in the data center, the cloud days, you had the concept of an SRE, which is now popular, site reliability engineer. One person does all the clusters and manages all the scale. Is the data engineer the new SRE for data? Are we seeing a similar trajectory? Just want to get your reaction. What do you guys think? >> Yes, so I would say, definitely. It depends on the teams and the sizes of that. We are high performing team so each automation takes bits on the pieces of the architecture, like where they want to invest in. And it comes out with the value of the engineer's time and basically like how much they can invest in, how much they need to configure the architecture, and how much time it'll take to time to market. So basically like, this is what I would also highlight as an engineer. I found Ahana like the... I would say as a Presto in a cloud native environment, or I think so there's the one in the market that seamlessly scales and then scales out. And further, with a team of us, I would say our team size like three to four engineers managing cluster day in day out, conferring, tuning and all those things takes a lot of time. And Ahana came in and takes it off our plate and the hands in a solution which works out of box. So that's where this comes in. Ahana it's also based on open-source community. >> So the time of the engineer's time is so valuable. >> Yeah. >> My take on it really in terms of the data engineering being the SRE. I think that can work, it depends on the actual person, and we definitely try to make the process as easy as possible. I think in Blinkit's case, you guys are... There are data platform owners, but they definitely are aware of the pipelines. >> John: Yeah. >> So they have very intimate knowledge of what data engineers do, but I think in their case, you guys, you're managing a ton of systems. So it's not just even Presto. They have a ton of systems and surfacing that interface so they can cater to all the data engineers across their data systems, I think is the big need for them. I know you guys you want to chime in. I mean, we've seen the architecture and things like that. I think you guys did an amazing job there. >> So, and to adding to Wen's point, right. Like I generally think what DevOps is to the tech team. I think, what is data engineer or the data teams are to the data organization, right? Like they play a very similar role that you have to act as a guardrail to ensure that everyone has access to the data so the democratizing and everything is there, but that has to also come with security, right? And when you do that, there are (indistinct) a lot of points where someone can interact with data. We have... And again, there's a mixed match of open-source tools that works well, as well. And there are some paid tools as well. So for us like for visualization, we use Redash for our ad hoc analysis. And we use Tableau as well whenever we want to give a very concise reporting. We have Jupyter notebooks in place and we have EMRs as well. So we always have a mixed batch of things where people can interact with data. And most of our time is spent in acting as that guardrail to ensure that everyone should have access to data, but it shouldn't be exploited, right. And I think that's where we spend most of our time in. >> Yeah. And I think the time is valuable, but that your point about the democratization aspect of it, there seems to be a bigger step function value that you're enabling and needs to be talked out. The 10x engineer, it's more like 50x, right? If you get it done right, the enablement downstream at the scale that we're seeing with this new trend is significant. It's not just, oh yeah, visualization and get some data quicker, there's actually real advantages on a multiple with that engineering. So, and we saw that with DevOps, right? Like, you do this right and then magic happens on the edges. So, yeah, it's interesting. You guys, congratulations. Great environment. Thanks for sharing the insight Blinkit. Wen, great to see you. Ahana again with Presto, congratulations. The open-source meets data engineering. Thanks so much. >> Thanks, John. >> Appreciate it. >> Okay. >> Thanks John. >> Thanks. >> Thanks for having us. >> This season two, episode two of our ongoing series. This one is Data as Code. This is theCUBE. I'm John furrier. Thanks for watching. (gentle music)

Published Date : Apr 1 2022

SUMMARY :

This is the season two, episode What is the open data lakehouse? And this to the end customer And that is the kind of into the relationships here with you guys. give the details on that. is implementing the operations as well. You need to move with business. This is the really kind of like, I think so. So let's get into what you guys do. and is in a hurry to bring And the folks at Blinkit the open data lakehouse. So on the left hand side of the slide, What is the open-source angle on this Is the license going to change? with lakehouse and you guys? So having the ability to plug So the enablement is and again, you can focus So in the case of table formats, So the agility is a key point. Wen, I got to get in and the needs. And you can connect Presto has a very pluggable architecture. and what led you to Ahana? And consumers like the analytics and you get a very good and also, time it takes to do stuff. and you got choice. best of the both worlds. like the old warehousing as you start seeing the cost going up, and the open table format, the data center's going to be eliminated. figuring out the right things to expose and manages all the scale. and the sizes of that. So the time of the it depends on the actual person, I think you guys did an amazing job there. So, and to adding Thanks for sharing the insight Blinkit. This is theCUBE.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
John FurrierPERSON

0.99+

Wen PhanPERSON

0.99+

Akshay AgarwalPERSON

0.99+

JohnPERSON

0.99+

AmazonORGANIZATION

0.99+

AhanaPERSON

0.99+

IndiaLOCATION

0.99+

BlinkitORGANIZATION

0.99+

Satyam KrishnaPERSON

0.99+

Linux FoundationORGANIZATION

0.99+

AhanaORGANIZATION

0.99+

five minutesQUANTITY

0.99+

AkshayPERSON

0.99+

AWSORGANIZATION

0.99+

10 minutesQUANTITY

0.99+

Three guestsQUANTITY

0.99+

SatyamPERSON

0.99+

BlinkitPERSON

0.99+

one dayQUANTITY

0.99+

10 minuteQUANTITY

0.99+

RedshiftTITLE

0.99+

both worldsQUANTITY

0.99+

over 200,000 ordersQUANTITY

0.99+

PrestoPERSON

0.99+

over 10 millionQUANTITY

0.99+

SQLTITLE

0.99+

10xQUANTITY

0.99+

WenPERSON

0.98+

50xQUANTITY

0.98+

agileTITLE

0.98+

one pieceQUANTITY

0.98+

bothQUANTITY

0.98+

threeQUANTITY

0.98+

todayDATE

0.98+

oneQUANTITY

0.98+

single destinationQUANTITY

0.97+

One personQUANTITY

0.97+

each timeQUANTITY

0.96+

eachQUANTITY

0.96+

PrestoORGANIZATION

0.96+

one personQUANTITY

0.96+

single sourceQUANTITY

0.96+

TableauTITLE

0.96+

one toolQUANTITY

0.96+

IcebergsORGANIZATION

0.96+

TodayDATE

0.95+

OneQUANTITY

0.95+

one thingQUANTITY

0.95+

Steven Mih, Ahana & Girish Baliga, Uber | CUBE Conversation


 

(bright music) >> Hey everyone, welcome to this CUBE conversation featuring Ahana, I'm your host Lisa Martin. I've got two guests here with me today. Steven Mih joins us, the Presto Foundation governing board member, co-founder and CEO of Ahana, and Girish Baliga Presto Foundation governing board chair and senior engineering manager at Uber. Guys thanks for joining us. >> Thanks for having us. >> Thanks for having us. >> So Steven we're going to dig into and unpack Presto in the next few minutes or so, but Steven let's go ahead and start with you. Talk to us about some of the challenges with the open data lake house market. What are some of those key challenges that organizations are facing? >> Yeah, just pulling up the slide you know, what we see is that many organizations are dealing with a lot more data and very different data types and putting that all into, traditionally as the data warehouse, which has been the workhorse for BI and analytics traditionally, it becomes very, very expensive, and there's a lot of lock in associated with that. And so what's happening is that people are putting the data semistructured and unstructured data for example, in cloud data lakes or other data lakes, and they find that they can query directly with a SQL query engine like Presto. And that lets you have a much more approach to dealing with getting insights out of your data. And that's what this is all about, and that's why companies are moving to a modern architecture. Girish maybe you can share some of your thoughts on how Uber uses Presto for this. >> Yeah, at Uber we use Presto in our internal deployments. So at Uber we have our own data centers, we store data locally in our data centers, but we have made the conscious choice to go with an open data stack. Our entire data stack is built around open source technologies like Hadoop, Hive, Spark and Presto. And so Presto is an invaluable engine that is able to connect to all these different storage and data formats and allow us to have a single entry point for our users, to run their SQL engines and get insights rather quickly compared to some of the other engines that we have at Uber. >> So let's talk a little bit about Presto so that the audience gets a good overview of that. Steven starting with you, you talked about the challenges of the traditional data warehouse application. Talk to us about why Presto was founded the open, the project, give us that background information if you will. >> Absolutely, so Presto was originally developed out of the biggest hyperscaler out there which is Facebook now known as Meta. And they donated that project to the, and open sourced it and donated it to the Linux Foundation. And so Presto is a SQL query engine, it's a storage SQL query engine, that runs directly on open data lakes, so you can put your data into open formats like 4K or C, and get insights directly from that at a very good price performance ratio. The Presto Foundation of which Girish and I are part of, we're all working together as a consortium of companies that all want to see Presto continue to get bigger and bigger. Kind of like Kubernetes has a, has an organization called CNCF, Presto has Presto Foundation all under the umbrella of the Linux Foundation. And so there's a lot of exciting things that are coming on the roadmap that make Presto very unique. You know, RaptorX is a multilevel caching system that it's been fantastic, Aria optimizations are another area, we Ahana have developed some security features with donating the integrations with Apache Ranger and that's the type of things that we do to help the community. But maybe Girish can talk about some of the exciting items on the roadmap that you're looking forward to. >> Absolutely, I think from Uber's point of view just a sheer scale of data and our volume of query traffic. So we run about half a million Presto queries a day, right? And we have thousands of machines in our Presto deployments. So at that scale in addition to functionality you really want a system that can handle traffic reliably, that can scale, and that is backed by a strong community which guarantees that if you pull in the new version of Presto, you won't break anything, right? So all of those things are very important to us. So I think that's where we are relying on our partners particularly folks like Facebook and Twitter and Ahana to build and maintain this ecosystem that gives us those guarantees. So that is on the reliability front, but on the roadmap side we are also excited to see where Presto is extending. So in addition to the projects that Steven talked about, we are also looking at things like Presto and Spark, right? So take the Presto SQL and run it as a Spark job for instance, or running Presto on real-time analytics applications something that we built and contributed from Uber side. So we are all taking it in very different directions, we all have different use cases to support, and that's the exciting thing about the foundation. That it allows us all to work together to get Presto to a bigger and better and more flexible engine. >> You guys mentioned Facebook and I saw on the slide I think Twitter as well. Talk to me about some of the organizations that are leveraging the Presto engine and some of the business benefits. I think Steve you talked about insights, Steven obviously being able to get insights from data is critical for every business these days. >> Yeah, a major, major use case is finding the ad hoc and interactive queries, and being able to drive insights from doing so. And so, as I mentioned there's so much data that's being generated and stored, and to be able to query that data in place, at a, with very, very high performance, meaning that you can get answers back in seconds of time. That lets you have the interactive ability to drill into data and innovate your business. And so this is fantastic because it's been developed at hyperscalers like Uber that allow you to have open source technology, pick that up, and just download it right from prestodb.io, and then start to run with this and join the community. I think from an open source perspective this project under the governance of Linux Foundation gives you the confidence that it's fully transparent and you'll never see any licensing changes by the Linux Foundation charter. And therefore that means the technology remains free forever without later on limitations occurring, which then would perhaps favor commercialization of any one vendor. That's not the case. So maybe Girish your thoughts on how we've been able to attract industry giants to collaborate, to innovate further, and your thoughts on that. >> Yeah, so of the interesting I've seen in the space is that there is a bifurcation of companies in this ecosystem. So there are these large internet scale companies like Facebook, and Uber, and Twitter, which basically want to use something like Presto for their internal use cases. And then there is the second set of companies, enterprise companies like Ahana which basically wanted to take Presto and provide it as a service for other companies to use as an alternative to things like Snowflake and other systems right? So, and the foundation is a great place for both sets of companies to come together and work. The internet scale companies bring in the scale, the reliability, the different kind of ways in which you can challenge the system, optimize it, and so forth, and then companies like Ahana bring in the flexibility and the extensibility. So you can work with different clouds, different storage formats, different engines, and I think it's a great partnership that we can see happening primarily through the foundational spaces. Which you would be hard pressed to find in a single vendor or a, you know, a single-source system that is there on the market today. >> How long ago was the Presto Foundation initiated? >> It's been over three years now and it's been going strong, we're over a dozen members and it's open to everyone. And it's all governed like the Linux Foundation so we use best practices from that and you can just check it out at prestodb.io where you can get the software, or you can hear about how to join the foundation. So it includes members like Intel, and HPE as well, and we're really excited for new members to come, and contribute in and participate. >> Sounds like you've got good momentum there in the foundation. Steven talk a little bit about the last two years. Have you seen the acceleration in use cases in the number of users as we've been in such an interesting environment where the need for real-time insights is essential for every business initially a few couple of years ago to survive but now to be, to really thrive, is it, have you seen the acceleration in Presto in that timeframe? >> Absolutely, we see there's acceleration of being more data-driven and especially moving to cloud and having more data in the cloud, we think that innovation is happening, digital innovation is happening very fast and Presto is a major enabler of that, again, being able to get, drive insights from the data this is not just your typical business data, it's now getting into really clickstream data, knowing about how customers are operating today, Uber is a great example of all the different types of innovations they can drive, whether it be, you know, knowing in real time what's happening with rides, or offering you a subscription for special deals to use the service more. So, you know, Ahana we really love Presto, and we provide a SaaS manage service of the open source and provide free trials, and help people get up to speed that may not have the same type of skills as Uber or Facebook does. And we work with all companies in that way. >> Think about the consumers these days, we're very demanding, right? When I think one of the things that was in short supply during the last two years was patience. And if I think of Uber as a great example, I want to know if I'm asking for a ride I want to know exactly in real time what's coming for me? Where is it now? How many more minutes is it going to take? I mean, that need to fulfill real-time insights is critical across every industry but have you seen anything in the last couple years that's been more leading edge, like e-commerce or retail for example? I'm just curious. >> Girish you want to take that one or? >> Yeah, sure. So I can speak from the Uber point of view. So real-time insights has really exploded as an area, particularly as you mentioned with this just-in-time economy, right? Just to talk about it a little bit from Uber side, so some of the insights that you mentioned about when is your ride coming, and things of that nature, right? Look at it from the driver's point of view who are, now we have Uber Eats, so look at it from the restaurant manager's point of view, right? They also want to know how is their business coming? How many customer orders are coming for instance? what is the conversion rate? And so forth, right? And today these are all insights that are powered by a system which has a Presto as an front-end interface at Uber. And these queries run like, you have like tens of thousands of queries every single second, and the queries run in like a second and so forth. So you are really talking about production systems running on top of Presto, production serving systems. So coming to other use cases like eCommerce, we definitely have seen some of that uptake happen as well, so in the broader community for instance, we have companies like Stripe, and other folks who are also using this hashtag which is very similar to us based on another open source technology called Pino, using Presto as an interface. And so we are seeing this whole open data lakehouse more from just being, you know, about interactive analytics to driving all different kinds of analytics. Having anything to do with data and insights in this space. >> Yeah, sounds like the evolution has been kind of on a rocket ship the last couple years. Steven, one more time we're out of time, but can you mention that URL where folks can go to learn more? >> Yeah, prestodb.io and that's the Presto Foundation. And you know, just want to say that we'll be sharing the use case at the Startup Showcase coming up with theCUBE. We're excited about that and really welcome everyone to join the community, it's a real vibrant, expanding community and look forward to seeing you online. >> Sounds great guys. Thank you so much for sharing with us what Presto Foundation is doing, all of the things that it is catalyzing, great stuff, we look forward to hearing that customer use case, thanks for your time. >> Thank you. >> Thanks Lisa, thank you. >> Thanks everyone. >> For Steven and Girish, I'm Lisa Martin, you're watching theCUBE the leader in live tech coverage. (bright music)

Published Date : Mar 24 2022

SUMMARY :

and Girish Baliga Presto in the next few minutes or so, And that lets you have that is able to connect to so that the audience gets and that's the type of things that we do So that is on the reliability front, and some of the business benefits. and then start to run with So, and the foundation is a great place and it's open to everyone. in the number of users as we've been and having more data in the cloud, I mean, that need to fulfill so some of the insights that you mentioned Yeah, sounds like the evolution and look forward to seeing you online. all of the things that it For Steven and Girish, I'm Lisa Martin,

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Lisa MartinPERSON

0.99+

StevenPERSON

0.99+

StevePERSON

0.99+

GirishPERSON

0.99+

LisaPERSON

0.99+

UberORGANIZATION

0.99+

Steven MihPERSON

0.99+

Presto FoundationORGANIZATION

0.99+

FacebookORGANIZATION

0.99+

AhanaORGANIZATION

0.99+

Linux FoundationORGANIZATION

0.99+

CNCFORGANIZATION

0.99+

TwitterORGANIZATION

0.99+

IntelORGANIZATION

0.99+

two guestsQUANTITY

0.99+

HPEORGANIZATION

0.99+

PrestoORGANIZATION

0.99+

second setQUANTITY

0.99+

both setsQUANTITY

0.99+

over three yearsQUANTITY

0.99+

AhanaPERSON

0.98+

KubernetesORGANIZATION

0.98+

SparkTITLE

0.97+

Girish BaligaPERSON

0.97+

about half a millionQUANTITY

0.97+

todayDATE

0.97+

over a dozen membersQUANTITY

0.96+

oneQUANTITY

0.96+

PrestoTITLE

0.96+

SQLTITLE

0.95+

singleQUANTITY

0.95+

thousands of machinesQUANTITY

0.94+

every single secondQUANTITY

0.93+

Girish Baliga Presto FoundationORGANIZATION

0.92+

prestodb.ioOTHER

0.91+

last couple yearsDATE

0.9+

4KOTHER

0.89+

Startup ShowcaseEVENT

0.88+

one vendorQUANTITY

0.88+

Justin Borgman, Starburst and Teresa Tung, Accenture | AWS re:Invent 2021


 

>>Hey, welcome back to the cubes. Continuing coverage of AWS reinvent 2021. I'm your host, Lisa Martin. This is day two, our first full day of coverage. But day two, we have two life sets here with AWS and its ecosystem partners to remote sets over a hundred guests on the program. We're going to be talking about the next decade of cloud innovation, and I'm pleased to welcome back to cube alumni to the program. Justin Borkman is here, the co-founder and CEO of Starburst and Teresa Tung, the cloud first chief technologist at Accenture guys. Welcome back to the queue. Thank you. Thank you for having me. Good to have you back. So, so Teresa, I was doing some research on you and I see you are the most prolific prolific inventor at Accenture with over 220 patents and patent applications. That's huge. Congratulations. Thank you. Thank you. And I love your title. I think it's intriguing. I'd like to learn a little bit more about your role cloud-first chief technologist. Tell me about, >>Well, I get to think about the future of cloud and if you think about clouded powers, everything experiences in our everyday lives and our homes and our car in our stores. So pretty much I get to be cute, right? The rest of Accenture's James Bond >>And your queue. I like that. Wow. What a great analogy. Just to talk to me a little bit, I know service has been on the program before, but give me a little bit of an overview of the company, what you guys do. What were some of the gaps in the markets that you saw a few years ago and said, we have an idea to solve this? Sure. >>So Starburst offers a distributed query engine, which essentially means we're able to run SQL queries on data anywhere, uh, could be in traditional relational databases, data lakes in the cloud on-prem. And I think that was the gap that we saw was basically that people had data everywhere and really had a challenge with how they analyze that data. And, uh, my co-founders are the creators of an open source project originally called Presto now called Trino. And it's how Facebook and Netflix and Airbnb and, and a number of the internet companies run their analytics. And so our idea was basically to take that, commercialize that and make it enterprise grade for the thousands of other companies that are struggling with data management, data analytics problems. >>And that's one of the things we've seen explode during the last 22 months, among many other things is data, right? In every company. These days has to be a data company. If they're not, there's a competitor in the rear view rear view mirror, ready to come and take that place. We're going to talk about the data mesh Teresa, we're going to start with you. This is not a new car. This is a new concept. Talk to us about what a data mesh is and why organizations need to embrace this >>Approach. So there's a canonical definition about data mesh with four attributes and any data geek or data architect really resonates with them. So number one, it's really routed decentralized domain ownership. So data is not within a single line of business within a single entity within a single partner has to be across different domains. Second is publishing data as products. And so instead of these really, you know, technology solutions, data sets, data tables, really thinking about the product and who's going to use it. The third one is really around self-service infrastructure. So you want everybody to be able to use those products. And finally, number four, it's really about federated and global governance. So even though their products, you really need to make sure that you're doing the right things, but what's data money. >>We're not talking about a single tool here, right? This is more of a, an approach, a solution. >>It is a data strategy first and foremost, right? So companies, they are multi-cloud, they have many projects going on, they are on premise. So what do you do about it? And so that's the reality of the situation today, and it's first and foremost, a business strategy and framework to think about the data. And then there's a new architecture that underlines and supports that >>Just didn't talk to me about when you're having customer conversations. Obviously organizations need to have a core data strategy that runs the business. They need to be able to, to democratize really truly democratized data access across all business units. What are some of the, what are some of your customer conversations like are customers really embracing the data strategy, vision and approach? >>Yeah, well, I think as you alluded to, you know, every business is data-driven today and the pandemic, if anything has accelerated digital transformation in that move to become data-driven. So it's imperative that every business of every shape and size really put the power of data in the hands of everyone within their organization. And I think part of what's making data mesh resonates so well, is that decentralization concept that Teresa spoke about? Like, I think companies acknowledge that data is inherently decentralized. They have a lot of different database systems, different teams and data mesh is a framework for thinking about that. Then not only acknowledges that reality, but also braces it and basically says there's actually advantages to this decentralized approach. And so I think that's, what's driving the interest level in the data mesh, uh, paradigm. And it's been exciting to work with customers as they think about that strategy. And I think that, you know, essentially every company in the space is, is in transition, whether they're moving from on cloud to the prem, uh, to, uh, sorry, from on-prem to the cloud or from one cloud to another cloud or undergoing that digital transformation, they have left behind data everywhere. And so they're, they're trying to wrestle with how to grasp that. >>And there's, we know that there's so much value in data. The, the need is to be able to get it, to be able to analyze it quickly in real time. I think another thing we learned in the pandemic is it real-time is no longer a nice to have. It is essential for businesses in every organization. So Theresa let's talk about how Accenture and servers are working together to take the data mesh from a concept of framework and put this into production into execution. >>Yeah. I mean, many clients are already doing some aspect of the data mesh as I listed those four attributes. I'm sure everybody thought like I'm already doing some of this. And so a lot of that is reviewing your existing data projects and looking at it from a data product landscape we're at Amazon, right? Amazon famous for being customer obsessed. So in data, we're not always customer obsessed. We put up tables, we put up data sets, feature stores. Who's actually going to use this data. What's the value from it. And I think that's a big change. And so a lot of what we're doing is helping apply that product lens, a literal product lens and thinking about the customer. >>So what are some w you know, we often talk about outcomes, everything being outcomes focused and customers, vendors wanting to help customers deliver big outcomes, you know, cost reduction, et cetera, things like that. How, what are some of the key outcomes Theresa that the data mesh framework unlocks for organizations in any industry to be able to leverage? >>Yeah. I mean, it really depends on the product. Some of it is organizational efficiency and data-driven decisions. So just by the able to see the data, see what's happening now, that's great. But then you have so beyond the, now what the, so what the analytics, right. Both predictive prescriptive analytics. So what, so now I have all this data I can analyze and drive and predict. And then finally, the, what if, if I have this data and my partners have this data in this mesh, and I can use it, I can ask a lot of what if and, and kind of game out scenarios about what if I did things differently, all of this in a very virtualized data-driven fashion, >>Right? Well, we've been talking about being data-driven for years and years and years, but it's one thing to say that it's a whole other thing to actually be able to put that into practice and to use it, to develop new products and services, delight customers, right. And, and really achieve the competitive advantage that businesses want to have. Just so talk to me about how your customer conversations have changed in the last 22 months, as we've seen this massive acceleration of digital transformation companies initially, really trying to survive and figure out how to pivot, not once, but multiple times. How are those customer conversations changing now is as that data strategy becomes core to the survival of every business and its ability to thrive. >>Yeah. I mean, I think it's accelerated everything and, and that's been obviously good for companies like us and like Accenture, cause there's a lot of work to be done out there. Um, but I think it's a transition from a storage centric mindset to more of an analytics centric mindset. You know, I think traditionally data warehousing has been all about moving data into one central place. And, and once you get it there, then you can analyze it. But I think companies don't have the time to wait for that anymore. Right there, there's no time to build all the ETL pipelines and maintain them and get all of that data together. We need to shorten that time to insight. And that's really what we, what we've been focusing on with our, with our customers, >>Shorten that time to insight to get that value out of the data faster. Exactly. Like I said, you know, the time is no longer a nice to have. It's an absolute differentiator for folks in every business. And as, as in our consumer lives, we have this expectation that we can get whatever we want on our phone, on any device, 24 by seven. And of course now in our business lives, we're having the same expectation, but you have to be able to unlock that access to that data, to be able to do the analytics, to make the decisions based on what the data say. Are you, are you finding our total? Let's talk about a little bit about the go to market strategy. You guys go in together. Talk to me about how you're working with AWS, Theresa, we'll start with you. And then Justin we'll head over to you. Okay. >>Well, a lot of this is powered by the cloud, right? So being able to imagine a new data business to run the analytics on it and then push it out, all of that is often cloud-based. But then the great thing about data mesh it's it gives you a framework to look at and tap into multi-cloud on-prem edge data, right? Data that can't be moved because it is a private and secure has to be at the edge and on-prem so you need to have that's their data reality. And the cloud really makes this easier to do. And then with data virtualization, especially coming from the digital natives, we know it scales >>Just to talk to me about it from your perspective that the GTL. >>Yeah. So, I mean, I think, uh, data mesh is really about people process and technology. I think Theresa alluded to it as a strategy. It's, it's more than just technology. Obviously we bring some of that technology to bear by allowing customers to query the data where it lives. But the people in process side is just as important training people to kind of think about how they do data management, data analytics differently is essential thinking about how to create data as a product. That's one of the core principles that Theresa mentioned, you know, that's where I think, um, you know, folks like Accenture can be really instrumental in helping people drive that transformational change within their organization. And that's >>Hard. Transformational change is hard with, you know, the last 22 months. I've been hard on everyone for every reason. How are you facilitating? I'm curious, like to get Theresa, we'll start with you, your perspectives on how our together as servers and Accenture, with the power of AWS, helping to drive that cultural change within organizations. Because like we talked about Justin there, nobody has extra time to waste on anything these days. >>The good news is there's that imperative, right? Every business is a digital business. We found that our technology leaders, right, the top 10% investors in digital, they are outperforming are the laggards. So before pandemic, it's times to post pep devek times five, so there's a need to change. And so data is really the heart of the company. That's how you unlock your technical debt into technical wealth. And so really using cloud and technologies like Starburst and data virtualization is how we can actually do that. >>And so how do you, Justin, how does Starburst help organizations transfer that technical debt or reduce it? How does the D how does the data much help facilitate that? Because we talk about technical debt and it can, it can really add up. >>Yeah, well, a lot of people use us, uh, or think about us as an abstraction layer above the different data sources that they have. So they may have legacy data sources today. Um, then maybe they want to move off of over time, um, could be classical data, warehouses, other classical, uh, relational databases, perhaps they're moving to the cloud. And by leveraging Starburst as this abstraction, they can query the data that they have today, while in the background, moving data into the cloud or moving it into the new data stores that they want to utilize. And it sort of hides that complexity. It decouples the end user experience, the business analyst, the data scientists from where the data lives. And I think that gives people a lot of freedom and a lot of optionality. And I think, you know, the only constant is change. Um, and so creating an architecture that can stand the test of time, I think is really, really important. >>Absolutely. Speaking of change, I just saw the announcement about Starburst galaxy fully managed SAS platform now available in all three major clouds. Of course, here we are at AWS. This is a, is this a big directional shift for servers? >>It is, you know, uh, I think there's great precedent within open source enterprise software companies like Mongo DB or confluent who started with a self managed product, much the way that we did, and then moved in the direction of creating a SAS product, a cloud hosted, fully managed product that really I think, expands the market. And that's really essentially what we're doing with galaxy galaxy is designed to be as easy as possible. Um, you know, Starburst was already powerful. This makes it powerful and easy. And, uh, and, and in our view, can, can hopefully expand the market to thousands of potential customers that can now leverage this technology in a, in a faster, easier way, >>Just in sticking with you for a minute. Talk to me about kind of where you're going in, where services heading in terms of support for the data mesh architecture across industries. >>Yeah. So a couple of things that we've, we've done recently, and whether we're doing, uh, as we speak, one is, uh, we introduced a new capability. We call star gate. Now star gate is a connector between Starburst clusters. So you're going to have a Starbucks cluster, and let's say Azure service cluster in AWS, a Starbucks cluster, maybe an AWS west and AWS east. And this basically pushes the processing to where the data lives. So again, living within this construct of, uh, of decentralized data that a data mesh is all about, this allows you to do that at an even greater level of abstraction. So it doesn't even matter what cloud region the data lives in or what cloud entirely it lives in. And there are a lot of important applications for this, not only latency in terms of giving you fast, uh, ability to join across those different clouds, but also, uh, data sovereignty constraints, right? >>Um, increasingly important, especially in Europe, but increasingly everywhere. And, you know, if your data isn't Switzerland, it needs to stay in Switzerland. So starting date as a way of pushing the processing to Switzerland. So you're minimizing the data that you need to pull back to complete your analysis. And, uh, and so we think that's a big deal about, you know, kind of enabling a data mash on a, on a global scale. Um, another thing we're working on back to the point of data products is how do customers curate and create these data products and share them within their organization. And so we're investing heavily in our product to make that easier as well, because I think back to one of the things, uh, Theresa said, it's, it's really all about, uh, making this practical and finding quick wins that customers can deploy, deploy in their data mess journey, right? >>This quick wins are key. So Theresa, last question to you, where should companies go to get started today? Obviously everybody has gotten, we're still in this work from anywhere environment. Companies have tons of data, tons of sources of data, did it, infrastructure's already in place. How did they go and get started with data? >>I think they should start looking at their data projects and thinking about the best data products. I think just that mindset shift about thinking about who's this for what's the business value. And then underneath that architecture and support comes to bear. And then thinking about who are the products that your product could work better with just like any other practice partnerships, like what we have with AWS, right? Like that's a stronger together sort of thing, >>Right? So there's that kind of that cultural component that really strategic shift in thinking and on the architecture. Awesome guys, thank you so much for joining me on the program, coming back on the cube at re-invent talking about data mesh really help. You can help organizations and industry put that together and what's going on at service. We appreciate your time. Thanks again. All right. For my guests, I'm Lisa Martin, you're watching the cubes coverage of AWS reinvent 2021. The cube is the leader in global live tech coverage. We'll be right back.

Published Date : Nov 30 2021

SUMMARY :

Good to have you back. Well, I get to think about the future of cloud and if you think about clouded powers, I know service has been on the program before, but give me a little bit of an overview of the company, what you guys do. And it's how Facebook and Netflix and Airbnb and, and a number of the internet And that's one of the things we've seen explode during the last 22 months, among many other things is data, So even though their products, you really need to make sure that you're doing the right things, but what's data money. This is more of a, an approach, And so that's the reality of the situation today, and it's first and foremost, Just didn't talk to me about when you're having customer conversations. And I think that, you know, essentially every company in the space is, The, the need is to be able to get it, And so a lot of that is reviewing your existing data projects So what are some w you know, we often talk about outcomes, So just by the able to see the data, see what's happening now, that's great. Just so talk to me about how your customer conversations have changed in the last 22 But I think companies don't have the time to wait for that anymore. Let's talk about a little bit about the go to market strategy. And the cloud really makes this easier to do. That's one of the core principles that Theresa mentioned, you know, that's where I think, I'm curious, like to get Theresa, we'll start with you, your perspectives on how And so data is really the heart of the company. And so how do you, Justin, how does Starburst help organizations transfer that technical And I think, you know, the only constant is change. This is a, is this a big directional can, can hopefully expand the market to thousands of potential customers that can now leverage Talk to me about kind of where you're going in, where services heading in the processing to where the data lives. And, uh, and so we think that's a big deal about, you know, kind of enabling a data mash So Theresa, last question to you, where should companies go to get started today? And then thinking about who are the products that your product could work better with just like any other The cube is the leader in global live tech coverage.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Lisa MartinPERSON

0.99+

TheresaPERSON

0.99+

AWSORGANIZATION

0.99+

Teresa TungPERSON

0.99+

Justin BorkmanPERSON

0.99+

Justin BorgmanPERSON

0.99+

TeresaPERSON

0.99+

AmazonORGANIZATION

0.99+

JustinPERSON

0.99+

EuropeLOCATION

0.99+

SwitzerlandLOCATION

0.99+

StarburstORGANIZATION

0.99+

AccentureORGANIZATION

0.99+

SecondQUANTITY

0.99+

thousandsQUANTITY

0.99+

NetflixORGANIZATION

0.99+

FacebookORGANIZATION

0.99+

third oneQUANTITY

0.99+

pandemicEVENT

0.98+

four attributesQUANTITY

0.98+

BothQUANTITY

0.98+

todayDATE

0.98+

24QUANTITY

0.98+

firstQUANTITY

0.98+

AirbnbORGANIZATION

0.98+

over 220 patentsQUANTITY

0.97+

over a hundred guestsQUANTITY

0.97+

2021DATE

0.97+

oneQUANTITY

0.96+

StarbucksORGANIZATION

0.96+

single partnerQUANTITY

0.96+

PrestoORGANIZATION

0.96+

single lineQUANTITY

0.96+

sevenQUANTITY

0.95+

confluentORGANIZATION

0.95+

10%QUANTITY

0.94+

one central placeQUANTITY

0.94+

one thingQUANTITY

0.93+

single toolQUANTITY

0.92+

day twoQUANTITY

0.92+

next decadeDATE

0.92+

single entityQUANTITY

0.92+

star gateTITLE

0.92+

Mongo DBORGANIZATION

0.91+

last 22 monthsDATE

0.91+

two lifeQUANTITY

0.91+

StarburstTITLE

0.88+

last 22 monthsDATE

0.87+

Thomas Hazel, ChaosSearchJSON Flex on ChaosSearch


 

[Thomas Hazel] - Hello, this is Thomas Hazel, founder CTO here at ChaosSearch. And tonight I'm going to demonstrate a new feature we are offering this quarter called JSON Flex. If you're familiar with JSON datasets, they're wonderful ways to represent information. You know, they're multidimensional, they have ability to set up arrays as attributes but those arrays are really problematic when you need to expand them or flatten them to do any type of elastic search or relational access, particularly when you're trying to do aggregations. And so the common process is to exclude those arrays or pick and choose that information. But with this new Chaos Flex capability, our system uniquely can index that data horizontally in a very small and efficient representation. And then with our Chaos Refinery, expand each attribute as you wish vertically, so you can do all the basic and natural constructs you would have done if you had, you know, a more straightforward, two dimensional, three dimensional type representation. So without further ado, I'mma get into this presentation of JSON Flex. Now, in this case, I've already set up the service to point to a particular S3 account that has CloudTrail data, one that is pretty problematic when it comes down to flattening data. And again, if you know CloudTrail, one row can become 10,000 as data gets flattened. So without further ado, let me jump right in. When you first log into the ChaosSearch service, you'll see a tab called 'Storage'. This is the S3 account, and I have variety of buckets. I have the refinery, it's a data refinery. This is where we create views or lenses into these index streams that you can do analysis that publishes it in elastic API as an index pattern or relational table in SQL Now a particular bucket I have here is a whole bunch of demonstration datasets that we have to show off our capabilities and our offering. In this bucket, I have CloudTrail data and I'm going to create what we call a 'object group'. An object group is a entry point, a filter of which files I want to index that data. Now, it can be statically there or a live streaming. These object groups had the ability to say, what type of data do you want to index on? Now through our wizard, you can type in, you know, prefix in this case, I want to type in CloudTrail, and you see here, I have a whole bunch of CloudTrail. I'mma choose one file to make it quick and easy. But this particular CloudTrail data will expand and we can show the capability of this horizontal to vertical expansion. So I walked through the wizard, as you can see here, we discovered JSON, it's a gzip file. Leave flattening unlimited 'cause we want to be able to expand infinitely. But this case, instead of doing default virtual, I'm going to horizontally represent this information. And this uniquely compresses the data in a way that can be stored efficiently on disc but then expanded in our data refinery on Pond Query or search requests. So I'mma create this object group. Now I'm going to call this, you know, 'JSON Flex test' and I could set up live indexing, SQS pops up but I'mma skip that and skip Retention and just create it. Once this object group is created, you kind of think of it as a virtual bucket, 'cause it does filter the data as you can see here. When I look at the view, I just see CloudTrail, but within the console, I can say start indexing. Now this is static data there could be a live stream and we set up workers to index this data. Whether it's one file, a million files or one terabyte, or one petabyte, we index the data. We discover all the schema, and as you see here, we discovered 104 columns. Now what's interesting is that we represent this expansion in a horizontal way. You know, if you know CloudTrail records zero, record one, record two. This can expand pretty dramatically if you fully flatten it but this case we horizontally representing it as the index. So when I go into the data refinery, I can create a view. Now, if you know the data refinery of ChaosSearch, you can bring multiple data streams together. You can do transformations virtually, you can do correlations, but in this case, I'm just going to take this one particular index stream, we call 'JSON Flex' and walk through a wizard, we try to simplify everything and select a particular attribute to expand. Now, again, we represent this in one row but if you had arrays and do all the permutations, it could go one to 100 to 10,000. We had one JSON audit that went from one row to 1 million rows. Now, clearly you don't want to create all those permutations, when you're tryna put into a database. With our unique index technology, you can do it virtually and sort horizontally. So let me just select 'Virtual' and walk through the wizard. Now, as I mentioned, we do all these different transformations changed schema, we're going to skip all that and select the order time, records event and say, 'create this'. I'm going to say, you know, 'JSON Flex View', I can set up caching, do a variety of things, I'm going to skip that. And once I create this, it's now available in the elastic API as an index pattern, as well as SQL via our Presto API dialect. And you can use Looker, Tableau, et cetera. But in this case, we go to this 'Analytics tab' and we built in the Kibana, open search tooling that is Apache Tonetto. And I click on discovery here and I'm going to select that particular view. Again, it looks like, oops, it looks like an index pattern, and I'mma choose, let's see here, let's choose 15 years from past and present and make sure I find where actually was timed. And what you'll see here is, you know, sure. It's just one particular data set has a variety of columns, but you see here is unlike that record zero, records one, now it's expanded. And so it has been expanded like a vertical flattening that you would traditionally do if you wanted to do anything that was an elastic or a relational construct, you know, that fit into a table format. Now the 'vantage of JSON Flex, you don't have that stored as a blob and use these proprietary JSON API's. You can use your native elastic API or your native SQL tooling to get access to it naturally without that expense of that explosion or without the complexity of ETLing it, and picking and choosing before you actually put into the database. That completes the demonstration of ChaosSearch new JSON Flex capability. If you're interested, come to ChaosSearch.io and set up a free trial. Thank you.

Published Date : Nov 15 2021

SUMMARY :

and as you see here, we

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Thomas HazelPERSON

0.99+

10,000QUANTITY

0.99+

one terabyteQUANTITY

0.99+

one fileQUANTITY

0.99+

104 columnsQUANTITY

0.99+

one petabyteQUANTITY

0.99+

1 million rowsQUANTITY

0.99+

JSON FlexTITLE

0.99+

ChaosSearchORGANIZATION

0.99+

one rowQUANTITY

0.99+

a million filesQUANTITY

0.99+

tonightDATE

0.98+

TableauTITLE

0.98+

each attributeQUANTITY

0.98+

firstQUANTITY

0.98+

SQLTITLE

0.98+

S3TITLE

0.98+

100QUANTITY

0.98+

JSONTITLE

0.98+

15 yearsQUANTITY

0.98+

PrestoTITLE

0.97+

oneQUANTITY

0.96+

LookerTITLE

0.95+

twoQUANTITY

0.93+

JSON Flex ViewTITLE

0.92+

JSON APITITLE

0.91+

FlexTITLE

0.87+

zeroQUANTITY

0.87+

SQSTITLE

0.86+

ChaosSearchJSONORGANIZATION

0.8+

this quarterDATE

0.8+

CloudTrailCOMMERCIAL_ITEM

0.79+

Apache TonettoORGANIZATION

0.72+

JSONORGANIZATION

0.69+

Chaos FlexTITLE

0.69+

CloudTrailTITLE

0.6+

ChaosSearchTITLE

0.58+

ChaosSearch.ioTITLE

0.57+

data setQUANTITY

0.56+

KibanaORGANIZATION

0.45+

Did HPE GreenLake Just Set a New Bar in the On-Prem Cloud Services Market?


 

>> Welcome back to The Cube's coverage of HPE's GreenLake announcements. My name is Dave Vellante and you're watching the Cube. I'm here with Holger Mueller, who is an analyst at Constellation Research. And Matt Maccaux is the global field CTO of Ezmeral software at HPE. We're going to talk data. Gents, great to see you. >> Holger: Great to be here. >> So, Holger, what do you see happening in the data market? Obviously data's hot, you know, digital, I call it the force marks to digital. Everybody realizes wow, digital business, that's a data business. We've got to get our data act together. What do you see in the market is the big trends, the big waves? >> We are all young enough or old enough to remember when people were saying data is the new oil, right? Nothing has changed, right? Data is the key ingredient, which matters to enterprise, which they have to store, which they have to enrich, which they have to use for their decision-making. It's the foundation of everything. If you want to go into machine learning or (indistinct) It's growing very fast, right? We have the capability now to look at all the data in enterprise, which weren't able 10 years ago to do that. So data is main center to everything. >> Yeah, it's even more valuable than oil, I think, right? 'Cause with oil, you can only use once. Data, you can, it's kind of polyglot. I can go in different directions and it's amazing, right? >> It's the beauty of digital products, right? They don't get consumed, right? They don't get fired up, right? And no carbon footprint, right? "Oh wait, wait, we have to think about carbon footprint." Different story, right? So to get to the data, you have to spend some energy. >> So it's that simple, right? I mean, it really is. Data is fundamental. It's got to be at the core. And so Matt, what are you guys announcing today, and how does that play into what Holger just said? >> What we're announcing today is that organizations no longer need to make a difficult choice. Prior to today, organizations were thinking if I'm going to do advanced machine learning and really exploit my data, I have to go to the cloud. But all my data's still on premises because of privacy rules, industry rules. And so what we're announcing today, through GreenLake Services, is a cloud services way to deliver that same cloud-based analytical capability. Machine learning, data engineering, through hybrid analytics. It's a unified platform to tie together everything from data engineering to advance data science. And we're also announcing the world's first Kubernetes native object store, that is hybrid cloud enabled. Which means you can keep your data connected across clouds in a data fabric, or Dave, as you say, mesh. >> Okay, can we dig into that a little bit? So, you're essentially saying that, so you're going to have data in both places, right? Public cloud, edge, on-prem, and you're saying, HPE is announcing a capability to connect them, I think you used the term fabric. I'm cool, by the way, with the term fabric, we can, we'll parse that out another time. >> I love for you to discuss textiles. Fabrics vs. mesh. For me, every fabric breaks down to mesh if you put it on a microscope. It's the same thing. >> Oh wow, now that's really, that's too detailed for my brain, right this moment. But, you're saying you can connect all those different estates because data by its very nature is everywhere. You're going to unify that, and what, that can manage that through sort of a single view? >> That's right. So, the management is centralized. We need to be able to know where our data is being provisioned. But again, we don't want organizations to feel like they have to make the trade off. If they want to use cloud surface A in Azure, and cloud surface B in GCP, why not connect them together? Why not allow the data to remain in sync or not, through a distributed fabric? Because we use that term fabric over and over again. But the idea is let the data be where it most naturally makes sense, and exploit it. Monetization is an old tool, but exploit it in a way that works best for your users and applications. >> In sync or not, that's interesting. So it's my choice? >> That's right. Because the back of an automobile could be a teeny tiny, small edge location. It's not always going to be in sync until it connects back up with a training facility. But we still need to be able to manage that. And maybe that data gets persisted to a core data center. Maybe it gets pushed to the cloud, but we still need to know where that data is, where it came from, its lineage, what quality it has, what security we're going to wrap around that, that all should be part of this fabric. >> Okay. So, you've got essentially a governance model, at least maybe you're working toward that, and maybe it's not all baked today, but that's the north star. Is this fabric connect, single management view, governed in a federated fashion? >> Right. And it's available through the most common API's that these applications are already written in. So, everybody today's talking S3. I've got to get all of my data, I need to put it into an object store, it needs to be S3 compatible. So, we are extending this capability to be S3 native. But it's optimized for performance. Today, when you put data in an object store, it's kind of one size fits all. Well, we know for those streaming analytical capabilities, those high performance workloads, it needs to be tuned for that. So, how about I give you a very small object on the very fastest disk in your data center and maybe that cheaper location somewhere else. And so we're giving you that balance as part of the overall management estate. >> Holger, what's your take on this? I mean, Frank Slootman says we'll never, we're not going halfway house. We're never going to do on-prem, we're only in the cloud. So that basically says, okay, he's ignoring a pretty large market by choice. You're not, Matt, you must love those words. But what do you see as the public cloud players, kind of the moves on-prem, particularly in this realm? >> Well, we've seen lots of cloud players who were only cloud coming back towards on-premise, right? We call it the next generation compute platform where I can move data and workloads between on-premise and ideally, multiple clouds, right? Because I don't want to be logged into public cloud vendors. And we see two trends, right? One trend is the traditional hardware supplier of on-premise has not scaled to cloud technology in terms of big data analytics. They just missed the boat for that in the past, this is changing. You guys are a traditional player and changing this, so congratulations. The other thing, is there's been no innovation for the on-premise tech stack, right? The only technology stack to run modern application has been invested for a long time in the cloud. So what we see since two, three years, right? With the first one being Google with Kubernetes, that are good at GKE on-premise, then onto us, right? Bringing their tech stack with compromises to on-premises, right? Acknowledging exactly what we're talking about, the data is everywhere, data is important. Data gravity is there, right? It's just the network's fault, where the networks are too slow, right? If you could just move everything anywhere we want like juggling two balls, then we'd be in different place. But that's the not enough investment for the traditional IT players for that stack, and the modern stack being there. And now every public cloud player has an on-premise offering with different flavors, different capabilities. >> I want to give you guys Dave's story of kind of history and you can kind of course correct, and tell me how this, Matt, maybe fits into what's happened with customers. So, you know, before Hadoop, obviously you had to buy a big Oracle database and you know, you running Unix, and you buy some big storage subsystem if you had any money left over, you know, you maybe, you know, do some actual analytics. But then Hadoop comes in, lowers the cost, and then S3 kneecaps the entire Hadoop market, right? >> I wouldn't say that, I wouldn't agree. Sorry to jump on your history. Because the fascinating thing, what Hadoop brought to the enterprise for the first time, you're absolutely right, affordable, right, to do that. But it's not only about affordability because S3 as the affordability. The big thing is you can store information without knowing how to analyze it, right? So, you mentioned Snowflake, right? Before, it was like an Oracle database. It was Starschema for data warehouse, and so on. You had to make decisions how to store that data because compute capabilities, storage capabilities, were too limited, right? That's what Hadoop blew away. >> I agree, no schema on, right. But then that created data lakes, which create a data swamps, and that whole mess, and then Spark comes in and help clean it out, okay, fine. So, we're cool with that. But the early days of Hadoop, you had, companies would have a Hadoop monolith, they probably had their data catalog in Excel or Google sheets, right? And so now, my question to you, Matt, is there's a lot of customers that are still in that world. What do they do? They got an option to go to the cloud. I'm hearing that you're giving them another option? >> That's right. So we know that data is going to move to the cloud, as I mentioned. So let's keep that data in sync, and governed, and secured, like you expect. But for the data that can't move, let's bring those cloud native services to your data center. And so that's a big part of this announcement is this unified analytics. So that you can continue to run the tools that you want to today while bringing those next generation tools based on Apache Spark, using libraries like Delta Lake so you can go anything from Tableaux through Presto sequel, to advance machine learning in your Jupiter notebooks on-premises where you know your data is secured. And if it happens to sit in existing Hadoop data lake, that's fine too. We don't want our customers to have to make that trade off as they go from one to the other. Let's give you the best of both worlds, or as they say, you can eat your cake and have it too. >> Okay, so. Now let's talk about sort of developers on-prem, right? They've been kind of... If they really wanted to go cloud native, they had to go to the cloud. Do you feel like this changes the game? Do on-prem developers, do they want that capability? Will they lean into that capability? Or will they say no, no, the cloud is cool. What's your take? >> I love developers, right? But it's about who makes the decision, who pays the developers, right? So the CXOs in the enterprises, they need exactly, this is why we call the next-gen computing platform, that you can move your code assets. It's very hard to build software, so it's very valuable to an enterprise. I don't want to have limited to one single location or certain computing infrastructure, right? Luckily, we have Kubernetes to be able to move that, but I want to be able to deploy it on-premise if I have to. I want to deploy it, would be able to deploy in the multiple clouds which are available. And that's the key part. And that makes developers happy too, because the code you write has got to run multiple places. So you can build more code, better code, instead of building the same thing multiple places, because a little compiler change here, a little compiler change there. Nobody wants to do portability testing and rewriting, recertified for certain platforms. >> The head of application development or application architecture and the business are ultimately going to dictate that, number one. Number two, you're saying that developers shouldn't care because it can write once, run anywhere. >> That is the promise, and that's the interesting thing which is available now, 'cause people know, thanks to Kubernetes as a container platform and the abstraction which containers provide, and that makes everybody's life easier. But it goes much more higher than the Head of Apps, right? This is the digital transformation strategy, the next generation application the company has to build as a response to a pandemic, as a pivot, as digital transformation, as digital disruption capability. >> I mean, I see a lot of organizations basically modernizing by building some kind of abstraction to their backend systems, modernizing it through cloud native, and then saying, hey, as you were saying Holger, run it anywhere you want, or connect to those cloud apps, or connect across clouds, connect to other on-prem apps, and eventually out to the edge. Is that what you see? >> It's so much easier said than done though. Organizations have struggled so much with this, especially as we start talking about those data intensive app and workloads. Kubernetes and Hadoop? Up until now, organizations haven't been able to deploy those services. So, what we're offering as part of these GreenLake unified analytics services, a Kubernetes runtime. It's not ours. It's top of branch open source. And open source operators like Apache Spark, bringing in Delta Lake libraries, so that if your developer does want to use cloud native tools to build those next generation advanced analytics applications, but prod is still on-premises, they should just be able to pick that code up, and because we are deploying 100% open-source frameworks, the code should run as is. >> So, it seems like the strategy is to basically build, now that's what GreenLake is, right? It's a cloud. It's like, hey, here's your options, use whatever you want. >> Well, and it's your cloud. That's, what's so important about GreenLake, is it's your cloud, in your data center or co-lo, with your data, your tools, and your code. And again, we know that organizations are going to go to a multi or hybrid cloud location and through our management capabilities, we can reach out if you don't want us to control those, not necessarily, that's okay, but we should at least be able to monitor and audit the data that sits in those other locations, the applications that are running, maybe I register your GKE cluster. I don't manage it, but at least through a central pane of glass, I can tell the Head of Applications, what that person's utilization is across these environments. >> You know, and you said something, Matt, that struck, resonated with me, which is this is not trivial. I mean, not as simple to do. I mean what you see, you see a lot of customers or companies, what they're doing, vendors, they'll wrap their stack in Kubernetes, shove it in the cloud, it's essentially hosted stack, right? And, you're kind of taking a different approach. You're saying, hey, we're essentially building a cloud that's going to connect all these estates. And the key is you're going to have to keep, and you are, I think that's probably part of the reason why we're here, announcing stuff very quickly. A lot of innovation has to come out to satisfy that demand that you're essentially talking about. >> Because we've oversimplified things with containers, right? Because containers don't have what matters for data, and what matters for enterprise, which is persistence, right? I have to be able to turn my systems down, or I don't know when I'm going to use that data, but it has to stay there. And that's not solved in the container world by itself. And that's what's coming now, the heavy lifting is done by people like HPE, to provide that persistence of the data across the different deployment platforms. And then, there's just a need to modernize my on-premise platforms. Right? I can't run on a server which is two, three years old, right? It's no longer safe, it doesn't have trusted identity, all the good stuff that you need these days, right? It cannot be operated remotely, or whatever happens there, where there's two, three years, is long enough for a server to have run their course, right? >> Well you're a software guy, you hate hardware anyway, so just abstract that hardware complexity away from you. >> Hardware is the necessary evil, right? It's like TSA. I want to go somewhere, but I have to go through TSA. >> But that's a key point, let me buy a service, if I need compute, give it to me. And if I don't, I don't want to hear about it, right? And that's kind of the direction that you're headed. >> That's right. >> Holger: That's what you're offering. >> That's right, and specifically the services. So GreenLake's been offering infrastructure, virtual machines, IaaS, as a service. And we want to stop talking about that underlying capability because it's a dial tone now. What organizations and these developers want is the service. Give me a service or a function, like I get in the cloud, but I need to get going today. I need it within my security parameters, access to my data, my tools, so I can get going as quickly as possible. And then beyond that, we're going to give you that cloud billing practices. Because, just because you're deploying a cloud native service, if you're still still being deployed via CapEx, you're not solving a lot of problems. So we also need to have that cloud billing model. >> Great. Well Holger, we'll give you the last word, bring us home. >> It's very interesting to have the cloud qualities of subscription-based pricing maintained by HPE as the cloud vendor from somewhere else. And that gives you that flexibility. And that's very important because data is essential to enterprise processes. And there's three reasons why data doesn't go to the cloud, right? We know that. It's privacy residency requirement, there is no cloud infrastructure in the country. It's performance, because network latency plays a role, right? Especially for critical appraisal. And then there's not invented here, right? Remember Charles Phillips saying how old the CIO is? I know if they're going to go to the cloud or not, right? So, it was not invented here. These are the things which keep data on-premise. You know that load, and HP is coming on with a very interesting offering. >> It's physics, it's laws, it's politics, and sometimes it's cost, right? Sometimes it's too expensive to move and migrate. Guys, thanks so much. Great to see you both. >> Matt: Dave, it's always a pleasure. All right, and thank you for watching the Cubes continuous coverage of HPE's big GreenLake announcements. Keep it right there for more great content. (calm music begins)

Published Date : Sep 28 2021

SUMMARY :

And Matt Maccaux is the global field CTO I call it the force marks to digital. So data is main center to everything. 'Cause with oil, you can only use once. So to get to the data, you And so Matt, what are you I have to go to the cloud. capability to connect them, It's the same thing. You're going to unify that, and what, We need to be able to know So it's my choice? It's not always going to be in sync but that's the north star. I need to put it into an object store, But what do you see as for that in the past, I want to give you guys Sorry to jump on your history. And so now, my question to you, Matt, And if it happens to sit in they had to go to the cloud. because the code you write has and the business the company has to build as and eventually out to the edge. to pick that code up, So, it seems like the and audit the data that sits to have to keep, and you are, I have to be able to turn my systems down, guy, you hate hardware anyway, I have to go through TSA. And that's kind of the but I need to get going today. the last word, bring us home. I know if they're going to go Great to see you both. the Cubes continuous coverage

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

Frank SlootmanPERSON

0.99+

MattPERSON

0.99+

Matt MaccauxPERSON

0.99+

HolgerPERSON

0.99+

DavePERSON

0.99+

Holger MuellerPERSON

0.99+

twoQUANTITY

0.99+

100%QUANTITY

0.99+

Charles PhillipsPERSON

0.99+

Constellation ResearchORGANIZATION

0.99+

HPEORGANIZATION

0.99+

ExcelTITLE

0.99+

HPORGANIZATION

0.99+

todayDATE

0.99+

three yearsQUANTITY

0.99+

GreenLakeORGANIZATION

0.99+

three reasonsQUANTITY

0.99+

TodayDATE

0.99+

GoogleORGANIZATION

0.99+

two ballsQUANTITY

0.98+

firstQUANTITY

0.98+

OracleORGANIZATION

0.98+

10 years agoDATE

0.98+

EzmeralORGANIZATION

0.98+

both worldsQUANTITY

0.98+

first timeQUANTITY

0.98+

S3TITLE

0.98+

One trendQUANTITY

0.98+

GreenLake ServicesORGANIZATION

0.98+

first oneQUANTITY

0.98+

SnowflakeTITLE

0.97+

both placesQUANTITY

0.97+

KubernetesTITLE

0.97+

onceQUANTITY

0.96+

bothQUANTITY

0.96+

two trendsQUANTITY

0.96+

Delta LakeTITLE

0.95+

GoogleTITLE

0.94+

HadoopTITLE

0.94+

CapExORGANIZATION

0.93+

TableauxTITLE

0.93+

AzureTITLE

0.92+

GKEORGANIZATION

0.92+

CubesORGANIZATION

0.92+

UnixTITLE

0.92+

one single locationQUANTITY

0.91+

single viewQUANTITY

0.9+

SparkTITLE

0.86+

ApacheORGANIZATION

0.85+

pandemicEVENT

0.82+

HadoopORGANIZATION

0.81+

three years oldQUANTITY

0.8+

singleQUANTITY

0.8+

KubernetesORGANIZATION

0.74+

big wavesEVENT

0.73+

Apache SparkORGANIZATION

0.71+

Number twoQUANTITY

0.69+

Balaji Ganesan, Privacera | CUBE Conversation


 

(upbeat techno music) >> Welcome to this CUBE Conversation. I'm Lisa Martin; I am joined by the CEO and co-founder of Privacera, Balaji Ganesan. Balaji, it's great to have you on theCUBE. >> Great to great to see you, Lisa. Good to see you again, and thanks for the opportunity. >> So tell our audience about Privacera. How do you help balance data security, data sharing? >> Absolutely. At Privacera we are on a mission to help enterprises unlock their data, but do it in a secure and a compliant way. We are in this balance between, we call it a dual mandate, where we see enterprise data teams, on one hand, they are being asked to democratize data and make this data available to all parts of the organization. So everybody in the organization is looking forward to get access to the data faster. On the other hand, governance, privacy, and compliance mandates have become more stringent. And it has come from regulations such as GDPR or California Privacy, but in general, the environment and the culture has changed where, from a board level, there's more owners who are making sure that you have visibility on what data you're bringing in, but also make sure that right people have access to the right data. And that notion is no longer in textbooks or in books, right? It's being actually, an onus is on making it happen. And it's really hard for these data teams do that, as the platforms are very diverse. And again, driven by data democratization today, companies are running very diverse platforms. Even in a single cloud like AWS, they have choices between Snowflake or Databricks and Amazon's native tools and other other services, which are really cropping up and being available in the cloud. But if you need to make sure right people have access to the right data, in that paradigm it's really, really hard. And this is where a tool like Privacera comes in, where we can help them get visibility on their data, but also make sure that we can help them with building a unified layer where they can start managing these tools more cohesively. And the end result is they can get access to the data faster, but you're compliant, you're governed, and you have visibility around who's doing what. And that's the big enabler in their data strategy. >> So, we talk about the need for data monetization, for organizations to be able to give enterprise-wide access across business units, to identify new sources of revenue and new opportunities. That's a big challenge to do. You mentioned the security and governance front at the board level. I imagined that the data-sharing is as well. How are you helping customers navigate multiple platforms, multiple clouds, to be able to get access that is actually secure, that the CEO can go back to the board and say we've got everything, you know, all I's dotted and T's crossed here? >> Absolutely, absolutely. I think this is one of the biggest challenges that we have the CIOs face today, is on one hand, they have to be agile to the business and make sure that they're present in the cloud, but they are enabling multiple services that the business needs for agility. And data is being one of the business drivers today, And most companies are becoming data companies. And it is to make decisions to serve your customer better, bring more revenue, cut costs. Even in the midst of COVID, we have seen our customers go in and leverage data to find out how they can shift to a different paradigm of doing business. Now, we had a customer which was primarily in retail stores, but they had to go and shift and analyze data on how they can pivot into a more online world in the COVID paradigm, how they can make supply chain decisions faster. So every company is becoming a data-driven business. The data is becoming the currency. So more units want faster access to the data as possible. But on the other hand, you cannot forget about governance. You can not forget about security, it's becoming a table stakes as part of it. And traditionally, this has been a zero-sum game, where, you know, in order to maintain more security, you cannot give more access to the data or you will make copies of the data, and that creates redundancy. The newer paradigm, in our belief, is that you can do both. And that's how Privacera has built toward. And this is how we are helping our customers in their journey where, you know, if you take Comcast, for example, they're building a massive infrastructure on top of AWS to serve the digital analytics part of it. And they are collecting a lot of data and making decisions based on that. But on the other hand, in order for them to achieve compliance and privacy, there needs to be an approach, a more unified layer, which is not innovating from using the data. And this is where a solution like Privacera is coming in, where we have built an approach, we have built an architecture, where they can enable governance and policies, and these policies are being implemented across the data infrastructure. So it doesn't matter which application you use, where you're coming from, you're governed by the same rules and policies. And that uniformity, that consistency is something we can bring in, of being in horizontal layer and having built those integrations, prebuilt those integrations in. So with Comcast, what the end result they're saying is they can be faster to the market, right? Before us, they would be spending a lot of time with manual processes to build that governance. But with an automated layer, with an automated governance, which has prebuilt integrations into all the layers, they are now able to go to market faster, but now they're going into the market with the governance and the compliance built in, so they can have both. So again, our belief is it's not zero-sum. Your governance, security can be built in with this business agility. And we are helping customers do that. >> You mentioned that retail customer and COVID-19, and we saw a massive pivot about a year and a half ago. And some companies did a great job of pivoting from brick and mortar to curbside delivery, for example, which is table stakes. But we saw so much acceleration of digital transformation last year. How has COVID-19 impacted governance? And what are some of the things that you're helping customers achieve there as they're accelerating their digital journeys? >> Again, going back to the drivers, we are seeing our customers, right? So on one hand, digitization and cloud journey, that accelerated during COVID right? So more companies where they were doing their cloud journey, they accelerated, because they can unlock data faster. And, to my earlier examples, they want to make decisions, leveraging data. And COVID brought that, even accelerated some of these initiatives. So there has been more data initiatives than before. Digitalization has accelerated; cloud migration has accelerated. But COVID also brought in the fact that you are not physically located. You can't sit in a room and trust each other and say, "I trust all of you and I'll give you all equal access." You are now sitting in disparate locations, without the traditional securities you would have, a physical boundary, having that. You're now remote. All of a sudden, the CIOs have to think how we can be more agile? How do you build in security, governance in that layer where you have to think start from bottom staff and then say, are you governing and protecting your data wherever it is stored and being accessed, Rather than relying on perimeter or relying on a physical boundary or being in a physical location. So those traditional paradigms are getting shattered. And most companies have recognized, most forward-looking companies, are recognizing that. They accelerated those trends. And from what we have seen from our point of view is we are able to help in that transformation, both in enabling companies to become digital and democratize data faster, but also building this bottom-up layer where they can be sure that they have visibility on what data they have, but also making sure right people have access to the right data, irrespective of what tool they use, irrespective of where they are set, they're always getting that part of it. And that's a sea change we are seeing in the companies now. So COVID in our industry, in our world, has brought in massive transformation and massive opportunities to set a new paradigm for how organizations treat governance, as well as the data initiative. >> A lot of change that it's brought. Some good, as you've mentioned. Talk to me about, so Privacera is built on Apache Ranger; how are you guys helping AWS customers from a cloud migration perspective? Because we know cloud migration is continuing to accelerate. >> Our foundation, given our work in open source, has always been building around open standards and interoperability, and we believe an enterprise solution needs to be built around these standards that we can talk to. You're not the only solution that enterprises will have. There needs to be interoperability, especially around governance and where we exchanging information, and with other tools. And given a legacy of Ranger, it helps us build those standards. And Ranger as a project today is supported from the likes of Cloudera or in the cloud, Microsoft, AWS, and Google, and most of the forward-looking standards and tools, like Presto and Spark. It has been a de facto standard used by some of these analytical engines. The wide adoption around that, and being built on Ranger gives us that standard of interoperability. So when we go and work with other tools, it makes it easier for us to talk. It makes it easier for organizations to transition in their cloud journey, where they can now very easily move the governance and policies of, even if they are running Ranger on premise, they can easily move those standards, those policies, easily into the cloud. For example, with Sun Life, it was the same case, where they built a lot of these rules and policies in their on-premise environment. Being an insurance company, they always had governance and compliance at top of their mind. Very strict rules around who can access what data and what portions of data, because this data is governed by federal laws, by a lot of the industry laws and mandates and compliance. And they always had this notion in on-premise. Now when they're migrating to the cloud, one of the bottlenecks is how do you move this governance and do you have to build it from scratch? But with our tool and the standards we have built in, we can migrate that in days rather than months. So for them, we help in the overall cloud migration. To my earlier point, we are helping customers achieve faster time to market by enabling this governance and making it easier. And by having this open standard, it makes it easier for customers to migrate and then cooperate, rather than having to build it again, having to reinvent the wheel when they migrate to the cloud. Because, the governance and compliance mandates are not changing when you go from prem to cloud. In fact cloud, in some cases, it's more diverse. So by helping organizations do that, we are helping them achieve a faster acceleration, which is the case happened in Sun Life. >> That time to market is absolutely imperative. If anything, we've learned in the last 18 months, it's businesses that needed to pivot overnight multiple times. And they need to be able to get to market faster, whether it's pivoting from being a brick and mortar, to being able to deliver a curbside delivery. The time to market, people don't have that time, regardless of industry, because there's competitors in the rear-view mirror who might be smaller, more agile, and able to get to market faster. So these bigger companies, and any company, needs to have a faster time to market. >> Yeah, absolutely. And that's what we are seeing. And that's big driver for journey into the cloud is to bring that agility. In the earlier paradigm, you're going to have a monolithical technology standard, and you can adopt changes faster when you are reliant on the IT team. What cloud brings in is, you can now move data into the cloud and enable any service and any team faster than ever before. You can enable a team on Snowflake, you can enable a team on a different machine learning tool, all having access to the same data, without it being the need for the data to be copied and servers built out. The cloud is really bringing that digital transformation, but it's also bringing in the agility of being faster and nimble and as part of it. But the challenge for cloud is it's happening at the same time governance, privacy has become real. And organizations no longer can be assuming that, you know, they can just move data into the cloud and be done with it. You have to really think about all layers of the cloud and say, how do you make sure that data is protected on all layers, in all consumption? How do you make sure that right people have access to the right data? And that's a much comprehensive problem, given the world that we are now not sitting in a physical office anymore, we are distributed. How do you do that? So while cloud brings that business agility, it's also happening, not because of cloud, but because of the climate we are in, that governance and compliance is real. And most forward-looking organizations are thinking about how they can build a foundation that can handle both. How they can build, institutionalize these governance frameworks in the newer paradigms of cloud. We are seeing the companies implementing what is called a data mesh, which is essentially a concept of how the data could be decentralized and owned by business owners and teams. But how do you bring governance in that? How do you make sure that a layer of that, and then a newer paradigm most forward-looking organizations are adopting is, governance doesn't need to be managed by one team. It can be a distributed function. But can you institutionalize a foundation or a framework, and you have pools which can be used by different teams. So they are bound by the same rules, but they're operating in their own independent way. And that's the future for us, is how the organizations can figure out how in the cloud, they can have a more distributed, delegated, decentralized governance that aligns with their business strategy of self-service analytics and use of data across multiple teams, but all bound by the same framework, all bound by common rules so that you're not building your own; the tools and the methods are all common, but each team is able to operate independently. And that's where the agility, true agility, will come in, when organizations are able to do that. And I think we are in probably step one or two of the journey. It's fascinating to see some of the organizations take leaps in that. But for us, the future is how if some organizations can build those foundations in from processes and people, they can truly unlock the power of the cloud. >> You brought in technology and people; last question is, how do you advise customers when you're in conversations? We talked about data access, governance, security, being a board-level conversation, the ability for an organization to monetize their data; but how do you talk about that balance when you're with customers? That's a tricky line. >> And what we say to the customer, it's a journey. You don't have to think of solving this on day one. What we really think about is foundational steps you need to do to achieve that journey. And what are the steps you can do today? And add onto it, rather than trying to solve for everything on day one. And that's what most of the focus areas goes in, is how we can help our customers put together a program which achieves both their data strategy and aligns their governance with it. And most forward-looking organizations are already doing that, where they have a multi-year journey that they're already working on. They are thinking about some of the things that we help with. And in some cases, when organizations are not thinking about it, we come and help and advise with that. Our advice always is, start thinking about today and what your next two or three years is going to look like. We put together a program. And that involves tools, that involves people, and that involves organization structure. And we are a cog in the wheel, but we also recommend them to look at, holistically, all the aspects. And that's our job at the end of the day as vendors in this industry, to help collectively learn from customers what we are learning and can help the next set of customers coming. But we believe, again, going back to my point, if organizations are able to set up this paradigm where they're able to set structures, where they can delegate governance, but they build those common rules and frameworks upfront, they are set up to succeed in the future. They can be more agile than their competitors. >> And that is absolutely table stakes these days. Balaji, thank you so much for joining, telling our audience about Privacera, what you're doing, how you're helping customers, particularly AWS customers, migrate to the cloud in such a dynamic environment. We appreciate your time. >> Thank you so much. It was a pleasure talking to you and I appreciate it. >> Likewise. For Balaji Ganesan, I'm Lisa Martin. You're watching this CUBE Conversation. (upbeat music)

Published Date : Sep 7 2021

SUMMARY :

Balaji, it's great to have you on theCUBE. Good to see you again, and How do you help balance And the end result is they can for organizations to be able to give But on the other hand, you to curbside delivery, All of a sudden, the CIOs have to think is continuing to accelerate. and most of the forward-looking And they need to be able but because of the climate we are in, to monetize their data; And that's our job at the end of the day And that is absolutely to you and I appreciate it. For Balaji Ganesan, I'm Lisa Martin.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Lisa MartinPERSON

0.99+

ComcastORGANIZATION

0.99+

PrivaceraORGANIZATION

0.99+

AmazonORGANIZATION

0.99+

AWSORGANIZATION

0.99+

MicrosoftORGANIZATION

0.99+

LisaPERSON

0.99+

last yearDATE

0.99+

GDPRTITLE

0.99+

Balaji GanesanPERSON

0.99+

DatabricksORGANIZATION

0.99+

BalajiPERSON

0.99+

bothQUANTITY

0.99+

GoogleORGANIZATION

0.99+

Sun LifeORGANIZATION

0.99+

each teamQUANTITY

0.99+

oneQUANTITY

0.98+

todayDATE

0.98+

one teamQUANTITY

0.98+

three yearsQUANTITY

0.98+

SnowflakeORGANIZATION

0.97+

twoQUANTITY

0.97+

California PrivacyTITLE

0.95+

COVIDOTHER

0.95+

Sun LifeORGANIZATION

0.95+

COVIDTITLE

0.94+

about a year and a half agoDATE

0.94+

COVID-19OTHER

0.91+

day oneQUANTITY

0.9+

COVIDORGANIZATION

0.87+

dualQUANTITY

0.86+

RangerORGANIZATION

0.86+

step oneQUANTITY

0.84+

SnowflakeTITLE

0.82+

single cloudQUANTITY

0.81+

Apache RangerORGANIZATION

0.78+

PrestoORGANIZATION

0.7+

last 18 monthsDATE

0.7+

SparkTITLE

0.69+

one of the bottlenecksQUANTITY

0.62+

ClouderaTITLE

0.54+

PrivaceraPERSON

0.51+

Dipti Borkar, Ahana, and Derrick Harcey, Securonix | CUBE Conversation, July 2021


 

(upbeat music) >> Welcome to theCUBE Conversation. I'm John Furrier, host of theCUBE here in Palo Alto, California, in our studios. We've got a great conversation around open data link analytics on AWS, two great companies, Ahana and Securonix. Dipti Borkar, Co-founder and Chief Product Officer at Ahana's here. Great to see you, and Derrick Harcey, Chief Architect at Securonix. Thanks for coming on, really appreciate you guys spending the time. >> Yeah, thanks so much, John. Thank you for having us and Derrick, hello again. (laughing) >> Hello, Dipti. >> We had a great conversation around our startup showcase, which you guys were featured last month this year, 2021. The conversation continues and a lot of people are interested in this idea of open systems, open source. Obviously open data lakes is really driving a lot of value, especially with machine learning and whatnot. So this is a key, key point. So can you guys just take a step back before we get under the hood and set the table on Securonix and Ahana? What's the big play here? What is the value proposition? >> Why sure, I'll give a quick update. Securonix has been in the security business. First, a user and entity, behavioral analytics, and then the next generation SIEM platform for 10 years now. And we really need to take advantage of some cutting edge technologies in the open source community and drive adoption and momentum that we can not only bring in data from our customers, that they can find security threats, but also store in a way that they can use for other purposes within their organization. That's where the open data lake is very critical. >> Yeah and to add on to that, John, what we've seen, you know, traditionally we've had data warehouses, right? We've had operational systems move all of their data into the warehouse and those, you know, while these systems are really good, built for good use cases, the amount of data is exploding, the types of data is exploding, different types, semi-structured, structured and so when, as companies like Securonix in the security space, as well as other verticals, look for getting more insights out of their data, there's a new approach that's emerging where you have a data lake, which AWS has revolutionized with S3 and commoditized and there's analytics that's built on top of it. And so we're seeing a lot of good advantages that come out of this new approach. >> Well, it's interesting EC2 and S3 are having their 15th birthday, as they say in Amazon's interesting teenage years, but while I got you guys here, I want to just ask you, can you define the SIEM thing because the SIEM market is exploding, it just changed a little bit. Obviously it's data, event management, but again, as data becomes more proliferating, and it's not stopping anytime soon, as cloud native applications emerge, why is this important? What is this SIEM category? What's it about? >> Yeah, thanks. I'll take that. So obviously SIEM traditionally has been around for about a couple of decades and it really started with first log collection and management and rule-based threat detection. Now what we call next generation SIEM is really the modernization of a security platform that includes streaming threat detection and behavioral analysis and data analytics. We literally look for thousands of different threat detection techniques, and we chained together sequences of events and we stream everything in real time and it's very important to find threats as quickly as possible. But the momentum that we see in the industry as we see massive sizes of customers, we have made a transition from on-premise to the cloud and we literally are processing tens of petabytes of data for our customers. And it's critical that we can adjust data quickly, find threats quickly and allow customers to have the tools to respond to those security incidents quickly and really get the handle on their security posture. >> Derrick, if I ask you what's different about this next gen SIEM, what would you say and what's the big a-ha? What's the moment there? What's the key thing? >> The real key is taking the off the boundaries of scale. We want to be able to ingest massive quantities of data. We want to be able to do instant threat detection, and we want to be able to search on the entire forensic data set across all of the history of our customer base. In the past, we had to make sacrifices, either on the amount of data we ingest or the amount of time that we stored that data. And the really the next generation SIEM platform is offering advanced capabilities on top of that data set because those boundaries are no longer barriers for us. >> Dipti, any comment before I jump into the question for you? >> Yeah, you know, absolutely. It is about scale and like I mentioned earlier, the amount of data is only increasing and it's also the types of information. So the systems that were built to process this information in the past are, you know, support maybe terabytes of data, right? And that's where new technologies open source engines like Presto come in, which were built to handle internet scale. Presto was kind of created at Facebook to handle these petabytes that Derrick is talking about that every industry is now seeing where we're are moving from gigs to terabytes to petabytes. And that's where the analytic stack is moving. >> That's a great segue. I want to ask you while I got you here 'cause this is again, the definitions, 'cause people love to hear the experts weigh in. What is open data lake analytics? How would you define that? And then talk about where Presto fits in. >> Yeah, that's a great question. So the way I define open data lake analytics is you have a data lake on the core, which is, let's say S3, it's the most popular one, but on top of it, there are open aspects, it is open format. Open formats play a very important role because you can have different types of processing. It could be SQL processing, it could be machine learning, it could be other types of workloads, all work on these open formats versus a proprietary format where it's locked and it's open interfaces. Open interfaces that are like SQL, JDBC, ODBC is widely accessible to a range of tools. And so it's everywhere. Open source is a very important part of it. As companies like Securonix pick these technologies for their mission critical systems, they want to know that this is going to be available and open for them for a long period of time. And that's why open source becomes important. And then finally, I would say open cloud because at the end of the day, you know, while AWS is where a lot of the innovations happening, a lot of the market is, there are other clouds and open cloud is something that these engines were built for, right? So that's how I define open data lake analytics. It's analytics with query engines built on top of these open formats, open source, open interfaces and open cloud. Now Presto comes in where you want to find the needle in the haystack, right? And so when you have these deep questions about where did the threat come from or who was it, right? You have to ask these questions of your data. And Presto is an open source distributed SQL engine that allows data platform teams to run queries on their data lakes in a high-performance ways, in memory and on these petabytes of data. So that's where Presto fits in. It's one of the defacto query engines for SQL analysis on the data lake. So hopefully that answers the question, gives more context. >> Yeah, I mean, the joke about data lakes has been you don't want to be a data swamp, right? That's what people don't want. >> That's right. >> But at the same time, the needle in the haystack, it's like big data is like a needle in a haystack of needles. So there's a constant struggle to getting that data, the right data at the right time. And what I learned in the last presentation, you guys both presented, your teams presented at the conference was the managed service approach. Could you guys talk about why that approach works well together with you guys? Because I think when people get to the cloud, they replatform, then they start refactoring and data becomes a real big part of that. Why is the managed service the best approach to solving these problems? >> Yeah and interestingly, both Securonix and Ahana have a managed service approach so maybe Derrick can go first and I can go after. >> Yeah, yeah. I'll be happy to go first. You know, we really have found making the transition over the last decade from off premise to the cloud for the majority of our customers that running a large open data lake requires a lot of different skillsets and there's hundreds of technologies in the open source community to choose from and to be able to choose the right blend of skillsets and technologies to produce a comprehensive service is something that customers can do, many customers did do, and it takes a lot of resources and effort. So what we really want to be able to do is take and package up our security service, our next generation SIEM platform to our customers where they don't need to become experts in every aspect of it. Now, an underlying component of that for us is how we store data in an open standards way and how we access that data in an open standards way. So just like we want our customers to get immediate value from the security services that we provide, we also want to be able take advantage of a search service that is offered to us and supported by a vendor like Ahana where we can very quickly take advantage of that value within our core underlying platform. So we really want to be able to make a frictionless effort to allow our customers achieve value as quick as possible. >> That's great stuff. And on the Ahana side, open data lakes, really the ease of use there, it sounds easy to me, but we know it's not easy just to put data in a data lake. At the end of the day, a lot of customers want simplicity 'cause they don't have the staffing. This comes up a lot. How do you leverage their open source participation and/or getting stood up quickly so they can get some value? Because that seems to be the number one thing people want right now. Dipti, how does that work? How do people get value quickly? >> Yeah, absolutely. When you talk about these open source press engines like Presto and others, right? They came out of these large internet companies that have a lot of distributed systems, engineers, PhDs, very kind of advanced level teams. And they can manage these distributed systems building onto them, add features at large scale, but not every company can and these engines are extremely powerful. So when you combine the power of Presto with the cloud and a managed service, that's where value for everyone comes in. And that's what I did with Ahana is looked at Presto, which is a great engine, but converted it into a great user experience so that whether it's a three person platform team or a five person platform team, they still get the same benefit of Presto that a Facebook gets, but at much, much a less operational complexity cost, as well as the ability to depend on a vendor who can then drive the innovation and make it even better. And so that's where managed services really com in. There's thousands of credit parameters that need to be tuned. With Ahana, you get it out of the box. So you have the best practices that are followed at these larger companies. Our team comes from Facebook, HuBERT and others, and you get that out of the box, with a few clicks you can get up and running. And so you see value immediately, in 30 minutes you're up and running and you can create your data lake versus with Hadoop and these prior systems, it would take months to receive real value from some of these systems. >> Yeah, we saw the Hadoop scar tissue is all great and all good now, but it takes too much resource, standing up clusters, managing it, you can't hire enough people. I got to ask you while you're on that topic, do you guys ship templates? How do you solve the problem of out of the box? You mentioned some out of the box capability. Do you guys think of as recipes, templates? What's your thoughts around what you're providing customers to get up and running? >> Yeah so in the case of Securonix, right, let's say they want to create a Presto cluster. They go into our SAS console. You essentially put in the number of nodes that you want. Number of workers you want. There's a lot of additional value that we built in like caching capabilities if you want more performance, built in cataloging that's again, another single click. And there isn't really as much of a template. Everybody gets the best tuned Presto for their workloads. Now there are certain workloads where you might have interactive in some cases, or you might have transformation batch ETL, and what we're doing next is actually giving you the knobs so that it comes pre tuned for the type of workload that you want to run versus you figuring it out. And so that's what I mean by out of the box, where you don't have to worry about these configuration parameters. You get the performance. And maybe Derrick can you talk a little bit about the benefits of the managed service and the usage as well. >> Yeah, absolutely. So, I'll answer the same question and then I'll tie back to what Dipti asked. Really, you know, our customers, we want it to be very easy for them to ingest security event logs. And there's really hundreds of types of a security event logs that we support natively out of the box, but the key for us is a standard that we call the open event format. And that is a normalized schema. We take any data source in it's normalized format, be a collector device a customer uses on-premise, they send the data up to our cloud, we do streaming analysis and data analytics to determine where the threats are. And once we do that, then we send the data off to a long-term storage format in a standards-based Parquet file. And that Parquet file is natively read by the Ahana service. So we simply deploy an Ahana cluster that uses the Presto engine that natively supports our open standard file format. And we have a normalized schema that our application can immediately start to see value from. So we handle the collection and streaming ingest, and we simply leverage the engine in Ahana to give us the appropriate scale. We can size up and down and control the cost to give the users the experience that they're paying for. >> I really love this topic because one, not only is it cutting edge, but it's very relevant for modern applications. You mentioned next gen SIEMs, SIEM, security information event management, not SIM as memory card, which I think of all the time because I always want to add more, but this brings up the idea of streaming data real-time, but as more services go to the cloud, Derrick, if you don't mind sharing more on this. Share the journey that you guys gone through, because I think a lot of people are looking at the cloud and saying, and I've been in a lot of these conversations about repatriation versus cloud. People aren't going that way. They're going more innovation with his net new revenue models emerging from the value that they're getting out of understanding events that are happening within the network and the apps, even when they're being stood up and torn down. So there's a lot of cloud native action going on where just controlling and understanding is way beyond the, just put stuff into an event log. It's a whole nother animal. >> Well, there's a couple of paradigm shifts that we've seen major patterns for in the last five or six years. Like I said, we started with the safe streaming ingest platform on premise. We use some different open source technologies. What we've done when we moved to the cloud is we've adopted cloud native services as part of our underlying platform to modernize and make our service cloud native. But what we're seeing as many customers either want to focus on on-premise deployments and especially financial institutions and government institute things, because they are very risk averse. Now we're seeing even those customers are realizing that it's very difficult to maintain the hundreds or thousands of servers that it requires on premise and have the large skilled staff required to keep it running. So what we're seeing now is a lot of those customers deployed some packaged products like our own, and even our own customers are doing a mass migration to the cloud because everything is handled for them as a service. And we have a team of experts that we maintain to support all of our global customers, rather than every one of our global customers having their own teams that we then support on the back end. So it's a much more efficient model. And then the other major approach that many of our customers also went down the path of is, is building their own security data lake. And many customers were somewhat successful in building their own security data lake but in order to keep up with the innovation, if you look at the analyst groups, the Gartner Magic Quadrant on the SIEM space, the feature set that is provided by a packaged product is a very large feature set. And even if somebody was put together all of the open source technologies to meet 20% of those features, just maintaining that over time is very expensive and very difficult. So we want to provide a service that has all of the best in class features, but also leverages the ability to innovate on the backend without the customer knowing. So we can do a technology shift to Ahana and Presto from our previous technology set. The customer doesn't know the difference, but they see the value add within the service that we're offering. >> So if I get this right, Derrick, Presto's enabling you guys to do threat detection at a level that you're super happy with as well as giving you the option for give self-service. Is that right for the, is that a kind of a- >> Well, let me clarify our definition. So we do streaming threat detection. So we do a machine learning based behavioral analysis and threat detection on rule-based correlation as well. So we do threat detection during the streaming process, but as part of the process of managing cybersecurity, the customer has a team of security analysts that do threat hunting. And the threat hunting is where Ahana comes in. So a human gets involved and starts searches for the forensic logs to determine what happened over time that might be suspicious and they start to investigate through a series of queries to give them the information that's relevant. And once they find information that's relevant, then they package it up into an algorithm that will do a analysis on an ongoing basis as part of the stream processing. So it's really part of the life cycle of hunting a real time threat detection. >> It's kind of like old adage hunters and farmers, you're farming through the streaming and hunting with the detection. I got to ask you, what would it be the alternative if you go back, I mean, I know cloud's so great because you have cutting edge applications and technologies. Without Presto, where would you be? I mean, what would be life like without these capabilities? What would have to happen? >> Well, the issue is not that we had the same feature set before we moved to Presto, but the challenge was on scale. The cost profile to continue to grow from 100 terabytes to one petabyte, to tens of petabytes, not only was it expensive, but it just, the scaling factors were not linear. So not only did we have a problem with the costs, but we also had a problem with the performance tailing off and keeping the service running. A large Hadoop cluster, for example, our first incarnation of this use, the hive service, in order to query data in a MapReduce cluster. So it's a completely different technology that uses a distributed Hadoop compute cluster to do the query. It does work, but then we start to see resource contention with that, and all the other things in the Hadoop platform. The Presto engine has the beauty of it, not only was it designed for scale, but it's feature built just for a query engine and that's the providing the right tool for the job, as opposed to a general purpose tool. >> Derrick, you've got a very busy job as chief architect. What are you excited about going forward when you look at the cloud technologies? What are you looking at? What are you watching? What are you getting excited about or what worries you? >> Well, that's a good question. What we're really doing, I'm leading up a group called the Securonix Innovation Labs, and we're looking at next generation technologies. We go through and analyze both open source technologies, technologies that are proprietary as well as building own technologies. And that's where we came across Ahana as part of a comprehensive analysis of different search engines, because we wanted to go through another round of search engine modernization, and we worked together in a partnership, and we're going to market together as part of our modernization efforts that we're continuously going through. So I'm looking forward to iterative continuous improvement over time. And this next journey, what we're seeing because of the growth in cybersecurity, really requires new and innovative technologies to work together holistically. >> Dipti, you got a great company that you co-founded. I got to ask you as the co-founder and chief product officer, you both the lead entrepreneur also, got the keys to the kingdom with the products. You got to balance that 20 miles stare out in the future while driving product excellence. You've got open source as a tailwind. What's on your mind as you go forward with your venture? >> Yeah. Great question. It's been super exciting to have found the Ahana in this space, cloud data and open source. That's where the action is happening these days, but there's two parts to it. One is making our customers successful and continuously delivering capabilities, features, continuing on our ease of use theme and a foundation to get customers like Securonix and others to get most value out of their data and as fast as possible, right? So that's a continuum. In terms of the longer term innovation, the way I see the space, there is a lot more innovation to be done and Presto itself can be made even better and there's a next gen Presto that we're working on. And given that Presto is a part of the foundation, the Linux Foundation, a lot of this innovation is happening together collaboratively with Facebook, with Uber who are members of the foundation with us. Securonix, we look forward to making a part of that foundation. And that innovation together can then benefit the entire community as well as the customer base. This includes better performance with more capabilities built in, caching and many other different types of database innovations, as well as scaling, auto scaling and keeping up with this ease of use theme that we're building on. So very exciting to work together with all these companies, as well as Securonix who's been a fantastic partner. We work together, build features together, and I look at delivering those features and functionalities to be used by these analysts, data scientists and threat hunters as Derrick called them. >> Great success, great partnership. And I love the open innovation, open co-creation you guys are doing together and open data lakes, great concept, open data analytics as well. This is the future. Insights coming from the open and sharing and actually having some standards. I love this topic, so Dipti, thank you very much, and Derrick, thanks for coming on and sharing on this Cube Conversation. Thanks for coming on. >> Thank you so much, John. >> Thanks for having us. >> Thanks. Take care. Bye-bye. >> Okay, it's theCube Conversation here in Palo Alto, California. I'm John furrier, your host of theCube. Thanks for watching. (upbeat music)

Published Date : Jul 30 2021

SUMMARY :

guys spending the time. and Derrick, hello again. and set the table on Securonix and Ahana? and momentum that we can into the warehouse and those, you know, because the SIEM market is exploding, and really get the handle either on the amount of data we ingest and it's also the types of information. hear the experts weigh in. So hopefully that answers the Yeah, I mean, the joke Why is the managed Yeah and interestingly, a search service that is offered to us And on the Ahana side, open data lakes, and you get that out of the box, I got to ask you while and the usage as well. and control the cost from the value that they're getting and have the large skilled staff as well as giving you the for the forensic logs to and hunting with the detection. and that's the providing when you look at the cloud technologies? because of the growth in cybersecurity, got the keys to the and a foundation to get And I love the open here in Palo Alto, California.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
SecuronixORGANIZATION

0.99+

JohnPERSON

0.99+

Derrick HarceyPERSON

0.99+

DerrickPERSON

0.99+

FacebookORGANIZATION

0.99+

AhanaORGANIZATION

0.99+

AhanaPERSON

0.99+

John FurrierPERSON

0.99+

20%QUANTITY

0.99+

July 2021DATE

0.99+

UberORGANIZATION

0.99+

DiptiPERSON

0.99+

100 terabytesQUANTITY

0.99+

AmazonORGANIZATION

0.99+

10 yearsQUANTITY

0.99+

AWSORGANIZATION

0.99+

hundredsQUANTITY

0.99+

Linux FoundationORGANIZATION

0.99+

two partsQUANTITY

0.99+

thousandsQUANTITY

0.99+

Securonix Innovation LabsORGANIZATION

0.99+

tens of petabytesQUANTITY

0.99+

30 minutesQUANTITY

0.99+

one petabyteQUANTITY

0.99+

Dipti BorkarPERSON

0.99+

20 milesQUANTITY

0.99+

Palo Alto, CaliforniaLOCATION

0.99+

five personQUANTITY

0.99+

FirstQUANTITY

0.99+

SQLTITLE

0.99+

last monthDATE

0.99+

bothQUANTITY

0.99+

OneQUANTITY

0.98+

15th birthdayQUANTITY

0.97+

two great companiesQUANTITY

0.96+

HuBERTORGANIZATION

0.96+

HadoopTITLE

0.96+

S3TITLE

0.96+

hundreds of technologiesQUANTITY

0.96+

three personQUANTITY

0.95+

ParquetTITLE

0.94+

first incarnationQUANTITY

0.94+

firstQUANTITY

0.94+

PrestoORGANIZATION

0.93+

GartnerORGANIZATION

0.93+

last decadeDATE

0.92+

terabytes of dataQUANTITY

0.92+

first logQUANTITY

0.91+

single clickQUANTITY

0.9+

PrestoPERSON

0.9+

theCUBEORGANIZATION

0.88+

Steven Mih, Ahana and Sachin Nayyar, Securonix | AWS Startup Showcase


 

>> Voiceover: From theCUBE's Studios in Palo Alto in Boston, connecting with thought leaders all around the world, this is theCUBE Conversation. >> Welcome back to theCUBE's coverage of the AWS Startup Showcase. Next Big Thing in AI, Security and Life Sciences featuring Ahana for the AI Trek. I'm your host, John Furrier. Today, we're joined by two great guests, Steven Mih, Ahana CEO, and Sachin Nayyar, Securonix CEO. Gentlemen, thanks for coming on theCUBE. We're talking about the Next-Gen technologies on AI, Open Data Lakes, et cetera. Thanks for coming on. >> Thanks for having us, John. >> Thanks, John. >> What a great line up here. >> Sachin: Thanks, Steven. >> Great, great stuff. Sachin, let's get in and talk about your company, Securonix. What do you guys do? Take us through, I know you've got a slide to help us through this, I want to introduce your stuff first then jump in with Steven. >> Absolutely. Thanks again, Steven. Ahana team for having us on the show. So Securonix, we started the company in 2010. We are the leader in security analytics and response capability for the cybermarket. So basically, this is a category of solutions called SIEM, Security Incident and Event Management. We are the quadrant leaders in Gartner, we now have about 500 customers today and have been plugging away since 2010. Started the company just really focused on analytics using machine learning and an advanced analytics to really find the needle in the haystack, then moved from there to needle in the needle stack using more algorithms, analysis of analysis. And then kind of, I evolved the company to run on cloud and become sort of the biggest security data lake on cloud and provide all the analytics to help companies with their insider threat, cyber threat, cloud solutions, application threats, emerging internally and externally, and then response and have a great partnership with Ahana as well as with AWS. So looking forward to this session, thank you. >> Awesome. I can't wait to hear the news on that Next-Gen SIEM leadership. Steven, Ahana, talk about what's going on with you guys, give us the update, a lot of stuff happening. >> Yeah. Great to be here and thanks for that such, and we appreciate the partnership as well with both Securonix and AWS. Ahana is the open source company based on PrestoDB, which is a project that came out of Facebook and is widely used, one of the fastest growing projects in data analytics today. And we make a managed service for Presto easily on AWS, all cloud native. And we'll be talking about that more during the show. Really excited to be here. We believe in open source. We believe in all the challenges of having data in the cloud and making it easy to use. So thanks for having us again. >> And looking forward to digging into that managed service and why that's been so successful. Looking forward to that. Let's get into the Securonix Next-Gen SIEM leadership first. Let's share the journey towards what you guys are doing here. As the Open Data Lakes on AWS has been a hot topic, the success of data in the cloud, no doubt is on everyone's mind especially with the edge coming. It's just, I mean, just incredible growth. Take us through Sachin, what do you guys got going on? >> Absolutely. Thanks, John. We are hearing about cyber threats every day. No question about it. So in the past, what was happening is companies, what we have done as enterprise is put all of our eggs in the basket of solutions that were evaluating the network data. With cloud, obviously there is no more network data. Now we have moved into focusing on EDR, right thing to do on endpoint detection. But with that, we also need security analytics across on-premise and cloud. And your other solutions like your OT, IOT, your mobile, bringing it all together into a security data lake and then running purpose built analytics on top of that, and then having a response so we can prevent some of these things from happening or detect them in real time versus innovating for hours or weeks and months, which is is obviously too late. So with some of the recent events happening around colonial and others, we all know cybersecurity is on top of everybody's mind. First and foremost, I also want to. >> Steven: (indistinct) slide one and that's all based off on top of the data lake, right? >> Sachin: Yes, absolutely. Absolutely. So before we go into on Securonix, I also want to congratulate everything going on with the new cyber initiatives with our government and just really excited to see some of the things that the government is also doing in this space to bring, to have stronger regulation and bring together the government and the private sector. From a Securonix perspective, today, we have one third of the fortune 500 companies using our technology. In addition, there are hundreds of small and medium sized companies that rely on Securonix for their cyber protection. So what we do is, again, we are running the solution on cloud, and that is very important. It is not just important for hosting, but in the space of cybersecurity, you need to have a solution, which is not, so where we can update the threat models and we can use the intelligence or the Intel that we gather from our customers, partners, and industry experts and roll it out to our customers within seconds and minutes, because the game is real time in cybersecurity. And that you can only do in cloud where you have the complete telemetry and access to these environments. When we go on-premise traditionally, what you will see is customers are even thinking about pushing the threat models through their standard Dev test life cycle management, and which is just completely defeating the purpose. So in any event, Securonix on the cloud brings together all the data, then runs purpose-built analytics on it. Helps you find very few, we are today pulling in several million events per second from our customers, and we provide just a very small handful of events and reduce the false positives so that people can focus on them. Their security command center can focus on that and then configure response actions on top of that. So we can take action for known issues and have intelligence in all the layers. So that's kind of what the Securonix is focused on. >> Steven, he just brought up, probably the most important story in technology right now. That's ransomware more than, first of all, cybersecurity in general, but ransomware, he mentioned some of the government efforts. Some are saying that the ransomware marketplace is bigger than some governments, nation state governments. There's a business model behind it. It's highly active. It's dominating the scene and it's a real threat. This is the new world we're living in, cloud creates the refactoring capabilities. We're hearing that story here with Securonix. How does Presto and Securonix work together? Because I'm connecting the dots here in real time. I think you're going to go there. So take us through because this is like the most important topic happening. >> Yeah. So as Sachin said, there's all this data that needs to go into the cloud and it's all moving to the cloud. And there's a massive amounts of data and hundreds of terabytes, petabytes of data that's moving into the data lakes and that's the S3-based data lakes, which are the easiest, cheapest, commodified place to put all this data. But in order to deliver the results that Sachin's company is driving, which is intelligence on when there's a ransomware or possibility, you need to have analytics on them. And so Presto is the open source project that is a open source SQL query engine for data lakes and other data sources. It was created by Facebook as part of the Linux foundation, something called Presto foundation. And it was built to replace the complicated Hadoop stack in order to then drive analytics at very lightning fast queries on large, large sets of data. And so Presto fits in with this Open Data Lake analytics movement, which has made Presto one of the fastest growing projects out there. >> What is an Open Data Lake? Real quick for the audience who wants to learn on what it means. Does is it means it's open source in the Linux foundation or open meaning it's open to multiple applications? What does that even mean? >> Yeah. Open Data Lake analytics means that you're, first of all, your data lake has open formats. So it is made up of say something called the ORC or Parquet. And these are formats that any engine can be used against. That's really great, instead of having locked in data types. Data lakes can have all different types of data. It can have unstructured, semi-structured data. It's not just the structured data, which is typically in your data warehouses. There's a lot more data going into the Open Data Lake. And then you can, based on what workload you're looking to get benefit from, the insights come from that, and actually slide two covers this pictorially. If you look on the left here on slide two, the Open Data Lake is where all the data is pulling. And Presto is the layer in between that and the insights which are driven by the visualization, reporting, dashboarding, BI tools or applications like in Securonix case. And so analytics are now being driven by every company for not just industries of security, but it's also for every industry out there, retail, e-commerce, you name it. There's a healthcare, financials, all are looking at driving more analytics for their SaaSified applications as well as for their own internal analysts, data scientists, and folks that are trying to be more data-driven. >> All right. Let's talk about the relationship now with where Presto fits in with Securonix because I get the open data layer. I see value in that. I get also what we're talking about the cloud and being faster with the datasets. So how does, Sachin' Securonix and Ahana fit in together? >> Yeah. Great question. So I'll tell you, we have two customers. I'll give you an example. We have two fortune 10 customers. One has moved most of their operations to the cloud and another customer which is in the process, early stage. The data, the amount of data that we are getting from the customer who's moved fully to the cloud is 20 times, 20 times more than the customer who's in the early stages of moving to the cloud. That is because the ability to add this level of telemetry in the cloud, in this case, it happens to be AWS, Office 365, Salesforce and several other rescalers across several other cloud technologies. But the level of logging that we are able to get the telemetry is unbelievable. So what it does is it allows us to analyze more, protect the customers better, protect them in real time, but there is a cost and scale factor to that. So like I said, when you are trying to pull in billions of events per day from a customer billions of events per day, what the customers are looking for is all of that data goes in, all of data gets enriched so that it makes sense to a normal analyst and all of that data is available for search, sometimes 90 days, sometimes 12 months. And then all of that data is available to be brought back into a searchable format for up to seven years. So think about the amount of data we are dealing with here and we have to provide a solution for this problem at a price that is affordable to the customer and that a medium-sized company as well as a large organization can afford. So after a lot of our analysis on this and again, Securonix is focused on cyber, bringing in the data, analyzing it, so after a lot of our analysis, we zeroed in on S3 as the core bucket where this data needs to be stored because the price point, the reliability, and all the other functions available on top of that. And with that, with S3, we've created a great partnership with AWS as well as with Snowflake that is providing this, from a data lake perspective, a bigger data lake, enterprise data lake perspective. So now for us to be able to provide customers the ability to search that data. So data comes in, we are enriching it. We are putting it in S3 in real time. Now, this is where Presto comes in. In our research, Presto came out as the best search engine to sit on top of S3. The engine is supported by companies like Facebook and Uber, and it is open source. So open source, like you asked the question. So for companies like us, we cannot depend on a very small technology company to offer mission critical capabilities because what if that company gets acquired, et cetera. In the case of open source, we are able to adopt it. We know there is a community behind it and it will be kind of available for us to use and we will be able to contribute in it for the longterm. Number two, from an open source perspective, we have a strong belief that customers own their own data. Traditionally, like Steven used the word locked in, it's a key term, customers have been locked in into proprietary formats in the past and those days are over. You should be, you own the data and you should be able to use it with us and with other systems of choice. So now you get into a data search engine like Presto, which scales independently of the storage. And then when we start looking at Presto, we came across Ahana. So for every open source system, you definitely need a sort of a for-profit company that invests in the community and then that takes the community forward. Because without a company like this, the community will die. So we are very excited about the partnership with Presto and Ahana. And Ahana provides us the ability to take Presto and cloudify it, or make the cloud operations work plus be our conduit to the Ahana community. Help us speed up certain items on the roadmap, help our team contribute to the community as well. And then you have to take a solution like Presto, you have to put it in the cloud, you have to make it scale, you have to put it on Kubernetes. Standard thing that you need to do in today's world to offer it as sort of a micro service into our architecture. So in all of those areas, that's where our partnership is with Ahana and Presto and S3 and we think, this is the search solution for the future. And with something like this, very soon, we will be able to offer our customers 12 months of data, searchable at extremely fast speeds at very reasonable price points and you will own your own data. So it has very significant business benefits for our customers with the technology partnership that we have set up here. So very excited about this. >> Sachin, it's very inspiring, a couple things there. One, decentralize on your own data, having a democratized, that piece is killer. Open source, great point. >> Absolutely. >> Company goes out of business, you don't want to lose the source code or get acquired or whatever. That's a key enabler. And then three, a fast managed service that has a commercial backing behind it. So, a great, and by the way, Snowflake wasn't around a couple of years ago. So like, so this is what we're talking about. This is the cloud scale. Steven, take us home with this point because this is what innovation looks like. Could you share why it's working? What's some of the things that people could walk away with and learn from as the new architecture for the new NextGen cloud is here, so this is a big part of and share how this works? >> That's right. As you heard from Sachin, every company is becoming data-driven and analytics are central to their business. There's more data and it needs to be analyzed at lower cost without the locked in and people want that flexibility. And so a slide three talks about what Ahana cloud for Presto does. It's the best Presto out of the box. It gives you very easy to use for your operations team. So it can be one or two people just managing this and they can get up to speed very quickly in 30 minutes, be up and running. And that jump starts their movement into an Open Data Lake analytics architecture. That architecture is going to be, it is the one that is at Facebook, Uber, Twitter, other large web scale, internet scale companies. And with the amount of data that's occurring, that's now becoming the standard architecture for everyone else in the future. And so just to wrap, we're really excited about making that easy, giving an open source solution because the open source data stack based off of data lake analytics is really happening. >> I got to ask you, you've seen many waves on the industry. Certainly, you've been through the big data waves, Steven. Sachin, you're on the cutting edge and just the cutting edge billions of signals from one client alone is pretty amazing scale and refactoring that value proposition is super important. What's different from 10 years ago when the Hadoop, you mentioned Hadoop earlier, which is RIP, obviously the cloud killed it. We all know that. Everyone kind of knows that. But like, what's different now? I mean, skeptics might say, I don't believe you, but it's just crazy. There's no way it works. S3 costs way too much. Why is this now so much more of an attractive proposition? What do you say the naysayers out there? With Steve, we'll start with you and then Sachin, I want you to like weigh in too. >> Yeah. Well, if you think about the Hadoop era and if you look at slide three, it was a very complicated system that was done mainly on-prem. And you'd have to go and set up a big data team and a rack and stack a bunch of servers and then try to put all this stuff together and candidly, the results and the outcomes of that were very hard to get unless you had the best possible teams and invested a lot of money in this. What you saw in this slide was that, that right hand side which shows the stack. Now you have a separate compute, which is based off of Intel based instances in the cloud. We run the best in that and they're part of the Presto foundation. And that's now data lakes. Now the distributed compute engines are the ones that have become very much easier. So the big difference in what I see is no longer called big data. It's just called data analytics because it's now become commodified as being easy and the bar is much, much lower, so everyone can get the benefit of this across industries, across organizations. I mean, that's good for the world, reduces the security threats, the ransomware, in the case of Securonix and Sachin here. But every company can benefit from this. >> Sachin, this is really as an example in my mind and you can comment too on if you'd believe or not, but replatform with the cloud, that's a no brainer. People do that. They did it. But the value is refactoring in the cloud. It's thinking differently with the assets you have and making sure you're using the right pieces. I mean, there's no brainer, you know it's good. If it costs more money to stand up something than to like get value out of something that's operating at scale, much easier equation. What's your thoughts on this? Go back 10 years and where we are now, what's different? I mean, replatforming, refactoring, all kinds of happening. What's your take on all this? >> Agreed, John. So we have been in business now for about 10 to 11 years. And when we started my hair was all black. Okay. >> John: You're so silly. >> Okay. So this, everything has happened here is the transition from Hadoop to cloud. Okay. This is what the result has been. So people can see it for themselves. So when we started off with deep partnerships with the Hadoop providers and again, Hadoop is the foundation, which has now become EMR and everything else that AWS and other companies have picked up. But when you start with some basic premise, first, the racking and stacking of hardware, companies having to project their entire data volume upfront, bringing the servers and have 50, 100, 500 servers sitting in their data centers. And then when there are spikes in data, or like I said, as you move to the cloud, your data volume will increase between five to 20x and projecting for that. And then think about the agility that it will take you three to six months to bring in new servers and then bring them into the architecture. So big issue. Number two big issue is that the backend of that was built for HDFS. So Hadoop in my mind was built to ingest large amounts of data in batches and then perform some spark jobs on it, some analytics. But we are talking in security about real time, high velocity, high variety data, which has to be available in real time. It wasn't built for that, to be honest. So what was happening is, again, even if you look at the Hadoop companies today as they have kind of figured, kind of define their next generation, they have moved from HDFS to now kind of a cloud based platform capability and have discarded the traditional HDFS architecture because it just wasn't scaling, wasn't searching fast enough, wasn't searching fast enough for hundreds of analysts at the same time. And then obviously, the servers, et cetera wasn't working. Then when we worked with the Hadoop companies, they were always two to three versions behind for the individual services that they had brought together. And again, when you're talking about this kind of a volume, you need to be on the cutting edge always of the technologies underneath that. So even while we were working with them, we had to support our own versions of Kafka, Solr, Zookeeper, et cetera to really bring it together and provide our customers this capability. So now when we have moved to the cloud with solutions like EMR behind us, AWS has invested in in solutions like EMR to make them scalable, to have scale and then scale out, which traditional Hadoop did not provide because they missed the cloud wave. And then on top of that, again, rather than throwing data in that traditional older HDFS format, we are now taking the same format, the parquet format that it supports, putting it in S3 and now making it available and using all the capabilities like you said, the refactoring of that is critical. That rather than on-prem having servers and redundancies with S3, we get built in redundancy. We get built in life cycle management, high degree of confidence data reliability. And then we get all this innovation from companies like, from groups like Presto, companies like Ahana sitting on double that S3. And the last item I would say is in the cloud we are now able to offer multiple, have multiple resilient options on our side. So for example, with us, we still have some premium searching going on with solutions like Solr and Elasticsearch, then you have Presto and Ahana providing majority of our searching, but we still have Athena as a backup in case something goes down in the architecture. Our queries will spin back up to Athena, AWS service on Presto and customers will still get served. So all of these options, but what it doesn't cost us anything, Athena, if we don't use it, but all of these options are not available on-prem. So in my mind, I mean, it's a whole new world we are living in. It is a world where now we have made it possible for companies to even enterprises to even think about having true security data lakes, which are useful and having real-time analytics. From my perspective, I don't even sign up today for a large enterprise that wants to build a data lake on-prem because I know that is not, that is going to be a very difficult project to make it successful. So we've come a long way and there are several details around this that we've kind of endured through the process, but very excited where we are today. >> Well, we certainly follow up with theCUBE on all your your endeavors. Quickly on Ahana, why them, why their solution? In your words, what would be the advice you'd give me if I'm like, okay, I'm looking at this, why do I want to use it, and what's your experience? >> Right. So the standard SQL query engine for data lake analytics, more and more people have more data, want to have something that's based on open source, based on open formats, gives you that flexibility, pay as you go. You only pay for what you use. And so it proved to be the best option for Securonix to create a self-service system that has all the speed and performance and scalability that they need, which is based off of the innovation from the large companies like Facebook, Uber, Twitter. They've all invested heavily. We contribute to the open source project. It's a vibrant community. We encourage people to join the community and even Securonix, we'll be having engineers that are contributing to the project as well. I think, is that right Sachin? Maybe you could share a little bit about your thoughts on being part of the community. >> Yeah. So also why we chose Ahana, like John said. The first reason is you see Steven is always smiling. Okay. >> That's for sure. >> That is very important. I mean, jokes apart, you need a great partner. You need a great partner. You need a partner with a great attitude because this is not a sprint, this is a marathon. So the Ahana founders, Steven, the whole team, they're world-class, they're world-class. The depth that the CTO has, his experience, the depth that Dipti has, who's running the cloud solution. These guys are world-class. They are very involved in the community. We evaluated them from a community perspective. They are very involved. They have the depth of really commercializing an open source solution without making it too commercial. The right balance, where the founding companies like Facebook and Uber, and hopefully Securonix in the future as we contribute more and more will have our say and they act like the right stewards in this journey and then contribute as well. So and then they have chosen the right niche rather than taking portions of the product and making it proprietary. They have put in the effort towards the cloud infrastructure of making that product available easily on the cloud. So I think it's sort of a no-brainer from our side. Once we chose Presto, Ahana was the no-brainer and just the partnership so far has been very exciting and I'm looking forward to great things together. >> Likewise Sachin, thanks so much for that. And we've only found your team, you're world-class as well, and working together and we look forward to working in the community also in the Presto foundation. So thanks for that. >> Guys, great partnership. Great insight and really, this is a great example of cloud scale, cloud value proposition as it unlocks new benefits. Open source, managed services, refactoring the opportunities to create more value. Stephen, Sachin, thank you so much for sharing your story here on open data lakes. Can open always wins in my mind. This is theCUBE we're always open and we're showcasing all the hot startups coming out of the AWS ecosystem for the AWS Startup Showcase. I'm John Furrier, your host. Thanks for watching. (bright music)

Published Date : Jun 24 2021

SUMMARY :

leaders all around the world, of the AWS Startup Showcase. to help us through this, and provide all the what's going on with you guys, in the cloud and making it easy to use. Let's get into the Securonix So in the past, what was So in any event, Securonix on the cloud Some are saying that the and that's the S3-based data in the Linux foundation or open meaning And Presto is the layer in because I get the open data layer. and all the other functions that piece is killer. and learn from as the new architecture for everyone else in the future. obviously the cloud killed it. and the bar is much, much lower, But the value is refactoring in the cloud. So we have been in business and again, Hadoop is the foundation, be the advice you'd give me system that has all the speed The first reason is you see and just the partnership so in the community also in for the AWS Startup Showcase.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
StevenPERSON

0.99+

SachinPERSON

0.99+

JohnPERSON

0.99+

StevePERSON

0.99+

SecuronixORGANIZATION

0.99+

AWSORGANIZATION

0.99+

John FurrierPERSON

0.99+

Steven MihPERSON

0.99+

50QUANTITY

0.99+

UberORGANIZATION

0.99+

2010DATE

0.99+

StephenPERSON

0.99+

Sachin NayyarPERSON

0.99+

FacebookORGANIZATION

0.99+

20 timesQUANTITY

0.99+

oneQUANTITY

0.99+

12 monthsQUANTITY

0.99+

threeQUANTITY

0.99+

TwitterORGANIZATION

0.99+

AhanaPERSON

0.99+

two customersQUANTITY

0.99+

90 daysQUANTITY

0.99+

AhanaORGANIZATION

0.99+

Palo AltoLOCATION

0.99+

100QUANTITY

0.99+

30 minutesQUANTITY

0.99+

PrestoORGANIZATION

0.99+

hundreds of terabytesQUANTITY

0.99+

fiveQUANTITY

0.99+

FirstQUANTITY

0.99+

OneQUANTITY

0.99+

twoQUANTITY

0.99+

hundredsQUANTITY

0.99+

six monthsQUANTITY

0.99+

S3TITLE

0.99+

ZookeeperTITLE

0.99+

Matt Maccaux, HPE | HPE Discover 2021


 

(bright music) >> Data by its very nature is distributed and siloed, but most data architectures today are highly centralized. Organizations are increasingly challenged to organize and manage data, and turn that data into insights. This idea of a single monolithic platform for data, it's giving way to new thinking. Where a decentralized approach, with open cloud native principles and federated governance, will become an underpinning of digital transformations. Hi everybody. This is Dave Volante. Welcome back to HPE Discover 2021, the virtual version. You're watching theCube's continuous coverage of the event and we're here with Matt Maccaux, who's a field CTO for Ezmeral Software at HPE. We're going to talk about HPE software strategy, and Ezmeral and specifically how to take AI analytics to scale and ensure the productivity of data teams. Matt, welcome to theCube. Good to see you. >> Good to see you again, Dave. Thanks for having me today. >> You're welcome. So talk a little bit about your role as a CTO. Where do you spend your time? >> I spend about half of my time talking to customers and partners about where they are on their digital transformation journeys and where they struggle with this sort of last phase where we start talking about bringing those cloud principles and practices into the data world. How do I take those data warehouses, those data lakes, those distributed data systems, into the enterprise and deploy them in a cloud-like manner? Then the other half of my time is working with our product teams to feed that information back, so that we can continually innovate to the next generation of our software platform. >> So when I remember, I've been following HP and HPE, for a long, long time, theCube has documented, we go back to sort of when the company was breaking in two parts, and at the time a lot of people were saying, "Oh, HP is getting rid of their software business, they're getting out of software." I said, "No, no, no, hold on. They're really focusing", and the whole focus around hybrid cloud and now as a service, you've really retooling that business and sharpened your focus. So tell us more about Ezmeral, it's a cool name, but what exactly is Ezmeral software? >> I get this question all the time. So what is Ezmeral? Ezmeral is a software platform for modern data and analytics workloads, using open source software components. We came from some inorganic growth. We acquired a company called Cytec, that brought us a zero trust approach to doing security with containers. We bought BlueData who came to us with an orchestrator before Kubernetes even existed in mainstream. They were orchestrating workloads using containers for some of these more difficult workloads. Clustered applications, distributed applications like Hadoop. Then finally we acquired MapR, which gave us this scale out distributed file system and additional analytical capabilities. What we've done is we've taken those components and we've also gone out into the marketplace to see what open source projects exist to allow us to bring those cloud principles and practices to these types of workloads, so that we can take things like Hadoop, and Spark, and Presto, and deploy and orchestrate them using open source Kubernetes. Leveraging GPU's, while providing that zero trust approach to security, that's what Ezmeral is all about is taking those cloud practices and principles, but without locking you in. Again, using those open source components where they exist, and then committing and contributing back to the opensource community where those projects don't exist. >> You know, it's interesting, thank you for that history, and when I go back, I have been there since the early days of Big Data and Hadoop and so forth and MapR always had the best product, but they couldn't get it out. Back then it was like kumbaya, open source, and they had this kind of proprietary system but it worked and that's why it was the best product. So at the same time they participated in open source projects because everybody did, that's where the innovation is going. So you're making that really hard to use stuff easier to use with Kubernetes orchestration, and then obviously, I'm presuming with the open source chops, sort of leaning into the big trends that you're seeing in the marketplace. So my question is, what are those big trends that you're seeing when you speak to technology executives which is a big part of what you do? >> So the trends are, I think, are a couplefold, and it's funny about Hadoop, but I think the final nails in the coffin have been hammered in with the Hadoop space now. So that leading trend, of where organizations are going, we're seeing organizations wanting to go cloud first. But they really struggle with these data-intensive workloads. Do I have to store my data in every cloud? Am I going to pay egress in every cloud? Well, what if my data scientists are most comfortable in AWS, but my data analysts are more comfortable in Azure, how do I provide that multi-cloud experience for these data workloads? That's the number one question I get asked, and that's probably the biggest struggle for these chief data officers, chief digital officers, is how do I allow that innovation but maintaining control over my data compliance especially when we talk international standards, like GDPR, to restrict access to data, the ability to be forgotten, in these multinational organizations how do I sort of square all of those components? Then how do I do that in a way that just doesn't lock me into another appliance or software vendor stack? I want to be able to work within the confines of the ecosystem, use the tools that are out there, but allow my organization to innovate in a very structured compliant way. >> I mean, I love this conversation and you just, to me, you hit on the key word, which is organization. I want to talk about what some of the barriers are. And again, you heard my wrap up front. I really do think that we've created, not only from a technology standpoint, and yes the tooling is important, but so is the organization, and as you said an analyst might want to work in one environment, a data scientist might want to work in another environment. The data may be very distributed. You might have situations where they're supporting the line of business. The line of business is trying to build new products, and if I have to go through this monolithic centralized organization, that's a barrier for me. And so we're seeing that change, that I kind of alluded to it up front, but what do you see as the big barriers that are blocking this vision from becoming a reality? >> It very much is organization, Dave. The technology's actually no longer the inhibitor here. We have enough technology, enough choices out there that technology is no longer the issue. It's the organization's willingness to embrace some of those technologies and put just the right level of control around accessing that data. Because if you don't allow your data scientists and data analysts to innovate, they're going to do one of two things. They're either going to leave, and then you have a huge problem keeping up with your competitors, or they're going to do it anyway. And they're going to do it in a way that probably doesn't comply with the organizational standards. So the more progressive enterprises that I speak with have realized that they need to allow these various analytical users to choose the tools they want, to self provision those as they need to and get access to data in a secure and compliant way. And that means we need to bring the cloud to generally where the data is because it's a heck of a lot easier than trying to bring the data where the cloud is, while conforming to those data principles, and that's HPE's strategy. You've heard it from our CEO for years now. Everything needs to be delivered as a service. It's Ezmeral Software that enables that capability, such as self-service and secure data provisioning, et cetera. >> Again, I love this conversation because if you go back to the early days of Hadoop, that was what was profound about a Hadoop. Bring five megabytes of code to a petabyte of data, and it didn't happen. We shoved it all into a data lake and it became a data swamp. And that's okay, it's a one dot oh, you know, maybe in data as is like data warehouses, data hubs, data lakes, maybe this is now a four dot oh, but we're getting there. But open source, one thing's for sure, it continues to gain momentum, it's where the innovation is. I wonder if you could comment on your thoughts on the role that open-source software plays for large enterprises, maybe some of the hurdles that are there, whether they're legal or licensing, or just fears, how important is open source software today? >> I think the cloud native developments, following the 12 factor applications, microservices based, paved the way over the last decade to make using open source technology tools and libraries mainstream. We have to tip our hats to Red Hat, right? For allowing organizations to embrace something so core as an operating system within the enterprise. But what everyone realized is that it's support that's what has to come with that. So we can allow our data scientists to use open source libraries, packages, and notebooks, but are we going to allow those to run in production? So if the answer is no, well? Then if we can't get support, we're not going to allow that. So where HPE Ezmeral is taking the lead here is, again, embracing those open source capabilities, but, if we deploy it, we're going to support it. Or we're going to work with the organization that has the committers to support it. You call HPE, the same phone number you've been calling for years for tier one 24 by seven support, and we will support your Kubernetes, your Spark your Presto, your Hadoop ecosystem of components. We're that throat to choke and we'll provide, all the way up to break/fix support, for some of these components and packages, giving these large enterprises the confidence to move forward with open source, but knowing that they have a trusted partner in which to do so. >> And that's why we've seen such success with say, for instance, managed services in the cloud, versus throwing out all the animals in the zoo and say, okay, figure it out yourself. But then, of course, what we saw, which was kind of ironic, was people finally said, "Hey, we can do this in the cloud more easily." So that's where you're seeing a lot of data land. However, the definition of cloud or the notion of cloud is changing. No longer is it just this remote set of services, "Somewhere out there in the cloud", some data center somewhere, no, it's moving to on-prem, on-prem is creating hybrid connections. You're seeing co-location facilities very proximate to the cloud. We're talking now about the edge, the near edge, and the far edge, deeply embedded. So that whole notion of cloud is changing. But I want to ask you, there's still a big push to cloud, everybody has a cloud first mantra, how do you see HPE competing in this new landscape? >> I think collaborating is probably a better word, although you could certainly argue if we're just leasing or renting hardware, then it would be competition, but I think again... The workload is going to flow to where the data exists. So if the data's being generated at the edge and being pumped into the cloud, then cloud is prod. That's the production system. If the data is generated via on-premises systems, then that's where it's going to be executed. That's production, and so HPE's approach is very much co-exist. It's a co-exist model of, if you need to do DevTests in the cloud and bring it back on-premises, fine, or vice versa. The key here is not locking our customers and our prospective clients into any sort of proprietary stack, as we were talking about earlier, giving people the flexibility to move those workloads to where the data exists, that is going to allow us to continue to get share of wallet, mind share, continue to deploy those workloads. And yes, there's going to competition that comes along. Do you run this on a GCP or do you run it on a GreenLake on-premises? Sure, we'll have those conversations, but again, if we're using open source software as the foundation for that, then actually where you run it is less relevant. >> So there's a lot of choices out there, when it comes to containers generally and Kubernetes specifically, and you may have answered this, you get the zero trust component, you've got the orchestrator, you've got the scale-out piece, but I'm interested in hearing in your words why an enterprise would or should consider Ezmeral instead of alternatives to Kubernetes solutions? >> It's a fair question, and it comes up in almost every conversation. "Oh, we already do Kubernetes, we have a Kubernetes standard", and that's largely true in most of the enterprises I speak to. They're using one of the many on-premises distributions to their cloud distributions, and they're all fine. They're all fine for what they were built for. Ezmeral was generally built for something a little different. Yes, everybody can run microservices based applications, DevOps based workloads, but where Ezmeral is different is for those data intensive, in clustered applications. Those sorts of applications require a certain degree of network awareness, persistent storage, et cetera, which requires either a significant amount of intelligence. Either you have to write in Golang, or you have to write your own operators, or Ezmeral can be that easy button. We deploy those stateful applications, because we bring a persistent storage layer, that came from MapR. We're really good at deploying those stateful clustered applications, and, in fact, we've opened sourced that as a project, KubeDirector, that came from BlueData, and we're really good at securing these, using SPIFFE and SPIRE, to ensure that there's that zero trust approach, that came from Scytale, and we've wrapped all of that in Kubernetes. So now you can take the most difficult, gnarly complex data intensive applications in your enterprise and deploy them using open source. And if that means we have to co-exist with an existing Kubernetes distribution, that's fine. That's actually the most common scenario that I walk into is, I start asking about, "What about these other applications you haven't done yet?" The answer is usually, "We haven't gotten to them yet", or "We're thinking about it", and that's when we talk about the capabilities of Ezmeral and I usually get the response, "Oh. A, we didn't know you existed and B well, let's talk about how exactly you do that." So again, it's more of a co-exist model rather than a compete with model, Dave. >> Well, that makes sense. I mean, I think again, a lot of people, they go, "Oh yeah, Kubernetes, no big deal. It's everywhere." But you're talking about a solution, kind of taking a platform approach with capabilities. You got to protect the data. A lot of times, these microservices aren't so micro and things are happening really fast. You've got to be secure. You got to be protected. And like you said, you've got a single phone number. You know, people say one throat to choke. Somebody in the media the other day said, "No, no. Single hand to shake." It's more of a partnership. I think that's apropos for HPE, Matt, with your heritage. >> That one's better. >> So, you know, thinking about this whole, we've gone through the pre big data days and the big data was all the hot buzzword. People don't maybe necessarily use that term anymore, although the data is bigger and getting bigger, which is kind of ironic. Where do you see this whole space going? We've talked about that sort of trend toward breaking down the silos, decentralization, maybe these hyper specialized roles that we've created, maybe getting more embedded or aligned with the line of business. How do you see... It feels like the next 10 years are going to be different than the last 10 years. How do you see it, Matt? >> I completely agree. I think we are entering this next era, and I don't know if it's well-defined. I don't know if I would go out on an edge to say exactly what the trend is going to be. But as you said earlier, data lakes really turned into data swamps. We ended up with lots of them in the enterprise, and enterprises had to allow that to happen. They had to let each business unit or each group of users collect the data that they needed and IT sort of had to deal with that down the road. I think that the more progressive organizations are leading the way. They are, again, taking those lessons from cloud and application developments, microservices, and they're allowing a freedom of choice. They're allowing data to move, to where those applications are, and I think this decentralized approach is really going to be king. You're going to see traditional software packages. You're going to see open source. You're going to see a mix of those, but what I think will probably be common throughout all of that is there's going to be this sense of automation, this sense that, we can't just build an algorithm once, release it and then wish it luck. That we've got to treat these analytics, and these data systems, as living things. That there's life cycles that we have to support. Which means we need to have DevOps for our data science. We need a CI/CD for our data analytics. We need to provide engineering at scale, like we do for software engineering. That's going to require automation, and an organizational thinking process, to allow that to actually occur. I think all of those things. The sort of people, process, products. It's all three of those things that are going to have to come into play, but stealing those best ideas from cloud and application developments, I think we're going to end up with probably something new over the next decade or so. >> Again, I'm loving this conversation, so I'm going to stick with it for a sec. It's hard to predict, but some takeaways that I have, Matt, from our conversation, I wonder if you could comment? I think the future is more open source. You mentioned automation, Devs are going to be key. I think governance as code, security designed in at the point of code creation, is going to be critical. It's no longer going be a bolt on. I don't think we're going to throw away the data warehouse or the data hubs or the data lakes. I think they become a node. I like this idea, I don't know if you know Zhamak Dehghani? but she has this idea of a global data mesh where these tools, lakes, whatever, they're a node on the mesh. They're discoverable. They're shareable. They're governed in a way. I think the mistake a lot of people made early on in the big data movement is, "Oh, we got data. We have to monetize our data." As opposed to thinking about what products can I build that are based on data that then can lead to monetization? I think the other thing I would say is the business has gotten way too technical. (Dave chuckles) It's alienated a lot of the business lines. I think we're seeing that change, and I think things like Ezmeral that simplify that, are critical. So I'll give you the final thoughts, based on my rant. >> No, your rant is spot on Dave. I think we are in agreement about a lot of things. Governance is absolutely key. If you don't know where your data is, what it's used for, and can apply policies to it. It doesn't matter what technology you throw at it, you're going to end up in the same state that you're essentially in today, with lots of swamps. I did like that concept of a node or a data mesh. It kind of goes back to the similar thing with a service mesh, or a set of APIs that you can use. I think we're going to have something similar with data. The trick is always, how heavy is it? How easy is it to move about? I think there's always going to be that latency issue, maybe not within the data center, but across the WAN. Latency is still going to be key, which means we need to have really good processes to be able to move data around. As you said, govern it. Determine who has access to what, when, and under what conditions, and then allow it to be free. Allow people to bring their choice of tools, provision them how they need to, while providing that audit, compliance and control. And then again, as you need to provision data across those nodes for those use cases, do so in a well measured and governed way. I think that's sort of where things are going. But we keep using that term governance, I think that's so key, and there's nothing better than using open source software because that provides traceability, auditability and this, frankly, openness that allows you to say, "I don't like where this project's going. I want to go in a different direction." And it gives those enterprises a control over these platforms that they've never had before. >> Matt, thanks so much for the discussion. I really enjoyed it. Awesome perspectives. >> Well thank you for having me, Dave. Excellent conversation as always. Thanks for having me again. >> You're very welcome. And thank you for watching everybody. This is theCube's continuous coverage of HPE Discover 2021. Of course, the virtual version. Next year, we're going to be back live. My name is Dave Volante. Keep it right there. (upbeat music)

Published Date : Jun 22 2021

SUMMARY :

and ensure the productivity of data teams. Good to see you again, Dave. Where do you spend your time? and practices into the data world. and at the time a lot and practices to these types of workloads, and MapR always had the best product, the ability to be forgotten, and if I have to go through this the cloud to generally where it continues to gain momentum, the committers to support it. of cloud or the notion that is going to allow us in most of the enterprises I speak to. You got to be protected. and the big data was all the hot buzzword. of that is there's going to so I'm going to stick with it for a sec. and then allow it to be free. for the discussion. Well thank you for having me, Dave. Of course, the virtual version.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
DavePERSON

0.99+

Matt MaccauxPERSON

0.99+

MattPERSON

0.99+

Dave VolantePERSON

0.99+

HPORGANIZATION

0.99+

CytecORGANIZATION

0.99+

Next yearDATE

0.99+

two partsQUANTITY

0.99+

AWSORGANIZATION

0.99+

Zhamak DehghaniPERSON

0.99+

HPEORGANIZATION

0.99+

BlueDataORGANIZATION

0.99+

todayDATE

0.99+

HadoopTITLE

0.99+

12 factorQUANTITY

0.99+

each business unitQUANTITY

0.99+

GDPRTITLE

0.98+

GolangTITLE

0.98+

each groupQUANTITY

0.98+

EzmeralORGANIZATION

0.97+

threeQUANTITY

0.97+

zero trustQUANTITY

0.97+

single phone numberQUANTITY

0.96+

EzmeralPERSON

0.96+

singleQUANTITY

0.96+

oneQUANTITY

0.96+

sevenQUANTITY

0.95+

kumbayaORGANIZATION

0.95+

one thingQUANTITY

0.93+

Big DataTITLE

0.91+

two thingsQUANTITY

0.9+

theCubeORGANIZATION

0.9+

next 10 yearsDATE

0.89+

four dotQUANTITY

0.89+

first mantraQUANTITY

0.89+

last 10 yearsDATE

0.88+

Ezmeral SoftwareORGANIZATION

0.88+

one environmentQUANTITY

0.88+

MapRORGANIZATION

0.87+

ScytaleORGANIZATION

0.87+

next decadeDATE

0.86+

firstQUANTITY

0.86+

KubernetesTITLE

0.86+

SPIFFETITLE

0.84+

SPIRETITLE

0.83+

tier oneQUANTITY

0.82+

SparkTITLE

0.8+

five megabytes of codeQUANTITY

0.77+

KubeDirectorORGANIZATION

0.75+

one questionQUANTITY

0.74+

Single handQUANTITY

0.74+

yearsQUANTITY

0.73+

last decadeDATE

0.73+

2021DATE

0.73+

AzureTITLE

0.7+

Matt Maccaux


 

>>data by its very nature is distributed and siloed. But most data architectures today are highly centralized. Organizations are increasingly challenged to organize and manage data and turn that data into insights this idea of a single monolithic platform for data, it's giving way to new thinking. We're a decentralized approach with open cloud native principles and Federated governance will become an underpinning underpinning of digital transformations. Hi everybody, this is Day Volonte. Welcome back to HP discover 2021 the virtual version. You're watching the cubes continuous coverage of the event and we're here with Matt Mako is the field C T O for Israel software at H P E. And we're gonna talk about HP software strategy and esmeralda and specifically how to take a I analytics to scale and ensure the productivity of data teams. Matt, welcome to the cube. Good to see you. >>Good to see you again. Dave thanks for having me today. >>You're welcome. So talk a little bit about your role as CTO. Where do you spend your time? >>Yeah. So I spend about half of my time talking to customers and partners about where they are on their digital transformation journeys and where they struggle with this sort of last phase where we start talking about bringing those cloud principles and practices into the data world. How do I take those data warehouses, those data lakes, those distributed data systems into the enterprise and deploy them in a cloud like manner. And then the other half of my time is working with our product teams to feed that information back so that we can continually innovate to the next generation of our software platform. >>So when I remember I've been following HP and HP for a long, long time, the cube is documented. We go back to sort of when the company was breaking in two parts and at the time a lot of people were saying, oh HP is getting rid of the software business to get out of software. I said no, no, no hold on, they're really focusing and and the whole focus around hybrid cloud and and now as a service and so you're really retooling that business and sharpen your focus. So so tell us more about asthma, it's cool name. But what exactly is as moral software, >>I get this question all the time. So what is Israel? Israel is a software platform for modern data and analytics workloads using open source software components. And we came from some inorganic growth. We acquired a company called citing that brought us a zero trust approach to doing security with containers. We bought blue data who came to us with an orchestrator before kubernetes even existed in mainstream. They were orchestrating workloads using containers for some of these more difficult workloads, clustered applications, distributed applications like Hadoop. And then finally we acquired Map are which gave us this scale out, distributed file system and additional analytical capabilities. And so what we've done is we've taken those components and we've also gone out into the marketplace to see what open source projects exist, to allow us to bring those club principles and practices to these types of workloads so that we can take things like Hadoop and spark and Presto and deploy and orchestrate them using open source kubernetes, leveraging Gpu s while providing that zero trust approaches security. That's what Israel is all about. Is taking those cloud practices and principles but without locking you in again using those open source components where they exist and then committing and contributing back to the open source community where those projects don't exist. >>You know, it's interesting. Thank you for that history. And when I go back, I always been there since the early days of big data and Hadoop and so forth. The map are always had the best product. But but they can't get back then. It was like Kumbaya open source and they had this kind of proprietary system, but it worked and that's why it was the best product. And so at the same time they participated in open source projects because everybody that that's where the innovation is going. So you're making that really hard to use stuff easier to use with kubernetes orchestration. And then obviously I'm presuming with the open source chops, sort of leaning into the big trends that you're seeing in the marketplace. So my question is, what are those big trends that you're seeing when you speak to technology executives, which is a big part of what you do? >>Yeah. So the trends I think are a couple of fold and it's funny about Duke, I think the final nails in the coffin have been hammered in with the Hadoop space now. And so that that leading trend of of where organizations are going. We're seeing organizations wanting to go cloud first, but they really struggle with these data intensive workloads. Do I have to store my data in every cloud? Am I going to pay egress in every cloud? Well, what if my data scientists are most comfortable in AWS? But my data analysts are more comfortable in Azure. How do I provide that multi cloud experience for these data workloads? That's the number one question I get asked. And that's the probably the biggest struggle for these Chief Data Officers. Chief Digital Officer XYZ. How do I allow that innovation but maintaining control over my data compliance especially, we talk international standards like G. D. P. R. To restrict access to data, the ability to be forgotten in these multinational organizations. How do I sort of square all of those components and then how do I do that in a way that just doesn't lock me into another appliance or software vendors stack? I want to be able to work within the confines of the ecosystem. Use the tools that are out there but allow my organization to innovate in a very structured, compliant way. >>I mean I love this conversation. And just to me you hit on the key word which is organization. I want to I want to talk about what some of the barriers are. And again, you heard my wrap up front. I I really do think that we've created not only from a technology standpoint and yes, the tooling is important, but so is the organization. And as you said, you know, an analyst might want to work in one environment, a data scientist might want to work in another environment. The data may be very distributed. They maybe you might have situations where they're supporting the line of business. The line of business is trying to build new products. And if I have to go through this, hi this monolithic centralized organization, that's a barrier uh for me. And so we're seeing that change that kind of alluded to it upfront. But what do you see as the big, you know, barriers that are blocking this vision from becoming a reality? >>It very much is organization dave it's the technology is actually no longer the inhibitor here. We have enough technology, enough choices out there. That technology is no longer the issue. It's the organization's willingness to embrace some of those technologies and put just the right level of control around accessing that data because if you don't allow your data scientists and data analysts to innovate, they're going to do one of two things, they're either going to leave and then you have a huge problem keeping up with your competitors or they're gonna do it anyway, and they're gonna do it in a way that probably doesn't comply with the organizational standards. So the more progressive enterprises that I speak with have realized that they need to allow these various analytical users to choose the tools, they want to self provision those as they need to and get access to data in a secure and compliant way. And that means we need to bring the cloud to generally where the data is because it's a heck of a lot easier than trying to bring the data where the cloud is while conforming to those data principles. And that's, that's Hve strategy, you've heard it from our CEO for years now, everything needs to be delivered as a service. It's essential software that enables that capability, such as self service and secure data provisioning, etcetera. >>Again, I love this conversation because if you go back to the early days of the Duke, that was what was profound about. Do bring bring five megabytes of code, do a petabyte of data and it didn't happen. We shoved it all into a data lake and it became a data swamp. And so it's okay, you know, and that's okay. It's a one dato maybe maybe in data is is like data warehouses, data hubs data lake. So maybe this is now a four dot Oh, but we're getting there. Uh, so an open but open source one thing's for sure. It continues to gain momentum. It's where the innovation is. I wonder if you could comment on your thoughts on the role that open source software plays for large enterprises. Maybe some of the hurdles that are there, whether they're legal or licensing or or or just fears. How important is open source software today? >>I think the cloud native development, you know, following the 12 factor applications microservices based, pave the way over the last decade to make using open source technology tools and libraries mainstream, we have to tip our hats to red hat right for allowing organizations to embrace something. So core is an operating system within the enterprise. But what everyone realizes that its support, that's what has to come with that. So we can allow our data scientists to use open source libraries, packages and notebooks. But are we going to allow those to run in production? And so if the answer is no, then that if we can't get support, we're not going to allow that. So where HP es Merrill is taking the lead here is again embracing those open source capabilities, but if we deploy it, we're going to support it or we're going to work with the organization that has the committees to support it. You call HPD the same phone number you've been calling for years for tier 1 24 by seven support and we will support your kubernetes, your spark your presto your Hadoop ecosystem of components were that throat to choke and we'll provide all the way up to break fix support for some of these components and packages giving these large enterprises the confidence to move forward with open source but knowing that they have a trusted partner in which to do so >>and that's why we've seen such success with, say, for instance, managed services in the cloud or versus throwing out all the animals in the zoo and say, okay, figure it out yourself. But of course what we saw, which was kind of ironic was we, we saw people finally said, hey, we can do this in the cloud more easily. So that's where you're seeing a lot of data. A land. However, the definition of cloud or the notion of cloud is changing no longer. Is it just this remote set of services somewhere out there? In the cloud? Some data center somewhere. No, it's, it's moving on. Prem on prem is creating hybrid connections you're seeing, you know, co location facility is very proximate to the cloud. We're talking now about the edge, the near edge and the far edge deeply embedded, you know? And so that whole notion of cloud is, is changing. But I want to ask you, there's still a big push to cloud, everybody is a cloud first mantra. How do you see HP competing in this new landscape? >>I I think collaborating is probably a better word, although you could certainly argue if we're just leasing or renting hardware than it would be competition. But I think again, the workload is going to flow to where the data exists. So if the data is being generated at the edge and being pumped into the cloud, then cloud is prod, that's the production system. If the data is generated, the on system on premises systems, then that's where it's going to be executed, that's production. And so HBs approach is very much coexist, coexist model of if you need to do deaf tests in the cloud and bring it back on premises, fine or vice versa. The key here is not locking our customers and our prospective clients into any sort of proprietary stack, as we were talking about earlier, giving people the flexibility to move those workloads to where the data exists. That is going to allow us to continue to get share of wallet. Mindshare, continue to deploy those workloads and yes, there's going to be competition that comes along. Do you run this on a G C P or do you run it on a green lake on premises? Sure. We'll have those conversations. But again, if we're using open source software as the foundation for that, then actually where you run it is less relevant. >>So a lot of, there's a lot of choices out there when it comes to containers generally and kubernetes specifically, uh, you may have answered this, you get zero trust component, you've got the orchestrator, you've got the, the scale out, you know, peace. But I'm interested in hearing in your words why an enterprise would or should consider s morale instead of alternatives to kubernetes solutions? >>It's a fair question. And it comes up in almost every conversation. We already do kubernetes, so we have a kubernetes standard and that's largely true. And most of the enterprises I speak to their using one of the many on premises distributions of the cloud distributions and they're all fine. They're all fine for what they were built for. Israel was generally built for something a little different. Yes, everybody can run microservices based applications, devoPS based workloads, but where is Meryl is different is for those data intensive and clustered applications. Those sort of applications require a certain degree of network awareness, persistent storage etcetera, which requires either a significant amount of intelligence. Either you have to write in go lang or you have to write your own operators or Israel can be that easy button. We deploy those state full applications because we bring a persistent storage later that came from that bar we're really good at deploying those stable clustered applications and in fact we've open sourced that as a project cube director that came from Blue data and we're really good at securing these using spiffy inspire to ensure that there is that zero trust approach that came from side tail and we've wrapped all of that in kubernetes so now you can take the most difficult, gnarly, complex data intensive applications in your enterprise and deploy them using open source and if that means we have to coexist with an existing kubernetes distribution, that's fine. That's actually the most common scenario that I walk into is I start asking about what about these other applications you haven't done yet? The answer is usually we haven't gotten to him yet or we're thinking about it and that's when we talk about the capabilities of s role and I usually get the response, oh, a we didn't know you existed and be, well, let's talk about how exactly you do that. So again, it's more of a coexist model rather than a compete with model. Dave >>Well, that makes sense. I mean, I think again, a lot of people think, oh yeah, Kubernetes, no big deal, it's everywhere. But you're talking about a solution, I'm kind of taking a platform approach with capabilities, you've got to protect the data. A lot of times these microservices aren't some micro uh and things are happening really fast, You've got to be secure, you've got to be protected. And like you said, you've got a single phone number, you know, people say one throat to choke, Somebody said the other day said no, no single hand to shake, it's more of a partnership and I think that's a proposed for HPV met with your >>hair better. >>So you know, thinking about this whole, you know, we've gone through the pre big data days and the big data was all, you know, the hot buzz where people don't maybe necessarily use that term anymore, although the data is bigger and getting bigger, which is kind of ironic. Um where do you see this whole space going? We've talked about that sort of trends are breaking down the silos, decentralization. Maybe these hyper specialized roles that we've created maybe getting more embedded are lined with the line of business. How do you see it feels like the last, the next 10 years are going to be different than the last 10 years. How do you see it matt? >>I completely agree. I think we are entering this next era and I don't know if it's well defined, I don't know if I would go out on an edge to say exactly what the trend is going to be. But as you said earlier, data lakes really turned into data swamps. We ended up with lots of them in the enterprise and enterprises had to allow that to happen. They had to let each business unit or each group of users collect the data that they needed and I. T. Sort of had to deal with that down the road. And so I think the more progressive organizations are leading the way they are again taking those lessons from cloud and application developments, microservices and they're allowing a freedom of choice there, allowing data to move to where those applications are. And I think this decentralized approach is really going to be king. And you're gonna see traditional software packages, you're gonna see open source, you're going to see a mix of those. But what I think we'll probably be common throughout all of that is there's going to be this sense of automation, this sense that we can't just build an algorithm once released and then wish it luck that we've got to treat these these analytics and these these data systems as living things that there's life cycles that we have to support, which means we need to have devops for our data science. We need a ci cd for our data analytics. We need to provide engineering at scale like we do for software engineering. That's going to require automation and an organizational thinking process to allow that to actually occur. And so I think all of those things that sort of people process product, but it's all three of those things are going to have to come into play. But stealing those best ideas from cloud and application development, I think we're going to end up with probably something new over the next decade or so >>again, I'm loving this conversation so I'm gonna stick with it for a second. I it's hard to predict, but I'll some takeaways that I have matt from our conversation. I wonder if you could, you could comment. I think, you know, the future is more open source. You mentioned automation deV's are going to be key. I think governance as code, security designed in at the point of code creation is going to be critical. It's not no longer to be a bolt on and I don't think we're gonna throw away the data warehouse or the data hubs or the data lakes. I think they become a node. I like this idea and you know, jim octagon. But she has this idea of a global data mesh where these tools lakes, whatever their their node on the mesh, they're discoverable. They're shareable. They're they're governed uh in a way and that really I think the mistake a lot of people made early on in the big data movement, Oh we have data, we have to monetize our data as opposed to thinking about what products that I can I build that are based on data that then I can, you know, can lead to monetization. And I think and I think the other thing I would say is the business has gotten way too technical. All right. It's an alienated a lot of the business lines and I think we're seeing that change. Um and I think, you know, things like Edinburgh that simplify that are critical. So I'll give you the final thoughts based on my rent. >>I know you're ready to spot on. Dave. I think we we were in agreement about a lot of things. Governance is absolutely key. If you don't know where your data is, what it's used for and can apply policies to it, it doesn't matter what technology throw at it, you're going to end up in the same state that you're essentially in today with lots of swamps. Uh I did like that concept of of a note or a data mesh. It kind of goes back to the similar thing with a service smashed or a set of a P I is that you can use. I think we're going to have something similar with data that the trick is always how heavy is it? How easy is it to move about? And so I think there's always gonna be that latency issue. Maybe not within the data center, but across the land, latency is still going to be key, which means we need to have really good processes to be able to move data around. As you said, government determine who has access to what, when and under what conditions and then allow it to be free, allow people to bring their choice of tools, provision them how they need to while providing that audit compliance and control. And then again, as as you need to provision data across those notes for those use cases do so in a well measured and govern way. I think that's sort of where things are going. But we keep using that term governance. I think that's so key. And there's nothing better than using open source software because that provides traceability, the audit ability and this frankly openness that allows you to say, I don't like where this project is going. I want to go in a different direction and it gives those enterprises that control over these platforms that they've never had before. >>Matt. Thanks so much for the discussion. I really enjoyed it. Awesome perspectives. >>Well, thank you for having me. Dave are excellent conversation as always. Uh, thanks for having me again. >>All right. You're very welcome. And thank you for watching everybody. This is the cubes continuous coverage of HP discover 2021 of course, the virtual version next year. We're gonna be back live. My name is Dave a lot. Keep it right there. >>Yeah.

Published Date : Jun 2 2021

SUMMARY :

how to take a I analytics to scale and ensure the productivity of data Good to see you again. Where do you spend your time? innovate to the next generation of our software platform. We go back to sort of when the company was breaking in two parts and at the time gone out into the marketplace to see what open source projects exist, to allow us to bring those club that really hard to use stuff easier to use with kubernetes orchestration. the ability to be forgotten in these multinational organizations. And just to me you hit on the key word which is organization. they're either going to leave and then you have a huge problem keeping up with your competitors or they're gonna do it anyway, Again, I love this conversation because if you go back to the early days of the Duke, that was what was profound about. I think the cloud native development, you know, following the 12 factor How do you see HP competing in this new landscape? I I think collaborating is probably a better word, although you could certainly argue if we're just leasing or the scale out, you know, peace. And most of the enterprises I speak to their using And like you said, So you know, thinking about this whole, and I. T. Sort of had to deal with that down the road. I like this idea and you know, jim octagon. but across the land, latency is still going to be key, which means we need to have really good I really enjoyed it. Well, thank you for having me. And thank you for watching everybody.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
DavePERSON

0.99+

HPORGANIZATION

0.99+

Matt MaccauxPERSON

0.99+

MattPERSON

0.99+

HPDORGANIZATION

0.99+

Matt MakoPERSON

0.99+

two partsQUANTITY

0.99+

next yearDATE

0.99+

AWSORGANIZATION

0.99+

12 factorQUANTITY

0.99+

todayDATE

0.99+

each groupQUANTITY

0.99+

2021DATE

0.99+

threeQUANTITY

0.99+

each business unitQUANTITY

0.98+

MindshareORGANIZATION

0.98+

HadoopTITLE

0.98+

jim octagonPERSON

0.98+

singleQUANTITY

0.97+

single phone numberQUANTITY

0.96+

HP es MerrillORGANIZATION

0.96+

IsraelLOCATION

0.96+

next decadeDATE

0.95+

oneQUANTITY

0.95+

two thingsQUANTITY

0.95+

H P E.ORGANIZATION

0.93+

one environmentQUANTITY

0.92+

sevenQUANTITY

0.91+

firstQUANTITY

0.91+

first mantraQUANTITY

0.9+

zero trustQUANTITY

0.89+

next 10 yearsDATE

0.89+

last 10 yearsDATE

0.87+

sparkORGANIZATION

0.87+

EdinburghORGANIZATION

0.86+

single handQUANTITY

0.84+

Blue dataORGANIZATION

0.83+

yearsQUANTITY

0.82+

tier 1QUANTITY

0.82+

four dotQUANTITY

0.82+

zeroQUANTITY

0.8+

KubernetesORGANIZATION

0.78+

last decadeDATE

0.78+

five megabytes of codeQUANTITY

0.74+

XYZORGANIZATION

0.73+

PrestoORGANIZATION

0.72+

AzureTITLE

0.72+

MerylTITLE

0.7+

halfQUANTITY

0.7+

premORGANIZATION

0.67+

24OTHER

0.67+

about halfQUANTITY

0.63+

HveORGANIZATION

0.63+

KumbayaORGANIZATION

0.62+

VolontePERSON

0.61+

DukeORGANIZATION

0.61+

number one questionQUANTITY

0.61+

PremORGANIZATION

0.59+

esmeraldaPERSON

0.55+

secondQUANTITY

0.53+

IsraelORGANIZATION

0.52+

HPVOTHER

0.52+

spiffyORGANIZATION

0.51+

petabyteQUANTITY

0.5+

DayEVENT

0.49+

DukeLOCATION

0.45+

G.ORGANIZATION

0.36+

Ed Walsh, ChaosSearch | AWS re:Invent 2020 Partner Network Day


 

>> Narrator: From around the globe it's theCUBE, with digital coverage of AWS re:Invent 2020. Special coverage sponsored by AWS Global Partner Network. >> Hello and welcome to theCUBE Virtual and our coverage of AWS re:Invent 2020 with special coverage of APN partner experience. We are theCUBE Virtual and I'm your host, Justin Warren. And today I'm joined by Ed Walsh, CEO of ChaosSearch. Ed, welcome to theCUBE. >> Well thank you for having me, I really appreciate it. >> Now, this is not your first time here on theCUBE. You're a regular here and I've loved it to have you back. >> I love the platform you guys are great. >> So let's start off by just reminding people about what ChaosSearch is and what do you do there? >> Sure, the best way to say is so ChaosSearch helps our clients know better. We don't do that by a special wizard or a widget that you give to your, you know, SecOp teams. What we do is a hard work to give you a data platform to get insights at scale. And we do that also by achieving the promise of data lakes. So what we have is a Chaos data platform, connects and indexes data in a customer's S3 or glacier accounts. So inside your data lake, not our data lake but renders that data fully searchable and available for analysis using your existing tools today 'cause what we do is index it and publish open API, it's like API like Elasticsearch API, and soon SQL. So give you an example. So based upon those capabilities were an ideal replacement for a commonly deployed, either Elasticsearch or ELK Stack deployments, if you're hitting scale issues. So we talk about scalable log analytics, and more and more people are hitting these scale issues. So let's say if you're using Elasticsearch ELK or Amazon Elasticsearch, and you're hitting scale issues, what I mean by that is like, you can't keep enough retention. You want longer retention, or it's getting very expensive to keep that retention, or because the scale you hit where you have availability, where the cluster is hard to keep up running or is crashing. That's what we mean by the issues at scale. And what we do is simply we allow you, because we're publishing the open API of Elasticsearch use all your tools, but we save you about 80% off your monthly bill. We also give you an, and it's an and statement and give you unlimited retention. And as much as you want to keep on S3 or into Glacier but we also take care of all the hassles and management and the time to manage these clusters, which ends up being on a database server called leucine. And we take care of that as a managed service. And probably the biggest thing is all of this without changing anything your end users are using. So we include Kibana, but imagine it's an Elastic API. So if you're using API or Kibana, it's just easy to use the exact same tools used today, but you get the benefits of a true data lake. In fact, we're running now Elasticsearch on top of S3 natively. If that makes it sense. >> Right and natively is pretty cool. And look, 80% savings, is a dramatic number, particularly this year. I think there's a lot of people who are looking to save a few quid. So it'd be very nice to be able to save up to 80%. I am curious as to how you're able to achieve that kind of saving though. >> Yeah, you won't be the first person to ask me that. So listen, Elastic came around, it was, you know we had Splunk and we also have a lot of Splunk clients, but Elastic was a more cost effective solution open source to go after it. But what happens is, especially at scale, if it's fall it's actually very cost-effective. But underneath last six tech ELK Stack is a leucine database, it's a database technology. And that sits on our servers that are heavy memory count CPU count in and SSDs. So you can do on-prem or even in the clouds, so if you do an Amazon, basically you're spinning up a server and it stays up, it doesn't spin up, spin down. So those clusters are not one server, it's a cluster of those servers. And typically if you have any scale you're actually having multiple clusters because you don't dare put it on one, for different use cases. So our savings are actually you no longer need those servers to spin up and you don't need to pay for those seen underneath. You can still use Kibana under API but literally it's $80 off your bill that you're paying for your service now, and it's hard dollars. So it's not... And we typically see clients between 70 and 80%. It's up to 80, but it's literally right within a 10% margin that you're saving a lot of money, but more importantly, saving money is a great thing. But now you have one unified data lake that you can have. You used to go across some of the data or all the data through the role-based access. You can give different people. Like we've seen people who say, hey give that, help that person 40 days of this data. But the SecOp up team gets to see across all the different law. You know, all the machine generated data they have. And we can give you a couple of examples of that and walk you through how people deploy if you want. >> I'm always keen to hear specific examples of how customers are doing things. And it's nice that you've thought of drawn that comparison there around what what cloud is good for and what it isn't is. I'll often like to say that AWS is cheap to fail in, but expensive to succeed. So when people are actually succeeding with this and using this, this broad amount of data so what you're saying there with that savings I've actually got access to a lot more data that I can do things with. So yeah, if you could walk through a couple of examples of what people are doing with this increased amount of data that they have access to in EKL Search, what are some of the things that people are now able to unlock with that data? >> Well, literally it's always good for a customer size so we can go through and we go through it however it might want, Kleiner, Blackboard, Alert Logic, Armor Security, HubSpot. Maybe I'll start with HubSpot. One of our good clients, they were doing some Cloud Flare data that was one of their clusters they were using a lot to search for. But they were looking at to look at a denial service. And they were, we find everyone kind of at scale, they get limited. So they were down to five days retention. Why? Well, it's not that they meant to but basically they couldn't cost-effectively handle that in the scale. And also they're having scale issues with the environment, how they set the cluster and sharding. And when they also denial service tech, what happened that's when the influx of data that is one thing about scale is how fast it comes out, yet another one is how much data you have. But this is as the data was coming after them at denial service, that's when the cluster would actually go down believe it or not, you know right. When you need your log analysis tools. So what we did is because they're just using Kibana, it was easy swap. They ran in parallel because we published the open API but we took them from five days to nine days. They could keep as much as they want but nine days for denial services is what they wanted. And then we did save them in over $4 million a year in hard dollars, What they're paying in their environment from really is the savings on the server farm and a little bit on the Elasticsearch Stack. But more importantly, they had no outages since. Now here's the thing. Are you talking about the use case? They also had other clusters and you find everyone does it. They don't dare put it on one cluster, even though these are not one server, they're multiple servers. So the next use case for CloudFlare was one, the next QS and it was a 10 terabyte a day influx kept it for 90 days. So it's about a petabyte. They brought another use case on which was NetMon, again, Network Monitoring. And again, I'm having the same scale issue, retention area. And what they're able to do is easily roll that on. So that's one data platform. Now they're adding the next one. They have about four different use cases and it's just different clusters able to bring together. But now what they're able to do give you use cases either they getting more cost effective, more stability and freedom. We say saves you a lot of time, cost and complexity. Just the time they manage that get the data in the complexities around it. And then the cost is easy to kind of quantify but they've got better but more importantly now for particular teams they only need their access to one data but the SecOP team wants to see across all the data. And it's very easy for them to see across all the data where before it was impossible to do. So now they have multiple large use cases streaming at them. And what I love about that particular case is at one point they were just trying to test our scale. So they started tossing more things at it, right. To see if they could kind of break us. So they spiked us up to 30 terabytes a day which is for Elastic would even 10 terabytes a day makes things fall over. Now, if you think of what they just did, what were doing is literally three steps, put your data in S3 and as fast as you can, don't modify, just put it there. Once it's there three steps connect to us, you give us readability access to those buckets and a place to write the indexy. All of that stuff is in your S3, it never comes out. And then basically you set up, do you want to do live or do you want to do real time analysis? Or do you want to go after old data? We do the rest, we ingest, we normalize the schema. And basically we give you our back and the refinery to give the right people access. So what they did is they basically throw a whole bunch of stuff at it. They were trying to outrun S3. So, you know, we're on shoulders of giants. You know, if you think about our platform for clients what's a better dental like than S3. You're not going to get a better cross curve, right? You're not going to get a better parallelism. And so, or security it's in your, you know a virtual environment. But if you... And also you can keep data in the right location. So Blackboard's a good example. They need to keep that in all the different regions and because it's personal data, they, you know, GDPR they got to keep data in that location. It's easy, we just put compute in each one of the different areas they are. But the net net is if you think that architecture is shoulders of giants if you think you can outrun by just sheer volume or you can put in more cost-effective place to keep long-term or you think you can out store you have so much data that S3 and glacier can't possibly do it. Then you got me at your bigger scale at me but that's the scale we'r&e talking about. So if you think about the spiked our throughput what they really did is they try to outrun S3. And we didn't pick up. Now, the next thing is they tossed a bunch of users at us which were just spinning up in our data fabric different ways to do the indexing, to keep up with it. And new use cases in case they're going after everyone gets their own worker nodes which are all expected to fail in place. So again, they did some of that but really they're like you guys handled all the influx. And if you think about it, it's the shoulders of giants being on top of an Amazon platform, which is amazing. You're not going to get a more cost effective data lake in the world, and it's continuing to fall in price. And it's a cost curve, like no other, but also all that resiliency, all that security and the parallelism you can get, out of an S3 Glacier is just a bar none is the most scalable environment, you can build an environment. And what we do is a thin layer. It's a data platform that allows you to have your data now fully searchable and queryable using your tools >> Right and you, you mentioned there that, I mean you're running in AWS, which has broad experience in doing these sorts of things at scale but on that operational management side of things. As you mentioned, you actually take that off, off the hands of customers so that you run it on their behalf. What are some of the areas that you see people making in trying to do this themselves, when you've gone into customers, and brought it into the EKL Search platform? >> Yeah, so either people are just trying their best to build out clusters of Elasticsearch or they're going to services like Logz.io, Sumo Logic or Amazon Elasticsearch services. And those are all basically on the same ELK Stack. So they have the exact same limits as the same bits. Then we see people trying to say, well I really want to go to a data lake. I want to get away from these database servers and which have their limits. I want to use a data Lake. And then we see a lot of people putting data into environments before they, instead of using Elasticsearch, they want to use SQL type tools. And what they do is they put it into a Parquet or Presto form. It's a Presto dialect, but it into Parquet and structure it. And they go a lot of other way to, Hey it's in the data lake, but they end up building these little islands inside their data lake. And it's a lot of time to transform the data, to get it in a format that you can go after our tools. And then what we do is we don't make you do that. Just literally put the data there. And then what we do is we do the index and a polish API. So right now it's Elasticsearch in a very short time we'll publish Presto or the SQL dialect. You can use the same tool. So we do see people, either brute forcing and trying their best with a bunch of physical servers. We do see another group that says, you know, I want to go use an Athena use cases, or I want to use a there's a whole bunch of different startups saying, I do data lake or data lake houses. But they are, what they really do is force you to put things in the structure before you get insight. True data lake economics is literally just put it there, and use your tools natively to go after it. And that's where we're unique compared to what we see from our competition. >> Hmm, so with people who have moved into ChaosSearch, what's, let's say pick one, if you can, the most interesting example of what people have started to do with, with their data. What's new? >> That's good. Well, I'll give you another one. And so Armor Security is a good one. So Armor Security is a security service company. You know, thousands of clients doing great I mean a beautiful platform, beautiful business. And they won Rackspace as a partner. So now imagine thousand clients, but now, you know massive scale that to keep up with. So that would be an example but another example where we were able to come in and they were facing a major upgrade of their environment just to keep up, and they expose actually to their customers is how their customers do logging analytics. What we're able to do is literally simply because they didn't go below the API they use the exact same tools that are on top and in 30 days replaced that use case, save them tremendous amount of dollars. But now they're able to go back and have unlimited retention. They used to restrict their clients to 14 days. Now they have an opportunity to do a bunch of different things, and possible revenue opportunities and other. But allow them to look at their business differently and free up their team to do other things. And now they're, they're putting billing and other things into the same environment with us because one is easy it's scale but also freed up their team. No one has enough team to do things. And then the biggest thing is what people do interesting with our product is actually in their own tools. So, you know, we talk about Kibana when we do SQL again we talk about Looker and Tableau and Power BI, you know, the really interesting thing, and we think we did the hard work on the data layer which you can say is, you know I can about all the ways you consolidate the performance. Now, what becomes really interesting is what they're doing at the visibility level, either Kibana or the API or Tableau or Looker. And the key thing for us is we just say, just use the tools you're used to. Now that might be a boring statement, but to me, a great value proposition is not changing what your end users have to use. And they're doing amazing things. They're doing the exact same things they did before. They're just doing it with more data at bigger scale. And also they're able to see across their different machine learning data compared to being limited going at one thing at a time. And that getting the correlation from a unified data lake is really what we, you know we get very excited about. What's most exciting to our clients is they don't have to tell the users they have to use a different tool, which, you know, we'll decide if that's really interesting in this conversation. But again, I always say we didn't build a new algorithm that you going to give the SecOp team or a new pipeline cool widget that going to help the machine learning team which is another API we'll publish. But basically what we do is a hard work of making the data platform scalable, but more importantly give you the APIs that you're used to. So it's the platform that you don't have to change what your end users are doing, which is a... So we're kind of invisible behind the scenes. >> Well, that's certainly a pretty strong proposition there and I'm sure that there's plenty of scope for customers to come and and talk to you because no one's creating any less data. So Ed, thanks for coming out of theCUBE. It's always great to see you here. >> Know, thank you. >> You've been watching theCUBE Virtual and our coverage of AWS re:Invent 2020 with special coverage of APN partner experience. Make sure you check out all our coverage online, either on your desktop, mobile on your phone, wherever you are. I've been your host, Justin Warren. And I look forward to seeing you again soon. (soft music)

Published Date : Dec 3 2020

SUMMARY :

the globe it's theCUBE, and our coverage of AWS re:Invent 2020 Well thank you for having me, loved it to have you back. and the time to manage these clusters, be able to save up to 80%. And we can give you a So yeah, if you could walk and the parallelism you can get, that you see people making it's in the data lake, but they end up what's, let's say pick one, if you can, I can about all the ways you It's always great to see you here. And I look forward to

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Justin WarrenPERSON

0.99+

Ed WalshPERSON

0.99+

$80QUANTITY

0.99+

40 daysQUANTITY

0.99+

five daysQUANTITY

0.99+

Ed WalshPERSON

0.99+

90 daysQUANTITY

0.99+

AmazonORGANIZATION

0.99+

AWS Global Partner NetworkORGANIZATION

0.99+

nine daysQUANTITY

0.99+

80%QUANTITY

0.99+

10 terabytesQUANTITY

0.99+

thousandsQUANTITY

0.99+

AWSORGANIZATION

0.99+

HubSpotORGANIZATION

0.99+

EdPERSON

0.99+

10%QUANTITY

0.99+

ElasticsearchTITLE

0.99+

30 daysQUANTITY

0.99+

Armor SecurityORGANIZATION

0.99+

14 daysQUANTITY

0.99+

thousand clientsQUANTITY

0.99+

BlackboardORGANIZATION

0.99+

KleinerORGANIZATION

0.99+

S3TITLE

0.99+

OneQUANTITY

0.99+

Alert LogicORGANIZATION

0.99+

three stepsQUANTITY

0.98+

oneQUANTITY

0.98+

GDPRTITLE

0.98+

one thingQUANTITY

0.98+

one dataQUANTITY

0.98+

one serverQUANTITY

0.98+

ElasticTITLE

0.98+

70QUANTITY

0.98+

SQLTITLE

0.98+

about 80%QUANTITY

0.97+

KibanaTITLE

0.97+

first timeQUANTITY

0.97+

over $4 million a yearQUANTITY

0.97+

one clusterQUANTITY

0.97+

first personQUANTITY

0.97+

CloudFlareTITLE

0.97+

ChaosSearchORGANIZATION

0.97+

this yearDATE

0.97+

GlacierTITLE

0.97+

up to 80%QUANTITY

0.97+

ParquetTITLE

0.96+

each oneQUANTITY

0.95+

SplunkORGANIZATION

0.95+

Sumo LogicORGANIZATION

0.94+

up to 80QUANTITY

0.94+

Power BITITLE

0.93+

todayDATE

0.93+

RackspaceORGANIZATION

0.92+

up to 30 terabytes a dayQUANTITY

0.92+

one pointQUANTITY

0.91+

S3 GlacierCOMMERCIAL_ITEM

0.91+

Elastic APITITLE

0.89+

Yaron Haviv, Iguazio | theCUBE NYC 2018


 

Live from New York It's theCUBE! Covering theCUBE New York City 2018 Brought to you by Silicon Angle Media and it's ecosystem partners >> Hey welcome back and we're live in theCUBE in New York city. It's our 2nd day of two days of coverage CUBE NYC. The hashtag CUBENYC Formerly Big data NYC renamed because it's about big data, it's about the server, it's about Cooper _________'s multi-cloud data. It's all about data, and that's the fundamental change in the industry. Our next guest is Yaron Haviv, who's the CTO of Iguazio, key alumni, always coming out with some good commentary smart analysis. Kind of a guest host as well as an industry participant supplier. Welcome back to theCUBE. Good to see you. >> Thank you John. >> Love having you on theCUBE because you always bring some good insight and we appreciate that. Thank you so much. First, before we get into some of the comments because I really want to delve into comments that David Richards said a few years ago, CEO of RenDisco. He said, "Cloud's going to kill Hadoop". And people were looking at him like, "Oh my God, who is this heretic? He's crazy. What is he talking about?" But you might not need Hadoop, if you can run server less Spark, Tensorflow.... You talk about this off camera. Is Hadoop going to be the open stack of the big data world? >> I don't think cloud necessary killed Hadoop, although it is working on that, you know because you go to Amazon and you know, you can consume a bunch of services and you don't really need to think about Hadoop. I think cloud native serve is starting to kill Hadoop, cause Hadoop is three layers, you know, it's a file system, it's DFS, and then you have server scheduling Yarn, then you have applications starting with map produce and then you evolve into things like Spark. Okay, so, file system I don't really need in the cloud. I use Asfree, I can use a database as a service, as you know, pretty efficient way of storing data. For scheduling, Kubernetes is a much more generic way of scheduling workloads and not confined to Spark and specific workloads. I can run with Dancerflow, I can run with data science tools, etc., just containerize. So essentially, why would I need Hadoop? If I can take the traditional tools people are now evolving in and using like Jupiter Notebooks, Spark, Dancerflow, you know, those packages with Kubernetes on top of a database as a service and some object store, I have a much easier stack to work with. And I could mobilize that whether it's in the cloud, you know on different vendors. >> Scale is important too. How do you scale it? >> Of course, you have independent scaling between data and computation, unlike Hadoop. So I can just go to Google, and use Vquery, or use, you know, DynamoDB on Amazon or Redchick, or whatever and automatically scale it down and then, you know >> That's a unique position, so essentially, Hadoop versus Kubernetes is a top-line story. And wouldn't that be ironic for Google, because Google essentially created Map Produce and Coudera ran with it and went public, but when we're talking about 2008 timeframe, 2009 timeframe, back when ventures with cloud were just emerging in the mainstream. So wouldn't it be ironic Kubernetes, which is being driven by Google, ends up taking over Hadoop? In terms of running things on Kubernetes and cloud eight on Visa Vis on premise with Hadoop. >> The poster is tend to give this comment about Google, but essentially Yahoo started Hadoop. Google started the technology  and couple of years after Hadoop started, with Google they essentially moved to a different architecture, with something called Percolator. So Google's not too associated with Hadoop. They're not really using this approach for a long time. >> Well they wrote the map-produced paper and the internal conversations we report on theCUBE about Google was, they just let that go. And Yahoo grabbed it. (cross-conversation) >> The companies that had the most experience were the first to leave. And I think it may respect what you're saying. As the marketplace realizes the outcomes of the dubious associate with, they will find other ways of achieving those outcomes. It might be more depth. >> There's also a fundamental shift in the consumption where Hadoop was about a ranking pages in a batch form. You know, just collecting logs and ranking pages, okay. The chances that people have today revolve around applying AI to business application. It needs to be a lot more concurring, transactional, real-time ish, you know? It's nothing to do with Hadoop, okay? So that's why you'll see more and more workers, mobilizing different black server functions, into service pre-canned services, etc. And Kubernetes playing a good role here is providing the trend. Transport for migrating workloads across cloud providers, because I can use GKE, the Google Kubenetes, or Amazon Kubernetes, or Azure Kubernetes, and I could write a similar application and deploy it on any cloud, or on Clam on my own private cluster. It makes the infrastructure agnostic really application focused. >> Question about Kubernetes we heard on theCUBE earlier, the VP of Project BlueData said that Kubernetes ecosystem and community needs to do a better job with Stapla, they nailed Stapflalis, Stafle application support is something that they need help on. Do you agree with that comment, and then if so, what alternatives do you have for customers who care about Stafe? >> They should use our product (laughing) >> (mumbling) Is Kubernetes struggling there? And if so, talk about your product >> So, I think that our challenge is rounded that there are many solutions in that. I think that they are attacking it from a different approach Many of them are essentially providing some block storage to different containers on really cloud 90. What you want to be able is to have multiple containers access the same data. That means either sharing through file systems, for objects or through databases because one container is generating, for example, ingestion or __________. Another container is manipulating that same data. A third container may look for something in the data, and generate a trigger or an action. So you need shared access to data from those containers. >> The rest of the data synchronizes all three of those things. >> Yes because the data is the form of state. The form of state cannot be associated with the same container, which is what most of where I am very active and sincere in those committees, and you have all the storage guys in the committees, and they think the block story just drag solution. Cause they still think like virtual machines, okay? But the general idea is that if you think about Kubernetes is like the new OS, where you have many processes, they're just scattered around. In OS, the way for us to share state between processes an OS, is whether through files, or through databases, in those form. And that's really what >> Threads and databases as a positive engagement. >> So essentially I gave maybe two years ago, a session at KubeCon in Europe about what we're doing on storing state. It's really high-performance access from those container processes to our database. Impersonate objects, files, streams or time series data, etc And then essentially, all those workloads just mount on top of and we can all share stape. We can even control the access for each >> Do you think you nailed the stape problem? >> Yes, by the way, we have a managed service. Anyone could go today to our cloud, to our website, that's in our cloud. It gets it's own Kubernetes cluster, a provision within less than 10 minutes, five to 10 minutes. With all of those services pre-integrated with Spark, Presto, ______________, real-time, these services functions. All that pre-configured on it's own time. I figured all of these- >> 100% compatible with Kubernetes, it's a good investment >> Well we're just expanding it to Kubernetes stripes, now it's working on them, Amazon Kubernetes, EKS I think, we're working on AKS and GK. We partner with Azure and Google. And we're also building an ad solution that is essentially exactly the same stock. Can run on an edge appliance in a factory. You can essentially mobilize data and functions back and forth. So you can go and develop your work loads, your application in the cloud, test it under simulation, push a single button and teleport the artifacts into the edge factory. >> So is it like a real-time Kubernetes? >> Yes, it's a real-time Kubernetes. >> If you _______like the things we're doing, it's all real-time. >> Talk about real-time in the database world because you mentioned time-series databases. You give objects store versus blog. Talk about time series. You're talking about data that is very relevant in the moment. And also understanding time series data. And then, it's important post-event, if you will, meaning How do you store it? Do you care? I mean, it's important to manage the time series. At the same time, it might not be as valuable as other data, or valuable at certain points and time, which changes it's relationship to how it's stored and how it's used. Talk about the dynamic of time series.. >> Figured it out in the last six or 12 months that since real-time is about time series. Everything you think about real-time censored data, even video is a time-series of frames, okay And what everyone wants to do is just huge amount of time series. They want to cross-correlate it, because for example, you think about stock tickers you know, the stock has an impact from news feeds or Twitter feeds, or of a company or a segment. So essentially, what they need to do is something called multi-volume analysis of multiple time series to be able to extract some meaning, and then decide if you want to sell or buy a stock, as in vacation example. And there is a huge gap in the solution in that market, because most of the time series databases were designed for operational databases, you know, things that monitor apps. Nothing that injects millions of data points per second, and cross-correlates and run real-time AI analytics. Ah, so we've essentially extended because we have a programmable database essentially under the hoop. We've extended it to support time series data with about 50 to 1 compression ratio, compared to some other solutions. You know we've break with the customer, we've done sizing, they told them us they need half a pitabyte. After a small sizing exercise, about 10 to 20 terabytes of storage for the same data they stored in Kassandra for 500 terabytes. No huge ingestion rates, and what's very important, we can do an in-flight with all those cross-correlations, so, that's something that's working very well for us. >> This could help on smart mobility. Kenex 5G comes on, certainly. Intelligent edge. >> So the customers we have, these cases that we applied right now is in financial services, two or three main applications. One is tick data and analytics, everyone wants to be smarter learning on how to buy and sell stocks or manage risk, the second one is infrastructure, monitoring, critical infrastructure, monitoring is SLA monitoring is be able to monitor network devices, latencies, applications, you now, transaction rate, or that, be able to predict potential failures or escalation We have similar applications; we have about three Telco customers using it for real-time time. Series analytics are metric data, cybersecurity attacks, congestion avoidance, SLA management, and also automotive. Fleet management, file linking, they are also essentially feeding huge data sets of time series analytics. They're running cross-correlation and AI logic, so now they can generate triggers. Now apply to Hadoop. What does Hadoop have anything to do with those kinds of applications? They cannot feed huge amounts of datasets, they cannot react in real-time, doesn't store time-series efficiently. >> Hapoop (laughing) >> You said that. >> Yeah. That's good. >> One, I know we don't have a lot of time left. We're running out of time, but I want to make sure we get this out here. How are you engaging with customers? You guys got great technical support. We can vouch for the tech chops that you guys have. We seen the solution. If it's compatible to Kubernetes, certainly this is an alternative to have really great analytical infrastructure. Cloud native, goodness of your building, You do PFC's, they go to your website, and how do you engage, how do you get deals? How do people work with you? >> So because now we have a cloud service, so also we engage through the cloud. Mainly, we're going after customers and leads, or from webinars and activities on the internet, and we sort of follow-up with those customers, we know >> Direct sales? >> Direct sales, but through lead generation mechanism. Marketplace activity, Amazon, Azure, >> Partnerships with Azure and Google now. And Azure joint selling activities. They can actually resale and get compensated. Our solution is an edge for Azure. Working on similar solution for Google. Very focused on retailers. That's the current market focus of since you think about stores that have a single supermarket will have more than a 1,000 cameras. Okay, just because they're monitoring shelves in real-time, think about Amazon go, kind of replication. Real-time inventory management. You cannot push a 1,000 camera feeds into the cloud. In order to analyze it then decide on inventory level. Proactive action, so, those are the kind of applications. >> So bigger deals, you've had some big deals. >> Yes, we're really not a raspberry pie-kind of solution. That's where the bigger customers >> Got it. Yaron, thank you so much. The CTO of Iguazio Check him out. It's actually been great commentary. The Hadoop versus Kubernetes narrative. Love to explore that further with you. Stay with us for more coverage after this short break. We're live in day 2 of CUBE NYC. Par Strata, Hadoop Strata, Hadoop World. CUBE Hadoop World, whatever you want to call it. It's all because of the data. We'll bring it to ya. Stay with us for more after this short break. (upbeat music)

Published Date : Sep 13 2018

SUMMARY :

It's all about data, and that's the fundamental change Love having you on theCUBE because you always and then you evolve into things like Spark. How do you scale it? and then, you know and cloud eight on Visa Vis on premise with Hadoop. Google started the technology and couple of years and the internal conversations we report on theCUBE The companies that had the most experience It's nothing to do with Hadoop, okay? and then if so, what alternatives do you have for So you need shared access to data from those containers. The rest of the data synchronizes is like the new OS, where you have many processes, We can even control the access for each Yes, by the way, we have a managed service. So you can go and develop your work loads, your application If you And then, it's important post-event, if you will, meaning because most of the time series databases were designed for This could help on smart mobility. So the customers we have, and how do you engage, how do you get deals? and we sort of follow-up with those customers, we know Direct sales, but through lead generation mechanism. since you think about stores that have Yes, we're really not a raspberry pie-kind of solution. It's all because of the data.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
JohnPERSON

0.99+

Lisa MartinPERSON

0.99+

Ed MacoskyPERSON

0.99+

Darren AnthonyPERSON

0.99+

Yaron HavivPERSON

0.99+

Mandy DollyPERSON

0.99+

Mandy DhaliwalPERSON

0.99+

David RichardsPERSON

0.99+

Suzi JewettPERSON

0.99+

AmazonORGANIZATION

0.99+

AWSORGANIZATION

0.99+

John FurrierPERSON

0.99+

HPORGANIZATION

0.99+

twoQUANTITY

0.99+

2.9 timesQUANTITY

0.99+

DarrenPERSON

0.99+

GoogleORGANIZATION

0.99+

SuziPERSON

0.99+

Silicon Angle MediaORGANIZATION

0.99+

RenDiscoORGANIZATION

0.99+

2009DATE

0.99+

Suzie JewittPERSON

0.99+

HPEORGANIZATION

0.99+

2022DATE

0.99+

YahooORGANIZATION

0.99+

LisaPERSON

0.99+

2008DATE

0.99+

AKSORGANIZATION

0.99+

Las VegasLOCATION

0.99+

500 terabytesQUANTITY

0.99+

60%QUANTITY

0.99+

2021DATE

0.99+

HadoopTITLE

0.99+

1,000 cameraQUANTITY

0.99+

oneQUANTITY

0.99+

18,000 customersQUANTITY

0.99+

fiveQUANTITY

0.99+

AmsterdamLOCATION

0.99+

2030DATE

0.99+

OneQUANTITY

0.99+

HIPAATITLE

0.99+

tomorrowDATE

0.99+

2026DATE

0.99+

YaronPERSON

0.99+

two daysQUANTITY

0.99+

EuropeLOCATION

0.99+

FirstQUANTITY

0.99+

todayDATE

0.99+

TelcoORGANIZATION

0.99+

bothQUANTITY

0.99+

threeQUANTITY

0.99+

Brent Compton, Red Hat | theCUBE NYC 2018


 

>> Live from New York, it's theCUBE, covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hello, everyone, welcome back. This is theCUBE live in New York City for theCUBE NYC, #CUBENYC. This is our ninth year covering the big data ecosystem, which has now merged into cloud. All things coming together. It's really about AI, it's about developers, it's about operations, it's about data scientists. I'm John Furrier, my co-host Dave Vellante. Our next guest is Brent Compton, Technical Marketing Director for Storage Business at Red Hat. As you know, we cover Red Hat Summit and great to have the conversation. Open source, DevOps is the theme here. Brent, thanks for joining us, thanks for coming on. >> My pleasure, thank you. >> We've been talking about the role of AI and AI needs data and data needs storage, which is what you do, but if you look at what's going on in the marketplace, kind of an architectural shift. It's harder to find a cloud architect than it is to find diamonds these days. You can't find a good cloud architect. Cloud is driving a lot of the action. Data is a big part of that. What's Red Hat doing in this area and what's emerging for you guys in this data landscape? >> Really, the days of specialists are over. You mentioned it's more difficult to find a cloud architect than find diamonds. What we see is the infrastructure, it's become less about compute as storage and networking. It's the architect that can bring the confluence of those specialties together. One of the things that we see is people bringing their analytics workloads onto the common platforms where they've been running the rest of their enterprise applications. For instance, if they're running a lot of their enterprise applications on AWS, of course, they want to run their analytics workloads in AWS and that's EMRs long since in the history books. Likewise, if they're running a lot of their enterprise applications on OpenStack, it's natural that they want to run a lot of their analytics workloads on the same type of dynamically provisioned infrastructure. Emerging, of course, we just announced on Monday this week with Hortonworks and IBM, if they're running a lot of their enterprise applications on a Kubernetes substrate like OpenShift, they want to run their analytics workloads on that same kind of agile infrastructure. >> Talk about the private cloud impact and hybrid cloud because obviously we just talked to the CEO of Hortonworks. Normally it's about early days, about Hadoop, data legs and then data planes. They had a good vision. They're years into it, but I like what Hortonworks is doing. But he said Kubernetes, on a data show Kubernetes. Kubernetes is a multi-cloud, hybrid cloud concept, containers. This is really enabling a lot of value and you guys have OpenShift which became very successful over the past few years, the growth has been phenomenal. So congratulations, but it's pointing to a bigger trend and that is that the infrastructure software, the platform as a service is becoming the middleware, the glue, if you will, and Kubernetes and containers are facilitating a new architecture for developers and operators. How important is that with you guys, and what's the impact of the customer when they think, okay I'm going to have an agile DevOps environment, workload portability, but do I have to build that out? You mentioned people don't have to necessarily do that anymore. The trend has become on-premise. What's the impact of the customer as they hear Kubernetese and containers and the data conversation? >> You mentioned agile DevOps environment, workload portability so one of the things that customers come to us for is having that same thing, but infrastructure agnostic. They say, I don't want to be locked in. Love AWS, love Azure, but I don't want to be locked into those platforms. I want to have an abstraction layer for my Kubernetese layer that sits on top of those infrastructure platforms. As I bring my workloads, one-by-one, custom DevOps from a lift and shift of legacy apps onto that substrate, I want to have it be independent, private cloud or public cloud and, time permitting, we'll go into more details about what we've seen happening in the private cloud with analytics as well, which is effectively what brought us here today. The pattern that we've discovered with a lot of our large customers who are saying, hey, we're running OpenStack, they're large institutions that for lots of reasons they store a lot of their data on-premises saying, we want to use the utility compute model that OpenStack gives us as well as the shared data context that Ceph gives us. We want to use that same thing for our analytics workload. So effectively some of our large customers taught us this program. >> So they're building infrastructure for analytics essentially. >> That's what it is. >> One of the challenges with that is the data is everywhere. It's all in silos, it's locked in some server somewhere. First of all, am I overstating that problem and how are you seeing customers deal with that? What are some of the challenges that they're having and how are you guys helping? >> Perfect lead in, in fact, one of our large government customers, they recently sent us an unsolicited email after they deployed the first 10 petabytes in a deca petabyte solution. It's OpenStack based as well as Ceph based. Three taglines in their email. The first was releasing the lock on data. The second was releasing the lock on compute. And the third was releasing the lock on innovation. Now, that sounds a bit buzzword-y, but when it comes from a customer to you. >> That came from a customer? Sounds like a marketing department wrote that. >> In the details, as you know, traditional HDFS clusters, traditional Hadoop clusters, sparklers or whatever, HDFS is not shared between clusters. One of our large customers has 50 plus analytics clusters. Their data platforms team employ a maze of scripts to copy data from one cluster to the other. And if you are a scientist or an engineer, you'd say, I'm trying to obtain these types of answers, but I need access to data sets A, B, C, and D, but data sets A and B are only on this cluster. I've got to go contact the data platforms team and have them copy it over and ensure that it's up-to-date and in sync so it's messy. >> It's a nightmare. >> Messy. So that's why the one customer said releasing the lock on data because now it's in a shared. Similar paradigm as AWS with EMR. The data's in a shared context, an S3. You spin up your analytics workloads on AC2. Same paradigm discussion as with OpenStack. Your spinning up your analytics workloads via OpenStack virtualization and their sourcing is shared data context inside of Ceph, S3 compatible Ceph so same architecture. I love his last bit, the one that sounds the most buzzword-y which was releasing lock on innovation. And this individual, English was not this person's first language so love the word. He said, our developers no longer fear experimentation because it's so easy. In minutes they can spin up an analytics cluster with a shared data context, they get the wrong mix of things they shut it down and spin it up again. >> In previous example you used HDFS clusters. There's so many trip wires, right. You can break something. >> It's fragile. >> It's like scripts. You don't want to tinker with that. Developers don't want to get their hand slapped. >> The other thing is also the recognition that innovation comes from data. That's what my takeaway is. The customer saying, okay, now we can innovate because we have access to the data, we can apply intelligence to that data whether it's machine intelligence or analytics, et cetera. >> This the trend in infrastructure. You mentioned the shared context. What other observations and learnings have you guys come to as Red Hat starts to get more customer interactions around analytical infrastructure. Is it an IT problem? You mentioned abstracting the way different infrastructures, and that means multi-cloud's probably setup for you guys in a big way. But what does that mean for a customer? If you had to explain infrastructure analytics, what needs to get done, what does the customer need to do? How do you describe that? >> I love the term that industry uses of multi-tenant workload isolation with shared data context. That's such a concise term to describe what we talk to our customers about. And most of them, that's what they're looking for. They've got their data scientist teams that don't want their workloads mixed in with the long running batch workloads. They say, listen, I'm on deadline here. I've got an hour to get these answers. They're working with Impala. They're working with Presto. They iterate, they don't know exactly the pattern they're looking for. So having to take a long time because their jobs are mixed in with these long MapReduce jobs. They need to be able to spin up infrastructure, workload isolation meaning they have their own space, shared context, they don't want to be placing calls over to the platform team saying, I need data sets C, D, and E. Could you please send them over? I'm on deadline here. That phrase, I think, captures so nicely what customers are really looking to do with their analytics infrastructure. Analytics tools, they'll still do their thing, but the infrastructure underneath analytics delivering this new type of agility is giving that multi-tenant workload isolation with shared data context. >> You know what's funny is we were talking at the kickoff. We were looking back nine years. We've been at this event for nine years now. We made prediction there will be no Red Hat of big data. John, years ago said, unless it's Red Hat. You guys got dragged into this by your customers really is how it came about. >> Customers and partners, of course with your recent guest from Hortonworks, the announcement that Red Hat, Hortonworks, and IBM had on Monday of this week. Dialing up even further taking the agility, okay, OpenStack is great for agility, private cloud, utility based computing and storage with OpenStack and Ceph, great. OpenShift dials up that agility another notch. Of course, we heard from the CEO of Hortonworks how much they love the agility that a Kubernetes based substrate provides their analytics customers. >> That's essentially how you're creating that sort of same-same experience between on-prem and multi-cloud, is that right? >> Yeah, OpenShift is deployed pervasively on AWS, on-premises, on Azure, on GCE. >> It's a multi-cloud world, we see that for sure. Again, the validation was at VMworld. AWS CEO, Andy Jassy announced RDS which is their product on VMware on-premises which they've never done. Amazon's never done any product on-premises. We were speculating it would be a hardware device. We missed that one, but it's a software. But this is the validation, seamless cloud operations on-premise in the cloud really is what people want. They want one standard operating model and they want to abstract away the infrastructure, as you were saying, as the big trend. The question that we have is, okay, go to the next level. From a developer standpoint, what is this modern developer using for tools in the infrastructure? How can they get that agility and spinning up isolated, multi-tenant infrastructure concept all the time? This is the demand we're seeing, that's an evolution. Question for Red Hat is, how does that change your partnership strategy because you mentioned Rob Bearden. They've been hardcore enterprise and you guys are hardcore enterprise. You kind of know the little things that customers want that might not be obvious to people: compliance, certification, a decade of support. How is Red Hat's partnership model changing with this changing landscape, if you will? You mentioned IBM and Hortonworks release this week, but what in general, how does the partnership strategy look for you? >> The more it changes, the more it looks the same. When you go back 20 years ago, what Red Hat has always stood for is any application on any infrastructure. But back in the day it was we had n-thousand of applications that were certified on Red Hat Linux and we ran on anybody's server. >> Box. >> Running on a box, exactly. It's a similar play, just in 2018 in the world of hybrid, multi-cloud architectures. >> Well, you guys have done some serious heavy lifting. Don't hate me for saying this, but you're kind of like the mules of the industry. You do a lot of stuff that nobody either wants to do or knows how to do and it's really paid off. You just look at the ascendancy of the company, it's been amazing. >> Well, multi-cloud is hard. Look at what it takes to do multi-cloud in DevOps. It's not easy and a lot of pretenders will fall out of the way, you guys have done well. What's next for you guys? What's on the horizon? What's happening for you guys this next couple months for Red Hat and technology? Any new announcements coming? What's the vision, what's happening? >> One of the announcements that you saw last week, was Red Hat, Cloudera, and Eurotech as analytics in the data center is great. Increasingly, the world's businesses run on data-driven decisions. That's great, but analytics at the edge for more realtime industrial automation, et cetera. Per the announcements we did with Cloudera and Eurotech about the use of, we haven't even talked about Red Hat's middleware platforms, such as AMQ Streams now based on Kafka, a Kafka distribution, Fuze, an integration master effectively bringing Red Hat technology to the edge of analytics so that you have the ability to do some processing in realtime before back calling all the way back to the data center. That's an area that you'll also see is pushing some analytics to the edge through our partnerships such as announced with Cloudera and Eurotech. >> You guys got the Red Hat Summit coming up next year. theCUBE will be there, as usual. It's great to cover Red Hat. Thanks for coming on theCUBE, Brent. Appreciate it, thanks for spending the time. We're here in New York City live. I'm John Furrier, Dave Vallante, stay with us. All day coverage today and tomorrow in New York City. We'll be right back. (upbeat music)

Published Date : Sep 12 2018

SUMMARY :

Brought to you by SiliconANGLE Media Open source, DevOps is the theme here. Cloud is driving a lot of the action. One of the things that we see is people and that is that the infrastructure software, the shared data context that Ceph gives us. So they're building infrastructure One of the challenges with that is the data is everywhere. And the third was releasing the lock on innovation. That came from a customer? In the details, as you know, I love his last bit, the one that sounds the most buzzword-y In previous example you used HDFS clusters. You don't want to tinker with that. that innovation comes from data. You mentioned the shared context. I love the term that industry uses of You guys got dragged into this from Hortonworks, the announcement that Yeah, OpenShift is deployed pervasively on AWS, You kind of know the little things that customers want But back in the day it was we had n-thousand of applications in the world of hybrid, multi-cloud architectures. You just look at the ascendancy of the company, What's on the horizon? One of the announcements that you saw last week, You guys got the Red Hat Summit coming up next year.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VallantePERSON

0.99+

Dave VellantePERSON

0.99+

IBMORGANIZATION

0.99+

JohnPERSON

0.99+

Brent ComptonPERSON

0.99+

AWSORGANIZATION

0.99+

John FurrierPERSON

0.99+

EurotechORGANIZATION

0.99+

HortonworksORGANIZATION

0.99+

AmazonORGANIZATION

0.99+

BrentPERSON

0.99+

New York CityLOCATION

0.99+

2018DATE

0.99+

Red HatORGANIZATION

0.99+

Rob BeardenPERSON

0.99+

nine yearsQUANTITY

0.99+

Andy JassyPERSON

0.99+

last weekDATE

0.99+

first languageQUANTITY

0.99+

Three taglinesQUANTITY

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

firstQUANTITY

0.99+

tomorrowDATE

0.99+

secondQUANTITY

0.99+

OneQUANTITY

0.99+

ClouderaORGANIZATION

0.99+

next yearDATE

0.99+

thirdQUANTITY

0.99+

New YorkLOCATION

0.99+

ImpalaORGANIZATION

0.99+

Monday this weekDATE

0.99+

VMworldORGANIZATION

0.98+

one clusterQUANTITY

0.98+

Red Hat SummitEVENT

0.98+

ninth yearQUANTITY

0.98+

oneQUANTITY

0.98+

OpenStackTITLE

0.98+

todayDATE

0.98+

NYCLOCATION

0.97+

20 years agoDATE

0.97+

KuberneteseTITLE

0.97+

KafkaTITLE

0.97+

FirstQUANTITY

0.96+

this weekDATE

0.96+

Red HatTITLE

0.95+

EnglishOTHER

0.95+

Monday of this weekDATE

0.94+

OpenShiftTITLE

0.94+

one standardQUANTITY

0.94+

50 plus analytics clustersQUANTITY

0.93+

CephTITLE

0.92+

AzureTITLE

0.92+

GCETITLE

0.9+

PrestoORGANIZATION

0.9+

agile DevOpsTITLE

0.89+

theCUBEORGANIZATION

0.88+

DevOpsTITLE

0.87+

Greg Fee, Lyft | Flink Forward 2018


 

>> Narrator: Live from San Francisco, it's theCUBE covering Flink Forward brought to you by Data Artisans. >> This is George Gilbert. We are at Data Artisan's conference Flink Forward. It is for the Apache Flink commmunity, sponsored by Data Artisans, and all the work they're doing to move Flink Forward, and to surround it with additional value that makes building stream-processing applications accessible to mainstream companies. Right now though, we are not talking to a mainstream company, we're talking to Greg Fee from Lyft. Not Uber. (laughs) And Greg tell us a little bit about what you're doing with Flink. What's the first-use case, that comes to mind that really exercises its capabilities? >> Sure, yeah, so the process of adopting Flink at Lyft has really started with a use case, which was, we're trying to make machine learning more accessible across all of Lyft. So we already use machine learning in quite a few applications, but we want to make sure that we use machine learning as much as possible, we really think that's the path forward. And one of the fundamental difficulties with that is having consistent feature generation between these offline batch-y training scenarios and the online real-time streaming scenarios. And the unified processing engine of Flink really helps us bridge that gap, so. >> When you say unified processing engine, are you saying that the fact that you can manage code and data, as sort of an application version, and some of the, either code or data, is part of the model, and so your versioning? >> That's even a step beyond what I'm talking about. >> Okay. >> Just the basic fundamental ability to have one piece of business logic that you can apply at the batch bulk layer, and in the real-time layer. >> George: Yeah. >> So that's sort of like the core of what Flink gives you. >> Are you running both batch and streaming on Flink? >> Yes, that's right. >> And using the, so, you're using the windows? Or just periodic execution on a stream to simulate batch? >> That's right. So we have, so feature generation crosses a broad spectrum of possible use cases in Flink. >> George: Yeah. >> And this is where we sort of transition more into what dA platform could give for us. So, we're looking to have thousands of different features across all of our machine learning models. So having a platform that can help us host many of these little programs running, help with the application life-cycle of each of these features, as we version them over time. So, we're very excited about what dA platform can do for us. >> Can you tell us a little more about how the stream processing helps you with the feature selection engineering, and is it that you're using streaming, or simulated batch, or batch using the same programming model to train these models, and you're using, you're picking up different derived data, is that how it's working? >> So, typical life-cycle is, it's going to be a feature engineering stage, so the data scientist is looking at their data, they're trying figure out patterns in the data, and they're going to, how you apply Flink there, is as you come up with potential algorithms for how you generate your feature, can run that through Flink, generate some data, apply machine learning model on top of it, and sort of play around with that data, prototype things. >> So, what you're doing is offline, or out of the platform, you're doing the feature selection and the engineering. >> Man: Right. >> Then you attach a stream to it that has just the relevant, perhaps, the relevant features. >> Man: Right. >> And then that model gets sort of, well maybe not yet, but eventually versioned as part of the application, which includes the application, the rest of the application logic and the data. >> Right. So, like some of the stuff that was touched on this morning at the keynotes, the versioning and maintaining machine learning applications, is a much, is a very complex ecosystem there. So being able to say, okay, going from the prototype stage, doing stuff in batch, to doing stuff in production, and real-time, then being able to version those over time, to move to better and better versions of the future generation, is very important to us. >> I don't know if this is the most politically correct thing, but you just explained it better than everyone else we have talked to. >> Great. (laughs) >> About how it all fits together with the machine learning. So, once you've got that in place, it sounds like you're using the dA platform, as well as, you know, perhaps some extensions for machine learning, to sort of add that as a separate life-cycle, besides the application code. Then, is that going to be the enterprise-wide platform for deploying, developing and deploying, machine learning applications? >> Yes, certainly we think there's probably a broad ecosystem to do machine learning. It's a very, sort of, wide open area. Certainly my agenda is to push it across the company and get as many things running in this system as possible. I think the real-time aspects of it, a unifying aspect, of what Flink can give us, and the platform can give us, in terms of the life-cycles. >> So, are you set up essentially like where you're the, a shared resource, a shared service, which is the platform group? >> Man: Right. >> And then, all the business units, adopt that platform and build their apps on it. >> Right. So my initiative is part of a greater data science platform at Lyft, so, my goal is to have, we have hundreds of data scientists who are going to be looking at this data, giving me little features that they want to do, and we're probably going to end up numbering in the thousands of features, being able to generate all those, maintain all those little programs. >> And when you say generate all those little programs, that's the application logic, and the models specific to that application? >> That's right, well. >> Or is it this? >> There's features that are typically shared across many models. >> Okay. >> So there's like two layers of things happening. >> So you're managing features separately from the models. >> That's right. >> Interesting. Okay, haven't heard that. And is the application manager tooling going to help address that, or is that custom stuff that you have to do? >> So, I think there's, I think there's a potential that that's the way we're going to manage the model stuff as well, but it's still little new over there. >> That you put it on the application platform? >> Right. >> Then that's sort of at the boundary of what you're doing right now, or what you will be doing shortly. >> Right. It's all, it's a matter of use-case, whether it's online or offline, and how it fits best in with the rest of the Lyft engineering system. >> When you're talking about your application landscape, do you have lots of streaming applications that feed other streaming applications, going through a hub. Or, are they sort of more discrete, you know, artifacts, discrete programs, and then when do you keep, stay within the streaming processors, and when do you have it in a shared database? >> That's a, that's a lot of questions, kind of a deep question. So, the goal is to have a central hub, where sort of all of our event data passes through it, and that allows us to decouple. >> So that's to be careful, that's not a database central hub, that's a, like a? >> An event hub. >> Event hub. >> Right. >> Yeah, okay. >> So, an event hub in the middle allows us to decompose the different, sort of smaller programs, which again are probably going to number in the thousands, so that being able to have different parts of the company maintain their own part of the overall system is very important to us. I think we'll probably see Flink as a major player, in terms of how those programs run, but we'll probably be shooting things off to other systems like Druid, like Hive, like Presto, like Elasticsearch. >> As derived data? >> As all derived data, from these Flink jobs. And then also, pushing data directly out into some of our production systems to feed into machine learning decisions. >> Okay, this is quite, sounds like the most ambitious infrastructure that we've heard, in that it sounds like pretty ubiquitous. >> We want to be a machine-learning first company. So, it's everywhere. >> So, now help me clarify for me, when? Because this is, you know, for mainstream companies who've programmed with, you know, DBMS, as a shared state manager for decades, help explain to them when you would still use a DBMS for shared state, and when you would start using the distributed state that's embedded in Flink, and the derived data, you know, at the endpoints, at the syncs. >> So I mean, I guess this kind of gets into your exact, your use cases and, you know, your opinions and thoughts about how to use these things best, but. >> George: Your opinion is what we're interested in. >> Right. From where I'm coming, I see basically databases as potential one sync for this data. They do things very well, right? They do structured queries very well. You can have indices built off that, aggregates, really feed into a lot of visualization stuff. >> George: Yeah. >> But, from where I am sitting, like we're really moving away from databases as something that feeds production data. We've got other stores to do that, that are sort of more tailored towards those scenarios. >> When you say to feed production data, this is transaction capture, or data capture. >> Right. So we don't have a lot of atomic transactions, outside the payments at Lyft, most of the stuff is eventually consistent. So we have stores, more like Dynamo or Cassandra HBase that feed a lot of our production data. >> And those databases, are they for like ambient information like influencing an interaction, it doesn't sound like automating a transaction. It would be, it sounds like, context that helps with analytics, but very separate from the OLTP apps. >> That's right. So we have, you can kind of bifurcate the company into the data that's used in production to make decisions that are like facing the user, and then our analytics back end, that really helps business analysts and like the executives make decisions about how we proceed. >> And so that second part, that backend, is more like operational efficiency. >> Man: Right. >> And coding new business processes to support new ways of doing business, but the customer-facing stuff specifically like with payments, that still needs a traditional OLTP. >> Man: Right. >> But there not, those use cases aren't growing that much. >> That's right. So, basically we have very specific use-cases for like a traditional database, but in terms of capturing the types of scale, and the type of growth, we're looking for at Lyft, we think some of the other storage engines suit those better. >> So in that use-case, would the OLTP DBMS be at the front end, would it be a source, or a sync? It sounds like it's a source. >> So we actually do it both ways. Right, so, it's great to get our transactional data flowing through our streaming system, it's a lot of value in that, but also then pushing it out, back to some of the aggregate results to DBMS, helps with our analytics pipeline. >> Okay, okay. Well this is actually really interesting. So, where do you see the dA platform helping, you know, going forward; is it something you don't really need because you've built all that scaffolding to help with sort of application life-cycle management, or or do you see it as something that'll help sort of push Flink sort of enterprise-wide? >> I think the dA platform really helps people sort of adopt Flink at an enterprise level. Maintaining the applications is a core part of what it means to run it as a business. And so we're looking at dA platform as a way of managing our applications, and I think, like I'm just talking about one, I'm mostly talking about one application we have for Flink at Lyft. >> Yeah. >> We have many other Flink programs actually running, that are sort of unrelated to my project. >> What about managing non-Flink applications? Do you need an application manager? Is it okay that it's associated with one service, or platform like Flink, or is there a desire you know among bleeding edge customers to have an overall, sort of infrastructure management, application management kind of suite. >> Yes, for sure. You're touching on something I have started to push inside of Lyft, which is the need for an overall application life-cycle management product that's not technology specific. >> Would these sort of plug into the dA platform and whatever the confluent, you know, equivalent is, or is it going to to directly tie to the, you know, operational capabilities, or the functional capabilities, not the management capabilities. In other words would it plug into like core Flink, core Kafka, core Spark, that sort of stuff? >> I think that's sort of largely to be determined. If you go back to sort of how distributed design system works, typically. We have a user plane, which is going to be our data users. Then you end up with the thing we're probably most familiar with, which is our data plane, technologies like Flink and Kafka and Hive, all those guys. What's missing in the middle right now is a control plane. It's a map from the user desire, from the user intention, to what we do with all of that data plane stuff. So launch a new program, maybe you need a new Kafka topic, maybe you need to provision in Kafka. Higher, you need to get some Flink programs running, and whether that talks directly talks to Flink, and goes against Kubernetes, or something like that, or whether it talks to a higher level, like more application-specific platform. >> Man: Yeah. >> I think, you know it's certainly a lot easier, if we have some of these platforms in the way. >> Because they give you better abstractions. >> That's right. >> To talk to the platforms. >> That's right. >> That's interesting. Okay, geesh, we learn something really, really interesting with each interview. I'm curious though, if you look out a couple years, how much of your application landscape will be continuous processing, and is that something you can see mainstream enterprises adopting, or has decades of work with, you know, batch and interactive sort of made people too difficult to learn something so radically new? >> I think it's all going to be driven by the business needs, and whether the value is there for people to make that transition 'cause it is quite expensive to invest in new infrastructure. For companies like Lyft, where we're trying to make decisions very quickly, you know, users get down to two seconds makes a difference for the customer, so we're trying to be as, you know, real-time as possible. I used to work at Salesforce. Salespeople are a little less sensitive to these things, and you know it's very, very traditional world. >> That's interesting. (background applauding) >> But even Salesforce is moving towards that style. >> Even Salesforce is moving? >> Is moving toward streaming processing. >> Really? >> George: So like, I think we're going to see it slowly be adopted across the big enterprises. >> George: I imagine that's probably for their analytics. >> That's where they're starting, of course, yeah. >> Okay. So, this was, a little more affirmation on to how we're going to see the control plane evolve, and the interesting use-cases that you're up to. I hope we can see you back next year. And you can tell us how far you've proceeded. >> I certainly hope so, yeah. >> This was really interesting. So, Greg Fee from Lyft. We will hopefully see you again. And this is George Gilbert. We're at the Data Artisans Flink Forward conference in San Francisco. We'll be back after this break. (techno music)

Published Date : Apr 12 2018

SUMMARY :

brought to you by Data Artisans. What's the first-use case, that comes to mind And one of the fundamental difficulties with that That's even a step beyond what Just the basic fundamental ability to have So we have, so feature generation crosses a broad So having a platform that can help us host with potential algorithms for how you So, what you're doing is offline, or out of the platform, Then you attach a stream to it that has just of the application logic and the data. So, like some of the stuff that was touched on politically correct thing, but you just explained (laughs) Then, is that going to be the enterprise-wide platform in terms of the life-cycles. and build their apps on it. in the thousands of features, being able to generate There's features that are typically And is the application manager tooling going to help that that's the way we're going to manage the model stuff Then that's sort of at the boundary of what you're of the Lyft engineering system. and when do you have it in a shared database? So, the goal is to have a central hub, So, an event hub in the middle allows us to decompose some of our production systems to feed into Okay, this is quite, sounds like the most ambitious So, it's everywhere. and the derived data, you know, at the endpoints, about how to use these things best, but. into a lot of visualization stuff. We've got other stores to do that, that are sort of When you say to feed production data, outside the payments at Lyft, most of the stuff And those databases, are they for like ambient information So we have, you can kind of bifurcate the company And so that second part, that backend, is more like of doing business, but the customer-facing stuff the types of scale, and the type of growth, we're looking be at the front end, would it be a source, or a sync? some of the aggregate results to DBMS, So, where do you see the dA platform helping, you know, Maintaining the applications is a core part actually running, that are sort of unrelated to my project. you know among bleeding edge customers to have an overall, inside of Lyft, which is the need for an overall application or is it going to to directly tie to the, you know, to what we do with all of that data plane stuff. I think, you know it's certainly a lot easier, or has decades of work with, you know, and you know it's very, That's interesting. that style. adopted across the big enterprises. I hope we can see you back next year. We're at the Data Artisans Flink Forward conference

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
George GilbertPERSON

0.99+

GeorgePERSON

0.99+

GregPERSON

0.99+

Greg FeePERSON

0.99+

Data ArtisansORGANIZATION

0.99+

San FranciscoLOCATION

0.99+

LyftORGANIZATION

0.99+

thousandsQUANTITY

0.99+

next yearDATE

0.99+

second partQUANTITY

0.99+

UberORGANIZATION

0.99+

each interviewQUANTITY

0.99+

DynamoORGANIZATION

0.99+

SalesforceORGANIZATION

0.99+

ApacheORGANIZATION

0.98+

FlinkORGANIZATION

0.98+

one serviceQUANTITY

0.98+

two layersQUANTITY

0.98+

two secondsQUANTITY

0.98+

eachQUANTITY

0.97+

thousands of featuresQUANTITY

0.97+

both waysQUANTITY

0.97+

KafkaTITLE

0.93+

first-use caseQUANTITY

0.92+

one applicationQUANTITY

0.92+

DruidTITLE

0.92+

Flink ForwardTITLE

0.92+

decadesQUANTITY

0.91+

ElasticsearchTITLE

0.89+

Data Artisans Flink ForwardEVENT

0.89+

oneQUANTITY

0.89+

ArtisanEVENT

0.87+

first companyQUANTITY

0.87+

hundreds of data scientistsQUANTITY

0.87+

both batchQUANTITY

0.84+

one pieceQUANTITY

0.83+

2018DATE

0.81+

FlinkTITLE

0.8+

HiveTITLE

0.77+

PrestoTITLE

0.76+

this morningDATE

0.75+

featuresQUANTITY

0.74+

coupleQUANTITY

0.73+

Flink ForwardEVENT

0.69+

HiveORGANIZATION

0.65+

SparkTITLE

0.62+

KubernetesORGANIZATION

0.61+

DataORGANIZATION

0.6+

Cassandra HBaseORGANIZATION

0.57+

Yaron Haviv, iguazio | AWS re:Invent 2017


 

Live from Las Vegas. It's the Cube covering AWS Reinvent 2017 presented by AWS, Intel, and our ecosystem of partners. >> Hello, welcome back. This is live coverage of the Cube's AWS re:Invent 2017. Two sets, a lot of action, day one of three days of wall to wall coverage. I'm John Furrier with my co-host Keith Townsend. Our next guest cube alumni is Yaron Haviv who's the founder and CTO of Iguazio, a hot new start up. And big news coming next. We got a big announcement. In following their work, Yaron, good to see you again. Thanks for coming back on. >> Hi, thanks! >> Hey you got a new shirt. Share that logo there. >> That's nuclio. That's our new serverless brainwork which is open source. Really kicks ass, it's about 100 times faster than Amazon. >> Word says it's 200 times faster. >> Yeah we don't want to shame. >> You set the bar. >> We doing 400,000 events per second on a single process. They do about 2000. Most of the open source project around the same ball park. >> Yaron, I got to get this off the bat. And then we can have a nice discussion afterwards. A pleasant discussion. Serverless. Let's first define what that means. Because there's a bunch of- I can take nuclio, install it in my data center, run it, am I serverless? >> You know so I mean I'm in the serverless working group. >> For CNCF >> for CNCF. And a we had a hot debate between the open source start ups. Doing what is called functional service and Amazon and others trying to push the notion of serverless. Which is serverless stands for server less. Meaning you don't manage server. And the way we position nucleo, it's actually both. Because on one end you can consume it as an open source project. Very easy to download. Single docker instruction and it's up and running unlike some other solutions. And on the other hand you can consume it as something within the Iguazio data platform. There is a slide from Amazon which I really like. Which is about serverless. They show serverless is attached to kinesis, DynaomoDB, S3 and Athena. Four services of data that attach to Lamda. Iguazio has API compatibility with kineses, DynamoDB with S3 and Presto, which is Athena as well. So exactly the same four data services that they position as far as the service ecosystem are supported on our platform. So we provide one platform, all the data services at Amazon has or at least interesting ones, serverless functions which are a hundred times faster, a few more tricks that they don't have-- >> So what is the definition then. In a pithy way, for someone out there who's learning about serverless. What is it? What's the definition? >> So the notion as a developer, you're sort of avoiding IT. You go, open a nice portal, you write the function, or you write your function in a get up repository somewhere. You click on a button and it gets deployed somewhere. Right now you know where it's going to get deployed. In the future, you may not know. >> Instead of an EC2 instance, get that prepared >> It's not really an EC2. >> The old way. The old way was. Right? >> The old way there were infrastructure guys building your EC2 instance, security layers, milware, etc. You go develop on your laptop and then you need to go and conform and all the continuous integration play was very complicated. Serverless comes inherently with scale out without the scale in, with continuous integration. You have versioning for the code. You can downgrade the version, you can upgrade the version. So essentially its a package version of a cloud native solution. That's the general idea. >> So I can do that if I'm doing it and managing it myself. It functions as a server. And if I'm doing it and it's a provided it as a cloud provider as a server, as a service, it's serverless. None of my operations team is dealing with servers. It's just writing code and just go. >> Yeah, you're writing a function. Push commit. You should play with nucleo, not just other things. But you'll see you're writing a function. Even see it has a built in editor. You write, you push deploy and it's already deployed somewhere. >> So give us some perspective before you move on. On the game what the impact is to a developer. Apples to oranges. Our old way you described it, new ways, it sounds easier! What's the impact? Is it time? Money? Can you quantify? >> The biggest challenge for businesses is to transform. I saw an interesting sentence. It's not about digital transformation, it's about businesses that need to work in a digital world. Okay? Because again, most of the communication of customers to businesses is becoming digital. Okay? Whether it's today from mobile apps tomorrow through Alexa. >> As Luke Cerney says, it's all software. Your business is the software. >> It's all about interactive really. Okay. As a business I always position there are two things you need to take care of as a business. One is increasing the revenue. And that's by engaging more customers. And increasing the revenue per customer. How do you engage more customer? Through digital services. Whether it's Twitters or proving a new service through your web portal. And the next thing is how do you generate more revenue from a customer is by showing recommendations. >> Finding more value. >> And the other aspect is operational efficiency. How do you automate your reparations to reduce the cost. You know Amazon uses robots to do the shipping and packing. So their margins can now be lower. So the generator is both those things. Reducing cost is becoming more and more dependent on automation which is digital. And increasing revenue become more about customer engagement which is digital. Okay so now you're a traditional enterprise. And you have your exchange to worry about. And all the legal stuff and the mainframes. But if you're not going to work on the transformation piece. You're going to die. Because some other start up is going to build insurance company which is sort of agile and all that. >> So you made an interesting comment earlier when you were talking about nucleo. And integrating the functions that really matter. The services that matter. Amazon releases 800 new services a year. >> Actually 1300. >> I'm sorry 1300. >> This time less, no? >> Right now they're at 1130 and they expect 1500, 1700 by the end of the year. Two years ago it was like 750 and then the year before that was 600. >> So is that an indicator as to Amazon's leading this race between the big, I don't know, three, four cloud providers. Rack and stack them for us. How do we assess the capability? >> It's a matter of mentality. Okay. Persos thinks like a supermarket. Just like an Amazon market. I could say I need a cover for my iPad. I'm gonna get 100 covers for my iPad. No one really, I need to now choose. So their strategy is we'll put dozens of services that do similar things. One is better at this, one is better at that. We control the market we'll sell more. We have a different approach. We do fewer services but each one sort of kicks ass. Each one is much better, much faster, much better engineered. Okay? This is also why we are on data plus provides 10 different data APIs and not 10 different individual data platforms. >> Alright so let's talk about the scoreboard. Even though they might be thinking about the supermarket. You've got Amazon, Azure Microsoft and Google. I've looked at some of the data. I mean, Microsoft's been international for a while from their MSN business. They now have Skype. They have data centers, they know a little bit about cloud. Amazon's got a lot more services. They support multiple versions of things. Google is kind of non-existent on the scale of comprehensiveness. >> Have you looked at their serverless functions? By the way? >> There's new stuff. Tensorflow, serverless. >> But serverless they only support an OJS. They have very few triggers and it's still defined as beta. >> That's the point, so people are touting my Forbes article. They're touting like a feature. There's a lot more that needs to get done. So the question I have for you is. There's a level of comprehensiveness that you need now. And I know you guys spend a lot of time building your solution. We've talked abut this at our last Cube interview. So the question is the whole MVP cousin, minimal viable product. Is great when you're building a consumer app for an iPhone. But when you start talking about a platform and now cloud. Question to you is there a level of completeness bar to be hurdled over for a legit cloud or cloud player? >> I don't think you need 1000 services to build a good cloud. But you do need a bunch of services. Okay? Now the way we see the world like Satya. Okay? Which is there is a core cloud. But there is sort of a belt around it which is what we call intelligence cloud. We would define ourselves as the intelligence cloud. So if someone is building a machine learning model and it needs a 5 year worth of data. And it just needs to do crawling on top of it. It's not really an interesting problem. It's commoditized, lots of CPO power, object storage. But the bigger challenge is doing game referencing close to the edge. This is what needs to happen in real time. You need fewer services but you need to be real time. >> Smarter integration to do that. Right? I mean. >> You have density problems. You don't have a lot of room to put a 100 servers. It needs to be a lot more integrated. You know look at Azure stack. Their slogan is consistency. Look at a slide that shows which Azure services are part of Azure stack. Less than 20%. Because it's a lot more complicated to take technology design whereas hyper scale and put them on few servers. >> How do customers figure it out? What does a customer do? It's all mind boggling. >> I love that concept of core services and then value around those core services. What are those core services that a cloud must have before I start to invest in that cloud providers strategy? >> So the point again, there's a lot of legacy that you need to grab with you. Especially someone like Amazon. So they have to have VMs and migration services from Oracle, etc. But let's assume I'm a start up and building a new client native applications. Do I need any of that? No. I can probably can do with containers. I don't really need to be VMs. I can use something like cybernetics, I can use sequel databases maybe some like sequel. So I can redesign my application differently with a lot fewer services. The problem for someone like Amazon in order to grow and be a supermarket, you have to have ten of everything. If I'm someone that focus on new applications I don't need so many services and so much legacy. >> Well I'll say one thing. You can call them a supermarket, use that retail analogy, I buy that analogy only to the extent that you used it. But if that's the case, then everyone's hungry for food. And they're the only supermarket in town. >> But Wholefoods maybe less stuff on the shelf. >> Everyone else is like a little hot dog stand compared to the supermarket. Amazon is crushing it. Your thoughts? I say that. Are they kicking ass? >> Obviously Amazon is kicking ass. But I think Azure is ramping up faster. Amazon is generating more alienation among people that they are starting to compete with. You know. >> Azure is copying Amazon. Right? >> Yeah. But they have a different angle. They know how to sell to enterprises. They already have the foot in the door for Office 365. I've talked to a customer. We're going Azure. I say why? >> Together: They've got 365. >> We already certify the security with 365 for us to use Azure it's a- >> Right up until that next breech. >> So the guys owning ITs, it's easier for them to go to Azure. The developers want Amazon. Because Amazon is sexier. >> We got to break. We debated this on the intro segment with he analyst. Question. IT buyers have been driven by a top down CIO driven, CXO driven waterfall, whatever you want to call it, old way. With developers now at the driver's seat, with all of this serverless function, serverless coming around the corner very fast. Are developers driving the buying decisions or not? Or is it IT? The budget's still there. They want to eliminate labor. They want more efficiencies. Are you seeing it again? Will it happen? >> Yeah because we are just in the middle. On one end we're an infrastructure. We're an infrastructure consumed by developers. So we keep on having those challenges within the accounts themselves. IT doesn't get what we're doing. Serverless, and database is serverless. Because they like to build stuff. They want to take the nutanix and take a hundred services on top of it. And it will take them two years to integrate it. By that time the business already moved somewhere else. >> So IT could be a dinosaur like the mainframe? >> Right. I think the smart ITs understand they need to adopt cloud instead of fight it. And more the line further up the step. And that sort of the thing we are trying to provide to them. When you are building stuff you are buying EMC storage. You are not just taking discs. So why do you focus on this low level block storage when you're buying infrastructure. Why no buy database as a service. And then you don't need all the hassle. Streaming is a service. Serverless is a service. And then you don't need all that stack. >> Yaron, you should be our guest analyst. But you're too busy building a company. We're going see you next week in Austin for Cubicon. Congratulations. I know you guys have worked hard. The founder and CTO of Iguazio. You're going to hear a lot about these guys. Smart team. They're either going to go big or go home. I think they're going to go big. Congratulations. More coverage here at AWS Re:Invent after this short break. I'm John Furrier with Keith Townsend.

Published Date : Nov 29 2017

SUMMARY :

It's the Cube This is live coverage of the Cube's AWS re:Invent 2017. Hey you got a new shirt. which is open source. Most of the open source project around the same ball park. Yaron, I got to get this off the bat. And on the other hand you can consume it as something What's the definition? In the future, you may not know. The old way was. You can downgrade the version, you can upgrade the version. So I can do that if I'm doing it and managing it myself. You write, you push deploy So give us some perspective before you move on. The biggest challenge for businesses is to transform. Your business is the software. And the next thing is how do you generate more revenue And all the legal stuff and the mainframes. And integrating the functions that really matter. and they expect 1500, 1700 by the end of the year. So is that an indicator as to Amazon's leading this race We control the market we'll sell more. on the scale of comprehensiveness. There's new stuff. But serverless they only support an OJS. So the question I have for you is. You need fewer services but you need to be real time. Smarter integration to do that. You don't have a lot of room to put a 100 servers. How do customers figure it out? before I start to invest in that cloud providers strategy? So the point again, there's a lot of legacy to the extent that you used it. compared to the supermarket. that they are starting to compete with. Azure is copying Amazon. They already have the foot in the door for Office 365. So the guys owning ITs, it's easier With developers now at the driver's seat, Because they like to build stuff. And that sort of the thing we are trying to provide to them. I know you guys have worked hard.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Susan WojcickiPERSON

0.99+

Dave VellantePERSON

0.99+

Lisa MartinPERSON

0.99+

JimPERSON

0.99+

JasonPERSON

0.99+

Tara HernandezPERSON

0.99+

David FloyerPERSON

0.99+

DavePERSON

0.99+

Lena SmartPERSON

0.99+

John TroyerPERSON

0.99+

Mark PorterPERSON

0.99+

MellanoxORGANIZATION

0.99+

Kevin DeierlingPERSON

0.99+

Marty LansPERSON

0.99+

TaraPERSON

0.99+

JohnPERSON

0.99+

AWSORGANIZATION

0.99+

Jim JacksonPERSON

0.99+

Jason NewtonPERSON

0.99+

IBMORGANIZATION

0.99+

Daniel HernandezPERSON

0.99+

Dave WinokurPERSON

0.99+

DanielPERSON

0.99+

LenaPERSON

0.99+

Meg WhitmanPERSON

0.99+

TelcoORGANIZATION

0.99+

Julie SweetPERSON

0.99+

MartyPERSON

0.99+

Yaron HavivPERSON

0.99+

AmazonORGANIZATION

0.99+

Western DigitalORGANIZATION

0.99+

Kayla NelsonPERSON

0.99+

Mike PiechPERSON

0.99+

JeffPERSON

0.99+

Dave VolantePERSON

0.99+

John WallsPERSON

0.99+

Keith TownsendPERSON

0.99+

fiveQUANTITY

0.99+

IrelandLOCATION

0.99+

AntonioPERSON

0.99+

Daniel LauryPERSON

0.99+

Jeff FrickPERSON

0.99+

MicrosoftORGANIZATION

0.99+

sixQUANTITY

0.99+

Todd KerryPERSON

0.99+

John FurrierPERSON

0.99+

$20QUANTITY

0.99+

MikePERSON

0.99+

January 30thDATE

0.99+

MegPERSON

0.99+

Mark LittlePERSON

0.99+

Luke CerneyPERSON

0.99+

PeterPERSON

0.99+

Jeff BasilPERSON

0.99+

Stu MinimanPERSON

0.99+

DanPERSON

0.99+

10QUANTITY

0.99+

AllanPERSON

0.99+

40 gigQUANTITY

0.99+

Michael Weiss & Shere Saidon, NASDAQ | PentahoWorld 2017


 

>> Narrator: Live from Orlando, Florida, it's theCube covering PentahoWorld 2017 brought to you by Hitachi Ventara. >> Welcome back to theCube's live coverage of PentahoWorld brought to you by Hitachi Ventara. My name is Rebecca Knight, I'm your host along with my co-host, Dave Vellante. We're joined by Michael Weiss, he is the senior manager at NASDAQ, and Shere Saidon, who is analytics manager at NASDAQ. Thanks so much for coming back to theCube, I should say, you're Cube veterans now. >> We are, at least I am. This is his first year, this is his first time at PentahoWorld. So, excited to bring him along. >> Okay so you're a newbie but you're a veteran so. (laughing) >> Great. So, tell us a little bit about what has changed since the last time you came on, which was 2015, back then? >> So the biggest thing that's happened in the past 18 months is we've launched seven new exchanges. Integrated seven new exchanges. We bought the ISE, the International Stock Exchange, which is three options markets. We just completed that integration in August. We've also bought the Canadian, CHI-X, the Canadian Exchange, which also had three equities markets, so we integrated them, and we went live with a dark pool offering for Goldman back in June. So now we operate a dark pool for Goldman Sachs, and we're looking to kind of expand that offering at this point. >> So you're just getting bigger and bigger. So tell our viewers a little bit how Pentaho fits into this. >> So Pentaho is the engine that kind of does all our analytics behind the scenes at post trade, right. So we do a lot of traditionally TL, where we're doing batch processing. In the back-end we're doing a little bit more with the Hadoop ecosystem leveraging things like EMR, Spark, Presto, that type of stuff, And Pentaho kind of helps blend that stuff together a little bit. We use it for reporting, we do some of the BA, we're actually now looking to have the data Pentaho generates plug in a little bit of Tableau. So, we're looking to expand it and really leverage that data in other ways at this point. Even doing some things more externally, doing more data offerings via Pentaho externally. >> So I got to do a NASDAQ 101 for my 13 year-old. Came up to me the other day and said, "Daddy, what's the NASDAQ index and how does it work?" Well, give us a 20 second answer. >> Michael: On the NASDAQ index? >> Yeah, what's the NASDAQ Index and how does it work? >> Probably the wrong person to answer that one but, the index is generally just a blend of various stocks. So the S&P 500 is a blend of different stocks, much like that the cues, are NASDAQ's equivalent of the S&P, right, so, we use a different algorithm to determine the companies that make up that blend, but it's an index just like at the S&P. >> They're weighted by market cap- >> Michael: Right, yeah. >> And that determines the number at the end- >> Michael: Correct. >> And it goes up and down based on what the stock's index. >> Right, and that's how most people know NASDAQ, right. They see the S&P went up by 5 points, The Dow went down by 3 and the NASDAQ went up by a point, right. But most people don't realize that NASDAQ also operates 27 exchanges worldwide, I think it is now. So, probably a little bit more, maybe closer to 32, but... >> So you mentioned that you're doing a dark pool for Goldman >> Michael: Yes. >> So that's interesting. We were talking off camera about HFT and kind of the old days, and dark pools were criticized at the time. Now Goldman was one of the ones shown to be honest and above board, but what does that mean the dark pool for your business and how does that all tie in? >> Michael: So, dark pools are isolated markets, right, so they don't necessarily interact with the NASDAQ exchange themselves, it's all done within the pool. You interact with only people trading on that pool. What NASDAQ has done is we took our technology and we now host it for Goldman so, we have I-NETs our trading system, so we gave them I-NET, we built all the surrounding solutions, how you manage symbols, how you manage membership. Even the data, we curate their data in the AWS. We do some Pentaho transformations for them. We do some analytics for them. And that's actually going to start expanding, but yeah, we've provided them an entire solution, so now they don't have to manage their own dark pool. And now we're going to look to expand that to other potential clients. >> Dave: So that's NASDAQ as a technology >> Yes. >> Dave: Provider. Very interesting. So I was saying, earlier, the Hong Kong Stock Exchange is basically closing the facility where they house humans, again another example of machines replacing humans. So the joining, well NASDAQ, kind of, but NYSE, London Stock Exchange, Singapore, now Hong Kong... Essentially, electronic trading. So, brings us to the sort of technology underpinnings of NASDAQ. Shere, maybe you can talk a little bit about your role, and paint a picture of the technology infrastructure. >> Yeah so I focus primarily on the financial side of corporate finance. So we leverage Pentaho to do a lot of data integration, allow us to really answer our business questions. So, previously it would take days to put basic reporting together, now you've got it all automated, or we're working towards getting it mostly automated, and it just answer the questions that we need. And no longer use our gut to drive decisions, we're using hard data. And so that's helped us instrumentally in a lot of different places. >> Dave: So, talk more about the data pipeline, where the data's coming from, how you're blending it, and how you're bringing it through the pipeline and operationalizing it. >> Yeah, so we've got a lot of different billing systems, so we integrate companies, and historically we've let them keep their billings systems. So just kind of bring it all together into our core ERP, seeing how quantities...and just getting the data, and just figuring out on the basic side, how much do we make from a certain customer? What are we making from them? What happens in different scenarios if they consolidate, or if they default? And some of the pipeline there is just blending it all together, normalizing the data, making sure it's all in the same format, and then putting it in a format where our executives or business managers can actually make decisions off of it. >> Well you're talking about the decision making process, and you said it's no longer gut, you're using data to drive your decisions, to know which direction is the right direction. How big a change is that, just culturally speaking? How has that changed? >> Yeah, it's huge, at least on our side, it's making us a long more confident in the decisions we're making. We're no longer going in saying, hey this is probably how we should do it. No, the numbers are showing us that this is going to pay off, and we stick to it and look at the hard facts, rather than what do we think is going to happen? >> So, talk a little bit about what you guys are seeing here, and you're doing a lot of speaking here, we were joking earlier, you're kind of losing your voice. You're telling your story, what kind of reactions you getting? Share with us the behind the scenes at the conference. >> I think at this conference you're seeing a lot of people kind of fall in line with similar ideas that we're trying to get to. Taking advantage more instead of your traditional MPPs, or your traditional relational databases, moving more towards this Hadoop ecosystem. Leveraging Spark, Presto, Flume, all these various new technologies that have emerged over the past two to five years, and are now more viable than ever. They're easier to scale, if you look at your traditional MPPs, like we're a big Redshift user, but every time you scale it there's a cost with that, and we don't necessarily need to maintain all that data all the time, so something in the Hadoop ecosystem now lets us maintain that data without all the unnecessary cost. I see a lot of more of that than I did two years ago, a lot more people are following that trend. I think the other interesting trend I've seen this week is this idea of becoming more cloud agnostic. Where do you operate, and how do you store your data should be irrelevant to the data processing, and I think it's going to be a tough nut to crack for Pentaho, or any vendor. But if you can figure out a way to either do some type of cloud parity, where you have support across all your services, but you don't have to know which service you deploy to when you design your pipelines, I think that's going to be huge. I think we're a little ways from that, but that's been a common theme this week as well, both private and your big three cloud providers right now, your Googles, your Azures, and your AWS. >> So when I asked you said cloud agnostic, that's great, good vision and aspiration. The follow up would be, am I correct that you don't see it as data location agnostic, right, you want to bring the cloud model to your data, versus try to force your data into a cloud? Or not necessarily? >> A lot of it I think is being driven by not wanting to be vendor locked in, so they want to have the ability to, and I think this is easier said than done, the ability to move your data to different cloud providers based on pricing or offerings, right, and right now going from AWS to Google to Azure would be a very painful process. So you move petabytes of data across, it's not cost efficient and all the savings you want to realize by moving to maybe a Google in the future, are not going to be realized cause of all the effort it's going to take to get there. >> Dave: We had CERN on earlier, and they were working on that problem... >> Yeah, it's not a trivial problem to solve, but if you can crack that, and you can then say hey I wanna...even if I have a service offering, Like our operating a dark pool for Goldman. We also have a market tech side, where we sell our trading platform and various solutions to other exchanges worldwide. If we can come up with a way to be able to deploy to any cloud provider, even on an on-prem cloud, without having to do a bunch of customizations each time, that would be huge, it would revolutionize what we do. We're, as our own company, starting to look at that, and talking with Pentaho, they're also... are going to eye that as a potential way to go, with abstractions and things like that, but it's going to take some time. >> We're you guys here yesterday for the keynotes? >> Michael: Saw some of the keynotes, yes. >> The big messaging, like every conference that you go to, is be the disruptor, or you're going to get disrupted. We talked earlier off camera... Trading volumes are down, so the way you traditionally did business is changing, and made money is changing. >> Michael: Right. >> We talked earlier about you guys becoming a technology provider, I wonder if you could help us understand that a little bit, from the standpoint of NASDAQ strategy, when we hear your CEOs talk, real visionary, technology driven transformations. >> Yeah, I think Adena's coming in is definitely looking at that as a trend, right? Trading volumes are down, they've been going down, they've kind of stabilized a little bit, and we're stable able to make money in that space, but the problem is there's not a ton of growth. We acquire the ISE, we acquire the CHI-X, we're buying market share at that point. So you increase revenue, but you also increase overhead in that way. And you can only do so many major acquisitions at a time, you can only do how many one billion dollar acquisitions a year before you have to call it a day. And we can look at more strategic, smaller acquisitions for exchanges, but that doesn't necessarily bring you the transformation, the net revenue you're looking for. So what Adena has started to look at is, how do we transform to more of a technology company? We're really good at operating exchanges, how do we take that, and we already have market tech doing it, but how do we make that more scalable, not just to the financial sector, but to your other exchanges, your Ubers or your StubHubs of the world? How do you become a service provider, or a platform as a service for these other companies, to come in and use your tech? So we're looking at how do we rewrite our entire platform, from trading to the back-end, to do things like: Can we deploy to any cloud provider? Can we deploy on-prem? Can we be a little bit more technology agnostic so to speak, and offer these as services, and offer a bunch of microservices, so that if a startup comes up and wants to set up an exchange, they can do it, they can leverage our services, then build whatever other applications they want on top of it. I think that's a transformation we need to go through, I think it's good vision, and I'm looking forward to executing it. It's going to be a couple years before we see the fruits of that labor, but Adena's really doing a great job of coming in, and really driving that innovation, and Brad Peterson as well, our CIO, has really been pushing this vision, and I think it's really going to work out for us, assuming we can execute. >> Well you know what's interesting about that, if I may, is financial services is usually so secretive about their technology, right? But your business, you guys are becoming a technology provider, so you got to face the world and start marketing your capabilities now, and opening about that. It's sort of an interesting change. >> I think you'll see that starting to become more of a thing over the next year or two, as we start actually looking to build out the platform and figure it out. We do market on the market tech side, I mean it's not a small business, but we're more strategic about who we market to, cause we're still targeting your financial exchanges, more internationally than in the U.S., but there's only so many of them, again you have to start looking at rebranding, rebuilding, and rethinking how we think about exchanges in general, and not thinking of them as just a financial thing. >> Well that's what I wanted to get into, because you're talking about this rebranding, and this rebuilding, this transformation, to the backdrop within an industry that is changing rapidly, and we have sort of the threat of legislative reform, perhaps some administrative reforms coming down all the time, so how do you manage that? I mean, those are a lot of pressures there, are you constantly trying to push the envelope right up until any changes take place? Or what would you say Shere and Michael? >> Probably again not the right person to ask about this, but we're definitely trying to stay on top of the cutting edge in innovation and the technologies out there that, whether it be Blockchain, or different types of technologies. I mean we're definitely trying to make sure we're investing in them, while maintaining our core businesses. >> Right, it's trying to find that balance right now of when to make the next step in the technology food chain, and when to balance that with regulatory obligations. And if you look at it, going back to the idea of being able to launch marketplaces, I think what you're ending up seeing over the coming years is your Ubers, your StubHubs, I think they're going to become more regulated at some level. And we're good at operating more regulated markets, so I think that's where we can kind of come in and play a role, and help wade through those regulations a little bit more, and help build software to adhere to those regulations. >> Since you brought up Blockchain, Jamie Dimon craps all over Blockchain, or you know, Bitcoin, and then clarifies his remarks, saying look, technology underneath is here to stay. Thoughts on Blockchain? Obviously Financial Services is looking at it very closely, doing some really advanced stuff, what can you tell us? >> Yeah, I think there's no argument that it's definitely an innovation and a disruptive technology. I think that it's definitely in it's early stages across the board, so we're investing in it where we can, and trying to keep a close eye on it. We think that there's a lot of potential in a lot of different applications. >> As the NASDAQ transforms its business, how does that effect the sort of back-end analytics activity and infrastructure? >> The data is just growing, that's like the biggest challenge we have now. Data that used to be done in Excel, it's just no longer an option, so now in order to get the insights that we used to get just from having a couple people doing Excel transformations, you need to now invest in the infrastructure in the back-end, and so there's a lot that needs to go into building out an infrastructure to be able to ingest the data, and then also having the UI on the front-end, so that the business can actually view it the way they want. >> So skills wise, how's that affecting who you guys are hiring and training? And how's that transformation going? >> Michael: I'll let you go first. >> I think there's definitely, data analytics is a hot field. It's very new, there's definitely a big skills gap in administrative work and in the analytics side. Usually you have people could perform analytical functions just by being administrative or operational, and now it's really, we're investing in analysts, and making sure that we have the right people in place to be able to do these transformations, or pull the data and get the answers that we need from them. >> I mean from the tech side, I think what you're seeing is where we traditionally would just plug a developer in there, whether a Java developer, or an ETL developer, I think what you're seeing now is we're looking to bring more of a business minded data analyst to the tech side, right? So we're looking to bring a data engineer, so to speak, more to the tech side. So we're not looking to hire a traditional four year Computer Science degree, or Software Engineering degree, you're looking for a different breed of person, cause quite honestly because you're traditional Java dev. or C++ developer, they're not skilled or geared towards data. And when we've tried to plug that paradigm in, it just doesn't really work, so we're looking now to hiring more of an analyst, but someone who's a little bit more techie as well. They still need to have those skills to do some level of coding, and what we are finding is that skill gap is still very much... There's a gap there. There's a huge gap. And I think it's closing, but- >> And as you have to fund those for the new areas, I presume, like many companies in your business, you're trying to move away from the sort of undifferentiated low-level infrastructure deployment hassles, and the IT labor costs there, especially as we move to the cloud, presumably, so is that shift palpable? I mean, can you see that going on? >> Yeah, I think we made a lot of progress over the past couple years in doing that. We do more one button deployments, where the operation cost is a lot lower, a lot more automation around alerting, around when things go wrong, so there's not necessarily a human being sitting there watching a computer. We've invested a lot in that area to kind of reduce the costs, and make the experience better for our end user. And even from a development side, the cost of a new application is a lot less every time you have to do a release. The question is, how do you balance that with the regulations, and make sure you still have a good process in place. The idea of putting single button deployments in place is a great one, but you still have to balance that with making sure that what you push to productions been tested, well defined, and it meets the need, and you're not just arbitrarily throwing things out there. So we're still trying to hit that balance a little bit, it's more on the back-end side. The trading system is not quite there for obvious reasons, we're way more protective of what goes out there, then surrounding it a lot of the times, but I can see a future where, again going back to this idea of transforming our business, where you can stand up and do an exchange with the click of a button. I think that's a trend we're looking at. >> Rebecca: It's not too far in the future. >> No, I don't think it is. >> Last question, Pentaho report card. What are they doing really well? What do you want to see them do better? >> I think they continue to focus in the right areas, focusing more on the data processing side, and with the big data technologies, trying to fill that gap in the big data, and be the layer that you don't have to tie yourself to ike vCloud Air or MapR, you can kind of be a little bit more plug and play. I think they still need to do some improvements on there visualizations in their front-ends. I think they've been so much more focused on the data processing, that part of it, that the visualization's kind of lacked behind, so I think they need to put a little more focus into that, but all in all, they're an A, and we've been extremely happy with them as a software provider. >> Great. >> Shere: I think the visualization part is the part that allows people to understand that value being created at Pentaho. So I think being able to maybe improve a little bit on the visualization could go a far way. >> Michael, Shere, it's been so much fun having you on theCube, and having this conversation, keep that bull market coming please, do whatever you can. >> We'll do our best. >> I'm Rebecca Knight. We are here at PentahoWorld, sponsored by Hitachi Vantara. For Dave Vellante, we will have more from theCube in just a little bit.

Published Date : Oct 27 2017

SUMMARY :

brought to you by Hitachi Ventara. brought to you by Hitachi Ventara. So, excited to bring him along. Okay so you're a newbie the last time you came on, So the biggest thing that's So you're just getting So Pentaho is the engine So I got to do a NASDAQ of the S&P, right, so, we use a different And it goes up and down and the NASDAQ went up by a point, right. kind of the old days, and dark pools so now they don't have to and paint a picture of the and it just answer the about the data pipeline, And some of the pipeline there is just and you said it's no longer gut, in the decisions we're making. scenes at the conference. and I think it's going to that you don't see it as the ability to move your data and they were working on that problem... but it's going to take some time. so the way you traditionally from the standpoint of NASDAQ strategy, We acquire the ISE, we acquire the CHI-X, so you got to face the world We do market on the market tech side, and the technologies I think they're going to become stuff, what can you tell us? across the board, so we're so that the business can actually and in the analytics side. I mean from the tech side, and make the experience Rebecca: It's not What do you want to see them do better? and be the layer that you don't have to So I think being able to having you on theCube, and For Dave Vellante, we will

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Michael WeissPERSON

0.99+

Rebecca KnightPERSON

0.99+

RebeccaPERSON

0.99+

Dave VellantePERSON

0.99+

MichaelPERSON

0.99+

DavePERSON

0.99+

NYSEORGANIZATION

0.99+

NASDAQORGANIZATION

0.99+

AugustDATE

0.99+

Jamie DimonPERSON

0.99+

JuneDATE

0.99+

AWSORGANIZATION

0.99+

London Stock ExchangeORGANIZATION

0.99+

GoldmanORGANIZATION

0.99+

GoogleORGANIZATION

0.99+

2015DATE

0.99+

ExcelTITLE

0.99+

SherePERSON

0.99+

Goldman SachsORGANIZATION

0.99+

Shere SaidonPERSON

0.99+

Hong Kong Stock ExchangeORGANIZATION

0.99+

20 secondQUANTITY

0.99+

GooglesORGANIZATION

0.99+

four yearQUANTITY

0.99+

27 exchangesQUANTITY

0.99+

Brad PetersonPERSON

0.99+

5 pointsQUANTITY

0.99+

UbersORGANIZATION

0.99+

AdenaORGANIZATION

0.99+

Orlando, FloridaLOCATION

0.99+

seven new exchangesQUANTITY

0.99+

PentahoORGANIZATION

0.99+

CERNORGANIZATION

0.99+

first yearQUANTITY

0.99+

yesterdayDATE

0.99+

International Stock ExchangeORGANIZATION

0.99+

three optionsQUANTITY

0.99+

two years agoDATE

0.99+

JavaTITLE

0.99+

first timeQUANTITY

0.98+

Hitachi VantaraORGANIZATION

0.98+

oneQUANTITY

0.98+

DavPERSON

0.98+

U.S.LOCATION

0.98+

a dayQUANTITY

0.98+

3QUANTITY

0.98+

this weekDATE

0.98+

bothQUANTITY

0.97+

each timeQUANTITY

0.97+

StubHubsORGANIZATION

0.97+

SparkORGANIZATION

0.97+

ISEORGANIZATION

0.97+

Hitachi VentaraORGANIZATION

0.97+

Aaron Kalb, Alation | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's the Cube. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we are here live in New York City, in Manhattan for BigData NYC, our event we've been doing for five years in conjunction with Strata Data which is formerly Strata Hadoop, which was formerly Strata Conference, formerly Hadoop World. We've been covering the big data space going on ten years now. This is the Cube. I'm here with Aaron Kalb, whose Head of Product and co-founder at Alation. Welcome to the cube. >> Aaron Kalb: Thank you so much for having me. >> Great to have you on, so co-founder head of product, love these conversations because you're also co-founder, so it's your company, you got a lot of equity interest in that, but also head of product you get to have the 20 mile stare, on what the future looks, while inventing it today, bringing it to market. So you guys have an interesting take on the collaboration of data. Talk about what the means, what's the motivation behind that positioning, what's the core thesis around Alation? >> Totally so the thing we've observed is a lot of people working in the data space, are concerned about the data itself. How can we make it cheaper to store, faster to process. And we're really concerned with the human side of it. Data's only valuable if it's used by people, how do we help people find the data, understand the data, trust in the data, and that involves a mix of algorithmic approaches and also human collaboration, both human to human and human to computer to get that all organized. >> John Furrier: It's interesting you have a symbolics background from Stanford, worked at Apple, involved in Siri, all this kind of futuristic stuff. You can't go a day without hearing about Alexia is going to have voice-activated, you've got Siri. AI is taking a really big part of this. Obviously all of the hype right now, but what it means is the software is going to play a key role as an interface. And this symbolic systems almost brings on this neural network kind of vibe, where objects, data, plays a critical role. >> Oh, absolutely, yeah, and in the early days when we were co-founding the company, we talked about what is Siri for the enterprise? Right, I was you know very excited to work on Siri, and it's really a kind of fun gimmick, and it's really useful when you're in the car, your hands are covered in cookie dough, but if you could answer questions like what was revenue last quarter in the UK and get the right answer fast, and have that dialogue, oh do you mean fiscal quarter or calendar quarter. Do you mean UK including Ireland, or whatever it is. That would really enable better decisions and a better outcome. >> I was worried that Siri might do something here. Hey Siri, oh there it is, okay be careful, I don't want it to answer and take over my job. >> (laughs) >> Automation will take away the job, maybe Siri will be doing interviews. Okay let's take a step back. You guys are doing well as a start up, you've got some great funding, great investors. How are you guys doing on the product? Give us a quick highlight on where you guys are, obviously this is BigData NYC a lot going on, it's Manhattan, you've got financial services, big industry here. You've got the Strata Data event which is the classic Hadoop industry that's morphed into data. Which really is overlapping with cloud, IoTs application developments all kind of coming together. How do you guys fit into that world? >> Yeah, absolutely, so the idea of the data lake is kind of interesting. Psychologically it's sort of a hoarder mentality, oh everything I've ever had I want to keep in the attic, because I might need it one day. Great opportunity to evolve these new streams of data, with IoT and what not, but just cause you can get to it physically doesn't mean it's easy to find the thing you want, the needle in all that big haystack and to distinguish from among all the different assets that are available, which is the one that is actually trustworthy for your need. So we find that all these trends make the need for a catalog to kind of organize that information and get what you want all the more valuable. >> This has come up a lot, I want to get into the integration piece and how you're dealing with your partnerships, but the data lake integration has been huge, and having the catalog has come up with, has been the buzz. Foundationally if you will saying catalog is important. Why is it important to do the catalog work up front, with a lot of the data strategies? >> It's a great question, so, we see data cataloging as step zero. Before you can prep the data in a tool like Trifacta, PACSAT, or Kylo. Before you can visualize it in a tool like Tableau, or MicroStrategy. Before you can do some sort of cool prediction of what's going to happen in the future, with a data science engine, before any of that. These are all garbage in garbage out processes. The step zero is find the relevant data. Understand it so you can get it in the right format. Trust that it's good and then you can do whatever comes next >> And governance has become a key thing here, we've heard of the regulations, GDPR outside of the United States, but also that's going to have an arms length reach over into the United States impact. So these little decisions, and there's going to be an Equifax someday out there. Another one's probably going to come around the corner. How does the policy injection change the catalog equation? A lot of people are building machine learning algorithms on top of catalogs, and they're worried they might have to rewrite everything. How do you balance the trade off between good catalog design and flexibility on the algorithm side? >> Totally yes it's a complicated thing with governance and consumption right. There's people who are concerned with keeping the data safe, and there are people concerned with turning that data into real value, and these can seem to be at odds. What we find is actually a catalog as a foundation for both, and they are not as opposed as they seem. What Alation fundamentally does is we make a map of where the data is, who's using what data, when, how. And that can actually be helpful if your goal is to say let's follow in the footsteps of the best analyst and make more insights generated or if you want to say, hey this data is being used a lot, let's make sure it's being used correctly. >> And by the right people. >> And by the right people exactly >> Equifax they were fishing that pond dry months, months before it actually happened. With good tools like this they might have seen this right? Am I getting it right? >> That's exactly right, how can you observe what's going on to make sure it's compliant and that the answers are correct and that it's happening quickly and driving results. >> So in a way you're taking the collective intelligence of the user behavior and using that into understanding what to do with the data modeling? >> That's exactly right. We want to make each person in your organization as knowledgeable as all of their peers combined. >> So the benefit then for the customer would be if you see something that's developing you can double down on it. And if the users are using a lot of data, then you can provision more technology, more software. >> Absolutely, absolutely. It's sort of like when I was going to Stanford, there was a place where the grass was all dead, because people were riding their bikes diagonally across it. And then somebody smart was like, we're going to put a real gravel path there. So the infrastructure should follow the usage, instead of being something you try to enforce on people. >> It's a classic design meme that goes around. Good design is here, the more effective design is the path. >> Exactly. >> So let's get into the integration. So one of the hot topics here this year obviously besides cloud and AI, with cloud really being more the driver, the tailwind for the growth, AI being more the futuristic head room, is integration. You guys have some partnerships that you announced with integration, what are some of the key ones, and why are they important? >> Absolutely, so, there have been attempts in the past to centralize all the data in one place have one warehouse or one lake have one BI tool. And those generally fail, for different reasons, different teams pick different stacks that work for them. What we think is important is the single source of reference One hub with spokes out to all those different points. If you think about it it's like Google, it's one index of the whole web even though the web is distributed all over the place. To make that happen it's very important that we have partnerships to get data in from various sources. So we have partnerships with database vendors, with Cloudera and Hortonworks, with different BI tools. What's new are a few things. One is with Cloudera Navigator, they have great technical metadata around security and lineage over HGFS, and that's a way to bolster our catalog to go even deeper into what's happening in the files before things get surfaced and higher for places where we have a deeper offering today. >> So it's almost a connector to them in a way, you kind of share data. >> That's exactly right, we've a lot of different connectors, this is one new one that we have. Another, go ahead. >> I was going to go ahead continue. >> I was just going to say another place that is exciting is data prep tools, so Trifacta and Paxata are both places where you can find and understand an alation and then begin to manipulate in those tools. We announced with Paxata yesterday, the ability to click to profile, so if you want to actually see what's in some raw compressed avro file, you can see that in one click. >> It's interesting, Paxata has really been almost lapping, Trifacta because they were the leader in my mind, but now you've got like a Nascar race going on between the two firms, because data wrangling is a huge issue. Data prep is where everyone is stuck right now, they just want to do the data science, it's interesting. >> They are both amazing companies and I'm happy to partner with both. And actually Trifacta and Alation have a lot of joint customers we're psyched to work with as well. I think what's interesting is that data prep, and this is beginning to happen with analyst definitions of that field. It isn't just preparing the data to be used, getting it cleaned and shaped, it's also preparing the humans to use the data giving them the confidence, the tools, the knowledge to know how to manipulate it. >> And it's great progress. So the question I wanted to ask is now the other big trend here is, I mean it's kind of a subtext in this show, it's not really front and center but we've been seeing it kind of emerge as a concept, we see in the cloud world, on premise vs cloud. On premise a lot of people bring in the dev ops model in, and saying I may move to the cloud for bursting and some native applications, but at the end of the day there is a lot of work going on on premise. A lot of companies are kind of cleaning house, retooling, replatforming, whatever you want to do resetting. They are kind of getting their house in order to do on prem cloud ops, meaning a business model of cloud operations on site. A lot of people doing that, that will impact the story, it's going to impact some of the server modeling, that's a hot trend. How do you guys deal with the on premise cloud dynamic? >> Totally, so we just want to do what's right for the customer, so we deploy both on prem and in the cloud and then from wherever the Alation server is it will point to usually a mix of sources, some that are in the cloud like vetshifter S3 often with Amazon today, and also sources that are on prem. I do think I'm seeing a trend more and more toward the cloud and we have people that are migrating from HGFS to S3 is one thing we hear a lot about it. Strata with sort of dupe interest. But I think what's happening is people are realizing as each Equifax in turn happens, that this old wild west model of oh you surround your bank with people on horseback and it's physically in one place. With data it isn't like that, most people are saying I'd rather have the A+ teams at Salesforce or Amazon or Google be responsible for my security, then the people I can get over in the midwest. >> And the Paxata guys have loved the term Data Democracy, because that is really democratization, making the data free but also having the governance thing. So tell me about the Data Lake governance, because I've never loved the term Data Lake, I think it's more of a data ocean, but now you see data lake, data lake, data lake. Are they just silos of data lakes happening now? Are people trying to connect them? That's key, so that's been a key trend here. How do you handle the governance across multiple data lakes? >> That's right so the key is to have that single source of reference, so that regardless of which lake or warehouse, or little siloed Sequel server somewhere, that you can search in a single portal and find that thing no matter where it is. >> John: Can you guys do that? >> We can do that, yeah, I think the metaphor for people who haven't seen it really is Google, if you think about it, you don't even know what physical server a webpage is hosted from. >> Data lakes should just be invisible >> Exactly. >> So your interfacing with multiple data lakes, that's a value proposition for you. >> That's right so it could be on prem or in the cloud, multi-cloud. >> Can you share an example of a customer that uses that and kind of how it's laid out? >> Absolutely, so one great example of an interesting data environment is eBay. They have the biggest teradata warehouse in the world. They also have I believe two huge data lakes, they have hive on top of that, and Presto is used to sort of virtualize it across a mixture of teradata, and hive and then direct Presto query It gets very complicated, and they have, they are a very data driven organization, so they have people who are product owners who are in jobs where data isn't in their job title and they know how to look at excel and look at numbers and make choices, but they aren't real data people. Alation provides that accessibility so that they can understand it. >> We used to call the Hadoop world the car show for the data world, where for a long time it was about the engine what was doing what, and then it became, what's the car, and now how's it drive. Seeing that same evolution now where all that stuff has to get done under the hood. >> Aaron: Exactly. >> But there are still people who care about that, right. They are the mechanics, they are the plumbers, whatever you want to call them, but then the data science are the guys really driving things and now end users potentially, and even applications bots or what nots. It seems to evolve, that's where we're kind of seeing the show change a little bit, and that's kind of where you see some of the AI things. I want to get your thoughts on how you or your guys are using AI, how you see AI, if it's AI at all if it's just machine learning as a baby step into AI, we all know what AI could be, but it's really just machine learning now. How do you guys use quote AI and how has it evolved? >> It's a really insightful question and a great metaphor that I love. If you think about it, it used to be how do you build the car, and now I can drive the car even though I couldn't build it or even fix it, and soon I don't even have to drive the car, the car will just drive me, all I have to know is where I want to go. That's sortof the progression that we see as well. There's a lot of talk about deep learning, all these different approaches, and it's super interesting and exciting. But I think even more interesting than the algorithms are the applications. And so for us it's like today how do we get that turn by turn directions where we say turn left at the light if you want to get there And eventually you know maybe the computer can do it for you The thing that is also interesting is to make these algorithms work no matter how good your algorithm is it's all based on the quality of your training data. >> John: Which is a historical data. Historical data in essence the more historical data you have you need that to train the data. >> Exactly right, and we call this behavior IO how do we look at all the prior human behavior to drive better behavior in the future. And I think the key for us is we don't want to have a bunch of unpaid >> John: You can actually get that URL behavioral IO. >> We should do it before it's too late (Both laugh) >> We're live right now, go register that Patrick. >> Yeah so the goal is we don't want to have a bunch of unpaid interns trying to manually attack things, that's error prone and that's slow. I look at things like Luis von Ahn over at CMU, he does a thing where as you're writing in a CAPTCHA to get an email account you're also helping Google recognize a hard to read address or a piece of text from books. >> John: If you shoot the arrow forward, you just take this kind of forward, you almost think augmented reality is a pretext to what we might see for what you're talking about and ultimately VR are you seeing some of the use cases for virtual reality be very enterprise oriented or even end consumer. I mean Tom Brady the best quarterback of all time, he uses virtual reality to play the offense virtually before every game, he's a power user, in pharma you see them using virtual reality to do data mining without being in the lab, so lab tests. So you're seeing augmentation coming in to this turn by turn direction analogy. >> It's exactly, I think it's the other half of it. So we use AI, we use techniques to get great data from people and then we do extra work watching their behavior to learn what's right. And to figure out if there are recommendations, but then you serve those recommendations, either it's Google glasses it appears right there in your field of view. We just have to figure out how do we make sure, that in a moment of you're making a dashboard, or you're making a choice that you have that information right on hand. >> So since you're a technical geek, and a lot of folks would love to talk about this, so I'll ask you a tough question cause this is something everyone is trying to chase for the holy grail. How do you get the right piece of data at the right place at the right time, given that you have all these legacy silos, latencies and network issues as well, so you've got a data warehouse, you've got stuff in cold storage, and I've got an app and I'm doing something, there could be any points of data in the world that could be in milliseconds potentially on my phone or in my device my internet of thing wearable. How do you make that happen? Because that's the struggle, at the same time keep all the compliance and all the overhead involved, is it more compute, is it an architectural challenge how do you view that because this is the big challenge of our time. >> Yeah again I actually think it's the human challenge more than the technology challenge. It is true that there is data all over the place kind of gathering dust, but again if you think about Google, billions of web pages, I only care about the one I'm about to use. So for us it's really about being in that moment of writing a query, building a chart, how do we say in that moment, hey you're using an out of date definition of profit. Or hey the database you chose to use, the one thing you chose out of the millions that is actually is broken and stale. And we have interventions to do that with our partners and through our own first party apps that actually change how decisions get made at companies. >> So to make that happen, if I imagine it, you'd have to need access to the data, and then write software that is contextually aware to then run, compute, in context to the user interaction. >> It's exactly right, back to the turn by turn directions concept you have to know both where you're trying to go and where you are. And so for us that can be the from where I'm writing a Sequel statement after join we can suggest the table most commonly joined with that, but also overlay onto that the fact that the most commonly joined table was deprecated by a data steward data curator. So that's the moment that we can change the behavior from bad to good. >> So a chief data officer out there, we've got to wrap up, but I wanted to ask one final question, There's a chief data officer out there they might be empowered or they might be just a CFO assistant that's managing compliance, either way, someone's going to be empowered in an organization to drive data science and data value forward because there is so much proof that data science works. From military to play you're seeing examples where being data driven actually has benefits. So everyone is trying to get there. How do you explain the vision of Alation to that prospect? Because they have so much to select from, there's so much noise, there's like, we call it the tool shed out there, there's like a zillion tools out there there's like a zillion platforms, some tools are trying to turn into something else, a hammer is trying to be a lawnmower. So they've got to be careful on who the select, so what's the vision of Alation to that chief data officer, or that person in charge of analytics to scale operational analytics. >> Absolutely so we say to the CDO we have a shared vision for this place where your company is making decisions based on data, instead of based on gut, or expensive consultants months too late. And the way we get there, the reason Alation adds value is, we're sort of the last tool you have to buy, because with this lake mentality, you've got your tool shed with all the tools, you've got your library with all the books, but they're just in a pile on the floor, if you had a tool that had everything organized, so you just said hey robot, I need an hammer and this size nail and this text book on this set of information and it could just come to you, and it would be correct and it would be quick, then you could actually get value out of all the expense you've already put in this infrastructure, that's especially true on the lake. >> And also tools describe the way the works done so in that model tools can be in the tool shed no one needs to know it's in there. >> Aaron: Exactly. >> You guys can help scale that. Well congratulations and just how far along are you guys in terms of number of employees, how many customers do you have? If you can share that, I don't know if that's confidential or what not >> Absolutely, so we're small but growing very fast planning to double in the next year, and in terms of customers, we've got 85 customers including some really big names. I mentioned eBay, Pfizer, Safeway Albertsons, Tesco, Meijer. >> And what are they saying to you guys, why are they buying, why are they happy? >> They share that same vision of a more data driven enterprise, where humans are empowered to find out, understand, and trust data to make more informed choices for the business, and that's why they come and come back. >> And that's the product roadmap, ethos, for you guys that's the guiding principle? >> Yeah the ultimate goal is to empower humans with information. >> Alright Aaron thanks for coming on the Cube. Aaron Kalb, co-founder head of product for Alation here in New York City for BigData NYC and also Strata Data I'm John Furrier thanks for watching. We'll be right back with more after this short break.

Published Date : Sep 28 2017

SUMMARY :

Brought to you by This is the Cube. Great to have you on, so co-founder head of product, Totally so the thing we've observed is a lot Obviously all of the hype right now, and get the right answer fast, and have that dialogue, I don't want it to answer and take over my job. How are you guys doing on the product? doesn't mean it's easy to find the thing you want, and having the catalog has come up with, has been the buzz. Understand it so you can get it in the right format. and flexibility on the algorithm side? and make more insights generated or if you want to say, Am I getting it right? That's exactly right, how can you observe what's going on We want to make each person in your organization So the benefit then for the customer would be So the infrastructure should follow the usage, Good design is here, the more effective design is the path. You guys have some partnerships that you announced it's one index of the whole web So it's almost a connector to them in a way, this is one new one that we have. the ability to click to profile, going on between the two firms, It isn't just preparing the data to be used, but at the end of the day there is a lot of work for the customer, so we deploy both on prem and in the cloud because that is really democratization, making the data free That's right so the key is to have that single source really is Google, if you think about it, So your interfacing with multiple data lakes, on prem or in the cloud, multi-cloud. They have the biggest teradata warehouse in the world. the car show for the data world, where for a long time and that's kind of where you see some of the AI things. and now I can drive the car even though I couldn't build it Historical data in essence the more historical data you have to drive better behavior in the future. Yeah so the goal is and ultimately VR are you seeing some of the use cases but then you serve those recommendations, and all the overhead involved, is it more compute, the one thing you chose out of the millions So to make that happen, if I imagine it, back to the turn by turn directions concept you have to know How do you explain the vision of Alation to that prospect? And the way we get there, no one needs to know it's in there. If you can share that, I don't know if that's confidential planning to double in the next year, for the business, and that's why they come and come back. Yeah the ultimate goal is Alright Aaron thanks for coming on the Cube.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Luis von AhnPERSON

0.99+

eBayORGANIZATION

0.99+

Aaron KalbPERSON

0.99+

PfizerORGANIZATION

0.99+

JohnPERSON

0.99+

AaronPERSON

0.99+

TescoORGANIZATION

0.99+

John FurrierPERSON

0.99+

Safeway AlbertsonsORGANIZATION

0.99+

SiriTITLE

0.99+

GoogleORGANIZATION

0.99+

AmazonORGANIZATION

0.99+

New York CityLOCATION

0.99+

UKLOCATION

0.99+

20 mileQUANTITY

0.99+

HortonworksORGANIZATION

0.99+

BigDataORGANIZATION

0.99+

five yearsQUANTITY

0.99+

EquifaxORGANIZATION

0.99+

two firmsQUANTITY

0.99+

AppleORGANIZATION

0.99+

MeijerORGANIZATION

0.99+

ten yearsQUANTITY

0.99+

ClouderaORGANIZATION

0.99+

TrifactaORGANIZATION

0.99+

85 customersQUANTITY

0.99+

AlationORGANIZATION

0.99+

PatrickPERSON

0.99+

bothQUANTITY

0.99+

Strata DataORGANIZATION

0.99+

millionsQUANTITY

0.99+

United StatesLOCATION

0.99+

PaxataORGANIZATION

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

excelTITLE

0.99+

ManhattanLOCATION

0.99+

last quarterDATE

0.99+

IrelandLOCATION

0.99+

GDPRTITLE

0.99+

Tom BradyPERSON

0.99+

each personQUANTITY

0.99+

SalesforceORGANIZATION

0.98+

next yearDATE

0.98+

NYCLOCATION

0.98+

oneQUANTITY

0.98+

this yearDATE

0.98+

yesterdayDATE

0.98+

todayDATE

0.97+

one lakeQUANTITY

0.97+

NascarORGANIZATION

0.97+

one warehouseQUANTITY

0.97+

Strata DataEVENT

0.96+

TableauTITLE

0.96+

OneQUANTITY

0.96+

Both laughQUANTITY

0.96+

billions of web pagesQUANTITY

0.96+

single portalQUANTITY

0.95+

Fireside Chat with Andy Jassy, AWS CEO, at the AWS Summit SF 2017


 

>> Announcer: Please welcome Vice President of Worldwide Marketing, Amazon Web Services, Ariel Kelman. (applause) (techno music) >> Good afternoon, everyone. Thank you for coming. I hope you guys are having a great day here. It is my pleasure to introduce to come up on stage here, the CEO of Amazon Web Services, Andy Jassy. (applause) (techno music) >> Okay. Let's get started. I have a bunch of questions here for you, Andy. >> Just like one of our meetings, Ariel. >> Just like one of our meetings. So, I thought I'd start with a little bit of a state of the state on AWS. Can you give us your quick take? >> Yeah, well, first of all, thank you, everyone, for being here. We really appreciate it. We know how busy you guys are. So, hope you're having a good day. You know, the business is growing really quickly. In the last financials, we released, in Q four of '16, AWS is a 14 billion dollar revenue run rate business, growing 47% year over year. We have millions of active customers, and we consider an active customer as a non-Amazon entity that's used the platform in the last 30 days. And it's really a very broad, diverse customer set, in every imaginable size of customer and every imaginable vertical business segment. And I won't repeat all the customers that I know Werner went through earlier in the keynote, but here are just some of the more recent ones that you've seen, you know NELL is moving their their digital and their connected devices, meters, real estate to AWS. McDonalds is re-inventing their digital platform on top of AWS. FINRA is moving all in to AWS, yeah. You see at Reinvent, Workday announced AWS was its preferred cloud provider, and to start building on top of AWS further. Today, in press releases, you saw both Dunkin Donuts and Here, the geo-spatial map company announced they'd chosen AWS as their provider. You know and then I think if you look at our business, we have a really large non-US or global customer base and business that continues to expand very dramatically. And we're also aggressively increasing the number of geographic regions in which we have infrastructure. So last year in 2016, on top of the broad footprint we had, we added Korea, India, and Canada, and the UK. We've announced that we have regions coming, another one in China, in Ningxia, as well as in France, as well as in Sweden. So we're not close to being done expanding geographically. And then of course, we continue to iterate and innovate really quickly on behalf of all of you, of our customers. I mean, just last year alone, we launched what we considered over 1,000 significant services and features. So on average, our customers wake up every day and have three new capabilities they can choose to use or not use, but at their disposal. You've seen it already this year, if you look at Chime, which is our new unified communication service. It makes meetings much easier to conduct, be productive with. You saw Connect, which is our new global call center routing service. If you look even today, you look at Redshift Spectrum, which makes it easy to query all your data, not just locally on disk in your data warehouse but across all of S3, or DAX, which puts a cash in front of DynamoDB, we use the same interface, or all the new features in our machine learning services. We're not close to being done delivering and iterating on your behalf. And I think if you look at that collection of things, it's part of why, as Gartner looks out at the infrastructure space, they estimate the AWS is several times the size business of the next 14 providers combined. It's a pretty significant market segment leadership position. >> You talked a lot about adopts in there, a lot of customers moving to AWS, migrating large numbers of workloads, some going all in on AWS. And with that as kind of backdrop, do you still see a role for hybrid as being something that's important for customers? >> Yeah, it's funny. The quick answer is yes. I think the, you know, if you think about a few years ago, a lot of the rage was this debate about private cloud versus what people call public cloud. And we don't really see that debate very often anymore. I think relatively few companies have had success with private clouds, and most are pretty substantially moving in the direction of building on top of clouds like AWS. But, while you increasingly see more and more companies every month announcing that they're going all in to the cloud, we will see most enterprises operate in some form of hybrid mode for the next number of years. And I think in the early days of AWS and the cloud, I think people got confused about this, where they thought that they had to make this binary decision to either be all in on the public cloud and AWS or not at all. And of course that's not the case. It's not a binary decision. And what we know many of our enterprise customers want is they want to be able to run the data centers that they're not ready to retire yet as seamlessly as they can alongside of AWS. And it's why we've built a lot of the capabilities we've built the last several years. These are things like PPC, which is our virtual private cloud, which allows you to cordon off a portion of our network, deploy resources into it and connect to it through VPN or Direct Connect, which is a private connection between your data centers and our regions or our storage gateway, which is a virtual storage appliance, or Identity Federation, or a whole bunch of capabilities like that. But what we've seen, even though the vast majority of the big hybrid implementations today are built on top of AWS, as more and more of the mainstream enterprises are now at the point where they're really building substantial cloud adoption plans, they've come back to us and they've said, well, you know, actually you guys have made us make kind of a binary decision. And that's because the vast majority of the world is virtualized on top of VMWare. And because VMWare and AWS, prior to a few months ago, had really done nothing to try and make it easy to use the VMWare tools that people have been using for many years seamlessly with AWS, customers were having to make a binary choice. Either they stick with the VMWare tools they've used for a while but have a really tough time integrating with AWS, or they move to AWS and they have to leave behind the VMWare tools they've been using. And it really was the impetus for VMWare and AWS to have a number of deep conversations about it, which led to the announcement we made late last fall of VMWare and AWS, which is going to allow customers who have been using the VMWare tools to manage their infrastructure for a long time to seamlessly be able to run those on top of AWS. And they get to do so as they move workloads back and forth and they evolve their hybrid implementation without having to buy any new hardware, which is a big deal for companies. Very few companies are looking to find ways to buy more hardware these days. And customers have been very excited about this prospect. We've announced that it's going to be ready in the middle of this year. You see companies like Amadeus and Merck and Western Digital and the state of Louisiana, a number of others, we've a very large, private beta and preview happening right now. And people are pretty excited about that prospect. So we will allow customers to run in the mode that they want to run, and I think you'll see a huge transition over the next five to 10 years. >> So in addition to hybrid, another question we get a lot from enterprises around the concept of lock-in and how they should think about their relationship with the vendor and how they should think about whether to spread the workloads across multiple infrastructure providers. How do you think about that? >> Well, it's a question we get a lot. And Oracle has sure made people care about that issue. You know, I think people are very sensitive about being locked in, given the experience that they've had over the last 10 to 15 years. And I think the reality is when you look at the cloud, it really is nothing like being locked into something like Oracle. The APIs look pretty similar between the various providers. We build an open standard, it's like Linux and MySQL and Postgres. All the migration tools that we build allow you to migrate in or out of AWS. It's up to customers based on how they want to run their workload. So it is much easier to move away from something like the cloud than it is from some of the old software services that has created some of this phobia. But I think when you look at most CIOs, enterprise CIOs particularly, as they think about moving to the cloud, many of them started off thinking that they, you know, very well might split their workloads across multiple cloud providers. And I think when push comes to shove, very few decide to do so. Most predominately pick an infrastructure provider to run their workloads. And the reason that they don't split it across, you know, pretty evenly across clouds is a few reasons. Number one, if you do so, you have to standardize in the lowest common denominator. And these platforms are in radically different stages at this point. And if you look at something like AWS, it has a lot more functionality than anybody else by a large margin. And we're also iterating more quickly than you'll find from the other providers. And most folks don't want to tie the hands of their developers behind their backs in the name of having the ability of splitting it across multiple clouds, cause they actually are, in most of their spaces, competitive, and they have a lot of ideas that they want to actually build and invent on behalf of their customers. So, you know, they don't want to actually limit their functionality. It turns out the second reason is that they don't want to force their development teams to have to learn multiple platforms. And most development teams, if any of you have managed multiple stacks across different technologies, and many of us have had that experience, it's a pain in the butt. And trying to make a shift from what you've been doing for the last 30 years on premises to the cloud is hard enough. But then forcing teams to have to get good at running across two or three platforms is something most teams don't relish, and it's wasteful of people's time, it's wasteful of natural resources. That's the second thing. And then the third reason is that you effectively diminish your buying power because all of these cloud providers have volume discounts, and then you're splitting what you buy across multiple providers, which gives you a lower amount you buy from everybody at a worse price. So when most CIOs and enterprises look at this carefully, they don't actually end up splitting it relatively evenly. They predominately pick a cloud provider. Some will just pick one. Others will pick one and then do a little bit with a second, just so they know they can run with a second provider, in case that relationship with the one they choose to predominately run with goes sideways in some fashion. But when you really look at it, CIOs are not making that decision to split it up relatively evenly because it makes their development teams much less capable and much less agile. >> Okay, let's shift gears a little bit, talk about a subject that's on the minds of not just enterprises but startups and government organizations and pretty much every organization we talk to. And that's AI and machine learning. Reinvent, we introduced our Amazon AI services and just this morning Werner announced the general availability of Amazon Lex. So where are we overall on machine learning? >> Well it's a hugely exciting opportunity for customers, and I think, we believe it's exciting for us as well. And it's still in the relatively early stages, if you look at how people are using it, but it's something that we passionately believe is going to make a huge difference in the world and a huge difference with customers, and that we're investing a pretty gigantic amount of resource and capability for our customers. And I think the way that we think about, at a high level, the machine learning and deep learning spaces are, you know, there's kind of three macro layers of the stack. I think at that bottom layer, it's generally for the expert machine learning practitioners, of which there are relatively few in the world. It's a scarce resource relative to what I think will be the case in five, 10 years from now. And these are folks who are comfortable working with deep learning engines, know how to build models, know how to tune those models, know how to do inference, know how to get that data from the models into production apps. And for that group of people, if you look at the vast majority of machine learning and deep learning that's being done in the cloud today, it's being done on top of AWS, are P2 instances, which are optimized for deep learning and our deep learning AMIs, that package, effectively the deep learning engines and libraries inside those AMIs. And you see companies like Netflix, Nvidia, and Pinterest and Stanford and a whole bunch of others that are doing significant amounts of machine learning on top of those optimized instances for machine learning and the deep learning AMIs. And I think that you can expect, over time, that we'll continue to build additional capabilities and tools for those expert practitioners. I think we will support and do support every single one of the deep learning engines on top of AWS, and we have a significant amount of those workloads with all those engines running on top of AWS today. We also are making, I would say, a disproportionate investment of our own resources and the MXNet community just because if you look at running deep learning models once you get beyond a few GPUs, it's pretty difficult to have those scale as you get into the hundreds of GPUs. And most of the deep learning engines don't scale very well horizontally. And so what we've found through a lot of extensive testing, cause remember, Amazon has thousands of deep learning experts inside the company that have built very sophisticated deep learning capabilities, like the ones you see in Alexa, we have found that MXNet scales the best and almost linearly, as we continue to add nodes, as we continue to horizontally scale. So we have a lot of investment at that bottom layer of the stack. Now, if you think about most companies with developers, it's still largely inaccessible to them to do the type of machine learning and deep learning that they'd really like to do. And that's because the tools, I think, are still too primitive. And there's a number of services out there, we built one ourselves in Amazon Machine Learning that we have a lot of customers use, and yet I would argue that all of those services, including our own, are still more difficult than they should be for everyday developers to be able to build machine learning and access machine learning and deep learning. And if you look at the history of what AWS has done, in every part of our business, and a lot of what's driven us, is trying to democratize technologies that were really only available and accessible before to a select, small number of companies. And so we're doing a lot of work at what I would call that middle layer of the stack to get rid of a lot of the muck associated with having to do, you know, building the models, tuning the models, doing the inference, figuring how to get the data into production apps, a lot of those capabilities at that middle layer that we think are really essential to allow deep learning and machine learning to reach its full potential. And then at the top layer of the stack, we think of those as solutions. And those are things like, pass me an image and I'll tell you what that image is, or show me this face, does it match faces in this group of faces, or pass me a string of text and I'll give you an mpg file, or give me some words and what your intent is and then I'll be able to return answers that allow people to build conversational apps like the Lex technology. And we have a whole bunch of other services coming in that area, atop of Lex and Polly and Recognition, and you can imagine some of those that we've had to use in Amazon over the years that we'll continue to make available for you, our customers. So very significant level of investment at all three layers of that stack. We think it's relatively early days in the space but have a lot of passion and excitement for that. >> Okay, now for ML and AI, we're seeing customers wanting to load in tons of data, both to train the models and to actually process data once they've built their models. And then outside of ML and AI, we're seeing just as much demand to move in data for analytics and traditional workloads. So as people are looking to move more and more data to the cloud, how are we thinking about making it easier to get data in? >> It's a great question. And I think it's actually an often overlooked question because a lot of what gets attention with customers is all the really interesting services that allow you to do everything from compute and storage and database and messaging and analytics and machine learning and AI. But at the end of the day, if you have a significant amount of data already somewhere else, you have to get it into the cloud to be able to take advantage of all these capabilities that you don't have on premises. And so we have spent a disproportionate amount of focus over the last few years trying to build capabilities for our customers to make this easier. And we have a set of capabilities that really is not close to matched anywhere else, in part because we have so many customers who are asking for help in this area that it's, you know, that's really what drives what we build. So of course, you could use the good old-fashioned wire to send data over the internet. Increasingly, we find customers that are trying to move large amounts of data into S3, is using our S3 transfer acceleration service, which basically uses our points of presence, or POPs, all over the world to expedite delivery into S3. You know, a few years ago, we were talking to a number of companies that were looking to make big shifts to the cloud, and they said, well, I need to move lots of data that just isn't viable for me to move it over the wire, given the connection we can assign to it. It's why we built Snowball. And so we launched Snowball a couple years ago, which is really, it's a 50 terabyte appliance that is encrypted, the data's encrypted three different ways, and you ingest the data from your data center into Snowball, it has a Kindle connected to it, it allows you to, you know, that makes sure that you send it to the right place, and you can also track the progress of your high-speed ingestion into our data centers. And when we first launched Snowball, we launched it at Reinvent a couple years ago, I could not believe that we were going to order as many Snowballs to start with as the team wanted to order. And in fact, I reproached the team and I said, this is way too much, why don't we first see if people actually use any of these Snowballs. And so the team thankfully didn't listen very carefully to that, and they really only pared back a little bit. And then it turned out that we, almost from the get-go, had ordered 10X too few. And so this has been something that people have used in a very broad, pervasive way all over the world. And last year, at the beginning of the year, as we were asking people what else they would like us to build in Snowball, customers told us a few things that were pretty interesting to us. First, one that wasn't that surprising was they said, well, it would be great if they were bigger, you know, if instead of 50 terabytes it was more data I could store on each device. Then they said, you know, one of the problems is when I load the data onto a Snowball and send it to you, I have to still keep my local copy on premises until it's ingested, cause I can't risk losing that data. So they said it would be great if you could find a way to provide clustering, so that I don't have to keep that copy on premises. That was pretty interesting. And then they said, you know, there's some of that data that I'd actually like to be loading synchronously to S3, and then, or some things back from S3 to that data that I may want to compare against. That was interesting, having that endpoint. And then they said, well, we'd really love it if there was some compute on those Snowballs so I can do analytics on some relatively short-term signals that I want to take action on right away. Those were really the pieces of feedback that informed Snowball Edge, which is the next version of Snowball that we launched, announced at Reinvent this past November. So it has, it's a hundred-terabyte appliance, still the same level of encryption, and it has clustering so that you don't have to keep that copy of the data local. It allows you to have an endpoint to S3 to synchronously load data back and forth, and then it has a compute inside of it. And so it allows customers to use these on premises. I'll give you a good example. GE is using these for their wind turbines. And they collect all kinds of data from those turbines, but there's certain short-term signals they want to do analytics on in as close to real time as they can, and take action on those. And so they use that compute to do the analytics and then when they fill up that Snowball Edge, they detach it and send it back to AWS to do broad-scale analytics in the cloud and then just start using an additional Snowball Edge to capture that short-term data and be able to do those analytics. So Snowball Edge is, you know, we just launched it a couple months ago, again, amazed at the type of response, how many customers are starting to deploy those all over the place. I think if you have exabytes of data that you need to move, it's not so easy. An exabyte of data, if you wanted to move from on premises to AWS, would require 10,000 Snowball Edges. Those customers don't want to really manage a fleet of 10,000 Snowball Edges if they don't have to. And so, we tried to figure out how to solve that problem, and it's why we launched Snowmobile back at Reinvent in November, which effectively, it's a hundred-petabyte container on a 45-foot trailer that we will take a truck and bring out to your facility. It comes with its own power and its own network fiber that we plug in to your data center. And if you want to move an exabyte of data over a 10 gigabit per second connection, it would take you 26 years. But using 10 Snowmobiles, it would take you six months. So really different level of scale. And you'd be surprised how many companies have exabytes of data at this point that they want to move to the cloud to get all those analytics and machine learning capabilities running on top of them. Then for streaming data, as we have more and more companies that are doing real-time analytics of streaming data, we have Kinesis, where we built something called the Kinesis Firehose that makes it really simple to stream all your real-time data. We have a storage gateway for companies that want to keep certain data hot, locally, and then asynchronously be loading the rest of their data to AWS to be able to use in different formats, should they need it as backup or should they choose to make a transition. So it's a very broad set of storage capabilities. And then of course, if you've moved a lot of data into the cloud or into anything, you realize that one of the hardest parts that people often leave to the end is ETL. And so we have announced an ETL service called Glue, which we announced at Reinvent, which is going to make it much easier to move your data, be able to find your data and map your data to different locations and do ETL, which of course is hugely important as you're moving large amounts. >> So we've talked a lot about moving things to the cloud, moving applications, moving data. But let's shift gears a little bit and talk about something not on the cloud, connected devices. >> Yeah. >> Where do they fit in and how do you think about edge? >> Well, you know, I've been working on AWS since the start of AWS, and we've been in the market for a little over 11 years at this point. And we have encountered, as I'm sure all of you have, many buzzwords. And of all the buzzwords that everybody has talked about, I think I can make a pretty strong argument that the one that has delivered fastest on its promise has been IOT and connected devices. Just amazing to me how much is happening at the edge today and how fast that's changing with device manufacturers. And I think that if you look out 10 years from now, when you talk about hybrid, I think most companies, majority on premise piece of hybrid will not be servers, it will be connected devices. There are going to be billions of devices all over the place, in your home, in your office, in factories, in oil fields, in agricultural fields, on ships, in cars, in planes, everywhere. You're going to have these assets that sit at the edge that companies are going to want to be able to collect data on, do analytics on, and then take action. And if you think about it, most of these devices, by their very nature, have relatively little CPU and have relatively little disk, which makes the cloud disproportionately important for them to supplement them. It's why you see most of the big, successful IOT applications today are using AWS to supplement them. Illumina has hooked up their genome sequencing to AWS to do analytics, or you can look at Major League Baseball Statcast is an IOT application built on top of AWS, or John Deer has over 200,000 telematically enabled tractors that are collecting real-time planting conditions and information that they're doing analytics on and sending it back to farmers so they can figure out where and how to optimally plant. Tata Motors manages their truck fleet this way. Phillips has their smart lighting project. I mean, there're innumerable amounts of these IOT applications built on top of AWS where the cloud is supplementing the device's capability. But when you think about these becoming more mission-critical applications for companies, there are going to be certain functions and certain conditions by which they're not going to want to connect back to the cloud. They're not going to want to take the time for that round trip. They're not going to have connectivity in some cases to be able to make a round trip to the cloud. And what they really want is customers really want the same capabilities they have on AWS, with AWS IOT, but on the devices themselves. And if you've ever tried to develop on these embedded devices, it's not for mere mortals. It's pretty delicate and it's pretty scary and there's a lot of archaic protocols associated with it, pretty tough to do it all and to do it without taking down your application. And so what we did was we built something called Greengrass, and we announced it at Reinvent. And Greengrass is really like a software module that you can effectively have inside your device. And it allows developers to write lambda functions, it's got lambda inside of it, and it allows customers to write lambda functions, some of which they want to run in the cloud, some of which they want to run on the device itself through Greengrass. So they have a common programming model to build those functions, to take the signals they see and take the actions they want to take against that, which is really going to help, I think, across all these IOT devices to be able to be much more flexible and allow the devices and the analytics and the actions you take to be much smarter, more intelligent. It's also why we built Snowball Edge. Snowball Edge, if you think about it, is really a purpose-built Greengrass device. We have Greengrass, it's inside of the Snowball Edge, and you know, the GE wind turbine example is a good example of that. And so it's to us, I think it's the future of what the on-premises piece of hybrid's going to be. I think there're going to be billions of devices all over the place and people are going to want to interact with them with a common programming model like they use in AWS and the cloud, and we're continuing to invest very significantly to make that easier and easier for companies. >> We've talked about several feature directions. We talked about AI, machine learning, the edge. What are some of the other areas of investment that this group should care about? >> Well there's a lot. (laughs) That's not a suit question, Ariel. But there's a lot. I think, I'll name a few. I think first of all, as I alluded to earlier, we are not close to being done expanding geographically. I think virtually every tier-one country will have an AWS region over time. I think many of the emerging countries will as well. I think the database space is an area that is radically changing. It's happening at a faster pace than I think people sometimes realize. And I think it's good news for all of you. I think the database space over the last few decades has been a lonely place for customers. I think that they have felt particularly locked into companies that are expensive and proprietary and have high degrees of lock-in and aren't so customer-friendly. And I think customers are sick of it. And we have a relational database service that we launched many years ago and has many flavors that you can run. You can run MySQL, you can run Postgres, you can run MariaDB, you can run SQLServer, you can run Oracle. And what a lot of our customers kept saying to us was, could you please figure out a way to have a database capability that has the performance characteristics of the commercial-grade databases but the customer-friendly and pricing model of the more open engines like the MySQL and Postgres and MariaDB. What you do on your own, we do a lot of it at Amazon, but it's hard, I mean, it takes a lot of work and a lot of tuning. And our customers really wanted us to solve that problem for them. And it's why we spent several years building Aurora, which is our own database engine that we built, but that's fully compatible with MySQL and with Postgres. It's at least as fault tolerant and durable and performant as the commercial-grade databases, but it's a tenth of the cost of those. And it's also nice because if it turns out that you use Aurora and you decide for whatever reason you don't want to use Aurora anymore, because it's fully compatible with MySQL and Postgres, you just dump it to the community versions of those, and off you are. So there's really hardly any transition there. So that is the fastest-growing service in the history of AWS. I'm amazed at how quickly it's grown. I think you may have heard earlier, we've had 23,000 database migrations just in the last year or so. There's a lot of pent-up demand to have database freedom. And we're here to help you have it. You know, I think on the analytic side, it's just never been easier and less expensive to collect, store, analyze, and share data than it is today. Part of that has to do with the economics of the cloud. But a lot of it has to do with the really broad analytics capability that we provide you. And it's a much broader capability than you'll find elsewhere. And you know, you can manage Hadoop and Spark and Presto and Hive and Pig and Yarn on top of AWS, or we have a managed elastic search service, and you know, of course we have a very high scale, very high performing data warehouse in Redshift, that just got even more performant with Spectrum, which now can query across all of your S3 data, and of course you have Athena, where you can query S3 directly. We have a service that allows you to do real-time analytics of streaming data in Kinesis. We have a business intelligence service in QuickSight. We have a number of machine learning capabilities I talked about earlier. It's a very broad array. And what we find is that it's a new day in analytics for companies. A lot of the data that companies felt like they had to throw away before, either because it was too expensive to hold or they didn't really have the tools accessible to them to get the learning from that data, it's a totally different day today. And so we have a pretty big investment in that space, I mentioned Glue earlier to do ETL on all that data. We have a lot more coming in that space. I think compute, super interesting, you know, I think you will find, I think we will find that companies will use full instances for many, many years and we have, you know, more than double the number of instances than you'll find elsewhere in every imaginable shape and size. But I would also say that the trend we see is that more and more companies are using smaller units of compute, and it's why you see containers becoming so popular. We have a really big business in ECS. And we will continue to build out the capability there. We have companies really running virtually every type of container and orchestration and management service on top of AWS at this point. And then of course, a couple years ago, we pioneered the event-driven serverless capability in compute that we call Lambda, which I'm just again, blown away by how many customers are using that for everything, in every way. So I think the basic unit of compute is continuing to get smaller. I think that's really good for customers. I think the ability to be serverless is a very exciting proposition that we're continuing to to fulfill that vision that we laid out a couple years ago. And then, probably, the last thing I'd point out right now is, I think it's really interesting to see how the basic procurement of software is changing. In significant part driven by what we've been doing with our Marketplace. If you think about it, in the old world, if you were a company that was buying software, you'd have to go find bunch of the companies that you should consider, you'd have to have a lot of conversations, you'd have to talk to a lot of salespeople. Those companies, by the way, have to have a big sales team, an expensive marketing budget to go find those companies and then go sell those companies and then both companies engage in this long tap-dance around doing an agreement and the legal terms and the legal teams and it's just, the process is very arduous. Then after you buy it, you have to figure out how you're going to actually package it, how you're deploy to infrastructure and get it done, and it's just, I think in general, both consumers of software and sellers of software really don't like the process that's existed over the last few decades. And then you look at AWS Marketplace, and we have 35 hundred product listings in there from 12 hundred technology providers. If you look at the number of hours, that software that's been running EC2 just in the last month alone, it's several hundred million hours, EC2 hours, of that software being run on top of our Marketplace. And it's just completely changing how software is bought and procured. I think that if you talk to a lot of the big sellers of software, like Splunk or Trend Micro, there's a whole number of them, they'll tell you it totally changes their ability to be able to sell. You know, one of the things that really helped AWS in the early days and still continues to help us, is that we have a self-service model where we don't actually have to have a lot of people talk to every customer to get started. I think if you're a seller of software, that's very appealing, to allow people to find your software and be able to buy it. And if you're a consumer, to be able to buy it quickly, again, without the hassle of all those conversations and the overhead associated with that, very appealing. And I think it's why the marketplace has just exploded and taken off like it has. It's also really good, by the way, for systems integrators, who are often packaging things on top of that software to their clients. This makes it much easier to build kind of smaller catalogs of software products for their customers. I think when you layer on top of that the capabilities that we've announced to make it easier for SASS providers to meter and to do billing and to do identity is just, it's a very different world. And so I think that also is very exciting, both for companies and customers as well as software providers. >> We certainly touched on a lot here. And we have a lot going on, and you know, while we have customers asking us a lot about how they can use all these new services and new features, we also tend to get a lot of questions from customers on how we innovate so quickly, and they can think about applying some of those lessons learned to their own businesses. >> So you're asking how we're able to innovate quickly? >> Mmm hmm. >> I think there's a few things that have helped us, and it's different for every company. But some of these might be helpful. I'll point to a few. I think the first thing is, I think we disproportionately index on hiring builders. And we think of builders as people who are inventors, people who look at different customer experiences really critically, are honest about what's flawed about them, and then seek to reinvent them. And then people who understand that launch is the starting line and not the finish line. There's very little that any of us ever built that's a home run right out of the gate. And so most things that succeed take a lot of listening to customers and a lot of experimentation and a lot of iterating before you get to an equation that really works. So the first thing is who we hire. I think the second thing is how we organize. And we have, at Amazon, long tried to organize into as small and separable and autonomous teams as we can, that have all the resources in those teams to own their own destiny. And so for instance, the technologists and the product managers are part of the same team. And a lot of that is because we don't want the finger pointing that goes back and forth between the teams, and if they're on the same team, they focus all their energy on owning it together and understanding what customers need from them, spending a disproportionate amount of time with customers, and then they get to own their own roadmaps. One of the reasons we don't publish a 12 to 18 month roadmap is we want those teams to have the freedom, in talking to customers and listening to what you tell us matters, to re-prioritize if there are certain things that we assumed mattered more than it turns out it does. So, you know I think that the way that we organize is the second piece. I think a third piece is all of our teams get to use the same AWS building blocks that all of you get to use, which allow you to move much more quickly. And I think one of the least told stories about Amazon over the last five years, in part because people have gotten interested in AWS, is people have missed how fast our consumer business at Amazon has iterated. Look at the amount of invention in Amazon's consumer business. And they'll tell you that a big piece of that is their ability to use the AWS building blocks like they do. I think a fourth thing is many big companies, as they get larger, what starts to happen is what people call the institutional no, which is that leaders walk into meetings on new ideas looking to find ways to say no, and not because they're ill intended but just because they get more conservative or they have a lot on their plate or things are really managed very centrally, so it's hard to imagine adding more to what you're already doing. At Amazon, it's really the opposite, and in part because of the way we're organized in such a decoupled, decentralized fashion, and in part because it's just part of our DNA. When the leaders walk into a meeting, they are looking for ways to say yes. And we don't say yes to everything, we have a lot of proposals. But we say yes to a lot more than I think virtually any other company on the planet. And when we're having conversations with builders who are proposing new ideas, we're in a mode where we're trying to problem-solve with them to get to yes, which I think is really different. And then I think the last thing is that we have mechanisms inside the company that allow us to make fast decisions. And if you want a little bit more detail, you should read our founder and CEO Jeff Bezos's shareholder letter, which just was released. He talks about the fast decision-making that happens inside the company. It's really true. We make fast decisions and we're willing to fail. And you know, we sometimes talk about how we're working on several of our next biggest failures, and we hope that most of the things we're doing aren't going to fail, but we know, if you're going to push the envelope and if you're going to experiment at the rate that we're trying to experiment, to find more pillars that allow us to do more for customers and allow us to be more relevant, you are going to fail sometimes. And you have to accept that, and you have to have a way of evaluating people that recognizes the inputs, meaning the things that they actually delivered as opposed to the outputs, cause on new ventures, you don't know what the outputs are going to be, you don't know consumers or customers are going to respond to the new thing you're trying to build. So you have to be able to reward employees on the inputs, you have to have a way for them to continue to progress and grow in their career even if they work on something didn't work. And you have to have a way of thinking about, when things don't work, how do I take the technology that I built as part of that, that really actually does work, but I didn't get it right in the form factor, and use it for other things. And I think that when you think about a culture like Amazon, that disproportionately hires builders, organizes into these separable, autonomous teams, and allows them to use building blocks to move fast, and has a leadership team that's looking to say yes to ideas and is willing to fail, you end up finding not only do you do more inventing but you get the people at every level of the organization spending their free cycles thinking about new ideas because it actually pays to think of new ideas cause you get a shot to try it. And so that has really helped us and I think most of our customers who have made significant shifts to AWS and the cloud would argue that that's one of the big transformational things they've seen in their companies as well. >> Okay. I want to go a little bit deeper on the subject of culture. What are some of the things that are most unique about the AWS culture that companies should know about when they're looking to partner with us? >> Well, I think if you're making a decision on a predominant infrastructure provider, it's really important that you decide that the culture of the company you're going to partner with is a fit for yours. And you know, it's a super important decision that you don't want to have to redo multiple times cause it's wasted effort. And I think that, look, I've been at Amazon for almost 20 years at this point, so I have obviously drank the Kool Aid. But there are a few things that I think are truly unique about Amazon's culture. I'll talk about three of them. The first is I think that we are unusually customer-oriented. And I think a lot of companies talk about being customer-oriented, but few actually are. I think most of the big technology companies truthfully are competitor-focused. They kind of look at what competitors are doing and then they try to one-up one another. You have one or two of them that I would say are product-focused, where they say, hey, it's great, you Mr. and Mrs. Customer have ideas on a product, but leave that to the experts, and you know, you'll like the products we're going to build. And those strategies can be good ones and successful ones, they're just not ours. We are driven by what customers tell us matters to them. We don't build technology for technology's sake, we don't become, you know, smitten by any one technology. We're trying to solve real problems for our customers. 90% of what we build is driven by what you tell us matters. And the other 10% is listening to you, and even if you can't articulate exactly what you want, trying to read between the lines and invent on your behalf. So that's the first thing. Second thing is that we are pioneers. We really like to invent, as I was talking about earlier. And I think most big technology companies at this point have either lost their will or their DNA to invent. Most of them acquire it or fast follow. And again, that can be a successful strategy. It's just not ours. I think in this day and age, where we're going through as big a shift as we are in the cloud, which is the biggest technology shift in our lifetime, as dynamic as it is, being able to partner with a company that has the most functionality, it's iterating the fastest, has the most customers, has the largest ecosystem of partners, has SIs and ISPs, that has had a vision for how all these pieces fit together from the start, instead of trying to patch them together in a following act, you have a big advantage. I think that the third thing is that we're unusually long-term oriented. And I think that you won't ever see us show up at your door the last day of a quarter, the last day of a year, trying to harass you into doing some kind of deal with us, not to be heard from again for a couple years when we either audit you or try to re-up you for a deal. That's just not the way that we will ever operate. We are trying to build a business, a set of relationships, that will outlast all of us here. And I think something that always ties it together well is this trusted advisor capability that we have inside our support function, which is, you know, we look at dozens of programmatic ways that our customers are using the platform and reach out to you if you're doing something we think's suboptimal. And one of the things we do is if you're not fully utilizing resources, or hardly, or not using them at all, we'll reach out and say, hey, you should stop paying for this. And over the last couple of years, we've sent out a couple million of these notifications that have led to actual annualized savings for customers of 350 million dollars. So I ask you, how many of your technology partners reach out to you and say stop spending money with us? To the tune of 350 million dollars lost revenue per year. Not too many. And I think when we first started doing it, people though it was gimmicky, but if you understand what I just talked about with regard to our culture, it makes perfect sense. We don't want to make money from customers unless you're getting value. We want to reinvent an experience that we think has been broken for the prior few decades. And then we're trying to build a relationship with you that outlasts all of us, and we think the best way to do that is to provide value and do right by customers over a long period of time. >> Okay, keeping going on the culture subject, what about some of the quirky things about Amazon's culture that people might find interesting or useful? >> Well there are a lot of quirky parts to our culture. And I think any, you know lots of companies who have strong culture will argue they have quirky pieces but I think there's a few I might point to. You know, I think the first would be the first several years I was with the company, I guess the first six years or so I was at the company, like most companies, all the information that was presented was via PowerPoint. And we would find that it was a very inefficient way to consume information. You know, you were often shaded by the charisma of the presenter, sometimes you would overweight what the presenters said based on whether they were a good presenter. And vice versa. You would very rarely have a deep conversation, cause you have no room on PowerPoint slides to have any depth. You would interrupt the presenter constantly with questions that they hadn't really thought through cause they didn't think they were going to have to present that level of depth. You constantly have the, you know, you'd ask the question, oh, I'm going to get to that in five slides, you want to do that now or you want to do that in five slides, you know, it was just maddening. And we would often find that most of the meetings required multiple meetings. And so we made a decision as a company to effectively ban PowerPoints as a communication vehicle inside the company. Really the only time I do PowerPoints is at Reinvent. And maybe that shows. And what we found is that it's a much more substantive and effective and time-efficient way to have conversations because there is no way to fake depth in a six-page narrative. So what we went to from PowerPoint was six-page narrative. You can write, have as much as you want in the appendix, but you have to assume nobody will read the appendices. Everything you have to communicate has to be done in six pages. You can't fake depth in a six-page narrative. And so what we do is we all get to the room, we spend 20 minutes or so reading the document so it's fresh in everybody's head. And then where we start the conversation is a radically different spot than when you're hearing a presentation one kind of shallow slide at a time. We all start the conversation with a fair bit of depth on the topic, and we can really hone in on the three or four issues that typically matter in each of these conversations. So we get to the heart of the matter and we can have one meeting on the topic instead of three or four. So that has been really, I mean it's unusual and it takes some time getting used to but it is a much more effective way to pay attention to the detail and have a substantive conversation. You know, I think a second thing, if you look at our working backwards process, we don't write a lot of code for any of our services until we write and refine and decide we have crisp press release and frequently asked question, or FAQ, for that product. And in the press release, what we're trying to do is make sure that we're building a product that has benefits that will really matter. How many times have we all gotten to the end of products and by the time we get there, we kind of think about what we're launching and think, this is not that interesting. Like, people are not going to find this that compelling. And it's because you just haven't thought through and argued and debated and made sure that you drew the line in the right spot on a set of benefits that will really matter to customers. So that's why we use the press release. The FAQ is to really have the arguments up front about how you're building the product. So what technology are you using? What's the architecture? What's the customer experience? What's the UI look like? What's the pricing dimensions? Are you going to charge for it or not? All of those decisions, what are people going to be most excited about, what are people going to be most disappointed by. All those conversations, if you have them up front, even if it takes you a few times to go through it, you can just let the teams build, and you don't have to check in with them except on the dates. And so we find that if we take the time up front we not only get the products right more often but the teams also deliver much more quickly and with much less churn. And then the third thing I'd say that's kind of quirky is it is an unusually truth-seeking culture at Amazon. I think we have a leadership principle that we say have backbone, disagree, and commit. And what it means is that we really expect people to speak up if they believe that we're headed down a path that's wrong for customers, no matter who is advancing it, what level in the company, everybody is empowered and expected to speak up. And then once we have the debate, then we all have to pull the same way, even if it's a different way than you were advocating. And I think, you always hear the old adage of where, two people look at a ceiling and one person says it's 14 feet and the other person says, it's 10 feet, and they say, okay let's compromise, it's 12 feet. And of course, it's not 12 feet, there is an answer. And not all things that we all consider has that black and white answer, but most things have an answer that really is more right if you actually assess it and debate it. And so we have an environment that really empowers people to challenge one another and I think it's part of why we end up getting to better answers, cause we have that level of openness and rigor. >> Okay, well Andy, we have time for one more question. >> Okay. >> So other than some of the things you've talked about, like customer focus, innovation, and long-term orientation, what is the single most important lesson that you've learned that is really relevant to this audience and this time we're living in? >> There's a lot. But I'll pick one. I would say I'll tell a short story that I think captures it. In the early days at Amazon, our sole business was what we called an owned inventory retail business, which meant we bought the inventory from distributors or publishers or manufacturers, stored it in our own fulfillment centers and shipped it to customers. And around the year 1999 or 2000, this third party seller model started becoming very popular. You know, these were companies like Half.com and eBay and folks like that. And we had a really animated debate inside the company about whether we should allow third party sellers to sell on the Amazon site. And the concerns internally were, first of all, we just had this fundamental belief that other sellers weren't going to care as much about the customer experience as we did cause it was such a central part of everything we did DNA-wise. And then also we had this entire business and all this machinery that was built around owned inventory business, with all these relationships with publishers and distributors and manufacturers, who we didn't think would necessarily like third party sellers selling right alongside us having bought their products. And so we really debated this, and we ultimately decided that we were going to allow third party sellers to sell in our marketplace. And we made that decision in part because it was better for customers, it allowed them to have lower prices, so more price variety and better selection. But also in significant part because we realized you can't fight gravity. If something is going to happen, whether you want it to happen or not, it is going to happen. And you are much better off cannibalizing yourself or being ahead of whatever direction the world is headed than you are at howling at the wind or wishing it away or trying to put up blockers and find a way to delay moving to the model that is really most successful and has the most amount of benefits for the customers in question. And that turned out to be a really important lesson for Amazon as a company and for me, personally, as well. You know, in the early days of doing Marketplace, we had all kinds of folks, even after we made the decision, that despite the have backbone, disagree and commit weren't really sure that they believed that it was going to be a successful decision. And it took several months, but thankfully we really were vigilant about it, and today in roughly half of the units we sell in our retail business are third party seller units. Been really good for our customers. And really good for our business as well. And I think the same thing is really applicable to the space we're talking about today, to the cloud, as you think about this gigantic shift that's going on right now, moving to the cloud, which is, you know, I think in the early days of the cloud, the first, I'll call it six, seven, eight years, I think collectively we consumed so much energy with all these arguments about are people going to move to the cloud, what are they going to move to the cloud, will they move mission-critical applications to the cloud, will the enterprise adopt it, will public sector adopt it, what about private cloud, you know, we just consumed a huge amount of energy and it was, you can see both in the results in what's happening in businesses like ours, it was a form of fighting gravity. And today we don't really have if conversations anymore with our customers. They're all when and how and what order conversations. And I would say that this going to be a much better world for all of us, because we will be able to build in a much more cost effective fashion, we will be able to build much more quickly, we'll be able to take our scarce resource of engineers and not spend their resource on the undifferentiated heavy lifting of infrastructure and instead on what truly differentiates your business. And you'll have a global presence, so that you have lower latency and a better end user customer experience being deployed with your applications and infrastructure all over the world. And you'll be able to meet the data sovereignty requirements of various locales. So I think it's a great world that we're entering right now, I think we're at a time where there's a lot less confusion about where the world is headed, and I think it's an unprecedented opportunity for you to reinvent your businesses, reinvent your applications, and build capabilities for your customers and for your business that weren't easily possible before. And I hope you take advantage of it, and we'll be right here every step of the way to help you. Thank you very much. I appreciate it. (applause) >> Thank you, Andy. And thank you, everyone. I appreciate your time today. >> Thank you. (applause) (upbeat music)

Published Date : May 3 2017

SUMMARY :

of Worldwide Marketing, Amazon Web Services, Ariel Kelman. It is my pleasure to introduce to come up on stage here, I have a bunch of questions here for you, Andy. of a state of the state on AWS. And I think if you look at that collection of things, a lot of customers moving to AWS, And of course that's not the case. and how they should think about their relationship And I think the reality is when you look at the cloud, talk about a subject that's on the minds And I think that you can expect, over time, So as people are looking to move and it has clustering so that you don't and talk about something not on the cloud, And I think that if you look out 10 years from now, What are some of the other areas of investment and we have, you know, more than double and you know, while we have customers and listening to what you tell us matters, What are some of the things that are most unique And the other 10% is listening to you, And I think any, you know lots of companies moving to the cloud, which is, you know, And thank you, everyone. Thank you.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
AmadeusORGANIZATION

0.99+

AWSORGANIZATION

0.99+

Western DigitalORGANIZATION

0.99+

AndyPERSON

0.99+

NvidiaORGANIZATION

0.99+

AmazonORGANIZATION

0.99+

FranceLOCATION

0.99+

SwedenLOCATION

0.99+

NingxiaLOCATION

0.99+

ChinaLOCATION

0.99+

Andy JassyPERSON

0.99+

StanfordORGANIZATION

0.99+

six monthsQUANTITY

0.99+

Ariel KelmanPERSON

0.99+

Jeff BezosPERSON

0.99+

twoQUANTITY

0.99+

threeQUANTITY

0.99+

2000DATE

0.99+

OracleORGANIZATION

0.99+

12QUANTITY

0.99+

26 yearsQUANTITY

0.99+

20 minutesQUANTITY

0.99+

ArielPERSON

0.99+

two peopleQUANTITY

0.99+

10 feetQUANTITY

0.99+

six pagesQUANTITY

0.99+

90%QUANTITY

0.99+

GEORGANIZATION

0.99+

six-pageQUANTITY

0.99+

second pieceQUANTITY

0.99+

last yearDATE

0.99+

14 feetQUANTITY

0.99+

sixQUANTITY

0.99+

PowerPointTITLE

0.99+

47%QUANTITY

0.99+

50 terabytesQUANTITY

0.99+

Amazon Web ServicesORGANIZATION

0.99+

12 feetQUANTITY

0.99+

sevenQUANTITY

0.99+

five slidesQUANTITY

0.99+

TodayDATE

0.99+

fourQUANTITY

0.99+

oneQUANTITY

0.99+

10%QUANTITY

0.99+

2016DATE

0.99+

350 million dollarsQUANTITY

0.99+

10XQUANTITY

0.99+

NetflixORGANIZATION

0.99+

NovemberDATE

0.99+

USLOCATION

0.99+

second reasonQUANTITY

0.99+

McDonaldsORGANIZATION

0.99+

Bruno Aziza & Josh Klahr, AtScale - Big Data SV 17 - #BigDataSV - #theCUBE1


 

>> Announcer: Live from San Jose, California, it's The Cube. Covering Big Data, Silicon Valley, 2017. (electronic music) >> Okay, welcome back everyone, live at Silicon Valley for the big The Cube coverage, I'm John Furrier, with me Wikibon analyst George Gilbert, Bruno Aziza, who's on the CMO of AtScale, Cube alumni, and Josh Klahr VP at AtScale, welcome to the Cube. >> Welcome back. >> Thank you. >> Thanks, Brian. >> Bruno, great to see you. You look great, you're smiling as always. Business is good? >> Business is great. >> Give us the update on AtScale, what's up since we last saw you in New York? >> Well, thanks for having us, first of all. And, yeah, business is great, we- I think Last time I was here on The Cube we talked about the Hadoop Maturity Survey and at the time we'd just launched the company. And, so now you look about a year out and we've grown about 10x. We have large enterprises across just about any vertical you can think of. You know, financial services, your American Express, healthcare, think about ETNA, SIGNA, GSK, retail, Home Depot, Macy's and so forth. And, we've also done a lot of work with our partner Ecosystem, so Mork's- OEM's AtScale technology which is a great way for us to get you AtScale across the US, but also internationally. And then our customers are getting recognized for the work that they are doing with AtScale. So, last year, for instance, Yellowpages got recognized by Cloudera, on their leadership award. And Macy's got a leadership award as well. So, things are going the right trajectory, and I think we're also benefitting from the fact that the industry is changing, it's maturing on the the big data side, but also there's a right definition of what business intelligence means. This idea that you can have analytics on large-scale data without having to change your visualization tools and make that work with existing stock you have in place. And, I think that's been helping us in growing- >> How did you guys do it? I mean, you know, we've talked many times in there's some secret sauce there, but, at the time when you guys were first starting it was kind of crowded field, right? >> Bruno: Yeah. >> And all these BI tools were out there, you had front end BI tools- >> Bruno: Yep. But everyone was still separate from the whole batch back end. So, what did you guys do to break out? >> So, there's two key differentiators with AtScale. The first one is we are the only platform that does not have a visualization tool. And, so people think about this as, that's a bug, that's actually a feature. Because, most enterprises have already that stuff made with traditional BI tools. And so our ability to talk to MDX and SQL types of BI tools, without any changes is a big differentiator. And then the other piece of our technology, this idea that you can get the speed, the scale and security on large data sets without having to move the data. It's a big differentiation for our enterprise to get value out of the data. They already have in Hadoop as well as non-Hadoop systems, which we cover. >> Josh, you're the VP of products, you have the roadmaps, give us a peek into what's happening with the current product. And, where's the work areas? Where are you guys going? What's the to-do list, what's the check box, and what's the innovation coming around the corner? >> Yeah, I think, to follow up on what Bruno said about how we hit the sweet spot. I think- we made a strategic choice, which is we don't want to be in the business of trying to be Tableu or Excel or be a better front end. And there's so much diversity on the back end if you look at the ecosystem right now, whether it's Spark Sequel, or Hive, or Presto, or even new cloud based systems, the sweet spot is really how do you fit into those ecosystems and support the right level of BI on top of those applications. So, what we're looking at, from a road map perspective is how do we expand and support the back end data platforms that customers are asking about? I think we saw a big white space in BI on Hadoop in particular. And that's- I'd say, we've nailed it over the past year and a half. But, we see customers now that are asking us about Google Big Query. They're asking us about Athena. I think these server-less data platforms are really, really compelling. They're going to take a while to get adoption. So, that's a big investment area for us. And then, in terms of supporting BI front ends, we're kind of doubling down on making sure our Tableau integration is great, Power BI is I think getting really big traction. >> Well, two great products, you've got Microsoft and Tableau, leaders in that area. >> The self-service BI revolution has, I would say, has won. And the business user wants their tool of choice. Where we come in is the folks responsible for data platforms on the back end, they want some level of control and consistency and so they're trying to figure out, where do you draw the line? Where do you provide standards? Where do you provide governance, and where do you let the business lose? >> All right, so, Bruno and Josh, I want you to answer the questions, be a good quiz. So, define next generation BI platforms from a functional standpoint and then under the hood. >> Yeah, there's a few things you can look at. I think if you were at the Gartner BI conference last week you saw that there was 24 vendors in the magic quadrant and I think in general people are now realizing that this is a space that is extremely crowded and it's also sitting on technology that was built 20 years ago. Now, when you talk to enterprises like the ones we work with, like, as I named earlier, you realize that they all have multiple BI tools. So, the visualization war, if you will, kind of has been set up and almost won by Microsoft and Tableau at this point. And, the average enterprise is 15 different BI tools. So, clearly, if you're trying to innovate on the visualization side, I would say you're going to have a very hard time. So, you're dealing with that level of complexity. And then, at the back end standpoint, you're now having to deal with database from the past - that's the Teradata of this world - data sources from today - Hadoop - and data sources from the future, like Google Big Query. And, so, I think the CIO answer of what is the next gen BI platform I want is something that is enabling me to simplify this very complex world. I have lots of BI tools, lots of data, how can I standardize in the middle in order to provide security, provide scale, provide speed to my business users and, you know, that's really radically going to change the space, I think. If you're trying to sell a full stack that's integrated from the bottom all the way to visualization, I don't think that's what enterprises want anymore >> Josh, under the hood, what's the next generation- you know, key leverage for the tech, and, just the enabler. >> Yeah, so, for me the end state for the next generation GI platform is a user can log in, they can point to their data, wherever that data is, it's on Prime, it's in the cloud, it's in a relational database, it's a flat file, they can design their business model. We spend a lot of time making sure we can support the creation of business models, what are the key metrics, what are the hierarchies, what are the measures, it may sound like I'm talking about OLAP. You know, that's what our history is steeped in. >> Well, faster data is coming, that's- streaming and data is coming together. >> So, I should be able to just point at those data sets and turn around and be able to analyze it immediately. On the back end that means we need to have pretty robust modeling capabilities. So that you can define those complex metrics, so you can functionally do what are traditional business analytics, period over period comparisons, rolling averages, navigate up and down business hierarchies. The optimizations should be built in. It shouldn't be the responsibility of the designer to figure out, do I need to create indeces, do I need to create aggregates, do I need to create summarization? That should all be handled for you automatically. Shouldn't think about data movement. And so that's really what we've built in from an AtScale perspective on the back end. Point to data, we're smart about creating optimal data structure so you get fast performance. And then, you should be able to connect whatever BI tool you want. You should be able to connect Excel, we can talk the MDX Query language. We can talk Sequel, we can talk Dax, whatever language you want to talk. >> So, take the syntax out of the hands of the user. >> Yeah. >> Yeah. >> And getting in the weeds on that stuff. Make it easier for them- >> Exactly. >> And the key word I think, for the future of BI is open, right? We've been buying tools over the last- >> What do you mean by that, explain. >> Open means that you can choose whatever BI tool you want, and you can choose whatever data you want. And, as a business user there's no real compromise. But, because you're getting an open platform it doesn't mean that you have to trade off complexity. I think some of the stuff that Josh was talking about, period analysis, the type of multidimensional analysis that you need, calendar analysis, historical data, that's still going to be needed, but you're going to need to provide this in a world where the business, user, and IT organization expects that the tools they buy are going to be open to the rest of the ecosystem, and that's new, I think. >> George, you want to get a question in, edgewise? Come on. (group laughs) >> You know, I've been sort of a single-issue candidate, I guess, this week on machine learning and how it's sort of touching all the different sectors. And, I'm wondering, are you- how do you see yourselves as part of a broader pipeline of different users adding different types of value to data? >> I think maybe on the machine learning topic there is a few different ways to look at it. The first is we do use machine learning in our own product. I talked about this concept of auto-optimization. One of the things that AtScale does is it looks at end-user query patterns. And we look at those query patterns and try to figure out how can we be smart about anticipating the next thing they're going to ask so we can pre-index, or pre-materialize that data? So, there's machine learning in the context of making AtScale a better product. >> Reusing things that are already done, that's been the whole machine-learning- >> Yes. >> Demos, we saw Google Next with the video editing and the video recognition stuff, that's been- >> Exactly. >> Huge part of it. >> You've got users giving you signals, take that information and be smart with it. I think, in terms of the customer work flow - Comcast, for example, a customer of ours - we are in a data discovery phase, there's a data science group that looks at all of their set top box data, and they're trying to discover programming patterns. Who uses the Yankees' network for example? And where they use AtScale is what I would call a descriptive element, where they're trying to figure out what are the key measures and trends, and what are the attributes that contribute to that. And then they'll go in and they'll use machine learning tools on top of that same data set to come up with predictive algorithms. >> So, just to be clear there, they're hypotehsizing about, like, say, either the pattern of users that might be- have an affinity for a certain channel or channels, or they're looking for pathways. >> Yes. And I'd say our role in that right now is a descriptive role. We're supporting the descriptive element of that analytics life cycle. I think over time our customers are going to push us to build in more of our own capabilities, when it comes to, okay, I discovered something descriptive, can you come up with a model that helps me predict it the next time around? Honestly, right now people want BI. People want very traditional BI on the next generation data platform. >> Just, continuing on that theme, leaving machine learning aside, I guess, as I understand it, when we talked about the old school vendors, Care Data, when they wanted to support data scientists they grafted on some machine learning, like a parallel version of our- in the core Teradata engine. They also bought Astro Data, which was, you know, for a different audience. So, I guess, my question is, will we see from you, ultimately, a separate product line to support a new class of users? Or, are you thinking about new functionality that gets integrated into the core product. I think it's more of the latter. So, the way that we view it- and this is really looking at, like I said, what people are asking for today is, kind of, the basic, traditional BI. What we're building is essentially a business model. So, when someone uses AtScale, they're designing and they're telling us, they're asserting, these are the things I'm interested in measuring, and these are the attributes that I think might contribute to it. And, so that puts us in a pretty good position to start using, whether it's Spark on the back end, or built in machine learning algorithms on the Hadoop cluster, let's start using our knowledge of that business model to help make predictions on behalf of the customer. So, just a follow-up, and this really leaves out the machine learning part, which is, it sounds like, we went- in terms of big data we we first to archive it- supported more data retension than could do affordably with the data warehouse. Then we did the ETL offload, now we're doing more and more of the visualization, the ad-hoc stuff. >> That's exactly right. So, what- in a couple years time, what remains in the classic data warehouse, and what's in the Hadoop category? >> Well, so there is, I think what you're describing is the pure evolution, of, you know, any technology where you start with the infrastructure, you know, we've been in this for over ten years, now, you've got cloud. They are going APO and then going into the data science workbench. >> That's not official yet. >> I think we read about this, or at least they filed. But I think the direction is showing- now people are relying on the platform, the Hadoop platform, in order to build applications on top of it. And, so, I think, just like Josh is saying, the mainstream application on top of the database - and I think this is true for non-Hadoop systems as well - is always going to be analytics. Of course, data science is something that provides a lot of value, but it typically provides a lot of value to a few set of people that will then scale it out to the rest of their organization. I think if you now project out to what does this mean for the CIO and their environment, I don't think any of these platforms, Teradata or Hadoop, or Google, or Amazon or any of those, I don't think do 100% replace. And, I think that's where it becomes interesting, because you're now having to deal with a hetergeneous environment, where the business user is up, they're using Excel, they're using they're standard net application, they might be using the result of machine learning models, but they're also having to deal with the heterogeneous environment at the data level. Hadoop on Prime, Hadoop in the cloud, non-Hadoop in the cloud and non-Hadoop on Prime. And, of course that's a market that I think is very interesting for us as a simplification platform for that world. >> I think you guys are really thinking about it in a new way, and I think that's kind of a great, modern approach, let the freedom- and by the way, quick question on the Microsoft tool and Tableau, what percentage share do you think they are of the market? 50? Because you mentioned those are the two top ones. >> Are they? >> Yeah, I mentioned them, because if you look at the magic quadrant, clearly Microsoft, Power BI and Tableau have really shot up all the way to the right. >> Because it's easy to use, and it's easy to work with data. >> I think so, I think- look, from a functionality standpoint, you see Tableau's done a very good job on the visualization side. I think, from a business standpoint, and a business model execution, and I can talk from my days at Microsoft, it's a very great distribution model to get thousands and thousands of users to use power BI. Now, the guys that we didn't talk about on the last magic quadrant. People who are like Google Data Studio, or Amazon Quicksite, and I think that will change the ecosystem as well. Which, again, is great news for AtScale. >> More muscle coming in. >> That's right. >> For you guys, just more rising tide floats all boats. >> That's right. >> So, you guys are powering it. >> That's right. >> Modern BI would be safe to say? >> That's the idea. The idea is that the visualization is basically commoditized at this point. And what business users want and what enterprise leaders want is the ability to provide freedom and openness to their business users and never have to compromise security, speed and also the complexity of those models, which is what we- we're in the business of. >> Get people working, get people productive faster. >> In whatever tool they want. >> All right, Bruno. Thanks so much. Thanks for coming on. AtScale. Modern BI here in The Cube. Breaking it down. This is The Cube covering bid data SV strata Hadoop. Back with more coverage after this short break. (electronic music)

Published Date : Mar 15 2017

SUMMARY :

it's The Cube. live at Silicon Valley for the big The Cube coverage, Bruno, great to see you. Hadoop Maturity Survey and at the time So, what did you guys do to break out? this idea that you can get the speed, What's the to-do list, what's the check box, the sweet spot is really how do you Microsoft and Tableau, leaders in that area. and where do you let the business lose? I want you to answer the questions, So, the visualization war, if you will, and, just the enabler. for the next generation GI platform is and data is coming together. of the designer to figure out, So, take the syntax out of the hands And getting in the weeds on that stuff. the type of multidimensional analysis that you need, George, you want to get a question in, edgewise? all the different sectors. the next thing they're going to ask You've got users giving you signals, either the pattern of users that might be- on the next generation data platform. So, the way that we view it- and what's in the Hadoop category? is the pure evolution, of, you know, the Hadoop platform, in order to build applications I think you guys are really thinking about it because if you look at the magic quadrant, and it's easy to work with data. Now, the guys that we didn't talk about For you guys, just more The idea is that the visualization This is The Cube covering bid data

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
George GilbertPERSON

0.99+

BrunoPERSON

0.99+

Bruno AzizaPERSON

0.99+

GeorgePERSON

0.99+

ComcastORGANIZATION

0.99+

ETNAORGANIZATION

0.99+

BrianPERSON

0.99+

John FurrierPERSON

0.99+

New YorkLOCATION

0.99+

Josh KlahrPERSON

0.99+

SIGNAORGANIZATION

0.99+

GSKORGANIZATION

0.99+

JoshPERSON

0.99+

Home DepotORGANIZATION

0.99+

24 vendorsQUANTITY

0.99+

MicrosoftORGANIZATION

0.99+

Yankees'ORGANIZATION

0.99+

thousandsQUANTITY

0.99+

USLOCATION

0.99+

ExcelTITLE

0.99+

last yearDATE

0.99+

AmazonORGANIZATION

0.99+

100%QUANTITY

0.99+

San Jose, CaliforniaLOCATION

0.99+

last weekDATE

0.99+

Silicon ValleyLOCATION

0.99+

AtScaleORGANIZATION

0.99+

American ExpressORGANIZATION

0.99+

first oneQUANTITY

0.99+

firstQUANTITY

0.99+

20 years agoDATE

0.99+

50QUANTITY

0.98+

2017DATE

0.98+

TableauTITLE

0.98+

Macy'sORGANIZATION

0.98+

OneQUANTITY

0.98+

MorkORGANIZATION

0.98+

power BITITLE

0.98+

EcosystemORGANIZATION

0.98+

SequelPERSON

0.97+

GoogleORGANIZATION

0.97+

this weekDATE

0.97+

Power BITITLE

0.97+

ClouderaORGANIZATION

0.96+

15 different BI toolsQUANTITY

0.95+

past year and a halfDATE

0.95+

over ten yearsQUANTITY

0.95+

todayDATE

0.95+

TableuTITLE

0.94+

TableauORGANIZATION

0.94+

SQLTITLE

0.93+

Astro DataORGANIZATION

0.93+

CubeORGANIZATION

0.92+

WikibonORGANIZATION

0.92+

two key differentiatorsQUANTITY

0.92+

AtScaleTITLE

0.91+

Care DataORGANIZATION

0.9+

about 10xQUANTITY

0.9+

Spark SequelTITLE

0.89+

two top onesQUANTITY

0.89+

HadoopTITLE

0.88+

AthenaORGANIZATION

0.87+

two great productsQUANTITY

0.87+

Big QueryTITLE

0.86+

The CubeORGANIZATION

0.85+

Big DataORGANIZATION

0.85+