Analyst Predictions 2023: The Future of Data Management


 

(upbeat music) >> Hello, this is Dave Vellante with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analysts: Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you guys? A data pack, like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave speak faintly) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022. They're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system. 
But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double-click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but let me mention why I left it as yellow. I fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks go GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference and at data.world, and I work closely with Alation, Informatica, and a bunch of other companies; they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. The market is really, really down, both the VC funding and the IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. 
As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote about and tried to get away from, and this topic just won't go away. I did speak with a number of folks, early adopters and non-adopters, during the year. And I did find that it pretty much validated what I was expecting, which was that this has now become a front-burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing a Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly hopped on that, maybe three sentences, and wrote it in about a couple minutes, saying this is hogwash, essentially. (laughs) And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw 15,000 hits on that post, which was the most hits of any single post I put up all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then recently, (Tony laughing) I talked to Walmart, and they actually invoked Martin Fowler and said that they're working through their data mesh. 
So, it takes a lot of thought, and it really, as we've talked about, is as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow on the prediction that graph databases take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases would be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have put it in the right context. It's really over a three to five-year time period that graph databases will become significant, because they still need accepted methodologies that can be applied in a business context, as well as proper tools, in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. And also, we're seeing interesting developments in terms of things like AWS with Neptune and Oracle providing graph support in Oracle Database this past year. Those things are, as I said, growing gradually. There are other companies, like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases. I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have, and maybe to the edge. 
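[Editor's note: to make the graph discussion above concrete, here is a minimal sketch of the kind of variable-depth relationship query that graph databases make natural. It uses plain Python with an invented adjacency list, not Neo4j, Neptune, or any product's query language.]

```python
from collections import deque

# Invented example data: who reports to whom in a small org.
# A graph database stores these edges natively; here we use an adjacency list.
reports_to = {
    "ana": ["bo"],
    "bo": ["cy"],
    "cy": ["dee"],
    "eli": ["bo"],
}

def chain_of_command(person, graph):
    """Walk the management chain upward from one node, breadth-first."""
    chain, queue, seen = [], deque(graph.get(person, [])), set()
    while queue:
        manager = queue.popleft()
        if manager in seen:
            continue
        seen.add(manager)
        chain.append(manager)
        queue.extend(graph.get(manager, []))
    return chain

print(chain_of_command("ana", reports_to))  # ['bo', 'cy', 'dee']
```

In SQL, a traversal of unknown depth like this typically requires a recursive common table expression; graph query languages express it as a path pattern, which is part of Carl's point about tooling and accessibility.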
>> Well, part of it is that it's not as specialized as you might think. You can apply graphs to a great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of an appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. Like the graph databases Carl was talking about, stream processing is another type of specialized data processing, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50% and continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, I didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect, perhaps in the five-year timeframe, that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. When you talk to practitioners doing this stuff, there are still some complications in the data pipeline. And so, I think, you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. 
When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What are your thoughts here? Why the green? What's your evidence there? >> Well, I think, I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. And it was a safe prediction to say vendors are going to be pursuing this, in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. Now, the trend was also supported in that we saw a big embrace of Apache Iceberg in 2022. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability, and it also ensures performance. And that's a structured table that helps with the warehouse-side performance. But among those announcements, Snowflake, Google, Cloudera, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating it to end users. It's very cutting edge. I'd say the top, leading-edge 5% of companies have really embraced the lakehouse. I think, we're now seeing the fast followers, the next 20 to 25% of firms, embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer making the announcement about Iceberg, and he asked for a show of hands: for any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. 
It was you, me, and I think, two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub-predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that the metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space. Actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you will, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this; if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do a data quality check, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security and privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy. I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, limited in its scope. 
It is actually the very engine, the very glue, that is going to connect data producers and consumers. >> Great. Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think, there's a huge opportunity for consolidation and streamlining of these aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind-the-scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest being data observability and customer data platforms, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things as it takes to avoid getting breached, ending up in headlines, getting fired, or going to jail. So, vendors, bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalogs and the BI vendors is because data catalogs are very soon not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what it is that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? 
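[Editor's note: the embedded-catalog idea Sanjeev describes above can be sketched in a few lines. This is a hypothetical, minimal in-memory catalog with invented asset names, not any real product's API, showing how a BI tool might query catalog metadata behind the scenes to surface both datasets and existing dashboards.]

```python
# Hypothetical catalog entries: name, asset type, and descriptive tags.
catalog = [
    {"name": "sales_q4", "type": "dataset", "tags": {"sales", "revenue"}},
    {"name": "rev_by_region", "type": "dashboard", "tags": {"sales", "region"}},
    {"name": "churn_model_features", "type": "dataset", "tags": {"churn", "ml"}},
]

def find_assets(term, asset_type=None):
    """Return names of catalog entries matching a search term, optionally filtered by type."""
    hits = [a for a in catalog if term in a["tags"]]
    if asset_type:
        hits = [a for a in hits if a["type"] == asset_type]
    return [a["name"] for a in hits]

print(find_assets("sales"))               # ['sales_q4', 'rev_by_region']
print(find_assets("sales", "dashboard"))  # ['rev_by_region']
```

The design point is the one Sanjeev makes: the user never visits "the catalog" as a destination; the tool consults the metadata on their behalf, returning an existing dashboard before they build a duplicate.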
>> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics, for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see the capabilities, whether it's embedded or separate. We expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Baer, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long in the tooth? Is it not so modern anymore? >> I think, in a way, it's gotten almost too modern. I don't know if it's being long in the tooth, but it is getting long. The modern data stack has traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse. And in between, you have all the tools that are necessary to essentially get that data from the operational realm, or the streaming realm for that matter, into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think, what's important here, where I think we have seen a lot of progress, and this would be in the cloud, is with the SaaS services. 
And especially you see that in the modern data stack, where it's not just the MongoDBs or the Oracles or the Amazons that have their database platforms. You see the Informaticas and all the other players there, the Fivetrans, have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is that it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to, unfortunately, is what I would call lots of islands of simplicity, which means that it leaves it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here is we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers, or all the SaaS service providers, to do is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is that data platform providers have been adding services that are adjacent. And there's some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common, I guess, sort of tool that the applications that are developed on MongoDB use, so MongoDB then built it into the database rather than requiring an extra Elasticsearch or OpenSearch stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process of going from Aurora to Redshift. You've seen the same thing with Google, with BigQuery integrating basically streaming pipelines. And you're seeing also a lot of movement in database machine learning. 
So, there's some good moves in this direction. I expect to see more of this this year. Part of it is basically the SaaS platforms adding some functionality. But I also see, more importantly, because you're never going to get... This is like asking your data team and your developers, herding cats, to standardize on the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite a two-for, but in other words, get two products or two services for the price of one and a half. I see a lot of potential for this. And to me, if the goal is to simplify things, this is the next logical step, and I expect to see more of this here. >> Yeah, and you see in Oracle's MySQL HeatWave yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? >> Well, I think that... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors, you ask about how these stacks work, all these different vendors have their own stack vision. And one application group is going to use one, and another application group is going to use another. And some people will say, let's go to... Like, you go to an Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. 
So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active, critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data, with data in different formats, the whole thing has... It's been like what a customer of mine used to say: "I understand your product can make my system run faster, but right now, I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, the Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up, is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. 
Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing and retrieving data, whether it's in key-value stores or document databases and so forth. We were getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. And for ordinary, conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen, is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... And of course, Snowflake is loving this, because that is what they do, and their success certainly points to the success of SQL. Even MongoDB. And we were all, I think, at the MongoDB conference, where on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had to take the... You're debating in class and you were forced to take one side and defend it. 
So, I was at a Vertica conference one time, up on stage with Curt Monash, and I had to take the NoSQL, the-world-is-changing, paradigm-shift side. And so, just to be controversial, I said to him, Curt Monash, I said, who really needs ACID compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance, and if there's any proof of the pudding here, I see lakehouse as being icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think, Doug was about a year ahead of time in his predictions, and this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I'm actually on the home stretch of doing a market landscape on the lakehouse. And lakehouse will not replace data lakes. There is still the need for those data scientists who do know Python, who know Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is a huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills base in SQL. Analytics have been built on SQL. And with lakehouse, why this really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse, was not so much SQL; it was a need for ACID. And what was the best way to do it? It was through a relational table structure. 
So, the whole idea of ACID in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state. You cannot bypass that table structure, as I learned the hard way during my research. So, the bottom line, I'd say here, is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as, I would say, the facts that we collect, the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why it has happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. 
I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming a much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. Those are designed to promote reuse and consistency across AI and ML initiatives of the elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that: any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric, well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring data from and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier, in my opinion, in this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone. 
And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward what the income might look like from that property, the expenses, and we can plan and purchase things appropriately. So, I think, we need this broader purview, and I'm beginning to see some of those things happen. And the evidence today, I would say, is more focused around the metric stores and the feature stores; we're starting to see vendors offer those capabilities. And we're starting to see the MLOps elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs. I think of data apps that orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted we'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is, in a way, another word for what I was calling operational metadata, which is not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used, and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes. 
Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structured data that we have, so that we can track how it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used, or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think, that starting to expand is probably the right way to put it. It's going to be expanding for some time. I think, we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing, what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products, all this thing about vacation rentals and how it is doing, that data is coming from different sources. I'm packaging it into our data product. And to Carl's point, there's a whole set of operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this? What data products are most popular? 
I'm actually right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring data product catalogs. So, I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire it and save cost. But this is a data product. Now, there's an associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just a formalization of all these aspects of a product. How do you use it? What is the SLA on it, what is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers or to the business people. Up to this point, Dave, when you're talking about data, all of data discovery and curation is very data producer-centric. So, I think, we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there just very quickly there, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about making, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also, a comment, Tony, to your last year's prediction: you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. 
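Sanjeev's framing of a data contract above, a formalization of a data product's usage, SLA, and quality, can be sketched in a few lines of Python. This is purely an illustration: the field names and thresholds are this editor's assumptions, not any vendor's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataContract:
    """Illustrative data contract: formalizes how a data product may be used."""
    product: str                 # name of the data product
    owner: str                   # producing team responsible for it
    schema: dict                 # column name -> type, the published interface
    freshness_sla_minutes: int   # max allowed staleness of the data
    min_completeness_pct: float  # quality bar the producer prescribes

    def violates(self, staleness_minutes: int, completeness_pct: float) -> list:
        """Return the list of contract clauses the observed metrics break."""
        breaches = []
        if staleness_minutes > self.freshness_sla_minutes:
            breaches.append("freshness")
        if completeness_pct < self.min_completeness_pct:
            breaches.append("completeness")
        return breaches

contract = DataContract(
    product="vacation-rental-occupancy",
    owner="revenue-analytics",
    schema={"property_id": "string", "occupancy_rate": "float"},
    freshness_sla_minutes=60,
    min_completeness_pct=99.0,
)
print(contract.violates(staleness_minutes=90, completeness_pct=99.5))  # → ['freshness']
```

The point of the sketch is the consumer-centric shift Sanjeev describes: the consumer checks observed metrics against the published clauses instead of asking the producer.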
Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding and reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. Always that 25% of employees are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or, better still, using analytics as triggers for automation and workflows, and not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics, predictions, even recommendations built into these applications. And I think, the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily requiring humans to interact with it if there's confidence. So, we want prediction, we want embedding, we want automation. This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We've got to move beyond what I call swivel-chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and take action. >> And Dave Menninger, today, if you want analytics or you want to absorb what's happening in the business, you typically have got to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively as an industry, we made a mistake. 
We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems, so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together, and organizations recognize the need to change. As Doug said, the majority of the workforce in the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. 2/3 of organizations recognize that embedded analytics are important, and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not. As opposed to turning somebody loose in the wild with the data, they're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service, compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our future intelligence, future of intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. 
Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations, as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. >> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution. I think, what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline into their data lakes to do all that very exploratory work and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work, into their workspace, and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering not only a new era of data, but a new era of apps. Today, most applications are about filling forms out or codifying processes, and they require human input. 
And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, then present it to humans. Do you guys see this as a new frontier? >> I think, that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, and the analytics at the point of decision have to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back like 10 years ago were promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see that's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing it into memory. I mentioned MySQL HeatWave. 
There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind, is almost like talking about cave painting. I think, that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. Bringing together data from multiple sources dynamically and real-time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Valente for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)
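Carl's point above about event-driven applications, where decisions are injected at various points rather than following a preconceived script, can be illustrated with a minimal sketch. Everything here (the event name, the approval rule) is this editor's made-up example, not anything from the panel:

```python
from collections import defaultdict

# Minimal event bus: handlers subscribe to event types, so new decision
# points can be injected without rewriting the application's control flow.
class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Every subscribed handler sees the event; results are collected.
        return [h(payload) for h in self._handlers[event_type]]

bus = EventBus()

# A pricing decision injected as a handler, not hard-coded in a script.
bus.subscribe("booking_requested",
              lambda e: "approve" if e["occupancy_rate"] < 0.9 else "review")

print(bus.publish("booking_requested", {"occupancy_rate": 0.95}))  # → ['review']
```

A new decision point, say a fraud check, would simply subscribe to the same event without touching existing handlers, which is the flexibility Carl describes.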

Published Date : Jan 11 2023



Haseeb Budhani & Anant Verma | AWS re:Invent 2022 - Global Startup Program


 

>> Well, welcome back here to the Venetian. We're in Las Vegas. It is Wednesday, Day 2 of our coverage here of AWS re:Invent 22. I'm your host, John Walls on theCUBE and it's a pleasure to welcome in two more guests as part of our AWS startup showcase, which is again part of the startup program globally at AWS. I've got Anant Verma, who is the Vice President of Engineering at Elation. Anant, good to see you, sir. >> Good to see you too. >> Good to be with us. And Haseeb Budhani, who is the CEO and co-founder of Rafay Systems. Good to see you, sir. >> Good to see you again. >> Thanks for having, yeah. A cuber, right? You've been on theCUBE? >> Once or twice. >> Many occasions. But a first timer here, as a matter of fact, glad to have you aboard. All right, tell us about Elation. First, for those who might not be familiar with what you're up to these days, just give it a little 30,000 foot level. >> Sure, sure. So, yeah, Elation is a startup and a leader in the enterprise data intelligence space. That really includes a lot of different things including data search, data discovery, metadata management, data cataloging, data governance, data policy management, a lot of different things that companies want to do with the hoards of data that they have, and Elation, our product, is the answer to solve some of those problems. We've been doing pretty good. Elation has been running for about 10 years now. We are a series A startup now, we just raised a round a couple of months ago. We are already a hundred million plus in revenue. So. >> John: Not shabby. >> Yeah, it's a big benchmark for companies to, startup companies, to cross that milestone. So, yeah. >> And what's the relationship? I know Rafay and you have worked together, in fact, the two of you have, which I find interesting, you have a chance, you've been meeting on Zoom for a number of months, as many of us have, and are meeting here for the first time. But talk about that relationship with Rafay. 
Yeah, so I actually joined Elation in January and this is part of the move of Elation to a more cloud native solution. So, we have been running on AWS since last year and as part of making our solution more cloud native, we have been looking to containerize our services and run them on Kubernetes. So, that's the reason why I joined Elation in the first place, to kind of make sure that this migration or move to a cloud native actually works out really well for us. This is a big move for companies. A lot of companies that have done it in the past, including, you know, Confluent or MongoDB, when they did that, they actually reaped great benefits out of that. So to do that, of course, you know, as we were looking at Kubernetes as a solution, I was personally more looking for a way to speed up things and get things out in production as fast as possible. And that's where, I think, Janeb introduced us... >> That's right. >> The two of us. I think we share the same investor actually, so that's how we found each other. And yeah, it was a pretty simple decision in terms of, you know, getting the solution, figuring out if it's useful for us and then, of course, putting it out there. >> So you've hit the keyword, Kubernetes, right? And, so if you would to honestly jump in here, there are challenges, right? That you're trying to help them solve and you're working on the Kubernetes platform. So, you know, just talk about that and how that's influenced the work that the two of you are doing together. >> Absolutely. So, the business we're in is to help companies who adopt Kubernetes as an orchestration platform do it easier, faster. It's a simple story, right? Everybody is using Kubernetes, but it turns out that Kubernetes is actually not that easy to operationalize. Playing in a sandbox is one thing. Operationalizing this at a certain level of scale is not easy. 
Now, we have a lot of enterprise customers who are deploying their own applications on Kubernetes, and we've had many, many of them. But when it comes to a company like Elation, it's a more complicated problem set because they're taking a very complex application, their application, but then they're providing that as a service to their customers. So then we have a chain of customers we have to make happy. Anant's team, the platform organization, his internal customers who are the developers who are deploying applications, and then, the company has customers, we have to make sure that they get a good experience as they consume this application that happens to be running on Kubernetes. So that presented a really interesting challenge, right? How do we make this partnership successful? So I will say that, we've learned a lot from each other, right? And, end of the day, the goal is, my customer, Anant's specifically, right? He has to feel that, this investment, 'cause he has to pay us money, we would like to get paid. >> John: Sure. (John laughs) >> It reduces his internal expenditure because otherwise he'd have to do it himself. And most importantly, it's not the money part, it's that he can get to a certain goalpost significantly faster, because the invention time for Kubernetes management, the platform that you have to build to run Kubernetes, is a very complex exercise. It took us four and a half years to get here. You want to do that again, as a company, right? Why? Why do you want to do that? We, as Rafay, the way I think about what we deliver, yes, we sell a product, but to what end? The product is the what; the why is that every enterprise, every ISV is building a Kubernetes platform in house. They shouldn't, they shouldn't need to. They should be able to consume that as a service. They consume the Kubernetes engine; EKS is Amazon's Kubernetes, and they consume that as an engine. But the management layer was a gap in the market. 
How do I operationalize Kubernetes? And what we are doing is we're going to, you know, what Anant said. So we go to them saying, "Hey, your team is technical, you understand the problem set. Would you like to build it or would you rather consume this as a service so you can go faster?" And, resoundingly, the answer is, I don't want to do this anymore. I would love to buy. >> Well, you know, as Haseeb is saying, speed is again, when we started talking, it only took us like a couple of months to figure out if Rafay is the right solution for us. And so we ended up purchasing Rafay in April. We launched our product based on Rafay and Kubernetes, on EKS, in August. >> August. >> So that's about four months. I've done some things like this before. It takes a couple of years just to sort of figure out how do you really work with Kubernetes, right? In production at a large scale. Right now, we are running about a 600 node cluster on Rafay and that's serving our customers. Like, one of the biggest things that's actually happening on December 8th is we are running what we call a virtual hands-on lab. >> A virtual? >> Hands-on lab. >> Okay. >> For Elation. And there are probably going to be about 500 people attending it. It's like a webinar style. But what we do in that hands-on lab is we will spin up an Elation instance for each attendee, right on the spot. Okay? Now, think about this enterprise software running and people just sign up for it and it's there for you, right on the spot. And that's the beauty of the software that we have been building. That's the beauty of the work that Rafay has helped us to do over the last few months. >> Okay. >> I think we need to charge them more money, I'm getting from this conversation. I'm going to go work on that. >> I'm going to let the two of you work that out later. All right. I don't want to get in the way of a big deal. 
But you mentioned, and we heard about it earlier, that it's you that would offer these services to your clients. I assume they have their different levels of tolerance and their different challenges, right? They've got their own complexities and their own organizational barriers. So how are you juggling that end of it? Because you're kind of learning as, well, not learning, but you're experiencing some of the same things. >> Right. Same things. And yet you've got this other client base that has a multitude of experiences that they're going through. >> Right. So I think, you know, a lot of our customers, they are large enterprise companies. They've got a whole bunch of data that they want to work with us on. So one of the things that we have learned over the past few years is that we used to actually ship our software to the customers and then they would manage it for their privacy and security reasons. But now, since we're running in the cloud, they're really happy about that because they don't need to juggle with the infrastructure and the software management and upgrades and things like that; we do it for them, right? And, that's the speed for them, because now they are only interested in solving the problems with the data that they're working with. They don't need to deal with all these software management issues, right? So that frees our customers up to do the thing that they want to do. Of course it makes our job harder, and I'm sure in turn it makes his job harder. >> We get the short end of the stick, for sure. >> That's why he is going to get more money. >> Exactly. >> Yeah, this is a great conversation. >> No, no, no. We'll talk about that. 
How, in terms of being the platform where all this is happening and AWS, about your relationship with them as part of the startup program and what kind of value that brings to you, what does that do for you when you go out and are looking for work and what kind of cache that brings to you >> Talk about the AWS? >> Yes, sir. >> Okay. Well, so, the thing is really like of course AWS, a lot of programs in terms of making sure that as we move our customers into AWS, they can give us some, I wouldn't call it discount, but there's some credits that you can get as you move your workloads onto AWS. So that's a really great program. Our customers love it. They want us to do more things with AWS. It's a pretty seamless way for us to, as we were talking about or thinking about moving into the cloud, AWS was our number one choice and that's the only cloud that we are in, today. We're not going to go to any other place. >> That's it. >> Yeah. >> How would you characterize? I mean, we've already heard, from one side of the fence here, but. >> Absolutely. So for us, AWS is a make or break partner, frankly. As the EKS team knows very well, we support Azure's Kubernetes and Google's Kubernetes and the community Kubernetes as well. But the number of customers on our platform who are AWS native, either a hundred percent or a large percentage is, you know, that's the majority of our customer base. >> John: Yeah. >> And AWS has made it very easy for us in a variety of ways to make us successful and our customers successful. So Anant mentioned the credit program they have which is very useful 'cause we can, you know, readily kind of bring a customer to try things out and they can do that at no cost, right? So they can spin up infrastructure, play with things and AWS will cover the cost, as one example. So that's a really good thing. Beyond that, there are multiple programs at AWS, ISV accelerate, et cetera. 
That, you know, you've got to, over time, you kind of keep getting taller and taller. And you keep getting on bigger and bigger. And as you make progress, what I'm finding is that there's a great ecosystem of support that they provide us. They introduce us to customers, they help us, you know, think through architecture issues. We get access to their roadmap. We work very, very closely with the EKS team, for example. Like the, the GM for Kubernetes at AWS is a gentleman named Barry Cooks who was my sponsor, right? So, we spend a lot of time together. In fact, right after this, I'm going to be spending time with him because look, they take us seriously as a partner. They spend time with us because, end of the day, they understand that if they make their partners, in this case, Rafay, successful, at the end of the day it helps the customer, right? Anant's customer, my customer, their AWS customers, also. So they benefit because we are collectively helping them solve a problem faster. The goal of the cloud is to help people modernize, right? Reduce operational costs, because data centers are expensive, right? But then if these complex solutions, this is an enterprise product, Kubernetes at the enterprise level is a complex problem. If we don't collectively work together to save the customer effort, essentially, right? Reduce their TCO for whatever it is they're doing, right? Then the cost of the cloud is too high. And AWS clearly understands and appreciates that, and that's why they are going out of their way, frankly, to make us successful and make other companies successful in the startup program. >> Well. 
So, just a recent example, and I, by the way, I love managed services, right? So the reason is really because I don't need to put my own people to build and manage those things, right? So, if you want to use a search, they got the open search, if you want to use caching, they got elastic caching and stuff like that. So it's really simple and easy to just pick and choose which services you want to use and they're ready to be consumed right away. And that's the beautiful, and that that's how we can move really fast and get things done. >> Ease of use, right? Efficiency, saving money. It's a winning combination. Thanks for sharing this story, appreciate. Anant, Haseeb thanks for being with us. >> Yeah, thank you so much having us. >> We appreciate it. >> Thank you so much. >> You have been a part of the global startup program at AWS and startup showcase. Proud to feature this great collaboration. I'm John Walls. You're watching theCUBE, which is of course the leader in high tech coverage.

Published Date : Nov 30 2022


Venkat Venkataramani, Rockset | AWS re:Invent 2022 - Global Startup Program


 

>> And good afternoon. Welcome back here on theCUBE as we continue our coverage at AWS re:Invent 22, in the Venetian here in Las Vegas. Day two, it's Wednesday, and we're still rolling right along. We have another segment for you as part of the Global Startup Program, which is under the AWS Startup Showcase. I'm joined now by Venkat Venkataramani, who is the CEO and co-founder of Rockset. Good to see you, sir. >> Thanks for having me here. >> No, a real pleasure, looking forward to it. So first off, for some of our viewers who might not be familiar with Rockset, I know you've been on theCUBE a little bit, so you're an alum, but why don't you set the stage a little bit for Rockset and where you're engaged in terms of AWS? >> Definitely. Rockset is a real-time analytics database that is built for the cloud. We make real-time applications possible in the cloud. Real-time applications need high-concurrency, low-latency query processing; the data needs to be fresh, and your analytics need to be fast. We built on AWS, and that's why we are here. We are very, very proud partners of AWS: we are in the AWS Accelerate program, we are in the AWS startup program, and we are a strategic ISV partner. And so, yeah, we make real-time analytics possible without all the cost and complexity barriers that are usually associated with it, and we're very, very happy to be part of this movement from batch to real time that is happening in the world. >> Right, which is certainly an exciting trend. I know it's great news for you; you made news yesterday with an announcement involving Intel and AWS. Do you want to share some of that with us? >> Definitely. So, you know, one question that I always ask people, one perspective that I share, is: if you go ask a hundred people, do you want fast analytics on fresh data, or slow analytics on stale data?
You know, a hundred out of a hundred would say fast and fresh, right? >> Sure. >> So then the question is, why hasn't this happened already? Why is this still a new trend that is emerging, as opposed to something that everybody takes for granted? It really comes down to compute efficiency. At the end of the day, real-time analytics was always done using technologies, and processors, from ten years ago, from pre-cloud days. There were a lot of complexity barriers associated with real-time analytics, and also a lot of cost and performance barriers associated with it. >> And so Rockset, from the very beginning, has been obsessing about building the most compute-efficient real-time database in the world. AWS, on one hand, allows us to offer a consumption-based pricing model, so you only pay for what you use, and that shatters all the cost barriers. But in terms of compute efficiency, what we announced yesterday involves Intel's third-generation Xeon Scalable processors, code-named Ice Lake. When we ported Rockset to that architecture, taking advantage of some of the instruction sets that Intel has, we got an 84% performance boost. 84. >> It's incredible, right? >> It's an incredible number, it's an incredible milestone. It reduces the barrier even more in terms of cost, it pushes the efficiency, and it sets a really new record for how efficient real-time data processing can be in the cloud. It's very, very exciting news. We used to benchmark ourselves against some of the other real-time database providers, and we were already faster, and now we've set a much, much higher bar for other people to follow. >> Yep.
And so what is, or what was it about real time that was such a barrier? Because now you've got the speed, of course, and maybe that's what it was, but I think cost is probably part of that too, right? That's all part of that equation. I mean, real time has been so elusive. >> Yeah. So real time has this inherent pattern that your data never stops coming. And when your data never stops coming, and you can now actually do analytics on it, initially people start by saying, oh, I just want a real-time dashboard. And then very quickly they realize: well, the dashboard may be real time, but I'm not going to be staring at it 24/7. Can you tap on my shoulder when something is off and needs to be looked at? So now you're constantly also asking the question: is everything okay? Is everything all right? Is that something I need to be double-clicking on and following up on? So essentially, very quickly in real-time analytics, what happens is that your queries never stop. The questions that you're asking of your data never stop. And it's often a program asking the question, to detect anomalies and things like that.
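That "tap on the shoulder" pattern — a program continuously asking "is everything okay?" as each point arrives — can be sketched in a few lines. This is only an illustrative toy (a rolling z-score check, not Rockset's actual detection logic; all names are invented):

```python
from collections import deque
from statistics import mean, stdev

def make_anomaly_detector(window_size=60, threshold=3.0):
    """Return a checker that ingests one reading at a time and flags
    values more than `threshold` standard deviations away from the
    mean of the recent window -- the 'tap on the shoulder' idea."""
    window = deque(maxlen=window_size)

    def check(value):
        is_anomaly = False
        if len(window) >= 2:
            mu, sigma = mean(window), stdev(window)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                is_anomaly = True
        window.append(value)
        return is_anomaly

    return check

# The "query that never stops": every incoming point is re-checked.
detector = make_anomaly_detector(window_size=5, threshold=2.0)
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 55.0]  # last point is way off
flags = [detector(r) for r in readings]
print(flags)  # only the final reading is flagged
```

The point of the sketch is the shape of the workload: the check runs on every point, forever, which is exactly the 24/7 compute pattern discussed here.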
So from the very beginning, we're only built, you know, for these use cases, we have a, an extremely powerful SQL engine that can give you full feature SQL analytics in a very, very compute efficient way in the cloud. >>Right. So, so let's talk about the leap that you've made, say in the last two years and, and, and what's been the spur of that? What has been allowed you to, to create this, you know, obviously a, a different kind of an array for your customers from which to choose, but, but what's been the spark you think >>We touched upon this a little earlier, right? This spark is really, you know, the world going from batch to real time. So if you look at mainstream adoption of technologies like Apache, Kafka and Confluent doing a really good job at that. In, in, in growing that community and, and use cases, now businesses are now acquiring business data, really important business data in real time. Now they want to operationalize it, right? So, you know, extract based static reports and bi you know, business intelligence is getting replaced in all modern enterprises with what we call operational intelligence, right? Don't tell me what happened last quarter and how to plan this quarter better. Tell me what's happening today, what's happening right now. And it's, it's your business operations using data to make day to day decisions better that either grows your top line, compresses your bottom line, eliminates risk that are inherently creeping up in your business. >>Sure. You know, eliminate potential churn from a customer or fraud, you know, deduction and, and getting on top of, you know, that, you know, a minute into this, into, into an outage as opposed to an hour into the outage. Right? And so essentially I think businesses are now realizing that operational intelligence and operational analytics really, you know, allows them to leverage data and especially real time data to make their, you know, to grow their businesses faster and more efficiently. 
And especially in this kind of macro environment that is, you know, more important to have better unit economics in your business than ever before. Sure. And so that is really, I think that is the real market movement happening. And, and we are here to just serve that market. We are making it much, much easier for companies that have already adopted, you know, streaming technologies like Kafka and, and, and knows Canis MSK and all these technologies. Now businesses are acquiring these data in real time now. They can also get realtime analytics on the other end of it. Sure. >>You know, you just touched on this and, and I'd like to hear your thoughts about this, about, about the economic environment because it does drive decisions, right? And it does motivate people to look for efficiencies and maybe costs, you know, right. Cutting costs. What are you seeing right now in terms of that, that kind of looming influence, right? That the economy can have in terms of driving decisions about where investments are being made and what expectations are in terms of delivering value, more value for the buck? >>Exactly. I think we see across the board, all of our customers come back and tell us, we don't want to manage data infrastructure and we don't want to do kind of DIY open source clusters. We don't wanna manage and scale and build giant data ops and DevOps teams to manage that, because that is not really, you know, in their business. You know, we have car rental companies want to be better at car rentals, we want airlines to be a better airline, and they don't, don't want their, you know, a massive investment in DevOps and data ops, which is not really their core business. And they really want to leverage, you know, you know, fully managed and, you know, cloud offerings like Rock said, you know, built on aws, massively scalable in the cloud with zero operational overhead, very, very easy to get started and scale. 
And so that completely removes all the operational overhead, and they can invest the resources they have, the manpower they have, the calories they have, in actually growing their businesses, because that is what is really going to give them better unit economics. Everybody on the payroll is helping grow the top line or shrink the bottom line, and eliminating the risks, the churn, and the fraud that are inherent in the business. So that is where I think a lot of the investment is going. Gone are the days where you have a five- to ten-person team managing hard-to-operate open-source data management clusters on EC2 nodes in AWS, DIYing it their way. Because if all those ten people do is operational maintenance of infrastructure, which is a means to an end, you're way better off using a born-in-the-cloud, built-for-the-cloud solution like Rockset, eliminating all that cost and replacing it with an operationally much, much simpler system to work with. So that is really the big trend we are seeing: not only is real time going more and more mainstream, but cloud-native solutions are the real future, even when it comes to real time, because the complexity barrier needs to be shattered, and only cloud-native solutions can do that. >> You get the two Cs, cost and complexity, right? That's what you need to address. >> Exactly. >> Yeah, for sure. You know, what is it about building trust with your clients, with your partners? Because you're talking about this cloud environment that everyone is talking about, right? But not everyone has made that commitment; there are still some foot-draggers out there.
How are you going about establishing confidence and establishing trust, and providing them with really concrete examples of the value and the benefits you can provide with these opportunities? >> So there are a few ways to answer this question; I'll cover all the angles. In order to establish trust, you have to create value. Your customer has to see that, with you, they were able to solve the problem faster, better, and cheaper, and to have the business impact they were looking for, which is why they started the project in the first place. Establishing that and proving that — I think there's no equivalent to that. And, you know, I grew up at Facebook, back in the day: I was managing online data infrastructure for Facebook from 2007 to 2015. Internally, we always had this culture of all the product teams building on top of the infrastructure my team was responsible for. There was never a customer-vendor relationship internally within Facebook; we were all part of the same team, partnering to make the product launch successful. A very similar DNA exists in Rockset: when our customers work with us, they come to us and we are there to make them successful. Our consumption-based pricing model also forces us that way — we don't make money until they consume, right? And so their success is very much an integral part of our success. So that is one really important angle: give us a shot, come and do an evaluation, and we will work with you to build the most efficient way to solve your problem.
The second one is AWS partnership. You know, we are an ISV partner, you know, AWS a lot of the time. That really helps us establish trust. And a lot of the time, one of the, the, the people that they look up to, when a customer comes in saying, Hey, what is, who is Rock? Said? You know, who are your friends? Yeah. Who are your friends? And then, you know, and then the AWS will go like, oh, you know, we'll tell you, you know, all these other successful case studies that R has, you know, you know, built up on, you know, the world's largest insurance provider, Europe's largest insurance provider. We have customers like, you know, JetBlue Airlines to Klarna, which is a big bator company. And so, so all these case studies help and, and, and, and platform and partners like AWS helps us, helps you amplify that, that, you know, and, and, and, and, and give more credibility. And last but not least, compliance matters. You know, being Soto type two compliant is, is a really important part of establishing trust. We are hip hop compliant now so that, you know, we can, you know, pi I phi data handling that. And so I think that will continue to be a part, a big part of our focus in improving the security, you know, functionality and, and capabilities that R set has in the cloud, and also compliance and, and the set of com, you know, you know, standards that we are gonna be compliant against. >>Well, I'm glad you hit on the AWS too, cause I did wanna bring that up. I, I appreciate that and I know they appreciate the relationship as well. Thanks for the time here. It's been a pleasure. Awesome. Learning about Rockette and what you're up to. Thank you. >>You bet. >>It's a pleasure. Thank you. Vi ka. All right. You are watching the cube coverage here at AWS Reinvent 22. And on the cube, of course, the leader, the leader in high tech coverage.

Published Date : Nov 30 2022



Evan Kaplan, InfluxData | AWS re:Invent 2022


 

>> Hey everyone. Welcome to Las Vegas. theCUBE is here, live at the Venetian Expo Center for AWS re:Invent 2022. Amazing attendance. This is day one of our coverage. Lisa Martin here with Dave Vellante. Dave, it's great to see so many people back. We've been having great conversations already, and we have wall-to-wall coverage for the next three and a half days. When we talk to companies and customers, every company has to be a data company. And one of the things I think we learned in the pandemic is that access to real-time data and real-time analytics is no longer a nice-to-have; it's a differentiator and a competitive advantage. >> It's all about data. I mean, you know, I love the topic, and it's got so many dimensions and such texture; I can't get enough of data. >> I know. We have a great guest joining us; one of our alumni is back: Evan Kaplan, the CEO of InfluxData. Evan, thank you so much for joining us. Welcome back to theCUBE. >> Thanks for having me. It's great to be here. >> So here we are, day one. I was telling you before we went live, we're nice and fresh hosts. Talk to us about what's new at InfluxData since the last time we saw you at re:Invent. >> That's great. So first of all, we should acknowledge what's going on here; this is pretty exciting. I know there was a show last year, but this feels like the first real post-COVID show: a lot of energy, a lot of attention, despite a difficult economy. In terms of your lead-in on big data: if we were to talk about big data five, six years ago, what would we be talking about? We'd be talking about Hadoop, about Cloudera, about Hortonworks, about big data lakes and data stores. I think what's happened is this interesting dynamic of, let's call it the secularization of data, in which it breaks into different fields, almost a taxonomy.
You've got search data, you've got observability data, you've got graph data, you've got document data, and now you have time series data. And what you're seeing in the market is this incredible capability of developers — with a mostly open-source dynamic driving it — to assemble data platforms that aren't unicellular, that aren't just built on Hadoop or Oracle or Postgres or MySQL, but in fact represent different data types. So for us, what we care about is time series: anything that happens in time, where time is the primary measurement. And if you think about it, that's a huge proportion of real data, because when you think about what drives AI, you think about what happened, what happened, what happened, and what's going to happen. And what happened is always defined by a period, a measurement, a time. So what's new for us is that we've developed a new open-source engine called IOx. It's basically a refresh of the whole database: a columnar database that uses Apache Arrow, Parquet, and DataFusion and turns it into a super powerful real-time analytics platform. It was already pretty real time before, but it's increasingly so now, and it adds SQL capability and infinite cardinality. So it handles bigger data sets, but importantly, not just bigger but faster data. That's primarily what we're talking about at the show. >> So how does that affect where you can play in the marketplace? I mean, how does it affect your total available market? >> Great question. >> Your customer opportunities? >> I think it's a really interesting market, in that you've got all of these different approaches to databases, whether you take data warehouses from Snowflake, or arguably Databricks also, or you take these individual database companies like Mongo, Influx, Neo4j, Elastic, and people like that.
I think the commonality you see across all of them is that many, if not all, are based on some sort of open-source dynamic, and I think that is an unstoppable trend that will continue. But in terms of the broader database market and our total addressable market, lots of these things are coming together in interesting ways. And the wave that we want to ride — because it's all big data, and it's all increasingly fast data, and it's all machine learning and AI — is really around that measurement issue, that instrumentation: the idea that if you're going to build any sophisticated system, it starts with instrumentation, and the journey is defined by instrumentation. So we view ourselves as the instrumentation tooling for understanding complex systems. >> A quick follow-up: why did you say "arguably Databricks"? I mean, the open-source ethos? >> Well, I was saying "arguably Databricks" because of Spark. It's a great company, and it's based on Spark, but there's quite a gap between Spark and what Databricks is today. In some ways, Databricks from the outside looking in looks a lot like Snowflake to me: a really sophisticated data warehouse with a lot of post-processing capabilities, and with less of an open-source core database. >> Yeah, right. I totally agree. Okay, thank you for that. >> That was not to say they're not a good company. >> No, no, they've got great momentum; I'm just curious. >> Absolutely. >> So talk a little bit about IOx and what it is enabling you guys to achieve from a competitive-advantage perspective. The key differentiators — give us the scoop. >> So if you think about it, our old storage engine was called TSM, also open-sourced, right?
And IOx is open-sourced too. The old storage engine was really built around time series measurements, particularly metrics — lots of metrics — handling those at scale and making it super easy for developers to use. But our old data engine only supported either a custom graphical UI that you'd build yourself on top of it, or a dashboarding tool like Grafana or Chronograf. With IOx, two or three interventions were important. One is that we now support, or will support, tools like Tableau and Microsoft Power BI, so the same data that was available for instrumentation can now be used for business intelligence too. That became super important, and it partly answers your question about expanding the market. The second thing is, when you're dealing with time series data, you're dealing with this concept of cardinality: the idea that it's a multiplication of measurements in a table. The more measurements you want, over the more series you have, the more you get this exponentially expanding set that can choke a database. We've designed IOx to handle what we call infinite cardinality, where you don't even have to think about that from a design point of view. And then lastly, query performance is just dramatically better. So it's pretty exciting. >> So with unlimited cardinality, basically you could identify relationships between data in different databases, is that right? >> Within the same database, but across different measurements, different tables, yeah. So you could say: I want to look at how noise levels behave in this room, across 400 different locations, on 25 different days, over seven months of the year. Each one is a measurement, and each one adds to cardinality.
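That multiplication is easy to make concrete. A hedged back-of-envelope sketch, with tag names and counts invented to mirror the noise-level example (not a real schema):

```python
from math import prod

# Hypothetical tags on a single "noise_level" measurement.
# Total distinct series = product of the distinct values per tag.
tag_cardinalities = {
    "location": 400,  # 400 different spots in the room
    "day": 25,        # 25 different days recorded
}
series = prod(tag_cardinalities.values())
print(series)  # 400 * 25 = 10000 distinct series for one measurement

# Add one more tag and the series count multiplies again,
# which is why engines not designed for high cardinality choke.
tag_cardinalities["device_id"] = 1_000
print(prod(tag_cardinalities.values()))  # 10000000
```

Each extra tag multiplies, rather than adds, to the series count — that exponential growth is the "infinite cardinality" problem IOx is described as designing away.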
And you can say: I want to search on Tuesdays in December for what the noise level is at 2:21 PM, and you get a very quick response. That kind of instrumentation is critical to smarter systems. >> How are you able to process that data at a performance level that doesn't bring the database to its knees? What's the secret sauce behind that? >> It's a columnar database, built on Parquet and Apache Arrow. It's hard to do justice to it without a much longer conversation, but it's an architecture that's really built for pulling that kind of data. If you know the data is time series, and you're looking for a time measurement, you already have the ability to optimize pretty dramatically. >> So it's that purpose-built aspect of it. >> The purpose-built aspect. You couldn't take Postgres and do the same thing. >> Right. Because a lot of vendors say, oh yeah, we have time series now. >> And they do. But the founding of the company came about because Paul Dix was working on Wall Street, building time series databases on HBase, on MySQL, on other platforms, and realized that every time we do it, we have to rewrite the code and build a bunch of application logic to handle it. And we have customers that are adding hundreds of millions to billions of points a second. Think about all those data points: you're talking about an ingest level that general-purpose databases just aren't designed for. And it's not just us — our competitors also build good time series databases. The category is really emergent. >> Sure. Talk about a favorite customer story that you think really articulates the value of what Influx is doing, especially with IOx. >> Yeah, sure.
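The "Tuesdays in December at 2:21 PM" query above is easy to picture in plain SQL. A small sketch using SQLite as a stand-in (schema and rows invented for illustration; a purpose-built time series engine indexes time so this filter doesn't have to scan everything):

```python
import sqlite3

# Invented schema: one row per noise reading, tagged by location and timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE noise (location TEXT, ts TEXT, level REAL)")
conn.executemany(
    "INSERT INTO noise VALUES (?, ?, ?)",
    [
        ("hall-a", "2022-12-06 14:21:00", 62.5),  # a Tuesday in December
        ("hall-a", "2022-12-07 14:21:00", 58.0),  # a Wednesday: filtered out
        ("hall-b", "2022-12-13 14:21:00", 71.0),  # another Tuesday
    ],
)

# "Noise level on Tuesdays in December at 2:21 PM":
# SQLite's strftime('%w') yields '2' for Tuesday.
rows = conn.execute(
    """
    SELECT location, level FROM noise
    WHERE strftime('%w', ts) = '2'
      AND strftime('%m', ts) = '12'
      AND strftime('%H:%M', ts) = '14:21'
    ORDER BY location
    """
).fetchall()
print(rows)  # only the two Tuesday readings survive the filter
```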
And I love this story because, you know, Tesla may not be in favor because of the latest Elon Musk escapades, but we've had about a four-year relationship with Tesla, where they built their Powerwall technology around this: recording what your devices are doing, seeing the charging on your car. It's all captured in Influx databases reporting from Powerwalls and Megapacks all over the world, back to a central place at Tesla's headquarters, and out to your phone, so you can see it. What's really cool about this to me is that I've got two Tesla cars and Tesla solar roof tiles, so I watch this data all the time. So it's a great customer story. And actually, if you go on our website, you can see an hour-long interview I did with the engineer who designed the system, because the system is super impressive and I just think it's really cool. Plus, it's all the good green stuff we really appreciate: supporting sustainability, right? >> Right, right. Talk about it from a what's-in-it-for-me-as-a-customer angle: the change to IOx — what are some of its key features, and the key value in it for customers like Tesla and other industry customers as well? >> Well, it's relatively new; it just arrived in our cloud product, so Tesla's not using it today. We have a first set of customers starting to use it, and it's in open source, so it's a very popular project in the open-source world. But the key issues are really the things we've covered here. A broad SQL environment, so you reach all those SQL developers — the same people who code against Snowflake's data warehouse or Databricks or Postgres can now code against Influx, opening up the BI market. It's the cardinality, it's the performance. It's really an architecture; it's the next gen.
We've been doing this for six years, and it's the next generation of everything we've learned about making time series super performant. And that's only relevant because more and more things are becoming real time as we develop smarter and smarter systems. The journey is pretty clear: you instrument the system, you let it run, you watch for anomalies, you correct those anomalies, and you re-instrument the system. You do that four billion times, you have a self-driving car; you do that 55 times, you have a podcast that handles its audio better. Everything is on that journey of getting smarter and smarter. >> You guys are the big committers to IOx, right? >> Yes. >> Talk about how you support and develop the surrounding developer community, how you get that flywheel effect going. >> It's actually, let's call it, more art than science. First of all, you come up with an architecture that really resonates with developers, and Paul Dix, our founder, really is a developer's developer. He started talking in the community about an architecture that uses Apache Arrow and Parquet — now becoming the standard for file formats — that uses Apache Arrow for directing queries and things like that, and that uses DataFusion, and he said: what this thing needs is a columnar database that sits behind all of this stuff and integrates it. He started talking about it two years ago, then he started publishing the IOx commits on GitHub, and slowly, over time, on Hacker News and elsewhere, people go: oh yeah, this is fundamentally right.
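The intuition behind that columnar choice can be illustrated with plain Python — no Arrow or Parquet required. In a column layout, an aggregate over one field touches only that field's array instead of every whole row (a toy model, not how IOx actually stores data):

```python
# Row layout: one record per point; avg(cpu) must walk every whole row.
rows = [{"ts": i, "cpu": float(i % 100), "host": "server-1"}
        for i in range(10_000)]
row_avg = sum(r["cpu"] for r in rows) / len(rows)

# Column layout: one array per field; the same query reads a single
# contiguous array, which is also friendlier to CPU caches and SIMD.
columns = {
    "ts": list(range(10_000)),
    "cpu": [float(i % 100) for i in range(10_000)],
    "host": ["server-1"] * 10_000,
}
col_avg = sum(columns["cpu"]) / len(columns["cpu"])

print(row_avg, col_avg)  # same answer; very different amounts of data touched
```

Formats like Parquet and Arrow take this idea much further (compression, vectorized execution), but the layout difference is the core of it.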
Not different than the original Influx, not different than what Elastic hit on, not different than what Confluent with Kafka hit on in their time: you build an audience of people who are committed to understanding this kind of stuff, and they become committers, and they become the core. Yeah. And you build out from it. And so we chose to have an MIT open source license. Yeah. It's not some secondary license; competitors can use it, and competitors can use it against us. Yeah. >>One of the things I know InfluxData talks about is the time to awesome, which I love. But what does that mean? What is the time to awesome for a developer? >>It comes from that original story where Paul would have to write six months of application logic and stuff to build time-series-based applications. And so Paul's notion was (and this was based on the original Mongo, which was very successful because it was very easy to use relative to most databases), so Paul developed this commitment, this idea that I quickly joined on to, which was: hey, it should be relatively quick for a developer to build something of import, to solve a problem; it should be able to happen very quickly. So it's got a schemaless background, so you don't have to know the schema beforehand. It does some things that make it really easy to feel powerful as a developer quickly. And if you think about that journey: if you feel powerful with a tool quickly, then you'll go deeper and deeper, and pretty soon you're taking that tool with you wherever you go. It becomes the tool of choice as you go to that next job or that next application. And so that's a fundamental way we think about it. To be honest with you, we haven't always delivered perfectly on that. It's generally in our DNA, so we do pretty well, but I always feel like we can do better. >>So if you were to put a bumper sticker on one of your Teslas about InfluxData, what would it say?
By the way, I'm not rich. It just happens that we have two Teslas, and have for a while; we just committed to that. So, ask the question again, sorry. >>A bumper sticker on InfluxData: what would it say? >>It would be that phrase: time to awesome. Right. >>Love that. >>Yeah, I love it. >>Excellent: time to awesome. Evan, thank you so much for joining Dave and me on the program. >>It's really fun. Great to be on. >>Great to have you back talking about what you guys are doing, helping organizations like Tesla and others really transform their businesses, which is all about business transformation these days. We appreciate your insights. >>That's great. Thank you. >>For our guest and Dave Vellante, I'm Lisa Martin. You're watching theCUBE, the leader in emerging and enterprise tech coverage. We'll be right back with our next guest.

Published Date : Nov 29 2022



Bharath Chari, Confluent & Sam Kassoumeh, SecurityScorecard | AWS Startup Showcase S2 E4


 

>>Hey everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase. This is season two, episode four of our ongoing series that's featuring exciting startups within the AWS ecosystem. This theme: cybersecurity, protect and detect against threats. I'm your host, Lisa Martin. I've got two guests here with me. Please welcome back to the program Sam Kassoumeh, COO and co-founder of SecurityScorecard, and Bharath Chari, team lead, solutions marketing at Confluent. Guys, it's great to have you on the program talking about cybersecurity. >>Thanks for having us, Lisa. >>Sam, let's go ahead and kick off with you. You've been on theCUBE before, but give the audience just a little bit of context about SecurityScorecard, or SSC as they're going to hear it referred to. >>Yeah, absolutely. Thank you for that. Well, the easiest way to put it is: when people want to know about their credit risk, they consult one of the major credit scoring companies. And when companies want to know about their cybersecurity risk, they turn to SecurityScorecard to get that holistic view of their security posture. And the way it works is, SSC is continuously, 24/7, collecting signals from across the entire internet (the entire IPv4 space), and doing it to identify vulnerable and misconfigured digital assets. We were just looking back over a three-year period, from 2019 to 2022. We assessed, through our techniques, over a million and a half organizations and found that over half of them had at least one open critical vulnerability exposed to the internet. What was even more shocking was that 20% of those organizations had amassed over a thousand vulnerabilities each. >>So at SSC, we're in the business of really building solutions for customers. We mine the data from dozens of digital sources and help discover the risks and the flaws that are inherent to their business.
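The collect-and-summarize flow Sam describes (continuous signals in, an easy-to-understand risk rollup out) can be sketched as a toy pipeline. Field names, severity weights, and the scoring itself are invented for illustration; they are not SecurityScorecard's real signals or model.

```python
# Invented signal shapes; a real pipeline would ingest these continuously
# from internet-wide scans rather than a hardcoded list.
signals = [
    {"org": "acme.com", "finding": "open_port", "severity": 3},
    {"org": "acme.com", "finding": "expired_cert", "severity": 2},
    {"org": "globex.com", "finding": "open_port", "severity": 3},
]

def summarize(stream):
    """Fold a stream of raw findings into a per-organization rollup."""
    summary = {}
    for signal in stream:                      # continuous collection
        org = summary.setdefault(signal["org"], {"findings": 0, "risk": 0})
        org["findings"] += 1                   # analysis step
        org["risk"] += signal["severity"]      # toy scoring, not SSC's model
    return summary                             # easy-to-understand summary

print(summarize(iter(signals)))
# {'acme.com': {'findings': 2, 'risk': 5}, 'globex.com': {'findings': 1, 'risk': 3}}
```

The function takes any iterator, so the same fold works whether the findings arrive from a list or from a live stream.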
And that becomes increasingly important as companies grow and find new sources of risk and new threat vectors that emerge on the internet, for themselves and for their vendor and business partner ecosystem. The last thing I'll mention is that the platform we provide relies on data collection and processing being done in an extremely accurate and real-time way. That's been key; it's what has allowed us to scale. In order to accomplish this, SecurityScorecard's engineering teams used a really novel combination of Confluent Cloud and Confluent Platform to build really robust data streaming pipelines. The data streaming pipelines enabled by Confluent allow us at SecurityScorecard to collect the data from a lot of various sources for risk analysis. It then gets further analyzed and provided to customers as an easy-to-understand summary of analytics. >>Bharath, let's bring you into the conversation. Talk about Confluent, give the audience that overview, and then talk about what you're doing together with SSC. >>Yeah, and I wanted to say Sam did a great job of setting up the context about what Confluent is, so appreciate that. But a really simple way to think about it, Lisa: Confluent is a data streaming platform that is pioneering a fundamentally new category of data infrastructure that is at the core of what SSC does. Like Sam said, the key is to really collect data accurately, at scale, and in real time. And that's where our cloud-native offering really empowers organizations like SSC to build great customer experiences for their customers. The other thing we do is help organizations build sophisticated real-time backend operations. And so at a high level, that's the best way to think about Confluent. >>Got it. Bharath, talk about data streaming, how it's being used in cybersecurity, and what the data streaming pipelines enabled by Confluent allow SSC to do for its customers.
>>Yeah, I think Sam can definitely share his thoughts on this, but one of the things I know we are all experiencing is the rise of cyber threats, whether from a business, B2B perspective or as consumers: just our data, the data we're generating, and the companies that have access to it. So as the need to protect the data grows, companies and organizations really need to effectively detect, respond, and protect their environments. And the best way to do this is through three things: scale, speed, and cost. And so, going back to the points I brought up earlier, with Confluent you can really gain real-time data ingestion and enable those analytics that Sam talked about previously, while optimizing for cost and scale. Doing all of this at the same time, as you can imagine, is not easy, and that's where we excel. >>And so the entire premise of data streaming is built on the concept that data is not static, but constantly moving across your organization; that's why we call them data streams. At its core, we've leveraged that open source foundation of Apache Kafka, but we have rearchitected it for the cloud with a totally new cloud-native experience. And ultimately, for customers like SSC, we have taken away the need to manage a lot of those operational tasks when it comes to Apache Kafka. The other thing we've done is add a ton of proprietary IP, including security features like role-based access control, something Sam was talking about. And that really allows you to securely connect to any data, no matter where it resides, at scale and at speed. >>Can you talk about, Bharath, sticking with you, some of the improvements (and maybe this is actually a question for Sam) that have been achieved on the SSC side as a result of the Confluent partnership? Things are much faster and you're able to do much more, I understand. >>Can Sam take it away? I can maybe kick us off, and then Bharath, feel free to chime in. Lisa, the problem that we're talking about was, for us, a longstanding challenge. We're about a nine-year-old company, a high-growth startup, and data collection has always been in our DNA. It's at the core of what we do, and getting the insights and analytics that we synthesize from that data into customers' hands as quickly as possible is the name of the game, because they're trying to make decisions and we're empowering them to make those decisions faster. We always had challenges in the arena, because partners like Confluent didn't exist when we started Scorecard. We became a customer, but we think of it as a partnership. When we found Confluent's technology, and you can hear it from Bharath's description, we shared a common vision, and they understood some of the pain points that we were experiencing on a very visceral and intimate level. And for us, that was really exciting, right? Just to have partners that are there saying: we understand your problem, this is exactly the problem that we're solving, we're here to help. What the technology has done for us since then is it's not only allowed us to process the data faster and get the analytics to the customer, but it's also allowed us to create more value for customers, which I'll talk about in a bit, including new products and new modules that we didn't have the capabilities to deliver before. >>And we'll talk about those new products in a second; exciting stuff coming out there from SSC.
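The idea that data is "not static, but constantly moving" can be pictured with a minimal in-memory stand-in for a Kafka-style topic: an append-only log that each consumer reads from its own offset. A real deployment would use a Kafka client against a broker such as Confluent Cloud; this toy class only illustrates the concept.

```python
from collections import defaultdict

class MiniTopic:
    """In-memory stand-in for a Kafka-style topic: an append-only log
    that each named consumer reads from its own independent offset."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)

    def produce(self, event):
        self.log.append(event)

    def consume(self, consumer_id):
        """Return every event this consumer has not yet seen."""
        offset = self.offsets[consumer_id]
        events = self.log[offset:]
        self.offsets[consumer_id] = len(self.log)
        return events

topic = MiniTopic()
topic.produce({"ip": "10.0.0.1", "port": 443})
topic.produce({"ip": "10.0.0.2", "port": 22})

print(topic.consume("scanner"))  # both events on the first read
print(topic.consume("scanner"))  # [] ; nothing new at this offset
```

Because each consumer tracks its own offset, a new subscriber can replay the whole stream from the beginning, which is the property that makes streams more than transient messages.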
Bharath, talk about the partnership from Confluent's perspective. How has it enabled Confluent to actually enhance its technology as a result of seeing and learning what SSC is able to do with it? >>Yeah, first of all, I completely agree with Sam: it's more of a partnership, because like Sam said, we shared the same vision, and that is to really make sure that organizations have access to the data, like I said earlier, no matter where it resides, so that you can scan and identify the potential security threats. I think from our perspective, what's really helped us in partnering with SSC is just looking at the data volumes that they're working with. A stat that we talked about recently was around scanning billions of records, thousands of ports, on a daily basis. And so that's where, like I mentioned earlier, our technology really excels, because you can really ingest and amplify the volumes of data that you're processing, so that you can scan and detect those threats in real time. >>Because, I mean, especially with the amount of data volume that's increasing on a yearly basis, being able to respond quickly is paramount. And so what's really helped us is seeing what SSC is doing in terms of scanning the web ports or the data systems that are at potential risk; being able to support their use cases, whether it's data sharing between their different teams internally, or being able to empower customers to detect and scan their data systems. And so the learning for us is really seeing how those millions and billions of records get processed. >>Got it. Sounds like a really synergistic partnership that you guys have had there for the last year or so. Sam, let's go back over to you. You mentioned some new products. I see SSC just released an attack surface intelligence product.
That's detecting thousands of vulnerabilities per minute. Talk to us about that, the importance of it, and another release that you're making. >>There are some really exciting products that we have released recently, and are releasing, at SecurityScorecard. When we think about ratings and risk, we think about it not just for our companies or our third parties, but in the broader sense of an ecosystem, because it's important to have data on third parties, but we also want to have the data on their third parties as well. Nobody's operating in a vacuum. Everybody's operating in this hyper-connected ecosystem, and the risk can live not just in the third parties; they might be storing and processing data in a myriad of other technological solutions, which we want to understand. But it's really hard to get that visibility, because today the way it's done is companies ask their third parties: hey, send me a list of your third parties, where my data is stored. It's very manual, it's very labor-intensive, and it's a trust-based exercise that makes it really difficult to validate. What we've done is develop a technology called AVD, automatic vendor detection. What AVD does is it goes out, and for any company (your own company or another business partner that you work with) it will detect all of the third-party connections that we see that have a live network connection or data connection to the organization. So it's an awareness and discovery tool, because now we can pull the veil back and see what the bigger ecosystem and connectivity looks like, thus allowing the customers to hold accountable not just the third parties, but their fourth parties, fifth parties, really nth parties. And they can only do that by using Scorecard.
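The third-party / fourth-party / nth-party discovery AVD performs can be pictured as a traversal over a graph of observed vendor connections. The graph below is hypothetical, and this breadth-first sketch only illustrates the idea; AVD's actual detection works from live network and data connections, not a prebuilt map.

```python
from collections import deque

# Hypothetical vendor graph: org -> directly observed vendor connections.
connections = {
    "acme.com": ["crm.example", "mail.example"],
    "crm.example": ["db-host.example"],
    "mail.example": [],
    "db-host.example": [],
}

def nth_parties(org, graph):
    """Return every downstream vendor reachable from org, with its depth
    (1 = third party, 2 = fourth party, and so on)."""
    seen, queue = {}, deque([(org, 0)])
    while queue:
        node, depth = queue.popleft()
        for vendor in graph.get(node, []):
            if vendor not in seen:
                seen[vendor] = depth + 1
                queue.append((vendor, depth + 1))
    return seen

print(nth_parties("acme.com", connections))
# {'crm.example': 1, 'mail.example': 1, 'db-host.example': 2}
```

The depth labels are what make "fourth parties, fifth parties, really nth parties" concrete: each extra hop in the graph is one more level of vendor to hold accountable.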
The attack surface intelligence tool is really exciting for us because, well, before SecurityScorecard, people thought what we were doing was fairly impossible. It was really hard to get instant visibility on any company and any business partner. And at the same time, it was of critical importance to have that instant visibility into the risk, because companies are trying to make faster decisions and they need the risk data to steer those decisions. So when I think about that problem of managing this evolving landscape, what it requires is insightful and actionable real-time security data. And that relies on a couple of things: talent and tech. On the talent side, it starts with people. We have an amazing R&D team. We invest heavily; it's the heartbeat of what we do. That team really excels in areas of data collection, analysis, and scaling large data sets. And then on the tech side, well, we figured out some breakthrough techniques, and it also requires partners like Confluent to help with the real-time streaming. >>What we realized was that those capabilities are very desired in the market, and we created a new product from them called attack surface intelligence. Attack surface intelligence focuses less on the rating. There's a persona of users that really values the rating: it's easy to understand, it's a bridge language between technical and non-technical stakeholders. That's on one end of the spectrum. On the other end of the spectrum, there are very technical customers and users that may not have as much interest in a layman's rating, but really want a deep dive into the strong threat intel data and capabilities and insights that we're producing.
So we produced ASI, which stands for attack surface intelligence. It allows customers to look at the surface area of attack, all of the digital assets for any organization, and see all of the threats, vulnerabilities, and bad actors, including sometimes discoveries of zero-day vulnerabilities that are out in the wild and being exploited by bad guys. So we have a really strong pulse on what's happening on the internet, good and bad, and we created that product to help serve a market that was interested in going deep into the data. >>So critical. Go ahead. >>I'll jump in there real quick, because I think the points that Sam brought up, we had a great discussion recently while we were building out the case study, and I think it brings this to life. Going back to the AVD product that Sam talked about, and Sam can probably do a better job of walking through the story, but the way I understand it, one of SecurityScorecard's customers approached them and told them that they had an issue to resolve. So this customer was using the AVD product at the time, and they said: hey, SSC, your product shows that we were using HubSpot, but we stopped using that ages ago. And so I think when SSC investigated, they did find a very recent HubSpot ping, being used by the marketing team in this instance. And as someone who comes from that marketing background, I can raise my hand and say: I've been there, done that. So yeah, Sam can probably share his thoughts on this, but that's, I think, the great story that brings this all to life in terms of how customers actually go about using SSC's products. >>And Sam, go ahead on that. It sounds like one of the things I'm hearing that is a benefit is the reduction in shadow
IT. I'm sure that happens so frequently with your customers, like the great example that was given of the IT folks saying we don't use HubSpot, haven't in years, while marketing initiates an instance. Talk about that as some of the benefits in it for customers, reducing shadow IT; there have got to be many more benefits from a security perspective. >>Yeah, there's a big challenge today, because the market moved to the cloud, and that makes it really easy for anybody in an organization to go sign up, put in a credit card or get a free trial, to any product. And that product can very easily connect into the corporate system and access the data. And because of the nature of how cloud products work and how easy they are to sign up for, a byproduct is that they sort of circumvent the traditional risk assessment process that organizations go through. And organizations invest a lot of money, right? There's a lot of time and money and energy invested in having good procurement and risk management life cycles, and making sure that contracts are buttoned up. So on one side you have companies investing loads of energy, and then on the other side, any employee can circumvent that process by just going, with a few clicks, and signing up and purchasing a product. And then that causes a disparity, a delta, between what the technology and security teams' understanding of the landscape is and what reality is. And we're trying to close that gap, right? We want to close and reduce any windows of time or opportunity where a hacker can discover some misconfigured cloud asset that somebody signed up for and maybe forgot to turn off. I mean, a lot of it is just human error, and it happens. The example that Bharath gave, and this is why understanding the third parties is so important: a customer contacted us and said, hey, your AVD detection product has an error. It's showing we're using a product.
I think it was HubSpot. But we stopped using that, right? And we don't understand why you're still showing it; it has to be a false positive. >>So we investigated and found that there was a very recent, live HubSpot connection ping being made. Sure enough, when we went back to the customer and said we're very confident the data is accurate, they looked into it, and they found that the marketing team had started experimenting with another instance of HubSpot on the side. They were putting real customer data in that instance, and, you know, it triggered a security assessment. So we see all sorts of permutations of it. A large multinational company spins up a satellite office, and a contractor setting up the network equipment misconfigures it and inadvertently leaves an administrator portal to the Cisco router exposed on the public internet, and they forget to turn off the administrative default credentials. So if a hacker stumbles on that, they have direct access to the network. We're trying to catch those things and surface them to the client before the hackers find them. >>So we're giving them this hacker's-eye view. And without the continuous data analysis, without the stream processing, the customer wouldn't have known about those risks. But if you can automatically know about the risks as they happen, that prevents a million shoulder taps, because the customer doesn't have to go tap on the marketing team's shoulder, tap on employees, and manually interview them. They have the data already, and that can be for their own company, or for any company they're doing business with where they're storing and processing data. That's a huge time savings and a huge risk reduction. >>Huge risk reduction; like you're taking blinders off that they didn't even know were there.
And I can imagine, Sam, in the last couple of years, as SaaS skyrocketed the use of collaboration tools just to keep the lights on for organizations to be able to communicate, there's probably a lot of opportunity in your customer base, and prospective customer base, to engage with you and get that really full 360-degree view of their entire organization: third parties, fourth parties, et cetera. >>Absolutely. Customers are more engaged than they've ever been, because that challenge of the market moving to the cloud hasn't stopped. We've been talking about it for a long time, but there are still a lot of big organizations that are starting to dip their toe in the pool and starting to cut over from what was traditionally an in-house data center in the basement of the headquarters. They're moving over to the cloud. And then on top of that, cloud providers like Azure and especially AWS make it so easy for any company to go sign up, get access, build a product, and launch that product to the market. We see more and more organizations sitting on AWS, launching products and software. The barrier to entry is very low, and the value in those products is very high, so that's drawing the attention of organizations to go sign up and engage. >>The challenge then becomes: we don't know who has control and visibility of our data. We're bringing that to the surface. And for vendors themselves, especially companies that sit in AWS, what we see them doing, and I think, Lisa, this is what you're alluding to: when companies engage in their own scorecard, there's a bit of a social aspect to it. When they look good in our platform, other companies are following them, right? So now, all of a sudden, they can make one motion to look good, make their scorecard buttoned up, and everybody who's looking at them sees that they're doing the right things.
We actually have a lot of vendors who are customers. They're winning more competitive bake-offs and deals, because they're proving to their clients faster that they can be trusted to store the data. >>So it's a bit of, you know, we're in a two-sided kind of market. You have folks that are assessing other folks. It's fun to look at others and see how they're doing and hold them accountable. But if you're on the receiving end, that can be stressful. So what we've done is take that situation and turn it into a really positive and productive environment, where for companies, whether they're looking at someone else, or looking at themselves to prove to their clients or to prove to the board, it turns into a very productive experience. >>Oh yeah, that validation. Go ahead, Bharath. >>Really, I was going to ask Sam his thoughts on one particular aspect. So in terms of the industry, Sam, that you're seeing really moving to the cloud, and this need for secure data, making sure that the data can be trusted: are there specific verticals that are doing that better than the others, or do you see that across the board? >>I think some industries have it easier and some industries have it harder. Definitely in industries like healthcare and financial services, absolutely, we see heavier activity there, on both sides, right? They're certainly becoming more and more proactive in their investments, but the attacks are not stopping against those, especially healthcare, because the data is so valuable, and historically healthcare was an underinvested space, right? Hospitals were always strapped for IT folks. Now they're starting to wake up, pay very close attention, and make heavier investments. >>That's pretty interesting. >>Tremendous opportunity there, guys. I'm sorry, we are out of time, but this is such an interesting conversation.
As you can see, we could keep going. I want to ask you both: where can prospective interested customers go to learn more, on the SSC side, on the Confluent side, through the AWS Marketplace? >>I'll let Sam go first. >>Sure. Thank you. On the SecurityScorecard side: look, SecurityScorecard, with the help of Confluent, has made it possible to instantly rate the security posture of any company in the world. We have 12 million organizations rated today, and that's going up every day. We invite any company in the world to try SecurityScorecard for free and experience how easy it is to get your rating and see the security rating of any company, and any company can claim their score. There's no charge. They can go to securityscorecard.com, and we have a special URL, securityscorecard.com/free-account/aws-marketplace. And even better, if someone's already on AWS, you know, you can view our security posture with the AWS Marketplace vendor insights plugin to quickly and securely procure our products. >>Awesome. Guys, this has been fantastic information. I'm sorry, Bharath, did you want to add one more thing? >>Yeah, I just wanted to give a quick call out. So anyone who wants to learn more about data streaming can go to confluent.io. There's also an upcoming event, which has a separate URL, coming up in October, where you can learn all about data streaming, and that URL is current event.io. So those are the two URLs I just wanted to quickly call out. >>Awesome, guys. Thanks again so much for partnering with theCUBE on season two, episode four of our AWS Startup Showcase. We appreciate your insights and your time. And for those of you watching, thank you so much. Keep it right here for more action. For my guests, I am Lisa Martin. We'll see you next time.

Published Date : Sep 7 2022



Closing Remarks | Supercloud22


 

(gentle upbeat music) >> Welcome back everyone, to "theCUBE"'s live stage performance here in Palo Alto, California at "theCUBE" Studios. I'm John Furrier with Dave Vellante, kicking off our first inaugural Supercloud event. It's an editorial event, we wanted to bring together the best in the business, the smartest, the biggest, the up-and-coming startups, venture capitalists, everybody, to weigh in on this new Supercloud trend, this structural change in the cloud computing business. We're about to run the Ecosystem Speaks, which is a bunch of pre-recorded companies that wanted to get their voices on the record, so stay tuned for the rest of the day. We'll be replaying all that content and they're going to be having some really good commentary and hear what they have to say. I had a chance to interview and so did Dave. Dave, this is our closing segment where we kind of unpack everything or kind of digest and report. So much to kind of digest from the conversations today, a wide range of commentary from Supercloud operating system to developers who are in charge to maybe it's an ops problem or maybe Oracle's a Supercloud. I mean, that was debated. So so much discussion, lot to unpack. What was your favorite moments? >> Well, before I get to that, I think, I go back to something that happened at re:Invent last year. Nick Sturiale came up, Steve Mullaney from Aviatrix; we're going to hear from him shortly in the Ecosystem Speaks. Nick Sturiale's VC said "it's happening"! And what he was talking about is this ecosystem is exploding. They're building infrastructure or capabilities on top of the CapEx infrastructure. So, I think it is happening. I think we confirmed today that Supercloud is a thing. It's a very immature thing. And I think the other thing, John is that, it seems to me that the further you go up the stack, the weaker the business case gets for doing Supercloud. 
We heard from Marianna Tessel, it's like, "Eh, you know, we can- it was easier to just do it all on one cloud." This is a point that, Adrian Cockcroft just made on the panel and so I think that when you break out the pieces of the stack, I think very clearly the infrastructure layer, what we heard from Confluent and HashiCorp, and certainly VMware, there's a real problem there. There's a real need at the infrastructure layer and then even at the data layer, I think Benoit Dageville did a great job of- You know, I was peppering him with all my questions, which I basically was going through, the Supercloud definition and they ticked the box on pretty much every one of 'em as did, by the way Ali Ghodsi you know, the big difference there is the philosophy of Republicans and Democrats- got open versus closed, not to apply that to either one side, but you know what I mean! >> And the similarities are probably greater than differences. >> Berkely, I would probably put them on the- >> Yeah, we'll put them on the Democrat side we'll make Snowflake the Republicans. But so- but as we say there's a lot of similarities as well in terms of what their objectives are. So, I mean, I thought it was a great program and a really good start to, you know, an industry- You brought up the point about the industry consortium, asked Kit Colbert- >> Yep. >> If he thought that was something that was viable and what'd they say? That hyperscale should lead it? >> Yeah, they said hyperscale should lead it and there also should be an industry consortium to get the voices out there. And I think VMware is very humble in how they're putting out their white paper because I think they know that they can't do it all and that they do not have a great track record relative to cloud. And I think, but they have a great track record of loyal installed base ops people using VMware vSphere all the time. >> Yeah. 
>> So I think they need a catapult moment where they can catapult to the cloud native which they've been working on for years under Raghu and the team. So the question on VMware is in the light of Broadcom, okay, acquisition of VMware, this is an opportunity or it might not be an opportunity or it might be a spin-out or something, I just think VMware's got way too much engineering culture to be ignored, Dave. And I think- well, I'm going to watch this very closely because they can pull off some sort of rallying moment. I think they could. And then you hear the upstarts like Platform9, Rafay Systems and others they're all like, "Yes, we need to unify behind something. There needs to be some sort of standard". You know, we heard the argument of you know, more standards bodies type thing. So, it's interesting, maybe "theCUBE" could be that but we're going to certainly keep the conversation going. >> I thought one of the most memorable statements was Vittorio who said we- for VMware, we want our cake, we want to eat it too and we want to lose weight. So they have a lot of that aspirations there! (John laughs) >> And then I thought, Adrian Cockcroft said you know, the devs, they want to get married. They were marrying everybody, and then the ops team, they have to deal with the divorce. >> Yeah. >> And I thought that was poignant. It's like, they want consistency, they want standards, they got to be able to scale And Lori MacVittie, I'm not sure you agree with this, I'd have to think about it, but she was basically saying, all we've talked about is devs devs devs for the last 10 years, going forward we're going to be talking about ops. >> Yeah, and I think one of the things I learned from this day and looking back, and some kind of- I've been sauteing through all the interviews. If you zoom out, for me it was the epiphany of developers are still in charge. And I've said, you know, the developers are doing great, it's an ops security thing. 
Not sure I see that the way I was seeing before. I think what I learned was the refactoring pattern that's emerging, In Sik Rhee brought this up from Vertex Ventures with Marianna Tessel, it's a nuanced point but I think he's right on which is the pattern that's emerging is developers want ease-of-use tooling, they're driving the change and I think the developers in the devs ops ethos- it's never going to be separate. It's going to be DevOps. That means developers are driving operations and then security. So what I learned was it's not ops teams leveling up, it's devs redefining what ops is. >> Mm. And I think that to me is where Supercloud's going to be interesting- >> Forcing that. >> Yeah. >> Forcing the change because the structural change is open sources thriving, devs are still in charge and they still want more developers, Vittorio "we need more developers", right? So the developers are in charge and that's clear. Now, if that happens- if you believe that to be true the domino effect of that is going to be amazing because then everyone who gets on the wrong side of history, on the ops and security side, is going to be fighting a trend that may not be fight-able, you know, it might be inevitable. And so the winners are the ones that are refactoring their business like Snowflake. Snowflake is a data warehouse that had nothing to do with Amazon at first. It was the developers who said "I'm going to refactor data warehouse on AWS". That is a developer-driven refactorization and a business model. So I think that's the pattern I'm seeing is that this concept refactoring, patterns and the developer trajectory is critical. >> I thought there was another great comment. Maribel Lopez, her Lord of the Rings comment: "there will be no one ring to rule them all". Now at the same time, Kit Colbert, you know what we asked him straight out, "are you the- do you want to be the, the Supercloud OS?" and he basically said, "yeah, we do". 
Now, of course they're confined to their world, which is a pretty substantial world. I think, John, the reason why Maribel is so correct is security. I think security's a really hard problem to solve. You've got cloud as the first layer of defense and now you've got multiple clouds, multiple layers of defense, multiple shared responsibility models. You've got different tools for XDR, for identity, for governance, for privacy all within those different clouds. I mean, that really is a confusing picture. And I think the hardest- one of the hardest parts of Supercloud to solve. >> Yeah, and I thought the security founder Gee Rittenhouse, Piyush Sharma from Accurics, which sold to Tenable, and Tony Kueh, former head of product at VMware. >> Right. >> Who's now an investor kind of looking for his next gig or what he is going to do next. He's obviously been extremely successful. They brought up the, the OS factor. Another point that they made I thought was interesting is that a lot of the things to do to solve the complexity are not doable. >> Yeah. >> It's too much work. So managed services might fit the bill. So, and Chris Hoff mentioned on the Clouderati segment that the higher level services being a managed service and differentiating around the service could be the key competitive advantage for whoever does it. >> I think the other thing is Chris Hoff said "yeah, well, Web 3, metaverse, you know, DAO, Superclouds" you know, "Stupercloud" he called it and this brings up- It resonates because one of the criticisms that Charles Fitzgerald laid on us was, well, it doesn't help to throw out another term. I actually think it does help. And I think the reason it does help is because it's getting people to think. When you ask people about Supercloud, they automatically- it resonates with them. They play back what they think is the future of cloud. So Supercloud really talks to the future of cloud.
There's a lot of aspects to it that need to be further defined, further thought out and we're getting to the point now where we- we can start- begin to say, okay that is Supercloud or that isn't Supercloud. >> I think that's really right on. I think Supercloud at the end of the day, for me from the simplest way to describe it is making sure that the developer experience is so good that the operations just happen. And Marianna Tessel said, she's investing in making their developer experience high velocity, very easy. So if you do that, you have to run on premise and on the cloud. So hybrid really is where Supercloud is going right now. It's not multi-cloud. Multi-cloud was- that was debunked on this session today. I thought that was clear. >> Yeah. Yeah, I mean I think- >> It's not about multi-cloud. It's about operationally seamless operations across environments, public cloud to on-premise, basically. >> I think we got consensus across the board that multi-cloud, you know, is a symptom, Chuck Whitten's thing of multi-cloud by default versus multi- multi-cloud has not been a strategy, Kit Colbert said, up until the last couple of years. Yeah, because people said, "oh we got all these multiple clouds, what do we do with it?" and we got this mess that we have to solve. Whereas, I think Supercloud is something that is a strategy and then the other nuance that I keep bringing up is it's industries that are- as part of their digital transformation, are building clouds. Now, whether or not they become superclouds, I'm not convinced. I mean, what Goldman Sachs is doing, you know, with AWS, what Walmart's doing with Azure connecting their on-prem tools to those public clouds, you know, is that a supercloud? I mean, we're going to have to go back and really look at that definition. Or is it just kind of a SaaS that spans on-prem and cloud.
So, as I said, the further you go up the stack, the business case seems to wane a little bit but there's no question in my mind that from an infrastructure standpoint, to your point about operations, there's a real requirement for super- what we call Supercloud. >> Well, we're going to keep the conversation going, Dave. I want to put a shout out to our founding supporters of this initiative. Again, we put this together really fast kind of like a pilot series, an inaugural event. We want to have a face-to-face event as an industry event. Want to thank the founding supporters. These are the people who donated their time, their resources to contribute content, ideas and some cash, not everyone has committed some financial contribution but we want to recognize the names here. VMware, Intuit, Red Hat, Snowflake, Aisera, Alteryx, Confluent, Couchbase, Nutanix, Rafay Systems, Skyhigh Security, Aviatrix, Zscaler, Platform9, HashiCorp, F5 and all the media partners. Without their support, this wouldn't have happened. And there are more people that wanted to weigh in. There was more demand than we could pull off. We'll certainly continue the Supercloud conversation series here on "theCUBE" and we'll add more people in. And now, after this session, the Ecosystem Speaks session, we're going to run all the videos of the big name companies. We have the Nutanix CEOs weighing in, Aviatrix to name a few. >> Yeah. Let me, let me chime in, I mean you got Couchbase talking about Edge, Platform 9's going to be on, you know, everybody, you know In Sik was pooh-poohing Oracle, but you know, Oracle and Azure, what they did, two technical guys, developers are coming on, we dig into what they did. Howie Xu from Zscaler, Paula Hansen is going to talk about going to market in the multi-cloud world. You mentioned Rajiv, the CEO of Nutanix, Ramesh is going to talk about multi-cloud infrastructure.
So that's going to run now for, you know, quite some time here and some of the pre-records, so super excited about that and I just want to thank the crew. I hope guys, I hope you have a list of credits; there's too many of you to mention, but you know, awesome job, really appreciate the work that you did in a very short amount of time. >> Well, I'm excited. I learned a lot and my takeaway was that Supercloud's a thing, there's a kind of sense that people want to talk about it and have real conversations, not BS or FUD. They want to have real substantive conversations and we're going to enable that on "theCUBE". Dave, final thoughts for you. >> Well, I mean, as I say, we put this together very quickly. It was really a phenomenal, you know, enlightening experience. I think it confirmed a lot of the concepts and the premises that we've put forth, that David Floyer helped evolve, that a lot of these analysts have helped evolve, that even Charles Fitzgerald with his antagonism helped to really sharpen our knives. So, you know, thank you Charles. And- >> I like his blog, by the way- I'm a reader- >> Yeah, absolutely. And it was great to be back in Palo Alto. It was my first time back since pre-COVID, so, you know, great job. >> All right. I want to thank all the crew and everyone. Thanks for watching this first, inaugural Supercloud event. We are definitely going to be doing more of these. So stay tuned, maybe face-to-face in person. I'm John Furrier with Dave Vellante now for the Ecosystem chiming in, and they're going to speak and share their thoughts here with "theCUBE" our first live stage performance event in our studio. Thanks for watching. (gentle upbeat music)

Published Date : Aug 9 2022



Breaking Analysis: Answering the top 10 questions about SuperCloud


 

>> From theCUBE studios in Palo Alto and Boston, bringing you data driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Welcome to this week's Wikibon, theCUBE's insights powered by ETR. As we exited the isolation economy last year, supercloud is a term that we introduced to describe something new that was happening in the world of cloud. In this Breaking Analysis, we address the 10 most frequently asked questions we get around supercloud. Okay, let's review these frequently asked questions on supercloud that we're going to try to answer today. In an industry that's full of hype and buzzwords, why the hell does anyone need a new term? Aren't hyperscalers building out superclouds? We'll try to answer why the term supercloud connotes something different from hyperscale clouds. And we'll talk about the problems that superclouds solve specifically. And we'll further define the critical aspects of a supercloud architecture. We often get asked, isn't this just multi-cloud? Well, we don't think so, and we'll explain why in this Breaking Analysis. Now in an earlier episode, we introduced the notion of super PaaS. Well, isn't a plain vanilla PaaS already a super PaaS? Again, we don't think so, and we'll explain why. Who will actually build and who are the players currently building superclouds? What workloads and services will run on superclouds? And number eight, or number nine, what are some examples that we can share of supercloud? And finally, we'll answer what you can expect next from us on supercloud. Okay, let's get started. Why do we need another buzzword? Well, late last year, ahead of re:Invent, we were inspired by a post from Jerry Chen called "Castles in the Cloud." Now in that blog post, he introduced the idea that there were sub-markets emerging in cloud that presented opportunities for investors and entrepreneurs, that the hyperscalers weren't going to suck all the value out of the industry.
And so we introduced this notion of supercloud to describe what we saw as a value layer emerging above the hyperscalers' CAPEX gift, as we sometimes call it. Now it turns out that we weren't the only ones using the term, as both Cornell and MIT have used the phrase in somewhat similar, but different contexts. The point is something new was happening in the AWS and other ecosystems. It was more than IaaS and PaaS, and wasn't just SaaS running in the cloud. It was a new architecture that integrates infrastructure, platform and software as services to solve new problems that the cloud vendors, in our view, weren't addressing by themselves. It seemed to us that the ecosystem was pursuing opportunities across clouds that went beyond conventional implementations of multi-cloud. And we felt there was a structural change going on at the industry level that the supercloud metaphor was highlighting. So that's the background on why we felt a new catchphrase was warranted, love it or hate it. It's memorable and it's what we chose. Now to that last point about structural industry transformation. Andy Rappaport is often credited with identifying the shift from the vertically integrated IBM mainframe era to the fragmented PC microprocessor-based era in his HBR article in 1991. In fact, it was David Moschella, who at the time was an IDC analyst, who first introduced the concept in 1987, four years before Rappaport's article was published. Moschella saw that it was clear that Intel, Microsoft, Seagate and others would replace the system vendors, and put that forth in a graphic that looked similar to the first two on this chart. We don't have to review the shift from IBM as the center of the industry to Wintel, that's well understood. What isn't as well known or accepted is what Moschella put out in his 2018 book called "Seeing Digital" which introduced the idea of "The Matrix" that's shown on the right hand side of this chart.
Moschella posited that new services were emerging built on top of the internet and hyperscale clouds that would integrate other innovations and would define the next era of computing. He used the term Matrix because the conceptual depiction included not only horizontal technology rows like the cloud and the internet, but for the first time included connected industry verticals, the columns in this chart. Moschella pointed out that historically, industry verticals had a closed value chain or stack and ecosystem of R&D, and production, and manufacturing, and distribution. And if you were in that industry, the expertise within that vertical generally stayed within that vertical and was critical to success. But because of digital and data, for the first time, companies were able to traverse industries, jump across industries and compete because data enabled them to do that. Examples, Amazon and content, payments, groceries, Apple, and payments, and content, and so forth. There are many examples. Data was now this unifying enabler and this marked a change in the structure of the technology landscape. And supercloud is meant to imply more than running in hyperscale clouds, rather it's the combination of multiple technologies enabled by CloudScale with new industry participants from those verticals, financial services and healthcare, manufacturing, energy, media, and virtually any industry. Kind of an extension of every company is a software company. Basically, every company now has the opportunity to build their own cloud or supercloud. And we'll come back to that. Let's first address what's different about superclouds relative to hyperscale clouds. You know, this one's pretty straightforward and obvious, I think. Hyperscale clouds, they're walled gardens where they want your data in their cloud and they want to keep you there.
Sure, every cloud player realizes that not all data will go to their particular cloud so they're meeting customers where their data lives with initiatives like Amazon Outposts and Azure Arc, and Google Anthos. But at the end of the day, the more homogeneous they can make their environments, the better control, security, cost, and performance they can deliver. The more complex the environment, the more difficult it is to deliver on their brand promises. And of course, the lesser margin that's left for them to capture. Will the hyperscalers get more serious about cross-cloud services? Maybe, but they have plenty of work to do within their own clouds and within enabling their own ecosystems. They have a long way to go, a lot of runway. So let's talk specifically about what problems superclouds solve. We've all seen the stats from IDC or Gartner, or whomever: customers on average use more than one cloud. You know, two clouds, three clouds, five clouds, 20 clouds. And we know these clouds operate in disconnected silos for the most part. And that's a problem because each cloud requires different skills because the development environment is different as is the operating environment. They have different APIs, different primitives, and different management tools that are optimized for each respective hyperscale cloud. Their functions and value props don't extend to their competitors' clouds for the most part. Why would they? As a result, there's friction when moving between different clouds. It's hard to share data, it's hard to move work. It's hard to secure and govern data. It's hard to enforce organizational edicts and policies across these clouds, and on-prem. Supercloud is an architecture designed to create a single environment that enables management of workloads and data across clouds in an effort to take out complexity, accelerate application development, streamline operations and share data safely, irrespective of location.
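To make that single-environment idea concrete, here's a minimal sketch of the kind of abstraction layer a supercloud implies. Everything in it is illustrative: `ObjectStore`, the adapter classes and `replicate` are hypothetical names, and the adapters are in-memory stand-ins rather than real AWS or Azure SDK calls.

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """One interface a supercloud layer might expose on top of each
    cloud's native storage primitive (S3, Blob Storage, GCS, ...)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class S3Adapter(ObjectStore):
    """Stand-in for an AWS-backed adapter; a real one would wrap the AWS SDK."""
    def __init__(self) -> None:
        self._objects = {}
    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
    def get(self, key: str) -> bytes:
        return self._objects[key]

class BlobAdapter(ObjectStore):
    """Stand-in for an Azure-backed adapter."""
    def __init__(self) -> None:
        self._objects = {}
    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
    def get(self, key: str) -> bytes:
        return self._objects[key]

def replicate(src: ObjectStore, dst: ObjectStore, key: str) -> None:
    """Move data across clouds; the caller never touches a native API."""
    dst.put(key, src.get(key))

aws, azure = S3Adapter(), BlobAdapter()
aws.put("orders/2022.parquet", b"row data")
replicate(aws, azure, "orders/2022.parquet")
```

The value proposition described above, taking out complexity and moving work and data safely, shows up here as one code path (`replicate`) that works across any pair of clouds that implement the common interface.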
It's pretty straightforward, but non-trivial, which is why I always ask a company's CEO and executives if stock buybacks and dividends will yield as much return as building out superclouds that solve really specific and hard problems, and create differential value. Okay, let's dig a bit more into the architectural aspects of supercloud. In other words, what are the salient attributes of supercloud? So first and foremost, a supercloud runs a set of specific services designed to solve a unique problem and it can do so in more than one cloud. Superclouds leverage the underlying cloud native tooling of a hyperscale cloud, but they're optimized for a specific objective that aligns with the problem that they're trying to solve. For example, supercloud might be optimized for lowest cost or lowest latency, or sharing data, or governing, or securing that data, or higher performance for networking, for example. But the point is, the collection of services that is being delivered is focused on a unique value proposition that is not being delivered by the hyperscalers across clouds. A supercloud abstracts the underlying and siloed primitives of the native PaaS layer from the hyperscale cloud and then using its own specific platform as a service tooling, creates a common experience across clouds for developers and users. And it does so in a most efficient manner, meaning it has the metadata knowledge and management capabilities that can optimize for latency, bandwidth, or recovery, or data sovereignty, or whatever unique value that supercloud is delivering for the specific use case in their domain. And a supercloud comprises a super PaaS capability that allows ecosystem partners through APIs to add incremental value on top of the supercloud platform to fill gaps, accelerate features, and of course innovate. 
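The superPaaS extension point, partners adding services through APIs on top of the platform, can be sketched the same way. This is a toy model under stated assumptions: the `SuperPaaS` class, the `register`/`invoke` API and the masking service are invented for illustration and correspond to no actual product.

```python
from typing import Callable, Dict

class SuperPaaS:
    """Toy platform layer: ecosystem partners register services through an
    API, and every call runs through the platform's common policy hook
    regardless of which cloud it lands on."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, handler: Callable[[dict], dict]) -> None:
        # The ecosystem extension point: partners fill gaps with new services.
        self._services[name] = handler

    def invoke(self, name: str, request: dict) -> dict:
        response = self._services[name](request)
        response["policy_applied"] = True  # common governance, platform-wide
        return response

platform = SuperPaaS()
# A hypothetical partner adds a data-masking service on top of the platform.
platform.register("mask-pii", lambda req: {"value": "***" + req["value"][-2:]})
print(platform.invoke("mask-pii", {"value": "4111-1111"}))
# {'value': '***11', 'policy_applied': True}
```

The design point is that the partner only writes the handler; governance, metering and cross-cloud placement stay with the platform, which is the "fill gaps, accelerate features" role described above.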
The services can be infrastructure-related, they could be application services, they could be data services, security services, user services, et cetera, designed and packaged to bring unique value to customers. Again, that hyperscalers are not delivering across clouds or on-premises. Okay, so another common question we get is, isn't that just multi-cloud? And what we'd say to that is yes, but no. You can call it multi-cloud 2.0, if you want, if you want to use it, it's kind of a commonly used rubric. But as Dell's Chuck Whitten proclaimed at Dell Technologies World this year, multi-cloud by design, is different than multi-cloud by default. Meaning to date, multi-cloud has largely been a symptom of what we've called multi-vendor or of M&A, you buy a company and they happen to use Google Cloud, and so you bring it in. And when you look at most so-called, multi-cloud implementations, you see things like an on-prem stack, which is wrapped in a container and hosted on a specific cloud or increasingly a technology vendor has done the work of building a cloud native version of their stack and running it on a specific cloud. But historically, it's been a unique experience within each cloud with virtually no connection between the cloud silos. Supercloud sets out to build incremental value across clouds and above hyperscale CAPEX that goes beyond cloud compatibility within each cloud. So if you want to call it multi-cloud 2.0, that's fine, but we chose to call it supercloud. Okay, so at this point you may be asking, well isn't PaaS already a version of supercloud? And again, we would say no, that supercloud and its corresponding superPaaS layer which is a prerequisite, gives the freedom to store, process and manage, and secure, and connect islands of data across a continuum with a common experience across clouds. And the services offered are specific to that supercloud and will vary by each offering. 
OpenShift, for example, can be used to construct a superPaaS, but in and of itself, isn't a superPaaS, it's generic. A superPaaS might be developed to support, for instance, ultra low latency database work. Again, taking the OpenShift example, it's unlikely that off-the-shelf OpenShift would be used to develop such a low latency superPaaS layer for ultra low latency database work. The point is supercloud and its inherent superPaaS will be optimized to solve specific problems like that low latency example for distributed databases or fast backup and recovery for data protection, and ransomware, or data sharing, or data governance. Highly specific use cases that the supercloud is designed to solve for. Okay, another question we often get is who has a supercloud today and who's building a supercloud, and who are the contenders? Well, most companies that consider themselves cloud players will, we believe, be building or are building superclouds. Here's a common ETR graphic that we like to show with Net Score or spending momentum on the Y axis and overlap or pervasiveness in the ETR surveys on the X axis. And we've randomly chosen a number of players that we think are in the supercloud mix, and we've included the hyperscalers because they are enablers. Now remember, this is a spectrum of maturity, it's a maturity model, and we've added some of those industry players that we see building superclouds like CapitalOne, Goldman Sachs, Walmart. This is in deference to Moschella's observation around The Matrix and the industry structural changes that are going on. This goes back to every company being a software company and rather than pattern match an outdated SaaS model, we see new industry structures emerging where software and data, and tools, specific to an industry will lead the next wave of innovation and bring in new value that traditional technology companies aren't going to solve, and the hyperscalers aren't going to solve.
You know, we've talked a lot about Snowflake's data cloud as an example of supercloud. After being at Snowflake Summit, we're more convinced than ever that they're headed in this direction. VMware is clearly going after cross-cloud services, you know, perhaps creating a new category. Basically, every large company we see is either pursuing supercloud initiatives or thinking about it. Dell showed Project Alpine at Dell Tech World; that's a supercloud. Snowflake is introducing a new application development capability based on their superPaaS, our term of course; they don't use the phrase. Mongo, Couchbase, Nutanix, Pure Storage, Veeam, CrowdStrike, Okta, Zscaler. Yeah, all of those guys. Yes, Cisco and HPE. Even though at HPE Discover, Fidelma Russo said on theCUBE she wasn't a fan of cloaking mechanisms, we then talked to HPE's Head of Storage Services, Omer Asad, who is clearly headed in the direction that we would consider supercloud. Again, those cross-cloud services; of course, their emphasis is connecting on-prem as well. That single experience, which traditionally has not existed with multi-cloud or hybrid. And we're seeing the emergence of smaller companies like Aviatrix and Starburst, and Clumio and others that are building versions of superclouds that solve for a specific problem for their customers. Even ISVs like Adobe, ADP, and UiPath, which we've talked to, seem to be looking at new ways to go beyond the SaaS model and add value within their cloud ecosystems, specifically around data, as part of their and their customers' digital transformations. So yeah, pretty much every tech vendor with any size or momentum, and new industry players, are coming out of hiding and competing, building superclouds that look a lot like Moschella's Matrix, with machine intelligence and blockchains, and virtual realities, and gaming, all enabled by the internet and hyperscale cloud CAPEX. So it's moving fast and it's the future, in our opinion. 
So don't get too caught up in the past or you'll be left behind. Okay, what about examples? We've given a number in the past, but let's try to be a little bit more specific. Here are a few we've selected, and we're going to answer the two questions in one section here: what workloads and services will run in superclouds, and what are some examples? Let's start with analytics. Our favorite example is Snowflake; it's one of the furthest along with its data cloud, in our view. It's a supercloud optimized for data sharing and governance, query performance, and security, and ecosystem enablement. When you do things inside of that data cloud, what we call a super data cloud (again, our term, not theirs), you can do things that you could not do in a single cloud. You can't do this with Redshift. You can't do this with SQL Server. And they're bringing new data types now, merging analytics, or at least accommodating analytics, and transaction-type data, and bringing open source tooling with things like Apache Iceberg. And so it ticks the boxes we laid out earlier. I would say that a company like Databricks is also in that mix, coming at it from a data science perspective, trying to create that consistent experience for data scientists and data engineering across clouds. Converged databases, running transaction and analytic workloads, are another example. Take a look at what Couchbase is doing with Capella and how it's enabling stretching the cloud to the edge with ARM-based platforms and optimizing for low latency across clouds, and even out to the edge. Document database workloads? Look at MongoDB, a very developer-friendly platform that, with Atlas, is moving toward a supercloud model, running document databases very, very efficiently. How about general purpose workloads? This is where VMware comes into play. Very clearly, there's a need to create a common operating environment across clouds and on-prem, and out to the edge. And I'd say VMware is hard at work on that. 
Managing and moving workloads, and balancing workloads, and being able to recover very quickly across clouds for everyday applications. Network routing? Take a look at what Aviatrix is doing across clouds. Industry workloads? We see Capital One. It announced its cost optimization platform for Snowflake, piggybacking on Snowflake's supercloud, or super data cloud. And in our view, it's very clearly going to go after other markets. It's going to test it out with Snowflake, running and optimizing on AWS, and it's going to expand to other clouds as Snowflake's business and those other clouds grow. Walmart is working with Microsoft to create an on-prem Azure experience that's seamless. Yes, that counts; on-prem counts. If you can create that seamless and continuous experience, an identical experience from on-prem to a hyperscale cloud, we would include that as a supercloud. You know, we've written about what Goldman is doing. Again, connecting its on-prem data and software tooling, and other capabilities to AWS for scale. And we can bet dollars to donuts that Oracle will be building a supercloud in healthcare with its Cerner acquisition. Supercloud is everywhere you look. So I'm sorry, naysayers, it's happening all around us. So what's next? Well, with all the industry buzz and debate about the future, John Furrier and I have decided to host an event in Palo Alto; we're motivated and inspired to further this conversation. And we welcome all points of view: positive, negative, multi-cloud, supercloud, hypercloud, all welcome. So theCUBE on Supercloud is coming on August 9th. Out of our Palo Alto studios, we'll be running a live program on the topic. We've reached out to a number of industry participants: VMware, Snowflake, Confluent, Skyhigh Security (Gee Rittenhouse's new company), HashiCorp, and Cloudflare. We've hit up Red Hat, and we expect many of these folks will be in our studios on August 9th. 
And we've invited a number of industry participants as well that we're excited to have on. From industry, from financial services, from healthcare, from retail, we're inviting analysts, thought leaders, and investors. We're going to have more detail in the coming weeks, but for now, if you're interested, please reach out to me or John with how you think you can advance the discussion, and we'll see if we can fit you in. So mark your calendars and stay tuned for more information. Okay, that's it for today. Thanks to Alex Myerson, who handles production and manages the podcast for Breaking Analysis. And I want to thank Kristen Martin and Cheryl Knight; they help get the word out on social and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE, who does a lot of editing, and appreciate you posting on SiliconANGLE, Rob. Thanks to all of you. Remember, all these episodes are available as podcasts wherever you listen. All you got to do is search "Breaking Analysis podcast." I publish each week on wikibon.com and siliconangle.com. You can email me directly at david.vellante@siliconangle.com or DM me @DVellante, or comment on my LinkedIn post. And please do check out ETR.ai for the best survey data in the enterprise tech business. We'll be at the AWS NYC Summit next Tuesday, July 12th. So if you're there, please do stop by and say hello to theCUBE; it's at the Javits Center. This is Dave Vellante for theCUBE insights powered by ETR. Thanks for watching. And we'll see you next time on "Breaking Analysis." (bright music)

Published Date : Jul 9 2022




Venkat Venkataramani, Rockset | CUBE Conversation


 

(upbeat music) >> Hello, welcome to this CUBE Conversation featuring Rockset CEO and co-founder Venkat Venkataramani, whose company was selected for season two of the AWS Startup Showcase. Before co-founding Rockset, Venkat was the engineering director of the Facebook infrastructure team responsible for all the data infrastructure, storing all the data there at Facebook, and he's here to talk real-time analytics. Venkat, welcome back to theCUBE for this CUBE Conversation. >> Thanks John. Thanks for having me again. It's a pleasure to be here. >> I'd love to look back, and I know you don't like to take a look back, but Facebook was huge hyperscale data at scale, really a leading indicator of where everyone is now, so this is about real-time analytics moving from batch to stream. You guys are at the center of it, we've talked about it before here on theCUBE, and so let's get in. We've got a couple different good talk tracks to dig into, but first I want to get your reaction to this soundbite I read on your blog post: fast analytics on fresh data is better than slow analytics on stale data; fresh beats stale every time, fast beats slow in every space. Where does that come from? Obviously it makes a lot of sense, nobody wants slow data, no one wants stale data. (giggles) >> Look, we live in the information era. Businesses do want to track as much information as possible about their business, and want to make data-driven decisions. This is now like motherhood and apple pie; no business would say that is not useful, because there's more information that the businesses want to know than what can fit in one person's head. You can either do Monday morning quarterback, or, in the middle of the third quarter before the game is over, you're maybe six points down, you look at what plays are working today, you look at who's injured on your team and who's injured on your opponent, and you try to come up with plays that can change the outcome of the game.
You still need Monday morning quarterbacking, that's not going anywhere, that's batch analytics, that's classic BI, and what the world is demanding more and more is operational intelligence: help me run my business better, don't just gimme a great report at the end of the quarter. >> Yeah, this is the whole trend. Looking back is key to postmortems and all that good stuff, but being present to make future decisions is a lot more mainstream now than it ever was, and you guys are at the center of it, and I want to get your take on this data-driven culture, because the theme for this next episode of the Startup Showcase is data as code, something I'm psyched for, because I've been saying on theCUBE for many years, data as code is almost as important as infrastructure as code. Because when you think about the application of data in real-time, it's not easy, it's a hard problem, and two, you want to make it easy, so this is the whole point of this data-driven culture that you're on right now. Can you talk about how you see that? Because this is really one of the most important stories we've seen since the last inflection point. >> Exactly right. What is data-driven culture? It basically means you stop guessing. You look at the data, you look at what the data says, and you try to come up with a hypothesis. It's still a guardrail, it's a guiding light, it's not going to tell you what to do, but you need to be able to interrogate your data. If every time you ask a question it takes 20 minutes for you to get an answer from your favorite Alexa, Siri, or what have you, you are probably not going to ever use that device, you will not try to be data-driven, and you can't really build that culture. So it's not just about visibility, it's not just about looking back and getting analytics on how the business is doing; you need to be able to interrogate your data in real-time in an interactive fashion, and that I think is what real-time analytics gives you.
This is what we say when we say fast analytics on real-time data, that's what we mean, which is, as you make changes to your business over the course of your day-to-day work, week-to-week work, what changes are working? How much impact is it having? If something isn't working, you have more questions to figure out why, and being able to answer all of that is how you really build the data-driven culture, and it isn't really going to come from just looking at static reports at the end of the week and at the end of the quarter. >> Talk about the latency aspect of the term and how it could be a false flag, in the sense that you could say, well, we have low latency, but you're not getting all the data. You got to get the data, you got to ingest it, make it addressable, query it, represent it; these are huge things, and when you factor in every single data source where you're not guessing, latency is a factor. Can you unpack what this new definition is all about, and how do people understand whether they got it right or not? >> A great question. A lot of people say, is five minutes real-time? Because I used to run my thing every six hours. Now for us, if it's more than two seconds behind in terms of your data latency, data freshness, it's too old. When does the present become the past, and when has the future not arrived yet? We think it's about one to two seconds. And so everything we do at Rockset, we only call it real-time if it can be within one to two seconds, 'cause that's the present, that's what's happening now. If it's five minutes ago, it's already five minutes ago, it's already past tense.
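The one-to-two-second freshness bar described above can be written down as a tiny check. This is an illustrative sketch only, not Rockset's implementation; the threshold constant is an assumption drawn from the conversation:

```python
import time

FRESHNESS_BUDGET_SECS = 2.0  # "real-time" here means at most ~2 seconds behind

def is_realtime(event_ts, now=None):
    """Return True if an event is still 'the present' under the 2-second budget."""
    now = time.time() if now is None else now
    return (now - event_ts) <= FRESHNESS_BUDGET_SECS

# An event from 1.5 seconds ago is still real-time; one from 5 minutes ago is not.
print(is_realtime(event_ts=100.0, now=101.5))  # -> True
print(is_realtime(event_ts=100.0, now=400.0))  # -> False
```

In a real pipeline the same comparison would run against the event's ingest or publish timestamp, which is the "data latency" half of the latency picture discussed here.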
So if you kind of break it down, you're absolutely right that you have to be able to bring data into a system in real-time without sacrificing freshness, and you store it in a way where you can get fast analytics out of that. So Rockset is the only real-time analytics platform with built-in connectors; this is why we have built-in connectors, where without writing a single line of code, you can bring in data in real-time from wherever you happen to be managing it today. And when data comes into Rockset, now the latency is about query processing. What is the point of bringing in data in real-time if every question you're going to ask is still going to take 20 minutes to come back? Well, then you might as well batch the data and load it. So there, we have converged indexing, a real-time indexing technology that allows data, as it comes in, to be organized, and we have a distributed SQL engine on top of that, so as long as you can frame your question as a SQL query, you can ask any question on your real-time data and expect sub-second response times. So that I think is the combination; the latency has two parts to it: one is how fresh is your data, and the other is how fast is your analytics. You need both, with the simplicity of the cloud, for you to really unlock this and make real-time analytics the default, as opposed to, let me try to do it in batch and see if I can get away with it. If you really need real-time, you have to be able to do both: cut down and control your data latency, how fresh your data is, and also make it fast. >> You talk about culture; can you talk about the people you're working with, and how that translates into your next topic, which is business observability? The next play on words, obviously, is observability: if you can measure everything, there shouldn't be any questions that you can't ask.
But it's important, this culture is shifting from hardcore data engineering to business value, kind of coming together at scale. This is where you see the hardcore data folks really bringing that into the business. Can you talk about this? The people you're working with, and how that's translating to this business observability. >> Absolutely. We work with the world's probably largest Buy Now Pay Later company, maybe they're in the top three; they have hundreds of millions of users, 300,000+ merchants, working in so many different countries, so many different payment methods, and there's a very simple problem they have. Some part of their product, some part of their payment system, is always down at any given point in time, or has a very high chance of not working. It's not that the whole thing is down, but for this one merchant in Switzerland, Apple Pay could be not working, and so all of those kinds of transactions might not be processing. They had a very classic cloud data warehouse based solution: accumulate all these payments, and every six hours they would process them and look for anomalies and say, hey, these things need to be investigated and a response team needs to be tackling these. The business was growing so fast. Those analytical jobs that would run every six hours in batch mode were taking longer than six hours to run, and so that was a dead end. They came to Rockset; simply using SQL, they're able to define all the metrics they care about across all of their dimensions, and they're all accurate up to the second, and now they're able to run their models every minute.
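The "metrics defined simply in SQL" workflow just described can be illustrated with a toy in-memory table. Here `sqlite3` stands in for the real system, and the schema, merchant names, and numbers are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payments (minute INTEGER, merchant TEXT, ok INTEGER)")
con.executemany("INSERT INTO payments VALUES (?, ?, ?)", [
    (0, "acme", 1), (0, "acme", 0), (0, "acme", 0), (0, "globex", 1),
])

# The metric itself is just SQL: per-merchant failure rate for each minute.
worst = con.execute("""
    SELECT merchant, minute, ROUND(1.0 - AVG(ok), 2) AS failure_rate
    FROM payments
    GROUP BY merchant, minute
    ORDER BY failure_rate DESC
    LIMIT 1
""").fetchone()
print(worst)  # -> ('acme', 0, 0.67)
```

The appeal of this pattern is that adding a dimension (country, payment method) only changes the `GROUP BY` clause, not the pipeline.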
And instead of six hours, every minute they're able to find anomalies and run their statistical models, so that now they can protect their business better, and more than that, the real side effect is they can offer much better quality of product, much better quality of service to their customers, so the customers are very sticky, because now they get to the point where they know something is wrong with one of their merchants even before the merchant realizes it, and that allows them to build a much better product for their end users. So business observability is all about that. It's about, do you really know what's happening in your business, and can you keep tabs on it in real-time as you go about your business? This is what we call operational intelligence; businesses are really demanding operational intelligence a lot more than just traditional BI. >> And we're seeing it in every aspect of a company; the digital transformation affects every single department. Sales teams use data to sell better, product teams use data to improve product usage, whether it's A/B testing or whatnot, risk management, ops, you name it, data is there to drill down into, so this is a huge part of real-time. Are you finding that business observability is maturing faster now, or where do you put the progress of companies with respect to getting on board with the idea that this wave is here? >> I think it's a very good question. I would say it has gone mainstream, primarily because if you look at technologies like Apache Kafka, and you see Confluent doing really, really well, those technologies have enabled customers and business units, business functions across the spectrum, to acquire really important business data in real-time. If you didn't have those mechanisms to acquire the data in real-time, well, you can't really do analytics and get operational intelligence on that.
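The per-minute anomaly detection described at the start of this exchange can be sketched with a generic z-score test. This is an illustration of the idea, not the company's actual statistical models, and the counts are invented:

```python
from statistics import mean, stdev

def is_anomalous(counts, z_threshold=3.0):
    """Flag the latest per-minute count if it sits far outside the recent baseline."""
    baseline, latest = counts[:-1], counts[-1]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return latest != mu
    return (latest - mu) / sigma > z_threshold

# Failed-payment counts per minute for one (merchant, payment method) pair.
print(is_anomalous([4, 5, 3, 6, 4, 5, 4, 6, 5, 4, 48]))  # -> True: sudden spike
print(is_anomalous([4, 5, 3, 6, 4, 5, 4, 6, 5, 4, 5]))   # -> False
```

Running a check like this every minute, instead of every six hours, is exactly the batch-to-real-time shift the example describes.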
And so the majority is getting there, and things are growing very fast as those kinds of technologies get better and better. SaaSification is also a very big component of it, which is that more and more business apps are basically becoming SaaS apps. That allows everything to be in the cloud and interconnected, and when all of those data systems are interconnected, you can have APIs that make data flow from one system to another, all happening in real-time, and that also unlocks a lot more potential for, again, getting better operational intelligence for your enterprise. And there's a subcategory to this, which is B2B SaaS companies also having to build real-time interactive analytics embedded as part of their offering, otherwise people wouldn't even want to buy it, and so it's all interconnected. I think the market is emerging, the market is growing, but it has gone mainstream, I would say, predominantly because Kafka, Confluent, and these kinds of real-time data collection and aggregation systems have gone mainstream, and now you actually get to dream about operational intelligence, which you couldn't even think about maybe five or 10 years ago. >> They're getting all their data together. So to close it out, take us through the bottom line: real-time business observability, great for companies collecting their data, but now you've got B2B, you've got B2C, people are integrating partnerships where APIs are connecting, it could be third party business relationships, so the data collection is not just inside the company, it's also outside. This is more value. This is more of what's going on. >> Exactly.
So more and more, instead of going to your data team and demanding real-time analytics, what a lot of business units are doing is going to the product analytics platform, the SaaS app they're using for covering various parts of their business. They go to them and demand, whether it's my recruiting software, sales software, or customer support: gimme more real-time insights, otherwise it's not really that useful. And so there is really a huge uptake of all these SaaS companies now building real-time infrastructure, powered by Rockset in many cases, that actually ends up giving a lot of value to their end customers, and that I think is kind of the proof of value for a SaaS product. All the workflows are very, very important, absolutely, but almost every amazing SaaS product has an analytics tab, and it needs to be fast, interactive, and real-time. You're talking about fresh insights as they're happening, and often in B2B SaaS, application developers come and tell us that's the proof of value: they can show how much value that particular SaaS application is creating for their customer. So I think it's all two sides of the same coin: large enterprises want to build it themselves, because then they get more control over how exactly the problem gets solved, and there are also other solutions where you rely on a SaaS application, where you demand that particular application gives you what you need. But at the end of the day, I think the world is going real-time, and we are very, very happy to be part of this movement, operational intelligence. For every classic BI use case, I think there are 10 times more operational intelligence use cases. At Rockset, we are on a mission to eliminate all cost and complexity barriers, really provide fast analytics on real-time data with the simplicity of the cloud, and really be part of this moment. >> You guys are having some fun right now, being in the middle of all the action.
>> Absolutely. I think we're growing very fast, we're hiring, we are onboarding as many customers as possible, and really looking forward to being part of this moment and really accelerating this shift from business intelligence to operational intelligence. >> Well, Venkat, great to see you. Thanks for coming on theCUBE as part of this CUBE Conversation; you're in the class of AWS Startup Showcase season two, episode two. Thanks for coming on. Keep it right there, everyone, and watch more action from theCUBE, your leader in tech coverage. I'm John Furrier, your host. Thanks for watching. (upbeat music)

Published Date : Mar 23 2022

SUMMARY :

Rockset co-founder and CEO Venkat Venkataramani on real-time analytics, data freshness, and operational intelligence


Fangjin Yang, Imply.io | CUBE Conversation


 

(bright upbeat music) >> Welcome, everyone, to this CUBE Conversation featuring Imply. I'm your host, Lisa Martin. Today, we are excited to be joined by FJ Yang, the co-founder and CEO of Imply. FJ, thanks so much for joining us today. >> Lisa, thank you so much for having me. >> Tell me a little bit about yourself and about Imply. >> Yeah, absolutely. So, I started Imply a couple years ago, and before starting the company, I was a technologist. I was a software engineer and software developer, primarily specializing in distributed systems. And one of the projects I worked on ultimately became kind of the centerpiece behind Imply. Imply, as a company, is a database company. What we do is provide developers a powerful tool to help them build various types of data analytic applications. We're also an open source company; the company develops a popular open source project called Apache Druid. >> Got it, so database as a service for modern analytics applications. You're also one of the original authors of Apache Druid. Talk to me, gimme a timeline, Druid's 10-year history or so. What's the big picture? What's been the market evolution that you've seen? >> Yeah, absolutely. So, I moved out to Silicon Valley basically to try and work at a startup, 'cause I was enamored with startups and I thought they were the coolest thing ever. So, at one point, I basically joined the smallest startup I could find. It was a startup called Metamarkets, which actually doesn't exist anymore; it was ultimately acquired by Snapchat a couple years ago. But I was one of the first employees there. And what we were trying to do at the time was build an analytics application, a user-facing application where people could slice and dice various types of data. At the time, the data sets we were working with, online advertising and digital advertising data sets, were very large and complex.
And we really struggled to find a database that could basically power the kind of interactive user experience that we knew we wanted to provide our end customers. So what ended up happening was we decided to build our own database, and we were a three- to five-person shop when we decided to build it, and that was Druid. And over time, we saw many other types of companies struggle with a similar set of problems, albeit with very different types of use cases and very different types of data sets. And the Druid community kind of grew and evolved from that. And in my work engaging with the community, what I saw was a market opportunity and a market gap, and that's where Imply formed. >> Let's double click on that. You talked about why you built Druid, the problem you were looking to solve. But talk to me about the role that Imply has. >> Right. So, Imply is a commercial company. What we do is build kind of an end-to-end enterprise product around Druid as the core engine. Imply provides deployment, it provides management, it provides security, and it also provides the visualization and monitoring pieces around Druid as the core engine. What we aim to do at Imply is really enable developers to build various types of data applications with only the click of a few buttons and interaction with a simple set of APIs. So the goal is, if you're a developer, you don't have to think about managing the database yourself, you don't have to think about the operational complexity of the database; instead, what you do is just work with APIs and build your application. >> So, then what gives Druid its superpower? What makes Druid Druid? >> Yeah, so, Druid, the easiest way to think about it, is it's a really fast calculator, and it's a very fast calculator for a whole lot of data. So when you have a whole lot of data and you want to crunch numbers very, very quickly, Druid is very good at doing that.
And people always ask me this question, which is, what makes Druid special? And I always struggle with it, because it's never just one thing; it's layers upon layers of engineering. You start with the fundamentals of how you maximally optimize the resources of any hardware. So, how do you maximize storage? How do you maximize compute? And then there's a lot of optimization around how you store the data, and how you access that data in a very fast way once it's stored, in order to run computations very quickly. So, unfortunately, there's no silver bullet with Druid, but maybe I can summarize it this way: Druid is like a search system, a data warehouse, and a time series database all mixed together, and that architecture enables it to be very, very fast. And unfortunately, if you don't know what some of the components I'm talking about are, it's hard to describe where the secret sauce is. (chuckling) >> Sometimes you want to keep that secret sauce secret. Talk to me about the overall data space; as we see these days, every company is a data company, or if it's not, it needs to be to be successful. Where does Druid fit in the overall data space? Give us that picture of where it fits. >> Yeah, absolutely. So, it's pretty interesting that you see now, in the public markets as well as the private markets, some of the hottest unicorns out there are actually data companies. And I think what people are understanding now, for the first time, is just how vast and complex the data space is, and also how large the market is as well. So for sure, there are many different components and pieces in the data space, and they oftentimes come together to form what's known as a data stack. A data stack is basically an architecture that has various systems, and each of these systems is designed to do a certain set of things very, very well.
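One concrete, publicly documented example of the storage-side engineering FJ alludes to is Druid's rollup: pre-aggregating raw events into one row per time bucket and dimension, so queries scan far fewer rows. The sketch below is a toy illustration of that idea, not Druid's implementation, and the field names are invented:

```python
from collections import defaultdict

def rollup(raw_events, granularity_secs=3600):
    """Collapse raw (timestamp, campaign, clicks) rows into hourly summaries."""
    summary = defaultdict(lambda: {"rows": 0, "clicks": 0})
    for ts, campaign, clicks in raw_events:
        bucket = ts - (ts % granularity_secs)
        cell = summary[(bucket, campaign)]
        cell["rows"] += 1
        cell["clicks"] += clicks
    return summary

raw = [(3600, "ad-1", 2), (3700, "ad-1", 1), (3800, "ad-2", 5)]
summary = rollup(raw)
# Two raw "ad-1" events collapse into a single pre-aggregated row.
print(summary[(3600, "ad-1")])  # -> {'rows': 2, 'clicks': 3}
```

At advertising scale, collapsing billions of raw events into one row per hour and campaign is a large part of why interactive slice-and-dice queries can come back quickly.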
For example, a company that recently went public is a company called Confluent, which is mostly catered towards data transport, so getting data from one place to another. They're built around an open source engine called Apache Kafka. Databricks is another mega unicorn that's going to go public pretty soon, and they're built around an open source project called Spark, which is mainly used for data processing. Where we sit is on the data query side. What that means is we're a system in which people can store data and then access that data very, very quickly. And there are other systems that do that, but where our bread and butter is, is if you're building some sort of application where end users are clicking buttons to get access to data, we're a platform that enables the best end user experience. We return queries very, very quickly with a consistent SLA, we immediately visualize data as soon as it's made available, and we can support many, many concurrent end users accessing the system at the same time. >> So, real time. One of the things I think we learned during the pandemic, one of the many things, is that access to real time data is no longer a nice to have; it is table stakes, because as I said, every company these days is a data company. So with how you describe it, how should people think of Druid versus a data warehouse? >> Yeah. So, that's a great question. And obviously, data warehouses have been around since the 70s. In the B2B space, they're among the largest players in enterprise software. So it's only natural that when you come up with sort of a new analytics database, people compare it with what they already know, which is a data warehouse. So a lot of how we think about why we're different than a data warehouse goes back to how I answered the previous question, in that we're focused right now, really, on powering different types of data applications.
Data applications are UIs in which people are really accessing and getting insights from data by clicking buttons, versus writing more complex SQL queries. And when you click buttons and you get access to data, what you want in terms of an end user experience is for answers to come back almost immediately. So you don't want to click a button and then see a spinning dial that goes on for minutes and minutes before an answer comes back. You basically want results to come back immediately. You want that experience no matter what types of queries you're issuing or how many people are issuing those queries. If you have thousands, if not tens of thousands, of people trying to access data at the exact same time, you want to give a consistent user experience, like Google, which is one of my favorite products. There are millions of people that use Google, and they ask questions and get their answers back immediately. So we try to provide that same experience, but instead of a generic search engine, what we're doing is providing a system that answers questions on data, and users get a very interactive and fast experience when asking questions. And that's something I think is very different from what data warehouses are primarily specialized in. Data warehouses are really designed to be systems in which people write very large, complex SQL queries that might take minutes, or sometimes hours, to run. But the experience of using a data warehouse to power an application is not a great one. >> So, I'm just curious, FJ, in the last couple of years, with, as I mentioned before, access to real time data no longer a nice to have but something business critical for so many industries, did you see any industries in particular in recent years that were really primed candidates for what Druid can deliver? >> Yeah, that's a great question.
And you can imagine that the industries that really heavily rely on fast decision making are the ones that are earliest to adopt technologies like this. So, in the security space, and the observability space, as well as working with networking and various forms of backend kind of metrics data, this system has been very popular and it's been popular because people need to triage (indistinct) as they occur, they need to resolve problems, and they also need immediate visibility, as well as very fast queries on data. Another space is online advertising. Online advertising, nowadays is almost entirely programmatic and digital. So, response times are critical in order to make decisions. And that's where Druid was actually born. It was born for advertising before it kind of went everywhere else. We're seeing it more in fraud protection, fraud prevention as well as fraud diagnostics nowadays. We're seeing it in retail as well, which is pretty interesting. And, the goal, of course, is I believe every industry and every vertical needs the capabilities that we provide. So hopefully, we see a whole lot more use cases in the near future. >> Right, it's absolutely horizontal these days. So, 10-year history, you've got a community of thousands, what's the future of Druid? What do you see when you open the crystal ball and look now down the 12 months, 18 months road? >> Yeah. So, I think as a technologist, your goal as the technologist, at least for me, is to try and create technology that has as much applicability as possible and solves problems for as many people as possible. That's always the way I think about it. So, I want to do good engineering and I want to build good systems. And I think what the hallmark of a really good system is you can solve all different types of problems and condense all these different problems, actually into the same set of models and the same set of principles. 
And a thing that makes me most excited about Druid is the many different industries, and the many different use cases, in which it's found value. So, if I were to give a 30,000-foot roadmap, that's what we're trying to do with the next generation of Druid. We're actually doing a pretty major engine upgrade right now, a pretty major overhaul of the entire system. And the goal of that is to take all the learnings we've had over the last decade and create something new that can solve an expanded set of problems that we've heard from the community and from other places as well. >> Excellent. FJ, exciting work that you've done the last 10 years. Congratulations on that. Looking forward to the roadmap that you talked about. Thanks for sharing what Druid is, the Imply connection, and all the different use cases where it applies. We appreciate your insights. >> Appreciate you having me on the show. Thank you very much. >> My pleasure. For FJ Yang, I'm Lisa Martin. You're watching this CUBE Conversation, the leader in live tech enterprise coverage. (bright upbeat music)

Published Date : Mar 23 2022

SUMMARY :

Imply co-founder and CEO Fangjin Yang on the origins of Apache Druid and the database's role in real-time analytics applications

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Lisa Martin | PERSON | 0.99+
thousands | QUANTITY | 0.99+
Silicon Valley | LOCATION | 0.99+
Lisa | PERSON | 0.99+
Snapchat | ORGANIZATION | 0.99+
10-year | QUANTITY | 0.99+
18 months | QUANTITY | 0.99+
FJ Yang | PERSON | 0.99+
three | QUANTITY | 0.99+
Imply | ORGANIZATION | 0.99+
Confluent | ORGANIZATION | 0.99+
12 months | QUANTITY | 0.99+
30,000 foot | QUANTITY | 0.99+
Druid | TITLE | 0.99+
each | QUANTITY | 0.99+
one | QUANTITY | 0.99+
Fangjin Yang | PERSON | 0.99+
first time | QUANTITY | 0.98+
Today | DATE | 0.98+
Google | ORGANIZATION | 0.98+
today | DATE | 0.98+
millions of people | QUANTITY | 0.98+
One | QUANTITY | 0.98+
Imply.io | ORGANIZATION | 0.97+
Metamarkets | ORGANIZATION | 0.96+
five-person | QUANTITY | 0.96+
first employees | QUANTITY | 0.94+
tens of thousands of people | QUANTITY | 0.94+
pandemic | EVENT | 0.94+
last couple of years | DATE | 0.91+
FJ | PERSON | 0.91+
70s | DATE | 0.89+
one thing | QUANTITY | 0.89+
Databricks | ORGANIZATION | 0.88+
one point | QUANTITY | 0.87+
Druid | PERSON | 0.84+
couple years ago | DATE | 0.81+
last decade | DATE | 0.75+
Apache Druid | ORGANIZATION | 0.73+
Conversation | EVENT | 0.73+
Apache | ORGANIZATION | 0.72+
last 10 years | DATE | 0.72+
double | QUANTITY | 0.69+
Spark | TITLE | 0.66+
my favorite products | QUANTITY | 0.62+
CUBE Conversation | TITLE | 0.58+
minutes | QUANTITY | 0.54+
minute | QUANTITY | 0.51+
Kafka | TITLE | 0.41+
CUBE Conversation | EVENT | 0.31+

Justin Borgman, Starburst and Teresa Tung, Accenture | AWS re:Invent 2021


 

>>Hey, welcome back to theCUBE's continuing coverage of AWS re:Invent 2021. I'm your host, Lisa Martin. This is day two, our first full day of coverage. We have two live sets here with AWS and its ecosystem partners, two remote sets, and over a hundred guests on the program. We're going to be talking about the next decade of cloud innovation, and I'm pleased to welcome back two CUBE alumni to the program. Justin Borgman is here, the co-founder and CEO of Starburst, and Teresa Tung, the cloud-first chief technologist at Accenture. Guys, welcome back to theCUBE. Thank you. Thank you for having me. Good to have you back. So, Teresa, I was doing some research on you, and I see you are the most prolific inventor at Accenture, with over 220 patents and patent applications. That's huge. Congratulations. Thank you. Thank you. And I love your title. I think it's intriguing. I'd like to learn a little bit more about your role as cloud-first chief technologist. Tell me about that. >>Well, I get to think about the future of cloud, and if you think about it, cloud powers everything: experiences in our everyday lives, in our homes, in our cars, in our stores. So pretty much I get to be Q, right? To the rest of Accenture's James Bond. >>Ah, you're Q. I like that. Wow, what a great analogy. Justin, talk to me a little bit. I know Starburst has been on the program before, but give me a little bit of an overview of the company, what you guys do. What were some of the gaps in the market that you saw a few years ago and said, we have an idea to solve this? Sure. >>So Starburst offers a distributed query engine, which essentially means we're able to run SQL queries on data anywhere, uh, it could be in traditional relational databases, data lakes, in the cloud, on-prem. And I think that was the gap that we saw: basically that people had data everywhere and really had a challenge with how they analyze that data.
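Justin's description of a distributed query engine, one SQL interface over data living in many separate stores, can be sketched in miniature. The snippet below is only an analogy, not Starburst or Trino itself: it uses Python's sqlite3 module, whose ATTACH statement lets a single SQL session join tables that live in two physically separate databases. All table names and figures are invented for the illustration.

```python
import sqlite3

# A toy stand-in for federated querying: one SQL session spanning two
# separate stores. "main" plays the relational database; the attached
# in-memory database plays a second source, such as a data lake export.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS lake")

con.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "acme", 120.0), (2, "globex", 80.0), (3, "acme", 40.0)])

con.execute("CREATE TABLE lake.customers (name TEXT, region TEXT)")
con.executemany("INSERT INTO lake.customers VALUES (?, ?)",
                [("acme", "EMEA"), ("globex", "US")])

# A single query joins across both sources; the analyst never needs to
# know where each table physically lives.
rows = con.execute("""
    SELECT c.region, SUM(o.total) AS revenue
    FROM orders o JOIN lake.customers c ON o.customer = c.name
    GROUP BY c.region ORDER BY c.region
""").fetchall()
print(rows)  # [('EMEA', 160.0), ('US', 80.0)]
```

The point of the pattern is the final query: one join, one dialect, and the engine worries about where each table resides. Trino applies the same idea across heterogeneous catalogs at much larger scale.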
And, uh, my co-founders are the creators of an open source project originally called Presto, now called Trino, and it's how Facebook and Netflix and Airbnb and a number of the internet companies run their analytics. And so our idea was basically to take that, commercialize it, and make it enterprise grade for the thousands of other companies that are struggling with data management and data analytics problems. >>And that's one of the things we've seen explode during the last 22 months, among many other things: data, right? Every company these days has to be a data company. If they're not, there's a competitor in the rearview mirror ready to come and take their place. We're going to talk about the data mesh. Teresa, we're going to start with you. This is not a new tool; this is a new concept. Talk to us about what a data mesh is and why organizations need to embrace this approach. >>So there's a canonical definition of data mesh with four attributes, and any data geek or data architect really resonates with them. So, number one, it's really rooted in decentralized domain ownership. So data is not within a single line of business, within a single entity, within a single partner; it has to be across different domains. Second is publishing data as products. And so instead of these really, you know, technology solutions, data sets, data tables, it's really thinking about the product and who's going to use it. The third one is really around self-service infrastructure. So you want everybody to be able to use those products. And finally, number four, it's really about federated and global governance. So even though they're products, you really need to make sure that you're doing the right things. And that's data mesh. >>We're not talking about a single tool here, right? This is more of an approach, a solution. >>It is a data strategy first and foremost, right? So companies, they are multi-cloud, they have many projects going on, they are on premises.
So what do you do about it? And so that's the reality of the situation today, and it's first and foremost, a business strategy and framework to think about the data. And then there's a new architecture that underlines and supports that >>Just didn't talk to me about when you're having customer conversations. Obviously organizations need to have a core data strategy that runs the business. They need to be able to, to democratize really truly democratized data access across all business units. What are some of the, what are some of your customer conversations like are customers really embracing the data strategy, vision and approach? >>Yeah, well, I think as you alluded to, you know, every business is data-driven today and the pandemic, if anything has accelerated digital transformation in that move to become data-driven. So it's imperative that every business of every shape and size really put the power of data in the hands of everyone within their organization. And I think part of what's making data mesh resonates so well, is that decentralization concept that Teresa spoke about? Like, I think companies acknowledge that data is inherently decentralized. They have a lot of different database systems, different teams and data mesh is a framework for thinking about that. Then not only acknowledges that reality, but also braces it and basically says there's actually advantages to this decentralized approach. And so I think that's, what's driving the interest level in the data mesh, uh, paradigm. And it's been exciting to work with customers as they think about that strategy. And I think that, you know, essentially every company in the space is, is in transition, whether they're moving from on cloud to the prem, uh, to, uh, sorry, from on-prem to the cloud or from one cloud to another cloud or undergoing that digital transformation, they have left behind data everywhere. And so they're, they're trying to wrestle with how to grasp that. 
>>And there's, we know that there's so much value in data. The, the need is to be able to get it, to be able to analyze it quickly in real time. I think another thing we learned in the pandemic is it real-time is no longer a nice to have. It is essential for businesses in every organization. So Theresa let's talk about how Accenture and servers are working together to take the data mesh from a concept of framework and put this into production into execution. >>Yeah. I mean, many clients are already doing some aspect of the data mesh as I listed those four attributes. I'm sure everybody thought like I'm already doing some of this. And so a lot of that is reviewing your existing data projects and looking at it from a data product landscape we're at Amazon, right? Amazon famous for being customer obsessed. So in data, we're not always customer obsessed. We put up tables, we put up data sets, feature stores. Who's actually going to use this data. What's the value from it. And I think that's a big change. And so a lot of what we're doing is helping apply that product lens, a literal product lens and thinking about the customer. >>So what are some w you know, we often talk about outcomes, everything being outcomes focused and customers, vendors wanting to help customers deliver big outcomes, you know, cost reduction, et cetera, things like that. How, what are some of the key outcomes Theresa that the data mesh framework unlocks for organizations in any industry to be able to leverage? >>Yeah. I mean, it really depends on the product. Some of it is organizational efficiency and data-driven decisions. So just by the able to see the data, see what's happening now, that's great. But then you have so beyond the, now what the, so what the analytics, right. Both predictive prescriptive analytics. So what, so now I have all this data I can analyze and drive and predict. 
And then finally, the, what if, if I have this data and my partners have this data in this mesh, and I can use it, I can ask a lot of what if and, and kind of game out scenarios about what if I did things differently, all of this in a very virtualized data-driven fashion, >>Right? Well, we've been talking about being data-driven for years and years and years, but it's one thing to say that it's a whole other thing to actually be able to put that into practice and to use it, to develop new products and services, delight customers, right. And, and really achieve the competitive advantage that businesses want to have. Just so talk to me about how your customer conversations have changed in the last 22 months, as we've seen this massive acceleration of digital transformation companies initially, really trying to survive and figure out how to pivot, not once, but multiple times. How are those customer conversations changing now is as that data strategy becomes core to the survival of every business and its ability to thrive. >>Yeah. I mean, I think it's accelerated everything and, and that's been obviously good for companies like us and like Accenture, cause there's a lot of work to be done out there. Um, but I think it's a transition from a storage centric mindset to more of an analytics centric mindset. You know, I think traditionally data warehousing has been all about moving data into one central place. And, and once you get it there, then you can analyze it. But I think companies don't have the time to wait for that anymore. Right there, there's no time to build all the ETL pipelines and maintain them and get all of that data together. We need to shorten that time to insight. And that's really what we, what we've been focusing on with our, with our customers, >>Shorten that time to insight to get that value out of the data faster. Exactly. Like I said, you know, the time is no longer a nice to have. It's an absolute differentiator for folks in every business. 
And as, as in our consumer lives, we have this expectation that we can get whatever we want on our phone, on any device, 24 by seven. And of course now in our business lives, we're having the same expectation, but you have to be able to unlock that access to that data, to be able to do the analytics, to make the decisions based on what the data say. Are you, are you finding our total? Let's talk about a little bit about the go to market strategy. You guys go in together. Talk to me about how you're working with AWS, Theresa, we'll start with you. And then Justin we'll head over to you. Okay. >>Well, a lot of this is powered by the cloud, right? So being able to imagine a new data business to run the analytics on it and then push it out, all of that is often cloud-based. But then the great thing about data mesh it's it gives you a framework to look at and tap into multi-cloud on-prem edge data, right? Data that can't be moved because it is a private and secure has to be at the edge and on-prem so you need to have that's their data reality. And the cloud really makes this easier to do. And then with data virtualization, especially coming from the digital natives, we know it scales >>Just to talk to me about it from your perspective that the GTL. >>Yeah. So, I mean, I think, uh, data mesh is really about people process and technology. I think Theresa alluded to it as a strategy. It's, it's more than just technology. Obviously we bring some of that technology to bear by allowing customers to query the data where it lives. But the people in process side is just as important training people to kind of think about how they do data management, data analytics differently is essential thinking about how to create data as a product. That's one of the core principles that Theresa mentioned, you know, that's where I think, um, you know, folks like Accenture can be really instrumental in helping people drive that transformational change within their organization. 
And that's >>Hard. Transformational change is hard with, you know, the last 22 months. I've been hard on everyone for every reason. How are you facilitating? I'm curious, like to get Theresa, we'll start with you, your perspectives on how our together as servers and Accenture, with the power of AWS, helping to drive that cultural change within organizations. Because like we talked about Justin there, nobody has extra time to waste on anything these days. >>The good news is there's that imperative, right? Every business is a digital business. We found that our technology leaders, right, the top 10% investors in digital, they are outperforming are the laggards. So before pandemic, it's times to post pep devek times five, so there's a need to change. And so data is really the heart of the company. That's how you unlock your technical debt into technical wealth. And so really using cloud and technologies like Starburst and data virtualization is how we can actually do that. >>And so how do you, Justin, how does Starburst help organizations transfer that technical debt or reduce it? How does the D how does the data much help facilitate that? Because we talk about technical debt and it can, it can really add up. >>Yeah, well, a lot of people use us, uh, or think about us as an abstraction layer above the different data sources that they have. So they may have legacy data sources today. Um, then maybe they want to move off of over time, um, could be classical data, warehouses, other classical, uh, relational databases, perhaps they're moving to the cloud. And by leveraging Starburst as this abstraction, they can query the data that they have today, while in the background, moving data into the cloud or moving it into the new data stores that they want to utilize. And it sort of hides that complexity. It decouples the end user experience, the business analyst, the data scientists from where the data lives. 
And I think that gives people a lot of freedom and a lot of optionality. And I think, you know, the only constant is change. Um, and so creating an architecture that can stand the test of time, I think is really, really important. >>Absolutely. Speaking of change, I just saw the announcement about Starburst galaxy fully managed SAS platform now available in all three major clouds. Of course, here we are at AWS. This is a, is this a big directional shift for servers? >>It is, you know, uh, I think there's great precedent within open source enterprise software companies like Mongo DB or confluent who started with a self managed product, much the way that we did, and then moved in the direction of creating a SAS product, a cloud hosted, fully managed product that really I think, expands the market. And that's really essentially what we're doing with galaxy galaxy is designed to be as easy as possible. Um, you know, Starburst was already powerful. This makes it powerful and easy. And, uh, and, and in our view, can, can hopefully expand the market to thousands of potential customers that can now leverage this technology in a, in a faster, easier way, >>Just in sticking with you for a minute. Talk to me about kind of where you're going in, where services heading in terms of support for the data mesh architecture across industries. >>Yeah. So a couple of things that we've, we've done recently, and whether we're doing, uh, as we speak, one is, uh, we introduced a new capability. We call star gate. Now star gate is a connector between Starburst clusters. So you're going to have a Starbucks cluster, and let's say Azure service cluster in AWS, a Starbucks cluster, maybe an AWS west and AWS east. And this basically pushes the processing to where the data lives. So again, living within this construct of, uh, of decentralized data that a data mesh is all about, this allows you to do that at an even greater level of abstraction. 
So it doesn't even matter what cloud region the data lives in or what cloud entirely it lives in. And there are a lot of important applications for this, not only latency in terms of giving you fast, uh, ability to join across those different clouds, but also, uh, data sovereignty constraints, right? >>Um, increasingly important, especially in Europe, but increasingly everywhere. And, you know, if your data isn't Switzerland, it needs to stay in Switzerland. So starting date as a way of pushing the processing to Switzerland. So you're minimizing the data that you need to pull back to complete your analysis. And, uh, and so we think that's a big deal about, you know, kind of enabling a data mash on a, on a global scale. Um, another thing we're working on back to the point of data products is how do customers curate and create these data products and share them within their organization. And so we're investing heavily in our product to make that easier as well, because I think back to one of the things, uh, Theresa said, it's, it's really all about, uh, making this practical and finding quick wins that customers can deploy, deploy in their data mess journey, right? >>This quick wins are key. So Theresa, last question to you, where should companies go to get started today? Obviously everybody has gotten, we're still in this work from anywhere environment. Companies have tons of data, tons of sources of data, did it, infrastructure's already in place. How did they go and get started with data? >>I think they should start looking at their data projects and thinking about the best data products. I think just that mindset shift about thinking about who's this for what's the business value. And then underneath that architecture and support comes to bear. And then thinking about who are the products that your product could work better with just like any other practice partnerships, like what we have with AWS, right? Like that's a stronger together sort of thing, >>Right? 
So there's that cultural component, that really strategic shift in thinking, and then the architecture underneath. Awesome. Guys, thank you so much for joining me on the program, coming back on theCUBE at re:Invent, talking about data mesh and how you can help organizations and industries put that together, and what's going on at Starburst. We appreciate your time. Thanks again. All right. For my guests, I'm Lisa Martin. You're watching theCUBE's coverage of AWS re:Invent 2021. theCUBE is the leader in global live tech coverage. We'll be right back.

Published Date : Nov 30 2021


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Lisa Martin | PERSON | 0.99+
Theresa | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Teresa Tung | PERSON | 0.99+
Justin Borkman | PERSON | 0.99+
Justin Borgman | PERSON | 0.99+
Teresa | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Justin | PERSON | 0.99+
Europe | LOCATION | 0.99+
Switzerland | LOCATION | 0.99+
Starburst | ORGANIZATION | 0.99+
Accenture | ORGANIZATION | 0.99+
Second | QUANTITY | 0.99+
thousands | QUANTITY | 0.99+
Netflix | ORGANIZATION | 0.99+
Facebook | ORGANIZATION | 0.99+
third one | QUANTITY | 0.99+
pandemic | EVENT | 0.98+
four attributes | QUANTITY | 0.98+
Both | QUANTITY | 0.98+
today | DATE | 0.98+
24 | QUANTITY | 0.98+
first | QUANTITY | 0.98+
Airbnb | ORGANIZATION | 0.98+
over 220 patents | QUANTITY | 0.97+
over a hundred guests | QUANTITY | 0.97+
2021 | DATE | 0.97+
one | QUANTITY | 0.96+
Starbucks | ORGANIZATION | 0.96+
single partner | QUANTITY | 0.96+
Presto | ORGANIZATION | 0.96+
single line | QUANTITY | 0.96+
seven | QUANTITY | 0.95+
confluent | ORGANIZATION | 0.95+
10% | QUANTITY | 0.94+
one central place | QUANTITY | 0.94+
one thing | QUANTITY | 0.93+
single tool | QUANTITY | 0.92+
day two | QUANTITY | 0.92+
next decade | DATE | 0.92+
single entity | QUANTITY | 0.92+
star gate | TITLE | 0.92+
Mongo DB | ORGANIZATION | 0.91+
last 22 months | DATE | 0.91+
two life | QUANTITY | 0.91+
Starburst | TITLE | 0.88+
last 22 months | DATE | 0.87+

Venkat Venkataramani and Dhruba Borthakur, Rockset | CUIBE Conversation


 

(bright intro music) >> Welcome to this "Cube Conversation". I'm your host, Lisa Martin. This is part of our third AWS Startup Showcase, and I'm pleased to welcome two gentlemen from Rockset: Venkat Venkataramani is here, the CEO and co-founder, and Dhruba Borthakur, CTO and co-founder. Gentlemen, welcome to the program. >> Thanks for having us. >> Thank you. >> Excited to learn more about Rockset. Venkat, talk to me about Rockset and how it's putting real-time analytics within the reach of every company. >> If you see the Confluent IPO, if you see where the world is going in terms of analytics, you know, the way we look at this, real-time analytics is like the last frontier. Everybody wants fast queries on fresh data. Nobody wants to say, "I don't need that. You know, give me slow queries on stale data," right? I think if you see what data warehouses and data lakes have done, especially in the cloud, they've really, really made batch analytics extremely accessible. But real-time analytics still seems too clumsy, too complex, and too expensive for most people. And we are on a mission to make real-time analytics very, very easy and affordable, so everybody can take advantage of it. So that's what we do. >> But you're right, nobody wants stale data or slower queries. And it seems like one of the things that we learned, Venkat, sticking with you, in the last 18 months of a very strange world that we're living in, is that real-time is no longer a nice to have. It's really a differentiator and table stakes for businesses in every industry. How do you make it more affordable and accessible to businesses in so many different industries? >> I think that's a great question. At a very high level, there are two categories of use cases we see. There is one full category of use cases where business teams and business units are demanding almost like business observability.
You know, if you think about one domain that actually understood real-time and made everything work in real-time, it's the DevOps world: metrics and monitoring coming out of all these machines, because they really want to know as soon as something goes wrong, and immediately be able to dive in and click and see what happens. But now businesses are demanding the same thing, right? A CEO wants to know, "Are we on track to hit our quarterly estimates or not? Tell me now what's happening," because, you know, the larger the company, the more complex their operations dashboards are. And, you know, if you don't give them real-time visibility, the window of opportunity to do something about it disappears. And so they are really demanding that. So that is one big use case we have. And the other strange thing we're also seeing is that customers are demanding real-time even from the products they are using. So you could be using a SaaS product for sales automation, support automation, marketing automation; now I don't want to use a product if it doesn't have real-time analytics baked into the product itself. And so all these software companies, you know, providing a SaaS service to their cloud customers and clients, their proof of value really comes from the analytics that they can show within the product. And if that is not interactive and real-time, then they are also going to be left behind. So it's really a huge differentiator. Whether you're building a software product or you're running a business, real-time observability gives you a window of opportunity to actually act: when something goes wrong, you can act on it very, very quickly. >> Right, which is absolutely critical. Dhruba, I want to get your take on this.
As the CTO and co-founder, as I introduced you, what were some of the gaps in the market back in 2016 that you saw that really necessitated the development of this technology? >> Yeah, for real-time analytics, the difference compared to what came earlier is that everything used to be a lot of batch processes. The reason being that there was something called MapReduce, a scanning system that was kind of an invention from Google, which was about processing big data sets: scanning large data sets to give answers. Whereas for real-time analytics, the new trend is: how can you index these big datasets so that you can answer queries really fast? So this is what Rockset does as well: we have capabilities to index humongous amounts of data cheaply, efficiently, and economically for our customers, and queries leverage the index to give fast (indistinct). This is one of the big changes. The other change, obviously, is that a lot of analytics has moved to the cloud. So Rockset is built natively for the cloud, which is why we can scale resources up and down as queries come, and we can provide great (indistinct) for people on both data latency and query latency. So these two trends, I think, are the power behind making people use more real-time analytics. >> Right, and as Venkat was talking about, it's an absolute differentiator for businesses. You know, last year we saw all these quick pivots to survive and ultimately thrive, and we're seeing the businesses coming out of this that were able to do that, that were able to pivot to digital, be successful, and out-compete those who maybe were not as fast.
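Dhruba's contrast between scanning (the MapReduce lineage) and indexing can be shown with a toy example. This is a hand-rolled sketch, not Rockset's actual index: it builds a simple inverted index over a handful of made-up events and answers the same question both by scan and by lookup.

```python
from collections import defaultdict

# Invented event records, standing in for metrics or log data.
events = [
    {"id": 0, "service": "api", "status": 500},
    {"id": 1, "service": "web", "status": 200},
    {"id": 2, "service": "api", "status": 200},
    {"id": 3, "service": "api", "status": 500},
]

# Scan: touch every record for every query, O(N) per query.
def scan(field, value):
    return [e["id"] for e in events if e[field] == value]

# Index: pay the cost once at ingest time, then answer by lookup.
index = defaultdict(list)
for e in events:
    for field in ("service", "status"):
        index[(field, e[field])].append(e["id"])

def lookup(field, value):
    return index.get((field, value), [])

assert scan("status", 500) == lookup("status", 500) == [0, 3]
print(lookup("service", "api"))  # [0, 2, 3]
```

Both paths return the same answer; the difference is where the work happens. The scan pays its full cost on every query, while the index pays once at ingest and then answers each query with a dictionary lookup, which is what makes low-latency queries on large, fresh data feasible.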
I saw that recently, Venkat, you guys had a major product release a few weeks ago that is making real-time analytics on streaming data sources like Apache Kafka, Amazon Kinesis, Amazon DynamoDB, and data lakes a lot more accessible and affordable. Break down that launch for me: how is it delivering the accessibility and affordability that you talked about before? >> Extremely good question. So we're really excited about what we call SQL-based roll-ups; that's what we call that release. So what does that do? If you think about real-time analytics, and even teeing off the previous question you asked about the gap in the market: the gap is really that all the warehouses and lakes are built for batch. You know, they're really good at letting people accumulate huge volumes of data, and once a week an analyst asks a question, generates a report, and everybody looks at it. But with real-time, the data never stops coming, and the queries never stop coming. So if I want real-time metrics on all these huge volumes of data coming in, and I drain it into a huge data lake and then do analytics on that, it gets very expensive and very complex very quickly. And so the new release, SQL-based roll-ups, lets you define, simply using SQL, any real-time metric that you want to track across any dimensions you care about. It could be geo, demographic, any other dimensions, and Rockset will automatically maintain all those real-time metrics for you, in real time, in a highly accurate fashion. So you never have to doubt whether the metrics are valid, and they will be accurate up to the second. And the best part is you don't have to learn a new language. You can actually use SQL to define those metrics, and Rockset will automatically maintain and scale that for you in the cloud. And that, I think, reduces the barrier.
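The roll-up idea Venkat describes, keeping a metric continuously up to date as events arrive rather than re-aggregating raw data at query time, reduces to a plain incremental-aggregation pattern. The sketch below is not Rockset's SQL-based roll-ups feature, just the underlying idea, with invented event fields (geo and device as the dimensions being tracked).

```python
from collections import defaultdict

# One running aggregate per combination of dimension values.
rollup = defaultdict(lambda: {"count": 0, "revenue": 0.0})

def ingest(event):
    # Key the aggregate by the dimensions we want to slice on.
    key = (event["geo"], event["device"])
    rollup[key]["count"] += 1
    rollup[key]["revenue"] += event["revenue"]

stream = [
    {"geo": "US", "device": "mobile", "revenue": 1.50},
    {"geo": "US", "device": "desktop", "revenue": 2.00},
    {"geo": "DE", "device": "mobile", "revenue": 0.75},
    {"geo": "US", "device": "mobile", "revenue": 0.25},
]
for event in stream:
    ingest(event)

# The metric is already up to date the moment the last event lands;
# reading it is a cheap lookup rather than a scan over raw events.
print(rollup[("US", "mobile")])  # {'count': 2, 'revenue': 1.75}
```

Because the aggregation cost is paid per event at ingest, query latency stays flat no matter how much raw data has flowed through, which is what keeps always-on metrics affordable.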
So if somebody wants to build something real-time, you know, track something for their business in real time, you'd otherwise have to duct-tape together multiple disparate components and systems that were never meant to work with each other. Now you have a real-time database built for the cloud that fully supports full-featured SQL, so you can do this in a matter of minutes, which would probably take you days or weeks with alternate technologies. >> That's a dramatic reduction in time there. I want to mention the Snowflake IPO, since you guys mentioned the Confluent IPO. You say that Rockset does for real-time what Snowflake did for batch. Dhruba, I want to get your perspective on that. Tell me about that. What do you mean by that? >> Yeah, so we see this trend in the market where a lot of analytics that are very batch get a lot of value if they move more real-time, like Venkat mentioned, when analytics powers actual products, which need to use analytics to make the product better. So Rockset very much plays in this area. Rockset is the only solution, I shouldn't say solution, it's a database, a real-time database, which powers these kinds of analytics systems. If you don't use Rockset, then you might be using maybe a warehouse or something, but you cannot get real-time, because there is always a latency of putting data into the warehouse; it could be minutes, it could be hours. And you also don't get too many people making concurrent queries on a warehouse. So this is another difference for real-time analytics: because it powers applications, the query volume could be large. That's why you need a real-time database, and not a real-time warehouse or other technologies, for this. And this trend has really caught up, because most enterprises are pretty much already on this journey. You asked me this previous question about what has changed since 2016 as well.
And this is a journey that most enterprises we see are already embarking upon. >> One thing, too, that we're seeing is that more and more applications are becoming data intensive applications, right? Whether it's Instagram or DoorDash or even our banking app, we expect to have the information updated immediately. How do you help, Dhruba, sticking with you, how do you help businesses build and power those data intensive applications that consumers are demanding? >> That's a great question. And we have both, me and Venkat, seen these data applications at large scale when we were at Facebook earlier. We were both part of the Facebook team. So we saw how real-time was really important for building that kind of a business, that was social media. But now we are taking the same kind of back ends, which can scale to huge volumes of data, to the enterprises as well. Venkat, do you have anything to add? >> Yeah, I think when you're trying to go from batch to real-time, you're 100% spot on that a static report, a static dashboard, actually becomes an application, becomes a data application, and it has to be interactive. You're not just showing a newspaper that you just get to read. You want to click and deep dive, slice and dice the data to not only understand what happened, but why it happened, and come up with hypotheses to figure out what I want to do with it. So the interactivity is important, and the real-timeliness now becomes important. The way we think about it is, once you go into real-time analytics, you know, the data never stops coming. That's obvious. Data freshness is important. But the queries never stop coming either, because when your dashboards and metrics are up to date in real-time, you really want alerts and anomaly detection to be automatically built in. And so you don't even have to look at the graphs once a week.
When something is off, the system will come and tap on your shoulder and say, "Hey, something is going on." And so that really is a real-time application at that point, because it's constantly looking at the data and querying on your behalf, and only alerting you when something actually interesting is happening that you might need to look at. So yeah, the whole movement towards data applications and data intensive apps is a huge use case for us. I think most of our customers, I would say, are building a data application in one shape or form or another. >> And if I think of use cases like customer 360, you know, as customers and consumers of whatever product or solution we're talking about, we expect that these brands know who we are, know what we've done with them, what we've bought. Knowing what to show me next is what I expect, whether again it's my bank or it's Instagram or something else. So that personalization approach is absolutely critical, and I imagine another big game changer, differentiator, for the customers that use Rockset. What do you guys think about that? >> Absolutely, personalized recommendation is a huge use case. We see this all over. Ritual is one of our customers, and we have a case study on that, I think. They want to personalize. They generate offline recommendations for anything the user is buying, but they want to use behavioral data from the product to personalize that experience, and combine the two before they serve anything on the checkout lane, right? We also see real-time analytics and data applications becoming a very important thing in B2B companies. And we have another customer, Command Alkon, who, you know, have a supply chain platform for heavy construction, and 80% of concrete in North America flows through their platform, for example. And what they want to know in real-time is reporting on how many concrete trucks are arriving at a big construction site, which ones are late, and so on.
And the real-time, you know, analytics needs to be accurate, and needs to be up to the second. Don't tell me what trucks were coming an hour ago; I need this right now. And so even in a B2B platform, we see that very similar trend, where real-time reporting, real-time search, real-time indexing is actually a very, very important piece of the puzzle, and not just for the B2C examples that you mentioned. And the Instagram comment is also very appropriate, because a hedge fund customer came to us and said, "I have dashboards built on top of Snowflake. Queries are taking two to five seconds, and certain parts of my dashboards have 50 to 60 visualizations. You do the math, it takes many minutes to load." And so they said, "Hey, you have some indexing tech. Can you make this faster?" Three weeks later, the queries that would take two to five seconds on a traditional warehouse or a cloud data warehouse came back in 18 milliseconds with Rockset. And it is so fast that they said, you know, "If my internal dashboards are not as fast as Instagram, no one in my company uses them." These are their words. And so the speed is really, really important. The scale is really, really important. Data freshness is important. If you combine all of these things and also make it simple for people to access with SQL, that's really the unique value prop that we have at Rockset, which is what our customers love. >> You brought up something interesting, Venkat, that kind of made me think of the employee experience. You know, we always think of the customer 360. The customer experience and the employee experience, in my opinion, are inextricably linked. The employees have to have access to what they need to deliver and support these great customer relationships.
And as you were saying, you know, the employees are expecting databases to be as fast as what they see on Instagram when they're, you know, surfing in their free time. Then adoption, I imagine, gets better, and obviously the benefit from the end user's and customer's perspective is that speed. Talk to me a little bit about how Rockset, and I would like to get both of your opinions here, is a facilitator of that employee productivity for your customers. >> This is a great question. In fact, with the same hedge fund customer, I pushed them to go and measure how many times people even look at all the data that they produce. (laughs) How many analysts and investors actually use your dashboards? I asked them to go investigate that. And one of the things they eventually showed me was there was a huge uptake once their dashboards went from two-to-three-second lags to 18 milliseconds. The daily active users for their own internal dashboards went from five people to almost the entire company, you know. So I think you're absolutely spot on. So it really goes back to, you know, really leveraging the data and actually doing something about it. Like, you know, if I ask a question and the system is going to take 20 minutes to answer it, I will probably not ask as many questions as I want to. When it becomes interactive and very, very fast, all of a sudden I not only start with a question, I can ask a follow-up question, and then another follow-up question, and really drive that to, you know, a conclusion, and I can actually act upon it. And this really accelerates. So even if you kind of look at the macro, you hear these phrases, the world is going from batch to real-time. And in my opinion, when I look at this, people want to, you know, accelerate their growth. People want to make faster decisions.
People want to get to "what can I do about this" and get actionable insights. And that is not really going to come from systems that take 20 minutes to give a response. It's going to come from systems that are interactive and real-time, and that need for acceleration is what's really driving this movement from batch to real-time. And we're very happy to facilitate and accelerate that movement. >> And it really drives the opportunity for your customers to monetize more and more data, so that they can actually act on it, as you said, in real-time, and do something about it, whether it's a positive experience or it is, you know, remediating a challenge. Last question, guys, since we're almost out of time here, but I want to understand, talk to me about the Rockset-AWS partnership and what the value is for your customers. >> Okay, yeah. I'll get to that in a second, but I wanted to add something to your previous question. I think my observation from all the customers that we see is that real-time analytics is addictive. Once they get used to it, they can't go back to the old stuff. This is what we have found with all our customers. So, yeah, for the AWS question, I think maybe Venkat can answer that better than me. >> Yeah, I mean, we love partnering with AWS. I think they are the world's leader when it comes to public clouds. We have a lot of joint happy customers that are all AWS customers. Rockset is entirely built on top of AWS, and we love that. And there are a lot of integrations that Rockset natively comes with. So if you're already managing your data in AWS, you know, there are no data transfer costs or anything like that involved for you to also index that data in Rockset and actually build real-time applications and stream the data to Rockset. So the partnership goes very, very deep. We are an AWS customer, we are a partner, and our go-to-market teams work with them.
And so, yeah, we're very, very happy, you know, like, AWS fanboys here, yeah. >> Excellent, it sounds like a great synergistic, collaborative relationship. And I love, Dhruba, what you said. This is a great quote: "Real-time analytics is addictive." That sounds to me like a good addiction (all subtly laugh) for businesses in every industry to take up. Guys, it's been a pleasure talking to you. Thank you for joining me, talking to the audience about Rockset, what differentiates you, and how you're helping customers really improve their customer productivity, their employee productivity, and beyond. We appreciate your time. >> Thanks, Lisa. >> Thank you, thanks a lot. >> For my guests, I'm Lisa Martin. You're watching this "Cube Conversation". (bright ending music)

Published Date : Sep 14 2021



Bob Wise, AWS & Peder Ulander, AWS | Red Hat Summit 2021 Virtual Experience


 

(smart gentle music) >> Hey, welcome back everyone to theCUBE's coverage of Red Hat Summit 2021 virtual. I'm John Furrier, host of theCUBE, got two great guests here from AWS: Bob Wise, General Manager of Kubernetes for Amazon Web Services, and Peder Ulander, Head of Product Marketing for the enterprise developer and open-source at AWS. Gentlemen, you guys are the core leaders in the AWS open-source initiatives. Thanks for joining us on theCUBE here for Red Hat Summit. >> Thanks for having us, John. >> Good to be here. >> So the innovation that's come from people building on top of the cloud has just been amazing. You guys, props to Amazon Web Services for constantly adding more and raising the bar on more services every year. You guys do that, and now public cloud has become so popular, and so important, that now Hybrid has pushed to the Edge. You've got Outposts with Amazon, and you see everyone following suit. It's pretty much a clear vote of confidence from the customers that Hybrid is the operating model of the future. And that really is about the Edge. So I want to chat with you about the open-source intersection there, so let's get into it. So we're here at Red Hat Summit. So Red Hat's an open-source company, and timing is great for them. Now part of IBM, you guys have had a relationship with Red Hat for some time. Can you tell us about the partnership and how it's working together? >> Yeah, absolutely. Why don't I take that one? AWS and Red Hat have been strategic partners since, shoot, I think it's 2008 or so, in the early days of AWS, when, engaging with customers, we wanted to ensure that AWS was the best place for enterprises to run their Red Hat workloads. And this is super important when you think about what Red Hat has accomplished with RHEL in the enterprise: it's running SAP, it's running Oracle, it's running all different types of core business applications, as well as a lot of the new things that customers are innovating.
And so having that relationship to ensure that not only did it work on AWS, but it actually scaled, that we had integration of services, that we had the performance, the price, all of the things that were so critical to customers, was critical from day one. And we continue to evolve this relationship over time, as you see us coming into Red Hat Summit this year. >> Well, again, to the hard news here, also the new service, Red Hat OpenShift Service on AWS, known as ROSA, a clever acronym, but really it's on AWS. What exactly is this service? What does it do? And who is it designed for? >> Well, let me jump in on this one. Maybe let's start with the why? Why ROSA? Customers love using OpenShift, but they also want to use AWS. They want the best of both. So they want their peanut butter and their chocolate together in a single confection. A lot of those customers have deployed AWS, have deployed OpenShift on AWS. They want a managed service, a simplified supply chain. We want to be able to streamline moving on-premises OpenShift workloads to AWS, and they naturally want good integration with AWS services. So as to the what? A new service, jointly operated and supported by Red Hat and AWS, to provide a fully managed OpenShift on AWS. So again, a lot of customers have been running OpenShift on AWS before this time, but of course they were typically managing it themselves. And so now they get a fully managed option, with also a simplified supply chain, single support channel, single billing. >> You know, we were talking before we came on camera about the "on AWS" acronym, and people build on the clouds kind of like it's no big deal to say that, but I know it means something. I want you guys to explain this "on," because I know I've been scolded for saying things on theCUBE that were kind of misspoken, because it's easy to say, "Oh yeah, I built that app, we built all this stuff, theCUBE was on AWS," but it's not "on AWS."
It means something from a designation standpoint. What does "on AWS" mean? 'Cause this is OpenShift Service on AWS, and we see other companies have their products on AWS. This is a specific designation. Can you share, please? >> John, when you see the branding of something like Red Hat on AWS, what that basically signals to our customers is that this is joint engineering work. This is the top tier of strategic partners, where we actually do a lot of joint engineering and work to make sure that we're driving the right integrations and the right experience, and make sure that these things are accessible and discoverable in our console. They're treated effectively as a first-class service inside of the AWS ecosystem. So there's not many of the "on's," if you will. You think about SAP on VMware Cloud on AWS, and now Red Hat OpenShift on AWS. It really is that signal that helps give customers the confidence of a tested, tried, true, supported, and validated service on top of AWS. And we think that's significantly better than anything else. It's easy to run an image on a VM and stuff it into a cloud service to make it available, but customers want better, customers want tighter experiences. They want to be able to take advantage of all the great things that we have from a scale, availability, and performance perspective. And that's really what we're pushing towards. >> Yeah. I've seen examples specifically where, when partners work with Amazon at that level of joint engineering, deeper partnerships, the results were pretty significant on the business side. So congratulations to you guys working with OpenShift and Red Hat; that's a real testament to their product. But I got to ask you guys, pull the Amazon playbook out, and challenge you guys to create some commentary around the process of working backwards. Every time I talk to Andy Jassy, he always says, we work backwards from the customer, and we get the requirements, and we're listening to customers. Okay, great.
He loves to say that, and it's true. I know, I've seen it. What does the customer working backwards document look like here? What was the need, and what made this become such an important part of AWS? And what are customers saying now, now that the product's out there? >> Well, OpenShift has a very wide footprint, as does AWS. Some working backwards documents kind of write themselves, because the customer demand is so strong that there's just no avoiding it. Then it really just becomes about making sure you have a good plan, so it becomes much more operational at that point. ROSA's definitely one of those services. We had so much demand, and as a result, no surprise that we're getting a lot of enthusiasm from customers, because so many of them asked us for it. (crosstalk) >> What's been the reaction to that demand? That kind of gives a sense of it, but okay, so there's demand now. What are the use cases? What are customers saying? What's the reaction been? >> A lot of the use cases are these Hybrid kind of use cases, where a customer has a big OpenShift footprint. What we see from a lot of these customers is a strong demand for consistency in order to reduce IT sprawl. What they really want to do is have the smallest number of the simplest environments they can. And so customers that standardized on OpenShift really want to be able to standardize OpenShift both in their on-premises environment and on AWS, and get managed service options, just to remove the undifferentiated heavy lifting. >> Hey, what's your take on the product marketing side of this, where you've got open-source becoming very enterprise specific? Red Hat's been there for a very long time. I've been a user of Red Hat since the beginning, and following them, and Linux, obviously, is where that's come from. But what features specifically jump out in this offering that customers are resonating around? What's the vibe here?
>> John, you kind of alluded to it early on, which is, I don't know that I'd necessarily call it Hybrid, but the reality is our customers have environments that are on premises, in the cloud, and all the way out to the Edge. Today, when you think of a lot of solutions and services, it's a fractured experience that they have between those three locations. And one of our biggest commitments to our customers is to make things super simple, remove the complexity, do all of the hard work, which means customers are looking for a consistent experience, environment, and tooling that spans data center to cloud to Edge. And that's probably the biggest kind of core asset here for customers who might have standardized on OpenShift in the data center. They come to the cloud, they want to continue to leverage those skills. I think probably an interesting one, as we headed down this path: we all know Delta Airlines. Delta is a great example of a joint customer who has been doing stuff inside of AWS for a long time. They've been standardizing on Red Hat for a long time, and bringing this together just gave them that simple extension to take their investment in Red Hat OpenShift and leverage their experience, and again, the scale and performance of what AWS brings them. >> Next question, what's next for Red Hat OpenShift on AWS in your work with Red Hat? Where does this go next? What's the big to-do item? What do you guys see as the vision? >> I'm glad you mentioned open-source collaboration at the start there. One thing to point out is that AWS works on the Kubernetes project upstream, as do the Red Hat teams. So one of the ways that we collaborate with the Red Hat team is in open-source. One of those projects is a new project called ACK, AWS Controllers for Kubernetes, and this is a kind of Kubernetes-friendly way for our customers to use an API to manage AWS services.
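To give a flavor of what Bob describes, Kubernetes-friendly APIs for managing AWS services, an ACK-style manifest declares an AWS resource as a Kubernetes custom resource. This is an illustrative sketch; the exact API group, version, and fields may differ by controller release, so check the ACK documentation for the real schema:

```yaml
# Hypothetical ACK-style manifest: an S3 bucket declared as a
# Kubernetes object. The ACK controller for S3 would watch resources
# like this and converge the AWS side to match the declared spec.
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: my-app-bucket
spec:
  name: my-app-bucket
```

The appeal is that teams already living in `kubectl` and GitOps workflows can manage cloud resources with the same declarative tooling they use for their applications.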
So that's one of the things that we're looking forward to as that goes GA, rolling out into both ROSA and onto our other services. >> Awesome. I got to ask you guys this while you're here, because it's very rare to get two luminaries within AWS on the open-source side. This has been a huge build-out over the many, many years for AWS, and some people really kind of don't understand the position. So take a minute to clarify the position of AWS on open-source. You guys are very active in a lot of projects. You mentioned upstream with Kubernetes, and other areas. I've had many conversations with Adrian Cockcroft on this, as well as others within AWS. Huge proponents of web services, I mean, you go back to the original Amazon. I mean, Jeff Barr was saying 15 years ago some of those APIs are still in play here. APIs back 15 years ago, that was kind of not mainstream at that time. So you had open standards; that really made Amazon Web Services successful, and you guys are continuing it. But the modern era is very enterprise-like, and you see a lot of legacy, you're seeing a lot more operations that are going to be driven by open technologies that you guys are investing in. Take a minute to explain what AWS is doing, what you guys care about, and your mission. >> Yeah. Well, why don't I start, and then we'll kick it over to Bob, 'cause I think Bob can also talk about some of the key contribution sides. But the best way to think about it is kind of in three different pillars. So let's start with the first one, which is around ensuring that our customers' favorite open-source projects run best on AWS. Since 2006, we've been helping our customers operationalize their open-source investments and really kind of achieve that scale, and focus more on how they use and innovate on the products versus how they set up and run.
And for myself, being in open-source since the late 90s, the biggest opportunity, yet challenge, was access to the technology, but it still required you as a customer to learn how to set up, configure, operationalize, support, and sustain. AWS removes that heavy lifting, and again, back to that earlier point from the beginning of AWS, we helped customers scale and implement their Apache services, their database services, all of these different types of open-source projects, to make them really work exceptionally well on AWS. And back to that point, make sure that AWS was the best place for their open-source projects. I think the second thing that we do, and you're seeing that today with what we're doing with ROSA and Red Hat, is we partner with open-source leaders, from Red Hat to Redis and Confluent, to a number of different players out there, Grafana and Prometheus, to even foundations like the LF and the CNCF. We partner with these leaders to ensure that we're working together to grow the overall experience and the overall pie, if you will. And this kind of gets into that point you were making, John, in that with the old world legacy proprietary stuff, there's a huge chance for refresh and new opportunity and rethinking or modernization, if you will, as you come into the cloud. Having the expertise and the partnerships with these key players as enterprises move in is so crucial. And then the third piece I'd like to talk about that's important to our open-source strategy is really around contribution. We have a number of projects that we've delivered ourselves. I think the two most recent ones that really come top of mind for me are what we did with Babelfish, as well as with OpenSearch.
So contributing and driving a true open-source project that helps our customers take advantage of things like a proprietary-to-open-source SQL conversion tool, or what we're doing to make Elasticsearch the primary open platform for our customers. But it's not just about those services, it's also collaborating with key industry initiatives. Bob's at the forefront of that with what we're doing with the CNCF around things like Kubernetes and Prometheus, et cetera. Bob, you want to jump in on some of that? >> Sure, I think the one thing I would add here is that customers love using those open-source projects. One of the challenges with them frequently is security, and this is job zero at AWS. So a lot of the collaboration work we do, a lot of the work that we do on upstream projects, goes specifically around kind of security oriented things, because that is what customers expect when they come to get a managed service at AWS. Some of those efforts are somewhat unsung, because you generally do more work and less talk in security oriented things. But across projects at AWS, that's always a key contribution focus for us. >> Good way to call out security, too. I think that's being built in to everything now, that's an operating model. People call it shift-left, day two operations, whichever way you want to look at it. You've got this nice formation going between the under-the-hood kind of programmability of the infrastructure at scale, and then you have the modern application development, which is just beginning, programmable DevSecOps. It's funny, Bob, I'd love to get your take on this, because I remember in the 80s, during the Unix generation, I used to peddle software under the table. Like, here's a copy, you just don't tell anyone. People in the younger generation don't get the fact that it wasn't always open. And so now you have open, and you have this idea of an enterprise that's going to be a system management system view.
So you've got engineering and you've got computer science kind of coming together, this SRE middle layer. You're hearing that as kind of a new discipline. So DevOps kind of has won. I mean, we kind of knew this for many, many years. I said this in 2013 on theCUBE, actually at re:Invent. I just recently shared that clip. But okay, now you've got SecOps, DevSecOps. So now you have an era where it's system thinking, and open-source is driving all of that. So can you share your perspective? Because this is kind of where the puck is going. It's an open world, and that's going to have to be open and scalable. How do you and open-source take it to the next level, to give that same scale and reliability? What's your vision? >> The key here is really around automation, and what we're seeing. You could look at Kubernetes. Kubernetes is essentially a robot. The early design of it was built around robotics principles. So it's a giant software robot, and the world has changed. If you just look at the influx of all kinds of automation, to not just the DevOps world but to all industries, you see a similar kind of trend. And so the role of the IT operations person is changing, from doing the work that the robot did, and replacing it with the robot, to managing large numbers of robots. And in this case, the robots are a little early and a little hard to talk to. And so you end up using languages like YAML and other things, but it turns out robots still just do what you tell them to do. And so one of the things you have to do is be really, really careful, because robots will go and do whatever it is you ask them to do. On the other hand, they're really, really good at doing that. So in the security area, the research points to the largest single source of security issues being people making manual mistakes. And a lot of people are still a little bit terrified if human beings aren't touching things on the way to production.
At AWS, we're terrified if humans are touching it. And that is a super hard chasm to cross, and open-source projects are really playing a big role in what's really an IT-wide migration to a whole new set of, not just tools, but organizational approaches. >> What's your reaction to that? Because we're talking essentially software concepts here. Because if you write bad code, the code will execute what you wrote, assuming it compiles, like in the old days. Now, if you're going to scale large operations with dynamic capabilities, services being initiated and terminated, torn down and spun up, you need the automation. But if you really don't design it right, you could be screwed. This is a huge deal. >> This is one reason why we've put so much effort into GitOps. You can think of it as a more narrowly defined subset of the DevOps world, with a specific set of principles around using kind of simplified declarative approaches, along with robots that converge the system to the desired state. And when you get into large distributed systems, you end up needing to take those kinds of approaches to get it to work at scale. Otherwise you have problems. >> Yeah, just adding to that. And it's funny, you said DevOps has won. I actually think DevOps has won, but DevOps hasn't changed (indistinct). Bob, you were right. The reality is, it was founded back quite a while ago; it was more around CI/CD in the enterprise and the closed data center. And it was one of those where automation and runbooks addressed the fact that every pair of hands between service request and service delivery created an issue. So that growth and that mental model of moving from waterfall to agile to DevOps, you built it, you run it, that type of model, I think is really, really important. But as it comes out into the cloud, you no longer have those controls of the data center, and you actually have infinite scale.
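Bob's "robots that converge the system to the desired state" can be sketched as one pass of a tiny reconciliation loop. This is an illustrative toy with hypothetical names, not actual Kubernetes controller code:

```python
def reconcile(desired_replicas, observed_pods):
    """One pass of a GitOps-style convergence loop (toy sketch).

    Compare the declared state (how many pods *should* be running)
    against the observed state (pods actually running) and return the
    actions needed to close the gap. A real controller runs this loop
    continuously rather than reaching into production imperatively.
    """
    if len(observed_pods) < desired_replicas:
        # Too few pods: start replacements until we hit the target.
        return [("start", f"pod-{i}")
                for i in range(len(observed_pods), desired_replicas)]
    # Too many pods: stop the surplus ones (empty list if converged).
    return [("stop", name) for name in observed_pods[desired_replicas:]]

# Declared intent: 5 pods. A machine died, so only 4 are observed.
observed = ["pod-0", "pod-1", "pod-2", "pod-3"]
print(reconcile(5, observed))  # [('start', 'pod-4')]
print(reconcile(4, observed))  # [] -- already converged
```

The key property is that nothing external mutates production directly: you change the declared state, and the loop computes and applies whatever delta is needed, no matter how the observed state drifted.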
So back to your point of, you've got to get this right. You have to architect correctly, you have to make sure that your code is good, you have to make sure that you have full visibility. This is where it gets really interesting at AWS, and some of the things that we're tying in. So whether we're talking about GitOps, like what Bob just went through, or what you brought up with DevSecOps, you also have things like AIOps. And so looking at how we take our machine learning tools to really implement the appropriate types of code reviews, to assess your infrastructure or your choices against well-architected principles, and provide automated remediation, is key. Adding to that is observability. Developers, especially in a highly distributed environment, need to have better understanding, fidelity, and touchpoints of what's going on with their application as it runs in production. And so what we do with regards to the work we have in observability around the Grafana and Prometheus projects only accelerates that whole concept of continuous monitoring and continuous observability. And then really adding to that, I think it was last month, we introduced our Fault Injection Simulator, a chaos engineering tool that, again, takes advantage of all of this automation and machine learning to really help our developers, our customers, operate at scale, and make sure that when they are releasing code, they're releasing code that is not just great in a small sense, it works on my laptop, but works great in a highly distributed, massively scaled environment around the globe. >> You know, this is one of the things that impresses me about Red Hat this year, and I've said this before at all the events I've covered with them: they get the cloud scale piece, and I think their relationship with you guys shows that. I think DevOps has won, but it's the gift that keeps giving in open-source, because what you have here is no longer a conversation about moving to the cloud.
The cloud has become the operating model. So the conversation shifts to the much more complicated enterprise, and/or the intelligent Edge, and whether it's industrial or human or whatever, you've got a data problem. So that's about a programmability issue at scale. So what's interesting is that Red Hat is on that bandwagon. It's an operating system. I mean, basically it's a distributed computing paradigm, essentially a la the AWS concept of a cloud. Now it goes to the Edge, it's just distributed services via open-source. So what's your reaction to that? >> Yeah, it's back to the original point, John, where I said any CIO is thinking about their IT environment from data center, to cloud, to Edge, and the more consistency, automation, and kind of tools that are at their disposal to enable them to create that. I think you started to talk about infrastructure as code; it's now almost everything as code. And that starts with the operating system, obviously. And that's why this is so critical, that we're partnering with companies like Red Hat on our vision and their vision, because they align to where our customers are ultimately going. Bob, do you want to add to that? >> Bob: No, I think you said it. >> John: You guys are crushing it. Bob, one quick question for you, while I've got you here. You mentioned GitOps. I've heard this before, I kind of understand it. Can you just quickly define it from your perspective: what is GitOps?
Sure, well, GitOps is really, as I said before, a kind of narrowed version of DevOps. Sure, it's infrastructure as code. Sure, you're doing things incrementally. But the GitOps principle, it's back to, what are the best practices for managing large numbers, large numbers of robots? And in this case, it's around this idea of declarative intent. So instead of having systems that reach into production and change things, what you do is you set up the defined, declared state of the system that you want, and then leave the robots to constantly work to converge the state there. That seems kind of nebulous, so let me give you a really concrete example from Kubernetes; by the way, the entire Kubernetes system design is based on this. You say, I want five pods running in production, and that's running my application. So what Kubernetes does is it sits there and it constantly checks: oh, I'm supposed to have five pods, do I have five? Well, what happens if the machine running one of those pods goes away? Now, suddenly it goes and checks and says, oh, I'm supposed to have five pods, but there's four pods. What action do I take to now try to get the system back to that state? So you don't have a system reaching out and changing things externally to Kubernetes; you let Kubernetes do the heavy lifting there. And so it goes through a loop of, oh, I need to start a new pod, and then it converges the system state back to running five pods. So it's really taking that kind of declarative intent, combined with constant convergence loops, to run production at scale. >> That's awesome. Well, we could do a whole segment on the stateful and stateless future, but we don't have time. I do want to summarize real quick. We're here at Red Hat Summit 2021. You've got Red Hat OpenShift on AWS, the big news. Bob and Peder, tell us quickly, in summary, why AWS? Why Red Hat? Why better together? Give the quick overview. Bob, we'll start with you. >> Bob, you want to kick us off? >> I'm going to repeat: peanut butter and chocolate. Customers love OpenShift, they love managed services. They want simplified operations, a simplified supply chain. So you get the best of both worlds: you get the OpenShift that you want, fully managed on AWS, where you get all of the security and scale. >> Yeah, I can't add much to that.
Other than saying, Red Hat is a powerhouse, obviously, in the data center; it is the operating system of the data center. Bringing together the best in the cloud with the best in the data center is such a huge benefit to our customers. Because, back to your point, John, our customers are thinking about what they're doing from data center, to cloud, to Edge, and bringing the best of those pieces together in a seamless solution is so, so critical. And that's why AW- (indistinct) >> Thanks for coming on, I really appreciate it. I just want to give you guys a plug: you're being humble, but your work in the CNCF and standards bodies is well, well known, and I'm getting the word out. Congratulations on the commitment to open-source. Really appreciate the community. Thank you for your time. >> Thanks, John. >> Okay, Cube coverage here, covering Red Hat Summit 2021. I'm John Furrier, host of theCUBE. Thanks for watching. (smart gentle music)
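Bob's five-pods example boils down to a reconciliation loop: you declare the desired state, and a controller repeatedly compares it against the observed state and takes corrective action until the two converge. A minimal sketch of that loop in Python; the function and state names are illustrative only, not the Kubernetes API:

```python
# Declarative convergence, as in Bob's example: the operator declares
# "five pods" once, and the controller works out what actions (if any)
# are needed each time it compares desired state against observed state.

def reconcile(desired: int, running: list) -> list:
    """Return the corrective actions needed to converge toward the desired pod count."""
    if len(running) < desired:
        # e.g. the machine running a pod went away: start replacements
        return ["start-pod"] * (desired - len(running))
    if len(running) > desired:
        return ["stop-pod"] * (len(running) - desired)
    return []  # already converged; nothing to do

# Desired state: five pods. A machine failure leaves only four running.
actions = reconcile(5, ["pod-a", "pod-b", "pod-c", "pod-d"])
print(actions)  # ['start-pod']
```

The point of the design, as Bob notes, is that nothing reaches into production imperatively; the loop simply runs again and again, so a failure at any moment is repaired on the next pass.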

Published Date : Apr 27 2021


Jim Long, Sarbjeet Johal, and Joseph Jacks | CUBEConversation, February 2019


 

(lively classical music) >> Hello everyone, welcome to this special Cube Conversation; we are here at the Power Panel Conversation. I'm John Furrier, in Palo Alto, California, at theCUBE Studios. With us, remote on the line to talk about cloud technology's impact on entrepreneurship, startups, and the overall ecosystem, are Jim Long, who's the CEO of Didja, which is a startup around disrupting digital TV, and who has also been an investor and a serial entrepreneur; Sarbjeet Johal, who's a cloud influencer, strategist, and investor out of Berkeley, California, and The Batchery; and also Joseph Jacks, CUBE alumni; actually, you guys are all CUBE alumni, so great to have you on. Joseph Jacks is the founder and general partner of OSS Capital, Open Source Software Capital, a new fund that's been raised specifically to commercialize and fund startups around open source software. Guys, we've got a great panel of experts here, thanks for joining us, appreciate it. >> Go Bears! >> Nice to be here. >> So we have a distinguished panel, it's the Power Panel, we're on cloud technologies. First, I'd like to get you guys' reaction: you know, we're seeing a lot of negative news around what Facebook has become, essentially their own hyper-scale cloud with their application. They were called the digital, you know, renegades, or digital gangsters, in the UK by Parliament, which was built on open source software. Amazon's continuing to win, Azure's doing their thing, bundling Office 365, making it look like they've got more revenue as they're catching up; Google; and then you've got IBM and Oracle, and then you've got an ecosystem that's impacted by this large scale. So I want to get your thoughts on the first point here: is there room for more clouds? There's a big buzzword around multiple clouds. Are we going to see specialty clouds? 'Cause Salesforce is a cloud, so is there room for more clouds? Jim, why don't you start? >> Well, I sure hope so.
You know, the internet has unfortunately become sort of the internet of monopolies, and that doesn't do anyone any good. In fact, you bring up an interesting point; it'd be kind of interesting to see if Facebook created a social cloud for certain types of applications to use. I've no idea whether that makes any sense, but Amazon's clearly been the big gorilla now, and done an amazing job. We love using them, but we also love trying out different services that they have, and then figuring out whether we want to develop them ourselves or use a specialty service, and I think that's going to be interesting, particularly in the AI area, stuff like that. So I sure hope more clouds are around for all of us to take advantage of. >> Joseph, I want you to weigh in here, 'cause you were close to the Kubernetes trend; in fact, we were at an OpenStack event when you started Kismatic, which is the movement that became KubeCon Cloud Native, many many years ago, and now you're investing in open source. The world's built on open source; there's got to be room for more clouds. Your thoughts on the opportunities? >> Yeah, thanks for having me on, John. I think we need a new kind of open, collaborative cloud, and to date, we haven't really seen any of the existing major, sort of large critical-mass cloud providers participate in that type of model. Arguably, Google has probably participated and contributed the most in the open source ecosystem, contributing TensorFlow and Kubernetes and Go, lots of different open source projects, but they're ultimately focused on gravitating huge amounts of compute and storage cycles to their cloud platform.
So I think one of the big missing links in the industry is, as we continue to see the rise of these large, vertically integrated, proprietary control planes for computing, storage, applications, and services, and as the open source community and the open source ecosystem continue to grow and explode, we'll need a third sort of provider, one that isn't based on monopoly, or on a traditional proprietary software business like Microsoft kind of transitioning their enterprise customers to services. Sort of Amazon in the first camp, vertically integrated, with a buffet of all these different compute, storage, and networking services, applications, middleware; Microsoft focused on building managed services of their software portfolio. I think we need a third model, where we have sort of an open set of interfaces and an open-standards-based cloud provider. It might be a pure software company, it might be a company that builds on the rails and the infrastructure that Amazon has laid down, spending tens of billions in capex, or it could be something based on a project like Kubernetes, or built from the community ecosystem. So I think we need something like that just to, sort of, speed the innovation, and disaggregate the services away from a monolithic kind of closed vendor like Amazon or Azure. >> I want to come back to that whole startup opportunity, but I want to get Sarbjeet in here, because we were just in the B2B area last week at IBM Think 2019. Obviously they're trying to get back into the cloud game, but this digital transformation has been the cliche for almost a couple of years now, if not five or more. Business has got to move to the cloud, so there's a whole new ball game, a complete cultural shift. They need stability.
So I want to talk more about this open cloud, which I love that conversation, but give me the blocking and tackling capabilities first, 'cause I've got to get out of that old capex model, move to an operating model, transform my business, whether it's multi-cloud or not. So Sarbjeet, what's your take on the cloud market for, say, the enterprise? >> Yeah, I think for the enterprise, you're just sitting in that data center, and moving those workloads to cloud is a cumbersome task. For that to work, they actually don't need all the bells and whistles which Amazon has in the periphery, if you will. They need just core things like compute, network, and storage, and some other sorts of services, maybe database, maybe data sharing and stuff like that, but they just want to move those applications as-is to start with, with some replatforming and with some changes. Like, they won't make changes at first when they start moving those applications. But our minds are polluted by this thinking: when we see a Facebook being formed by a couple of people, or a company of six people sold for a billion dollars, it just messes with our minds on the enterprise side, like hey, we can do that too, we can move that fast and so forth, but it's sort of tragic that we think that way. Well, having said that, and I think we have talked about this in the past, if you are doing anything in the way of systems innovation, if you're building those, even at the enterprise, I think cloud is the way to go. To your original question, if there's room for newer cloud players, I think there is, provided that we can detach the platforms from the environments they are sitting on. So the proprietariness has to be lowered; the degree of proprietariness has to be lower. It can be through open source, I think, mainly; it can be through open technologies. They don't have to be open source, but portable. >> JJ was mentioning that, I think that's a big point.
Jim Long, you're an entrepreneur, you've been a VC, you know all the VCs, you've been around for a while; you're a serial entrepreneur, starting out at Cal Berkeley back in the day. You know, small ideas can move fast, and you're building on Amazon, and you've got a media kind of thing going on. There's a cloud opportunity for you, 'cause you are cloud native, 'cause you're built in the cloud. How do you see it playing out? 'Cause you're scaling with Amazon. >> Well, we obviously, as a new startup, don't have the issues the enterprise folks have, and I could really see the enterprise customers, what we used to call the Fortune 500, for example, getting together and insisting on at least a base set of APIs that Amazon and Microsoft, et cetera, adopt. And for a startup, it's really about moving fast with your own solution that solves a problem. So you don't necessarily care too much that you're tied into Amazon completely, because you know that if you need to, you can make a change someday. But they do such a good job for us, and their costs, while they can certainly be lower, and we certainly would like more volume discounts, are pretty darn amazing across the network, across the internet. We do try to price out other folks just for the heck of it; we've been doing that recently with CDNs, for example. But for us, we're actually creating a hybrid cloud, if you will, a purpose-built cloud to support local television stations, and we do think that's going to be, along with using Amazon, a unique cloud with our own APIs, and we hope lots of different TV apps will use our hybrid cloud for part of their application to service local TV. So it's kind of an interesting play for us. The B2B part of it, we're hoping to be pretty successful with as well, and we hope to maybe have multiple cloud vendors in our mix, you know.
Not that our users will know who's behind us; maybe Amazon for something, Limelight for another, or whatever, for example. >> Well, you've got to be concerned about lock-in as you grow in the cloud; that's something that everybody's worried about. JJ, I want to get back to you on the investment thesis, because you have a cutting-edge business model around investing in open source software, and there's two schools of thought in the open source community: you know, free contribution's great, and let that be organic, and then there's now commercialization. There's real value being created in open source. You had put together a chart with your team about the billions of dollars in exits from open source companies. So what are you investing in? What do you see as opportunities for entrepreneurs like Jim and others that are out there looking at scaling their business? How do you look at success, what's your advice, what do you see as leading indicators? >> I think I'll broadly answer your question with a model that we've been thinking a lot about. We're going to start writing publicly about it, and probably eventually maybe publish a book or two on it, and it's around the sort of fundamental perspective of creating value and capturing value. So, following a famous investor and entrepreneur in Silicon Valley who has commonly modeled these things using two different letter variables, X and Y, I'll give you the sort of perspective of modeling value creation and value capture around open source, as compared to closed source or proprietary software. So if you look at value creation modeled as X, and value capture modeled as Y, where X and Y are two independent variables: with a fully proprietary, software-company-based approach, whether you're building a cloud service or a proprietary software product or whatever, just a software company, your value creation exponent is typically bounded by two things.
Capital and fundraising into the entity creating the software, and the centralization of research and development, meaning engineering output, for producing the software. And so those two things are tightly coupled to and bounded by the company. With commercial open source software, the exact opposite is true. So value creation is decoupled and independent from funding, and value creation is also decentralized in terms of the research and development aspect. So you have a sort of decentralized, community-based, crowd-sourced, or sort of internet-wide, global phenomenon of contributing to a code base that isn't necessarily owned or fully controlled by a single entity, and those two properties, being decoupled from funding and decentralized R and D, are fundamentally changing the value creation kind of exponent. Now let's look at the value capture variable. With a proprietary software company, or proprietary technology company, you're primarily looking at two constituents capturing value: people who pay for accessing the service or the software, and people who create the software. And so those two constituents capture all the value. The vendor selling the software captures maybe 10 or 20% of the value, and the rest of the value, I would say, is captured by the customer. Most economists don't express value capture as capturable by an end user or a customer. I think that's a mistake. >> Jim, you're- >> So now... >> Okay, Jim, your reaction to that, because there's an article that went around this weekend from Motherboard: "The internet was built on the free labor of open source developers. Is that sustainable?" So Jim, what's your reaction to JJ's comments about the interactions and the dynamic between value creation, value capture, and free versus sustainable funding?
>> Well, if you can sort of mix both together, that's what I would like. I haven't really ever figured out how to make open source work in our business model, but I haven't really tried that hard. It's an intriguing concept for sure, particularly if we come up with APIs that are specific to, say, local television or something like that, and maybe some special processes that do things that are of interest to the wider community. So it's something I do plan to look at, because I do agree. I mean, we use open source, we use this thing called FFmpeg, and several other things, and we're really happy that there's people out there adding value to them, et cetera, and we have our own versions, et cetera, so we'd like to contribute to the community if we could figure out how. >> Sarbjeet, your reactions to JJ's thesis there? >> I think two things. I will comment on two different aspects. One is the lack of standards, and then open source becoming the standard, right? I think open source kind of projects take birth and life on their own, because we have a lack of standards, 'cause these different vendors can't agree on standards. So remember, we used to have service-oriented architecture; we had Microsoft pushing some standards from one side and IBM from the other, SOAP versus xCBL and XML, different sorts of paradigms, right? But then the REST API became the de facto standard; it just took over. I think what REST has done for software in the last ten years or so, nothing else has done for us.
So JJ, your thoughts on how open source continues as some of these new technologies, like Kubernetes, continue to hit the scene. Is there any trajectory change in open source that you see, that you could share, I'd love to get your insights on what's next behind, you know, the rise of Kubernetes is happening, what's next? >> I think more abstractly from Kubernetes, we believe that if you just look at the rate of innovation as a primary factor for progress and forward change in the world, open source software has the highest rate of innovation of any technology creation phenomena, and as a consequence, we're seeing more standards emerge from the open source ecosystem, we're seeing more disruption happen from the open source ecosystem, we're seeing more new technology companies and new paradigms and shifts happen from the open source ecosystem, and kind of all progress across the largest, most difficult sort of compound, sensitive problems, influenced and kind of sourced from the open source ecosystem and the open source world overall. Whether it's chip design, machine learning or computing innovations or new types of architectures, or new types of developer paradigms, you know, biological breakthroughs, there's kind of things up and down the technology spectrum that have a lot to sort of thank open source for. 
We think that the future of technology and the future of software is really one where open source is at the core, as opposed to the periphery or the edges. And so today, every software technology company, cloud providers included, has a closed proprietary core, meaning that where the core is, the data path, the runtime, the core business logic of the company, today that core is proprietary software or closed source software. And yet what is also true is that at the edges, the wrappers, the sort of crust, the periphery of every technology company, we have lots of open source: we have client libraries and bindings and languages and integrations, configuration, UIs, and so on, but the cores are proprietary. We think the following will happen over the next few decades. We think the future will gradually shift from closed proprietary cores to open cores, where instead of a proprietary core, an open core is where you have a core open source software project as the fundamental building block for the company. So for example, Hadoop caused the creation of MapR and Cloudera and Hortonworks, Spark caused the creation of Databricks, Kafka caused the creation of Confluent, Git caused the creation of GitHub and GitLab. And this type of commercial open source software model, where there's a core open source project as the kernel building block for the company, and then an extension of intellectual property, or wrappers, around that open source project, where the company can derive value capture and charge for a licensed product, we think that model is where the future is headed. And this includes cloud providers, basically selling proprietary services that could be based on a mixture of open source projects, but perhaps not fundamentally on a core open source project.
Now, we think generally, like, abstractly, with maybe somewhat of a reductionist explanation there, that that open core future is very likely, fundamentally because the rate of innovation is highest with the open source model in general. >> All right, that's great stuff. Jim, you're a historian of tech, you've lived it. Your thoughts on some of the emerging trends around cloud, because you're disrupting linear TV with Didja, in a new way, using cloud technology. How do you see cloud evolving? >> Well, I think along the lines we discussed, certainly, I think that's a really interesting model, having the open source be the center of the universe, and then figuring out how to have maybe some proprietary stuff, if I can use that word, around it, that other people can take advantage of, but where maybe you get the value capture and build a business on that. That makes a lot of sense, and it could certainly fit in the TV industry, if you will, from where I sit... bring services to businesses and consumers, so it's not like there's some reason it wouldn't work. You know, it's bound to figure out a way, and if you can get a whole mass of people around the world working on the core technology, and if it is sort of unique to the mission, or at least the marketplace you're going after, that could be pretty interesting, and it would be great to see a lot of different new mini-clouds, if you will, develop around that stuff; that would be pretty cool. >> Sarbjeet, I want you to talk about scale, because you also have experience working with Rackspace. Rackspace was early on; they were trying to build the cloud, and OpenStack came out of that, and guess what, the world was moving so fast, Amazon was a bullet train just flying down the tracks, and it just felt like Rackspace and their cloud, you know, OpenStack, just couldn't keep up. So is scale an issue, and how do people compete against scale, in your mind?
>> I think scale is an issue, and software chops is an issue, so there are some patterns, right? So one pattern is that we tend to see that open source is not very good at the application side. You will hardly see any applications being built as open source. And at the other extreme, open source is pretty, sort of, lame, if you will, at the very core of things; like, OpenStack failed for that reason, right? But it's pretty good in the middle, as Joseph said, right? So building pipes, building some platforms based on open source; the hooks, the integration, it's pretty good there, actually. I think that pattern will continue. Hopefully it will go deeper into the core, which we want to see. The other pattern is, I think, the software chops; like, one vendor has to lead the project for a certain amount of time. If that project goes sort of fully open, like anybody can grab it, a lot of people contribute and sort of jump in very quickly, it tends to fail. That's what happened to, I think, OpenStack, and there were many other reasons behind that, but I think that was the main reason. And because we were smaller, and we didn't have that much in the way of software chops, I hate to say it, but then IBM could throw, like, a hundred people a week at the project. >> They did, and look where they are. >> And so does HP, right? >> And look where they are. All right, so I'd love to have a Power Panel on open source; certainly JJ's been in the thick of it, as well as other folks in the community. I want to just kind of end on a lightweight question for you guys. What have you guys learned? Go down the line, start with Jim, then Sarbjeet, and then JJ, we'll finish with you. Share something that you've learned over the past three months that moved you, or that people should know about in tech or cloud trends that's notable. What's something new that you've learned?
>> In my case, it was really just spending some time in the last few months getting to know our end users a little bit better, consumers, and some of the impact that having free internet television has on their lives, and that's really motivating... (distorted speech) Something as simple as you might take for granted, but lower-income people don't necessarily have a TV that works, or a hotel room that has a TV that works, or heaven forbid they're homeless and all that, so it's really gratifying to me to see people sort of tuning back into their local media through television, just by offering it on their phones and laptops. >> And what are you going to do as a result of that? Take a different action? What's the next step for you, what's the action item? >> Well, we're hoping, once our product gets filled out with the major networks, et cetera, that we can actually provide a community attachment to it, so that over-the-air television channels are the main part of the app, and then a side part of the app could be any IP stream, from city council meetings to high schools, to colleges, to local community groups, local, even religious situations or festivals or whatever, and really try to tie that in. We'd really like to use local television as a way of strengthening all local media and local communities; that's the vision, at least. >> It's a great mission you guys have at Didja, thanks for sharing that. Sarbjeet, what have you learned over the past quarter, three months, that was notable for you, and the impact, and something that changed you a little bit? >> What I have actually gravitated towards in the last three to six months is blockchain, actually. I was light on that, like, what can it do for us, and is there really a thing behind it, and can we leverage it? I've seen more and more actual usage of it, in sort of full SCM, supply chain management, and healthcare, and some other sorts of use cases, if you will. I'm intrigued by it, and there's a lot of activity there.
I think there are some legs behind it, so I'm excited about that. >> And are you doing a blockchain project as a result, or are you still tire-kicking? >> No, actually, I will play with it. I'm a practitioner; I play with it, I write code and play with it and see (Jim laughs) what level of effort it takes to do that, and as you know, I wrote the Alexa skill a couple of weeks back, and play with AI and stuff like that. So I try to do that myself before I- >> We're hoping blockchain helps even out the TV ad economy, and gets rid of middlemen, and makes for more trusting transactions between local businesses and stuff. At least I say that; I don't really know what I'm talking about. >> It sounds good, though. You'll get yourself a new round of funding on that sound bite alone. JJ, what have you learned in the past couple of months that's new to you, and changed you, or made you do something different? >> I've learned over the last few months, OSS Capital is a few months and change old, and so we're just kind of getting started on that, and it's really, I think, potentially more than a one-decade, probably a multi-decade, kind of mostly consensus-building effort. There's such a huge lack of consensus and agreement in the industry. It's a fascinatingly polarizing area, the sort of general topic of open source technology, economics, value creation, value capture. So my learnings over the past few months have just intensified in terms of the lack of consensus I've seen in the industry. So I'm trying to write a little bit more about observations there, and sort of put thoughts out, and that's kind of been the biggest takeaway over the last few months for me. >> I'm sure you learned about all the lawyer conversations, setting up a fund; learnings there probably too, right? (Jim laughs) I mean, all the detail. All right, JJ, thanks so much; Sarbjeet, Jim, thanks for joining me on this Power Panel: cloud conversation, impact on entrepreneurship, open source.
Jim Long, Sarbjeet Johal and Joseph Jacks, JJ, thanks for joining us, theCUBE Conversation here in Palo Alto, I'm John Furrier, thanks for watching. >> Thanks John. (lively classical music)

Published Date : Feb 20 2019



Joseph Jacks, OSS Capital | CUBEConversation, October 2018


 

(bright symphony music) >> Hello, I'm John Furrier, the founder of SiliconANGLE Media and co-host of theCUBE. We're here in Palo Alto at our studio. I'm joined by Joseph Jacks, the founder and general partner of OSS Capital. Open Source Software Capital is what OSS stands for. He's also the founder of KubeCon, which is now part of the CNCF. It's a huge conference around Kubernetes. He's a cloud guy. He knows open source. Very well respected in the industry, and also a great guest and friend of theCUBE, a CUBE alumni. Joseph, great to see you. Also known as JJ. JJ, good to see you. >> Thank you for having me on again, John. >> Hey, great to have you come on. I know we've talked many times on theCUBE, but you've got some exciting news. You got a new firm, OSS Capital. Open Source Software, not operational support like a telco, but this is an investment opportunity where you're making investments. Congratulations. >> Thank you. >> So I know you can't talk about some of the specifics on the fund size, but you are actually going to go out, talk to entrepreneurs, make some equity investments. Around open source software. What's the thesis? How did you get here, why did you do it? What's motivating you, and what's the thesis? >> A lot of questions in there. Yeah, I mean this is a really profoundly huge year for open source software. On a bunch of different levels. I think the biggest kind of thing everyone anchors towards is GitHub being acquired by Microsoft. Just a couple of weeks ago, we had the two huge Hadoop vendors join forces. That, I think, surprised a lot of people. MuleSoft, which is a big open source middleware company, getting acquired by Salesforce just a year after going public. Just a huge outcome. I think one observation, just to sort of summarize the year 2018, is actually, starting in January, almost on sort of a monthly basis, we've observed a major sort of open source software company outcome.
And sort of kicking off the year, we had CoreOS getting acquired by Red Hat. Brandon and Alex, the founders over there, built a really interesting company in the Kubernetes ecosystem. And I think in February, Alfresco, which is an open source content portal, taking a privatization outcome from a private equity firm. I believe in March we had Magento getting acquired by Adobe, which is an open source based CMS, a PHP CMS. So just a lot of activity for significant outcomes. Multibillion dollar outcomes of commercial open source companies. And open source software is something like 20 years old. 20 years in the making. And this year in particular, I've just seen a huge amount of large scale outcomes that have been many years in the making, from companies that have taken lots of venture funding. And in a lot of cases, sort of partially focused funding from different investors that have an affinity for open source software and sort of understand the uniqueness of the open source model when it's applied to business, when it's applied to company building. But more opportunistic and sort of affinity oriented, as opposed to a pure focus. So that's kind of been part of the motivation. I'd say the more authentically compelling motivation for doing this is that it just needs to exist. This is sort of a model that is happening by necessity. We're seeing more and more software companies be open source software companies. So open source first. They're built in a distributed way. They're leveraging engineers and talent around the world. They're just part of this open source kind of philosophy. And they are fundamentally kind of commercial open source software companies. We felt that if you had a firm basically designed in a way to exclusively focus on those kinds of companies, and where the firm were actually backed and supported by the founders of the largest commercial open source companies in the world of sort of the last decade.
That could actually deliver a lot of value. So we've been sort of blogging a little bit about this. >> And you wrote a great post on it. I read about open source monetization. But I think one of the things I'm seeing as well that supports your thesis, and I'd like to get your reaction to it because I think this is something that's not really talked about, but open source is still young. I mean, you go back. I remember the days when we used to have to hide in the shadows to get licenses and pirate stuff and do all that crazy stuff. But now, it's only a couple decades away. The leaders that were investing were usually entrepreneurs that'd been successful. The Rob Bearns, the Amar Wadhwa, the guy that did Spring. All these different open source. Linux, obviously, a great success story. But there hasn't been any institutional. Yeah, you got Benchmark, other things, done some investments. A discipline around open source. Where open source is now table stakes in all software development. Cloud is scaling, scaling out globally. There's no real foc- There's never been a firm that's been focused on- Just open source from a commercial standpoint, while maintaining the purity and ethos of open source. I mean, is that. >> You agree? >> That's true. >> 100%, yeah. That's been the big part of creating the firm, is aligning and solving for a pure focused structure. And I think what I'll say abstractly is this sort of venture capital, venture style approach to funding enterprise technology companies, software companies in general, has been to kind of find great entrepreneurs, in an abstract way, that can build great technology companies. Can bring them to market, can sell them, and can scale them, and so on. And either create categories, or dominate existing categories, and disrupt incumbents, and so on.
And I think while that has worked for quite a while, in the venture industry overall, in the 50, 60 years of the venture industry, lots of successful firms, I think what we're starting to see is a necessary shift toward accounting for the fundamental differences of open source software as it relates to new technology getting created, and new software companies kind of coming into market. So we actually fundamentally believe that commercial open source software companies are fundamentally different. Functionally in almost every way, as compared to proprietary closed source software companies of the last 30 years. And that's the way we've sort of designed our firm, and we'll be about ten people pretty soon. We're just about a month in. We're growing the team quickly, but we're sort of a small, focused team. >> Ten's not small for a focused firm, I mean, I know venture firms that have two billion under management that don't have more than 20 people. >> Well, we have portfolio partners that are focused in different functional areas where commercial open source software companies have really fundamental differences. If you were to sort of stack rank, by function, where commercial open source software companies are really fundamentally different, sort of top to bottom, legal would probably be at the very top of the list. Right, in terms of license compliance management, structuring all the sort of protections and provisions around how intellectual property is actually shipped to and sold to customers. The legal licensing aspects. The commercial software licensing. This is quite a polarizing hot topic these days. The second big functional area where we have a portfolio partner focused on this is finance. Finance is another area where commercial open source software companies have to sort of behaviorally orient and apply that function very, very differently as compared to proprietary software companies.
So we're crazy honored and excited to have world experts and very respected leaders in those different areas sort of helping to provide different pillars of wisdom to our portfolio companies, our portfolio founders, in those different functional areas. And we provide a really focused kind of structure for them. >> Well I want to ask you the kind of question that kind of bridges the old way and new way, 'cause I definitely see you guys as being new and different, which is good. Or as Andy Jassy would say, you can be misunderstood for a while, but as you become successful, people will start understanding what you do. And that's a great example of Amazon. The pattern with success is traditionally the same. If we kind of encapsulate the difference between open source old and new, it's that you have something of value, and you're disrupting the market and collecting rents from it. Or revenue, or profit. So that's commercial, that's how businesses run. How are you guys going to disrupt, with open source software, the next generation of value creation? We know how value's created, certainly in software; open source has shown a path on how to create value in writing software, if code is value and functionality's value. But to commercialize and create revenue, which is people paying something for something, that's a little bit different kind of value extraction from the value creation. So open source software can create value in functionality and value in product. Now you bring it to the market, you get paid for it, you have to disrupt somebody, you have to create something. How are you looking at that? What's the vision of the creation, the extraction of value, who's disrupted, is it greenfield new opportunities? What's your vision? >> A lot of nuance and complexity in that question. What I would say is- >> Well, open source is creating products. >> Well, open source is the basis for creating products in a different kind of way.
I'll go back to your question around let's just sort of maybe simplify it as the value creation and the value capture dynamics, right? We've sort of written a few posts about this, and it's subtle, but it's easy to understand if you look at it from a fundamental kind of perspective. We actually believe, and we'll be publishing research on this, and maybe even sort of more principled scientific, perhaps, even ways of looking at it. And then blog posts and research. We believe that open source software will always generate or create orders of magnitude more value than any constituent can capture. Right, and that's a fundamental way of looking at it. So if you see how cloud providers are capturing value that open source creates, whether it's Elasticsearch, or Postgres, or MySQL or Hadoop. And then commercial open source software companies that capture value that open source software creates, whether it's companies like Confluent around Kafka, or Cloudera around Hadoop, or Databricks around Apache Spark. Or whether it's the creators of those projects. The creators of Spark and Hadoop and Elasticsearch, sometimes many of them are the founders of those companies I mentioned, and sometimes they're not. We just believe regardless of how that sort of value is captured by the cloud providers, the commercial vendors, or the creators, the value created relative to the value captured will always be orders and orders of magnitude greater. And this is expressed in another way, which this may be easier to understand, it's a sort of reinforcing this kind of assertion that there's orders of magnitude value created far greater than what can be captured. If you were to do a survey, which we're currently in the process of doing, and I'm happy to sort of say that publicly for the first time here, of all the commercial open source software companies that have projects with large significant adoption, whether, say for example, it's Docker, with millions of users, or Apache Hadoop. 
How many Hadoop deployments there are. How many customers' companies are there running Hadoop deployments. Or it may be even MySQL. How many MySQL installations are there. And then you were to sort of survey those companies and see how many end users are there relative to how many customers are paying for the usage of the project. It would probably be something like if there were a million users of a given project, the company behind that project or the cloud provider, or say the end user, the developer behind the project, is unlikely to capture more than, say, 1% or a couple percent of those end users to companies, to paying companies, to paying customers. And many times, that's high. Many times, 1% to 2% is very high. Often, what we've seen actually anecdotally, and we're doing principled research around this, and we'll have data here across a large number of companies, many times it's a fraction of 1%. Which is just sort of maybe sometimes 10% of 1%, or even smaller. >> So the practitioners will be making more money than the actual vendors? >> Absolutely right. End users and practitioners always stand to benefit far greater because of the fundamental nature of open source. It's permissionless, it's disaggregated, the value creation dynamics are untethered, and it is fundamentally freely available to use, freely available to contribute to, with different constraints based on the license. However, all those things are sort of like disaggregating the creating of technology into sort of an unbounded network. And that's really, really incredible. >> Okay, so first of all, I agree with your premise 100%. We've seen it with CUBE, where videos are free. >> And that's a good thing. All those things are good. >> And Dave Vellante says this all the time on theCUBE. And we actually pointed this out and called this in the Hadoop ecosystem in 2012. 
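The back-of-the-envelope arithmetic JJ describes, a million end users converting to paying customers at 1% or far less, can be sketched as follows. The numbers below are hypothetical placeholders for illustration, not figures from the survey he mentions.

```python
# Hypothetical illustration of the user-to-customer conversion rates
# discussed above: even a "high" 1% conversion leaves the overwhelming
# majority of a project's usage monetarily uncaptured by the vendor.

def captured_share(end_users: int, paying_customers: int) -> float:
    """Fraction of end users who become paying customers of the vendor."""
    return paying_customers / end_users

# Assumed scenario numbers (not real survey data).
scenarios = {
    "high conversion (1%)": (1_000_000, 10_000),
    "fraction of 1% (0.1%)": (1_000_000, 1_000),
    "10% of 1% (0.01%)": (1_000_000, 100),
}

for label, (users, customers) in scenarios.items():
    share = captured_share(users, customers)
    print(f"{label}: {share:.2%} of users pay; "
          f"{users - customers:,} users benefit without paying")
```

The point of the sketch is the asymmetry: whichever row you pick, the non-paying user base dwarfs the paying one, which is the sense in which value created exceeds value captured.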
In fact, we actually said that on theCUBE, and it turned out to be true, 'cause look at how Hortonworks and Cloudera had to merge because, again, the market changed very quickly. >> Value creation. >> Because value >> Was created around them in the immediate cloud, etc. So the question is, that changes the valuation mechanisms. So if this is true, which we believe it is. Just say it is. Then the traditional net present value cash flow metric of the value of the firm, not your firm, but, like, if I'm an open source firm, I'm only one portion of the extraction. I'm a supplier, and I'm an enabler, the valuation on cash flow might not be as great as the real impact. So the question I have for you, have you thought about the valuation? 'Cause now you're thinking about bigger constructs, community network effects. These are new dynamics. I don't think anyone's actually crunched a valuation model around this. So if someone knew that, say for example, an open source project created all this value, and they weren't necessarily harvesting it from a cash flow perspective, there might be other ways to monetize it. Have you thought about that, and what's your reaction to that concept? 'Cause capitalism would kind of shake down the system. 'Cause why would someone be motivated to participate if they're not capturing any value? So if the value shifts, are they still going to be able to participate? You follow the logic I'm trying to- >> I definitely do. I think what I would say to that is we expect and we encourage and we will absolutely heavily invest in more business model innovation in the area of open source. So what I mean by that is, and it's important to sort of qualify a few things there. There's a huge amount of polarization and lack of consensus, lack of industry consensus, on what it actually means to have or implement an open source based business model. In fact there's a lot of people who just point-blank assert that an open source business model does not exist.
We believe that many business models for monetizing and commercializing open source exist. We've blogged and written about a few of them. There are services and training and support. There's open core, which is very effective, with sort of a spectrum of ways to implement open core. Around the core, you can have a thin crust or a thick crust. There's SaaS. There are hardware based distribution models, things like Sourcefire and Cumulus Networks. And there are also network based approaches. For example, a project called Storj, or Stor-J. Being developed and run now by Ben Golub, who's the former CEO of Docker. >> CUBE alumni. >> Ben's a really great open source veteran. This is a network, kind of decentralized network based approach of sort of right-sizing the production and consumption of the resource of a storage based open source project in a decentralized network. So those are sort of four or five ways of commercializing value. However, what we believe is that there will be more business model innovation. There will be more developments around how you can better capture more, or in different ways, the value that open source creates. However, what I will say though, is it is unrealistic to expect two things. It is unrealistic and, in fact, unfair to expect that any of those constituents will contribute back to open source proportional to the value that they received from it, or the benefit, and I'm actually paraphrasing Doug Cutting there, who tweeted this a couple of years ago. A very profoundly deep, wise tweet, which I very strongly agree with. And it is also unrealistic to expect a second thing, which is that any of those constituents can capture a material portion of the value that open source creates, which I would assert is many trillions of dollars, perhaps tens of trillions of dollars. It's really hard to quantify that.
And it's not just dollars in an economic sense, it's dollars in productivity, time saved, new markets, new areas, and so on. >> Yeah, I think this is interesting, and I think that we'll be an open book at that. But I will say that, from what I've observed in looking through all these CUBE interviews, I think that business model innovation absolutely is something that is an IP. >> We need it. Well, it's now intellectual property, the business model. It isn't, hey, I went to business school, learned this at Babson or Harvard, I learned this business model. We're going to do SaaS premium. Okay, I get that. There's going to be very interesting new innovations coming, and I think that's the new IP. 'Cause open source, if it's community based, there's going to be formulas. So that's going to be really inter- Okay, so now let's get back to actual funding itself. You guys are doing early stage. Can you take us through the approach? >> We're very focused on early stage investing, and backing teams that are just sort of welcoming the idea of a commercial entity around their open source project. Or building a business fundamentally dependent on an open source project, or maybe even more than one. The reason for that is this is really where there's a lot of structural inefficiency in supporting and backing those types of founders. >> I think one of the things with ... is with that acquisition. They were pure on the open source side, doing a great job, didn't want to push the business model too hard because with open source, let's face it, you got people like, eh, I don't want to get caught on the business side, and get revenue, perverse incentives might come up, or fear of incentives that might be different or not aligned. Was a great value. >> I think so. >> So Red Hat got a steal on that one. But as you go forward, there's going to be certainly a lot more stuff. We're seeing a lot of it now in CNCF, for instance.
I want to get your thoughts on this because, being the co-founder of KubeCon, and donating it to the CNCF, Kubernetes is the hottest thing on the planet, as we talked about many years ago. What's your take on that now? I see exciting things happening. What is the impact of Kubernetes, in your opinion, on the world, and where do you see that evolving rapidly, and where is the focus here that people should be paying attention to? >> I think that Kubernetes replaces EC2. Kubernetes is a disaggregated API for distributed computing anywhere. And it happens to be portable and able to run on any kind of computer infrastructure, which sort of makes it like a liquid, disaggregated, EC2-like API. Which a lot of people have been sort of chasing and trying to implement for many years with things like OpenStack or Eucalyptus. But interestingly, Kubernetes is sort of the right abstraction for distributed computing, because it meets people where they are architecturally. It's sort of aligned with this current movement around distributed-systems-first designs. Microservices, packaging things in small compartmentalized units. >> Good for integrating existing stuff. >> Absolutely, and it's very composable, un-opinionated architecturally. So you can sort of take an application and structure it in any given way, and as long as it has this sort of isolation boundary of a container, you can run it on Kubernetes without needing to sort of retrofit the architecture, which is really awesome. I think Kubernetes is a foundational part of the next kind of computing paradigm, in the same way that Linux was foundational to the computing paradigm that gave rise to the internet. We had commodity hardware meeting open source based sort of cost reduction and efficiency, which really Linux enabled, and the movement toward scale-out data center infrastructure that supported the internet's sort of maturity and infrastructure.
I think we're starting to see the same type of repeat effect thanks to Kubernetes basically being really well received by engineers, by the cloud providers. It's now the universal sort of standard for running container based applications on the different cloud providers. >> And I think having that un-opinionated posture, as you said, architectural posture, allows it to be compatible with a new kind of heterogeneity. >> Heterogeneity is critical. >> Heterogeneity is key, 'cause it's not just within the environment, it's also within each vendor, or a customer has more heterogeneity. So, okay, now that's key. So multi cloud, I want to get your thoughts on multi cloud, because now this goes into some of the things that might build on top, if Kubernetes continues to go down the road that you say it does. Then the next question is, stateful applications, service meshes. >> A lot of buzzwords. A lot of buzzwords in there. Stateful applications are real because at a certain point in time, you have a maturity curve with critical infrastructure that starts to become appealing for stateful mission critical storage systems, which is typically where you have all the crown jewels of a given company's infrastructure, whether it's a transactional system, or reading and writing core customer or financial service information, or whatever it is. So Kubernetes is starting to hit this maturity curve where people are migrating really serious mission critical storage workloads onto that platform. And obviously we're going to start to see even more critical workloads. We're starting to see Edge workloads because Kubernetes is a pretty low footprint system, so you can run it on Edge devices, you can even run it on microcontrollers. We're sort of past the experimental, you know, fun and games with Raspberry Pi sort of towers, and people are actually legitimately doing real world Edge kind of deployments with Kubernetes.
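The "universal standard" point JJ makes can be seen concretely in a minimal Deployment manifest like the sketch below: nothing in it is provider-specific, so the same desired state applies unchanged to any conformant cluster. The application name and image here are hypothetical placeholders, not anything from the conversation.

```yaml
# A minimal, hypothetical Kubernetes Deployment. No field below is tied
# to a particular cloud provider; any conformant cluster converges the
# running state toward this declared spec.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # hypothetical application name
spec:
  replicas: 3                  # desired state; the control plane reconciles to it
  selector:
    matchLabels:
      app: example-app         # must match the pod template labels below
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: web
          image: nginx:1.25    # any container image works the same way
          ports:
            - containerPort: 80
```

Applied with `kubectl apply -f deployment.yaml`, this behaves identically on a laptop cluster, an on-prem cluster, or a managed cloud service, which is the portability being described.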
We're absolutely starting to see multi-geo, multi-replication, multi-cloud sort of style architectures becoming real as well. Because Kubernetes is this API that the industry's agreeing upon sufficiently. We actually have agreement around this sort of surface area for distributed system style computing, and if cloud providers can actually standardize on it in a way that lets application specific vendors or new types of application deployment models innovate further, then we can really unlock this sort of tight coupling of proprietary services inside cloud providers and disaggregate it. Which is really exciting, and I forget the Netscape, Jim Barksdale line. Bundling, un-bundling. We're starting to see the un-bundling of proprietary cloud computing service APIs. Things like Kinesis, and ALB and ELB, and proprietary storage services, and these other sticky services get un-bundled because of two big things. Open source, obviously, we have open source alternative data paths. And then we have Kubernetes, which allows us to sort of disaggregate things out pretty easily. >> I want to hear your thoughts on one final concept before we break, 'cause I was having a private conversation with three people besides myself. A big time CIO of a company that if I said the name everyone would go, oh my god, that guy is huge, he's seen it all going back many, many ways. Currently doing a lot of innovation. A hardcore network chip guy who knows networking, old school infrastructure. And then a cloud native application founder who knows a lot about software development and is state-of-the-art cloud native. So cloud native, all experienced: old-school, kind of about my age, a cloud native app developer, a big time CIO, and a chip networking kind of infrastructure guy.
And we're talking, and one thing that came out, I want to get you thoughts on this, he says, so what's going on with DevOps, how do you see this service mesh, is a stay for (mumbles) on top of the stack, no stacks, horizontally scalable. And the comment that came out was storage and networking have had this relationship with everything since day one. Network moves a packet from point A to point B, and nothing happens in between, maybe some inspection. And storage goes from here now to the then, because you store it. He goes, that premise moves up the stacks, so then the cloud native guy goes, well that's what's happening up at the top, there's a lot of moving things around, workloads and or services, provisioning services, and then from now to then state. In real time. And what dawned on the next conversation the CIO goes, well this is exactly our challenge. We have under the hood infrastructure being programmable, >> We're having some trouble with the connection. Please try again. >> My phone's calling me. >> Programmable connections. >> So you got the programmable on the top of the stack too, so the CIO said, that's exactly the problem we're trying to solve. We're trying to solve some of these network storage concepts now at an application level. Your thoughts to that. >> Well, I think if I could tease apart everything you just said, which is profound synthesis of a lot of different things, I think we've started to see application logic leak out of application code itself into dedicated layers that are really good at doing one specific thing. So traditionally we had some crud style kind of behavioral semantics implemented around business logic. And then, inside of that, you also had libraries for doing connectivity and lookups and service discovery and locking and key management and encryption and coordination with other types of applications. And all that stuff was sort of shoved into the single big application binary. 
And now, we're starting to see all those language runtime specific parts of application code sort of crack or leak out into these dedicated, highly scalable, Unix philosophy oriented sort of like layers. So things like Envoy are really just built for the sort of nervous system layer of application communication fabric up and down the layer two through layer seven sort of protocol transport stack, which is really profound. We're seeing things like Vault from Hashicorp handle secure key storage persistence of application dedication, authorization, metadata and information to sort of access different systems and end points. And that's a dedicated sort of stateful layer that you can sort of fragment out and delegate sort of application specific functionality to, which is really great for scalability reasons. And on, and on, and on. So we've started to see that, and I think one way of looking at that is it's a cycle. It's the sort of bundling and un-bundling aspect. >> One of the granny level services are getting a really low level- >> Yeah, it's a sort of like bundling and un-bundling and so we've got all this un-bundling happening out of application code to these dedicated layers. The bundling back may happen. I've actually seen a few Bay Area companies go like, we're going back to the monolith 'cause it actually gives us lots of efficiencies in things that we though were trade offs before. We're actually comfortable with a big monorepo, and one or two core languages, and we're going to build everything into these big binaries, and everyone's going to sort of live in the same source code repository and break things out through folders or whatever. There's a lot of really interesting things. I don't want to say we're sort of clear on where this bundling, un-bundling is happening, but I do think that there's a lot of un-bundling happening right now. And there's a lot of opportunity there. >> And the open source, obviously, driving it. 
So final question for you, how many deals have you done? Can you talk a little bit about the firm? And exciting things and plans that you have going forward. >> Yeah, we're going to be making a lot of announcements over the next few months, and we're, I guess, extremely thrilled. I don't want to say overwhelmed, 'cause we're able to handle all of the volume and inquiries and inbound interest. We're really honored and thrilled by the reception over the last couple weeks from announcing the firm on the first of October, sort of before the Hortonworks Cloudera merger. The JFrog funding announcement that week. The Elastic IPO. Just a lot of really awesome things happened that week. This is obviously before Microsoft open sourced all their patents. We'll be announcing more investments that we've made. We announced our first one on the first of October as well, with the announcement of the firm. We've made a good number of investments. We're not able to talk too much about our first initiative, but you'll hear more about that in the near future. >> Well, we're excited. I think the timing's perfect. I know you've been working on this kind of vision for a while, and I think it's really great timing. Congratulations, JJ. >> Thank you so much. Thanks for having me on. >> Joseph Jacks, also known as JJ, founder and general partner of OSS Capital, Open Source Software Capital, co-founder of KubeCon, which is now part of the CNCF. A real great player in the community and the ecosystem, great to have him on theCUBE, thanks for coming in. I'm John Furrier, thanks for watching. >> Thanks, John. (bright symphony music)

Published Date : Oct 18 2018

Kostas Tzoumas, data Artisans | Flink Forward 2018


 

(techno music) >> Announcer: Live, from San Francisco, it's theCUBE. Covering Flink Forward, brought to you by data Artisans. (techno music) >> Hello again everybody, this is George Gilbert, we're at the Flink Forward Conference, sponsored by data Artisans, the provider of both Apache Flink and the commercial distribution, the dA Platform, that supports the productionization and operationalization of Flink, and makes it more accessible to mainstream enterprises. We're privileged to have Kostas Tzoumas, CEO of data Artisans, with us today. Welcome Kostas. >> Thank you. Thank you George. >> So, tell us, let's start with sort of an idealized application-use case, that is in the sweet spot of Flink, and then let's talk about how that's going to broaden over time. >> Yeah, so just a little bit of an umbrella above that. So what we see very, very consistently, we see it in modern tech companies, and we see it in traditional enterprises that are trying to move there, is a move towards a business that runs in real time. Runs 24/7, is data-driven, so decisions are made based on data, and is software operated. So increasingly decisions are made by AI, by software, rather than someone looking at something and making a decision, yeah. So for example, some of the largest users of Apache Flink are companies like Uber, Netflix, Alibaba, Lyft, they are all working in this way. >> Can you tell us about the size of their, you know, something in terms of records per day, or cluster size, or, >> Yeah, sure. So, latest I heard, Alibaba is powering Alibaba Search, more than a thousand nodes, terabytes of state, I'm pretty sure they will give us bigger numbers today. Netflix has reported doing about one trillion events per day. >> George: Wow. >> On Flink. So pretty big sizes. >> So, and Netflix, I think I read, is powering their real-time recommendation updates.
>> They are powering a bunch of things, a bunch of applications, there's a lot of routing events internally. I think they have a talk, they had a talk definitely at the last conference, where they talk about this. And it's really a variety of use cases. It's really about building a platform, internally. And offering it to all sorts of departments in the company, be that for recommendations, be that for BI, be that for running stateful microservices, you know, all sorts of things. And we also see the more traditional enterprise moving to this modus operandi. For example, ING is also one of our biggest partners, it's a global consumer bank based in the Netherlands, and their CEO is saying that ING is not a bank, it's a tech company that happens to have a banking license. It's a tech company that inherited a banking license. So that's how they want to operate. So what we see is stream processing is really the enabler for this kind of business, for this kind of modern business where they interact with the consumer in real time, they push notifications, they can change the pricing, et cetera, et cetera. So this is really the crux of stateful stream processing, for me. >> So okay, so tell us, for those who, you know, have a passing understanding of how Kafka's evolving, how Apache Spark and Structured Streaming's evolving, as distinct from, but also, Databricks. What is it about having state management that's sort of integrated, that for example, might make it easy to elastically change a cluster size by repartitioning. What can you assume about managing state internally, that makes things easier? >> Yeah, so I think really the sweet spot of Flink is that if you are looking for a stream processing engine, and for a stateful stream processing engine for that matter, Flink is the definition of this. It's the definitive solution to this problem.
It was created from scratch with this in mind, it was not sort of a bolt-on on top of something else, so it's streaming from the get-go. And we have done a lot of work to make state a first-class citizen. What this means is that in Flink programs, you can keep state that scales to terabytes, we have seen that, and you can manage this state together with your application. So Flink has this model based on checkpoints, where you take a checkpoint of your application and state together, and you can restart at any time from there. So it's really, the core of Flink is around state management. >> And you manage exactly-once semantics across the checkpointing? >> It's exactly once, it's application-level exactly once. We have also introduced end-to-end exactly once with Kafka. So Kafka-Flink-Kafka exactly once. So fully consistent. >> Okay so, let's drill down a little bit. What are some of the things that customers would do with an application running on, let's say, a big cluster or a couple clusters, where they want to operate both on the application logic and on the state, that having it integrated, you know, makes much easier? >> Yeah, so it is a lot about a flipped architecture and about making operations and DevOps much, much easier. So traditionally what you would do is create, let's say, a containerized stateless application and have a centralized data store to keep all your state. What you do now is the state becomes part of the application. So this has several benefits. It has performance benefits, it has organizational benefits in the company. >> Autonomy. >> Autonomy between teams. It has, you know, it gives you a lot of flexibility on what you can do with the applications, like, for example, scaling an application.
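The checkpoint model Kostas describes, where a snapshot captures the application state and the input position as one unit, can be sketched in a few lines. This is a toy stand-alone illustration, not the actual Flink API; every name here is made up for the example:

```python
class StatefulCounter:
    """Toy sketch of checkpoint-based recovery: the input offset and the
    keyed state are snapshotted together, so a restart replays from a
    consistent point and never double-counts an event."""

    def __init__(self):
        self.offset = 0            # next input position to read
        self.state = {}            # key -> count
        self.snapshot = (0, {})    # last checkpoint: (offset, state)

    def take_checkpoint(self):
        # State and position are captured atomically, as one unit.
        self.snapshot = (self.offset, dict(self.state))

    def restore(self):
        offset, state = self.snapshot
        self.offset, self.state = offset, dict(state)

    def process(self, events, crash_at=None):
        while self.offset < len(events):
            if crash_at is not None and self.offset == crash_at:
                crash_at = None    # fail once, then recover
                self.restore()
                continue
            key = events[self.offset]
            self.state[key] = self.state.get(key, 0) + 1
            self.offset += 1
            if self.offset % 3 == 0:  # periodic checkpoint
                self.take_checkpoint()
        return self.state
```

Because the offset and the state roll back together, a run that crashes and recovers produces exactly the same counts as a failure-free run, which is the "application-level exactly once" property in miniature.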
What you can do with Flink is that you have an application running with parallelism of 100, and you are getting a higher volume and you want to scale it to 500, right, so you can simply with Flink take a snapshot of the state and the application together, and then restart it at 500, and Flink is going to redistribute the state. So no need to do anything on a database. >> And then it'll reshard, and Flink will reshard it. >> Will reshard and it will restart. And then one step further, with the product that we have introduced, dA Platform, which includes Flink, you can simply do this with one click or with one REST command. >> So, the resharding was possible with core Flink, the Apache Flink, and the dA Platform just makes it that much easier along with other operations. >> Yeah, so what the dA Platform does is it gives you an API for common operational tasks that we observed everybody that was deploying Flink at a decent scale needs to do. It abstracts, it is based on Kubernetes, but it gives you a higher-level API than Kubernetes. You can manage the application and the state together, and it gives that to you in a REST API, in a UI, et cetera. >> Okay, so in other words, it's sort of like by abstracting even up from Kubernetes, you might have a cluster as a first-class citizen, but you're treating it almost like a single entity, and then under the covers you're managing the things that happen across the cluster. >> So what we have in the dA Platform is a notion of a deployment, which is, I think of it as a cluster, but it's basically based on containers. So you have this notion of deployments that you can manage, and then you have a notion of an application. And an application is a Flink job that evolves over time. And then you have a very, you know, bird's-eye view on this. When you update the code, this is the same application with updated code.
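The rescaling flow described above, snapshot at one parallelism, restart at another, redistribute the keyed state, comes down to a deterministic key-to-task assignment. A minimal sketch under that assumption (illustrative only; Flink's real mechanism uses key groups and is more involved, and these function names are invented for the example):

```python
import hashlib

def owner(key, parallelism):
    # Deterministic key -> task assignment, so state for a key can be
    # relocated consistently when the parallelism changes.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % parallelism

def snapshot(tasks):
    # Merge every task's local keyed state into one logical savepoint.
    merged = {}
    for local in tasks:
        merged.update(local)
    return merged

def restore(savepoint, parallelism):
    # Restart at a new parallelism: re-shard the savepoint by key.
    tasks = [{} for _ in range(parallelism)]
    for key, value in savepoint.items():
        tasks[owner(key, parallelism)][key] = value
    return tasks
```

Scaling from 2 tasks to 5 is then just `restore(snapshot(old_tasks), 5)`: no external database is touched, and every key's state ends up on the task that will process that key.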
You can travel through a history, you can visit the logs, and you can do common operational tasks, like as I said, rescaling, updating the code, rollbacks, replays, migrating to a new deployment target, et cetera. >> Let me ask you, outside of the big tech companies who have built much of the application management scaffolding themselves, you can democratize access to stream processing because the capabilities, you know, are not in the skill set of traditional, mainstream developers. So question, the first thing I hear from a lot of sort of newbies, or people who want to experiment, is, "Well, it's so easy to manage the state in a shared database, even if I'm processing, you know, continuously." Where should they make the trade-off? When is it appropriate to use a shared database? Maybe, you know, for real OLTP work, and then when can you sort of scale it out and manage it integrally with the rest of the application? >> So when should we use a database and when should we use streaming, right? >> Yeah, and even if it's streaming with the embedded state. >> Yeah, that's a very good question. I think it really depends on the use case. So what we see in the market is many enterprises start with a use case that either doesn't scale, or it's not developer friendly enough to have this database-application level separation. And then it quickly spreads out in the whole company and other teams start using it. So for example, in the work we did with ING, they started with a fraud detection application, where the idea was to load models dynamically in the application, as the data scientists are creating new models, and have a scalable fraud detection system that can handle their load. And then we have seen other teams in the company adopting stream processing after that.
>> Okay, so that sounds like where the model becomes part of the application logic and it's a version of the application logic and then, >> The version of the model >> Is associated with the checkpoint >> Correct. >> So let me ask you then, what happens when you're managing, let's say, terabytes of state across a cluster, and someone wants to query across that distributed state. Is there in Flink a query manager that, you know, knows about where all the shards are and the statistics around the shards to do a cost-based query? >> So there is a feature in Flink called queryable state that gives you the ability to do, very simple for now, queries on the state. This feature is evolving, it's in progress. And it will get more sophisticated and more production-ready over time. >> And that enables a different class of users. >> Exactly, I wouldn't, like to be frank, I wouldn't use it for complex data warehousing scenarios. That still needs a data warehouse, but you can do point queries and a few, you know, slightly more sophisticated queries. >> So this is different. This type of state would be different from, like, in Kafka where you can store, you know, the commit log for X amount of time and then replay it. This, it's in a database I assume, not in a log form, and so you have faster access. >> Exactly, and it's placed together with a log, so you can think of the state in Flink as the materialized view of the log, at any given point in time, with various versions. >> Okay. >> And really, the way replay works is, roll back the state to a prior version and roll back the log, the input log, to that same logical time. >> Okay, so how do you see Flink spreading out, now that it's been proven in the most demanding customers, and now we have to accommodate skills, you know, where the developers and DevOps don't have quite the same distributed systems knowledge?
Yeah, I mean we do a lot of work at data Artisans with financial services, insurance, very traditional companies, but it's definitely something that is work in progress, in the sense that our product, the dA Platform, makes operations much easier. This was a common problem everywhere, this was something that tech companies solved for themselves, and we wanted to solve it for everyone else. Application development is yet another thing, and as we saw today in the last keynote, we are working together with Google and the Beam community to bring Python, Go, all sorts of languages into Flink. >> Okay, so that'll help at the developer level, and you're also doing work at the operations level with the platform. >> And of course there's SQL, right? So Flink has Stream SQL, which is standard SQL. >> And would you see, at some point, actually sort of managing the platform for customers, either on-prem or in the cloud? >> Yeah, so right now, the platform is running on Kubernetes, which means that typically the customer installs it in their clusters, in their Kubernetes clusters. Which can be either their own machines, or it can be a Kubernetes service from a cloud vendor. Moving forward, I think it will be very interesting, yes, to move to more hosted solutions. Make it even easier for people. >> Do you see a breakpoint or a transition between the most sophisticated customers who either are comfortable on their own premises, or who were cloud native from the beginning, and then sort of the rest of the mainstream? You know, what sort of applications might they move to the cloud, or might coexist between on-prem and the cloud?
>> Do you see mainstream customers rewriting applications because they would be so much more powerful in stream processing, or do you see them doing just new applications? >> Both, we see both. It's always easier to start with a new application, but we do see a lot of legacy applications in big companies that are not working anymore. And we see those rewritten. And very core applications, very core to the business. >> So could that be, could you be sort of the source and in an analytic processing for the continuous data and then that sort of feeds a transaction and some parameters that then feed a model? >> Yeah. >> Is that, is that a, >> Yeah. >> so in other words you could augment existing OLTP applications with analytics then inform them in real time essentially. >> Absolutely. >> Okay, 'cause that sounds like then something that people would build around what exists. >> Yeah, I mean you can do, you can think of stream processing, in a way, as transaction processing. It's not a dedicated OLTP store, but you can think of it in this flipped architecture right? Like the log is essentially the re-do log, you know, and then you create the materialized views, that's the write path, and then you have the read path, which is queryable state. This is this whole CQRS idea right? >> Yeah, Command-Query-Response. >> Exactly. >> So, this is actually interesting, and I guess this is critical, it's sort of like a new way of doing distributed databases. I know that's not the word you would choose, but it's like the derived data, managed by, sort of coming off of the state changes, then in the stream processor that goes through a single sort of append-only log, and then reading, and how do you manage consistency on the materialized views that derive data? >> Yeah, so we have seen Flink users implement that. So we have seen, you know, companies really base the complete product on the CQRS pattern. I think this is a little bit further out. 
Consistency-wise, Flink gives you the exactly-once consistency on the write path, yeah. What we see a lot more is an architecture where there's a lot of transactional stores in the front end that are running, and then there needs to be some kind of global, single source of truth between all of them. And a very typical way to do that is to get these logs into a stream, and then have a Flink application that can actually scale to that. Create a single source of truth from all of these transactional stores. >> And by having, by feeding the transactional stores into this sort of hub, I presume, some cluster as a hub, and even if it's in the form of sort of a log, how can you replay it with sufficient throughput, I guess not to be a data warehouse but to, you know, have low latency for updating the derived data? And is that derived data, I assume, in non-Flink products? >> Yeah, so the way it works is that, you know, you can get the change logs from the databases, you can use something like Kafka to buffer them up, and then you can use Flink for all the processing, and to do the reprocessing with Flink, this is really one of the core strengths of Flink. Basically what you do is, you replay the Flink program together with the state, and you can get really, really high throughput reprocessing there. >> Where does the super high throughput come from? Is that because of the integration of state and logic? >> Yeah, that is because Flink is a true streaming engine. It is a high-performance streaming engine. And it manages the state, there's no tier, >> Crossing a boundary? >> no tier crossing and there's no boundary crossing when you access state. It's embedded in the Flink application. >> Okay, so that you can optimize the I/O path? >> Correct. >> Okay, very, very interesting.
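The single-source-of-truth pattern described here, change logs from several transactional stores merged into one ordered stream and folded into one consistent view, can be sketched with a stdlib ordered merge. This is a conceptual stand-in for consuming several changelog topics, not any real Kafka or Flink API:

```python
import heapq

def single_source_of_truth(*changelogs):
    """Fold several per-store change streams, each already ordered by
    timestamp, into one globally ordered stream and then into one view.
    A stdlib sketch of the pattern, not an actual CDC pipeline."""
    truth = {}
    # heapq.merge is an ordered k-way merge, standing in for consuming
    # several changelog topics and sequencing their events by time.
    for ts, key, value in heapq.merge(*changelogs):
        truth[key] = value     # later timestamps win
    return truth
```

When two stores touch the same key, the globally later change wins, which is exactly the reconciliation the central stream is there to provide.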
So, it sounds like the Kafka guys, the Confluent folks, their aspirations, from the last time we talked to 'em, don't extend to analytics, you know, I don't know whether they want partners to do that, but it sounds like they have a similar topology, but I'm not clear how much of a first-class citizen state is, other than the log. How would you characterize the trade-offs between the two? >> Yeah, so, I mean obviously I cannot comment on Confluent, but like, what I think is that the state and the log are two very different things. You can think of the log as storage, it's a kind of hot storage because it's the most recent data, but you know, you cannot query it, it's not a materialized view, right. So for me the separation is between processing state and storage. The log is a kind of storage, a kind of message queue. State is really the active data, the real-time active data that needs to have consistency guarantees, and that's a completely different thing. >> Okay, and that's the, you're managing, it's almost like you're managing under the covers a distributed database. >> Yes, kind of. Yeah, a distributed key-value store if you wish. >> Okay, okay, and then that's exposed through multiple interfaces, data stream, table. >> Data stream, table API, SQL, other languages in the future, et cetera. >> Okay, so going further down the line, how do you see the sort of use cases that are going to get you across the chasm from the big tech companies into the mainstream? >> Yeah, so we are already seeing that a lot. So we're doing a lot of work with financial services, insurance companies, a lot of very traditional businesses. And it's really a lot about maintaining a single source of truth, becoming more real-time in the way they interact with the outside world and the customer, like they do see the need to transform.
If we take financial services and investment banks for example, there is a big push in this industry to modernize the IT infrastructure, to get rid of legacy, to adopt modern solutions, become more real-time, et cetera. >> And so they really needed this, like the application platform, the dA Platform, because operationalizing what Netflix did is going to be very difficult, maybe, for non-tech companies. >> Yeah, I mean, you know, it's always a trade-off right, and you know for some, some companies build, some companies buy, and for many companies it's much more sensible to buy. That's why we have software products. And really, our motivation was that we worked in the open-source Flink community with all the big tech companies. We saw their successes, we saw what they built, we saw, you know, their failures. We saw everything and we decided to build this for everybody else, for everyone that, you know, is not Netflix, is not Uber, cannot hire software developers so easily, or with such good quality. >> Okay, alright, on that note, Kostas, we're going to have to end it, and to be continued, one with Stefan next, apparently. >> Nice. >> And then hopefully next year as well. >> Nice. Thank you. >> Alright, thanks Kostas. >> Thank you George. Alright, we're with Kostas Tzoumas, CEO of data Artisans, the company behind Apache Flink and now the application platform that makes Flink run for mainstream enterprises. We will be back, after this short break. (techno music)

Published Date : Apr 11 2018

Clarke Patterson, Confluent - #SparkSummit - #theCUBE


 

>> Announcer: Live from San Francisco, it's theCUBE, covering Spark Summit 2017, brought to you by Databricks. (techno music) >> Welcome to theCUBE, at Spark Summit here at San Francisco, at the Moscone Center West, and we're going to be competing with all the excitement happening behind us. They're going to be going off with raffles, and I don't know what all. But we'll just have to talk above them, right? >> Clarke: Well at least we didn't get to win. >> Our next guest here on the show is Clarke Patterson from Confluent. You're the Senior Director of Product Marketing, is that correct? >> Yeah, you got it. >> All right, well it's exciting -- >> Clarke: Pleasure to be here >> To have you on the show. >> Clarke: It's my first time here. >> David: First time on theCUBE? >> I feel like one of those radio people, first time caller, here I am. Yup, first time on theCUBE. >> Well, long time listener too, I hope. >> Clarke: Yes, I am. >> And so, have you announced anything new that you want to talk about from Confluent? >> Yeah, I mean not particularly at this show per se, but most recently, we've done a lot of stuff to enable customers to adopt Confluent in the Cloud. So we came up with a Confluent Cloud offering, which is a managed service of our Confluent platform, a couple weeks ago, at our event around Kafka. So we're really excited about that. It really fits that need where Cloud First or operation-starved organizations are really wanting to do things with streaming platforms based on Kafka, but they just don't have the means to make it happen. And so, we're now standing this up as a managed service that allows them to get their hands on this great set of capabilities, with us as the backstop to do things with it. >> And you said, Kafka is not just a publish and subscribe engine, right? >> Yeah, I'm glad that you asked that. So, that's one of the big misconceptions, I think, of Kafka.
You know, it's made its way into a lot of organizations from the early use case of publish and subscribe for data. But over the last 12 to 18 months in particular, there have been a lot of interesting advancements. Two things in particular: One is the ability to connect, which is called the Connect API in Kafka. And it essentially simplifies how you integrate large numbers of producers and consumers of data as information flows through. So, a modernization of ETL, if you will. The second thing is stream processing. So there's a Kafka Streams API that's built in now as well that allows you to do lightweight transformations of data as it flows from point A to point B, and you could publish out new topics if you need to manipulate things. And it expands the overall capabilities of what Kafka can do. >> Okay, and I'm going to ask George here to dive in, if you could. >> And I was just going to ask you. >> David: I can feel it. (laughing) >> So, this is interesting. But if we want to frame this in terms of what people understand from, I don't want to say prehistoric eras, but earlier approaches to similar problems. So, let's say, in days gone by, you had an ETL solution. >> Clarke: Yup. >> So now, let's put Connect together with stream processing, and how does that change the whole architecture of integrating your systems? >> Yeah, I mean I think the easiest way to think about this is if you think about some of the different market segments that have existed over the last 10 to 20 years. So data integration was all about how do I get a lot of different systems to integrate a bunch of data and transform it in some manner, and ship it off to some other place in my business. And it was really good at building these end-to-end workflows, moving big quantities of data. But it was generally kind of batch-oriented. And so we've been fixated on, how do we make this process faster?
To some degree. And the other segment is application integration, which said, hey, you know, when I want applications to talk to one another, it doesn't have the scale of information exchange, but it needs to happen a whole lot faster. So these real-time integration systems, ESBs, and things like that came along, and they were able to serve that particular need. But as we move forward into this world that we're in now, where there's just all sorts of information, companies want to become event-centric. You need to be able to get the best of both of those worlds. And this is really where Kafka is starting to sit. It's saying, hey, let's take massive amounts of data producers that need to connect to massive amounts of data consumers, be able to ship a super-granular level of information, transform it as you need, and do that in real-time so that everything can get served out very, very fast. >> But now that you, I mean that's a wonderful and kind of pithy kind of way to distill it. But now that we have this new way of thinking of app integration, data integration, best of both worlds, that has sort of second order consequences in terms of how we build applications and connect them. So what does that look like? What do applications look like in the old world, and now what enables them to be sort of re-factored? Or for new apps, how do you build them differently? >> Yeah, I mean we see a lot of people that are going into microservices-oriented architecture. So moving away from one big monolithic app that takes this inordinate amount of effort to change in some capacity. And quite frankly, it happens very, very slowly. And so they look to microservices to be able to split those up into very small, functional components that they can integrate a whole lot faster, decouple engineering teams so we're not dependent on one another, and just make things happen a whole lot quicker than we could before.
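The consume-transform-publish flow Clarke described a moment ago, records flowing in, lightweight per-record transformations applied, and a new topic published out, can be sketched with plain generators. This is a toy illustration only; the actual Kafka Streams API is a Java library and looks quite different, and all names and values here are invented:

```python
def stream(topic):
    # Stand-in for subscribing to a topic's records.
    yield from topic

def map_values(records, fn):
    # Lightweight per-record transformation as data flows through.
    for key, value in records:
        yield key, fn(value)

def filter_records(records, keep):
    for key, value in records:
        if keep(key, value):
            yield key, value

def to_topic(records):
    # Stand-in for publishing the transformed stream to a new topic.
    return list(records)

orders = [("alice", 120), ("bob", 40), ("alice", 300)]
large_orders = to_topic(
    filter_records(
        map_values(stream(orders), lambda amount: amount + 10),  # add a flat fee
        lambda key, amount: amount > 100,
    )
)
```

Because everything is lazy until `to_topic` drains it, records move through the pipeline one at a time as they arrive, rather than in batches.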
But obviously when you do that, you need something that can connect all those pieces, and Kafka's a great thing to sit in there as a way to exchange state across all these things. So that's a massive use case for us and for Kafka specifically in terms of what we're seeing people do. >> You've said something in there at the end that I want to key off, which is, "To exchange state." So in the old world, we used a massive shared database to share state for a monolithic app or sometimes between monolithic apps. So what sort of state-of-the-art way that that's done now with microservices, if there's more than one, how does that work? >> Yeah, I mean so this is kind of rooted in the way we do stream processing. So there's this concept of topics, which effectively could align to individual microservices. And you're able to make sure that the most recent state of any particular one is stored in the central repository of Kafka. But then given that we take an API approach to stream processing, it's easy to embed those types of capabilities in any of the end-points. And so some of the activity can happen on that particular front, then it all gets synchronized down into the centralized hub. >> Okay, let me unpack that a little bit. Because you take an API approach, that means that if you're manipulating a topic, you're processing a microservice and that has state in it? Is that the right way to think about it? >> I think that's the easiest way to think about it, yeah. >> Okay. So where are we? Is this a 10 year migration, or is it a, some certain class of apps will lend themselves well to microservices, legacy apps will stay monolithic, and some new apps, some new Greenfield apps, will still be database-centric? How do you, or how should customers think about that mix? >> Yeah that's a great question. I don't know that I have the answer to it. The best gauge I can have is just the amount of interest and conversations that we have on this particular topic. 
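The "exchange state" pattern described here — each service publishing its latest state to a topic that other services can materialize — reduces, at its core, to folding over an append-only log so the newest value per key wins. A hedged sketch with illustrative names (in Kafka terms, this is roughly what reading a compacted topic gives you):

```python
# Sketch of materializing "latest state per key" from an append-only log
# of (key, value) events -- a later event for a key overwrites earlier
# ones, similar in spirit to a compacted Kafka topic.

def materialize_latest(log):
    state = {}
    for key, value in log:
        state[key] = value  # later events win
    return state

state_topic = [
    ("inventory", {"sku42": 10}),
    ("pricing", {"sku42": 9.99}),
    ("inventory", {"sku42": 7}),  # newer inventory snapshot replaces the old one
]
current = materialize_latest(state_topic)
```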
I will say that from one of the topics that we do engage with, it's easily one of the most popular that people are interested in. So if that's a data point, it's definitely a lot of interested people trying to figure out how to do this stuff very, very fast. >> How to do the microservices? >> Yeah and I think if you look at some of the more notable tech companies of late, they're architected this way from the start. And so everyone's kind of looking at the Netflix of the world, and the Ubers of the world saying, I want to be like those guys, how do I do that? And it's driving them down this path. So competitive pressure, I think, will help force people's hands. The more that your competitors are getting in front of you and are able to deliver a better customer experience through some sort of mobile app or something like that, then it's going to force people to have to make these changes quicker. But how long that takes it'll be interesting to see. >> Great! Great stuff. Switch gears just a little bit. Talk about maybe why you're using Databricks and what some of the key value you've gotten out of that. >> Yeah, so I wouldn't say that we're using Databricks per se, but we integrate directly with Spark. So if you look at a lot of the use cases that people use Spark for, they need to obviously get data to where it is. And some of the principles that I said before about Kafka generally, it's a very flexible, very dynamic mechanism for taking lots of sources of information, culling all that down into one centralized place and then distributing it to places such as Spark. So we see a lot of people using the technologies together to get the data from point A to point B, do some transformation as they so need, and then obviously do some amazing computing horsepower and whatnot in Spark itself. >> David: All right. >> I'm processing this, and it's tough because you can go in so many different directions, especially like the question about Spark. 
I guess, give us some of the scenarios where Spark would fit. Would it be like doing microservices that require more advanced analytics, and then they feed other topics, or feed consumers? And then where might you stick with a shared database that a couple services might communicate with, rather than maintaining the state within the microservice? >> I think, let me see if I can kind of unpack that myself a little bit. >> George: I know it was packed pretty hard. (laughing) >> Got a lot packed in there. When folks want to do things like, I guess when you think about it like an overall business process. If you think about something like an order to cash business process these days, it has a whole bunch of different systems that hang off it. It's got your order processing. You've got an inventory management. Maybe you've got some real-time pricing. You've got some shipments. Things, like that all just kind of hang off of the flow of data across there. Now with any given system that you use for addressing any answers to each of those problems could be vastly different. It could be Spark. It could be a relational database. It could be a whole bunch of different things. Where the centralization of data comes in for us is to be able to just kind of make sure that all those components can be communicating with each other based on the last thing that happened within each of them individually. And so their ability to embed transformation, data transformations and data processing in themselves and then publish back out any change that they had into the shared cluster subsequently makes that state available to everybody else. So that if necessary, they can react to it. So in a lot of ways, we're kind of agnostic to the type of processing that happens on the end-points. It's more just the free movement of all the data to all those things. 
And then if they have any relevant updates that need to make it back to any of the other components hanging on that process flow, they should have the ability to publish that back down it. >> And so one thing that Jay Kreps, Founder and CEO, talks about is that Kafka may ultimately, or in his language, will ultimately grow into something that rivals the relational database. Tell us what that world would look like. >> It would be controversial (laughing). >> George: That's okay. >> You want me to be the bad guy? So it's interesting because we did Kafka Summit about a month ago, and there's a lot of people, a lot of companies I should say, that are actually using and calling Kafka an enterprise data hub, a central hub for data, a data distribution network. And they are literally storing all sorts (raffle announcements beginning on loudspeaker) of different kinds of data. So one interesting example was the New York Times. So they used Kafka and literally stored every piece of content that has ever been generated at that publisher since the beginning of time in Kafka. So all the way back to 1851, they've obviously digitized everything. And it sits in there, and then they disposition that back out to various forms of the business. So that's -- >> They replay it, they pull it. They replay and pull, wow, okay. >> So that has some very interesting implications. So you can replay data. If you run some analytics on something and you didn't get the result that you wanted, and you wanted to redo it, it makes it really easy and really fast to be able to do that. If you want to bring on a new system that has some new functionality, you can do that really quickly because you have the full pedigree of everything that sits in there. And then imagine this world where you could actually start to ask questions on it directly. That's where it starts to get very, very profound, and it will be interesting to see where that goes.
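The replay capability described above — the New York Times keeping every event since 1851 and rebuilding views from it — comes down to re-running a derived computation from an arbitrary offset in the log. A minimal sketch, with illustrative names rather than a real Kafka client API:

```python
# Sketch of log replay: a new consumer starts at offset 0 and rebuilds its
# own derived view from the full history of events in the log.

archive = [
    {"year": 1851, "type": "article", "title": "First issue"},
    {"year": 1969, "type": "article", "title": "Moon landing"},
    {"year": 2017, "type": "correction", "title": "Minor fix"},
]

def replay(log, apply_event, from_offset=0):
    """Re-run a derived computation over the log, starting at from_offset."""
    view = {}
    for event in log[from_offset:]:
        apply_event(view, event)
    return view

def count_by_type(view, event):
    view[event["type"]] = view.get(event["type"], 0) + 1

counts = replay(archive, count_by_type)
```

Bringing on a new system then amounts to pointing a fresh consumer at offset 0, exactly as described in the interview.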
>> Two things then: First, it sounds like a database takes updates, so you don't have a perfect historical record. You have a snapshot of current values. Like whereas in a log, like Kafka, or log-structured data structure you have every event that ever happened. >> Clarke: Correct. >> Now, what's the impact on performance if you want to pull, you know -- >> Clarke: That much data? >> Yeah. >> Yeah, I mean so it all comes down to managing the environment on which you run it. So obviously the more data you're going to store in here, and the more type of things you're going to try to connect to it, you're going to have to take that into account. >> And you mentioned just a moment ago about directly asking about the data contained in the hub, in the data hub. >> Clarke: Correct. >> How would that work? >> Not quite sure today, to be honest with you. And I think this is where that question, I think, is a pretty provocative one. Like what does it mean to have this entire view of all granular event streams, not in some aggregated form over time? I think the key will be some mechanism to come onto an environment like this to make it more consumable for more business-type users. And that's probably one of the areas we'll want to watch to see how that's (background noise drowns out speaker). >> Okay, only one unanswered question. But you answered all the other ones really well. So we're going to wrap it up here. We're up against a hard break right now. I want to thank Clarke Patterson from Confluent for joining us. Thank you so much for being on the show. >> Clarke: Thank you for having me. >> Appreciate it so much. And thank you for watching theCUBE. We'll be back after the raffle in just a few minutes. We have one more guest. Stay with us, thank you. (techno music)

Published Date : Jun 8 2017


Day One Wrap - #SparkSummit - #theCUBE


 

>> Announcer: Live from San Francisco, it's the CUBE covering Spark Summit 2017, brought to you by Databricks. (energetic music plays) >> And what an exciting day we've had here at the CUBE. We've been at Spark Summit 2017, talking to partners, to customers, to founders, technologists, data scientists. It's been a load of information, right? >> Yeah, an overload of information. >> Well, George, you've been here in the studio with me talking with a lot of the guests. I'm going to ask you to maybe recap some of the top things you've heard today for our guests. >> Okay so, well, Databricks laid down, sort of, three themes that they wanted folks to take away. Deep learning, Structured Streaming, and serverless. Now, deep learning is not entirely new to Spark. But they've dramatically improved their support for it. I think, going beyond the frameworks that were written specifically for Spark, like Deeplearning4j and BigDL by Intel. And now TensorFlow, which is the open-source framework from Google, has gotten much better support. Structured Streaming, it was not clear how much more news we were going to get, because it's been talked about for 18 months. And they really, really surprised a lot of people, including me, where they took, essentially, the processing time for an event or a small batch of events down to 1 millisecond. Whereas, before, it was in the hundreds if not higher. And that changes the type of apps you can build. And also, the Databricks guys had coined the term continuous apps, which means they operate on a never-ending stream of data, which is different from what we've had in the past where it's batch or with a user interface, request-response. So they definitely turned up the volume on what they can do with continuous apps. And serverless, they'll talk about more tomorrow. And Jim, I think, is going to weigh in. But it, basically, greatly simplifies the ability to run this infrastructure, because you don't think of it as a cluster of resources.
You just know that it's sort of out there, and you ask requests of it, and it figures out how to fulfill it. I will say, the other big surprise for me was when we have Matei, who's the creator of Spark and the chief technologist at Databricks, come on the show and say, when we asked him about how Spark was going to deal with, essentially, more advanced storage of data so that you could update things, so that you could get queries back, so that you could do analytics, and not just of stuff that's stored in Spark but stuff that Spark stores essentially below it. And he said, "You know, Databricks, you can expect to see come out with or partner with a database to do these advanced scenarios." And I got the distinct impression, and after listening to the tape again, that he was talking about for Apache Spark, which is separate from Databricks, that they would do some sort of key-value store. So in other words, when you look at competitors or quasi-competitors like Confluent's Kafka or data Artisans' Flink, they don't, they're not perfect competitors. They overlap some. Now Spark is pushing its way more into overlapping with some of those solutions. >> Alright. Well, Jim Kobielus. And thank you for that, George. You've been mingling with the masses today. (laughs) And you've been here all day as well.
I feel the deep-learning side, announcement in terms of the deep-learning pipeline API very, very important. Now, as George indicated, Spark has been used in a fair number of deep-learning development environments. But not as a modeling tool so much as a training tool, a tool for In Memory distributed training of deep-learning models that we developed in TensorFlow, in Caffe, and other frameworks. Now this announcement is essentially bringing support for deep learning directly into the Spark modeling pipeline, the machine-learning modeling pipeline, being able to call out to deep learning, you know, TensorFlow and so forth, from within MLlib. That's very important. That means that Spark developers, of which there are many, far more than there are TensorFlow developers, will now have an easy path to bring more deep learning into their projects. That's critically important to democratize deep learning. I hope, and from what I've seen what Databricks has indicated, that they have support currently in API reaching out to both TensorFlow and Keras, that they have plans to bring in API support for access to other leading DL toolkits such as Caffe, Caffe 2, which is Facebook-developed, such as MXNet, which is Amazon-developed, and so forth. That's very encouraging. Structured Streaming is very important in terms of what they announced, which is an API to enable access to faster, or higher-throughput Structured Streaming in their cloud environment. And they also announced that they have gone beyond, in terms of the code that they've built, the micro-batch architecture of Structured Streaming, to enable it to evolve into a more true streaming environment to be able to contend credibly with the likes of Flink. 'Cause I think that the Spark community has, sort of, had their back against the wall with Structured Streaming that they couldn't fully provide a true sub-millisecond end-to-end latency environment heretofore.
But it sounds like with this R&D that Databricks is addressing that, and that's critically important for the Spark community to continue to evolve in terms of continuous computation. And then the serverless-apps announcement is also very important, 'cause I see it as really being, it's a fully-managed multi-tenant Spark-development environment, as an enabler for continuous Build, Deploy, and Testing DevOps within a Spark machine-learning and now deep-learning context. The Spark community as it evolves and matures needs robust DevOps tools to production-ize these machine-learning and deep-learning models. Because really, in many ways, many customers, many developers are now using, or developing, Spark applications that are real 24-by-7 enterprise application artifacts that need a robust DevOps environment. And I think that Databricks has indicated they know where this market needs to go and they're pushing it with R&D. And I'm encouraged by all those signs. >> So, great. Well thank you, Jim. I hope both you gentlemen are looking forward to tomorrow. I certainly am. >> Oh yeah. >> And to you out there, tune in again around 10:00 a.m. Pacific Time. We're going to be broadcasting live here. From Spark Summit 2017, I'm David Goad with Jim and George, saying goodbye for now. And we'll see you in the morning. (sparse percussion music playing) (wind humming and waves crashing).

Published Date : Jun 7 2017


Chinmay Soman | Flink Forward 2017


 

>> Welcome back, everyone. We are on the ground at the data Artisans user conference for Flink. It's called Flink Forward. We are at the Kabuki Hotel in lower Pacific Heights in San Francisco. The conference kicked off this morning with some great talks by Uber and Netflix. We have the privilege of having with us Chinmay Soman from Uber. >> Yes. >> Welcome, Chinmay, it's good to have you. >> Thank you. >> You gave a really, really interesting presentation about the pipelines you're building and where Flink fits, but you've also said there's a large deployment of Spark. Help us understand how Flink became a mainstream technology for you, where it fits, and why you chose it. >> Sure. About one year back, when we were starting to evaluate what technology makes sense for the problem space that we are trying to solve, which is neural dynamics. We observed that Spark's stream processing is actually more resource intensive than some of the other technologies we benchmarked. More specifically, it was using more memory and CPU, at that time. That's one... I actually came from the Apache Samza world. I was on the Samza team at LinkedIn before I came to Uber. We had in-house expertise on Samza and I think the reliability was the key motivation for choosing Samza. So we started building on top of Apache Samza for almost the last one and a half years. But then, we hit the scale where Samza, we felt, was lacking. So with Samza, it's actually tied into Kafka a lot. You need to make sure your Kafka scales in order for the stream processing to scale. >> In other words, the topics and the partitions of those topics, you have to keep the physical layout of those in mind at the message queue level, in line with the stream processing. >> That's right. The parallelism is actually tied into a number of partitions in Kafka.
Furthermore, if you have a multi-stage pipeline, where one stage processes data and sends output to another stage, all these intermediate stages, today, again go back to Kafka. So if you want to do a lot of these use cases, you actually end up creating a lot of Kafka topics and the I/O overhead on a cluster shoots up exponentially. >> So when creating topics, or creating consumers that do something and then output to producers, if you do too many of those things, you defeat the purpose of low-latency because you're storing everything. >> Yeah. The benefit of it is, it is more robust because if you suddenly get a spike in your traffic, your system is going to handle it because Kafka buffers that spike. It gives you a very reliable platform, but it's not cheap. So that's why we're looking at Flink. In Flink, you can actually build a multi-stage pipeline and have in-memory queues instead of writing back to Kafka, so it is fast and you don't have to create multiple topics per pipeline. >> So, let me unpack that just a little bit to be clearer. The in-memory queues give you, obviously, better I/O. >> Yes. >> And if I understand correctly, that can absorb some of the backpressure? >> Yeah, so backpressure is interesting. If you have everything in Kafka and no in-memory queues, there is no backpressure because Kafka is a big buffer, it just keeps running. With in-memory queues, there is backpressure. Another question is, how do you handle this? So going back to Samza systems, they actually degrade and can't recover once they are in backpressure. But Flink, as you've seen, it slows down consuming from Kafka, but once the spike is over, once you're over that hill, it actually recovers quickly. It is able to sustain heavy spikes. >> Okay, so this goes to your issues with keeping up with the growth of data... >> That's right.
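The topic-per-stage overhead Chinmay describes can be made concrete with back-of-the-envelope arithmetic. This is a purely illustrative toy model, not a benchmark: when every intermediate stage writes back to Kafka, broker I/O grows with the number of stages, while in-memory queues between stages only touch the broker at the edges of the pipeline.

```python
# Toy model: count how many records hit the Kafka brokers for an n-stage
# pipeline, depending on whether intermediate results go back through
# Kafka (the Samza-style layout described above) or through in-memory
# queues between operators (the Flink-style layout).

def broker_writes(events, stages, intermediate_via_kafka):
    if intermediate_via_kafka:
        # source topic plus one intermediate/output topic after each stage
        return events * (1 + stages)
    # only the source topic and the final sink topic
    return events * 2

kafka_per_stage = broker_writes(events=1_000_000, stages=4,
                                intermediate_via_kafka=True)
in_memory_queues = broker_writes(events=1_000_000, stages=4,
                                 intermediate_via_kafka=False)
```

For a four-stage pipeline over a million events, that is five million broker writes versus two million — the "shoots up" effect described in the conversation.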
Tell us about that end and the desire to get as many jobs as possible out of a certain level of resource. >> So, today, we are a platform where people come in and say, "Here's my code." Or, "Here's my SQL that I want to run on your platform." In the old days, they were telling us, "Oh, I need 10 gigabytes for a container," and this they need these many CPUs and that really limited how many use cases we onboarded and made our hardware footprint pretty expensive. So we need the pipeline, the infrastructure, to be really memory efficient. What we have seen is memory is the bottle link in our world, more so than CPU. A lot of applications, they consume from Kafka, they actually buffer locally in each container and then they do that in the local memory, in the JVM memory. So we need the memory component to be very efficient and we can pack more jobs on the same cluster if everyone is using lesser memory. That's one motivation. The other thing, for example, that Flink does and Samza also does, is make use of a RocksDB store, which is a local persistent-- >> Oh, that's where it gets the state management. >> That's right, so you can offload from memory on to the disk-- >> Into a proper database. >> Into a proper database and you don't have to cross a network to do that because it's sitting locally. >> Just to elaborate on what might be, what might seem like, a arcane topic, if it's residing locally, than anything it's going to join with has to also be residing locally. >> Yeah, that's a good point. You have to be able to partition your inputs and your state in the same way, otherwise there's no locality. >> Okay, and you'd have to shuffle stuff around the network. >> And more than that, you'd need to be able to recover if something happens because there's no replication for this state. If the hard disk on that DR node crashes, you need to recreate that cache from somewhere. So either you go back and read from Kafka, or you store that cache somewhere. 
So Flink actually supports this out of the box and it snapshots the RocksDB state into HDFS. >> Got it, okay. It's more resilient-- >> Yes. >> And more resource efficient. So, let me ask one last question. Mainstream enterprises, they, or at least the very largest ones, have been trying to wrestle their arms around some opensource projects. Very innovative, the pace of innovation is huge, but it demands a skillset that seems to be most resident in large consumer internet companies. What advice do you have for them where they aspire to use the same technologies that you're talking about to build new systems, but they might not have the skills? >> Right, that's a very good question. I'll try to answer in the way that I can. I think the first thing to do is understand your scale. Even if you're a big, large banking corporation, you need to understand where you fit in the industry ecosystem. If it turns out that your scale isn't that big and you're using it for internal analytics, then you can just pick the off-the-shelf pipelines and make it work. For example, if you don't care about multi-tenancy, if your hardware spend is not that much, actually anything might actually work. The real challenge is when you pick a technology and make it work for large use cases and you want to optimize for cost. That's where you need a huge engineering organization. So in simpler words, if your use cases extent is not that big, pick something which has a lot of support from the community. Most more common things just work out-of-the-box, and that's good enough. But if you're doing a lot of complicated things, like real-time machine learning, or your scale is in billions of messages per day, or terabytes of data per day, then you really need to make a choice: Whether you invest in an engineering organization that can really understand these use cases; or you go to companies like Databricks. Get support from Databricks, or... >> Or maybe a cloud vendor?
>> Or a cloud vendor, or things like Confluent which is giving Kafka support, things like that. I don't think there is one answer. To me, our use case, for example, the reason we chose to build an engineering organization around that is because our use cases are immensely complicated and not really seen before, so we had to invest in this technology. >> Alright, Chinmay, we're going to leave it on that and hopefully keep the dialogue going-- >> Sure. >> offline. So, we'll be back shortly. We're at Flink Forward, the data Artisans user conference for Flink. We're on the ground at the Kabuki Hotel in downtown San Francisco and we'll be right back.
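The recovery scheme Chinmay describes — local RocksDB state, periodically snapshotted to HDFS, then restored and caught up by replaying the log after a crash — can be sketched at a high level. All names and the in-memory "log" here are illustrative; Flink's actual checkpointing is considerably more sophisticated (asynchronous snapshots aligned with barriers):

```python
# Sketch of checkpoint-and-replay recovery: an operator folds events into
# local state, snapshots that state every N events (standing in for a
# RocksDB snapshot shipped to HDFS), and after a crash restores the last
# snapshot and replays only the tail of the log.
import copy

def run_with_checkpoints(log, checkpoint_every):
    state, snapshots = {}, []
    for offset, (key, delta) in enumerate(log):
        state[key] = state.get(key, 0) + delta
        if (offset + 1) % checkpoint_every == 0:
            snapshots.append((offset + 1, copy.deepcopy(state)))  # durable snapshot
    return state, snapshots

def recover(log, snapshots):
    """Restore the newest snapshot, then replay the remaining events."""
    offset, snapshot = snapshots[-1]
    state = copy.deepcopy(snapshot)
    for key, delta in log[offset:]:
        state[key] = state.get(key, 0) + delta
    return state

events = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]
live_state, snaps = run_with_checkpoints(events, checkpoint_every=2)
recovered_state = recover(events, snaps)
```

The point of the snapshot is exactly what the conversation raises: without it, the only way to rebuild the cache is to re-read the entire history from Kafka.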

Published Date : Apr 14 2017


Darren Chinen, Malwarebytes - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Announcer: Live from San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017. >> Hey, welcome back everybody. Jeff Frick here with The Cube. We are at Big Data SV in San Jose at the Historic Pagoda Lounge, part of Big Data week which is associated with Strata + Hadoop. We've been coming here for eight years and we're excited to be back. The innovation and dynamicism of big data and evolutions now with machine learning and artificial intelligence, just continues to roll, and we're really excited to be here talking about one of the nasty aspects of this world, unfortunately, malware. So we're excited to have Darren Chinen. He's the senior director of data science and engineering from Malwarebytes. Darren, welcome. >> Darren: Thank you. >> So for folks that aren't familiar with the company, give us just a little bit of background on Malwarebytes. >> So Malwarebytes is basically a next-generation anti-virus software. We started off as humble roots with our founder at 14 years old getting infected with a piece of malware, and he reached out into the community and, at 14 years old, wrote his first, with the help of some people, wrote his first lines of code to remediate a couple of pieces of malware. It grew from there and I think by the ripe old age of 18, founded the company. And he's now I want to say 26 or 27 and we're doing quite well. >> It was interesting, before we went live you were talking about his philosophy and how important that is to the company and now has turned into really a strategic asset, that no one should have to suffer from malware, and he decided to really offer a solution for free to help people rid themselves of this bad software. >> Darren: That's right. 
Yeah, so Malwarebytes was founded on the principle, Marcin believes, that everyone has the right to a malware-free existence, and so we've always offered a free version of Malwarebytes that will help you remediate if your machine does get infected with a piece of malware. And that's actually still going to this day. >> And that's now given you the ability to have a significant amount of endpoint data, transactional data, trend data, that now you can bake back into the solution. >> Darren: That's right. It's turned into a strategic advantage for the company; it's not something I think we could have planned when he was doing this at 18 years old. But we've instrumented it so that we can get some anonymous-level telemetry and we can understand how malware proliferates. For many, many years we've been positioned as a second-opinion scanner, and so we're able to see a lot of things, some trends happening in there, and we can actually now see that in real time. >> So, starting out as a second-opinion scanner, you're basically finding what others have missed. What do you have to do to become the first line of defense? >> Well, with our new product, Malwarebytes 3.0, I think some of that landscape is changing. We have a very complete and layered offering. I'm not the product manager, so, as the data science guy, I don't know that I'm qualified to give you the ins and outs, but I think some of that is changing as we've combined a lot of products and we have a much more complete sweep of layered protection built into the product. >> And so, maybe tell us, without giving away all the secret sauce, what sort of platform technologies did you use that enabled you to scale to these hundreds of millions of endpoints, and then to be fast enough at identifying things that were trending bad that you had to prioritize?
>> Right, so traditionally, I think AV companies have these honeypots, right, where they go and collect a piece of virus or a piece of malware, and they'll take the MD5 hash of that and then they'll basically insert that into a definitions database. And that's a very exact way to do it. The problem is that there's so much malware out there in the wild, it's impossible to get all of it. I think one of the things that we did was we set up telemetry, and we have a phenomenal research team that's able to catch entire families of malware, and that's really the secret sauce to Malwarebytes. There's several other levels, but that's where we're helping out in the immediate term. What we do internally, we sort of jokingly call it a Lambda Two architecture. We had considered Lambda long ago, and I say long ago, about a year ago, when we first started this journey. But Lambda is riddled with, as you know, a number of issues. If you've ever talked to Jay Kreps from Confluent, he has a lot of opinions on that, right? One of the key problems is that if you do a traditional Lambda, you have to implement your code in two places, it's very difficult, things get out of sync, you have to have replay frameworks. These are some of the challenges with Lambda. So we do processing in a number of areas. The first thing we did was implement Kafka to handle all of the streaming data. We use Kafka Streams to do inline stateless transformations, and then we also use Kafka Connect. We write all of our data into HBase, which we may swap out later for something like Redis, and that serves as a thin speed layer. And then we also move the data into S3, and we use some ephemeral clusters to do very large-scale batch processing, and that really provides our data lake.
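As a rough illustration of the two ideas Darren describes, an exact-match MD5 definitions database and the stateless, per-record transforms the team runs inline with Kafka Streams, here is a minimal Python sketch. The field names and the one-entry definitions set are invented for the example and are not Malwarebytes' actual schema:

```python
import hashlib
import json

def transform_event(raw: bytes) -> dict:
    """Stateless, per-record transform of the kind run inline on the
    stream: parse, normalize, and fingerprint one telemetry event.
    All field names here are hypothetical."""
    event = json.loads(raw)
    return {
        "machine_id": event["machine_id"].lower(),
        "family": event.get("family", "unknown"),
        # MD5 fingerprint of the sample bytes, as in a classic
        # AV definitions database.
        "md5": hashlib.md5(bytes.fromhex(event["sample_hex"])).hexdigest(),
    }

# A toy "definitions database": exact-match lookup on the hash.
DEFINITIONS = {"5d41402abc4b2a76b9719d911017c592"}  # md5(b"hello")

def is_known_malware(record: dict) -> bool:
    return record["md5"] in DEFINITIONS
```

In a real deployment a transform like this would run inside a Kafka Streams topology (or a consumer loop) rather than on single records, but the per-record logic has the same shape, and its exactness is also its weakness: any sample not already hashed into the definitions set is missed, which is why the research team's family-level detection matters.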
>> When you call that Lambda Two, is that because you're still working essentially on two different infrastructures, so your code isn't quite the same? You still have to check the results on either fork. >> That's right, yeah. We did evaluate doing everything in the stream, but there are certain operations that are difficult to do with pure stream processing, and so we did need a thin speed layer, what we call real-time indicators, to supplement what we were doing in the stream. And that's the differentiating factor from a traditional Lambda architecture, where you'd want everything in the stream and everything in batch, and the batch is really more of a truing mechanism. Our real time is really directional: in traditional business intelligence you'd have KPIs that allow you to gauge the health of your business; we have RTIs, Real Time Indicators, that allow us to gauge, directionally, what is important to look at this day, this hour, this minute. >> This thing is burning up the charts, >> Exactly. >> Therefore it's priority one. >> That's right, you got it. >> Okay. And maybe tell us a little more, because everyone I'm sure is familiar with Kafka, but the streams product from them is a little newer, as is Kafka Connect. So it sounds like it's not just the transport: you've got some basic analytics, and you've got the ability to do the ETL because you've got Connect, with sources and sinks. Tell us how you've used that. >> Well, the streams product is quite different from something like Spark Streaming. It's not working off micro-batching, it's actually working off the stream. And the second thing is, it's not a separate cluster. It's just a library, effectively a .jar file, right?
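The RTI idea, a directional "what's burning up the charts this hour" signal rather than an exact KPI, can be sketched as a simple window-over-window comparison. This is a toy illustration, not Malwarebytes' implementation:

```python
from collections import Counter

def top_movers(current_window, previous_window, n=3):
    """Rank malware families by growth between two time windows --
    a directional 'Real Time Indicator' rather than a trued-up KPI.
    Inputs are iterables of family names observed in each window."""
    cur, prev = Counter(current_window), Counter(previous_window)
    # Growth = detections this window minus detections last window.
    growth = {fam: cur[fam] - prev.get(fam, 0) for fam in cur}
    return sorted(growth, key=growth.get, reverse=True)[:n]
```

The design choice mirrors what Darren describes: the streaming side only needs to answer "which families are accelerating right now," and the batch layer can later true up the exact counts.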
And so because it works natively with Kafka, it handles certain things quite well. It handles back pressure, and when you expand the cluster it's pretty good with things like that. We've found it to be a fairly stable technology. It's just a library, and we've worked very closely with Confluent to develop that. Whereas Kafka Connect is really something that we use to write out to S3. In fact, Confluent just released a new S3 connector that writes directly. We were using StreamX, which was a wrapper on top of an HDFS connector, and they rigged that up to write to S3 for us. >> So tell us, as you look out, what sorts of technologies do you see as enabling you to build a platform that's richer, and then how would that show up in the functionality consumers like we would see? >> Darren: With respect to the architecture? >> Yeah. >> Well, one of the things we had to do is evaluate where we wanted to spend our time. We're a very small team; the entire data science and engineering team is, I think, less than 10 months old. So all of us got hired, we've started this platform, we've gone very, very fast. And we had to decide: we've made this big investment, how are we going to get value to our end customer quickly, so that they're not waiting around and you get the traditional big-data story of, we've spent all this money and now we're not getting anything out of it. And so we had to make some of those strategic decisions, and because the data is really, truly big data in nature, there's just a huge amount of work that has to be done in these open-source technologies. They're not baked; it's not like going out to Oracle, giving them a purchase order, and you install it and away you go. There's a tremendous amount of work, and so we've made some strategic decisions on what we're going to do in open source and what we're going to do with a third-party vendor solution.
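The S3 sink connector Darren mentions is configured as JSON posted to the Kafka Connect REST API. A minimal sketch along these lines, where the bucket, topic, and connector name are placeholders and the exact keys can vary by connector version:

```json
{
  "name": "telemetry-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "2",
    "topics": "telemetry-events",
    "s3.bucket.name": "example-data-lake",
    "s3.region": "us-west-2",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "10000"
  }
}
```

Because Connect runs this as managed sink tasks, the team gets the Kafka-to-S3 leg of the pipeline as configuration rather than custom code, which is what made dropping the StreamX wrapper attractive.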
And one of those solutions that we decided on was workload automation. I just did a talk on how Control-M from BMC was really the tool that we chose to handle a lot of the coordination, the sophisticated coordination, and the workload automation on the batch side, and we're about to implement that in a data-quality monitoring framework. That's turned out to be an incredibly stable solution for us. It's allowed us to not spend time on open-source solutions that do the same things, like Airflow, which may or may not work well but has really no support around it, and to focus our efforts on what we believe to be the really, really hard problems to tackle in Kafka, Kafka Streams, Connect, et cetera. >> Is it fair to say that Kafka plus Kafka Connect solves many of the old ETL problems, or do you still need some sort of orchestration tool on top of it to completely commoditize, essentially, moving and transforming data from an OLTP or operational system to a decision support system? >> I guess the answer is, it depends on your use case. I think there's a lot of things that Kafka and the Streams job can solve for you, but I don't think we're at the point where everything can be streaming. I think that's a ways off. There are legacy systems that don't natively stream to you anyway, and there are just certain operations that are more efficient to do in batch. That's why batch for us isn't going away any time soon, and that's one of the reasons why workload automation in the batch layer was so important initially, and we've decided to extend that, actually, into building out a data-quality monitoring framework to put a collar around how accurate our data is on the real-time side. >> Cuz it's really horses for courses. It's not one or the other; it's application-specific what the best solution for that particular case is.
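A data-quality check of the kind a batch scheduler such as Control-M could invoke after each load might be as simple as a volume gate. This is a hypothetical sketch, with the tolerance and the check itself invented for illustration:

```python
def volume_check(todays_count, history, tolerance=0.25):
    """Toy data-quality gate: flag the run if today's row count
    deviates from the trailing average by more than `tolerance`
    (a fraction). A scheduler would fail or alert on False."""
    if not history:
        return True  # nothing to compare against yet
    baseline = sum(history) / len(history)
    return abs(todays_count - baseline) <= tolerance * baseline
```

The point is less the arithmetic than the placement: running checks like this as orchestrated jobs after each batch load is what lets a small team "put a collar around" data accuracy without building a monitoring platform from scratch.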
>> Yeah, if there were a one-size-fits-all, it'd be a company, and there would be no need for architects. So I think you have to look at your use case, your company, what kind of data, what style of data, what type of analysis you need. Do you really actually need the data in real time, and if you do put in all the work to get it in real time, are you going to be able to take action on it? And I think Malwarebytes was a great candidate. When I came in, I said, "Well, it does look like we can justify "the need for real-time data, and the effort "that goes into building out a real-time framework." >> Jeff: Right, right. And we always say, what is real time? In time to do something about it, (all chuckle) and if there's not time to do something about it, depending on how you define real time, really what difference does it make if you can't do anything about it that fast. So as you look out in the future with IoT, all these connected devices, this is a hugely increased attack surface, as we just read about a few weeks back. How does that work into your planning? What do you guys think about the future, where there are so many more connected devices out on the edge, with various degrees of intelligence, and opportunities to hijack, if you will? >> Yeah, I don't think I'm qualified to speak about the Malwarebytes product roadmap as far as IoT goes. >> But more philosophically, from a professional point of view, cuz every coin has two sides: there's a lot of good stuff coming from IoT and connected devices, but as we keep hearing over and over, just this massive attack surface expansion. >> Well, I think for us the key is we're small, and we're not operating, like, I came from Apple, where we operated on a budget of infinity, so we're not-- >> You have to address infinity (Darren laughs) with an actual budget. >> We're small, and we have to make sure that whatever we do creates value.
And so what I'm seeing in the future is, as we get more into the IoT space and logs begin to proliferate and data just grows exponentially in size, it's really, how do we do the same thing, and how are we going to manage that in terms of cost? Generally, big data is very low in information density. It's not like transactional systems, where you get the data, it's effectively an Excel spreadsheet, and you can go run some pivot tables and filters and away you go. Big data in general requires a tremendous amount of massaging to get to the point where a data scientist or an analyst can actually extract some insight and some value. And the question is, how do you massage that data in a way that's going to be cost-effective as IoT expands and proliferates? That's the question we're dealing with. We're at this point all in with cloud technologies; we're leveraging quite a few Amazon services, serverless technologies as well. We're just in the process of moving to Athena as an on-demand query service, and we use a lot of ephemeral clusters as well, which allows us to run all of our ETL in about two hours. So these are some of the things we're doing to prepare for this explosion of data, and making sure that we're in a position where we're not spending a dollar to gain a penny, if that makes sense. >> That's his business. Well, he makes fun of that business model. >> I think you could do it, you want to drive revenue, sell dollars for 90 cents. >> That's the dot-com model, I was there. >> Exactly, and make it up in volume. All right, Darren Chinen, thanks for taking a few minutes out of your day and giving us the story on Malwarebytes. Sounds pretty exciting and a great opportunity. >> Thanks, I enjoyed it. >> Absolutely. He's Darren, he's George, I'm Jeff, you're watching The Cube. We're at Big Data SV at the Historic Pagoda Lounge. Thanks for watching, we'll be right back after this short break. (upbeat techno music)
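The cost discipline Darren describes maps directly onto how Athena bills: by data scanned per query, which is why partitioning and columnar formats matter so much. A small estimator, assuming the published $5-per-TB-scanned list price (verify current pricing for your region):

```python
def athena_query_cost(bytes_scanned, usd_per_tb=5.0):
    """Estimate the cost of one on-demand Athena query, which is
    billed by bytes scanned. The $5/TB default is the long-standing
    list price and is an assumption here, not a quote."""
    tb = bytes_scanned / 1024 ** 4  # bytes -> tebibytes
    return round(tb * usd_per_tb, 6)
```

For example, a query that scans a full 1 TB costs about $5, while the same query against a well-partitioned, columnar copy of the data that touches a tenth of that costs a tenth as much, which is the "not spending a dollar to gain a penny" math in miniature.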

Published Date : Mar 15 2017

