Data Power Panel V3

(upbeat music) >> The stampede to cloud and massive VC investments has led to the emergence of a new generation of object store based data lakes. And with them two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses aka the lakehouse is emerged as a leading contender to be the data platform of the future. And this novelty touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The other major trend we've seen is query engines and broader data fabric virtualization platforms have embraced NextGen data lakes as platforms for SQL centric business intelligence workloads, reducing, or somebody even claim eliminating the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complimentary technologies to bridge the gaps with lakehouses. And the third is many, if not most customers that are embracing the so-called data fabric or data mesh architectures. They're looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in lakehouse, but at the same time, they don't want to, or can't abandon their data warehouse estate. As such we see a battle royale is brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud center analytical data platform? Well, we're going to find out. My name is Dave Vellante and welcome to the data platform's power panel on theCUBE. Our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options, and the trade offs of various approaches and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, Tony Baers, principal at dbInsight. And Doug Henschen is the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thank guys. Thank you. >> Thank you. >> So it's early June and we're gearing up with two major conferences, there's several database conferences, but two in particular that were very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug let's start off with you and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug? The notion of lakehouse. And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well you nailed it in your intro. One platform to address BI data science, data engineering, fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed in that last decade, by the middle of that last decade, there were several SQL on Hadoop products, open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and the data science. So they were adding SQL extensions, the likes Hudi and Vertical we're adding SQL extensions to support the data science. But then later in that decade with the shift to cloud and object storage, you saw the vendor shift to this whole cloud, and object storage idea. So you have in the database camp Snowflake introduce Snowpark to try to address the data science needs. They introduced that in 2020 and last year they announced support for Python. You also had Oracle, SAP jumped on this lakehouse idea last year, supporting both the lake and warehouse single vendor, not necessarily quite single platform. Google very recently also jumped on the bandwagon. And then you also mentioned, the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things, a fabric for distributed access to many data sources, but also very firmly planning that idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp with the Databricks and Clouderas providing a warehouse style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss those of you who me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television in terms of that it has been an evolution just like Doug said. I mean, for instance, just to give an example when Teradata bought AfterData was initially seen as a hardware platform play. In the end, it was basically, it was all those after functions that made a lot of sort of big data analytics accessible to SQL. (clears throat) And so what I really see just in a more simpler definition or functional definition, the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory to the SQL folks, and also to get into friendly territory, to all the data stewards, who are basically concerned about the sprawl and the lack of control in governance in the data lake. So it's really kind of a continuing of an ongoing trend that being said, there's no action without counter action. And of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to edit things like in database machine learning. So they're certainly not surrendering without a fight. Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years that we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage or should say the HDFS and then with the cloud then going cloud native and therefore trying to break the silos down even further. >> Now, thank you. And Sanjeev, data lakes, when we first heard about them, there were such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say, these are excellent points that Doug and Tony have brought to light. The concept of lakehouse was going on to your point, Dave, a long time ago, long before the tone was invented. For example, in Uber, Uber was trying to do a mix of Hadoop and Vertical because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call lakehouse. Data lakes, excellent at batch processing large volumes of data, but they don't have the real time capabilities such as change data capture, doing inserts and updates. So this is why lakehouse has become so important because they give us these transactional capabilities. >> Great. So I'm interested, the name is great, lakehouse. The concept is powerful, but I get concerned that it's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouse that each had to extend themselves. To believe the Databricks hype it's that this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part with data warehouse folks, in terms of that they've had to go beyond SQL, In the case of Databricks. There have been a number of incremental improvements to Delta lake, to basically make the table format more performative, for instance. But the other thing, I think the most dramatic change in all that is in their SQL engine and they had to essentially pretty much abandon Spark SQL because it really, in off itself Spark SQL is essentially stop gap solution. And if they wanted to really address that crowd, they had to totally reinvent SQL or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark, it's basically SQL that it's adapted to run in a Spark environment, but the underlying engine is C++, it's not scale or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature because these are all basically kind of, even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this because if you think through cloud native architecture where you're essentially abstracting compute from data, there is no reason why, if let's say you are dealing with say, the same basically data targets say cloud storage, cloud object storage that you might not apportion the task to different compute engines. And so therefore you could have, for instance, let's say you're Google, you could have BigQuery, perform basically the types of the analytics, the SQL analytics that would be associated with the data warehouse and you could have BigQuery ML that does some in database machine learning, but at the same time for another part of the query, which might involve, let's say some deep learning, just for example, you might go out to let's say the serverless spark service or the data proc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example, if you could generalize that with all the other cloud or all the other third party vendors. So I think we're still very early in the game in terms of maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but completely agree. It's not mature yet. Lakehouses have still a lot of work to do, so what I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data. They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse, and so on because they want the platform to handle all the data modeling, access control, performance enhancements, but these are trade off. If you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want the independence. In other words, they don't want vendor lock in. They want to transform their data into any number of use cases, especially data science, machine learning use case. What they want is agility via open file formats using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses they provide you an excellent user experience. That is the main reason why Snowflake took off. If you have thousands of cables, it takes minutes to get them started, uploaded into your warehouse and start experimentation. Table formats are far more resonating with the community than file formats. But once the cost goes up of cloud data warehouse, then the organization start exploring lakehouses. But the problem is lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today Apache Hive is still very strong, but it's all technical metadata and it has so many different restrictions. That's why we see Databricks is investing into something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem. I just want to mention, and that is lack of standards. All these open source vendors, they're running, what I call ego projects. You see on LinkedIn, they're constantly battling with each other, but end user doesn't care. End user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, DaaS, Frink, Athena. But the problem is that we don't have common standards. >> Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space, we've debated for years, best of breed versus the full suite. You see AWS with whatever, 12 different plus data stores and different APIs and primitives. You got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there's proof points there, but Snowflake really good at data warehouse, simplifying data warehouse. Databricks, really good at making lakehouses actually more functional. Can one platform do it all? >> Well in a word, I can't be best at breed at all things. I think the upshot of and cogen analysis from Sanjeev there, the database, the vendors coming out of the database tradition, they excel at the SQL. They're extending it into data science, but when it comes to unstructured data, data science, ML AI often a compromise, the data lake crowd, the Databricks and such. They've struggled to completely displace the data warehouse when it really gets to the tough SLAs, they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads and maybe and some of these SQL engines, good for ad hoc, minimize data movement. But really when you get to the deep service level, a requirement, the high concurrency, the high query workloads, you end up creating something that's warehouse like. >> Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You got some things in Apache projects like Hudi and Iceberg, where do they fit Sanjeev? Do you have any thoughts on that? >> So thank you, Dave. So I feel that table formats are starting to mature. There is a lot of work that's being done. We will not have a single product or single platform. We'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating. Their focus is on a table format, but then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple rights going on? Do you version your Parquet files or how do you do your upcerts basically? So different focus, at the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug is Iceberg in your view, something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta lake, Hudi, Iceberg, they all address this need for consistency and scalability, Delta lake open technically, but open for access. I don't hear about Delta lakes in any worlds, but Databricks, hearing a lot of buzz about Apache Iceberg. End users want an open performance standard. And most recently Google embraced Iceberg for its recent a big lake, their stab at having supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the sort of big data movement you had MapR was the most closed. You had Horton works the most open. You had Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, I think, and Doug very much was kind of referring to this. I would call it kind of like the MongoDB syndrome, which is that you have... and I'm talking about MongoDB before they changed their license, open source project, but very much associated with MongoDB, which basically, pretty much controlled most of the contributions made decisions. And I think Databricks has the same iron cloud hold on Delta lake, but still the market is pretty much associated Delta lake as the Databricks, open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see that's breaking down to is essentially, basically the Databricks open source versus the everything else open source, the community open source. So I see it's a very similar type of breakdown that I see repeating itself here. >> So by the way, Mongo has a conference next week, another data platform is kind of not really relevant to this discussion totally. But in the sense it is because there's a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed, obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that we saw come out of the Snowflake earnings call, where the ever core analyst asked Frank Klutman about discretionary spend. And Frank basically said, look, we're not discretionary. We are deeply operationalized. Whereas he kind of poo-pooed the lakehouse or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them. That's really not our business. Do any of you have comments on that? Help us swing through that controversy. Who wants to take that one? >> Let's put it this way. The SQL folks are from Venus and the data scientists are from Mars. So it means it really comes down to it, sort that type of perception. The fact is, is that, traditionally with analytics, it was very SQL oriented and that basically the quants were kind of off in their corner, where they're using SaaS or where they're using Teradata. It's really a great leveler today, which is that, I mean basic Python it's become arguably one of the most popular programming languages, depending on what month you're looking at, at the title index. And of course, obviously SQL is, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. And so basically I see this breaking down to essentially, you're going to have each group that's going to have its own natural preferences for its home turf. And the fact that basically, let's say the Python and scale of folks are using Databricks does not make them any less operational or machine critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent with all of Snowpark, all of the things outside of SQL, they're very much relying on partners too and make things possible and make data science possible. And it's very early days. I think the bottom line, what we're going to see is each of these camps is going to keep working on doing better at the thing that they don't do today, or they're new to, but they're not going to nail it. They're not going to be best of breed on both sides. So the SQL centric companies and shops are going to do more data science on their database centric platform. That data science driven companies might be doing more BI on their leagues with those vendors and the companies that have highly distributed data, they're going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, but I'll ask you Sanjeev. 'Cause Snowflake and Databricks are such great examples 'cause you have the data engineering crowd trying to go into data warehousing and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion in the balance sheet and I've asked you before, I ask you again, doesn't there has to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy ad scale or a data mirror? Or is that just sort of a bandaid? What are your thoughts on that Sanjeev? >> I think semantic layer is the metadata. The business metadata is extremely important. At the end of the day, the business folks, they'd rather go to the business metadata than have to figure out, for example, like let's say, I want to update somebody's email address and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer, it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool and make them fungible. So more disaggregation of the stack, but it gives us more best of breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will. And we always talk about injecting machine intelligence, AI into applications, making them more data driven. But when you look at the application development stack, it's separate, the database is tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> So organizationally even technically I think it is starting to happen. Microservices architecture was a first attempt to bring the application and the data world together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self heal and it'll bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. So that's why they have traditionally been two different stacks. They are starting to come together, especially with data ops, for instance, versioning of the way we write business logic. It used to be, a business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CICD the whole DevOps tool chain. So data is catching up to the way applications are. >> We also have databases, that trans analytical databases that's a little bit of what the story is with MongoDB next week with adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not the warehouse level workloads. So we're making progress, but I think there's always going to be, or there will long be a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, do take your best of breed to Doug's earlier point. You can't be best of breed at everything, wouldn't data mesh advocate, data lakes do your data lake thing, data warehouse, do your data lake, then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. I think for me, the fear there is more in terms of, the silos of governance that could happen and the silo views of the world, how we redefine. And that's why and I want to go back to something what Sanjeev said, which is that it's going to be raising the importance of the semantic layer. Now does Snowflake that opens a couple of Pandora's boxes here, which is one, does Snowflake dare go into that space or do they risk basically alienating basically their partner ecosystem, which is a key part of their whole appeal, which is best of breed. They're kind of the same situation that Informatica was where in the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea, need to redouble down on their core, which was data integration. The other thing though, that raises the importance of and this is where the best of breed comes in, is the data fabric. My contention is that and whether you use employee data mesh practice or not, if you do employee data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh. But data fabric at its core and admittedly it's a category that's still very poorly defined and evolving, but at its core, we're talking about a common meta data back plane, something that we used to talk about with master data management, this would be something that would be more what I would say basically, mutable, that would be more evolving, basically using, let's say, machine learning to kind of, so that we don't have to predefine rules or predefine what the world looks like. But so I think in the long run, what this really means is that whichever way we implement on whichever physical platform we implement, we need to all be speaking the same metadata language. And I think at the end of the day, regardless of whether it's a lake, warehouse or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? That those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort in that, and you may or may not be familiar with it, but a lot of folks have talked about that. Now it's got nowhere in the market, that no market share, but a lot of we've seen these benchmarks from Oracle. How real is that bringing together those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it and I'm not going to mock him. >> I wonder if that is something, how real that is or if it's just Oracle marketing, anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing what, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB. And I think basically, Doug and Sanjeev were kind of referring to this before about basically kind of like the operational analytics, that are basically embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to turn? I think we're going to be seeing a lot of that. And I think that's what a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market now it's still a pretty new announcement, but the other thing that kind of goes against Oracle, (laughs) that they had to battle against is that even though they own MySQL and run the open source project, everybody else, in terms of the actual commercial implementation it's associated with everybody else. And the popular perception has been that MySQL has been basically kind of like a sidelight for Oracle. And so it's on Oracles shoulders to prove that they're damn serious about it. >> There's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption, obviously Wall Street's very concerned about it. Snowflake dropped prices last week. I've always felt like, hey, the consumption model is the right model. I can dial it down in when I need to, of course, the street freaks out. What are your thoughts on just pricing, the consumption model? What's the right model for companies, for customers? >> Consumption model is here to stay. What I would like to see, and I think is an ideal situation and actually plays into the lakehouse concept is that, I have my data in some open format, maybe it's Parquet or CSV or JSON, Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be one business unit wants to use Athena. Another business unit wants to use some other Trino let's say or Dremio. So every business unit is working on the same data set, see that's critical, but that data set is maybe in their VPC and they bring any compute engine, you pay for the use, shut it down. That then you're getting value and you're only paying for consumption. It's not like, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian joint in the cloud and get a $10,000 bill. >> This looks like it's been a sort of a victim of its own success in some ways, they made it so easy to spin up single note instances, multi note instances. And back in the day when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, dashboards around the utilization of all the nodes and multi node instances that have spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud, trying to offer that freedom. And I think they're going to have that same discovery that the cost surprises are going to follow as they make it easy to spin up new instances. >> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say the problem in our space is that there're way too many companies, there's way too much noise. We are expecting the end users to parse it out or we expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn their selves around and Tony, I know you, you talked to them quite frequently. I think they have quite a comprehensive offering for a long time actually. They've created Kudu, so they got operational, they have Hadoop, they have an operational data warehouse, they're migrated to the cloud. They are in hybrid multi-cloud environment. Lot of cloud data warehouses are not hybrid. They're only in the cloud. >> Right. I think what Cloudera has done the most successful has been in the transition to the cloud and the fact that they're giving their customers more OnRamps to it, more hybrid OnRamps. So I give them a lot of credit there. They're also have been trying to position themselves as being the most price friendly in terms of that we will put more guardrails and governors on it. I mean, part of that could be spin. But on the other hand, they don't have the same vested interest in compute cycles as say, AWS would have with EMR. That being said, yes, Cloudera does it, I think its most powerful appeal so of that, it almost sounds in a way, I don't want to cast them as a legacy system. But the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, I think it certainly has its strengths and weaknesses. And the fact this is that yes, Cloudera has an operational database or an operational data store with a kind of like the outgrowth of age base, but Cloudera is still based, primarily known for the deep analytics, the operational database nobody's going to buy Cloudera or Cloudera data platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used let's say Teradata basically to do some machine learning or let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course being on the quarterly shock clock was not a good place to be under the microscope for Cloudera and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month, did a deal with Dell whereby non-native Snowflake data could access on-prem object store from Dell. They announced a similar thing with pure storage. What do you guys make of that? Is that just... How significant will that be? Will customers actually do that? I think they're using either materialized views or extended tables. >> There are data rated and residency requirements. There are desires to have these platforms in your own data center. And finally they capitulated, I mean, Frank Klutman is famous for saying to be very focused and earlier, not many months ago, they called the going on-prem as a distraction, but clearly there's enough demand and certainly government contracts any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts, there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought this point, how Frank Klutman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going to on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is we're not going to bring the Snowflake architecture to run on-prem because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way. But I think it still preserves his original intent sort of, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'd be like, oh, we are all in, we've always been doing this. We have always supported this and now we are doing it better than others. >> Look, it was the same type of shock wave that we felt basically when AWS at the last moment at one of their reinvents, oh, by the way, we're going to introduce outposts. And the analyst group is typically pre briefed about a week or two ahead under NDA and that was not part of it. And when they dropped, they just casually dropped that in the analyst session. It's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that. And a props to Andy Jassy who once, many times actually told us, never say never when it comes to AWS. So guys, I know we got to run. We got some hard stops. Maybe you could each give us your final thoughts, Doug start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space, 14 vendors, eight of them claim to support lakehouse, both sides of the camp, Databricks customer had 32, their top customer that they could site was unnamed. It had 32 concurrent users doing 15,000 queries per hour. That's good but it's not up to the most demanding BI SQL workloads. And they acknowledged that and said, they need to keep working that. Snowflake asked for their biggest data science customer, they cited Kabura, 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering job to be probably SQL centric, ETL style transformation work. So I want to see the real use of the Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually of all things. And certainly, I'll also be looking for similar things in what Doug is saying, but I think sort of like, kind of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause I mean, they're into this conquer the world strategy. We can be all things to all people. Okay, if that's the case, what's going to be a case with basically, putting in some inline analytics, what are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev. >> So I'll be at MongoDB world, Snowflake and Databricks and very interested in seeing, but since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing both the hashtag use case online, transactional and analytical. I'm also seeing that these databases started in, let's say in case of MySQL HeatWave, as relational or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial and they just keep adding more and more data structures and really making these databases multifunctional. So very interesting. >> It gets back to our discussion of best of breed, versus all in one. And it's likely Mongo's path or part of their strategy of course, is through developers. They're very developer focused. So we'll be looking for that. And guys, I'll be there as well. I'm hoping that we maybe have some extra time on theCUBE, so please stop by and we can maybe chat a little bit. Guys as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right and thank you for watching. This is Dave Vellante for theCUBE and the excellent analyst. We'll see you next time. (upbeat music)

Published Date : Jun 2 2022

SUMMARY :

And Doug Henschen is the vice president Thank you. Doug let's start off with you And at the same time, me a lot of that material. And of course, at the and then we realized all the and Tony have brought to light. So I'm interested, the And in the cloud, So Sanjeev, is this all hype? But the problem is that we I mean, I look at the space, and offload some of the So different focus, at the end of the day, and warehouses on one conjoined platform. of the sort of big data movement most of the contributions made decisions. Whereas he kind of poo-pooed the lakehouse and the data scientists are from Mars. and the companies that have in the balance sheet that the customers have to worry about. the modern data stack, if you will. and the data world together, the story is with MongoDB Until data mesh takes over. and you need separate teams. that raises the importance of and the caution there. Yeah, I have to defer on that one. The idea here is that the of course, the street freaks out. and actually plays into the And back in the day when the kind of big data movement. We are expecting the end And the fact is they do have a hybrid path refactor the business accordingly. saying to be very focused And that was like, okay, we'll do it. So Dave, I have to say, the Snowflake architecture to run on-prem The point here is that and now we are doing that in the analyst session. And a props to Andy Jassy and said, they need to keep working that. Great. Okay, if that's the case, Great. I see that even the databases I'm hoping that we maybe have and the excellent analyst.

ENTITIES

Entity	Category	Confidence
Doug	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
Tony	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Frank	PERSON	0.99+
Frank Klutman	PERSON	0.99+
Tony Baers	PERSON	0.99+
Mars	LOCATION	0.99+
Doug Henschen	PERSON	0.99+
2020	DATE	0.99+
AWS	ORGANIZATION	0.99+
Venus	LOCATION	0.99+
Oracle	ORGANIZATION	0.99+
2012	DATE	0.99+
Databricks	ORGANIZATION	0.99+
Dell	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Holger Mueller	PERSON	0.99+
Andy Jassy	PERSON	0.99+
last year	DATE	0.99+
$5 billion	QUANTITY	0.99+
$10,000	QUANTITY	0.99+
14 vendors	QUANTITY	0.99+
Last year	DATE	0.99+
last week	DATE	0.99+
San Francisco	LOCATION	0.99+
SanjMo	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
8,500 users	QUANTITY	0.99+
Sanjeev	PERSON	0.99+
Informatica	ORGANIZATION	0.99+
32 concurrent users	QUANTITY	0.99+
two	QUANTITY	0.99+
Constellation Research	ORGANIZATION	0.99+
Mongo	ORGANIZATION	0.99+
Sanjeev Mohan	PERSON	0.99+
Ahana	ORGANIZATION	0.99+
DaaS	ORGANIZATION	0.99+
EMR	ORGANIZATION	0.99+
32	QUANTITY	0.99+
Atlas	ORGANIZATION	0.99+
Delta	ORGANIZATION	0.99+
Snowflake	ORGANIZATION	0.99+
Python	TITLE	0.99+
each	QUANTITY	0.99+
Athena	ORGANIZATION	0.99+
next week	DATE	0.99+

Peter Burris Big Data Research Presentation

(upbeat music) >> Announcer: Live from San Jose, it's theCUBE presenting Big Data Silicon Valley brought to you by SiliconANGLE Media and its ecosystem partner. >> What am I going to spend time, next 15, 20 minutes or so, talking about. I'm going to answer three things. Our research has gone deep into where are we now in the big data community. I'm sorry, where is the big data community going, number one. Number two is how are we going to get there and number three, what do the numbers say about where we are? So those are the three things. Now, since when we want to get out of here, I'm going to fly through some of these slides but again there's a lot of opportunity for additional conversation because we're all about having conversations with the community. So let's start here. The first thing to know, when we think about where this is all going is it has to be bound. It's inextricably bound up with digital transformation. Well, what is digital transformation? We've done a lot of research on this. This is Peter Drucker who famously said many years ago, that the purpose of a business is to create and keep a customer. That's what a business is. Now what's the difference between a business and a digital business? What's the business between Sears Roebuck, or what's the difference between Sears Roebuck and Amazon? It's data. A digital business uses data as an asset to create and keep customers. It infuses data and operations differently to create more automation. It infuses data and engagement differently to catalyze superior customer experiences. It reformats and restructures its concept of value proposition and product to move from a product to a services orientation. The role of data is the centerpiece of digital business transformation and in many respects that is where we're going, is an understanding and appreciation of that. Now, we think there's going to be a number of strategic capabilities that will have to be built out to make that possible. First off, we have to start thinking about what it means to put data to work. The whole notion of an asset is an asset is something that can be applied to a productive activity. Data can be applied to a productive activity. Now, there's a lot of very interesting implications that we won't get into now, but essentially if we're going to treat data as an asset and think about how we could put more data to work, we're going to focus on three core strategic capabilities about how to make that possible. One, we need to build a capability for collecting and capturing data. That's a lot of what IoT is about. It's a lot of what mobile computing is about. There's going to be a lot of implications around how to ethically and properly do some of those things but a lot of that investment is about finding better and superior ways to capture data. Two, once we are able to capture that data, we have to turn it into value. That in many respects is the essence of big data. How we turn data into data assets, in the form of models, in the form of insights, in the form of any number of other approaches to thinking about how we're going to appropriate value out of data. But it's not just enough to create value out of it and have it sit there as potential value. We have to turn it into kinetic value, to actually do the work with it and that is the last piece. We have to build new capabilities for how we're going to apply data to perform work better, to enact based on data. Now, we've got a concept we're researching now that we call systems of agency, which is the idea that there's going to be a lot of new approaches, new systems with a lot of intelligence and a lot of data that act on behalf of the brand. I'm not going to spend a lot of time going into this but remember that word because I will come back to it. Systems of agency is about how you're going to apply data to perform work with automation, augmentation, and actuation on behalf of your brand. Now, all this is going to happen against the backdrop of cloud optimization. I'll explain what we mean by that right now. Very importantly, increasingly how you create value out of data, how you create future options on the value of your data is going to drive your technology choices. For the first 10 years of the cloud, the presumption is all data was going to go to the cloud. We think that a better way of thinking about it is how is the cloud experience going to come to the data. We've done a lot of research on the cost of data movement and both in terms of the actual out-of-pocket costs but also the potential uncertainty, the transaction costs, etc, associated with data movement. And that's going to be one of the fundamental pieces or elements of how we think about the future of big data and how digital business works, is what we think about data movement. I'll come to that in a bit. But our proposition is increasingly, we're going to see architectural approaches that focus on how we're going to move the cloud experience to the data. We've got this notion of true private cloud which is effectively the idea of the cloud experience on or near premise. That doesn't diminish the role that the cloud's going to play on industry or doesn't say that Amazon and AWS and Microsoft Azure and all the other options are not important. They're crucially important but it means we have to start thinking architecturally about how we're going to create value of data out of data and recognize that means that it, we have to start envisioning how our organization and infrastructure is going to be set up so that we can use data where it needs to be or where it's most valuable and often that's close to the action. So if we think then about that very quickly because it's a backdrop for everything, increasingly we're going to start talking about the idea of where's the workload going to go? Where's workload the dog going to be against this kind of backdrop of the divorce of infrastructure? We believe that and our research pretty strongly shows that a lot of workloads are going to go to true private cloud but a lot of big data is moving into the cloud. This is a prediction we made a few years ago and it's clearly happening and it's underway and we'll get into what some of the implications are. So again, when we say that a lot of the big data elements, a lot of the process of creating value out of data is going to move into the cloud. That doesn't mean that all the systems of agency that build or rely on that data, the inference engines, etc, are also in a public cloud. A lot of them are going to be distributed out to the edge, out to where the action needs to be because of latency and other types of issues. This is a fundamental proposition and I know I'm going fast but hopefully I'm being clear. All right, so let's now get to the second part. This is kind of where the industry's going. Data is an asset. Invest in strategic business capabilities to appreciate, to create those data assets and appreciate the value of those assets and utilize the cloud intelligently to generate and ensure increasing returns. So the next question is well, how will we get there? Now. Right now, not too far from here, Neil Raden for example, was on the show floor yesterday. Neil made the observation that, as he wandered around, he only heard the word big data two or three times. The concept of big data is not dead. Whether the term is or is not is somebody else's decision. Our perspective, very simply, is that the notion is bifurcating. And it's bifurcating because we see different strategic imperatives happening at two different levels. On the one hand, we see infrastructure convergence. The idea that increasingly we have to think about how we're going to bring and federated data together, both from a systems and a data management standpoint. And on the other hand, we're going to see infrastructure or application specialization. That's going to have an enormous implication over next few years, if only because there just aren't enough people in the world that understand how to create value out of data. And there's going to be a lot of effort made over the next few years to find new ways to go from that one expertise group to billions of people, billions of devices, and those are the two dominant considerations in the industry right now. How can we converge data physically, logically, and on the other hand, how can we liberate more of the smarts associated with this very, very powerful approach so that more people get access to the capacities and the capabilities and the assets that are being generated by that process. Now, we've done at Wikibon, probably I don't know, 18, 20, 23 predictions overall on the role that or on the changes being wrought by digital business. Here I'm going to focus on four of them that are central to our big data research. We have many more but I'm just going to focus on four. The first one, when we think about infrastructure convergence we worry about hardware. Here's a prediction about what we think is going to happen with hardware and our observation is we believe pretty strongly that future systems are going to be built on the concept of how do you increase the value of data assets. The technologies are all in place. Simpler parts that it more successfully bind specifically through all its storage and network are going to play together. Why, because increasingly that's the fundamental constraint. How do I make data available to other machines, actors, sources of change, sources of process within the business. Now, we envision or we are watching before our very eyes, new technologies that allow us to take these simple piece parts and weave them together in very powerful fabrics or grids, what we call UniGrid. So that there is almost no latency between data that exists within one of these, call it a molecule, and anywhere else in that grid or lattice. Now again, these are not systems that are going to be here in five years. All the piece parts are here today and there are companies that are actually delivering them. So if you take a look at what Micron has done with Mellanox and other players, that's an example of one of these true private cloud oriented machines in place. The bottom line though is that there is a lot of room left in hardware. A lot of room. This is what cloud suppliers are building and are going to build but increasingly as we think about true private cloud, enterprises are going to look at this as well. So future systems for improving data assets. The capacity of this type of a system with low latency amongst any source of data means that we can now think about data not as... Not as a set of sources that have to be each individually, each having some control over its own data and sinks woven together by middleware and applications but literally as networks of data. As we start to think about distributing data and distributing control and authority associated with that data more broadly across systems, we now have to think about what does it mean to create networks of data? Because that, in many respects, is how these assets are going to be forged. I haven't even mentioned the role that security is going to play in all of this by the way but fundamentally that's how it's likely to play out. We'll have a lot of different sources but from a business standpoint, we're going to think about how those sources come together into a persistent network that can be acted upon by the business. One of the primary drivers of this is what's going on at the edge. Marc Andreessen famously said that software is eating the world, well our observation is great but if software's eating the world, it's eating it at the edge. That's where it's happening. Secondly, that this notion of agency zones. I said I'm going to bring that word up again, how systems act on behalf of a brand or act on behalf of an institution or business is very, very crucial because the time necessary to do the analysis, perform the intelligence, and then take action is a real constraint on how we do things. And our expectation is that we're going to see what we call an agency zone or a hub zone or cloud zone defined by latency and how we architect data to get the data that's necessary to perform that piece of work into the zone where it's required. Now, the implications of this is none of this is going to happen if we don't use AI and related technologies to increasingly automate how we handle infrastructure. And technologies like blockchain have the potential to provide a interesting way of imagining how these networks of data actually get structured. It's not going to solve everything. There's some people that think the blockchain is kind of everything that's necessary but it will be a way of describing a network of data. So we see those technologies on the ascension. But what does it mean for DBMS? In the old way, in the old world, the old way of thinking, the database manager was the control point for data. In the new world these networks of data are going to exist beyond a single DBMS and in fact, over time, that concept of federated data actually has a potential to become real. When we have these networks of data, we're going to need people to act upon them and that's essentially a lot of what the data scientist is going to be doing. Identifying the outcome, identifying the data that's required, and weaving that data through the construction and management, manipulation of pipelines, to ensure that the data as an asset can persist for the purposes of solving a near-term problem or over whatever duration is required to solve a longer term problem. Data scientists remain very important but we're going to see, as a consequence of improvements in tooling capable of doing these things, an increasing recognition that there's a difference between a data scientist and a data scientist. There's going to be a lot of folks that participate in the process of manipulating, maintaining, managing these networks of data to create these business outcomes but we're going to see specialization in those ranks as the tooling is more targeted to specific types of activities. So the data scientist is going to become or will remain an important job, going to lose a little bit of its luster because it's going to become clear what it means. So some data scientists will probably become more, let's call them data network administrators or networks of data administrators. And very importantly as I said earlier, there's just not enough of these people on the planet and so increasingly when we think about again, digital business and the idea of creating data assets. A central challenge is going to be how to create the data or how to turn all the data that can be captured into assets that can be applied to a lot of different uses. There's going to be two fundamental changes to the way we are currently conceiving of the big data world on the horizon. One is well, it's pretty clear that Hadoop can only go so far. Hadoop is a great tool for certain types of activities and certain numbers of individuals. So Hadoop solves problems for an important but relatively limited subset of the world. Some of the new data science platforms that we just talked about, that I just talked about, they're going to help with a degree of specialization that hasn't been available before in the data world, will certainly also help but it also will only take it so far. The real way that we see the work that we're doing, the work that the big data community is performing, turned into sources of value that extend into virtually every single corner of humankind is going to be through these cloud services that are being built and increasingly through packaged applications. A lot of computer science, it still exists between what I just said and when this actually happens. But in many respects, that's the challenge of the vendor ecosystem. How to reconstruct the idea of packaged software, which has historically been built around operations and transaction processing, with a known data model and an unknown or the known process and some technology challenges. How do we reapply that to a world where we now are thinking about, well we don't know exactly what the process is because the data tells us at the moment that the actions going to be taking place. It's a very different way of thinking about application development. A very different way of thinking about what's important in IT and very different way of thinking about how business is going to be constructed and how strategy's going to be established. Packaged applications are going to be crucially important. So in the last few minutes here, what are the numbers? So this is kind of the basis for our analysis. Digital business, role of data is an asset, having an enormous impact in how we think about hardware, how do we think about database management or data management, how we think about the people involved in this, and ultimately how we think about how we're going to deliver all this value out to the world. And the numbers are starting to reflect that. So why don't you think about four numbers as I go through the two or three slides. Hundred and three billion, 68%, 11%, and 2017. So of all the numbers that you will see, those are four of the most important numbers. So let's start by looking at the total market place. This is the growth of the hardware, software, and services pieces of the big data universe. Now we have a fair amount of additional research that breaks all these down into tighter segments, especially in software side. But the key number here is we're talking about big numbers. 103 billion over the course of next 10 years and let's be clear that 103 billion dollars actually has a dramatic amplification on the rest of the computing industry because a lot of the pricing models associated with, especially the software, are tied back to open source which has its own issues. And very importantly, the fact that the services business is going to go through an enormous amount of change over the next five years as service companies better understand how to deliver some of these big data rich applications. The second point to note here is that it was in 2017 that the software market surpassed the hardware market in big data. Again, for first number of years we focused on buying the hardware and the system software associated with that and the software became something that we hope to discover. So I was having a conversation here in theCUBE with the CEO of Transwarp which is a very interesting Chinese big data company and I asked what's the difference between how you do things in China and how we do things in the US? He said well, in the US you guys focus on proof of concept. You spend an enormous amount of time asking, does the hardware work? Does the database software work? Does the data management software work? In China we focus on the outcome. That's what we focus on. Here you have to placate the IT organization to make sure that everybody in IT is comfortable with what's about to happen. In China, were focused on the business people. This is the first year that software is bigger than hardware and it's only going to get bigger and bigger over time. It doesn't mean again, that hardware is dead or hardware is not important. It's going to remain very important but it does mean that the centerpiece of the locus of the industry is moving. Now, when we think about what the market shares look like, it's a very fragmented market. 60%, 68% of the market is still other. This is a highly immature market that's going to go through a number of changes over the next few years. Partly catalyzed by that notion of infrastructure convergence. So in four years our expectation is that, that 68% is going to start going down pretty fast as we see greater consolidation in how some of these numbers come together. Now IBM is the biggest one on the basis of the fact that they operate in all these different segments. They operating the hardware, software, and services segment but especially because they're very strong within the services business. The last one I want to point your attention to is this one. I mentioned earlier on, that our expectation is that the market increasingly is going to move to a packaged application orientation or packaged services orientation as a way of delivering expertise about big data to customers. Splunk is the leading software player right now. Why, because that's the perspective that they've taken. Now, perhaps we're a limited subset. It's perhaps for a limited subset of individuals or markets or of sectors but it takes a packaged application, weaves these technologies together, and applies them to an outcome. And we think this presages more of that kind of activity over the course of the next few years. Oracle, kind of different approach and we'll see how that plays out over the course of the next five years as well. Okay, so that's where the numbers are. Again, a lot more numbers, a lot of people you can talk to. Let me give you some action items. First one, if data was a core asset, how would IT, how would your business be different? Stop and think about that. If it wasn't your buildings that were the asset, it wasn't the machines that were the asset, it wasn't your people by themselves who were the asset, but data was the asset. How would you reinstitutionalize work? That's what every business is starting to ask, even if they don't ask it in the same way. And our advice is, then do it because that's the future of business. Not that data is the only asset but data is a recognized central asset and that's going to have enormous impacts on a lot of things. The second point I want to leave you with, tens of billions of users and I'm including people and devices, are dependent on thousands of data scientists that's an impedance mismatch that cannot be sustained. Packaged apps and these cloud services are going to be the way to bridge that gap. I'd love to tell you that it's all going to be about tools, that we're going to have hundreds of thousands or millions or tens of millions or hundreds of millions of data scientists suddenly emerge out of the woodwork. It's not going to happen. The third thing is we think that big businesses, enterprises, have to master what we call the big inflection. The big tech inflection. The first 50 years were about known process and unknown technology. How do I take an accounting package and do I put on a mainframe or a mini computer a client/server or do I do it on the web? Unknown technology. Well increasingly today, all of us have a pretty good idea what the base technology is going to be. Does anybody doubt it's going to be the cloud? We got a pretty good idea what the base technology is going to be. What we don't know is what are the new problems that we can attack, that we can address with data rich approaches to thinking about how we turn those systems into actors on behalf of our business and customers. So I'm a couple minutes over, I apologize. I want to make sure everybody can get over to the keynotes if you want to. Feel free to stay, theCUBE's going to be live at 9:30. If I got that right. So it's actually pretty exciting if anybody wants to see how it works, feel free to stay. Georgia's here, Neil's here, I'm here. I mentioned Greg Terrio, Dave Volante, John Greco, I think I saw Sam Kahane back in the corner. Any questions, come and ask us, we'll be more than happy. Thank you very much for, oh David Volante. >> David: I have a question. >> Yes. >> David: Do you have time? >> Yep. >> David: So you talk about data as a core asset, that if you look at the top five companies by market cap in the US, Google, Amazon, Facebook, etc. They're data companies, they got data at the core which is kind of what your first bullet here describes. How do you see traditional companies closing that gap where humans, buildings, etc at the core as we enter this machine intelligence era, what's your advice to the traditional companies on how they close that gap? >> All right. So the question was, the most valuable companies in the world are companies that are well down the path of treating data as an asset. How does everybody else get going? Our observation is you go back to what's the value proposition? What actions are most important? what's data is necessary to perform those actions? Can changing the way the data is orchestrated and organized and put together inform or change the cost of performing that work by changing the cost transactions? Can you increase a new service along the same lines and then architect your infrastructure and your business to make sure that the data is near the action in time for the action to be absolute genius to your customer. So it's a relatively simple thought process. That's how Amazon thought, Apple increasingly thinks like that, where they design the experience and they think what data is necessary to deliver that experience. That's a simple approach but it works. Yes, sir. >> Audience Member: With the slide that you had a few slides ago, the market share, the big spenders, and you mentioned that, you asked the question do any of us doubt that cloud is the future? I'm with Snowflake, I don't see many of those large vendors in the cloud and I was wondering if you could speak to what are you seeing in terms of emerging vendors in that space. >> What a great question. So the question was, when you look at the companies that are catalyzing a lot of the change, you don't see a lot of the big companies being at the leadership. And someone from Snowflake just said, well who's going to lead it? That's a big question that has a lot of implications but at this point time it's very clear that the big companies are suffering a bit from the old, from the old, trying to remember what the... RCA syndrome. I think Clay Christensen talked about this. You know, the innovators dilemma. So RCA actually is one of the first creators. They created the transistor and they held a lot of original patents on it. They put that incredible new technology, back in the forties and fifties, under the control of the people who ran the vacuum tube business. When was the last time anybody bought RCA stock? The same problem is existing today. Now, how is that going to play out? Are we going to see a lot of, as we've always seen, a lot of new vendors emerge out of this industry, grow into big vendors with IPO related exits to try to scale their business? Or are we going to see a whole bunch of gobbling up? That's what I'm not clear on but it's pretty clear at this point in time that a lot of the technology, a lot of the science, is being done in smaller places. The moderating feature of that is the services side. Because there's limited groupings of expertise that the companies that today are able to attract that expertise. The Googles, the Facebooks, the AWSs, etc, the Amazons. Are doing so in support of a particular service. IBM and others are trying to attract that talent so they can apply it to customer problems. We'll see over the next few years whether the IBMs and the Accentures and the big service providers are able to attract the kind of talent necessary to diffuse that knowledge into the industry faster. So it's the rate at which that the idea of internet scale computing, the idea of big data being applied to business problems, can diffuse into the marketplace through services. If it can diffuse faster that will have both an accelerating impact for smaller vendors, as it has in the past. But it may also again, have a moderating impact because a lot of that expertise that comes out of IBM, IBM is going to find ways to drive in the product faster than it ever has before. So it's a complicated answer but that's our thinking at this point time. >> Dave: Can I add to that? >> Yeah. (audience member speaking faintly) >> I think that's true now but I think the real question, not to not to argue with Dave but this is part of what we do. The real question is how is that knowledge going to diffuse into the enterprise broadly? Because Airbnb, I doubt is going to get into the business of providing services. (audience member speaking faintly) So I think that the whole concept of community, partnership, ecosystem is going to remain very important as it always has and we'll see how fast those service companies that are dedicated to diffusing knowledge, diffusing knowledge into customer problems actually occurs. Our expectation is that as the tooling gets better, we will see more people be able to present themselves truly as capable of doing this and that will accelerate the process. But the next few years are going to be really turbulent and we'll see which way it actually ends up going. (audience member speaking faintly) >> Audience Member: So I'm with IBM. So I can tell you 100% for sure that we are, I hired literally 50 data scientists in the last three months to go out and do exactly what you're saying. Sit down with clients and help them figure out how to do data science in the enterprise. And so we are in fact scaling it, we're getting people that have done this at Google, Facebook. Not a whole lot of those 'cause we want to do it with people that have actually done it in legacy fortune 500 Companies, right? Because there's a little bit difference there. >> So. >> Audience Member: So we are doing exactly what you said and Microsoft is doing the same thing, Amazon is actually doing the same thing too, Domino Data Lab. >> They don't like they're like talking about it too much but they're doing it. >> Audience Member: But all the big players from the data science platform game are doing this at a different scale. >> Exactly. >> Audience Member: IBM is doing it on a much bigger scale than anyone else. >> And that will have an impact on ultimately how the market gets structured and who the winners end up being. >> Audience Member: To add too, a lot of people thought that, you mentioned the Red Hat of big data, a lot of people thought Cloudera was going to be the Red Hat of big data and if you look at what's happened to their business. (background noise drowns out other sounds) They're getting surrounded by the cloud. We look at like how can we get closer to companies like AWS? That was like a wild card that wasn't expected. >> Yeah but look, at the end of the day Red Hat isn't even the Red Hat of open source. So the bottom line is the thing to focus on is how is this knowledge going to diffuse. That's the thing to focus on. And there's a lot of different ways, some of its going to diffuse through tools. If it diffuses through tools, it increases the likelihood that we'll have more people capable of doing this in IBM and others can hire more. That Citibank can hire more. That's an important participant, that's an important play. So you have something to say about that but it also says we're going to see more of the packaged applications emerge because that facilitates the diffusion. This is not, we haven't figured out, I don't know exactly, nobody knows exactly the exact shape it's going to take. But that's the centerpiece of our big data researches. How is that diffusion process going to happen, accelerate, and what's the resulting structure going to look like? And ultimately how are enterprises going to create value with whatever results. Yes, sir. (audience member asks question faintly) So the recap question is you see more people coming in and promising the moon but being incapable of delivering because they are, partly because the technology is uncertain and for other reasons. So here's our approach. Or here's our observation. We actually did a fair amount of research on this. When you take a look at what we call a approach to doing big data that's optimized for the costs of procurement i.e. let's get the simplest combination of infrastructure, the simplest combination of open-source software, the simplest contracting, to create that proof of concept that you can stand things up very quickly if you have enough expertise but you can create that proof of concept but the process of turning that into actually a production system extends dramatically. And that's one of the reasons why the Clouderas did not take over the universe. There are other reasons. As George Gilbert's research has pointed out, that Cloudera is spending 53, 55 % of their money right now just integrating all the stuff that they bought into the distribution five years ago. Which is a real great recipe for creating customer value. The bottom line though is that if we focus on the time to value in production, we end up taking a different path. We don't focus as much on whether the hardware is going to work and the network is going to work and the storage can be integrated and how it's going to impact the database and what that's going to mean to our Oracle license pool and all the other things that people tend to think about if they're focused on the technology. And so as a consequence, you get better time to value if you focus on bringing the domain expertise, working with the right partner, working with the appropriate approach, to go from what's the value proposition, what actions are associated with a value proposition, what's stated in that area to perform those actions, how can I take transaction costs out of performing those actions, where's the data need to be, what infrastructure do I require? So we have to focus on a time to value not the time to procure. And that's not what a lot of professional IT oriented people are doing because many of them, I hate say it, but many of them still acquire new technology with the promise to helping the business but having a stronger focus on what it's going to mean to their careers. All right, I want to be really respectful to everybody's time. The keynotes start in about five minutes which means you just got time. If you want to stay, feel free to stay. We'll be here, we'll be happy to talk but I think that's pretty much going to close our presentation broadcast. Thank you very much for being an attentive audience and I hope you found this useful. (upbeat music)

Published Date : Mar 9 2018

SUMMARY :

brought to you by SiliconANGLE Media that the actions going to be taking place. by market cap in the US, Google, Amazon, Facebook, etc. or change the cost of performing that work in the cloud and I was wondering if you could speak to the idea of big data being applied to business problems, (audience member speaking faintly) Our expectation is that as the tooling gets better, in the last three months to go out and do and Microsoft is doing the same thing, but they're doing it. Audience Member: But all the big players from Audience Member: IBM is doing it on a much bigger scale how the market gets structured They're getting surrounded by the cloud. and the network is going to work

ENTITIES

Entity	Category	Confidence
Dave Volante	PERSON	0.99+
Marc Andreessen	PERSON	0.99+
Dave	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Neil	PERSON	0.99+
Facebook	ORGANIZATION	0.99+
Sam Kahane	PERSON	0.99+
Google	ORGANIZATION	0.99+
Neil Raden	PERSON	0.99+
2017	DATE	0.99+
John Greco	PERSON	0.99+
Citibank	ORGANIZATION	0.99+
Greg Terrio	PERSON	0.99+
China	LOCATION	0.99+
David Volante	PERSON	0.99+
Apple	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Clay Christensen	PERSON	0.99+
David	PERSON	0.99+
Sears Roebuck	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
Domino Data Lab	ORGANIZATION	0.99+
Peter Drucker	PERSON	0.99+
US	LOCATION	0.99+
Amazons	ORGANIZATION	0.99+
two	QUANTITY	0.99+
11%	QUANTITY	0.99+
George Gilbert	PERSON	0.99+
AWS	ORGANIZATION	0.99+
San Jose	LOCATION	0.99+
68%	QUANTITY	0.99+
millions	QUANTITY	0.99+
53, 55 %	QUANTITY	0.99+
60%	QUANTITY	0.99+
Peter Burris	PERSON	0.99+
Facebooks	ORGANIZATION	0.99+
103 billion	QUANTITY	0.99+
Googles	ORGANIZATION	0.99+
second part	QUANTITY	0.99+
second point	QUANTITY	0.99+
IBMs	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
AWSs	ORGANIZATION	0.99+
Accentures	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
One	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
Snowflake	ORGANIZATION	0.99+
four	QUANTITY	0.99+
Hundred	QUANTITY	0.99+
Transwarp	ORGANIZATION	0.99+
Mellanox	ORGANIZATION	0.99+
tens of millions	QUANTITY	0.99+
three things	QUANTITY	0.99+
Micron	ORGANIZATION	0.99+
50 data scientists	QUANTITY	0.99+
First	QUANTITY	0.99+
yesterday	DATE	0.99+
three times	QUANTITY	0.99+
103 billion dollars	QUANTITY	0.99+
Red Hat	TITLE	0.99+
first bullet	QUANTITY	0.99+
Two	QUANTITY	0.99+
Airbnb	ORGANIZATION	0.99+
Secondly	QUANTITY	0.99+
five years	QUANTITY	0.98+
one	QUANTITY	0.98+
both	QUANTITY	0.98+
hundreds of millions	QUANTITY	0.98+
first	QUANTITY	0.98+

Show Wrap - Data Platforms 2017 - #DataPlatforms2017

>> Announcer: Live from the Wigwam in Phoenix, Arizona. It's theCUBE. Covering Data Platforms 2017. Brought to you by Kubo. >> Hey welcome back everybody. Jeff Frick here with theCUBE along with George Gilbert from Wikibon. We've had a tremendous day here at DataPlatforms 2017 at the historic Wigwam Resort, just outside of Phoenix, Arizona. George, you've been to a lot of big data shows. What's your impression? >> I thought we're at the, we're sort of at the edge of what could be a real bridge to something new, which is, we've built big data systems for like out of traditional, as traditional software for deployment on traditional infrastructure. Even if you were going to put it in a virtual machine, it's still not a cloud. You're still dealing with server abstractions. But what's happening with Kubo is, they're saying, once you go to the cloud, whether it's Amazon, Azure, Google or Oracle, you're going to be dealing with services. Services are very different. It greatly simplifies the administrative experience, the developer experience, and more than that, they're focused on, they're focused on turning Kubo, the product on Kubo the service, so that they can automate the management of it. And we know that big data has been choking itself on complexity. Both admin and developer complexity. And they're doing something unique, both on sort of the big data platform management, but also data science operations. And their point, their contention, which we still have to do a little more homework on, is that the vendors who started with software on-prem, can't really make that change very easily without breaking what they've done on-prem. Cuz they have traditional perpetual license physical software as opposed to services, which is what is in the cloud. >> The question is, are people going to wait for them to figure it out. I talked to somebody in the hallway earlier this morning and we were talking about their move to put all their data into, it was S3, on their data lake. And he said, it's part of a much bigger transformational process that we're doing inside the company. And so, this move, from his cloud, public cloud viable, to tell me, give me a reason why it shouldn't go to the cloud, has really kicked in big time. And hear over and over and over that speed and agility, not just in deploying applications, but in operating as a company, is the key to success. And we hear over and over how many, how short the tenure is on the Fortune 500 now, compared to what it used to be. So if you're not speed and agile, which you pretty much have to use cloud, and software driven automated decision-making >> Yeah. >> that's powered by machine learning to eat. >> Those two things. >> A huge percentage of your transaction and decision-making, you're going to get smoked by the person that is. >> Let's let's sort of peel that back. I was talking to Monte Zweben who is the co-founder of Splice Machine, one of the most advance databases that sort of come out of nowhere over the last couple of years. And it's now, I think, in close beta on Amazon. He showed me, like a couple of screens for spinning it up and configuring it on Amazon. And he said, if I were doing that on-prem, he goes I needed Hadoop cluster with HBase. It would take me like four plus months. And that's an example of software versus services. >> Jeff: Right. >> And when you said, when you pointed out that, automated decision-making, powered by machine learning, that's the other part, which is these big data systems ultimately are in the service of creating machine learning models that will inform ever better decisions with ever greater speed and the key then is to plug those models into existing systems of record. >> Jeff: Right. Right. >> Because we're not going to, >> We're not going to to rip those out and rebuild them from scratch. >> Right. But as you just heard, you can pull the data out that you need, run it through a new age application. >> George: Yeah. >> And then feed it back into the old system. >> George: Yes. >> The other thing that came up, it was Oskar, I have to look him up, Oskar Austegard from Gannett was on one of the panels. We always talk about the flexibility to add capacity very easily in a cloud-based solution. But he talked about in the separation of storage and cloud, that they actually have times where they turn off all their compute. It's off. Off. >> And that was If you had to boil down the fundamental compatibility break between on-prem and in the cloud, the Kubo folks, both the CEO and CMO said, look, you cannot reconcile what's essentially server send, where the storage is attached to the compute node, the server. With cloud where you have storage separate from compute and allowing you to spin it down completely. He said those are just the fundamentally incompatible. >> Yeah, yeah. And also, Andretti, one of the founders in his talk, he talked about the big three trends, which we just kind of talked about, he summarized them right in serverless. This continual push towards smaller and smaller units >> George: Yeah. >> of store compute. And the increasing speed of networks is one, from virtual servers to just no servers, to just compute. The second one is automation, you've got to move to automation. >> George: Right. If you're not, you're going to get passed by your competitor that is. Or the competitor you that you don't even know that exists that's going to come out from over your shoulder. And the third one was the intelligence, right. There is a lot of intelligence that can be applied. And I think the other cusp that we're on, is this continuing crazy increase in compute horsepower. Which just keeps going. That the speed and the intelligence of these machines is growing at an exponential curve, not a linear curve. It's going to be bananas in the not too distance future. >> We're soaking up more and more that intelligence with machine learning. The training part of machine learning where the datasets to train a model are immense. Not only the dataset are large, but the amount of time to sort of chug through them to come up with the, just the right mix of variables and values for those variables. Or maybe even multiple models. So that we're going to see in the cloud. And that's going to chew up more and more cycles. Even as we have >> Jeff: Right. Right. >> specialized processors. >> Jeff: Right. But in the data ops world, in theory yes, but I don't have to wait to get it right. Right? I can get it 70% right. >> George: Yeah. >> Which is better than not right. >> George: Yeah. >> And I can continue to iterate over time. In that, I think was the the genius of dev-ops. To stop writing PRDs and MRDs. >> George: Yeah. >> And deliver something. And then listen and adjust. >> George: Yeah. >> And within the data ops world, it's the same thing. Don't try to figure it all out. Take the data you know, have some hypothesis. Build some models and iterate. That's really tough to compete with. >> George: Yeah. >> Fast, fast, fast iteration. >> We're doing actually a fair amount of research on that. On the Wikibon side. Which is, if you build, if you build an enterprise application that has, that is reinforced or informed by models in many different parts, in other words, you're modeling more and more digital entities within the business. >> Jeff: Right. >> Each of those has feedback loops. >> Jeff: Right. Right. >> And when you get the whole thing orchestrated and moving or learning in concert then you have essentially what Michael Porter many years ago called competitive advantage. Which is when each business process reinforces all the other business processes in service of a delivering a value proposition. And those models represent business processes and when they're learning and orchestrated all together, you have a, what Trump called a fined-tuned machine. >> I won't go there. >> Leaving out that it was Bigley and it was finely-tuned machine. >> Yeah, yeah. But the end of the day, if you're using resources and effort to improve an different resource and effort, you're getting a multiplier effect. >> Yes. >> And that's really the key part. Final thought as we go out of here. Are you excited about this? Do you see, they showed the picture the NASA headquarters with the big giant snowball truck loading up? Do you see more and more of this big enterprise data going into S3, going into Google Cloud, going into Microsoft Azure? >> You're asking-- >> Is this the solution for the data lake swamp issue that we've been talking about? >> You're asking the 64 dollar question. Which is, companies, we sensed a year ago at the at the Hortonworks DataWorks Summit in, was in June, down in San Jose last year. That was where we first got the sense that, people were sort of throwing in the towel on trying to build, large scale big data platforms on-prem. And what changes now is, are they now evaluating Hortonworks versus Cloudera versus MapR in the cloud or are they widening their consideration as Kubo suggests. Because now they want to look, not only at Cloud Native Hadoop, but they actually might want to look at Cloud Native Services that aren't necessarily related to Hadoop. >> Right. Right. And we know as a service wins. It's continue. PAS is a service. Software is a service. Time and time again, as a service either eats a lot of share from the incumbent or knocks the incumbent out. So, Hadoop as a service, regardless of your distro, via one of these types of companies on Amazon, it seems like it's got to win, right. It's going to win. >> Yeah but the difference is, so far, so far, the Clouderas and the MapRs and the Hortonworks of the world are more software than service when they're in the cloud. They don't hide all the knobs. You still need You still a highly trained admin to get them up-- >> But not if you buy it as a service, in theory, right. It's going to be packaged up by somebody else and they'll have your knobs all set. >> They're not designed yet that way. >> HD Insight >> Then, then, then, then, They better be careful cuz it might be a new, as a service distro, of the Hadoop system. >> My point, which is what this is. >> Okay, very good, we'll leave it at that. So George, thanks for spending the day with me. Good show as always. >> And I'll be in a better mood next time when you don't steal my candy bars. >> All right. He's George Goodwin. I'm Jeff Frick. You're watching theCUBE. We're at the historic 99 years young, Wigwam Resort, just outside of Phoenix, Arizona. DataPlatforms 2017. Thanks for watching. It's been a busy season. It'll continue to be a busy season. So keep it tuned. SiliconAngle.TV or YouTube.com/SiliconAngle. Thanks for watching.

Published Date : May 26 2017

SUMMARY :

Brought to you by Kubo. at the historic Wigwam Resort, is that the vendors who started with software on-prem, but in operating as a company, is the key to success. you're going to get smoked by the person that is. over the last couple of years. and the key then is to plug those models Jeff: Right. We're not going to to rip those out But as you just heard, We always talk about the flexibility to add capacity And that was And also, Andretti, one of the founders in his talk, And the increasing speed of networks is one, And the third one was the intelligence, right. but the amount of time to sort of chug through them Jeff: Right. But in the data ops world, in theory yes, And I can continue to iterate over time. And then listen and adjust. Take the data you know, have some hypothesis. On the Wikibon side. Jeff: Right. And when you get the whole thing orchestrated Leaving out that it was Bigley But the end of the day, if you're using resources And that's really the key part. You're asking the 64 dollar question. a lot of share from the incumbent and the Hortonworks of the world It's going to be packaged up by somebody else of the Hadoop system. which is what this is. So George, thanks for spending the day with me. And I'll be in a better mood next time We're at the historic 99 years young, Wigwam Resort,

ENTITIES

Entity	Category	Confidence
Jeff Frick	PERSON	0.99+
Jeff	PERSON	0.99+
George	PERSON	0.99+
George Goodwin	PERSON	0.99+
George Gilbert	PERSON	0.99+
Michael Porter	PERSON	0.99+
Andretti	PERSON	0.99+
San Jose	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
64 dollar	QUANTITY	0.99+
70%	QUANTITY	0.99+
Trump	PERSON	0.99+
Oskar Austegard	PERSON	0.99+
June	DATE	0.99+
Oracle	ORGANIZATION	0.99+
Oskar	PERSON	0.99+
Google	ORGANIZATION	0.99+
NASA	ORGANIZATION	0.99+
Kubo	ORGANIZATION	0.99+
one	QUANTITY	0.99+
last year	DATE	0.99+
Hortonworks	ORGANIZATION	0.99+
four plus months	QUANTITY	0.99+
99 years	QUANTITY	0.99+
third one	QUANTITY	0.99+
Phoenix, Arizona	LOCATION	0.99+
a year ago	DATE	0.99+
Splice Machine	ORGANIZATION	0.98+
Both	QUANTITY	0.98+
Microsoft	ORGANIZATION	0.98+
Hadoop	TITLE	0.98+
both	QUANTITY	0.97+
Azure	ORGANIZATION	0.97+
Each	QUANTITY	0.96+
Monte Zweben	PERSON	0.96+
first	QUANTITY	0.94+
MapRs	ORGANIZATION	0.94+
earlier this morning	DATE	0.92+
Wigwam Resort	LOCATION	0.92+
two things	QUANTITY	0.92+
2017	DATE	0.92+
#DataPlatforms2017	EVENT	0.89+
Wikibon	ORGANIZATION	0.89+
second one	QUANTITY	0.89+
three trends	QUANTITY	0.89+
each business process	QUANTITY	0.87+
DataPlatforms	TITLE	0.86+
theCUBE	ORGANIZATION	0.85+
Cloudera	ORGANIZATION	0.85+
Hortonworks DataWorks Summit	EVENT	0.85+
Wigwam Resort	ORGANIZATION	0.85+
Kubo	PERSON	0.84+
Gannett	ORGANIZATION	0.82+
MapR	ORGANIZATION	0.8+
S3	TITLE	0.8+
many years ago	DATE	0.78+
DataPlatforms 2017	EVENT	0.74+
years	DATE	0.73+
YouTube.com/SiliconAngle	OTHER	0.72+
Clouderas	ORGANIZATION	0.7+
Cloud Native	TITLE	0.67+
Platforms	TITLE	0.67+
Google Cloud	TITLE	0.64+
Cloud Native Hadoop	TITLE	0.64+
last couple	DATE	0.64+
Azure	TITLE	0.61+

Day 1 Wrap - DataWorks Summit Europe 2017 - #DWS17 - #theCUBE

(Rhythm music) >> Narrator: Live, from Munich, Germany, it's The Cube. Coverage, DataWorks Summit Europe, 2017. Brought to you by Hortonworks. >> Okay, welcome back everyone. We are live in Munich, Germany for DataWorks 2017, formally known as Hadoop Summit. This is The Cube special coverage of the Big Data world. I'm John Furrier my co-host Dave Vallente. Two days of live coverage, day one wrapping up. Now, Dave, we're just kind of reviewing the scene here. First of all, Europe is a different vibe. But the game is still the same. It's about Big Data evolving from Hadoop to full open source penetration. Puppy's now public in markets Hortonworks, Cloudera is now filing an S-1, Neosoft, Talon, variety of the other public companies. Alteryx. Hadoop is not dead, it's not dying. It certainly is going to have a position in the industry, but the Big Data conversation is front and center. And one thing that's striking to me is that in Europe, more than in the North America, is IOT is more centrally themed in this event. Europe is on the Internet of Things because of the manufacturing, smart cities. So this is a lot of IOT happening here, and I think this is a big discovery certainly, Hortonworks event is much more of a community event than Strata Hadoop. Which is much more about making money and modernization. This show's got a lot more engagement with real conversations and developers sessions. Very engaging audience. Well, yeah, it's Europe. So you've go a little bit different smaller show than North America but to me, IOT, Internet of Things, is bringing the other cloud world with Big Data. That's the forcing function. And real time data is the center of the action. I think is going to be a continuing theme as we move forward. >> So, in 2010 John, it was all about 'What is Hadoop?' With the middle part of that decade was all about Hadoop's got to go into the enterprise. It's gone mainstream in to the enterprise, and now it's sort of 'what's next?' Same wine new bottle. But I will say this, Hadoop, as you pointed out, is not dead. And I liken it to the early web. Web one dot O it was profound. It was a new paradigm. The profundity of Hadoop was that you could ship five megabytes of code to a petabyte of data. And that was the new model and that's spawned, that's catalyzed the Big Data movement. That is with us now and it's entrenched, and now you're seeing layers of innovation on top of that. >> Yeah, and I would just reiterate and reinforce that point by saying that Cloudera, the founders of this industry if you will, with Hadoop the first company to be commercially funded to do what Hortonworks came in after the fact out of Yahoo, came out of a web-scale world. So you have the cloud native DevOps culture, Amar Ujala's at Yahoo, Mike Olson, Jeff Hammerbacher, Christopher Vercelli. These guys were hardcore large-scale data guys. Again, this is the continuation of the evolution, and I think nothing is changed it that regard because those pioneers have set the stage for now the commercialization and now the conversation around operationalizing this cloud is big. And having Alan Nance, a practitioner, rock-star, talking about radical deployments that can drop a billion dollars at a cost savings to the bottom line. This is the kind of conversations we're going to see more of this is going to change the game from, you know, "Hey, I'm the CFO buyer" or "CIO doing IT", to an operational CEO, chief operating officer level conversation. That operational model of cloud is now coming into the view what ERP did in software, those kinds of megatrends, this is happening right now. >> As we talk about the open, the people who are going to make the real money on Big Data are the practitioners, those people applying it. We talked about Alan Nance's example of billion dollar, half a billion dollar cost-savings revenue opportunities, that's where the money's being made. It's not being made, yet anyway with these public companies. You're seeing it Splunk, Tableau, now Cloudera, Hortonworks, MapR. Is MapR even here? >> Haven't seen 'em. >> No I haven't seen MapR, they used to have pretty prominent display at the show. >> You brought up point I want to get back to. This relates to those guys, which is, profitless prosperity. >> Yeah. >> A term used for open source. I think there's a trend happening and I can't put a finger on it but I can kind of feel it. That is the ecosystems of open source are now going to a dimension where they're not yet valued in the classic sense. Most people that build platforms value ecosystems, that's where developers came from. Developer ecosystems fuel open source. But if you look at enterprise, at transformations over the decades, you'd see the successful companies have ecosystems of channel partners; ecosystems of indirect sales if you will. We're seeing the formation, at least I can start seeing the formation of an indirect engine of value creation, vis-à-vis this organic developer community where the people are building businesses and companies. Shaun Connolly pointed to Thintech as an example. Where these startups became financial services businesses that became Thintech suppliers, the banks. They're not in the banking business per se, but they're becoming as important as banks 'cuz they're the providers in Thintech, Thintech being financial tech. So you're starting to see this ecosystem of not "channel partners", resell my equipment or software in the classic sense as we know them as they're called channel partners. But if this continues to develop, the thousand flower blooming strategy, you could argue that Hortonworks is undervalued as a company because they're not realizing those gains yet or those gains can't be measured. So if you're an MBA or an investment banker, you've got to be looking at the market saying, "wow, is there a net-present value to an ecosystem?" It begs the question Dave. >> Dave: It's a great question John. >> This is a wealth creation. A rising tide floats all boats, in that rising tide is a ecosystem value number there. No one has their hands on that, no one's talked about that. That is the upshot in my mind, the silver-lining to what some are saying is the consolidation of Hadoop. Some are saying Cloudera is going to get a huge haircut off their four point one billion dollar value. >> Dave: I think that's inevitable. >> Which is some say, they may lose two to three billion in value, in the IPO. Post IPO which would put them in line with Hortonworks based on the numbers. You know, is that good or bad? I don't think it's bad because the value shifts to the ecosystem. Both Cloudera and Hortonworks both play in open source so you can be glass half-full on one hand, on the haircut, upcoming for Cloudera, two saying "No, the glass is half-full because it's a haircut in the short-term maybe", if that happens. I mean some said Pure Storage was going to get a haircut, they never really did Dave. So, again, no one yet has pegged the valuation of an ecosystem. >> Well, and I think that is a great point, personally I think, I've been sort of racking my brain, will this Big Data hike be realized. Like the internet. You remember the internet hyped up, then it crashed; no one wanted to own any of these companies. But it actually lived up to the hype. It actually exceeded the hype. >> You can get pet food online now, it's called amazon. [Co-Hosts Chuckle Together] All the e-commerce played out. >> Right, e-commerce played out. But I think you're right. But everybody's expecting sort of, was expecting a similar type of cycle. "Oh, this will replace that." And that's now what's going to happen. What's going to happen is the ecosystem is going to create a flywheel effect, is really what you're saying. >> Jeff: Yes. >> And there will be huge valuations that emerge out of this. But today, the guys that we know and love, the Hortonworks, the Clouderas, et cetera, aren't really on the winners list, I mean some of their founders maybe are. But who are the winners? Maybe the customers because they saw a big drop in cost. Apache's a big winner here. Wouldn't ya say? >> Yeah. >> Apache's looking pretty good, Apache Foundation. I would say AWS is a pretty big winner. They're drifting off of this. How about Microsoft and IBM? I mean I feel in a way IBM is sort of co-opted this Big Data meme, and said, "okay, cognitive." And layered all of it's stuff on top of it. Bought the weather company, repositioned the company, now it hasn't translated in to growth, but certainly has profitability implications. >> IBM plays well here, I'll tell you why. They're very big in open source, so that's positive. Two, they have huge track record and staff dealing with professional services in the enterprise. So if transformation is the journey conversation, IBM's right there. You can't ignore IBM on this one. Now, the stack might be different, but again, beauty is in the eye of the beholder because depending on what work clothes you have it depends. IBM is not going to leave you high and dry 'cuz they have a really you need for what they can do with their customers. Where people are going to get blindsided in my opinion, the IBMs and Oracles of the world, and even Microsoft, is what Alan Nance was talking about, the radical transformation around the operating model is going to force people to figure out when to start cannibalizing their own stacks. That's going to be a tell sign for winners and losers in the big game. Because if IBM can shift quickly and co-op the megatrends, make it their own, get out in front of that next wave as Pat Gelsinger would say, they could surf that wave and then tweak, and then get out in front. If they don't get behind that next wave, they're driftwood. It really is all about where you are in the spectrum, and analytics is one of those things in data where, you've got to have a cohesive horizontal strategy. You got to be horizontally scalable with data. You got to make data freely available. You have to have an abstraction layer of software that will allow free movement of data, across systems. That's the number one thing that comes out of seeing the Hortonwork's data platform for instance. Shaun Connolly called it 'connective tissue'. Cloudera is the same thing, they have to start figuring out ways to be better at the data across the horizontal view. Cloudera like IBM has an opportunity as well, to get out in front of the next wave. I think you can see that with AI and machine learning, clearly they're going to go after that. >> Just to finish off on the winners and losers; I mean, the other winner is systems integrators to service these companies. But I like what you said about cannibalizing stacks as an indicator of what's happening. So let's talk about that. Oracle clearly cannibalizing it's stacks, saying, "okay, we're going to the red stack to the cloud, go." Microsoft has made that decision to do that. IBM? To a large degree is cannibalizing it's stack. HP sold off it's stack, said, "we don't want to cannibalize our stack, we want to sell and try to retool." >> So, your question, your point? >> So, haven't they already begun to do that, the big legacy companies? >> They're doing their tweaking the collet and mog, as an example. At Oracle Open World and IBM Interconnect, all the shows we, except for Amazon, 'cuz they're pure cloud. All are taking the unique differentiation approach to their own stuff. IBM is putting stuff that's relate to IBM in their cloud. Oracle differentiates on their stack, for instance, I have no problem with Oracle because they have a huge database business. And, you're high as a kite if you think Oracle's going to lose that database business when data is the number one asset in the world. What Oracle's doing which I think is quite brilliant on Oracle's part is saying, "hey, if you want to run on premise with hardware, we got Sun, and oh by the way, our database is the fastest on our stuff." Check. Win. "Oh you want to move to the cloud? Come to the Oracle cloud, our database runs the fastest in our cloud", which is their stuff in the cloud. So if you're an Oracle customer you just can't lose there. So they created an inimitability around their own database. So does that mean they're going to win the new database war? Maybe not, but they can coexist as a system of records so that's a win. Microsoft Office 365, tightly coupling that with Azure is a brilliant move. Why wouldn't they do that? They're going to migrate their customer base to their own clouds. Oracle and Microsoft are going to migrate their customers to their own cloud. Differentiate and give their customers a gateway to the cloud. VVMware is partnering with Amazon. Brilliant move and they just sold vCloud Air which we reported at Silicon Angle last night, to a French company recently so vCloud Air is gone. Now that puts the VMware clearly in bed with Amazon web services. Great move for VMware, benefit to AWS, that's a differentiation for VMware. >> Dave: Somebody bought vCloud Air? >> I think you missed that last night 'cuz you were traveling. >> Chuckling: That's tongue-in-cheek, I mean what did they get for vCloud Air? >> OVH bought them, French company. >> More de-levering by Michael. >> Well, they're inter-clouding right? I mean de-leveraging the focus, right? So OVH, French company, has a very much coexisted... >> What'd they pay? >> ... strategy. It's undisclosed. >> Yeah, well why? 'Cuz it wasn't a big number. That's my point. >> Back to the other cloud players, Google. I think Google's differentiating on their technology. Great move, smart move. They just got to get, as someone who's been following them, and you know, you and I both love an enterprise experience. They got to speak the enterprise language and execute the language. Not through 19 year olds and interns or recent smart college grads ad and say, "we're instantly enterprise." There's a dis-economies of scale for trying to ramp up and trying to be too heavy on the enterprise. Amazon's got the same problem, you can't hire sales guy fast enough, and oh by the way, find me a sales guy that has ten 15 years executive selling experience to a complex strategic sales, like the enterprise where you now have stakeholders that are in multiple roles and changing roles as Alan Nance pointed out. So the enterprise game is very difficult. >> Yup. >> Very very difficult. >> Well, I think these dupe startups are seeing that. None of them are making money. Shaun Connolly basically said, "hey, it used to be growth they would pay for growth, but now their punishing you if you don't have growth plus profitability." By the way, that's not all totally true. Amazon makes no money, unless stock prices go through the roof. >> There is no self-service, there is no self-service business model for digital transformation for enterprise customers today. It doesn't exist. The value proposition doesn't resinate with customers. It works good for Shadow IT, and if you want to roll out G Suite in some pockets of your organization, but an ad-sense sales force doesn't work in the enterprise. Everyone's finding that out right now because they're basically transforming their enterprise. >> I think Google's going to solve their problem. I think Google has to solve their problem 'cuz... >> I think they will, but to me it's, buy a company, there's a zillion company out there they could buy tomorrow that are private, that have like 300 sales people that are senior people. Pay the bucks, buy a sales force, roll your stuff out and start speaking the language. I think Dianne Green gets this. So, I think, I expect to see Google ... >> Dave: Totally. >> do some things in that area. >> And I think, to you're point, I've always said the rich get richer. The traditional legacy companies, they're holding servant in this. They waited they waited they waited, and they said, "okay now we're going to go put our chips on the table." Oracle made it's bets. IBM made it's bets. HP, not really, betting on hardware. Okay. Fine. Cisco, Microsoft, they're all making their bets. >> It's all about bets on technology and profitability. This is what I'm looking at right now Dave. We talked about it on our intro. Shaun Connolly who's in charge of strategy at Hortonworks clarified it that clearly revenue, losing money is not going to solve the problem for credibility. Profitability matters. This comes back to the point we've said on The Cube multiple years ago and even just as recently as last year, that the world's flipping back down to credibility. Customers in the enterprise want to see credibility and track record. And they're going to evaluate the suppliers based upon key fundamentals in their business. Can they make money? Can they deliver SLAs? These are going to be key requirements, not the shiny new toy from Silicon Valley. Or the cool machine learning algorithm. It has to apply to their product, their value, and they're going to look to companies on the scoreboard and say, "are you profitable?" As a proxy for relevance. >> Well I want to keep it, but I do want to, we've been kind of critical of some of the Hadoop players. Cloudera and Hortonworks specifically. But I want to give them props 'cuz you remember well John, when the legacy enterprise guys started coming into the Hadoop market they all said that they had the same messaging, "we're going to make Hadoop enterprise ready." You remember that well, and I have to say that Hortonworks, Cloudera, I would say MapR as well and the ecosystem, have done a pretty good job of making Hadoop and Big Data enterprise ready. They were already working on it very hard, I think they took it seriously and I think that that's why they are in the mix and they are growing as they are. Shaun Connolly talked about them being operating cashflow positive. Eking out some plus cash. On the next earnings call, pressures on. But we want to see, you know, rocket ships. >> I think they've done a good job, I mean, I don't think anyone's been asleep at the switch. At all, enterprise ready. The questions always been "can they get there fast enough?" I think everyone's recognized that cost of ownership's down. We still solicit on the OpenStack ecosystem, and that they move right from the valley properties. So we'll keep an eye on it, tomorrow we'll be checking in. We got a great day tomorrow. Live coverage here in Munich, Germany for DataWorks 2017. More coverage tomorrow, stay with us. I'm John Furrier with Dave Vallente. Be right back with more tomorrow, day two. Keep following us.

Published Date : Apr 6 2017

SUMMARY :

Brought to you by Hortonworks. Europe is on the Internet of Things And I liken it to the early web. the founders of this industry if you will, on Big Data are the practitioners, prominent display at the show. This relates to those guys, which is, That is the ecosystems of open source the silver-lining to what some are saying on one hand, on the haircut, You remember the internet hyped up, All the e-commerce played out. the ecosystem is going to the Hortonworks, the Clouderas, et cetera, Bought the weather company, IBM is not going to leave you high and dry the red stack to the cloud, go." Now that puts the VMware clearly in bed I think you missed that last night I mean de-leveraging the focus, right? It's undisclosed. 'Cuz it wasn't a big number. like the enterprise where you now have By the way, that's not all totally true. and if you want to roll out G Suite I think Google has to start speaking the language. And I think, to you're point, that the world's flipping of some of the Hadoop players. We still solicit on the

ENTITIES

Entity	Category	Confidence
Dave Vallente	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Cisco	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Michael	PERSON	0.99+
Dianne Green	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Shaun Connolly	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
Jeff Hammerbacher	PERSON	0.99+
Alan Nance	PERSON	0.99+
Europe	LOCATION	0.99+
two	QUANTITY	0.99+
Pat Gelsinger	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Jeff	PERSON	0.99+
Apache	ORGANIZATION	0.99+
John	PERSON	0.99+
Yahoo	ORGANIZATION	0.99+
tomorrow	DATE	0.99+
Christopher Vercelli	PERSON	0.99+
Google	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
Thintech	ORGANIZATION	0.99+
HP	ORGANIZATION	0.99+
billion dollar	QUANTITY	0.99+
VVMware	ORGANIZATION	0.99+
three billion	QUANTITY	0.99+
last year	DATE	0.99+
Silicon Valley	LOCATION	0.99+
Sun	ORGANIZATION	0.99+
Mike Olson	PERSON	0.99+
Two days	QUANTITY	0.99+
North America	LOCATION	0.99+
2010	DATE	0.99+
Neosoft	ORGANIZATION	0.99+
Talon	ORGANIZATION	0.99+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for Clouderas: