Data Power Panel V3

(upbeat music) >> The stampede to cloud and massive VC investments have led to the emergence of a new generation of object store based data lakes. And with them, two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses, aka the lakehouse, has emerged as a leading contender to be the data platform of the future. And this novelty touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The other major trend we've seen is query engines and broader data fabric virtualization platforms have embraced NextGen data lakes as platforms for SQL centric business intelligence workloads, reducing, or some even claim eliminating, the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And the third is that many, if not most, customers that are embracing the so-called data fabric or data mesh architectures are looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in lakehouse. But at the same time, they don't want to, or can't, abandon their data warehouse estate. As such, we see a battle royale brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud-centric analytical data platform? Well, we're going to find out. My name is Dave Vellante and welcome to the Data Platforms Power Panel on theCUBE, our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options, and the trade-offs of various approaches, and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, Tony Baer, principal at dbInsight, and Doug Henschen, the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thanks. Thank you. >> Thank you. >> So it's early June and we're gearing up with two major conferences. There are several database conferences, but two in particular that we're very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug, let's start off with you, and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug? The notion of lakehouse. And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well, you nailed it in your intro. One platform to address BI, data science, data engineering, fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed, by the middle of that last decade, there were several SQL on Hadoop products, open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and data science, so the likes of Hudi and Vertica were adding SQL extensions to support data science. But then later in that decade, with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So you have, in the database camp, Snowflake introducing Snowpark to try to address the data science needs.
They introduced that in 2020 and last year they announced support for Python. You also had Oracle and SAP jump on this lakehouse idea last year, supporting both the lake and warehouse from a single vendor, not necessarily quite a single platform. Google very recently also jumped on the bandwagon. And then you also mentioned the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things: a fabric for distributed access to many data sources, but also very firmly planting that idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp, with the Databricks and Clouderas providing warehouse-style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss, those of you who know me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television, in that it has been an evolution, just like Doug said. I mean, for instance, just to give an example, when Teradata bought Aster Data, it was initially seen as a hardware platform play. In the end, it was basically all those Aster functions that made a lot of sort of big data analytics accessible to SQL. (clears throat) And so what I really see, just in a simpler or functional definition, the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory to the SQL folks, and also to get into friendlier territory with all the data stewards, who are basically concerned about the sprawl and the lack of control and governance in the data lake. So it's really kind of a continuation of an ongoing trend. That being said, there's no action without counteraction. And of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight. Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years, which we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage, or I should say HDFS, and then with the cloud, going cloud native and therefore trying to break the silos down even further. >> Tony, thank you. And Sanjeev, data lakes, when we first heard about them, it was such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say these are excellent points that Doug and Tony have brought to light. The concept of lakehouse was going on, to your point, Dave, a long time ago, long before the term was invented. For example, Uber was trying to do a mix of Hadoop and Vertica because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call lakehouse. Data lakes are excellent at batch processing large volumes of data, but they don't have real-time capabilities such as change data capture, doing inserts and updates. So this is why lakehouse has become so important, because they give us these transactional capabilities.
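To make Sanjeev's point about transactional capabilities concrete, here is a minimal sketch of the kind of insert-or-update (upsert) that a lakehouse table format enables on plain object storage, using the open source Delta Lake Python API on Spark. It is an illustration under stated assumptions, not anyone's production pipeline; the paths, column names, and CDC source are hypothetical.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = (
    SparkSession.builder.appName("lakehouse-upsert")
    # Delta Lake needs its extension and catalog registered on the session.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# An incoming change-data-capture batch (hypothetical path and schema: id, email, updated_at).
updates = spark.read.json("s3://my-bucket/cdc/customers/")

# The existing lakehouse table on object storage (hypothetical path).
target = DeltaTable.forPath(spark, "s3://my-bucket/lake/customers")

# MERGE gives the lake the ACID insert/update semantics a plain file dump lacks.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```

The same MERGE-style operation is what Hudi and Iceberg provide in their own APIs; the point is simply that the table format, not the storage, supplies the transactional behavior.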
The concept is powerful, but I get concerned that there's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouses each had to extend themselves. To believe the Databricks hype, this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part of the data warehouse folks, in that they've had to go beyond SQL. In the case of Databricks, there have been a number of incremental improvements to Delta Lake, to basically make the table format more performant, for instance. But the most dramatic change in all that is in their SQL engine, and they had to essentially pretty much abandon Spark SQL, because in and of itself, Spark SQL is essentially a stopgap solution. And if they wanted to really address that crowd, they had to totally reinvent SQL, or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark, it's basically SQL that's been adapted to run in a Spark environment, but the underlying engine is C++, it's not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature, because even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this, because if you think through cloud native architecture, where you're essentially abstracting compute from data, there is no reason why, if, let's say, you're dealing with the same data targets, say cloud object storage, you might not apportion the task to different compute engines. And so therefore you could have, for instance, let's say you're Google, you could have BigQuery perform the types of SQL analytics that would be associated with the data warehouse, and you could have BigQuery ML do some in-database machine learning, but at the same time, for another part of the query, which might involve, let's say, some deep learning, just for example, you might go out to, let's say, the serverless Spark service or Dataproc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example; you could generalize that to all the other cloud and third-party vendors. So I think we're still very early in the game in terms of maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but I completely agree, it's not mature yet. Lakehouses still have a lot of work to do, so what I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data.
They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse, and so on, because they want the platform to handle all the data modeling, access control, performance enhancements. But there are trade-offs. If you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want independence. In other words, they don't want vendor lock-in. They want to transform their data into any number of use cases, especially data science and machine learning use cases. What they want is agility via open file formats using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses provide you an excellent user experience. That is the main reason why Snowflake took off. If you have thousands of tables, it takes minutes to get them uploaded into your warehouse and start experimenting. Table formats are resonating far more with the community than file formats. But once the cost of the cloud data warehouse goes up, then organizations start exploring lakehouses. But the problem is lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today, Apache Hive is still very strong, but it's all technical metadata and it has so many different restrictions. That's why we see Databricks investing in something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem I just want to mention, and that is lack of standards. All these open source vendors, they're running what I call ego projects. You see on LinkedIn, they're constantly battling with each other, but the end user doesn't care. The end user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, DaaS, Flink, Athena. But the problem is that we don't have common standards. >> Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space, we've debated for years best of breed versus the full suite. You see AWS with whatever, 12-plus different data stores and different APIs and primitives. You've got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there are proof points there, but Snowflake, really good at data warehouse, simplifying data warehouse. Databricks, really good at making lakehouses actually more functional. Can one platform do it all? >> Well, in a word, no. You can't be best of breed at all things. I think the upshot of the cogent analysis from Sanjeev there is that the vendors coming out of the database tradition excel at the SQL. They're extending it into data science, but when it comes to unstructured data, data science, ML and AI, it's often a compromise. The data lake crowd, the Databricks and such, they've struggled to completely displace the data warehouse when it really gets to the tough SLAs; they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads, and maybe some of these SQL engines, good for ad hoc, minimize data movement. But really, when you get to the deep service level agreement requirements, the high concurrency, the high query workloads, you end up creating something that's warehouse-like.
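As an aside, Sanjeev's "agility via open file formats using any compute engine" point is easy to demonstrate. Below is a minimal sketch in which one engine (PyArrow) writes Parquet and an entirely different engine (DuckDB) queries the same file, with no shared vendor platform in between; the file name and columns are made up for illustration.

```python
import pyarrow as pa
import pyarrow.parquet as pq
import duckdb

# Write a tiny dataset to the open Parquet format with PyArrow.
table = pa.table({"region": ["emea", "amer", "amer"], "revenue": [120, 340, 95]})
pq.write_table(table, "sales.parquet")

# Query the very same file with DuckDB: a different engine, same open format.
result = duckdb.query(
    "SELECT region, SUM(revenue) AS total FROM 'sales.parquet' GROUP BY region"
).fetchall()
print(result)
```

Trino, Spark, Athena, and the rest read the same bytes; what is still missing, per the discussion, is a common standard for the metadata and table-format layer above those files.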
Do you have any thoughts on that? >> So thank you, Dave. I feel that table formats are starting to mature. There is a lot of work that's being done. We will not have a single product or single platform; we'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating; their focus is on the table format. But then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files, or how do you do your upserts, basically? So different focus. At the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug, is Iceberg, in your view, something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta Lake, Hudi, Iceberg, they all address this need for consistency and scalability. Delta Lake is technically open, but in terms of access, I don't hear about Delta Lake anywhere but Databricks, while I'm hearing a lot of buzz about Apache Iceberg. End users want an open performance standard. And most recently, Google embraced Iceberg for its recent BigLake, their stab at supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the sort of big data movement. You had MapR as the most closed, you had Hortonworks as the most open, you had Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, and Doug very much was kind of referring to this. I would call it kind of like the MongoDB syndrome, which is that you have, and I'm talking about MongoDB before they changed their license, an open source project, but very much associated with MongoDB, which pretty much controlled most of the contributions and made the decisions. And I think Databricks has the same ironclad hold on Delta Lake; the market pretty much associates Delta Lake as the Databricks open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see that breaking down to is essentially the Databricks open source versus the everything-else open source, the community open source. So I see a very similar type of breakdown repeating itself here. >> So by the way, Mongo has a conference next week, another data platform that's not totally relevant to this discussion. But in a sense it is, because there's been a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed. Obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that we saw come out of the Snowflake earnings call, where the Evercore analyst asked Frank Slootman about discretionary spend. And Frank basically said, look, we're not discretionary, we are deeply operationalized. Whereas he kind of poo-pooed the lakehouse, or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them, that's really not our business. Do any of you have comments on that? Help us squint through that controversy. Who wants to take that one? >> Let's put it this way.
The SQL folks are from Venus and the data scientists are from Mars. So it really comes down to that sort of perception. The fact is that, traditionally with analytics, it was very SQL oriented, and the quants were kind of off in their corner, where they were using SAS or where they were using Teradata. It's really a great leveler today, which is that Python has become arguably one of the most popular programming languages, depending on what month you're looking at the TIOBE index. And of course, obviously SQL is, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. And so basically I see this breaking down to, essentially, you're going to have each group that's going to have its own natural preferences for its home turf. And the fact that, let's say, the Python and Scala folks are using Databricks does not make them any less operational or mission critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent, with all of Snowpark, all of the things outside of SQL; they're very much relying on partners to make things possible, to make data science possible. And it's very early days. I think the bottom line, what we're going to see is each of these camps is going to keep working on doing better at the thing that they don't do today, or that they're new to, but they're not going to nail it. They're not going to be best of breed on both sides. So the SQL centric companies and shops are going to do more data science on their database centric platforms. The data science driven companies might be doing more BI on their lakes with those vendors. And the companies that have highly distributed data, they're going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, but I'll ask you, Sanjeev, 'cause Snowflake and Databricks are such great examples, 'cause you have the data engineering crowd trying to go into data warehousing and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion on the balance sheet, and I've asked you before, I'll ask you again, doesn't there have to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy an AtScale or a Datameer? Or is that just sort of a bandaid? What are your thoughts on that, Sanjeev? >> I think the semantic layer is the metadata. The business metadata is extremely important. At the end of the day, the business folks, they'd rather go to the business metadata than have to figure out, for example, let's say I want to update somebody's email address, and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer, it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool, and make them fungible? So more disaggregation of the stack, but it gives us more best of breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will.
And we always talk about injecting machine intelligence, AI, into applications, making them more data driven. But when you look at the application development stack, it's separate; the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> So organizationally, and even technically, I think it is starting to happen. Microservices architecture was a first attempt to bring the application and the data worlds together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self-heal and it'll bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. So that's why they have traditionally been two different stacks. They are starting to come together, especially with DataOps, for instance, versioning the way we write business logic. It used to be that business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps tool chain. So data is catching up to the way applications are. >> We also have the translytical databases; that's a little bit of what the story is with MongoDB next week, with adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not warehouse level workloads. So we're making progress, but I think there's always going to be, or there will long be, a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, take your best of breed, to Doug's earlier point. You can't be best of breed at everything. Wouldn't data mesh advocate, data lakes, do your data lake thing, data warehouse, do your data warehouse thing, then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. I think for me, the fear there is more in terms of the silos of governance that could happen and the siloed views of the world, how we redefine them. And that's why I want to go back to something Sanjeev said, which is that it's going to raise the importance of the semantic layer. Now, that opens a couple of Pandora's boxes here, which is, one, does Snowflake dare go into that space, or do they risk alienating their partner ecosystem, which is a key part of their whole appeal, which is best of breed? They're kind of in the same situation that Informatica was in, in the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea, and needed to double down on their core, which was data integration. The other thing, though, that raises the importance of, and this is where the best of breed comes in, is the data fabric. My contention is, whether you employ data mesh practice or not, if you do employ data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh.
But data fabric at its core, and admittedly it's a category that's still very poorly defined and evolving, at its core we're talking about a common metadata backplane, something that we used to talk about with master data management. This would be something that would be more, what I would say, mutable, that would be more evolving, basically using, let's say, machine learning so that we don't have to predefine rules or predefine what the world looks like. So I think in the long run, what this really means is that whichever way we implement, on whichever physical platform we implement, we need to all be speaking the same metadata language. And I think at the end of the day, regardless of whether it's a lake, warehouse, or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? Those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort into that, and you may or may not be familiar with it, but a lot of folks have talked about it. Now, it's got nowhere in the market, no market share, but we've seen a lot of these benchmarks from Oracle. How real is that bringing together of those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it and I'm not going to mock him. >> I wonder if that is something, how real that is, or if it's just Oracle marketing. Anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB. And I think, as Doug and Sanjeev were kind of referring to before, it's basically the operational analytics that are embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to churn? I think we're going to be seeing a lot of that. And I think that's what a lot of MySQL HeatWave is all about. Whether Oracle has any presence in the market, now, it's still a pretty new announcement, but the other thing that kind of goes against Oracle, (laughs) that they have to battle against, is that even though they own MySQL and run the open source project, in terms of the actual commercial implementations, it's associated with everybody else. And the popular perception has been that MySQL has been basically kind of a sidelight for Oracle. And so it's on Oracle's shoulders to prove that they're damn serious about it. >> It's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption. Obviously Wall Street's very concerned about it. Snowflake dropped prices last week.
I've always felt like, hey, the consumption model is the right model. I can dial it down when I need to; of course, the Street freaks out. What are your thoughts on just pricing, the consumption model? What's the right model for companies, for customers? >> The consumption model is here to stay. What I would like to see, and I think it's an ideal situation, and it actually plays into the lakehouse concept, is that I have my data in some open format, maybe it's Parquet or CSV or JSON, Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be that one business unit wants to use Athena, another business unit wants to use something else, Trino let's say, or Dremio. So every business unit is working on the same data set, see, that's critical. But that data set is maybe in their VPC, and they bring any compute engine, you pay for the use, shut it down. Then you're getting value and you're only paying for consumption. It's not like, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill. >> This looks like it's been a sort of victim of its own success in some ways. They made it so easy to spin up single node instances, multi node instances. And back in the day, when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, dashboards around the utilization of all the nodes and multi node instances that have been spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud trying to offer that freedom. And I think they're going to have that same discovery, that the cost surprises are going to follow as they make it easy to spin up new instances. >> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera, Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead-ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say, the problem in our space is that there are way too many companies, there's way too much noise. We are expecting the end users to parse it out, or we're expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around, and Tony, I know you talk to them quite frequently. I think they've had quite a comprehensive offering for a long time, actually. They created Kudu, so they've got operational, they have Hadoop, they have an operational data warehouse, they've migrated to the cloud. They are in a hybrid multi-cloud environment. A lot of cloud data warehouses are not hybrid; they're only in the cloud. >> Right.
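A quick sketch of the consumption pattern Sanjeev describes above, where the data sits in your own bucket in an open format, you bring an engine, pay per query, and "shut it down" simply by not querying. This one assumes AWS Athena via boto3; the bucket, database, and table names are hypothetical placeholders.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Kick off a query against Parquet data cataloged in a shared database (names hypothetical).
resp = athena.start_query_execution(
    QueryString="SELECT business_unit, COUNT(*) FROM events GROUP BY business_unit",
    QueryExecutionContext={"Database": "shared_lake"},
    ResultConfiguration={"OutputLocation": "s3://my-results-bucket/athena/"},
)
qid = resp["QueryExecutionId"]

# Poll until the query finishes; billing is per query on data scanned,
# which is exactly where the FinOps guardrails Sanjeev mentions come in.
while True:
    status = athena.get_query_execution(QueryExecutionId=qid)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

print(state)
```

Another business unit could point Trino or Spark at the same files; nothing in the data itself ties it to one engine.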
I think where Cloudera has been most successful has been in the transition to the cloud and the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps. So I give them a lot of credit there. They also have been trying to position themselves as being the most price friendly, in terms of, we will put more guardrails and governors on it. I mean, part of that could be spin, but on the other hand, they don't have the same vested interest in compute cycles as, say, AWS would have with EMR. That being said, I think Cloudera's most powerful appeal, and it almost sounds in a way, I don't want to cast them as a legacy system, but the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, it certainly has its strengths and weaknesses. And the fact is that, yes, Cloudera has an operational database, or an operational data store, kind of like the outgrowth of HBase, but Cloudera is still primarily known for the deep analytics. Nobody's going to buy Cloudera or Cloudera Data Platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used, let's say, Teradata to do some machine learning, or, let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is they obviously have their strengths and their weaknesses. I think their greatest opportunity is with their existing base, because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course, being on the quarterly shot clock was not a good place to be under the microscope for Cloudera, and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month do a deal with Dell whereby non-native Snowflake data could access on-prem object stores from Dell. They announced a similar thing with Pure Storage. What do you guys make of that? How significant will that be? Will customers actually do that? I think they're using either materialized views or external tables. >> There are data-related and residency requirements. There are desires to have these platforms in your own data center. And finally they capitulated. I mean, Frank Slootman is famous for saying to be very focused, and earlier, not many months ago, they called going on-prem a distraction. But clearly there's enough demand, and certainly government contracts, any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought this point, how Frank Slootman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement, because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is, we're not going to bring the Snowflake architecture to run on-prem, because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way.
But I think it still preserves his original intent, sort of, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'll be like, oh, we are all in, we've always been doing this. We have always supported this, and now we are doing it better than others. >> Look, it was the same type of shock wave that we felt when AWS, at the last moment at one of their re:Invents, said, oh, by the way, we're going to introduce Outposts. And the analyst group is typically pre-briefed about a week or two ahead under NDA, and that was not part of it. And when they just casually dropped that in the analyst session, it's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that. And props to Andy Jassy, who once, many times actually, told us, never say never when it comes to AWS. So guys, I know we've got to run, we've got some hard stops. Maybe you could each give us your final thoughts. Doug, start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space, 14 vendors, eight of them claiming to support lakehouse, both sides of the camp. The top Databricks customer that they could cite was unnamed; it had 32 concurrent users doing 15,000 queries per hour. That's good, but it's not up to the most demanding BI SQL workloads. And they acknowledged that and said they need to keep working on that. Snowflake, asked for their biggest data science customer, cited Kabura: 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering jobs to be probably SQL centric, ETL-style transformation work. So I want to see the real use of the Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually, of all things, and certainly I'll also be looking for similar things to what Doug is saying, but, kind of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause I mean, they're into this conquer-the-world strategy, we can be all things to all people. Okay, if that's the case, what's going to be the case with putting in some inline analytics? What are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev. >> So I'll be at MongoDB World, Snowflake, and Databricks, and very interested in seeing, but since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing both the HTAP use cases, online transactional and analytical. I'm also seeing that these databases started, let's say in the case of MySQL HeatWave, as relational, or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial, and they just keep adding more and more data structures, really making these databases multifunctional. So very interesting.
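For readers wondering what the Snowpark-for-Python usage Doug says he'll be watching for looks like in practice, here is a minimal, hedged sketch: a dataframe-style transformation that compiles down to SQL and runs inside Snowflake's engine rather than pulling files out. The connection parameters and table name are placeholders, not a real account.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, avg

# Placeholder credentials; in practice these come from a secrets manager.
session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
}).create()

# The whole pipeline below is pushed down and executed in Snowflake.
scores = (
    session.table("CUSTOMER_FEATURES")              # hypothetical table
    .filter(col("ACTIVE") == True)
    .group_by("SEGMENT")
    .agg(avg("LIFETIME_VALUE").alias("AVG_LTV"))
)
scores.show()
```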
And it's likely Mongo's path, or part of their strategy of course, is through developers. They're very developer focused. So we'll be looking for that. And guys, I'll be there as well. I'm hoping that we maybe have some extra time on theCUBE, so please stop by and we can maybe chat a little bit. Guys, as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right, and thank you for watching. This is Dave Vellante for theCUBE and the excellent analysts. We'll see you next time. (upbeat music)

Published Date : Jun 2 2022


COMMUNICATIONS Accelerating Network


 

(upbeat music) >> Hi, today I'm going to talk about network analytics and what that means for telecommunications as we go forward, thinking about 5G, the impact that's likely to have on network analytics and the data requirement, not just to run the network and to understand the network a little bit better, but also to inform the rest of the operation of the telecommunications business. So as we think about where we are in terms of network analytics, over the last 20 years the telecommunications industry has evolved its management infrastructure to abstract away from some of the specific technologies in the network. So what do we mean by that? Well, when the initial telecommunications networks were designed, there were management systems that were built in. Eventually, fault management systems, assurance systems, provisioning systems, and so on were abstracted away. So it didn't matter what network technology you had, whether it was Nokia technology or Ericsson technology or Huawei technology or whoever it happened to be, you could just look at your fault management system and understand where faults were happening. As we got into the last sort of 10, 15 years or so, telecommunications service providers became more sophisticated in terms of their approach to data analytics, and specifically network analytics, and started asking questions about why and what if in relation to their network performance and network behavior. And so network analytics as a sort of independent function was born, and over time more and more data began to get loaded into the network analytics function. So today, just about every carrier in the world has a network analytics function that deals with vast quantities of data in big data environments that are now being migrated to the cloud, as all telecommunications carriers are migrating as many IT workloads as possible to the cloud. So what are the things that are happening as we migrate to the cloud that drive enhancements in use cases and enhancements in scale in telecommunications network analytics? Well, 5G is the big thing, right? So 5G, it's not just another G in that sense. I mean, in some senses it is: 5G means greater bandwidth and lower latency and all those good things. So, you know, we can watch YouTube videos with less interference and less sluggish bandwidth and so on and so forth. But 5G is really about the enterprise and enterprise services transformation. 5G is a more secure kind of network, but 5G is also a more pervasive network. 5G has a fundamentally different network topology than previous generations. So there are going to be more masts, and that means that you can have more pervasive connectivity. So things like IoT and edge applications, autonomous cars, smart cities, these kinds of things are all much better served because you've got more masts. That of course means that you're going to have a lot more data as well, and we'll get to that. The second piece is immersive digital services. So with more masts, with more connectivity, with lower latency, with higher bandwidth, the potential is immense for services innovation. And we don't know what those services are going to be. We know that technologies like augmented reality, virtual reality, things like this have great potential, but we have yet to see where those commercial applications are going to be. But the innovation potential for 5G is phenomenal.
It certainly means that we're going to have a lot more edge devices, and that again is going to lead to an increase in the amount of data that we have available. And then there's the idea of pervasive connectivity. When it comes to smart cities, autonomous cars, integrated traffic management systems, all of this kind of stuff, those kinds of smart environments thrive where you've got this pervasive connectivity, this persistent connection to the network. Again, that's going to drive more innovation, and again, because you've got these new connected devices, you're going to get even more data. So this exponential rise in data is really what's driving the change in network analytics. And there are four major vectors that are driving this increase in data, in terms of both volume and speed. So the first thing is more physical elements. We said already that 5G networks are going to have a different topology. 5G networks will have more devices, more masts. And so with more physical elements in the network, you're going to get more physical data coming off those physical networks. And so that needs to be aggregated and collected and managed and stored and analyzed and understood, so that we can have a better understanding as to why things happen the way they do, why the network behaves in the ways that it does, and why devices that are connected to the network, and ultimately of course consumers, whether they be enterprises or retail customers, behave the way they do in relation to their interaction with the network. The second is edge nodes and devices. We're going to have an explosion in terms of the number of devices. We've already seen IoT devices, with your different kinds of trackers and sensors that are hanging off the edge of the network, whether it's to make buildings smarter or cars smarter or people smarter in terms of having the measurements and the connectivity and all that sort of stuff. So the numbers of devices on the edge and beyond the edge are going to be phenomenal. One of the things that we've been trying to wrestle with as an industry over the last few years is, where does a telco network end and where does the enterprise, or even the consumer, network begin? It used to be very clear that, you know, the telco network ended at the router, but now it's not that clear anymore, because in the enterprise space, particularly with virtualized networking, which we're going to talk about in a second, you start to see end-to-end network services being deployed. And those services in some instances are being managed by the service provider themselves, and in some cases by the enterprise client. Again, the line between where the telco network ends and where the enterprise or the consumer network begins is not clear. So the proliferation of devices at the edge, in terms of, you know, what those devices are, what the data yield is, and what the policies are that need to govern those devices in terms of security and privacy and things like that, that's all going to be really, really important. The third is virtualized services; we just touched on that briefly. One of the big, big trends that's happening right now is not just the shift of IT operations onto the cloud, but the shift of the network onto the cloud, the virtualization of network infrastructure. And that has two major impacts.
First of all, it means that you've got the agility and all of the scale benefits that you get from migrating workloads to the cloud, the elasticity and the growth and all that sort of stuff. But arguably more importantly for the telco, it means that with a virtualized network infrastructure, you can offer entire networks to enterprise clients. So, you know, selling to a government department, for example, who's looking to stand up a system for, say, export certification, something like that, you can not just sell them the connectivity, you can sell them the networking and the infrastructure in order to serve that entire end-to-end application. You could offer them, in theory, an entire end-to-end communications network. And with 5G network slicing, they can even have their own little piece of the 5G bandwidth that's been allocated to a given carrier, and have a complete end-to-end environment. So the kinds of services that can be offered by telcos, given virtualized network infrastructure, are many and varied, and it's an outstanding opportunity. But what it also means is that the number of network elements, virtualized in this case, is also exploding. And that means the amount of data that we're getting, informing us as to how those network elements are behaving and how they're performing, is going to go up as well. And then finally, AI complexity. So on the demand side, while historically network analytics big data has been driven by returns in terms of data monetization, whether that's through cost avoidance or service assurance, or even revenue generation, AI is transforming telecommunications and every other industry. The potential for autonomous operations is extremely attractive. And so understanding how the end-to-end telecommunications service delivery infrastructure works is essential as a training ground for AI models that can help to automate a huge amount of telecommunications operating processes. So the AI demand for data is just going through the roof. And so all of these things combine to mean that big data is getting explosive; it is absolutely going through the roof. So as telecommunications companies around the world look at their network analytics infrastructure, which was initially designed primarily for service assurance, and how they migrate that to the cloud, these things are impacting those decisions, because you're not just looking at migrating a workload to operate in the cloud that used to work in the data center. Now you're looking at migrating a workload but also expanding the use cases in that workload. And bear in mind, many of those workloads are going to need to remain on-prem, so they'll need to be within a private cloud, or at best a hybrid cloud environment, in order to satisfy regulatory jurisdiction requirements. So let's talk about an example. LG Uplus is a fantastic service provider in Korea; there's been huge growth in that business over the last 10, 15 years or so. And obviously most people would be familiar with LG, the electronics brand, maybe less so with LG Uplus, but they've been doing phenomenal work and were the first business in the world to launch commercial 5G, in 2019. A huge milestone that they achieved. And at the same time, they deployed the Network Real-time Analytics Platform, or NRAP, from a combination of Cloudera and our partner Caremark.
Now, there were a number of things that were driving the requirement for the analytics platform at the time. Clearly the 5G launch was the big thing that they had in mind, but there were other things at play as well. So within the 5G launch, they were looking for visibility of services, and service assurance and service quality. So, you know, what services have been launched? How are they being taken up? What are the issues that are arising? Where are the faults happening? Where are the problems? Because clearly, when you launch a new service like that, you want to understand and be on top of the issues as they arise. So that was really, really important. A second piece was, and you know, this is not a new story to any telco in the world, right, there are silos in operation, and so taking advantage of, or eliminating, redundancies through the process of digital transformation was really important. In particular, the two silos, the wired and the wireless sides of the business, needed to come together so that there would be an integrated network management system for LG Uplus as they rolled out 5G. So eliminating redundancy and driving cost savings through the integration of the silos was really, really important. And that's a process and people thing every bit as much as it is a systems and data thing, so another big driver. And the fourth one, you know, we've talked a little bit about some of these things, right? 5G brings huge opportunity for enterprise services innovation. So Industry 4.0, digital experience, these kinds of use cases were very important in the South Korean market and in the business of LG Uplus. And so looking at AI and how you can apply AI to network management; again, there are a number of really, really exciting use cases that have gone live now in LG Uplus since we did this initial deployment, and they're making fantastic strides there. Big data analytics for users across LG Uplus, right? So it's not just for the immediate application of 5G or the support of the 5G network, but also for other data analysts and data scientists across the LG Uplus business. Network analytics, while its primary use case is around network management, has applications across the entire business, right? So, you know, for customer churn or next best offer, for understanding customer experience and customer behavior, for digital advertising, for product innovation, all sorts of different use cases and departments within the business needed access to this information. So collaboration and sharing across the real-time network analytics platform was very important. And then finally, as I mentioned, LG Group is much bigger than just LG Uplus; it's got the electronics and other pieces, and they had launched a major group-wide digital transformation program in 2019, so being a part of that was important as well. Some of the themes that they were looking to address: first of all, the integration of wired and wireless data sources, so getting your assurance data sources, your network data sources and so on integrated was really, really important. Scale was massive for them. You know, they're talking about billions of transactions in under a minute being processed, and hundreds of terabytes per day. So, you know, phenomenal scale that needed to be available out of the box, as it were. Real-time indicators and alarms.
And there were lots of KPIs and thresholds set, you know, to meet certain criteria, certain standards. Customer-specific real-time analysis of 5G services, particularly for the launch. Root cause analysis and AI-based prediction of service anomalies and service issues was a core use case. As I talked about already, the provision of data services across the organization. And then support for understanding the 5G service business impact was extremely important. So it's not just understanding that you have an outage in a particular network element, but what is the impact on the business of LG Uplus, and also what is the impact on the business of the customer, from an outage or an anomaly or a problem on the network? So being able to answer those kinds of questions was really, really important too. And as I said, between Cloudera and Caremark, and LG Uplus themselves, who were an intrinsic part of the solution, this is what we ended up building. So a big, complicated architecture slide; I really don't want to go into too much detail here. You can see these things for yourself, but let me skip through it really quickly. So, first of all, the key data sources. You have all of your wireless network information and other data sources. This is really important, 'cause sometimes you kind of skip over this: there are other systems in place, like the enterprise data warehouse, that needed to be integrated as well. Southbound and northbound interfaces. So we get our data, you know, from the network and the network management applications through both file interfaces and streaming; Kafka and NiFi are important technologies there. And also the RDBMS systems, like the enterprise data warehouse, we're able to feed into the system. And then northbound, you know, we spoke already about making network analytics services available across the enterprise, so having both a file and an API interface available for other systems and other consumers across the enterprise is very important. Lots of stuff going on then in the platform itself. Two petabytes of persistent storage, Cloudera HDFS, 300 nodes for the raw data storage, and then Kudu for real-time storage, for real-time indicator analysis, alarm generation, and other real-time processes. So that was the core of the solution: Spark processes for ETL, key quality indicators and alarming, and also a bunch of work done around data preparation and data generation for transferal to third-party systems through the northbound interfaces. Impala API queries for real-time systems there on the right-hand side, and then a whole bunch of clustering, classification and prediction jobs through the ML processes, the machine learning processes. Again, another key use case, and we've done a bunch of work on that, and I encourage you to have a look at the Cloudera website for more detail on some of the work that we did here; some pretty cool stuff. And then finally, just the upstream services. There are lots more than simply these ones, but service assurance is really, really important. So SQM, CEM and ACD, right: service quality management, customer experience management and autonomous control are really important consumers of the real-time analytics platform, along with your conventional service assurance functions like fault and performance management. These things are as much consumers of the information in the network analytics platform as they are providers of data to the network analytics platform.
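To make the pipeline shape described above more concrete, here is a hedged sketch of one path through it: network events arrive on Kafka, Spark computes a simple key quality indicator, and each micro-batch lands in Kudu for low-latency queries. The broker, topic, Kudu master, and table names are hypothetical, and the Kudu write assumes the kudu-spark connector is on the classpath; this illustrates the pattern, not LG Uplus's actual code.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window, avg
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("nrap-kqi-sketch").getOrCreate()

# Hypothetical event schema coming off the network probes.
schema = (StructType()
          .add("cell_id", StringType())
          .add("latency_ms", DoubleType())
          .add("event_time", TimestampType()))

# Southbound: consume raw events from a Kafka topic (names hypothetical).
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "network-events")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# One-minute average latency per cell: a simple key quality indicator (KQI).
kqi = (events
       .withWatermark("event_time", "2 minutes")
       .groupBy(window("event_time", "1 minute"), "cell_id")
       .agg(avg("latency_ms").alias("avg_latency_ms")))

def write_to_kudu(batch_df, batch_id):
    # Each micro-batch is appended to a Kudu table for real-time Impala queries.
    (batch_df.write.format("kudu")
     .option("kudu.master", "kudu-master:7051")   # hypothetical master
     .option("kudu.table", "impala::nrap.kqi")    # hypothetical table
     .mode("append")
     .save())

(kqi.writeStream.outputMode("update")
    .foreachBatch(write_to_kudu)
    .start()
    .awaitTermination())
```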
So, some of the specific use cases that have been stood up and are delivering value to this day; there are lots more besides, but these are just three that we pulled out. So, first of all, service-specific monitoring and customer quality analysis, care and response. So again, growing from the initial 5G launch and then broadening into broader services: understanding where there are issues, so that when people complain, when people have an issue, we can answer the concerns of the client in a substantive way. AI functions around root cause analysis: understanding why things went wrong when they went wrong, and also making recommendations as to how to avoid those occurrences in the future. So, you know, what preventative measures can be taken. And then finally, the collaboration function across LG Uplus, which was extremely important and continues to be important to this day, where data is shared throughout the enterprise through the API layer, through file interfaces, and through integrations with upstream systems. So that's the real quick run-through of LG Uplus, and the numbers are just staggering. You know, we've seen upwards of a billion transactions in under 40 seconds being tested, and we've already gone beyond those thresholds now; and this isn't just a theoretical benchmarking test or something like that, we're seeing these kinds of volumes of real data, and more not too far down the track. So, with those things that I mentioned earlier, with the proliferation of network infrastructure in the 5G context, with virtualized elements, all of these bits and pieces are driving massive volumes of data towards the network analytics platform. So phenomenal scale, and this is just one example. We work with service providers all over the world; over 80% of the top 100 telecommunications service providers run on Cloudera. They use Cloudera in the network, and we're seeing those customers all migrating legacy Cloudera platforms now onto CDP, the Cloudera Data Platform. They're increasing the jobs that they do, so it's not just warehousing, not just ingestion and ETL, they're moving into things like machine learning, and also looking at new data sources from places like the NWDAF, the network data analytics function in 5G, or the management and orchestration layer in software-defined networking and network function virtualization. So, you know, new use cases coming in all the time, new data sources coming in all the time, growth in the application scope, as we say, from edge to AI. And so it's really exciting to see how the footprint is growing and how the applications in telecommunications are really making a difference in facilitating network transformation. And that's me covered for today. I hope you found it helpful. By all means, please reach out; there are a couple of links here. You can follow me on Twitter, you can connect to the telecommunications page, or reach out to me directly at Cloudera. I'd love to answer your questions and talk to you about how big data is transforming networks and how network transformation is accelerating telcos throughout the world.
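
To ground the collaboration point above: analysts elsewhere in the business would reach the same Kudu-backed tables through Impala's SQL interface. A hypothetical taste using the impyla client, with every host, table and column name made up:

```python
# Hypothetical ad hoc query over the Kudu-backed alarm table through Impala,
# of the kind an analyst elsewhere in the business might run.
from impala.dbapi import connect

conn = connect(host="impala-coordinator", port=21050)
cur = conn.cursor()

# Which cells breached the latency KQI in the last hour, and how many
# subscriber sessions were touched in each?
cur.execute("""
    SELECT a.cell_id,
           COUNT(DISTINCT s.session_id) AS affected_sessions,
           MAX(a.avg_latency_ms)        AS worst_latency_ms
    FROM   kqi_alarms a
    JOIN   subscriber_sessions s ON s.cell_id = a.cell_id
    WHERE  a.alarm = TRUE
      AND  a.window_start > now() - INTERVAL 1 HOUR
    GROUP  BY a.cell_id
    ORDER  BY affected_sessions DESC
    LIMIT  20
""")
for row in cur.fetchall():
    print(row)
```
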

Published Date : Aug 5 2021


F1 Racing at the Edge of Real-Time Data: Omer Asad, HPE & Matt Cadieux, Red Bull Racing


 

>>Edge computing is projected to be a multi-trillion dollar business. You know, it's hard to really pinpoint the size of this market, let alone fathom the potential of bringing software, compute, storage, AI, and automation to the edge and connecting all that to clouds and on-prem systems. But what is the edge? Is it factories? Is it oil rigs, airplanes, windmills, shipping containers, buildings, homes, race cars? Well, yes, and so much more. And what about the data? For decades we've talked about the data explosion. I mean, it's mind-boggling, but guess what, we're going to look back in 10 years and laugh at what we thought was a lot of data in 2020. Perhaps the best way to think about edge is not as a place, but as the most logical opportunity to process the data, and maybe it's the first opportunity to do so, where it can be decrypted and analyzed at very low latencies; that defines the edge. And so by locating compute as close as possible to the sources of data, to reduce latency and maximize your ability to get insights and return them to users quickly, maybe that's where the value lies. Hello everyone, and welcome to this CUBE conversation. My name is Dave Vellante, and with me to noodle on these topics is Omer Asad, VP and GM of primary storage and data management services at HPE. Hello, Omer. Welcome to the program. >>Hey Dave. Thank you so much. Pleasure to be here. >>Yeah. Great to see you again. So how do you see the edge in the broader market shaping up? >>Uh, David, I think that's a super important question. I think your ideas are quite aligned with how we think about it. Uh, I personally think, you know, as enterprises are accelerating their digitization and asset collection and data collection, they're typically, especially in a distributed enterprise, trying to get to their customers. They're trying to minimize the latency to their customers. So especially if you look across industries: manufacturing, which is distributed factories all over the place, they are going through a lot of factory transformations where they're digitizing their factories. That means a lot more data is now being generated within their factories; a lot of robot automation is going on, and that requires a lot of compute power to go out to those particular factories, which are going to generate their data out there. We've got insurance companies and banks that are acquiring, interviewing and gathering more customers out at the edge. >>They need a lot more distributed processing out at the edge. What this is requiring, and what we've seen as a common consensus across analysts, is that more than 50% of an enterprise's data, especially if they operate globally around the world, is going to be generated out at the edge. What does that mean? New data is generated at the edge, but it needs to be stored, it needs to be processed, data that is not required needs to be thrown away or classified as not important, and then it needs to be moved for DR purposes, either to a central data center or just to another site. So overall, in order to give the best possible experience for manufacturing, retail, uh, you know, especially in distributed enterprises, people are generating more and more data-centric assets out at the edge. And that's what we see in the industry. >>Yeah. We're definitely aligned on that. There's some great points. And so now, okay.
You think about all this diversity: what's the right architecture for deploying these multi-site, ROBO and edge deployments? How do you look at that? >>Oh, excellent question. So, you know, obviously every customer that we talk to wants simplicity, and, no pun intended, SimpliVity resonates there, because it's built around a simple, edge-centric architecture, right? So let's take a few examples. You've got large global retailers; they have hundreds of retail stores around the world that are generating data, that are producing data. Then you've got insurance companies, then you've got banks. So when you look at a distributed enterprise, how do you deploy equipment out at the edge in a manner that's very simple to deploy, easy to lifecycle and easy to mobilize? What are some of the challenges that these customers deal with? You don't want to send a lot of IT staff out there, because that adds cost. You don't want to have islands of data and islands of storage in remote sites, because that adds a lot of state outside of the data center that needs to be protected. >>And then, last but not the least, how do you push lifecycle-based applications, new applications, out at the edge in a very simple-to-deploy manner, and how do you protect all this data at the edge? So the right architecture, in my opinion, needs to be extremely simple to deploy: storage, compute and networking out towards the edge in a hyperconverged environment. So we agree upon that; it's a very simple-to-deploy model. But then comes: how do you deploy applications on top of that? How do you manage these applications on top of that? How do you back up these applications back towards the data center? All of this keeping in mind that it has to be as zero-touch as possible. We at HPE believe that it needs to be extremely simple. Just give me two cables, a network cable and a power cable; tie them up, connect it to the network, push its state from the data center, and back up its state from the edge back into the data center. Extremely simple. >>It's got to be simple because you've got so many challenges. You've got physics that you have to deal with, your latency to deal with. You've got RPO and RTO; what happens if something goes wrong? You've got to be able to recover quickly. So, that's great. Thank you for that. Now you guys have hard news. What is new from HPE in this space? >>From a deployment perspective, you know, HPE SimpliVity is just exploding, growing like crazy, especially as distributed enterprises adopt it as their standardized edge architecture, right? It's an HCI box that's got storage, compute and networking all in one. But now, not only can you deploy applications all from your standard vCenter interface from the data center; what we have now added is the ability to back up to the cloud, right, from the edge. You can also back up all the way back to your core data center. All of the backup policies are fully automated and implemented in the distributed file system that is the heart and soul of the SimpliVity installation. In addition to that, the customers now do not have to buy any third-party backup software; backup is fully integrated in the architecture, and it's WAN-efficient. >>In addition to that, now you can back up straight to the cloud, and you can back up to a central, high-end backup repository, which is in your data center.
And last but not least, we have a lot of customers that are pushing the limit in their application transformation. So where previously we were primarily enabling VMware deployments out at the edge sites, we've now also added both stateful and stateless container orchestration, as well as data protection capabilities for containerized applications out at the edge. So we have a lot of customers that are now deploying containers, rapid manufacturing containers, to process data out at remote sites, and that allows us to not only protect those stateful applications, but back them up into the central data center. >>I saw in that chart there was a line on no egress fees. That's a pain point for a lot of the CIOs that I talk to; they grit their teeth at those fees. So, can you comment on that? >>Excellent, excellent question. I'm so glad you brought that up; let me pick that up. So, along with SimpliVity, you know, we have the whole GreenLake as-a-service offering as well, right? So what that means, Dave, is that we can literally provide our customers edge as a service. And when you complement that with Aruba wired and wireless infrastructure that goes at the edge, and the hyperconverged infrastructure, as part of SimpliVity, that goes at the edge: you know, one of the things that was missing with cloud backups is that every time you back up to the cloud, which is a great thing by the way, anytime you restore from the cloud, there is that egress fee, right? So as a result of that, as part of the GreenLake offering, we have a cloud backup service natively offered as part of HPE, which is included in your HPE SimpliVity edge-as-a-service offering. So now, not only can you back up into the cloud from your edge sites, but you can also restore back without any egress fees from HPE's data protection service. Either you can restore it back onto your data center, or you can restore it back towards the edge site, and because the infrastructure is so easy to deploy and centrally lifecycle-manage, it's very mobile. So if you want to deploy and recover to a different site, you could also do that. >>Nice. Hey, uh, Omer, can you double-click a little bit on some of the use cases that customers are choosing SimpliVity for, particularly at the edge, and maybe talk about why they're choosing HPE? >>What are the major use cases that we see, Dave? Obviously, easy to deploy and easy to manage in a standardized form factor, right? A lot of these customers, like for example a large retailer with hundreds of stores across the US: right now you cannot send service staff to each of these stores, and their data center is essentially just a closet for these guys, right? So now, how do you have a standardized deployment? A standardized deployment from the data center, which you can literally push out, where you connect a network cable and a power cable and you're up and running, and then automated backup, elimination of backup infrastructure, and DR from the edge sites into the data center. So that's one of the big use cases: to rapidly deploy new stores, bring them up in a standardized configuration, both from a hardware and a software perspective, and have the ability to back up and recover that instantly. >>That's one large use case. The second use case that we see actually refers to a comment that you made in your opener, Dave, where a lot of these customers are generating a lot of the data at the edge.
This is robotics automation that is going on in manufacturing sites; this is racing teams that are out at the edge doing post-processing of their cars' data. At the same time, there are disaster recovery use cases, where you have, you know, camp sites and local agencies that go out there for humanity's benefit, and they move from one site to the other. It's a very, very mobile architecture that they need. So those are just a few cases where we were deployed. There's a lot of data collection, and there's a lot of mobility involved in these environments, so you need to be quick to set up, quick to stand up, quick to recover, and essentially you're on to your next move. >>You seem pretty pumped up about this, uh, this new innovation, and why not? >>It is, it is, uh, you know, especially because it has been thought through with edge in mind, and edge has to be mobile, it has to be simple. And especially as, you know, we have lived through this pandemic, which I hope we see the tail end of in 2021, or at least 2022. You know, one of the most common use cases that we saw, and this was an accidental discovery: a lot of the retailers could not go out to service their stores, because, you know, mobility is limited in these strange times that we live in. So from a central data center, you're able to deploy applications, you're able to recover applications. And a lot of our customers said, hey, I don't have enough space in my data center to back up to; do you have another option? So then we rolled out this update release to SimpliVity where, from the edge site, you can now directly back up to our backup service, which is offered on a consumption basis to the customers, and they can recover that anywhere they want. >>Fantastic, Omer, thanks so much for coming on the program today. >>It's a pleasure, Dave. Thank you. >>All right. Awesome to see you. Now let's hear from Red Bull Racing, an HPE customer that's actually using SimpliVity at the edge. >>The countdown really begins when the checkered flag drops on a Sunday. It's always about this race to manufacture the next designs, to make them more adapted to the next circuit, to run those. Of course, if we can't manufacture the next component in time, all that will be wasted. >>Okay, we're back with Matt Cadieux, who is the CIO of Red Bull Racing. Matt, it's good to see you again. >>Great to see you. >>Hey, we're going to dig into a real-world example of using data at the edge, in near real time, to gain insights that really lead to competitive advantage. But first, Matt, tell us a little bit about Red Bull Racing and your role there. >>Sure. So I'm the CIO at Red Bull Racing, and we're based in Milton Keynes in the UK. And the main job for us is to design a race car, to manufacture the race car, and then to race it around the world. So as CIO, the IT group needs to develop the applications used in design, manufacturing and racing. We also need to supply all the underlying infrastructure and also manage security. So it's a really interesting environment that's all about speed. So this season we have 23 races, and we need to tear the car apart and rebuild it to a unique configuration for every individual race, and we're also designing and making components targeted for races. So, 23 immovable deadlines, um, and this big evolving prototype to manage with our car.
Um, but we're also improving all of our tools and methods and software that we use to design and make and race the car. So we have a big can-do attitude in the company around continuous improvement, and the expectations are that we continuously make the car faster, that we're winning races, that we improve our methods in the factory and our tools. And so for IT it's really unique in that we can be part of that journey and provide a better service. It's also a big challenge to provide that service and to give the business the agility it needs. So my job is really to make sure we have the right staff, the right partners and the right technical platforms, so we can live up to expectations. >>That tear-down and rebuild for 23 races: is that because each track has its own unique signature that you have to tune to, or are there other factors involved there? >>Yeah, exactly. Every track has a different shape. Some have lots of straights, some have lots of curves, and lots are in between. Um, the track surface is very different, and the impact that has on tires; the temperature and the climate are very different. Some are hilly, some have big curves that affect the dynamics of the car. So, with all that, in order to win you need to micromanage everything and optimize it for any given race track. >>Talk about some of the key drivers in your business and some of the key apps that give you a competitive advantage to help you win races. >>Yeah. So in our business, everything is all about speed. So the car obviously needs to be fast, but also all of our business operations need to be fast. We need to be able to design a car, and it's all done in the virtual world, but the virtual simulations and designs need to correlate to what happens in the real world. So all of that requires a lot of expertise to develop the simulations and the algorithms, and to have all the underlying infrastructure that runs them quickly and reliably. Um, in manufacturing, we have cost caps and financial controls by regulation; we need to be super efficient and control material and resources, so ERP and MES systems are running and helping us do that. And at the race track itself, again it's speed: we have hundreds of decisions to make on a Friday and Saturday as we're fine-tuning the final configuration of the car, and here again we rely on simulations and analytics to help do that. And then during the race, we have split seconds, literally seconds, to alter our race strategy if an event happens. So if there's an accident and the safety car comes out, or the weather changes, we revise our tactics, and we're running Monte Carlo simulations, for example, pairing experienced engineers with simulations to make a data-driven decision, and hopefully a better one, faster than our competitors. All of that needs IT, um, to work at a very high level.
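
To make the Monte Carlo idea concrete, here's a toy sketch of the kind of pit-stop decision such a simulation might inform. It is nothing like Red Bull Racing's actual strategy models; every lap time, probability and pit-loss figure below is invented for illustration.

```python
# Toy Monte Carlo pit-stop sketch; all figures are invented, and real
# race-strategy models are far richer than this.
import random

def simulate_race(pit_lap, laps=50, base_lap=90.0, wear_per_lap=0.08,
                  pit_loss=21.0, p_safety_car=0.03, sc_pit_loss=11.0):
    """Total race time (seconds) for one simulated race with one pit stop."""
    total, tyre_age = 0.0, 0
    for lap in range(1, laps + 1):
        total += base_lap + wear_per_lap * tyre_age  # tyres slow you down
        tyre_age += 1
        if lap == pit_lap:
            # A pit stop under a (randomly sampled) safety car costs less.
            under_sc = random.random() < p_safety_car
            total += sc_pit_loss if under_sc else pit_loss
            tyre_age = 0
    return total

def expected_time(pit_lap, runs=2000):
    return sum(simulate_race(pit_lap) for _ in range(runs)) / runs

# Sweep candidate pit laps and pick the lowest expected race time.
best = min(range(10, 41), key=expected_time)
print("best expected pit lap:", best)
```

In a live race the same idea would be re-run in seconds against updated inputs, which is why the processing-time reductions Matt describes later matter so much.
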
>>It's interesting. I mean, as a lay person, historically, when I think about technology and car racing, of course I think about the mechanical aspects of a self-propelled vehicle, the electronics and the like, but not necessarily the data. But the data's always been there, hasn't it? I mean, maybe in the form of tribal knowledge: somebody who knows the track and where the hills are, and experience and gut feel. But today you're digitizing it and you're processing it, close to real time. >>It's amazing. I think exactly right, yeah. The car's instrumented with sensors, and we post-process it: video, image analysis, and we're looking at our car and our competitors' cars. So there's a huge amount of very complicated models that we're using to optimize our performance and to continuously improve our car. Yeah, the data and the applications that can leverage it are really key, and that's a critical success factor for us. >>So let's talk about your data center at the track, if you will, I mean, if I can call it that. Paint a picture for us: what does that look like? >>So we have to send a lot of equipment to the track, at the edge. Um, and even though we have a really great wide-area network linked back to the factory, and there are cloud resources, a lot of the tracks are very old. You don't have hardened infrastructure, you don't have ducts that protect cabling, for example, and you could lose connectivity at remote locations. So the applications we need to operate the car and to make really critical decisions, all that needs to be at the edge, where the car operates. So historically we had three racks of equipment, legacy infrastructure, um, and it was really hard to manage, to make changes. It was too inflexible. Um, there were multiple panes of glass, and it was too slow; it didn't run our applications quickly. Um, it was also too heavy and took up too much space when you're cramped into a garage with lots of environmental constraints. >>So we'd introduced hyperconvergence into the factory and seen a lot of great benefits, and when the time came to refresh our infrastructure at the track, we stepped back and said, there's a lot smarter way of operating. We can get rid of all the slow, inflexible, expensive legacy and introduce hyperconvergence, and we saw really excellent benefits for doing that. Um, we saw a three X speed-up for a lot of our applications. So in areas where we're post-processing data and we have to make decisions about race strategy, time is of the essence, and a three X reduction in processing time really matters. Um, we also were able to go from three racks of equipment down to two racks of equipment, and the storage efficiency of the HPE SimpliVity platform, with 20-to-one ratios, allowed us to eliminate a rack. And that actually saved a hundred thousand dollars a year in freight costs by shipping less equipment. Um, then things like backup: mistakes happen. Sometimes the user makes a mistake. So for example, a race engineer could load the wrong data map into one of our simulations, and we could restore that VDI through SimpliVity backup in 90 seconds. And this enables engineers to focus on the car and make better decisions without having downtime. And we send two IT guys to every race; they're managing 60 users, a really diverse environment, juggling a lot of balls, and having a simple management platform like HPE SimpliVity allows them to be very effective and to work quickly. So all of those benefits were a huge step forward relative to the legacy infrastructure that we used to run at the edge. >>Yeah. So you had the nice Petri dish in the factory. So it sounds like, your goals, obviously your number one KPI is speed, to help shave seconds of time, but also cost, and just the simplicity of setting up the infrastructure. >>Yeah, it's speed, speed, speed. So we want applications to absolutely fly, you know, get to actionable results quicker, get answers from our simulations quicker.
The other area where speed's really critical is, um, our applications are also evolving prototypes, and the models are always getting bigger, the simulations are getting bigger, and they need more and more resource; and being able to spin up resource and provision things without being a bottleneck is a big challenge, and SimpliVity gives us the means of doing that. >>So did you consider any other options, or was it, because you had the factory knowledge, that HCI was very clearly the option? What did you look at? >>Yeah, so, um, we have over five years of experience in the factory, and we eliminated all of our legacy infrastructure five years ago, and the benefits I've described at the track, we saw those in the factory. Um, at the track we have a three-year operational life cycle for our equipment. 2017 was the last year we had legacy; as we were building for 2018, it was obvious that hyperconverged was the right technology to introduce, and we'd had years of experience in the factory already. And the benefits that we see with hyperconverged actually mattered even more at the edge, because our operations are so much more pressurized and time is even more of the essence. And so speeding everything up at the really pointy end of our business was really critical. It was an obvious choice. >>Why SimpliVity? Why'd you choose HPE SimpliVity? >>Yeah. So when we first heard about hyperconverged, way back, in the factory, um, we had a legacy infrastructure: overly complicated, too slow, too inflexible, too expensive. And we stepped back and said, there has to be a smarter way of operating. We went out and challenged our technology partners, and we learned about hyperconvergence and whether the hype was real or not. So we underwent some PoCs and benchmarking, and the PoCs were really impressive. And with all these, you know, speed and agility benefits, we saw that HPE, for our use cases, was the clear winner in the benchmarks. So based on that, we made an initial investment in the factory; we moved about 150 VMs and 150 VDI into it. Um, and then as we've seen all the benefits, we've successfully invested further, and we now have an estate in the factory of about 800 VMs and about 400 VDI. So it's been a great platform, and it's allowed us to really push boundaries and give the business the service it expects. >>So, was the time in which you were able to go from data to insight to recommendation, or edict, was that compressed? You kind of indicated that, but... >>So we pull all the telemetry from the car and we post-process it, and that post-processing is really very time-consuming. And, um, you know, we went from eight, nine minutes for some of the simulations down to just two minutes. So we saw big, big reductions in time, and ultimately that meant an engineer could understand what the car was doing during a practice session, recommend a tweak to the configuration or setup of it, and just get more actionable insight quicker. And it ultimately helps get a better car quicker. >>Such a great example. How are you guys feeling about the season, Matt? What's the team's sentiment? >>Yeah, I think we're optimistic. Um, we have a new driver
Great, Matt, good luck this season and going forward and thanks so much for coming back in the cube. Really appreciate it. And it's my pleasure. Great talking to you again. Okay. Now we're going to bring back Omer for quick summary. So keep it real >>Without having solutions from HB, we can't drive those five senses, CFD aerodynamics that would undermine the simulations being software defined. We can bring new apps into play. If we can bring new them's storage, networking, all of that can be highly advises is a hugely beneficial partnership for us. We're able to be at the cutting edge of technology in a highly stressed environment. That is no bigger challenge than the formula. >>Okay. We're back with Omar. Hey, what did you think about that interview with Matt? >>Great. Uh, I have to tell you I'm a big formula one fan, and they are one of my favorite customers. Uh, so, you know, obviously, uh, one of the biggest use cases as you saw for red bull racing is Trackside deployments. There are now 22 races in a season. These guys are jumping from one city to the next, they've got to pack up, move to the next city, set up, set up the infrastructure very, very quickly and average formula. One car is running the thousand plus sensors on that is generating a ton of data on track side that needs to be collected very quickly. It needs to be processed very quickly, and then sometimes believe it or not, snapshots of this data needs to be sent to the red bull back factory back at the data center. What does this all need? It needs reliability. >>It needs compute power in a very short form factor. And it needs agility quick to set up quick, to go quick, to recover. And then in post processing, they need to have CPU density so they can pack more VMs out at the edge to be able to do that processing now. And we accomplished that for, for the red bull racing guys in basically two are you have two SimpliVity nodes that are running track side and moving with them from one, one race to the next race, to the next race. And every time those SimpliVity nodes connect up to the data center collector to a satellite, they're backing up back to their data center. They're sending snapshots of data back to the data center, essentially making their job a whole lot easier, where they can focus on racing and not on troubleshooting virtual machines, >>Red bull racing and HPE SimpliVity. Great example. It's agile, it's it's cost efficient, and it shows a real impact. Thank you very much. I really appreciate those summary comments. Thank you, Dave. Really appreciate it. All right. And thank you for watching. This is Dave Volante. >>You.

Published Date : Mar 30 2021


David Graham, Dell Technologies | CUBEConversation, August 2019


 

>> From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. (upbeat music) Now, here's your host, Stu Miniman. >> Hi, I'm Stu Miniman, and this is theCUBE's Boston area studio, actually our brand-new studio, and I'm really excited to have who I believe is a first-time guest; a long-time listener, you know, >> Yeah, yep, first-time caller. >> good buddy of mine, Dave Graham, who is a director of emerging technologies messaging at Dell Technologies. Disclaimer: Dave and I worked together at a company some of you might have heard of in the past, EMC Corporation, which was a local company. Dave and I both left EMC, and Dave went back after Dell had bought EMC. So Dave, thanks so much for joining; it is your first time on theCUBE, yes? >> It is the first time on theCUBE. >> Yeah, so some of the first times that I actually interacted with this team here, you and I were bloggers doing lots of stuff back in the industry, so it's great to be able to talk to you on-camera. >> Yeah, same here. >> All right, so Dave, I mentioned you were a returning former EMC-er, now a Dell Tech person, and you spent some time at Juniper and at some startups, but give our audience a little bit about your background and your passions. >> Oh, so background-wise, yep: I started my career in technology, if you will, at EMC; started in inside sales of all places, and worked my way into a consulting/engineer-type position within ECS, which is obviously a pretty hard-core product inside of EMC, or Dell Technologies, now. Left, went to a startup; everybody's got to do a startup at some point in their life, right? Take the risk, make the leap. That was awesome; it was actually one of those cloud brokers that's out there, like Nasuni, a company called Sertis. Had a little bit of trouble about eight months in, so it kind of fell apart. >> Yeah, the company did, not you. >> The company did! (men laughing) I was fine, you know, but yeah, the company had some problems. I ended up leaving there, going to Symantec of all places, so I worked on the Veritas side, kind of the enterprise side, which just recently got bought out by Avago, evidently. >> Broadcom. >> Broadcom, Broadcom, part of the grand whole that is Avago. >> Dave, Dave, you know we're getting up there in years in our tech when we keep talking about something; 'cause I was just reading about Broadcom, right: Avago bought Broadcom in the second-largest tech acquisition in history, but when they acquired Broadcom, they took on the name, because most people know Broadcom; not as many people know Avago, even those of us with backgrounds in the chip, semiconductor, and all those pieces. I mean, you've got Brocade in there, you've got some of the software companies that they've bought over time, so some of those go together. But yeah, Veritas and Symantec: those of us especially with some storage and networking background know those brands well. >> Absolutely; PLX, the PCIe switch business, is actually Broadcom as well, those things. So yeah, went from Symantec after a short period of time there to Juniper Networks, ran part of their Center of Excellence, kind of a data center overlay team; the only non-networking guy in a networking company, it felt like. Can't say that I learned a ton about the networking side, but I definitely saw a huge expansion in the data center space with Juniper, which was awesome to see.
And then the opportunity came to come back to Dell Technologies. Kind of everything old becoming new again, right? Going and revisiting a whole bunch of folks that I had worked with, you know, 10 years ago. >> Dave, it's interesting, you know, I think about somebody like Broadcom, and Avago, and things like that. I remember reading blog posts of yours where you'd get down to some of that nitty-gritty level; you and I would be the ones that would talk about the product: all right, now pull the board out, let me look at all the components, let me understand, you know, the spacing, and the cooling, and all the things there. But you know, here it's 2019, Dave. Don't you know software is eating the world? So tell us a little bit about what you're working on these days, because the high-level things definitely don't bring to mind the low-level board pieces that we used to talk about many years ago. >> Exactly, yeah. It's no longer, you know, thermals and processing power as much, right? There are still aspects of that, but a lot of what we're focused on now, or what I'm focused on now, is within what we call the emerging technology space, or horizon 2, horizon 3, I guess. >> Sounds like something some analyst firm came up with, Dave. (Dave laughing) >> Yeah, like Industry 4.0, 5.0-type stuff. It's all exciting stuff, but you know, when you look at technologies like 5G, fifth-generation wireless, you know, both millimeter wave and sub-six-gigahertz, and AI, you know, everything old becoming new again, right? Stuff from the fifties and sixties that's now starting to permeate everything that we do; you're not opening your mouth and breathing unless you're talking about AI at some point. >> Yeah, and you bring up a great point. So, we've spent some time with the Dell team understanding AI, but help connect for our audience: when we talk about AI, we're talking about data at the center of everything, and it's those applications. Are you working on some of those solutions, or is it the infrastructure that's going to enable that, and what needs to be done at that level for things to work right? >> I think it's all of the above. The beauty of Dell Technologies is that you sit across both infrastructure and software. You look at the efforts and the energies, stuff like VMware buying BitFusion, right, as a mechanism for trying to assuage some of that low-level hardware stuff, to start to tap into what the infrastructure guys have always been doing. When you bring that kind of capability up the stack, now you can start to develop, within the software mindset, how you're going to access this. Infrastructure still plays a huge part of it; you've got to run it on something, right? You can't really do serverless AI at this point. Am I allowed to say that? (man laughing) >> Well, you could say that; I might disagree with you, because absolutely >> Eh, that's fine. >> there's AI that's running on it. Don't you know, Dave, I actually did my serverless 101 article where I had Ashley Gorakhpurwalla, who is the general manager of Dell servers, holding the t-shirt that says "there is no serverless, it's just, you know, a function where you only pay for the piece that you need, when you need it," and everything there.
But the point of the humor that I was having there is that even the largest server manufacturer in the world knows that underneath that serverless discussion, absolutely, there is still infrastructure that plays there; it just today tends to primarily be in AWS with all of their services. But with that proliferation, serverless, we're just letting the developers be developers and not have to think about that stuff. And I mean, Dave, with the stuff we have backgrounds in, you know, we want to get rid of silos and make things simpler; it's the things we've been talking about for decades. It's just, for me it was interesting to look at: it is very much a developer, application-driven piece, top-down, as opposed to so many of the virtualization and infrastructure-as-a-service pieces being more bottom-up; let me try to change this construct so that we can then provide what you need above it. It's just a slightly different way of looking at things. >> Yeah, and I think we're really trying to push for that stuff, so you know, you can bundle together hardware that makes the development platform easy to do, right? But the efforts and energy of our partnerships: Dell has engaged in a lot of partnerships within the industry, NVIDIA, Intel, AMD, Graphcore, you name it, right? We're out in that space working along with those folks, but a lot of that is driven by software. You write to a library, like CUDA or, you know, PyTorch; you're using these types of elements and you're moving towards that, but then it has to run on something, right? So we want to be in both ends of that space, right? We want to enable that kind of flexibility and capability, and obviously not prevent it, but we want to also expose that platform to as many people within the industry as possible so they can kind of start to develop on it. You're becoming a platform company, really, when it comes down to it. >> I don't want to get into the semantic arguments about AI, if you will, but what are you hearing from customers, and what's driving some of the discussions lately: the reality of AI, as opposed to just the buzzy hype that everybody talks about? >> Well, I still think there's some ambiguity in the market around AI versus automation, even. So what people come and ask us is, well, "you know, I believe in this thing called artificial intelligence, and I want to do X, Y, and Z." And some of these particular workloads could be better handled by something simple, not to distill it down to the barest minimum, but like cron jobs: something that, if you go back in history and look at the things that matter, you could do very, very simply, that doesn't require a large library or an understanding of more advanced algorithms or developments that way. In the reverse, you still have that capability now where, in everything that we're doing within industry, you use chat-bots. Some of the intelligence that goes into those, people are starting to recognize: this is a better way that I could serve my customers. Really, it's that business-out kind of viewpoint: how do I access these customers? They may not have the knowledge set here, but they're coming to us and saying, "it's more than just, you know, a call, an IVR system," you know, like an electronic IVR system, right? Like, I come in and it's just quick-response stuff. I need some context, I need to be able to do this and transform my data into something that's useful for my customers.
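
A trivial sketch of the distinction being drawn here: the check below is plain scheduled automation, with no model anywhere in it. The path, threshold and schedule are all made up.

```python
#!/usr/bin/env python3
# Plain automation of the kind contrasted with AI above: a scheduled
# check, no model required. Path and threshold are hypothetical.
# Example crontab entry, running every 15 minutes:
#   */15 * * * * /usr/local/bin/check_queue.py
import os
import sys

QUEUE_DIR = "/var/spool/app/queue"   # hypothetical

backlog = len(os.listdir(QUEUE_DIR))
if backlog > 1000:
    # In practice this might page someone or post to the chat-bot
    # channel mentioned above.
    sys.stderr.write(f"queue backlog high: {backlog} items\n")
    sys.exit(1)
```
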
>> Yeah, no, this is such a great point, Dave. The thing I've asked many times is: my entire career we've talked about intelligence and we've talked about automation; what's different about it today? And the reality is, it used to be, all right, I was scripting things, or I would have some Bash processes, or I would put these things together. The order of magnitude and scale of what we're talking about today, I couldn't do it manually if I wanted to. And that automation can be really cool these days; to set all of those up, there is more intelligence built into it, so whether it's AI or just machine learning kind of underneath it, that spectrum that we talk about, there are some real use cases, a real lot of things that are happening there. And it definitely is orders of magnitude improved over what we were talking about, say, back when we were both at EMC and the latest generation of Symmetrix was much more intelligent than the last generation; if you look at that 10 years later, boy, it is night and day, and how could we ever have used those terms before, compared to where we are today? >> Yeah, somebody probably at some point coined the term "exponential." Like, things become exponential as you start to look at it. Yeah, the development in the last 10 years, both in computing horsepower and GPU/GPGPU horsepower, you know, the innovation around, you know, FPGAs, which are back in a big way now, right? All that brainpower that used to be in these systems, you can now benefit even more from the flexibility of the systems in order to get specific workloads done. It's not for everybody, we all know that, but it's there. >> I'm glad you brought up FPGAs, because those of us that are hardware geeks... I mean, for some reason I studied mechanical engineering, not realizing what a software world we would live in. I did a video with Amy Lewis, and she's like, "what were your software-defined moments?" I'm like, "gosh, I'm the frog sitting in the pot," and I'd love it if I can network-diagram it, or put these things together; networking guy, it's my background! So, the software world, but it is a real renaissance in hardware these days: everything from the FPGAs you mentioned, to NVIDIA and all of their partners, and the competitors there. Anything you're geeking out on, on the hardware side? >> Yeah, a lot of the stuff... I mean, the era of the GPU showed up in a big way, all right? We have NVIDIA to thank for that whole... I mean, kudos to them for developing a software ecosystem alongside the hardware; I think that's really what sold that and made that work. >> Well, you know, you have to be able to solve that Bitcoin mining problem, so... >> Well, you know, depending on which cryptocurrency you did, AMD kind of snuck in there with their stuff, and they did some of that stuff better. But you have that kind of competing-architecture stuff, which is always good; competition is what you want. I think now what we're seeing is that specific workloads benefit from different styles of compute. And so you have companies like Graphcore, or the chip that was just launched out of China this past week that's configurable to any type of neural network underneath the covers. You see that kind of evolution in capability now, where general purpose is good, but now you start to go into reconfigurable elements; FPGAs are some of these more advanced chips.
The neuromorphic hardware, given my background in psychology, is always interesting to me, so anything that is biomorphic or neuromorphic to me is pinging around up here, like, "oh, you're going to emulate the brain?" And Intel's done stuff, BrainChip's done stuff, Netspace; it's amazing. The workloads that are coming along the way, I think, are starting to demand different types of, or more effectiveness within, that hardware now, so you're starting to see a lot of interesting developments: IPUs, TPUs, Tesla getting into the inferencing bit now with their own hardware. So you see a lot of effort and energy being poured in there. Again, there's not going to be one ring to rule them all, to cop Tolkien there for a moment, but I think you're going to start to see the separation of workloads onto those specific hardware platforms. Again, software is going to start to drive the applications and how you see these things going, and it's going to be the people that can service the most platforms, or the most capability from a single platform even, I think, who are going to come out ahead. And whether it'll be us or any of our august competitors remains to be seen, but we want to be in that space; we want to be playing hard in that space as well. >> All right, Dave, last thing I want to ask you about is just career. So, it's interesting: at VMworld, I kind of look at it like, wow, I'm actually sitting at a panel for Opening Acts, which is done by the VMunderground people on the Sunday, the day before VMworld really starts, talking about jobs, and there's actually three panels, you know, careers, and financial, and some of those things. >> I'm going to be there, so come on by. >> Maybe I should join. Starting at 1 o'clock Monday evening, I'm actually participating in a career cafe, talking about people and everything like that, so all that stuff's online if you want to check it out. But you know, right, you said psychology is what you studied, but you worked in engineering, you were a systems engineer, and now you do messaging. The hardcore techies... there's always that boundary between the techies and the marketing folks, but I think it's obvious to our audience, when they hear you geeking out on the TPUs and all the things there, that you're quite knowledgeable when it comes to the technology, and the good technical marketers I find tend to come from that kind of background. But give us a little bit, looking back at where you've been and where you're going, and some of those dynamics. >> Yeah, I was blessed from a really young age with a father who really loved technology. We were building PCs, like, back in the eighties, right, when that was a thing, you know, "I built my AMD 386 DX box." >> Have you watched the AMC show "Halt and Catch Fire," when that was on? >> Yeah, yeah, yeah, so there was that kind of thing; it was always interesting to me. And, with the way my mind works, I can't code to save my life; that's my brother's gift, not mine. But being able to kind of assemble things in my head was always something that stuck in the back. So going through college, I worked as a lab resident as well, working in computer labs and doing that stuff. It's just been a passion, right? I had the education; you know, my family was very hard on the education stuff: you're going to do this.
But being able to follow that passion, a lot of things fell into place with that; it's been a huge blessing. Even in grad school, when I was getting my Masters in clinical counseling, I ran my own consulting business as well, just buying and selling hardware. And a lot of what I've done is just, I read and ask a ton of questions. I'm out on Twitter; I'm not the brightest bulb of the bunch, but I've learned to ask a lot of questions, and the amount of community support in that has gotten me a lot of where I am as well. But yeah, being able to come out on this side... marketing is, like you're saying, kind of an anathema to the technical guys: "oh, those are the guys that kind of shine the, shine the turd, so to speak," right? But being able to come in and kind of influence the way, and make sure that we're technically sound in what we're saying, while translating some of the harder stuff, the more hardcore engineering terms, into layman's terms, because not everybody's going to approach it that way. The CIOs with a double E, or an MS in electrical engineering, going down that road, are very few and far between. A lot of these folks have grown up or developed their careers in understanding things, but being able to go in and translate through that, it's been a huge blessing; it's nice. But always following the areas where... networking for me was never a strong point, but jumping in, going, "hey, I'm here to learn," and being willing to learn, has been one of the biggest things I think that's kind of reinforced that career process. >> Yeah, definitely, Dave; that intellectual curiosity is something that serves anyone in the tech industry quite well, 'cause, you know, nobody is going to be an expert on everything. I've spoken to some of the brightest people in the industry, and even they realize nobody can keep up with all of it; so being able to ask questions and participate matters. And Dave, thank you so much for coming to have this conversation; great as always to have a chat. >> Ah, great to be here, Stu, thanks. >> All right, so be sure to check out theCUBE.net, which is where all of our content always is, what shows we will be at, and all the history of where we've been. This studio is actually in Marlborough, Massachusetts, so not too far outside of Boston, right on the 495 loop; we're going to be doing a lot more videos here. Myself and Dave Vellante are located here, and we have a good team here, so look for more content out of here, and of course our big studio out of Palo Alto, California. So if we can be of help, please feel free to reach out. I'm Stu Miniman, and as always, thanks for watching theCUBE. (upbeat electronic music)

Published Date : Aug 9 2019


Action Item | Big Data SV Preview Show - Feb 2018


 

>> Hi, I'm Peter Burris and once again, welcome to a Wikibon Action Item. (lively electronic music) We are again broadcasting from the beautiful theCUBE Studios here in Palo Alto, California, and we're joined today by a relatively large group. So, let me take everybody through who's here in the studio with us. David Floyer, George Gilbert, once again, we've been joined by John Furrier, who's one of the key CUBE hosts, and on the remote system is Jim Kobielus, Neil Raden, and another CUBE host, Dave Vellante. Hey guys. >> Hi there. >> Good to be here. >> Hey. >> So, one of the reasons why we have a little bit larger group here is because we're going to be talking about a community gathering that's taking place in the big data universe in a couple of weeks. Large numbers of big data professionals are going to be descending upon Strata for the purposes of better understanding what's going on within the big data universe. Now we run a CUBE show next to that event, in which we get the best thought leaders that are possible at Strata, bring them in onto theCUBE, and really help separate the signal from the noise that Strata has historically represented. We want to use this show to preview what we think that signal's going to be, so that we can help the community better understand what to look for, where to go, what kinds of things to be talking about with each other, so that it can get more out of that important event. Now, George, with that in mind, what are kind of the top level things? If it was one thing that we'd identify as something that was different two years ago or a year ago, and it's going to be different from this show, what would we say it would be? >> Well, I think the big realization here is that we're starting with the end in mind. We know the modern operational analytic applications that we want to build, that anticipate or influence a user interaction or inform or automate a business transaction. And for several years, we were experimenting with big data infrastructure, but that wasn't solution-centric, it was technology-centric. And we kind of realized that the do-it-yourself, assemble-your-own-kit, open source big data infrastructure created too big a burden on admins. Now we're at the point where we're beginning to see a more converged set of offerings take place. And by converged, I mean an end-to-end analytic pipeline that is uniform for developers, uniform for admins, and because it's pre-integrated, is lower latency. It helps you put more data through one single analytic latency budget. That's what we think people should look for. Right now, though, the hottest new tech-centric activity is around Machine Learning, and I think the big thing we have to do is recognize that we're sort of at the same maturity level as we were with big data several years ago. And people should, if they're going to work with it, start with the knowledge, for the most part, that they're going to be experimenting, 'cause the tooling isn't quite mature enough, and we don't have enough data scientists for people to be building all these pipelines bespoke. And the third-party applications, we don't have a high volume of them where this is embedded yet.
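George's idea of a converged, end-to-end analytic pipeline is easier to picture with a small sketch. The following PySpark Structured Streaming job ingests, transforms, aggregates, and serves results inside one engine and one latency budget; the broker address, topic, schema, and paths are hypothetical, and this is only one possible shape such a pipeline could take, not any particular vendor's offering.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json, window
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    spark = SparkSession.builder.appName("converged-pipeline").getOrCreate()

    # Hypothetical event schema for the incoming stream.
    schema = StructType([
        StructField("user_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("ts", TimestampType()),
    ])

    # Ingest: the same engine reads the raw event stream (topic name is made up).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "events")
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Transform and aggregate in the same job: no hand-off to a second system.
    per_minute = (events
                  .withWatermark("ts", "10 minutes")
                  .groupBy(window(col("ts"), "1 minute"), col("user_id"))
                  .sum("amount"))

    # Serve: continuously write results where BI tools or models can read them.
    (per_minute.writeStream
     .outputMode("append")
     .format("parquet")
     .option("path", "/data/agg/per_minute")
     .option("checkpointLocation", "/data/chk/per_minute")
     .start()
     .awaitTermination())

The point of the sketch is that ingest, transformation, and serving share one developer experience and one admin surface, which is exactly the pre-integration George argues reduces the burden on admins.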
>> So if I can kind of summarize what you're saying, we're seeing a bifurcation occur within the ecosystem associated with big data that's driving toward simplification on the infrastructure side, which increasingly is being associated with the term big data, and new technologies that can apply that infrastructure and that data to new applications, including things like AI, ML, DL, where we think about modeling and services, and a new way of building value. Now that suggests that one or the other is more or less hot, but Neil Raden, I think the practical reality is that here in Silicon Valley, we've got to be careful about getting too far out in front of our skis. At the end of the day, there's still a lot of work to be done inside how you simply do things like move data from one place to the other in a lot of big enterprises. Would you agree with that? >> Oh absolutely. I've been talking to a lot of clients this week and, you know, we don't talk about the fact that they're still running their business on what we would call legacy systems, and they don't know how to, you know, get out of them or transform from them. So they're still starting to plan for this, but the problem is, you know, it's like talking about the 27 rocket engines on whatever it was that he launched into space, launching a Tesla into space. You can talk about the engineering of those engines and that's great, but what about all the other things you're going to have to do to get that (laughs) car into space? And it's the same thing. A year ago, we were talking about Hadoop and big data and, to a certain extent, Machine Learning, maybe more data science. But now people are really starting to say, how do we actually do this, how do we secure it, how do we govern it, how do we get some sort of metadata or semantics on the data we're working with so people know what they're using? I think that's where we are in a lot of companies. >> Great, so that's great feedback, Neil. So as we look forward, Jim Kobielus, given the challenges associated with improving the facilities of your infrastructure, but also using that as a basis for increasing your capability on some of the new application services, what should folks be looking for as they explore the show in the next couple of weeks on the ML side? What new technologies, what new approaches? Going back to what George said, we're in experimentation mode. What are going to be the experiments that are going to generate the greatest results over the course of the next year? >> Yeah, for the data scientists who flock to Strata and similar conferences, automation of the Machine Learning pipeline is super hot in terms of investments by the solution providers. Everybody from Google to IBM to AWS, and others, are investing very heavily in automation of not just the data engineering, that problem was tackled a long time ago; it's automation of more of the feature engineering and the training. These very manual, often labor-intensive jobs have to be sped up and automated to a great degree to enable the magic of productivity by the data scientists and the new generation of app developers. So look for automation of Machine Learning to be a super hot focus. Related to that, look for a new generation of development suites that focus on DevOps, speeding Machine Learning, DL, and AI from modeling through training, evaluation, deployment, and iteration.
We've seen a fair upswing in the number of such toolkits on the market from a variety of startup vendors, like the DataRobots of the world, but also coming from, say, AWS with SageMaker, for example; that's hot. Also, look for development toolkits that automate more of the code generation, you know, low-code tools, but the new generation of low-code tools that, as highlighted in a recent Wikibon study, use ML to drive more of the actual production of fairly decent, good-enough code as a first rough prototype for a broad range of applications. And finally, we're seeing a fair amount of ML-driven code generation inside of things like robotic process automation, RPA, which I believe will probably be a super hot theme at Strata and other shows this year going forward. >> So you mentioned the idea of better tooling for DevOps and the relationship between big data and ML, and whatnot, and DevOps. One of the key things that we've been seeing over the course of the last few years, and it's consistent with the trends that we're talking about, is increasing specialization in a lot of the perspectives associated with changes within this marketplace, so we've seen other shows emerge that have been very, very important, that we, for example, are participating in. Places like Splunk, for example, that is the vanguard, in many respects, of a lot of these trends in big data and how big data can be applied to business problems. Dave Vellante, I know you've been associated with, and participating in, a number of these shows; how does this notion of specialization inform what's going to happen in San Jose, and what kind of advice and counsel should we give people to continue to explore beyond just what's going to happen in San Jose in a couple weeks? >> Well, you mentioned Splunk as an example, a very sort of narrow and specialized company that solves a particular problem and has a very enthusiastic ecosystem and customer base around that problem. Log files to solve security problems, for example. I would say Tableau is another example, you know, heavily focused on viz. So what you're seeing is these specialized skillsets that go deep within a particular domain. I think the thing to think about, especially when we're in San Jose next week, is as we talk about digital disruption, what are the skillsets required beyond just the domain expertise? So you're sort of seeing these bifurcated skillsets really coming into vogue, where somebody understands, for example, traditional marketing, but they also need to understand digital marketing in great depth, and the skills that go around it, so there's sort of a two-tool player. We talk about the five-tool player in baseball. At least a multidimensional skillset in digital. >> And that's likely to occur not just in a place like marketing, but across the board. David Floyer, as folks go to the show and start to look more specifically at this notion of convergence, are there particular things that they should think about? To come back to the notion of, well, you know, hardware is going to make things more or less difficult for what the software can do, and software is going to be created that will fill up the capabilities of hardware. What are some of the underlying hardware realities that folks going to the show need to keep in mind as they evaluate, especially on the infrastructure side, these different infrastructure technologies that are getting more specialized?
>> Well, if we look historically at the big data area, the solution has been to put in very low cost equipment as nodes, lots of different nodes, and move the data to those nodes so that you get a parallelization of the data handling. That is not the only way of doing it. There are good ways now where you can, in fact, have a single version of that data in one place, in very high speed storage, on flash storage, for example, and where you can allow very fast communication from all of the nodes directly to that data. And that makes things a lot simpler from an operational point of view. So using current batch automation techniques that are in existence, and looking at those from a new perspective, which is how do I apply these to big data, how do I automate these things, can make a huge difference in just the practicality and the elapsed time for some of these large training things, for example. >> Yeah, I was going to say that in many respects, what you're talking about is bringing things like training under a more traditional >> David: Operational, yeah. >> approach and operational set of disciplines. >> David: Yes, that's right. >> Very, very important. So John Furrier, I want to come to you and say that there are some other technologies that, while they're the bright shiny objects and people think that they're going to be the new kind of Harry Potter technologies of magic everywhere, Blockchain is certainly going to become folded into this big data concept, because Blockchain describes how contracts, ownership, authority ultimately get distributed. What should folks look for as Blockchain starts to become part of these conversations? >> That's a good point, Peter. My summary of the preview for BigData SV Silicon Valley, which includes the Strata show, is two things: Blockchain points to the future and GDPR points to the present. GDPR is probably one of the most fundamental impacts to the big data market in a long time. People have been working on it for a year. It is a nightmare. The technical underpinnings of what companies have to do to comply with GDPR are a moving train, and it's complete BS. There are no real solutions out there, so if I was going to tell everyone what to think about and look for: What is happening with GDPR, what's the impact on the databases, what's the impact on the architectures? Everyone is faking it 'til they make it. No one really has anything, in my opinion, from what I can see, so it's a technical nightmare. Where was that database? So it's going to impact how you store the data, and then the sovereignty issue is another issue. So the Blockchain then points to the sovereignty issue of the data, in terms of the company, the country, and the user. These things are going to impact software development, application development, and, ultimately, cloud choice and the IoT. So to me, GDPR is not just a one-and-done thing, and Blockchain is kind of a future thing to look at. So I would look out of those two lenses and say, do you have a direction or a narrative that supports me today with what GDPR will impact throughout the organization? And then, what's going on with this new decentralized infrastructure and the role of data, and the sovereignty of that data, with respect to company, country, and user. So to me, that's the big issue.
>> So George Gilbert, if we think about this question of these fundamental technologies that are going to become increasingly important here, database managers are not dead as a technology. We've seen a relative explosion over the last few years in at least invention, even if it hasn't been followed with, as Neil talked about, very practical ways of bringing new types of disciplines into a lot of enterprises. What's going to happen with the database world, and what should people be looking for in a couple of weeks to better understand how some of these data management technologies are going to converge and/or evolve? >> It's a topic that will be of intense interest and relevance to IT professionals, because it's become the common foundation of all modern apps. But I think what we can do is see, for instance, a leading indicator of what's going to happen with the legacy vendors, where we have in-memory technologies for both transaction processing and analytics, and we have more advanced analytics embedded in the database engine, including Machine Learning, the model training, as well as model serving. But what happened in the big data community is that we disassembled the DBMS into the data manipulation language, which is an analytic language: could be Spark, could be Flink, even Hive. We had the Catalog, which I think Jim has talked about or will be talking about, where it's not just a dictionary of what's in one DBMS, but a whole way of tracking and governing data across many stores. And then there's the Storage Manager, which could be the file system, an object store, or something like Kudu, which is an MPP way of, in parallel, performing a bunch of operations on data that's stored. The reason I bring all this up is, following on David's comment about the evolution of hardware, databases are fundamentally meant to expose capabilities in the hardware and to mediate access to data using those hardware capabilities. And now that we have what's emerging as this unigrid, with memory-intensive architectures and super low latency to get from any node on that cluster to any other node, like with only a five microsecond lag relative to previous architectures, we can now build databases that scale out with the same knowledge base that we used to build databases that scale up. In other words, it democratizes the ability to build databases of enormous scale, and that means that we can have analytics and the transactions working together at very low latency.
The converged platform now is shifting, it's center of gravity is shifting to continuous processing, where the data lake is a reference data repository that helps inform the creation of models, but then you run the models against the streaming continuous data for the freshest insights-- >> Okay, Jim Kobielus, action item. >> Yeah, focus on developer productivity in this new era of big data analytics. Specifically focus on the next generation of developers, who are data scientists, and specifically focus on automating most of what they do, so they can focus on solving problems and sifting through data. Put all the grunt work or training, and all that stuff, take and carry it by the infrastructure, the tooling. >> Peter: Neil Raden, action item. >> Well, one thing I learned this week is that everything we're talking about is about the analytical problem, which is how do you make better decisions and take action? But companies still run on transactions, and it seems like we're running on two different tracks and no one's talking about the transactions anymore. We're like the tail wagging the dog. >> Okay, John Furrier, action item. >> Action item is dig into GDPR. It is a really big issue. If you're not proactive, it could be a nightmare. It's going to have implications that are going to be far-reaching in the technical infrastructure, and it's the Sarbanes-Oxley, what they did for public companies, this is going to be a nightmare. And evaluate the impact of Blockchains. Two things. >> David Vellante, action item. >> So we often say that digital is data, and just because your industry hasn't been upended by digital transformations, don't think it's not coming. So it's maybe comfortable to sit back and say, Well, we're going to wait and see. Don't sit back and wait and see. All industries are susceptible to digital transformation. >> Alright, so I'll give the action item for the team. We've talked a lot about what to look for in the community gathering that's taking place next week in Silicon Valley around strata. Our observations as the community, it descends upon us, and what to look for is, number one, we're seeing a bifurcation in the marketplace, in the thought leadership, and in the tooling. One set of group, one group is going more after the infrastructure, where it's focused more on simplification, convergence; another group is going more after the developer, AI, ML, where it's focused more on how to create models, training those models, and building applications with the services associated with those models. Look for that. Don't, you know, be careful about vendors who say that they do it all. Be careful about vendors that say that they don't have to participate in a converged approach to doing this. The second thing I think we need to look for, very importantly, is that the role of data is evolving, and data is becoming an asset. And the tooling for driving velocity of data through systems and applications is going to become increasingly important, and the discipline that is necessary to ensure that the business can successfully do that with a high degree of predictability, bringing new production systems are also very important. A third area that we take a look at is that, ultimately, the impact of this notion of data as an asset is going to really come home to roost in 2018 through things like GDPR. As you scan the show, ask a simple question: Who here is going to help me get up to compliance and sustain compliance, as the understanding of privacy, ownership, etc. 
of data, in a big data context, starts to evolve, because there's going to be a lot of specialization over the next few years. And there's a final one that we might add: When you go to the show, do not just focus on your favorite brands. There's a lot of new technology out there, including things like Blockchain. They're going to have an enormous impact, ultimately, on how this marketplace unfolds. The kind of miasma that's occurred in big data is starting to specialize, it's starting to break down, and that's creating new niches and new opportunities for new sources of technology, while at the same time, reducing the focus that we currently have on things like Hadoop as a centerpiece. A lot of convergence is going to create a lot of new niches, and that's going to require new partnerships, new practices, new business models. Once again, guys, I want to thank you very much for joining me on Action Item today. This is Peter Burris from our beautiful Palo Alto theCUBE Studio. This has been Action Item. (lively electronic music)

Published Date : Feb 24 2018


Reni Waegelein, Veikkaus | PentahoWorld 2017


 

>> Narrator: Live from Orlando, Florida, it's The Cube, covering PentahoWorld 2017. Brought to you by Hitachi Ventara. >> Welcome back to The Cube's live coverage of PentahoWorld, brought to you, of course, by Hitachi Ventara. I'm your host, Rebecca Knight, along with my cohost, Dave Vellante. We're joined by Reni Waegelein, he is the IT manager of Veikkaus. Thanks so much for coming on The Cube, Reni. >> Thank you for having me here. >> So, Veikkaus is the Finnish national betting agency, wholly owned by the government. >> Yeah. >> Tell us more. >> Yeah, we used to have like three companies, now we are merged as one and we operate every money gaming thing, all the money gaming in Finland. So that includes everything from casino to lottery, to scratch tickets, sports betting, horse betting, whatever that is, and we gather money, of course, pay out some good winnings as well. But everything we make under the line, that goes to good causes, and I mean everything. >> And you are IT manager. >> Reni: Yeah. >> So what are your responsibilities? >> Yeah, responsibilities like developing the whole of the IT things we have, from architecture to doing the IT procurement and development, and harnessing how we work. >> So the public policy on betting is, hey, let's have a single state-run monopoly. >> Reni: Yep. >> And we'll take the winnings and put it to the public good, right, makes sense. >> Reni: Yep. >> And is there any competition from the internet, for example? >> Of course, yes, and the internet, well, it's like full competition, although we are a legally-based company in Finland and we operate and sell only to Finnish people. The people themselves, they have all the freedom to choose whoever they want to play with, so in that sense, it's full competition, and it has been so for many years. >> So you have to have great websites. >> Reni: Yep. >> Great customer experience, >> Reni: Yep. >> User experience. >> Reni: Yeah. >> Competitive rates, all that stuff. >> Reni: Yep. >> Okay so, and good analytics. (laughing) I mean that industry is obviously very data heavy. >> Reni: Yep. >> Always has been. So how do you use analytics and data to compete? >> So we have been doing, like, the product analytics for quite a long time, and then we established a customer-ship. So in Finland we have 5.4 million inhabitants, and we sell only to 18+ year old people, and at the moment we have more than 2 million registered customers already. So, you can imagine that we have a vast amount of data on the customer, and we use that data, for example, for promoting the service, promoting games, targeting, making some recommendations. We built our own recommendation engine, for example, and utilize all of that kind of data. But, as you know, gaming is also like a two-edged sword; there's the happy side, but there's also the dark side. So it does cause problems, so we try also to use the data so that we can identify the bad patterns when somebody is about to lose control of gaming. So we use also that same data, for example, for these players, to stop all the marketing activities toward them; we don't want anybody to get into problems because of gaming. >> So that's a really interesting tension here, is that you obviously want to make money in this, but you also have to watch out for Finnish society. And as you said, if there's a compulsive gambler or an addicted gambler, you need to act, I mean, is that?
>> Yeah, yeah, that's a really big part of our responsibility, and if we didn't have any data, or if we couldn't process it fast, we couldn't know who is a problematic gambler and who is not. Since the vast majority, of course, is enjoying it, it's a nice habit. Play a game of poker every now and then, or go to the casino once or twice a month, for example. But then there's the small portion of people who we want to protect so that they don't get into debt. That's not our intention. >> And the level of protection that you provide, is you stop marketing to them, is that right, or? >> Reni: Yeah, yeah. >> It's not like you intervene in some other way. >> Yeah, of course, we want to promote that if you want, you can stop and close your account, or these kinds of activities. >> So you promote cutting the cord, basically? >> Yeah, yeah, yeah. So instead of marketing, we say that this might be a problem for you, so yeah. >> Let's take a break. >> You should take a break, yeah. >> So, as Dave was saying, because you are competing with private entities you really have to have a great interface, great customer experience, great rates. How much does this put Veikkaus on the vanguard of this kind of technology, more so than what other government agencies are doing, in the sense that you really have to stay on the cutting edge of these things? >> Yeah, we have to be like double-backed, you say. >> So how much do you then talk to the health agency, or other government agencies, about what you're doing, and share the best practices about capturing customer attention? >> We are actually talking more to the new players out in the field who already live and breathe data, so that's where we can learn, and I would say that we look not only into the lottery area itself but also into quite many other industries as well. So we have been doing this for a while, so we have had the luxury that we have already gathered some experience and opened some paths and, well, maybe learned also the hard way how not to do it. We of course didn't succeed in the first runs, but you just have to go and have a trial and error in some areas as well. >> And you have multiple data sources obviously; maybe talk about how you're handling those data sources. How do you ingest those into Pentaho, what do you do with it, how are you operationalizing the analytics? Where does Pentaho fit in that whole process? >> Yeah, Pentaho we use, that's like the ETL process, to get this 360 view of the customer. We have various data sources; after the merger, we tripled the number of different sources, and I think more than quadrupled the amount of data. So of course, just to make the data and the work of the analysts easier, we need to make some transformations to the data, and in that area Pentaho has its place. And in the future, what we are also expecting the future versions to help us with is taking in more real time data. So for example, we can put in a real time data feed for one physical place, so they can see which machines are used well, which are not, or if there are any other activities that they can learn from right in their place. >> So are you in the process of instrumenting the machines at this point? >> Reni: Yep. >> And so you're putting, how does that work, is it rip and replace, is it some kind of chip that you put into the machine? How do you instrument the machine? >> It's a good thing, so, we actually design our own slot machines, even.
>> Dave: Okay, okay so. >> So we can, like, build it up from the ground up. >> Dave: Design it in. >> Yeah. We designed the hardware to support it; they are big IoT machines. >> Dave: Right. >> But also the software will support us. >> And then you've got connectivity, is it hard-wired? Is it physical or is it wireless connectivity? >> We use, well, whatever is available, so... >> Dave: Depends. >> Yeah, yeah. And when we are developing, like, new types of games, for example, when the slot machines should be online all the time, like a jackpot available, then of course we have to think about the quality of service of the network as well. So far, we have been using whatever is available. >> So what does the data architecture look like? I wonder if you could paint the picture, so you've got the machines, let's just use slot machines as an example. So you have the slot machines, you've instrumented those, you're doing real time analytics there, and maybe talk about what kinds of things you do there? And then where does the data go? How much data, do you persist the data? Maybe talk about that a little. >> Yeah, so we get the slot machines and other sources as well, and have like a Kafka/Hadoop area where we collect everything. Then there's Pentaho doing the ETL work, and we store all the data that goes through it in Vertica. So we have HP's Vertica there, and in that Vertica they have lots of users; they have like SAS analytics that uses it, and the Hadoop as well, and then we have some reporting, financials; the finance department, they also utilize it. But then we are also building up some new things, like Apache's Kudu is one thing that we want to set up there, just to make the life of the analysts much easier, so they are at the moment having a little bit of a hard time in some areas with how to utilize the data, and especially how to use the different analyst tools from different cloud vendors on this data, since we are still at the moment on premise, so everything is on premise, partly because of the government requirements. >> Dave: Okay. >> So for some part of the data they require that we keep it within Finland. >> Right, so could we call that your private cloud? >> Reni: It's not a private cloud yet. >> It's not, okay. >> But we're, we are going. >> Dan: Someday. >> Yeah, yeah. >> It will be a private cloud, okay, so you have an edge device, which is the slot machine, and then do you send all the data back to Vertica or no, probably not, right, I mean. >> Not yet. >> Dave: But do you want to? >> But it will be. >> Dave: Really? >> Yeah, it will be. Of course we have to make some decisions, like what data will be important and what is not, so not all the data is valuable, but especially when it's connected somehow to the customer, or the retailer as well, that data we also keep for more than a year. So we are not doing all the analytics just on a short window of data, but also want to seek out the long trends and make new hypotheses out of it. >> And the Vertica system is essentially your data warehouse, is that right? >> Reni: Yeah. >> Okay. And then are you doing sort of, well, you mentioned a recommendation engine, so you're doing some >> Reni: Yeah. form of it. That's a form of AI, as far as I'm concerned. Are you doing that, where are you doing that? Is that in your data center, and is that another layer of the data pipeline, or is that done in the? >> Yeah, it's done partly on site but also in AWS.
>> Yeah >> So we used Amazon services in some areas where we can use those, so the recommendation engine, for example, and parts of the, sort of, AI, some blocks are also on AWS. >> So it's a three tier. >> Reni: Yeah. >> So there's the edge, then there's the aggregation at Vertica, and then there's the cloud modeling and training that goes on, and Pentaho plays across that data pipeline, is that right? >> Yeah, yeah, it's one major player in our data platform in the sense that it takes care of quite a few different kinds of transformations, so that we have the right data in the right place. >> Dave: All right, I'm done geeking out. (laughing) >> All right, so Reni, before the cameras were rolling, we were talking a little bit about the difficulties of cultural change within these organizations, and you were talking about something that you're working on in Finland that's not necessarily related to Veikkaus; can you tell our viewers a little bit about what you're doing? >> Yeah, we are also setting up a Teal Finland, so promoting this, like, next phase of organizational, well, you cannot call it belief, but vision and perspective, so we want to also promote these kinds of activities. So I know that especially with the big data movement, you have also seen the cultural changes, so the normal organizational ways of working just are not efficient enough, so you have to liberate it, you have to give the freedom: how to use the data, what kind of hypotheses, what kind of activities are done, and this cultural change comes also with the Teal movement. It's like taking the next big leap, so this is, well, it's a side project, but it's also really heavily work related. >> And how open is the Finnish tech community to these ideas? I mean, is there an adversarial relationship with the people who don't necessarily welcome the change? I mean, how would you describe it? >> I believe it's really open; we have already, I believe, a handful of companies who work and operate from this perspective, and more are popping up. And we are establishing one cooperative, like, to support this movement, and maybe to create new spinoffs which can be for-profit. >> All right, let's get to the heart of the matter here, (laughing) how do I beat the house? >> I knew you were going there, Dave. >> Just, just between us. >> I knew it. (laughing) >> Obviously I'm kidding, but different games have different odds. >> Reni: Yeah. >> Right, I mean, and those are, you're transparent about that, people know what they are, but what are the best odds? Is it slots, best chance of winning, or poker, or... >> Yeah, slots is the good side, and also whenever you go to the casino, you know, it has a top notch, so 90 point something, so... >> Of probabilities and, >> But of course I have to say that the house wins eventually, so yeah, yeah. >> The bookies always win, so. >> Rebecca: Right, exactly. >> So the higher the probability, the lower the payout, and the reverse, presumably, right? >> Reni: Yeah, yeah. >> The lottery would be. >> Lottery you're a check out if you're yeah. >> Dave: Low odds. >> Low odds but, >> Dave: Telephone numbers if you win. >> Yeah. >> Dave: Yeah. >> But David, you can't win if you don't play, okay, just saying, just saying. >> And every week there's somebody who wins. >> Rebecca: Right! >> Yeah. So why can it not be me, or you? (laughing) >> Or me, or me, maybe! >> So what do you do to the guys who count cards, you like break arms or you put them in jail, no? >> It's Finland, this is no, no, come on.
>> Nobody does that, right? >> Reni: No, no, no. But of course, yeah, that's probably something where we could in future use data more efficiently than we use it at the moment, so that's one part, like how people behave versus how machines behave. So for example in online poker, the card counting programs, that's one problem, I think, for the whole industry. >> Dave: Right. >> Are you working with behavioral finance experts in this, to sort of understand people's behavior when it comes to this? >> Yeah, we work, for example, with psychologists to understand this, and the same goes with problematic gambling as well, so you have to know how people behave. >> And do you have customers outside of Finland, or is it pretty much exclusively? >> No, sorry, it's an exclusive club; you have to move to, you know, you have to move to Finland. (laughing) And then we welcome you. >> Awesome. >> He's going to immigrate, I think, any day now. Well, Reni, >> Reni: But hey, it's one of the best countries. >> Thank you so much for coming on The Cube, it was a lot of fun talking to you. >> Yeah, thank you. >> I'm Rebecca Knight, for Dave Vellante; we will have more from PentahoWorld just after this.
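For readers wondering what the Kudu layer Reni mentions might look like in practice, here is a minimal sketch using the Apache Kudu Python client. The master address, table, and column names are hypothetical; the point is that Kudu allows a row to be upserted in place, which is what makes it friendlier for analysts than append-only HDFS files.

    import kudu  # Apache Kudu Python client (pip install kudu-python)

    # Connect to a (hypothetical) Kudu master.
    client = kudu.connect(host='kudu-master.example.com', port=7051)
    table = client.table('machine_events')   # hypothetical table
    session = client.new_session()

    # Upsert: update the row if the key exists, insert it otherwise.
    op = table.new_upsert({'machine_id': 42,
                           'event_time': 1508976000,
                           'amount_played': 12.50})
    session.apply(op)
    session.flush()  # push the buffered operation to the tablet servers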

Published Date : Oct 26 2017


Mark Grover & Jennifer Wu | Spark Summit 2017


 

>> Announcer: Live from San Francisco, it's the Cube covering Spark Summit 2017, brought to you by Databricks. >> Hi, we're back here where the Cube is live, and I didn't even know it. Welcome, we're at Spark Summit 2017. Having so much fun talking to our guests, I didn't know the camera was on. We are doing a talk with Cloudera, a couple of experts that we have here. First is Mark Grover, who's a software engineer and an author. He wrote the book, "Hadoop Application Architectures." Mark, welcome to the show. >> Mark: Thank you very much. Glad to be here. >> And just to his left we also have Jennifer Wu, and Jennifer's director of product management at Cloudera. Did I get that right? >> That's right. I'm happy to be here, too. >> Alright, great to have you. Why don't we get started talking a little bit more about what Cloudera is maybe introducing new at the show? I saw a booth over here. Mark, do you want to get started? >> Mark: Yeah, there are two exciting things that we've launched at least recently. There's Cloudera Altus, which is for transient workloads, being able to do ETL-like workloads, and Jennifer will be happy to talk more about that. And then there's Cloudera Data Science Workbench, which is this tool that allows folks to use data science at scale. So, get away from doing data science in silos on your personal laptops, and do it in a secure environment on the cloud. >> Alright, well, let's jump into Data Science Workbench first. Tell me a little bit more about that; you mentioned it's for exploratory data science. So give us a little more detail on what it does. >> Yeah, absolutely. So, there was a private beta for Cloudera Data Science Workbench earlier in the year, and then it went GA a few months ago. And it's like you said, an exploratory data science tool that brings data science to the masses within an enterprise. Previously, it was this dichotomy, right? As a data scientist, I want to have the latest and greatest tools. I want to use the latest version of Python, the latest notebook kernel, and I want to be able to use R and Python to crunch this data and run my models in machine learning. However, on the other side of this dichotomy is the IT organization, which wants to make sure that all tools are compliant, that your clusters are secure, and that your data is not going into places that are not secured by state-of-the-art security solutions, like Kerberos, for example, right? And of course, if the data scientists are putting the data on their laptops and taking the laptop around wherever they go, that's not really a solution. So, that was one problem. And the other one was, if you were to bring them all together in the same solution, data scientists have different requirements. One may want to use Python 2.6. Another one may want to use 3.2, right? And so Cloudera Data Science Workbench is a new product that allows data scientists to visualize and do machine learning through this very nice notebook-like interface, share their work with the rest of their colleagues in the organization, but also allows you to keep your clusters secure. So it allows you to run against a Kerberized cluster, allows single sign-on to your web interface to Data Science Workbench, and provides a really nice developer experience in the sense that my workflow and my tools and my version of Python do not conflict with Jennifer's version of Python.
We all have our own Docker- and Kubernetes-based infrastructure that makes sure that we use the packages that we need, and they don't interfere with each other. >> We're going to go to Jennifer on Altus in just a few minutes, but George, first I'll give you a chance to maybe dig in on Data Science Workbench. >> Two questions on the data science side: some of the really toughest nuts to crack have been sort of a common environment for the collaborators, but also the ability to operationalize the models once you've sort of agreed on them, and manage the lifecycle across teams, you know? Like, challenger/champion, promote something, or even before that, doing the A/B testing, and then sort of, what's in production is typically in a different language from what, you know, it was designed in, and sort of integrating it with the apps. Where is that on the road map? 'Cause no one really has a good answer for that. >> Yeah, that's an excellent question. In general, I think it's the problem to crack these days: how do you productionalize something that was written by a data scientist in a notebook-like system onto the production cluster, right? And for the part where the data scientist works in a different language than the language that's in production, the best I can say right now is to actually have someone rewrite that. Have someone rewrite that in the language you're going to use in production, right? I don't see that to be the more common part. I think the more widespread problem is, even when the language is the same in production, how do you go about making the part that the data scientist wrote, the model or whatever that would be, into a production cluster? And so, Data Science Workbench in particular runs on the same cluster that is being managed by Cloudera Manager, right? So this is a tool that you install, but that is available to you as a web server, as a web interface, and so that allows you to move your development machine learning algorithms from your Data Science Workbench to production much more easily, because it's all running on the same hardware and the same systems. There are no separate Cloudera Managers that you have to use to manage the workbench compared to your actual cluster. >> Okay. A tangential question, but one of the difficulties of doing machine learning is finding all the training data and sort of the data science expertise to sit with the domain expert to, you know, figure out a proper model of features, things like that. One of the things we've seen so far from the cloud vendors is they take their huge datasets in terms of voice, you know, images. They do the natural language understanding, speech, or rather text to speech, you know, facial recognition. 'Cause they have such huge datasets they can train on. We're hearing noises that they're going to take that down to the more mundane statistical kind of machine learning algorithms, so that you wouldn't be, like, here's an algorithm to do churn, you know, go to town, but that they might have something that's already kind of pre-populated that you would just customize. Is that something that you guys would tackle, too? >> I can't speak for the road map in that sense, but I think some of that problem needs to be tackled by projects like Spark, for example. So I think as the stack matures, it's going to raise the level of abstraction as time goes on. And I think whatever benefits the Spark ecosystem has will come directly to distributions like Cloudera. >> George: That's interesting.
>> Yeah >> Okay >> Alright, well let's go to Jennifer now and talk about Altus a little bit. Now, you've been on the Cube show before, right? >> I have not. >> Okay, well, we're familiar with your work. Tell us again, you're the product manager for Altus. What does it do, and what was the motivation to build it? >> Yeah, we're really excited about Cloudera Altus. So, we released Cloudera Altus in its first GA form in April, and we launched Cloudera Altus in a public environment at Strata London about two weeks ago, so we're really excited about this, and we are very excited to now open this up to the entire customer base. And what it is is a platform-as-a-service offering designed to leverage, basically, the agility and the scale of cloud, and make a very easy-to-use type of experience to expose Cloudera capacity, in particular for data engineering types of workloads. So the end user will be able to very easily, in a very agile manner, get data engineering capacity on Cloudera in the cloud, and they'll be able to do things like ETL and large scale data processing, productionized machine learning workflows, in the cloud with this new data-engineering-as-a-service experience. And we wanted to abstract away the cloud and cluster operations, and make the end user experience really easy. So, jobs and workloads are first class objects. You can do things like submit jobs, clone jobs, terminate jobs, troubleshoot jobs. We wanted to make this very, very easy for the data engineering end user. >> It does sound like you've sort of abstracted away a lot of the infrastructure that you would associate with on-prem, and sort of made it, like, programmable and invisible. But, um, one of my questions is, when you put it in a cloud environment; when you're on-prem you have a certain set of competitors, which is kind of restrictive, because you are the standalone platform. But when you go on the cloud, someone might say, "I want to use Redshift on Amazon," or Snowflake, you know, as the MPP SQL database at the end of a pipeline. And I'm just using those as examples. There's, you know, dozens, hundreds, thousands of other services to choose from. >> Yes. >> What happens to the integrity of that platform if someone carves off one piece? >> Right. So, interoperability and a unified data pipeline is very important to us, so we want to make sure that we can still service the entire data pipeline, all the way from ingest and data processing to analytics. So our team has 24 different open source components that we deliver in the CDH distribution, and we have committers across the entire stack. We know the application, and we want to make sure that everything's interoperable, no matter how you deploy the cluster. So if you deploy data engineering clusters through Cloudera Altus, but you deployed Impala clusters for data marts in the cloud through Cloudera Director, or through any other format, we want all these clusters to be interoperable, and we've taken great pains in order to make everything work together well. >> George: Okay. So how do Altus and Data Science Workbench interoperate with Spark? Maybe start with >> You want to go first with Altus?
>> Sure, so, in terms of interoperability we focus on things like making sure there are no data silos, so that the data in your entire data lake can be consumed by the different components in our system, the different compute engines and different tools. So if you're processing data, you can also look at that data and visualize it through Data Science Workbench. So after you do data ingestion and data processing, you can use any of the other analytic tools, and this includes Data Science Workbench. >> Right, and Data Science Workbench runs, for example, with the latest version of Spark; you could pick the currently latest released version of Spark, Spark 2.1; Spark 2.2 is being onboarded, of course, and will soon be integrated after its release. For example, you could use Data Science Workbench with your flavor of Spark 2's versions, and you can run PySpark or Scala jobs on this notebook-like interface and be able to share your work. And because you're using Spark underneath the hood, it uses YARN for resource management; the Data Science Workbench itself uses Docker for configuration management, and Kubernetes for resource-managing these Docker containers. >> What would be, if you had to describe sort of the edge conditions and the sweet spot of the application, I mean, you talked about data engineering. One thing we were talking to Matei Zaharia and Reynold Xin about, and Ali Ghodsi as well, was if you put Spark on a database, or at least a, you know, sophisticated storage manager, like Kudu, all of a sudden there's a whole new class of jobs or applications that open up. Have you guys thought about what that might look like in the future, and what new applications you would tackle? >> I think a lot of that benefit, for example, could be coming from the underlying storage engine. So let's take Spark on Kudu, for example. The inherent characteristics of Kudu today allow you to do updates without having to either deal with the complexity of something like HBase, or the crappy performance of dealing with HDFS compactions, right? So the sweet spot comes from Kudu's capabilities. Of course it doesn't support transactions or anything like that today, but imagine putting something like Spark on it and being able to use the machine learning libraries. We have been limited so far in the machine learning algorithms that we have implemented in Spark by the storage system sometimes, and, for example, new machine learning algorithms, or the existing ones, could be rewritten to make use of the update features, for example, in Kudu. >> And so, it sounds like the machine learning pipeline might get richer, but I'm not hearing that, and maybe this isn't sort of in the near-term roadmap, the idea that you would build sort of operational apps that have these sophisticated analytics built in, you know, where the analytics, um, you've done the training, but at run time, you know, the inferencing influences a transaction, influences a decision. Is that something that you would foresee? >> I think that's totally possible. Again, at the core of it is the fact that now you have one storage system that can do scans really well, and it can also do random reads and writes any place, right?
So that allows applications which were previously siloed, because one application ran off of HDFS and another application ran out of HBase and then you had to correlate them, to become one single application that you can use to train models, and then use those trained models to make decisions on the new transactions that come in. >> So that's very much within the scope of imagination. That's part of the ultimate plan? >> Mark: I think it's definitely conceivable now, yeah. >> Okay. >> We're up against a hard break coming up in just a minute, so you each get a 30-second answer here, and it's the same question. You've been here for a day and a half now. What's the most surprising thing you've learned that you think should be shared more broadly with the Spark community? Let's start with you. >> I think one of the great things happening in Spark today is that people have been complaining about latency for a long time, and if you saw the keynote yesterday, you would see that Spark is making forays into reducing that latency. If you are interested in Spark, or using Spark, that's very exciting news. You should keep tabs on it. We hope to deliver lower latency as a community soon. >> How long is one millisecond? (Mark laughs) >> Yeah, I'm largely focused on cloud infrastructure, and I found here at the conference that many, many people are very much prepared to actually start taking on more POCs and showing more interest in cloud, and the response to all of this, and to Altus, has been very encouraging. >> Great. Well, Jennifer, Mark, thank you so much for spending some time here on the Cube with us today. We're going to come by your booth and chat a little bit more later. It's some interesting stuff. And thank you all for watching the Cube today here at Spark Summit 2017, and thanks to Cloudera for bringing us these two experts. And thank you for watching. We'll see you again in just a few minutes with our next interview.
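To make the Spark-on-Kudu pattern discussed above concrete, here is a minimal Scala sketch using the kudu-spark connector of that era (Spark 2.x, Kudu 1.x). The master address, table name, and columns are hypothetical, and exact connector signatures vary a little across Kudu releases, so treat this as an illustration of one storage engine serving both fast full scans and random in-place writes, not a definitive recipe.

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object SparkOnKuduSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("spark-on-kudu-sketch").getOrCreate()

    // Hypothetical master address and table; the table is assumed to have
    // an "amount" column and a boolean "flagged" column in its schema.
    val kuduMaster = "kudu-master.example.com:7051"
    val table = "impala::default.transactions"

    // Fast full scans: read the table as a DataFrame, e.g. to train a model
    val transactions = spark.read
      .options(Map("kudu.master" -> kuduMaster, "kudu.table" -> table))
      .format("org.apache.kudu.spark.kudu")
      .load()

    // A stand-in for scoring: flag large transactions
    val scored = transactions.withColumn("flagged", col("amount") > 10000)

    // Random, in-place writes: upsert the scored rows back into the same
    // table, with no HBase-style complexity and no HDFS compaction passes
    val kuduContext = new KuduContext(kuduMaster, spark.sparkContext)
    kuduContext.upsertRows(scored, table)

    spark.stop()
  }
}
```

The same table can then serve the operational side, with an application reading individual rows by key while analytic scans continue, which is the "one storage system" point made in the interview.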

Published Date : Jun 7 2017

SUMMARY :

Wrapping up coverage of Spark Summit 2017, brought to you by Databricks, the Cube hosts talk with Mark Grover of Cloudera and Jennifer Wu, product manager for Cloudera Altus. Wu describes Altus, a platform-as-a-service offering that went GA in April and treats data engineering jobs and workloads as first-class objects in the cloud, and explains how Altus clusters stay interoperable with the rest of the CDH stack. Grover covers Data Science Workbench's Spark integration (YARN, Docker, and Kubernetes under the hood) and discusses how a storage engine like Kudu, with in-place updates and fast scans, could open up richer machine learning pipelines and operational applications on Spark.

SENTIMENT ANALYSIS :

ENTITIES

Entity                                  Category        Confidence
Jennifer                                PERSON          0.99+
Mark Grover                             PERSON          0.99+
Jennifer Wu                             PERSON          0.99+
Ali Ghodsi                              PERSON          0.99+
George                                  PERSON          0.99+
Mark                                    PERSON          0.99+
April                                   DATE            0.99+
Reynold Xin                             PERSON          0.99+
San Francisco                           LOCATION        0.99+
Matei Zaharia                           PERSON          0.99+
30-second                               QUANTITY        0.99+
Cloudera                                ORGANIZATION    0.99+
Hadoop Application Architectures        TITLE           0.99+
dozens                                  QUANTITY        0.99+
Python                                  TITLE           0.99+
yesterday                               DATE            0.99+
Two questions                           QUANTITY        0.99+
today                                   DATE            0.99+
Spark                                   TITLE           0.99+
Amazon                                  ORGANIZATION    0.99+
two experts                             QUANTITY        0.99+
a day and a half                        QUANTITY        0.99+
First                                   QUANTITY        0.99+
one problem                             QUANTITY        0.99+
Python 2.6                              TITLE           0.99+
Strata London                           LOCATION        0.99+
one piece                               QUANTITY        0.99+
first                                   QUANTITY        0.98+
Spark Summit 2017                       EVENT           0.98+
Cloudera Altus                          TITLE           0.98+
Scala                                   TITLE           0.98+
Docker                                  TITLE           0.98+
One                                     QUANTITY        0.97+
Kudu                                    ORGANIZATION    0.97+
one millisecond                         QUANTITY        0.97+
PySpark                                 TITLE           0.96+
R                                       TITLE           0.95+
one                                     QUANTITY        0.95+
two weeks ago                           DATE            0.93+
Data Science Workbench                  TITLE           0.92+
Cloudera                                TITLE           0.91+
hundreds                                QUANTITY        0.89+
HBase                                   TITLE           0.89+
each                                    QUANTITY        0.89+
24 different open source components     QUANTITY        0.89+
few months ago                          DATE            0.89+
single                                  QUANTITY        0.88+
kernel                                  TITLE           0.88+
Altus                                   TITLE           0.88+

Wikibon Big Data Market Update pt. 2 - Spark Summit East 2017 - #SparkSummit - #theCUBE


 

(lively music) >> [Announcer] Live from Boston, Massachusetts, this is the Cube, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Spark Summit East in Boston, everybody. This is the Cube, the worldwide leader in live tech coverage. We've been here two days, wall-to-wall coverage of Spark Summit. George Gilbert, my cohost this week, and I are going to review part two of the Wikibon Big Data Forecast. Now, it's very preliminary. We're only going to show you a small subset of what we're doing here. So let me just set it up. These are preliminary estimates, and we're going to look at different ways to triangulate the market. At Wikibon, what we try to do is focus on disruptive markets and try to forecast those over the long term. We try to identify where the traditional market research estimates, we feel, might be missing some of the big trends. So we're trying to figure out, what's the impact, for example, of real time? And what's the impact of this new workload that we've been talking about around continuous streaming? So we're beginning to put together ways to triangulate that, and we're going to give you a glimpse today of what we're doing. So, if you bring up the first slide, we showed this yesterday in part one. This is our last year's big data forecast. And what we're going to do today is focus in on that line, that S-curve. That really represents the real-time component of the market. Spark would be in there. Streaming analytics would be in there. Add some color to that, George, if you would. >> [George] Okay, for 60 years, since the dawn of computing, we've had two ways of interacting with computers. You put your punch cards in, or whatever else, and you come back and get your answer later. That's batch. Then, starting in the early 60's, we had interactive, where you're at a terminal. And then the big revolution in the 80's was that you had a PC, but you were still either interactive with a terminal or batch, typically for reporting and things like that. What's happening now is the rise of a new interaction mode, which is continuous processing. Streaming is one way of looking at it, but it might be more effective to call it continuous processing, because you're not going to get rid of batch or interactive; your apps are going to have a little of each. So, since this is early, early in its life cycle, we're going to try to look at that streaming component from a couple of different angles. >> Okay, as I say, that's represented by this ogive curve, or the S-curve. On the next slide, we're at the beginning when you think about these continuous workloads. We're at the early part of that S-curve, and of course, many of you know how the S-curve works. It's slow, slow, slow; for a lot of effort, you don't get much in return. Then you hit the steep part of that S-curve, and that's really when things start to take off. So the challenge is, things are complex right now. That's really what this slide shows. And Spark is designed, really, to reduce some of that complexity. We've heard a lot about that, but take us through this. Look at this data flow from ingest, to explore, to process, to serve. We talked a lot about that yesterday, but this underscores the complexity in the marketplace.
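As an aside, George's batch / interactive / continuous framing is exactly what Spark's Structured Streaming API, introduced in Spark 2.0, is built around: the same DataFrame code, run continuously over an unbounded source. A minimal Scala sketch follows; the socket source and console sink are toy choices for illustration, not a production pipeline.

```scala
import org.apache.spark.sql.SparkSession

object ContinuousProcessingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("continuous-sketch").getOrCreate()
    import spark.implicits._

    // A streaming DataFrame: same API as batch, but the source is unbounded.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // A continuously maintained aggregate over the incoming stream
    val wordCounts = lines.as[String]
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()

    // The query runs until stopped, emitting updated results as data arrives:
    // neither a batch job that ends nor an interactive query you wait on.
    val query = wordCounts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

The readStream/writeStream pair is the only difference from a batch word count, which is the sense in which apps end up with "a little of each."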
[George] Right, and while we're just looking mostly at numbers today, the point of the forecast is to estimate when the barriers, representing those complexities, start to fall, and when we can put all these pieces together: ingest, explore, process, serve. When that becomes an end-to-end pipeline, when you can start taking the data in on one end, get a data scientist to turn it into a model, inject it into an application, and that process becomes automated, that's when it's mature enough for the knee in the curve to start. >> And that's when we think the market's going to explode. But how do you bound this? When we do forecasts, we always try to bound things, because if they're not bounded, then you have no foundation. So, if you look at the next slide, we're trying to get a sense of real-time analytics. How big can it actually get? That's what this slide is really trying to-- >> [George] So this one was one firm's take on real-time analytics, where by 2027, they see it peaking just under-- >> [Dave] When you say one firm, you mean somebody from the technology industry? >> [George] Publicly available data. And since they didn't have a lot of assumptions published, we took it as, okay, one data point. And then we're going to come at it with some bottoms-up and top-down data points, and compare. >> [Dave] Okay, so on the next slide we want to drill into the DBMS market. When you think about DBMS, you think about the traditional RDBMS vendors we know, the Oracles, SQL Servers, IBM DB2s, etc. And then you have these emergent NewSQL and NoSQL entrants, and, as we heard today from a number of folks, the number of suppliers is exploding. The revenue's still relatively small, certainly small relative to the RDBMS marketplace. But take us through what your expectations are here, and what some of the assumptions are behind this. >> [George] Okay, so the first thing to understand is that the DBMS market, overall, is about $40 billion, of which 30 billion goes to online transaction processing supporting real operational apps. 10 billion goes to OLAP, or business intelligence type stuff. The OLAP one is shrinking materially. The online transaction processing one, new sales is shrinking materially, but there's a huge maintenance stream. >> [Dave] Yeah, which companies like Oracle and IBM and Microsoft are living off of, trying to fund new development. >> We modeled that declining gently and beginning to accelerate more going out into the latter years of the ten-year period. >> What's driving that decline? Obviously, you've got the big sucking sound of Hadoop, in part, driving that. But really, increasingly, it's people shifting their resources to some of these new emergent applications and workloads, and new types of databases to support them, right? But those new databases, you can see here, the NewSQL and NoSQL, are still relatively small. A lot of it's open source. But then it starts to take off. What's your assumption there? >> So here, what's going on is, if you look at dollars today, it's actually interesting. If you take the NoSQL databases, you take DynamoDB, Cassandra, Hadoop, HBase, Couchbase, Mongo, Kudu, and you add all those up, with DynamoDB it's probably about 1.55 billion out of a $40 billion market today. >> [Dave] Okay, but it's starting to get meaningful. We're approaching two billion. >> But where it's meaningful is the unit share. If that were translated into Oracle pricing.
The market would be much, much bigger. So that's the point. >> Ten X? >> At least, at least. >> Okay, so in terms of work being done, if there's a measure of work being done. >> [George] We're looking at dollars here. >> Operations per second, etcetera, it would be enormous. >> Yes, but that's reflective of the fact that the data volumes are exploding while the prices are dropping precipitously. >> So do you have a metric to demonstrate that? We're obviously not going to show it today, but. >> [George] Yes. >> Okay, great, so-- >> On the business intelligence side, without naming names, the data warehouse appliance vendors are charging anywhere from 25,000 per terabyte up to, when you include running costs, as high as 100,000 a terabyte, by their customers' estimates. That's not the selling price; that's the cost of ownership per terabyte. Whereas if you look at, let's say, Hadoop, which is comparable for offloading some of the data warehouse workloads, that's down to the 5K-per-terabyte range. >> Okay, great, so you expect that these platforms will have a bigger and bigger impact? What's your pricing assumption? Are prices going to go up, or is it just that volumes go through the roof? >> I'm actually expecting pricing to hold up. It's difficult, because we're going to add more and more functionality. Volumes go up, and if you add sufficient functionality, you can maintain pricing. But as volumes go up, typically, prices go down. So it's a matter of how much the NoSQL and NewSQL databases add in terms of functionality. And I distinguish between them because NewSQL databases are scaled-out versions of an Oracle or a Teradata, but based on the more open source pricing model. >> Okay, and NoSQL, don't forget, stands for "not only SQL," not "no SQL." >> If you look at the slides, big existing markets never fall off a cliff when they're in decline. They just slowly fade, and eventually that accelerates. But what's interesting here is that the data volumes could explode, while the revenue associated with the NoSQL, which is the dark gray, and the NewSQL, which is the blue, doesn't explode. Take the DBMS cost of supporting YouTube: it would be in the many, many, many billions of dollars. It would probably support half of an Oracle by itself. But it's all open source there, so. >> Right, so that's minimizing the opportunity, is what you're saying? >> Right. >> You can see the database market is flattish, and even declining, but you do expect some growth in the out years as part of that equation, that volume, presumably-- >> And that's the next slide, which is where we see that growth coming from. >> Okay, so let's talk about that. On the next slide, again, I should have set this up better, the Y-axis is worldwide dollars and the horizontal axis is time. And we're talking here about these continuous application workloads, this new workload that you talked about earlier. So take us through the three. >> [George] There are three types of workloads that, in large part, are going to be driving most of this revenue. Now, these aren't completely comparable to the DBMS market, because some of these don't use traditional databases. Or if they do, they're Torry databases, and I'll explain that. >> [Dave] Sure, but if I look at the IoT Edge, the Cloud, and the microservices and streaming, that's a tailwind to the database forecast in the previous slide, is that right?
[George] It's actually interesting. The application and infrastructure telemetry, this is what Splunk pioneered: all the torrents of data coming out of your data center and your applications as you're trying to manage what's going on. That is a database application. And we know Splunk, for 2016, was 400 million in software revenue; Hadoop was 750 million. Add the various other management vendors, New Relic, AppDynamics, startups, and 5% of Azure and AWS revenue, and it comes out to $1.7 billion for 2016. So we can put a growth rate on that. And we talked to several vendors to ask, okay, how much will that workload be compared to the IoT Edge and Cloud? The IoT Edge and Cloud is the smart devices at the edge, and the analytics in the fog, but not counting the database revenue up in the cloud. So it's everything surrounding the cloud. And that, if you look out five years, is maybe 20% larger than the app and infrastructure telemetry, but growing much, much faster. Then the third one you were asking about, whether it's a tailwind to the database, is microservices and streaming, which are very different ways of building applications from what we do now. Today, people build the logic for their application and store the data in a centralized external database. In microservices, you build a little piece of the app, and whatever data you need, you store within that little piece of the app. And so the database requirements are rather primitive, and that piece will not drive a lot of database revenue. >> So if you could go back to the previous slide, Patrick. What's driving database growth in the out years? Why wouldn't database revenue continue to get eaten away and decline? >> [George] In broad terms, the overall database market is staying flat, because prices collapse but the data volumes go up. >> [Dave] But there's an assumption in here that the NoSQL space actually grows in the out years. What's driving that growth? >> [George] Both the NoSQL and the NewSQL. The NoSQL is probably best suited to capturing the IoT data, because you don't need lots of fancy query capabilities or concurrency. >> [Dave] So it is a tailwind, in a sense, in that-- >> [George] IoT, but that's different. >> [Dave] Yeah, sure, but you've got the overall market growing, and that's because the new stuff, NewSQL and NoSQL, is growing faster than the decline of the old stuff. But in the 2020 to 2022 time frame, it's not enough to offset that decline, and then you have it start growing again. You're saying that's going to be driven by IoT and other edge use cases? >> Yes. IoT Edge, and NewSQL, actually, is where, when they mature, you start to substitute them for the traditional operational apps, for people who want to write database apps, not microservice-based apps. >> Okay, alright, good. Thank you, George, for setting it up for us. Now, we're going to be at Big Data SV in mid March, is that right? Middle of March. And George is going to be releasing the actual final forecast there. We do it every year: we use Spark Summit to look at our preliminary numbers, some of the Spark-related forecasts like continuous workloads, and then we harden those forecasts going into Big Data SV, where we publish our big data report like we've done for the past five, six, seven years. So check us out at Big Data SV. We do that in conjunction with the Strata events, so we'll be there again this year at the Fairmont Hotel.
We've got a bunch of stuff going on all week there, some really good programs. So check out siliconangle.tv for all that action. Check out Wikibon.com and look for new research coming out. You're going to be publishing this quarter, correct? And of course, check out siliconangle.com for all the news. And, really, we appreciate everybody watching. George, it's been a pleasure co-hosting with you. As always, really enjoyable. >> Alright, thanks Dave. >> Alright, so that's a wrap from Spark Summit. We're going to try to get out of here, beat the snowstorm, and work our way home. Thanks, everybody, for watching. A great job by everyone here: Seth, Ava, Patrick, and Alex. And thanks to our audience. This is the Cube. We're out. See you next time. (lively music)
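As a back-of-the-envelope check on the figures quoted in this segment, here is a short Scala sketch of the arithmetic. All inputs are the numbers stated above; the object and variable names are just for illustration, and the appliance repricing uses the quoted $/TB range as a rough stand-in for "Oracle pricing," so its 5x and 20x outputs bracket, rather than reproduce, the "Ten X, at least" remark.

```scala
object MarketMathSketch {
  def main(args: Array[String]): Unit = {
    // App and infrastructure telemetry, 2016, in $M (figures quoted above)
    val splunk = 400.0
    val hadoop = 750.0
    val telemetryTotal = 1700.0 // the quoted $1.7 billion total
    val remainder = telemetryTotal - splunk - hadoop
    println(f"Implied New Relic/AppDynamics/startups + 5%% of AWS/Azure: $$$remainder%.0fM")

    // Repricing NoSQL unit share at data warehouse appliance rates
    val noSqlRevenue = 1.55                      // quoted NoSQL revenue, $B
    val hadoopPerTB = 5000.0                     // quoted Hadoop TCO, $/TB
    val appliancePerTB = Seq(25000.0, 100000.0)  // quoted appliance TCO range, $/TB
    for (rate <- appliancePerTB) {
      val factor = rate / hadoopPerTB
      println(f"At $$$rate%.0f/TB: ${factor}%.0fx, implying $$${noSqlRevenue * factor}%.1fB")
    }
  }
}
```

The first line recovers the roughly $550M the segment attributes to the other management vendors plus the cloud share; the repricing loop shows why unit share matters far more than the dollar figures suggest.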

Published Date : Feb 9 2017

SUMMARY :

At Spark Summit East 2017 in Boston, brought to you by Databricks, Dave Vellante and George Gilbert review part two of the preliminary Wikibon big data forecast. They frame continuous processing as a third interaction mode alongside batch and interactive, size the roughly $40 billion DBMS market and the pricing pressure NoSQL and NewSQL engines put on it, and break out three workloads expected to drive growth: application and infrastructure telemetry, IoT edge and cloud, and microservices and streaming. The final forecast is slated for release at Big Data SV in mid March.

SENTIMENT ANALYSIS :

ENTITIES

Entity                         Category        Confidence
IBM                            ORGANIZATION    0.99+
George Gilbert                 PERSON          0.99+
Patrick                        PERSON          0.99+
George                         PERSON          0.99+
Microsoft                      ORGANIZATION    0.99+
Oracle                         ORGANIZATION    0.99+
Dave Vellante                  PERSON          0.99+
Dave                           PERSON          0.99+
Seth                           PERSON          0.99+
30 billion                     QUANTITY        0.99+
Alex                           PERSON          0.99+
two billion                    QUANTITY        0.99+
2016                           DATE            0.99+
$40 billion                    QUANTITY        0.99+
AWS                            ORGANIZATION    0.99+
2027                           DATE            0.99+
20%                            QUANTITY        0.99+
five years                     QUANTITY        0.99+
New Relic                      ORGANIZATION    0.99+
OLAP                           ORGANIZATION    0.99+
$1.7 billion                   QUANTITY        0.99+
10 billion                     QUANTITY        0.99+
2020                           DATE            0.99+
Boston                         LOCATION        0.99+
Ava                            PERSON          0.99+
mid March                      DATE            0.99+
third one                      QUANTITY        0.99+
last year                      DATE            0.99+
AppDynamics                    ORGANIZATION    0.99+
2022                           DATE            0.99+
yesterday                      DATE            0.99+
Wikibon                        ORGANIZATION    0.99+
60 years                       QUANTITY        0.99+
two days                       QUANTITY        0.99+
siliconangle.com               OTHER           0.99+
400 million                    QUANTITY        0.99+
750 million                    QUANTITY        0.99+
YouTube                        ORGANIZATION    0.99+
today                          DATE            0.99+
5%                             QUANTITY        0.99+
Middle of March                DATE            0.99+
Spark Summit                   EVENT           0.99+
first slide                    QUANTITY        0.99+
three                          QUANTITY        0.99+
two ways                       QUANTITY        0.98+
Boston, Massachusetts          LOCATION        0.98+
early 60's                     DATE            0.98+
about $40 billion              QUANTITY        0.98+
one firm                       QUANTITY        0.98+
this year                      DATE            0.98+
Ten X                          QUANTITY        0.98+
Spark Summit                   EVENT           0.97+
25,000 per terabyte            QUANTITY        0.97+
80's                           DATE            0.97+
Databricks                     ORGANIZATION    0.97+
DynamoDB                       TITLE           0.97+
three types                    QUANTITY        0.97+
Both                           QUANTITY        0.96+
Spark Summit East 2017         EVENT           0.96+
Spark Summit East 2017         EVENT           0.96+
this week                      DATE            0.95+
Spark                          TITLE           0.95+