Brian Schwarz, Google Cloud | VeeamON 2022
(soft intro music) >> Welcome back to theCUBE's coverage of VeeamON 2022. Dave Vellante with David Nicholson. Brian Schwarz is here. We're going to stay on cloud. He's the director of product management at Google Cloud. The world's biggest cloud, I contend. Brian, thanks for coming on theCUBE. >> Thanks for having me. Super excited to be here. >> Long-time infrastructure as a service background, worked at Pure, worked at Cisco, Silicon Valley guy, techie. So we're going to get into it here. >> I love it. >> I was saying before, off camera, we used to go to Google Cloud Next every year. It was an awesome show. Guys built a big set for us. You joined right as the pandemic hit, so we've been out of touch a little bit. It's hard to... You know, you've got one eye on the virtual event, but give us the update on Google Cloud. What's happening generally and specifically within storage? >> Yeah. So obviously the cloud got a big boost during the pandemic because a lot of work went online, more things being digitally transformed as people keep trying to innovate. So obviously the growth of Google Cloud has a big tailwind to it. Business has been really good, lots of R&D investment. We obviously have an incredible set of technology already, but still huge investments in new technologies that we've been bringing out over the past couple of years. It's great to get back out to events to talk to people about them; it's been a little hard the last couple of years to give people some of the insights. When I think about storage: huge investments. One of the things that some people know, but I think is probably underappreciated, is that we use the same infrastructure for Google Cloud that is used for Google consumer products. So Search and Photos and all the public things that most people are familiar with, Maps, et cetera. The same infrastructure, at the same time, is also used for Google Cloud. So we just have this tremendous capability of infrastructure. Google's got nine products that have a billion users, most of which many people know. So we're pretty good at storage, pretty good at compute, pretty good at networking. Obviously a lot of that shines through on Google Cloud for enterprises to bring their applications, lift and shift and/or modernize, build new stuff in the cloud with containers and things like that. >> Yeah, hence my contention that Google has the biggest cloud in the world, like I said before. It doesn't have the most IaaS revenue 'cause that's a different business. You can't comment, but I've got Google Cloud running at a $12 billion a year run rate. So a lot of times people go, "Oh yeah, Google, they're third place going for the bronze." But that is a huge business. There aren't a lot of 10, $12 billion infrastructure companies. >> In a rapidly growing market. >> And if you do some back-of-napkin math, whatever, give me 10, 15, let's call it 15% of that, to storage. You've got a big storage business. I know you can't tell us how big, but it's big. And if you add in all the stuff that's not in GCP, you do a lot of storage. So you know storage, you understand the technology. So what is the state of the technology? You have a background in Cisco, mainly a networking company; they used to do some storage stuff sort of on the side. We used to say they were going to buy NetApp; of course that never happened. That would've made no sense. Pure Storage obviously knows storage, but they were a disk array company essentially. Cloud storage, what's different about it? 
What's different in the technology? How does Google think about it? >> You know, I always like to tell people there are some things that are the same and familiar, and some things that are different. If I start with some of the differences: object storage in the cloud is just fundamentally different. Object storage on-prem has been around for a while, often used as kind of a third tier of storage, maybe a backup target, compliance, something like that. In the cloud, object storage is Tier one storage. A public reference for us is Spotify, which uses object storage for all the songs out there. And increasingly we see a lot of growth in-- >> Well, how are you defining Tier one storage in that regard? Again, are you thinking streaming service? Okay. Fine. Transactional? >> Spotify goes down and I'm pissed. >> Yeah. This is true. (Dave laughing) >> Not just you, maybe a few million other people too. One part is importance, business importance: Tier one applications are critical to the business, business-down type stuff. But even if you look at it for performance, for capabilities, object storage in the cloud is a different thing than it was. >> Because of the architecture that you're deploying? >> Yeah, and the applications that we see running on it. Obviously, huge growth in our business in AI and analytics, and Google's pretty well known in both spaces: BigQuery, obviously, on the analytics side, big massive data warehouses. >> Gets very high marks from customers. >> Yeah, very well regarded, super successful, super popular with our customers in Google Cloud. And then obviously AI as well. A lot of AI is about getting structure from unstructured data: autonomous vehicles getting pictures and videos from around the world; speech recognition, where audio is a fundamentally analog signal. You're trying to train computers to deal with analog things, and it's all stored in object storage, with machine learning on top of it creating all the insights, and frankly things that computers can deal with. Getting structure out of the unstructured data. So you just see performance, capabilities, importance: it's really Tier one storage, much like file and block have always been. >> Depending on, right, the importance. Because it's a fair question, right? We're used to thinking, "Oh, you're running your Oracle transaction database on block storage." That's Tier one. But Spotify's a pretty important business. And again, BigQuery is a cloud-native, born-in-the-cloud database; a lot of the cloud databases aren't, right? And that's one of the reasons why BigQuery is-- >> Google's really had a lot of success taking technologies that were built for some of the consumer services we build and turning them into cloud-native Google Cloud services. Like HDFS, which we were talking about: the open source technology came originally from the Google File System. Now we have a new version of it that we run internally, called Colossus: incredible, cloud-scale technologies that you can use to build things like Google Cloud Storage. >> I remember at one of the early Hadoop World events, I was talking to a Google engineer and saying, "Well, wow, that's so cool that Hadoop came along. You guys were the mainspring of that." He goes, "Oh, we're way past Hadoop now." And this was the early days of Hadoop. (laughs) >> It's funny, whenever Google says consumer services, usually consumer indicates just for me. 
But no, a consumer service for Google is at a scale that almost no business needs at a point in time. So you're not taking something and scaling it up-- >> Yeah. They're Tier one services, for sure. >> Exactly. You're more often paring it down so that a Fortune 10 company can (laughs) leverage it. >> So let's dig into data protection in the cloud, disaster recovery in the cloud, ransomware protection, and then let's get into why Google. Maybe you could give us the trends that you're seeing, how you guys approach it, and why Google. >> Yeah. One of the things I always tell people is that certain best practices and principles from on-prem are just still applicable in the cloud. And one of them is the fundamentals around recovery point objective and recovery time objective. You should know, for your apps, what you need; you should tier your apps, get best practice around them, and think about those in the cloud as well. The concepts of RPO and RTO don't just magically go away because you're running in the cloud. You should think about these things, and it's one of the reasons we're here at the VeeamON event. Obviously Veeam has tremendous skill and technology in helping customers implement the right RPO and RTO for their different applications, and they also help do that in Google Cloud. So we have a great partnership with them, with two main offerings in Google. One is integration for their on-prem installations to use Google as a backup target or DR target, and then for cloud-native backups they have Veeam Backup for Google Cloud. And they also bought Kasten a while ago, because they got excited about the container trend, and those are obviously great technologies for customers to use in Google Cloud as well. >> So RPO and RTO are kind of IT terms, right? But we think of them as sort of the business requirement. Here's the business language: how much data are you willing to lose? And the business person says, "What? I don't want to lose any data." Oh, how big's your budget, right? Oh, okay. That's RPO. RTO is how fast you want to get it back. "How fast do you want to get it back if there's an outage?" "Instantly." "How much money do you want to spend on that?" "Oh." Okay. And then your application value will determine that. Okay. So that's what RPO and RTO are, for those who may not know. Sometimes we get into the acronyms too much. Okay. Why Google Cloud? >> Yeah. When I think about some of the infrastructure Google has, and why it matters to a customer of Google Cloud, the first couple of things I usually talk about are networking and storage. Compute's awesome, and we can talk about containers and Kubernetes in a little bit, but if you just think about core infrastructure: networking. Google's got one of the biggest networks in the world, obviously, to service all these consumer applications. There are two things that I often tell people about the Google network. One: just tremendous backbone bandwidth across the regions. One of the things to think about with data protection is that it's a large data set. When you do recoveries, you're often pushing lots of terabytes, and big pipes matter: they help you hit the right recovery time objective when you say, "I want to do a restore across the country." You need good networks, and obviously Google has a tremendous network. I think we have something like 20 subsea cables that we've built underneath the world's oceans to connect the world on the internet. >> Awesome. 
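To put numbers on the RPO and RTO framing above, here is a minimal sketch; the tier names, objectives, dataset size, and throughput figure are hypothetical illustrations, not anything quoted in the conversation:

```python
from datetime import datetime, timedelta

# Hypothetical tiering: business-critical ("Tier one") apps get tighter objectives.
TARGETS = {
    "tier1": {"rpo": timedelta(minutes=15), "rto": timedelta(hours=1)},
    "tier2": {"rpo": timedelta(hours=4), "rto": timedelta(hours=8)},
}

def rpo_met(last_backup: datetime, failure_time: datetime, tier: str) -> bool:
    """Data loss equals the gap between the last good backup and the failure."""
    return failure_time - last_backup <= TARGETS[tier]["rpo"]

def rto_met(dataset_gib: float, throughput_gib_per_hour: float, tier: str) -> bool:
    """Restore time is dominated by how fast you can move the data back."""
    restore_time = timedelta(hours=dataset_gib / throughput_gib_per_hour)
    return restore_time <= TARGETS[tier]["rto"]

failure = datetime(2022, 5, 17, 12, 0)
print(rpo_met(datetime(2022, 5, 17, 11, 50), failure, "tier1"))  # True: 10 minutes of loss
print(rto_met(dataset_gib=2048, throughput_gib_per_hour=4096, tier="tier1"))  # True: ~0.5 h restore
```

The throughput term in the second check is exactly where the point about big backbone pipes lands: the same dataset restored over a bigger pipe shortens the achievable RTO.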
>> The other thing that I think is really underappreciated about the Google network is how quickly you get into it. One of the reasons all the consumer apps have such good response time is that there's a local access point to get into the Google network somewhere close to you almost anywhere in the world. I'm sure you can find some obscure place where we don't have an access point, but look, Search and Photos and Maps and Workspace all work so well because you get into the Google network fast, through local access points, and then we can control the quality of service. And that underlying substrate is the same substrate we have in Google Cloud. So the network is number one. The second one is storage. We have some really incredible capabilities in cloud storage, particularly around our dual-region and multi-region buckets. The multi-region bucket, the way I describe it to people, is a continent-sized bucket: a single bucket name, strongly consistent, that basically spans a continent. It's in some senses a little bit of the Nirvana of storage. No more DR failover, right? In a lot of places, traditionally on-prem but even in other clouds, it's two buckets, failover, orchestration, setup. Whenever you do orchestration, the DR is a lot more complicated: you've got to do more fire drills, make sure it works. We have this capability to have a single namespace that spans regions, and it has strong read-after-write consistency: everything you drop into it you can read back immediately. >> Say I'm on the west coast, I still have a little bit of an on-premises data center, I'm using Veeam to back something up, and I'm using storage within GCP. Trace out exactly what you mean by that in terms of a continent-sized bucket, with updates going to the recovery volume, for lack of a better term, in GCP. Where is that physically? If I'm on the west coast, what does that look like? >> Two main options. It depends, again, on what your business goals are. The first option is you pick a regional bucket: multiple zones in a Google Cloud region store your data. It's resilient, because there are three zones in the region, but it's all in one region. And then your second option is this multi-region bucket, where we're basically taking a set of the Google Cloud regions from around North America and storing your data in the continent, multiple copies of your data. And that's great if you want to protect yourself from a regional outage: an earthquake, a natural disaster of some sort. This multi-region bucket basically gives you DR protection for free. Well, it's not free, because you have to pay for it of course, but it's free from a failover perspective: a single namespace, so your app doesn't need to know. You restart the app on the east coast, same bucket name. >> Right. That's good. >> Read and write instantly out of the bucket. >> Cool. What are you doing with Veeam? >> So we have this great partnership, obviously, for data protection and DR, and I often segment the conversation into two pieces. One is for traditional on-prem customers who essentially want to use the cloud as either a backup or a DR target. Veeam Backup & Replication supports Google Cloud targets, so you can write to cloud storage, with some of the advantages I mentioned. Our archive storage is really cheap; we actually just lowered the price for archive storage quite significantly, to roughly a third of what you find in some of the other competitive clouds if you look at the capabilities. 
Our archive class storage: fast recovery time, right? Low latency, no hours to rehydrate. >> Good. Storage in the cloud is overpriced. >> Yeah. >> It is. It is historically overpriced despite all the rhetoric. Good. I didn't know that. I'm glad to hear it. >> Yeah. So with the archive class storage, you essentially read and write into this bucket and restore. It's one of the things I joke with people about: I live in Silicon Valley, and I still see the tape truck driving around. I really think people can modernize these environments and use the cloud as a backup target. You get a copy of your data off-prem. >> Don't you guys use tape? >> Well, we don't talk a lot about-- >> No comment. Just checking. >> And just to be clear, when he says cloud storage is overpriced, he thinks that a postage stamp is overpriced, right? >> No. >> If I give you 50 cents, are you going to deliver a letter cross-country? No. Cloud storage, it's not overpriced. >> Okay. (David laughing) We're going to have that conversation. I think it's historically overpriced; I think it could be more attractive relative to the cost of the underlying technology. So good on you guys for pushing prices down. >> Yeah. So this archive class storage is one great area. The second area we really work with Veeam on is protecting cloud-native workloads. Increasingly, customers are running workloads in the cloud: they run VMware in the cloud, they run normal VMs, they run containers. Veeam has two offerings in Google that essentially help customers protect that data and hit their RPO and RTO objectives. Another thing that is not different in the cloud is the need to meet your compliance regulations. So having a product like Veeam makes it easy to show back to your auditor, to your regulator, that you have copies of your data and that you can hit an appropriate recovery time objective if you're in finance or healthcare or energy. So there are some really good Veeam technologies that work in Google Cloud to protect applications that run in Google Cloud all-in. >> To your point about the tape truck, I was kind of tongue in cheek, but I know you guys use tape. The point is you shouldn't have to call the tape truck, right? You should go to Google and say, "Okay, I need my data back." Now, having said that, sometimes the highest bandwidth in the world is putting all this stuff on the truck. Is there an option for that? >> Again, it gets back to this networking capability that I mentioned. Yes, people do like to joke that trucks and trains can have a lot of bandwidth, but big networks can push a lot of data around, obviously. >> And you've got a big network. >> We've got a huge network. So if you want to push... I've seen statistics: you can do terabits a second to a single Google Cloud Storage bucket, supercomputing-type performance inside Google Cloud. From a scale perspective, whether it be network or compute, these things scale. If there's one thing that Google's really, really good at, it's really high scale. >> And if yours is a company that can't afford to move the data... >> Yeah, if you're that sensitive, avoid moving the data altogether. If you're that sensitive, have your recovery capability be in GCP. >> Yeah. Well, and again-- >> So that when you're recovering, you're not having to move data. >> It's proximate to it, yeah. That's the point. >> Recover in GCVE, fail over your VMware cluster. >> Exactly. >> And use the cloud as a DR target. 
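The bucket behavior described through this exchange, cheap archive-class storage, a multi-region location, and strong read-after-write consistency, can be sketched with the google-cloud-storage Python client. The bucket and object names below are hypothetical, and the snippet assumes credentials and a project are already configured; an actual Veeam deployment would point at such a target through its own Google Cloud integration rather than this client library.

```python
from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()  # assumes application-default credentials

# A multi-region bucket ("continent sized": the US location spans multiple
# regions) with ARCHIVE as the default storage class, the cheap backup
# target discussed above. The bucket name here is hypothetical.
bucket = client.bucket("example-veeam-backup-target")
bucket.storage_class = "ARCHIVE"
bucket = client.create_bucket(bucket, location="US")

# Write a toy restore point and read it straight back. Cloud Storage is
# strongly consistent, so the read reflects the write immediately, and the
# same bucket name resolves from any region: no failover orchestration.
blob = bucket.blob("backups/restore-point-0001")
blob.upload_from_string(b"backup payload")
print(blob.download_as_bytes())  # b'backup payload'
```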
>> We've got very little time, but can you just give us a rundown of your portfolio in storage? >> Yeah. So in storage: Cloud Storage for object storage, with a bunch of regional options and classes of storage, like I mentioned, including archive storage. Our first-party offering in the file area is Filestore, with basic, enterprise, and high scale tiers, the last of which is really for highly concurrent, parallelized applications. Persistent Disk is our block storage offering; we also have a very high-performance class of block storage, and local SSDs. So those are the main food groups of storage: block, file, object. We're increasingly doing a lot of work in data protection, in transfer, and in distributed cloud environments, where the edge of the cloud is pushing outside the cloud regions themselves. Those are our products. We also spend a lot of time with our partners, because Google's really good at building, open sourcing, and partnering at the same time, hence with Veeam, and obviously in file we partner with NetApp and Dell and a bunch of folks. So there are a lot of partnerships that are important to us as well. >> Yeah. You know, we didn't get into Kubernetes, a great example of open source, Istio, Anthos; we didn't talk about the on-prem stuff. So Brian, we'll have to have you back to chat about those things. >> I look forward to it. >> To quote my friend Matt Baker, it's not a zero-sum game out there, and it's great to see Google pushing the technology. Thanks so much for coming on. All right. And thank you for watching. Keep it right there; our next guest will be up shortly. This is Dave Vellante for Dave Nicholson. We're live at VeeamON 2022, and we'll be right back. (soft beats music)
COMMUNICATIONS V1 | CLOUDERA
>> Hi, today I'm going to talk about network analytics and what that means for telecommunications as we go forward: thinking about 5G, the impact it's likely to have on network analytics, and the data requirement, not just to run the network and understand the network a little bit better, but also to inform the rest of the operation of the telecommunications business. So, as we think about where we are in terms of network analytics: over the last 20 years, the telecommunications industry has evolved its management infrastructure to abstract away from some of the specific technologies in the network. What do we mean by that? Well, when the initial telecommunications networks were designed, there were management systems built in; eventually fault management systems, assurance systems, provisioning systems, and so on were abstracted away. So it didn't matter what network technology you had, whether it was Nokia technology or Ericsson technology or Huawei technology or whatever it happened to be; you could just look at your fault management system and understand where faults had happened. As we got into the last 10, 15 years or so, telecommunication service providers became more sophisticated in their approach to data analytics, and specifically network analytics, and started asking questions about why and what-if in relation to their network performance and network behavior. And so network analytics as a bit of an independent function was born, and over time more and more data began to get loaded into the network analytics function. So today just about every carrier in the world has a network analytics function that deals with vast quantities of data in big data environments, which are now being migrated to the cloud, as all telecommunications carriers migrate as many IT workloads as possible to the cloud. So what are the things happening as we migrate to the cloud that drive enhancements in use cases and enhancements in scale in telecommunications network analytics? Well, 5G is the big thing, right? And 5G is not just another G in that sense. In some senses it is: 5G means greater bandwidth, lower latency, and all those good things, so we can watch YouTube videos with less interference and less sluggish bandwidth and so on and so forth. But 5G is really about the enterprise and enterprise services transformation. 5G is a more secure kind of network, but it is also a more pervasive network, with a fundamentally different network topology than previous generations. There are going to be more masts, and that means you can have more pervasive connectivity. So things like IoT and edge applications, autonomous cars, smart cities, these kinds of things are all much better served because you've got more masts. That, of course, means you're going to have a lot more data as well, and we'll get to that. The second piece is immersive digital services. With more masts, with more connectivity, with lower latency, with higher bandwidth, the potential is immense for services innovation, and we don't know what those services are going to be. We know that technologies like augmented reality and virtual reality have great potential.
We have yet to see where those commercial applications are going to be, but the innovation potential for 5G is phenomenal. It certainly means that we're going to have a lot more edge devices, and that again is going to lead to an increase in the amount of data that we have available. >> And then there's the idea of pervasive connectivity when it comes to smart cities, autonomous cars, integrated traffic management systems, all of this kind of stuff. Those kinds of smart environments thrive where you've got this pervasive connectivity, this persistent connection to the network. Again, that's going to drive more innovation, and again, because you've got these new connected devices, you're going to get even more data. So this exponential rise in data is really what's driving the change in network analytics, and there are four major vectors driving this increase in data, in terms of both volume and speed. The first is more physical elements. We said already that 5G networks are going to have a different topology: 5G networks will have more devices and more masts. And so with more physical elements in the network, you're going to get more physical data coming off those networks. That needs to be aggregated, collected, managed, stored, analyzed, and understood, so that we have a better understanding as to why things happen the way they do, why the network behaves in the ways it does, and why devices connected to the network, and ultimately of course consumers, whether they be enterprises or retail customers, behave the way they do in their interactions with it. The second vector is edge nodes and devices: we're going to have an explosion in the number of devices. We've already seen IoT devices, with different kinds of trackers and sensors hanging off the edge of the network, whether it's to make buildings smarter, cars smarter, or people smarter, in terms of having the measurements and the connectivity and all that sort of stuff. So the numbers of devices at the edge, and beyond the edge, are going to be phenomenal. One of the things we've been wrestling with as an industry over the last few years is: where does the telco network end, and where does the enterprise, or even the consumer, network begin? It used to be very clear that the telco network ended at the router, but now it's not that clear anymore, because in the enterprise space, particularly with virtualized networking, which we're going to talk about in a second, you start to see end-to-end network services being deployed. In some instances those services are being managed by the service provider themselves, and in some cases by the enterprise client. So again, the line between where the telco network ends and where the enterprise or consumer network begins is not clear. The proliferation of devices at the edge, in terms of what those devices are, what the data yield is, and what the policies are that need to govern those devices with respect to security, privacy, and things like that, is all going to be really, really important. The third vector is virtualized services, which we just touched on briefly.
One of the big trends happening right now is not just the shift of IT operations onto the cloud, but the shift of the network onto the cloud: the virtualization of network infrastructure. And that has two major impacts. First of all, it means that you've got the agility and all of the scale benefits you get from migrating workloads to the cloud, the elasticity and the growth and all that sort of stuff. But arguably more importantly for the telco, it means that with a virtualized network infrastructure, you can offer entire networks to enterprise clients. >> So if you're selling to a government department, for example, that is looking to stand up a system for certification, say export certification, something like that, you can not just sell them the connectivity; you can sell them the networking and the infrastructure to serve that entire end-to-end application. You could, in theory, offer them an entire end-to-end communications network, and with 5G network slicing they can even have their own little piece of the 5G bandwidth that's been allocated to the carrier, and have a complete end-to-end environment. So the kinds of services that can be offered by telcos, given virtualized network infrastructure, are many and varied, and it's an outstanding opportunity. But what it also means is that the number of network elements, virtualized in this case, is also exploding, and that means the amount of data informing us as to how those network elements are behaving and performing is going to go up as well. And then finally, AI complexity. On the demand side, while historically network analytics and big data have been driven by returns from data monetization, whether through cost avoidance, or service assurance, or even revenue generation, AI is transforming telecommunications and every other industry, and the potential for autonomous operations is extremely attractive. So understanding how the end-to-end telecommunications service delivery infrastructure works is essential as a training ground for AI models that can help to automate a huge amount of telecommunications operating processes. The AI demand for data is just going through the roof. And all of these things combine to mean that big data is getting explosive; it is absolutely going through the roof. So as telecommunications companies around the world look at their network analytics infrastructure, which was initially designed primarily for service assurance, and at how they migrate it to the cloud, these things are impacting those decisions. You're not just migrating a workload that used to run in the data center so it operates in the cloud; you're migrating a workload while also expanding its use cases. And bear in mind, many of those workloads are going to need to remain on-prem, within a private cloud or at best a hybrid cloud environment, in order to satisfy regulatory and jurisdictional requirements. So let's talk about an example. >> LG Uplus is a fantastic service provider in Korea, with huge growth in that business over the last 10, 15 years or so.
Obviously most people will be familiar with LG, the electronics brand, maybe less so with LG Uplus, but they've been doing phenomenal work, and they were the first business in the world to launch commercial 5G, in 2019. A huge milestone that they achieved. And at the same time, they deployed the network real-time analytics platform, or NRAP, from a combination of Cloudera and our partner Comarch. Now, there were a number of things driving the requirement for the analytics platform at the time. Clearly the 5G launch was the big thing they had in mind, but there were other things too. Within the 5G launch, they were looking for visibility of services, service assurance, and service quality. >> So, you know, what services have been launched? How are they being taken up? What are the issues that are arising, where are the faults happening, where are the problems? Because clearly, when you launch a new service, you want to understand and be on top of the issues as they arise. So that was really, really important. The second piece was, and this is not a new story to any telco in the world, that there are silos in operation, and so eliminating redundancies through the process of digital transformation was really important. In particular, the two silos, the wired and the wireless sides of the business, had to come together so that there would be an integrated network management system for LG Uplus as they rolled out 5G. So eliminating redundancy and driving cost savings through the integration of the silos was really, really important, and that's a process and people thing every bit as much as it is a systems and data thing. So that was another big driver. And the fourth one, and we've talked a little bit about some of these things: 5G brings huge opportunity for enterprise services innovation. Industry 4.0, digital experience, these kinds of use cases are very important in the South Korean market and in the business of LG Uplus. So they were looking at AI and how to apply AI to network management. Again, there are a number of really exciting use cases that have gone live in LG Uplus since we did this initial deployment, and they're making fantastic strides there. Then there's big data analytics for users across LG Uplus: the platform is not just for the immediate application of 5G or the support of the 5G network, but also for other data analysts and data scientists across the LG Uplus business. While network analytics' primary use case is around network management, it has applications across the entire business: customer churn, next best offer, understanding customer experience and customer behavior, digital advertising, product innovation, all sorts of different use cases, and departments across the business needed access to this information. So collaboration and sharing across the real-time network analytics platform was very important.
And then finally, as I mentioned, LG Group is much bigger than just LG Uplus, with the electronics business and other pieces, and the group had launched a major group-wide digital transformation program in 2019; being a part of that mattered as well. So those were some of the problems they were looking to address. >> So, first of all, the integration of wired and wireless data sources: getting your assurance data sources, your network data sources, and so on integrated was really, really important. Scale was massive for them: they're talking about billions of transactions processed in under a minute, and hundreds of terabytes per day. So phenomenal scale that needed to be available out of the box, as it were. Real-time indicators and alarms: there were lots of KPIs and thresholds set that had to meet certain criteria, certain standards. Customer-specific, real-time analysis of 5G, particularly for the launch. Root cause analysis and AI-based prediction of service anomalies and service issues was a core use case. As I talked about already, the provision of data services across the organization. And then support for 5G in serving the business: understanding service impact was extremely important. >> So it's not just understanding that you have an outage in a particular network element, but what the impact is on the business of LG Uplus, and also what the impact is on the business of the customer, from an outage or an anomaly or a problem on the network. Being able to answer those kinds of questions was really, really important too. And as I said, between Cloudera and Comarch, with LG Uplus themselves an intrinsic part of the solution, this is what we ended up building. It's a big, complicated architecture; I really don't want to go into too much detail here, and you can see these things for yourself, but let me skip through it really quickly. First of all, the key data sources: you have all of your wireless network information and other data sources. >> This is really important, because sometimes you kind of skip over it: there are other systems in place, like the enterprise data warehouse, that needed to be integrated as well. Then the southbound and northbound interfaces. Southbound, we get our data from the network and from network management applications through file interfaces; Kafka and NiFi are important technologies here. There are also the RDBMS systems, like the enterprise data warehouse, that we're able to feed into the system. Northbound, we spoke already about making network analytics services available across the enterprise, so having both file and API interfaces available for other systems and other consumers across the enterprise is very important. Then there's lots going on in the platform itself: two petabytes of persistent storage, with Cloudera HDFS across 300 nodes for the raw data storage, and then Kudu for real-time storage, for real-time indicator analysis, alarm generation, and other real-time processes.
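The southbound path just described, events arriving on Kafka, real-time indicator analysis, and threshold alarms, can be sketched with Spark Structured Streaming. Everything here is illustrative: the topic, fields, window, and 50 ms threshold are hypothetical rather than details of the LG Uplus build, the job assumes the Spark-Kafka connector is on the classpath, and in the real platform the sinks would be Kudu and HDFS rather than the console:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kqi-alarms").getOrCreate()

schema = StructType([
    StructField("cell_id", StringType()),
    StructField("latency_ms", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Hypothetical Kafka topic carrying per-cell network events as JSON.
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "network-events")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# One-minute key quality indicator per cell, with an alarm flag raised
# whenever average latency crosses a (hypothetical) 50 ms threshold.
kqi = (events
       .withWatermark("event_time", "2 minutes")
       .groupBy(F.window("event_time", "1 minute"), "cell_id")
       .agg(F.avg("latency_ms").alias("avg_latency_ms"))
       .withColumn("alarm", F.col("avg_latency_ms") > 50.0))

kqi.writeStream.outputMode("update").format("console").start().awaitTermination()
```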
>> So that was the core of the solution: Spark processes for ETL, key quality indicators, and alarming, plus a bunch of work done around data preparation and data generation for transfer to third-party systems through the northbound interfaces; Impala for API queries for real-time systems, there on the right-hand side; and then a whole bunch of clustering, classification, and prediction jobs through the ML processes, the machine learning processes. Again, another key use case, and we've done a bunch of work on that; I encourage you to have a look at the Cloudera website for more detail on some of the work we did here. This is some pretty cool stuff. And then finally, the upstream services. There are lots more than simply these ones, but service assurance is really, really important: SQM, CEM, and the like, so service quality management, customer experience management, and autonomous controllers, are really important consumers of the real-time analytics platform. And your conventional service assurance functions, like fault and performance management, are as much consumers of the information in the network analytics platform as they are providers of data to it. >> So, some of the specific use cases that have been stood up and that are delivering value to this day; there are lots more, but these are three that we pulled out. First of all, service-specific monitoring and customer quality analysis, care, and response. Again, growing from the initial 5G launch and then broadening into other services: understanding where there are issues, so that when people complain, when people have an issue, we can answer the concerns of the client in a substantive way. Second, AI functions around root cause analysis: understanding why things went wrong when they went wrong, and also making recommendations as to how to avoid those occurrences in the future, so we know what preventative measures can be taken. And then finally, the collaboration function across LG Uplus, which was, and continues to be, extremely important, where data is shared throughout the enterprise through the API layer, through file interfaces, and through integrations with upstream systems. >> So that's a real quick run-through of LG Uplus. The numbers are just staggering: we've seen upwards of a billion transactions in under 40 seconds being tested, and we've gone beyond those thresholds already. And this isn't just a theoretical benchmarking test or something like that; we're seeing these kinds of volumes of data not too far down the track. With the things I mentioned earlier, the proliferation of network infrastructure in the 5G context, with virtualized elements and all of these other bits and pieces driving massive volumes of data toward the network analytics platform, the scale is phenomenal. And this is just one example. We work with service providers all over the world: over 80% of the top 100 telecommunication service providers run on Cloudera.
>> They use Cloudera in the network, and we're seeing those customers all migrating legacy platforms onto CDP, the Cloudera Data Platform. They're increasing the jobs that they do, so it's not just warehousing, not just ingestion and ETL, but moving into things like machine learning. They're also looking at new data sources from places like NWDAF, the network data analytics function in 5G, or the management and orchestration layer in software-defined networks and network function virtualization. So new use cases are coming in all the time, new data sources are coming in all the time, and there's growth in the application scope from, as we say, edge to AI. It's really exciting to see how the footprint is growing and how the applications in telecommunications are really making a difference in facilitating network transformation. And that's me covered for today. I hope you found it helpful. By all means, please reach out; there are a couple of links here. You can follow me on Twitter, you can connect to the telecommunications page, or reach out to me directly at Cloudera. I'd love to answer your questions and talk to you about how big data is transforming networks, and how network transformation is accelerating telcos. >> I'm Jamie Sharath with Liga Data. I'm primarily on the delivery side of the house, but I also support our new business teams. I'd like to spend a minute telling you about Liga Data. We're basically a Silicon Valley startup, started in 2014, and our leadership, our executive team, were basically the data officers at Yahoo before this. We provide managed data services, and we provide products that are focused on telcos. We have some experience in non-telco industries, but our focus for the last seven years or so has been specifically on telco. So again, we have something over 200 employees and a global presence in North America, the Middle East, Africa, Asia, and Europe, with folks in all of those places. I'd like to call your attention to the middle of the screen there: here is where we have done some partnership with Cloudera.
>> So if you look at that, you can see we're in Holland and Jamaica, and then a lot throughout Africa as well. Now, the data fabric is the product that we're talking about, and the data fabric is basically a big-data type of data warehouse with a lot of additional functionality built in. The data fabric is comprised of something called Flare, which we'll talk about in a minute, and the Cloudera Data Platform underneath. So this is how we're partnering together: we have this tool, and it's functioning and delivering in something over 10 opcos. So, Flare. Flare is the piece that is Liga Data IP; the rest is there in the platform. What Flare does is pull data in and integrate it into an event streaming platform; it is the engine behind the data fabric. >> It's also a decisioning platform, so in real time we're able to pull in data, run analytics on it, and alert or do whatever is needed on a real-time basis. Of course, a lot of clients at this point are still sending data in batch, so it handles that as well. Then there's Sanchez. Sanchez is a very interesting app: an AI analytics app for executives. It runs on your mobile phone and ties into your data; this could be the data fabric, but it could also run as a standalone product. Basically it allows you to ask human-type questions: how are my gross adds last week? How do they compare against the same time the week before, and even the same time 60 days ago? So as an executive or an analyst, I can pull it up and look at it instantly, in a meeting or anywhere else, without having to think about queries or anything like that. >> So that's pretty much us at Liga Data, really just to set the context of where we are. So this is a traditional telco environment: you see the systems of record, you see the cloud, you see OSS and BSS data. One of the things that the next layer above, which we call the system of intelligence, the data fabric, does is merge that BSS and OSS data, so no longer do we have any silos or anything that's separated; it's all coming into one area so that business users, or data scientists, can go in and work with it. If you look at the bottom of the system of intelligence, you can see that Flare is the tool that pulls in the data. It provides event streaming capabilities. It preserves entity states, so that you can go back and look at an entity's state at any time. It does stream analytics, that is, as the data is coming in, it can perform analytics on it. And it allows real-time decisioning. That's something business users can go in and set up as a system of if-thens, looking very much like a graph database, where you can create a rule that notifies, or acts, when a certain condition happens. So for instance, a bundle: a real-time offer, where a user who is about to run out of his ongoing bundle can be sent an offer right on the fly. And that's set up by the business user, as opposed to programmers. Now, data infrastructure: the fabric really has three areas where data is persisted. Obviously there's the data lake. The data lake stores that level of granularity that is very deep, years and years of history; data scientists like that, and for historical record-keeping and requirements from the government, that data would be stored there. >> Then there's also something we call the business semantics layer, and the business semantics layer contains something over 650 specific telco KPIs. These come initially from TM Forum, but they've also been added to at the various mobile operators where we've delivered, and we've grown that set. So that layer is there for the business; the data lake is there for data scientists. Analytical stores can be used for many different reasons; a lot of times RDBMSes are still there, so this platform, the Cloudera platform, can tie into analytical data stores as well via Flare. Then access and reporting: graphic visualizations, and APIs, which are a very key part of it. Third-party query tools, any kind of query tool, can be used, and those are of course highly optimized and allow search of billions of records. >> And then if you look at the top, it's the systems of engagement, and there you might note the use cases. Telco reporting: hundreds of KPIs that are generated for users. Segmentation: basically micro to macro segmentation, and segmentation will play a key role in a use case we'll talk about in a minute. And monetization.
So this helps telco providers monetize their specific data, both in terms of how they make money off of it and how they might leverage this data to engage with another client. So for instance, in markets where it's allowed, DPI is used, and the fabric tracks exactly where each person, each subscriber as we call it, goes within his internet browsing on the 4G or 5G network, and all of that data is stored. From it you can tell a lot of things: the segment, the profile, the propensity to buy. Do they spend a lot of time on the Coca-Cola page? There are buyers out there that find that information very valuable. And then there's Sanchez, which we spoke about briefly before, and which sits on top of the fabric or stands alone. >> So the story we really want to tell is one case out of many; it's a CVM type of case. There was a mobile operator out there that was offering packages, whether a bundle or a particular tool, to subscribers, and they were taking a broad approach that was not very focused: it did not depend on the segments created around the profiling mentioned earlier, and the subscriber usage data was somewhat dated. This was causing a lot of those offers to simply not be taken up, and there were limited segmentation capabilities before the fabric came in. Now, one of the key things about the fabric is that when you start building segments, you can build on that history. >> So all of that data stored in the data lake can be used for segmentation. What did we do about it? For this MNO, we basically put the data fabric in, with the data fabric running on the Cloudera Data Platform; that's how we team up. We facilitated the ability to personalize campaigns. What that means is that from the segments that were built, when a user fell within a segment, we knew exactly what his behavior most likely was, so the right recommendations, the right offers, could be created then and there. And we enabled this in real time, including the ability to go out to the CRM system and gather further information about the subscriber. All of these tools, again, were running on top of the Cloudera Data Platform. What was the outcome? The outcome was a much more precise offer given to the client, one that was accepted: an increase in cross-sell and up-sell, and in subscriber retention. Our client came back to us and pointed out that there was a 183% year-on-year revenue increase. So this is probably one of the key use cases. Now, one thing to really mention is that there are hundreds and hundreds of use cases running on the fabric, and I would even say thousands. A lot of those have been migrated: when the fabric is deployed, when we bring the Cloudera and Liga Data solution in, there's generally a legacy system that has many use cases, and many of those were migrated, virtually all of them, and put on the platform. The other thing is that new use cases are enabled, as the sketch below suggests.
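The if-then decisioning that Flare exposes to business users, described a moment ago with the bundle run-out example, amounts to rules evaluated against a subscriber's segment and current state. Here is a toy sketch; every name, segment, threshold, and offer is hypothetical, not drawn from any deployment:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Subscriber:
    msisdn: str
    segment: str           # produced by the segmentation described above
    bundle_mb_left: float  # current state preserved by the fabric

@dataclass
class Rule:
    name: str
    applies: Callable[[Subscriber], bool]
    offer: str

# Hypothetical business-user rule: heavy-data subscribers about to exhaust
# a bundle get a real-time top-up offer, mirroring the example above.
RULES = [
    Rule(
        name="bundle-topup",
        applies=lambda s: s.segment == "heavy_data" and s.bundle_mb_left < 100,
        offer="1GB top-up at 50% off",
    ),
]

def decide(sub: Subscriber) -> List[str]:
    """Return the offers triggered for a subscriber as events stream in."""
    return [r.offer for r in RULES if r.applies(sub)]

print(decide(Subscriber("27830000001", "heavy_data", 42.0)))
# ['1GB top-up at 50% off']
```

The precision gain behind the 183% figure lives on the condition side of such rules: segments built on years of history in the lake, rather than on 30 days of recent usage.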
So when you get this level of granularity, and when you have campaigns that can now base their offers on years of history as opposed to 30 days of history, the campaign management and response systems are enabled to be far more precise in their offers. Okay. >> Okay. So this is a technical slide. One of the things we normally do when we're out there talking to folks is give an overview for a little while, and then a deep technical dive on all aspects of it; sometimes that deep dive can go a couple of hours. I'm going to do this slide in a couple of minutes. If you look at it, you can see over on the left the sources of the data. They go through this tool called Flare, which runs on the Cloudera Data Platform. Data can come in via queues, including real-time queues, via a landing zone, or via data extraction, and you can take a look at the data quality as it arrives; those checks are built in. One of the things Flare does is provide out-of-the-box ability to ingest data sources and to apply the data quality and validation rules for telco-type sources. >> And one of the reasons this is fast to market is that throughout those 10 or 12 opcos we've done with Cloudera, we have already built models: models for CCN, for AIR, for most mediation systems. So there's rarely going to be a type of input we haven't already seen, and that actually speeds up deployment very quickly. Then Flare does the transformations, the metrics, continuous learning, or as we call it, continuous decisioning, and API access; and for faster response we use a distributed cache. I'm not going to go too deeply in there, but the business semantics layer, again, sits on top of the Cloudera Data Platform, and you see the Kafka queues on the right as well. >> And all of that together is what we're calling the fabric. So the fabric is the Cloudera Data Platform plus Flare, and all of this runs together. And by the way, there have been many, many hundreds of hours testing Flare with Cloudera, through the whole process. The results? There are four I'm going to talk about. We saw one for a CVM-type use case called My Pocket; the subscribers of that mobile operator were 14 million plus. There was a use case at an operator with 24 million plus, where the year-on-year revenue increase was 130%; one with 32 million plus, at 38%; and 44% at a telco with 76 million subscribers. These are different CVM-type use cases, as well as network use cases. So there are a lot more we could talk about, but these are the ones we're looking at here. And again, that 183%: this is something we find consistently, and these figures come from our actual end clients. How do we unlock the full potential of this? Well, I think the start is to arrange a meeting, and it would be great for you to reach out to me or to Anthony. We're working in conjunction on this, and we can set up an initial meeting and go through it. I think that's the very beginning. Again, you can get additional information from the Cloudera website and from the Liga Data website. Anthony, that's the story. Thank you. >> No, that's great. 
Jamie, thank you so much. It's wonderful to go deep, and I know that there are hundreds of use cases being deployed in MTN, but it's great to go deep on one. And like you said, once you get that sort of architecture in place, you can do so many different things. The power of data is tremendous, and it's great to be able to see how you can track it end to end: from collecting the data, processing it, and understanding it, to applying it in a commercial context and bringing actual revenue back into the business. So there is your ROI straight away, and now you've got a platform that you can transform your business on. It's a tremendous story, Jamie, and thank you for your part. >> Sure. >> That's our story for today. Like Jamie says, please do feel free to reach out to us. The website addresses are there, and our contact details, and we'd be delighted to talk to you a little bit more about some of the other use cases perhaps, and maybe about your own business and how we might be able to make it perform a little better. So thank you.
Maria Colgan & Gerald Venzl, Oracle | June CUBEconversation
(upbeat music) >> Developers have become the new king makers in the world of digital and cloud. The rise of containers and microservices has accelerated the transition to cloud native applications. A lot of people talk about application architecture, the related paradigms, and the benefits they bring for the process of writing and delivering new apps. But a major challenge continues to be the how and the what when it comes to accessing, processing and getting insights from the massive amounts of data we have to deal with in today's world. And with me are two experts from the data management world who will share how they think about the best techniques and practices, based on what they see at large organizations who are working with data and developing so-called data-driven apps. Please welcome Maria Colgan and Gerald Venzl, two distinguished product managers from Oracle. Folks, welcome, thanks so much for coming on. >> Thanks for having us Dave. >> Thank you very much for having us. >> Okay, Maria let's start with you. So, we throw around this term data-driven, data-driven applications. What are we really talking about there? >> So data-driven applications are applications that work on a diverse set of data. So anything from spatial to sensor data, document data, as well as your usual transaction processing data. And they generate value from that data in very different ways to a traditional application. So for example, they may use machine learning to do product recommendations in the middle of a transaction. Or they could use graph to identify an influencer within a community, so we can target them with a specific promotion. They could also use spatial data to help find the nearest stores to a particular customer. And because these apps are deployed on multiple platforms, everything from mobile devices to standard browsers, they need a data platform that's going to be secure, reliable and scalable.
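As a rough sketch of that last point, the nearest-store lookup, here is what such a query can look like in Oracle SQL. The table and bind variable are hypothetical, and it assumes the SDO_NN nearest-neighbor operator from Oracle's spatial support over an indexed SDO_GEOMETRY column:

```sql
-- Hypothetical STORES table with a spatially indexed SDO_GEOMETRY
-- column LOCATION; return the three stores nearest the customer.
SELECT s.store_name, s.address
FROM   stores s
WHERE  SDO_NN(s.location, :customer_location, 'sdo_num_res=3') = 'TRUE';
```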
>> Well, so when you think about how the workloads are shifting, I mean, we're not talking anymore about a world of just your ERP or your HCM or your CRM, kind of the traditional operational systems. You really are seeing an explosion of these new data-oriented apps. You're seeing modeling in the cloud, and you're going to see more and more inferencing, inferencing at the edge. But Maria, maybe you could talk a little bit about the benefits that customers are seeing from developing these types of applications. I mean, why should people care about data-driven apps? >> Oh, for sure, there are massive benefits to them. Probably the most obvious one for any business, regardless of industry, is that they not only allow you to understand what your customers are up to, but they allow you to anticipate those customers' needs. So that helps businesses maintain that competitive edge and retain their customers. But it also helps them make data-driven decisions in real time, based on actual data rather than on somebody's gut feeling, or on historical data alone. So for example, you can do real-time price adjustments on products based on demand, that kind of thing. It really changes the way people do business today. >> So Gerald, you think about the narrative in the industry: everybody wants to be a platform player, all your customers are becoming software companies, they are becoming platform players. Everybody wants to be like, you know, name a company with a huge trillion-dollar market cap or whatever, and those are data-driven companies. And so it would seem to me that there's really no company that shouldn't be data-driven. Do you buy that? >> Yeah, absolutely. I mean, naturally the whole industry is data-driven, right? We all have information technologies for processing data and deriving information out of it. But when it comes to app development, I think there is a big push of, we have to do machine learning in our applications, we have to get insights from data. And when you take a step back, you see that there are of course many different kinds of applications out there as well, and that's not to be forgotten, right? There are the usual front-end user interfaces where really all the application does is enter some piece of information that's stored somewhere, or perhaps a microservice that's not attached to a database at all but just receives calls (indistinct). So I think it's not necessarily so important for every developer to jump on the bandwagon of being data-driven. But I think it is important for the developers who build the applications that drive the business, that make business-critical decisions, as Maria mentioned before. Those folks should take a really close look into what data-driven apps mean and what the database can actually give them. Because what we also see happening a lot is that things that are well known, out there and ready to use, are being reimplemented in applications. And those developers essentially end up spending more time writing code that already exists, and then have to maintain and debug that code as well, rather than just going to market faster. >> Gerald, can you talk about the prevailing approaches that developers take to build data-driven applications?
What are the ones that you see? Let's dig into that a little bit more, and maybe differentiate the different approaches. >> Yeah, absolutely. I think right now the industry is in two camps; it's sort of a religious war going on, as you often see with different architectures and so forth. So we have single-purpose databases, or data management technologies, which are, as the name suggests, built around a single purpose. A typical example would be your ordinary key-value store. All a key-value store does is allow you to store and retrieve a piece of data, whatever that may be, really, really fast, but it doesn't really go beyond that. And the other side of the house, the other camp, would be multimodel databases, multimodel data management technologies. Those are technologies that allow you to store different types of data, different formats of data, in the same technology, in the same system, alongside each other. And when you look at the technologies out there, pretty much any relational database, any database really, has evolved into such a multimodel database. Whether that's MySQL, which allows you to store JSON alongside relational data, or even MongoDB, which gives you native graph support (mumbles) alongside the JSON support. >> Well, it's clearly a trend in the industry. We've talked about this a lot on theCUBE. We know where Oracle stands on this. I mean, you just mentioned MySQL, but Oracle Database you've been extending: you've mentioned JSON, you've got blockchain now in there, you're infusing ML and AI into the database, graph database capabilities, on and on and on. We've compared that to Amazon, which takes kind of the right-tool-for-the-right-job approach. So maybe you could talk about your point of view, the benefits for developers of using that converged database, if I can use that word, being able to store multiple data formats. Why do you feel that's a better approach? >> Yeah, I think on a high level it comes down to complexity; you are actually avoiding additional complexity. Not every use case that you have necessarily warrants yet another data management technology, or yet another specially built technology for managing that data. Many use cases that we see out there just want to store a piece of JSON, a document, in a database, and then retrieve it again afterwards, or write some simple queries over it. And you really don't have to bring a new database technology or a NoSQL database into the mix if you already have one that fulfills that exact use case; you could just as happily store that information in the database you already have. What it really comes down to is the learning curve for developers. As you use the same technology to store other types of data, you don't have to learn a new technology, you don't have to familiarize yourself with new drivers, you don't have to find new frameworks, and you don't have to learn how to operate or best model your data for that database.
You can essentially just reuse your knowledge of the technology, as well as the libraries and code you have already built in house, perhaps in another application or framework used against the same technology, because it is still the same technology. So it all comes down, again, to avoiding complexity rather than fragmenting across many different technologies. If you were to look at the different data formats out there today, you would end up with many different databases just to store them, if you religiously followed the single-purpose, best-built-technology-for-every-use-case paradigm. And then you would end up having to manage many different databases, rather than actually focusing on your app and getting value to your business. >> Okay, so I get that, and I buy that, by the way, especially if you're a larger organization and you've got all these projects going on. But before we go back to Maria, Gerald, I want to push on that a little bit, because the counter to that argument would be an analogy, and I'd love for you to knock this analogy off the blocks. The counter would be: okay, Oracle is the Swiss Army knife, it's got, you know, all in one. But sometimes I need that specialized long screwdriver, and I go into my toolbox and I grab that. It's better than the screwdriver in my Swiss Army knife. So are you the Swiss Army knife of databases, or does the all-in-one have that best-of-breed screwdriver for me? How do you think about that? >> Yeah, that's a fantastic question, right? And first of all, you have to separate Oracle the company, which has multiple data management technologies and databases out there, as you said before, from Oracle Database. And I think Oracle Database is definitely a Swiss Army knife; it has gained many capabilities over the last 40 years. We've seen object support come along, and that's still in Oracle Database today. We've seen XML come along, and it's still in Oracle Database; graph, spatial, et cetera. So you have many different ways of managing your data, and then on top of that, going into the converged database, not only do we allow you to store the different data models in there, we also allow you to apply all the security policies and so forth on top of them, something Maria can talk more about with the mission around the converged database. I would also argue, though, that for some aspects we actually do have that screwdriver you talked about. Especially in the relational world, people get hung up very quickly on this idea that if you do rows and columns, that's what you put down on disk. And that was never true; the relational model is actually a logical model. What's put down on disk is blocks that align nicely with block storage, and it always has been. And that allows us to model and process the data differently. One good example, something we introduced a couple of years ago, was when columnar databases were very strong, and the competition came along saying, we have in-memory column stores now, they're so much better. And we were like, well, orienting the data row-based or column-based really doesn't matter, in the sense that we store it as blocks on disk.
And so we introduced the In-Memory technology, which gives you an in-memory columnar representation of your data alongside your relational data. So there is an example where, if you have this use case of columnar analytics all in memory, I would argue Oracle Database is also that screwdriver you want to reach for, because not only does it give you the columnar representation, it also gives you, which many people then forget, all the analytic power on top of SQL. It's one thing to store your data columnar; it's a completely different story to be able to run analytics on top of that, with all the built-in functionality you want as you analyze the data. >> You know, that's a great example, the columnar one, 'cause I remember there was a lot of hype around it. Oh, it's the Oracle killer, you know, Vertica. Vertica is still around, but it never really hit escape velocity; good product, good company, whatever. Netezza kind of got buried inside of IBM. ParAccel kind of became, you know, Redshift with that deal, so that kind of went away. Teradata bought a company, I forget which company it bought. So that hype kind of dissipated, and now it's like, oh yeah, columnar. It's kind of like in-memory: we've had in-memory databases ever since we've had databases. It's kind of a feature, not a sector. But anyway, Maria, let's come back to you. You've got a lot of customer experience, and you speak with a lot of companies during your time at Oracle. What else are you seeing in terms of the benefits to this approach that might not be so intuitive and obvious right away? >> I think one of the biggest benefits to having a multimodel, multiworkload, or as we call it, converged database, is the fact that you can get greater data synergy from it. In other words, you can utilize all these different techniques and data models to get better value out of that data. So things like being able to do real-time machine learning fraud detection inside a transaction, or being able to do a product recommendation by accessing three different data models. For example, if I'm trying to recommend a product for you, Dave, I might use graph analytics to figure out your community: not just your friends, but other people on our system who look and behave just like you. Once I know that community, I can go and see what products they bought by looking up our product catalog, which may be stored as JSON. And on top of that, I can then see, using key-value lookups, which products inside that catalog those community members gave a five-star rating to. That way I can really pinpoint the right product for you. And I can do all of that in one transaction inside the database, without having to transform the data into different models or, God forbid, access different systems to get all of that information. So it really simplifies how we can generate that value from the data.
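A rough sketch of the catalog and ratings steps in that recommendation is below. All table names are hypothetical, the graph step is reduced to a placeholder subquery (in practice that community would come from the database's graph analytics), and JSON_VALUE is the standard Oracle SQL/JSON function:

```sql
-- Hypothetical converged schema: PRODUCTS holds the catalog as JSON
-- documents, RATINGS holds key-value style (member, product, stars) rows,
-- and COMMUNITY_MEMBERS stands in for the graph-derived community.
SELECT p.product_id,
       JSON_VALUE(p.doc, '$.name') AS product_name,
       AVG(r.stars)                AS community_rating
FROM   products p
JOIN   ratings  r ON r.product_id = p.product_id
WHERE  r.member_id IN (SELECT member_id FROM community_members)
GROUP  BY p.product_id, JSON_VALUE(p.doc, '$.name')
HAVING AVG(r.stars) >= 4.5
ORDER  BY community_rating DESC;
```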
And of course, the other thing our customers love is that when it comes to deploying data-driven apps, doing it on a converged database is much simpler, because it is that standard data platform. You're not having to manage multiple independent single-purpose databases, and you're not having to implement the security and high availability policies across a bunch of diverse platforms. All of that can be done much more simply with a converged database, because the DBA team is going to use one standard set of tools to manage, monitor and secure those systems. >> Thank you for that. And you know, it's interesting, you talk about simplification, and you are in Juan's organization, so you have a big focus on mission critical. One of the things that I think is often overlooked, well, we talk about it all the time, is recovery. And if things are simpler, recovery is faster and easier. It's kind of the hallmark of Oracle: the gold standard for the toughest, most mission-critical apps. But I wanted to get to the cloud, Maria. Everything is going to the cloud, right? Not all workloads are going to the cloud, but everybody is talking about the cloud. Everybody has a cloud-first mentality, and so yes, it's a hybrid world. But the natural next question is, how do you think the cloud fits into this world of data-driven apps? >> I think, just like with any app you're developing, the cloud helps to accelerate the development, and of course the deployment, of these data-driven applications. If you think about it, the developer is instantly able to provision a converged database that Oracle will automatically manage and look after for them. And what's great, if you use something like our Autonomous Database service, is that it comes in different flavors: autonomous transaction processing, data warehousing, or autonomous JSON, so the developer gets a database that's been optimized for the specific use case they're trying to solve. It's also going to contain all of that great functionality and those capabilities we've been talking about. What that really means to the developer is that as the project evolves and the business needs inevitably change a little, there's no need to panic when one of those changes comes in, because your converged database, your autonomous database, has all of those additional capabilities, and you can simply utilize them to address the evolving needs of the project. Because let's face it, none of us normally knows exactly what we need to build right at the very beginning. And on top of that, they also get a built-in buddy in the cloud, especially in the Autonomous Database, and that buddy comes in the form of built-in workload optimizations. With the Autonomous Database we do things like automatic indexing, where we're using machine learning to be that buddy for the developer. It monitors the workload and sees what kinds of queries are being run on the system, then determines whether there are indexes that should be built to help improve the performance of that application. And not only does it build those indexes, it verifies that they improve performance before publishing them to the application. So by the time the developer is finished with the app and it's ready to be deployed, it's also been optimized by the developer's buddy, the Oracle Autonomous Database. It's a really nice helping hand for developers building any app, especially data-driven apps.
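That automatic indexing buddy is exposed through a documented package; as a minimal sketch, assuming an Oracle 19c or Autonomous Database instance:

```sql
-- Let the database create, verify and publish candidate indexes itself.
BEGIN
  DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE', 'IMPLEMENT');
END;
/

-- Review what has been built and the verified performance impact.
SELECT DBMS_AUTO_INDEX.REPORT_ACTIVITY() FROM dual;
```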
>> I like how you sort of gave us the truth here: you don't always know where you're going when you're building an app. It's gone from "build it and they will come" to "start building it and we'll figure out where it's going to go." With Agile, that's kind of how it works. But so I wonder, can you give some examples, maybe customers, or genericize them if you need to, of data-driven apps in the cloud where customers were able to drive more efficiency, where the cloud buddy allowed them to do more with less? >> Oh, we have tons of these, but I'll try and keep it to just a couple. One that comes to mind straight away is Retraced. These folks built a blockchain app in the Oracle Cloud that allows manufacturers to share the supply chain with the consumer. So the consumer can see exactly who made their product, using what raw materials, where they were sourced from, and how it was done; all of that is visible to the consumer. And in order to share that, they had to work on a very diverse set of data: everything from JSON documents to images, as well as your traditional transactions. They store all of that information inside the Oracle Autonomous Database, and they were able to build their app, deploy it in the cloud, and do all of that very, very quickly. So that ability to work on multiple different data types in a single database really helped them build that product and get it to market in a very short amount of time. Another customer doing something really interesting is MineSense. These guys work with some of the largest mines in Canada, Chile, and Peru. They put X-ray devices on the massive mechanical shovels at the mine face, and those sense the contents of the buckets on the mining machines. They're looking at that content to see how to optimize the processing of the ore in each bucket: to minimize the amount of power and water it will take to process it, and, of course, to minimize the amount of waste that comes out of the project. All of that sensor data is sent into an autonomous database, where it's processed by a whole host of different users, everyone from the mine engineers to the geoscientists to their own data scientists, who all utilize that data to drive the business forward. And what I love about these guys is that they're not happy with building just one app. MineSense actually uses our built-in low-code development environment, APEX, which comes as part of the Autonomous Database, and they produce applications constantly for different aspects of their business using that technology. It accelerates new apps to the business: it now takes them just days or weeks to produce an app, instead of months or years. >> Great, thank you for that, Maria. Gerald, I'm going to push you again. So, I said upfront and talked about microservices and the cloud and containers, and anybody in the developer space follows that very closely. But some of the things we've been talking about here, people might look at and say, well, they're kind of antithetical to microservices; this is Oracle's monolithic approach. When you think about the benefits of microservices, people want freedom of choice, and technology choice is seen as a big advantage of microservices and containers. How do you address such an argument? >> Yeah, that's an excellent question, and I get it quite often.
With the microservices architecture in general, as with architectures and Linux distributions before it, there's kind of always an academic approach and a pragmatic approach. When you look at microservices, the original definitions that came out in the early 2010s actually never said that each microservice has to have a database. They also never said that if a microservice has a database, you have to use a different technology for each microservice, just like they never said you have to write each microservice in a different programming language, right? So where I'm going with this is: yes, some vendors out there, some niche players, push this message; they jump on the academic approach of, each microservice gets the best tool at hand, use a different database for each purpose, et cetera, which often comes across as wanting to stay part of the conversation. Nothing stops a developer from using a multimodel database for a microservice and just using it as a document store, right? Or just using it as a relational database. And actually, something really interesting happened yesterday, I don't know whether you followed it, Dave, but Facebook had an outage, right? And Facebook is one of those companies seen as the Silicon Valley, knows-how-to-do-microservices company. And when you went through the outage, what happened? Some unfortunate logical error with a configuration, as announced, took a database cluster down. So there you have it: maybe not every microservice is in fact talking to its own database, or its own special-purpose database. I think what the industry should be focusing on, much more than this argument of which technology to use and what's the right tool for the job, is to ask: what business problem are we actually trying to solve, and therefore what's the right approach and the right technology for it? And as I said before, multimodel databases do have strong benefits. They have many built-in functionalities that are already there, and they allow you to reduce the complexity of having to know many different technologies. And it's not only about storing different data models, treating a multimodel database as a JSON document store or as a relational database; most databases have been multimodel for twenty-plus years. It's also that, if you store that data together, you can derive additional value from it for somebody else, perhaps not for your application. For example, if you use Oracle Database, you can write queries on top of all of that data. It doesn't matter to our query engine whether the data is formatted as JSON or formatted in rows and columns; you can just query over it. And that's very powerful for the folks who have to get the reporting done at the end of the day or the end of the week.
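A minimal sketch of that kind of mixed query, with hypothetical tables: ORDERS is plain relational, CUSTOMERS carries a JSON profile document, and a single statement reads both side by side:

```sql
-- One engine, one query: rows-and-columns data and JSON together.
SELECT o.order_id,
       o.order_total,
       JSON_VALUE(c.profile, '$.loyaltyTier') AS loyalty_tier
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id
WHERE  o.order_date >= DATE '2021-01-01'
AND    JSON_VALUE(c.profile, '$.marketingOptIn') = 'true';
```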
And for the data scientists who want to figure out which product performed really well, or whether we can tweak something here and there: when you look into that space, you still see a huge divergence between the people putting the data in, on the transactional side, and the people trying to derive new insights. There's still a lot of ETL going around, and we have big data technologies, some of which came and went, and some that came in and are still around, like Apache Spark, which is essentially a SQL engine on top of any of your data, kind of going back to the same concept. So I will say, for developers looking at microservices: first of all, is the argument you're making there because the vendor or the technology you want to use tells you this argument, or because you want an argument to use a specific technology? Or is it really because it is the best technology for this given use case, for this given application that you have? If so, there's of course nothing wrong with using a single-purpose technology either, right? >> Yeah, I mean, whenever I talk about Oracle, I always come back to the most important applications, the mission critical. It's very difficult to architect databases with microservices and containers; you have to be really, really careful. And again, it comes back to what we were talking about before with Maria: the complexity and the recovery. But Gerald, I want to stay with you for a minute. There are other data management technologies popping up out there. I've seen some people saying, okay, just leave the data in an S3 bucket, we can query that, we've got some magic sauce to do that. So why are you optimistic about traditional database technology going forward? >> I would say because of the history of databases. One thing that struck me when I came to Oracle and got to meet great people like Juan Loaiza and Andy Mendelsohn, who have been here for a long, long time, is the realization that relational databases have been around for about 45 years now. And I was like, I'm too young to have been around then, right? So, what else has been around for 45 years? Take the tech stack we have today: how does it compare? Well, Linux only came out in '93; databases pre-date Linux by a lot. And as I started digging, I saw a lot of technologies come and go. You mentioned before the data management systems that came and went, like the columnar databases, or XML databases, object databases. And even before relational databases, before Codd gave us the relational model, there were these network databases, which to some extent look very similar to JSON documents: a way of storing data in a hierarchical format. And when you actually start reading the Codd paper and diving a little more into the relational model, there's one important crux in there that most of the industry keeps forgetting, or hasn't been around to even know. When Codd created the relational model, he focused not so much on the application putting the data in, but on future users and applications still being able to make sense of the data. Like I said before, we had the network models, we had XML databases, we have JSON document stores, and the one thing they all have in common is that the application that puts the data in decides the structure of the data. And that's all well and good while you have the application and the developer who wrote it around.
It can become really tricky when, 10 years later, you still want to look at that data, the developer is no longer around, and you go: what does this all mean? Where is the structure defined? What is this attribute? What does it mean? How does it correlate to the others? The one thing people tend to forget is that it's actually the data that's here to stay, not the applications around it. Ideally, every company wants to store every single byte of data it has, because there might be future value in it. Economically that may not always make sense, although it's much more feasible now than just a few years ago; but if you could, why wouldn't you want to store all your data, right? And sometimes you actually have to store the data for seven years or whatever, because the law requires you to. So coming back 10 years from now, making sense of that data can be a lot more difficult and challenging than having first figured out how to store the data for general use. And that is what the relational model was all about: we decompose the data structures into tables and columns, with relationships between them. A typical example: you store some purchases from your web store. There's a customer attribute in there, there's some credit card payment information in there, and some product information on what the customer bought. In the relational model, if you just want to figure out which products were sold on a given day or week, you just query the payments and products tables to get the answer; you don't need to touch the customer at all. With a hierarchical model, you have to first sit down and understand the structure: what is the customer, where is the payment? Does the document start with the payment, or does it start with the customer? Where do I find this information? And in the very early days, those databases even struggled not to scan all the documents just to get the data out.
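To make that decomposition concrete, a minimal sketch of the web-store example in SQL; the table and column names are illustrative:

```sql
-- The purchase decomposed into self-describing tables, so future users
-- need no knowledge of the application that originally wrote the data.
CREATE TABLE customers (customer_id NUMBER PRIMARY KEY, name VARCHAR2(100));
CREATE TABLE products  (product_id  NUMBER PRIMARY KEY, name VARCHAR2(100), price NUMBER);
CREATE TABLE purchases (
  purchase_id  NUMBER PRIMARY KEY,
  customer_id  NUMBER REFERENCES customers,
  product_id   NUMBER REFERENCES products,
  payment_card VARCHAR2(25),
  purchased_on DATE
);

-- "Which products sold in a given week?" touches only the tables it
-- needs: no customer data, and no walking a document hierarchy.
SELECT pr.name, COUNT(*) AS units_sold
FROM   purchases pu
JOIN   products  pr ON pr.product_id = pu.product_id
WHERE  pu.purchased_on BETWEEN DATE '2021-06-07' AND DATE '2021-06-13'
GROUP  BY pr.name;
```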
So coming back to your question a bit, and I apologize for going on here: relational databases have been around for 45 years, and I actually argue they're one of the most successful software technologies we have when you look at the overall industry. Forty-five years, in IT terms, is an eternity. You said it before: many technologies came and went. A really interesting example, by the way, is Hadoop and HDFS. They gave us this promise in the 2010s, like 2012, 2013, at the height of the Hadoop hype: just put everything into HDFS and worry about the data later, right? We can query it and MapReduce it and whatever. And we had customers actually coming to us saying: great, we have half a petabyte of data on an HDFS cluster and we have no clue what's stored in there. How do we figure this out? What do we do now? Now you have a big data-cleansing problem. So I think that is why databases, and also data modeling, are not going away anytime soon, and database technologies are here to stay for quite a while. Because many people don't think about what's happening to their data five years from now. Many of the niche players, and frankly even Amazon, follow this single-purpose line of, just use the right tool for the job for your application, just put the data in there the way you want it. And okay, so you use technologies all over the place, and five years from now you have your data fragmented everywhere, in different formats, with inconsistencies, and so on. When you come back to those data-driven, business-critical decision applications, that's the worst-case scenario you can have, because now you need an army of people to do data cleansing. It's no coincidence that data science has become very, very popular in recent years, as we went through this proliferation of different database and data management technologies, some of which are not even databases. But I think I'll leave it at that. >> It's an interesting talk track, because you're right. I mean, no schema-on-write was alluring, but it definitely created some problems. It also created, you know, you referenced the hyper-specialized roles, and the data cleansing component. Maybe technology will eventually solve that problem, but it hasn't, at least as of tonight. Okay, last question. Maria, maybe you could start off, and Gerald, if you want to chime in as well, that'd be great. It's interesting to watch this industry. Oracle sort of won the top database mantle; I mean, I watched it, I saw it. Remember, it was Informix, and it was (indistinct) too, and of course Microsoft, you've got to give them credit with SQL Server, but Oracle won the database wars. And then everything got kind of quiet for a while; database was sort of boring. And then it exploded: NoSQL and the key-value stores and the cloud databases, and this is really a hot area now. And when we looked at Oracle we said, okay, Oracle, it's all about Oracle Database, but we've seen a kind of resurgence in MySQL, which everybody thought Oracle was going to kill once it bought Sun. But now we see you investing in HeatWave, TimesTen, we talked about in-memory databases before. So where do those fit, Maria, in the grand scheme? How should we think about Oracle's database portfolio? >> So there are lots of places where you'd use those different products, because just like in any other industry, there are going to be new and boutique use cases that benefit from a more specialized, single-purpose product. Good examples off the top of my head would be things like a stock exchange system or a telephone exchange system. Both of those are latency-critical transaction processing applications that need microsecond response times, and that's going to exceed what you might normally get or deploy with a converged database. Oracle's TimesTen database, our in-memory database, is perfect for those kinds of applications. But there's also a host of MySQL applications out there today, and you said it yourself there, Dave: HeatWave is a great place to provision and deploy those kinds of applications, because it's going to run 100 times faster than AWS (mumbles).
So, you know, there really is a place in the market, and in our customers' systems and the needs they have, for all of these different members of our database family here at Oracle. >> Yeah, well, the internet is basically running on the LAMP stack, so I don't see MySQL going away. All right, Gerald, we'll give you the final word; bring us home. >> Oh, thank you very much. Yeah, as Maria said, I think it comes back to what we discussed before. There are obviously still needs for special technologies, technologies other than a relational or multimodel database. Oracle actually has many more databases than people may first think of, not only the three we have already mentioned; there is also, for example, Oracle's NoSQL database. On a high level, Oracle is a data management company, right? We want to give our customers the best tools and the best technology to manage all of their data. So there has to be a part of the business that also focuses on those highly specialized systems and technologies that address those use cases. And I think it makes perfect sense: when a customer comes to Oracle, it's not just, take this one product, and if you don't like it, that's your problem. You actually have choice, and choice allows you to make a decision based on what's best for you, not necessarily what's best for the vendor you're talking to. >> Well guys, really appreciate your time today and your insights. Maria, Gerald, thanks so much for coming on theCUBE. >> Thank you very much for having us. >> And thanks for watching this CUBE Conversation. This is Dave Vellante, and we'll see you next time. (upbeat music)
Keynote Analysis | Virtual Vertica BDC 2020
(upbeat music) >> Narrator: It's theCUBE, covering the Virtual Vertica Big Data Conference 2020. Brought to you by Vertica. >> Dave Vellante: Hello everyone, and welcome to theCUBE's exclusive coverage of the Vertica Virtual Big Data Conference. You're watching theCUBE, the leader in digital event tech coverage, and we're broadcasting remotely from our studios in Palo Alto and Boston. We're pleased to be covering this digital event wall-to-wall. Now, as you know, originally BDC was scheduled this week at the new Encore Hotel and Casino in Boston. Their theme was "Win big with big data". Oh sorry, "Win big with data". That's right, got it. And I know the community was really looking forward to that meet-up. But look, we're making the best of it, given these uncertain times. We wish you and your families good health and safety, and this is the way that we're going to broadcast for the next several months. Now, we want to unpack Colin Mahony's keynote, but before we do that, I want to give a little context on the market. First, theCUBE has covered every BDC since its inception, since the BDC's inception that is. It's a very intimate event, with a heavy emphasis on user content. Now, historically, the data engineers and DBAs in the Vertica community comprised the majority of the content at this event, and that's going to be the same for this virtual, or digital, production. Now, theCUBE is going to be broadcasting for two days, concurrent with the Virtual BDC. We've got practitioners coming on the show, DBAs, data engineers, database gurus, we've got security experts coming on, and really a great lineup. And, of course, we'll also be hearing from Vertica execs, Colin Mahony himself right off the keynote, folks from product marketing, partners, and a number of experts, including some from Micro Focus, which is, of course, the owner of Vertica. But I want to take a moment to share a little bit about the history of Vertica. The company, as you know, was founded by Michael Stonebraker. And Vertica started out as a SQL platform for analytics. It was the first, or at least one of the first, to really nail the MPP column store trend. Not only did Vertica have an early-mover advantage in MPP, but the efficiency and scale of its software, relative to traditional DBMS and also other MPP players, is underscored by the fact that Vertica, and the Vertica brand, really thrives to this day. But I have to tell you, it wasn't without some pain, and I'll talk a little bit about that, and really talk about how we got here today. So first, you think about traditional transaction databases, like Oracle or IBM DB2, or even enterprise data warehouse platforms like Teradata. They were simply not purpose-built for big data. Vertica was, along with a whole bunch of other players: Netezza, which was bought by IBM; Aster Data, which is now Teradata; ParAccel, which was bought by Actian and was the basis for Amazon's Redshift; and Greenplum, which was bought in the early days by EMC. These companies were really designed to run as massively parallel systems that smoked traditional RDBMS and EDW for particular analytic applications. You know, back in the big data days, I often joked that, like an NFL draft, there was a run on MPP players, like when you see a run on pulling guards. Once one goes, they all start to fall. And that's what you saw with the MPP columnar stores: IBM, EMC, and then HP getting into the game.
So, it was like 2011, and Leo Apotheker was the new CEO of HP. Frankly, he had no clue, in my opinion, what to do with Vertica, and totally missed one of the biggest trends of the last decade, the data trend, the big data trend. HP picked up Vertica for a song; it wasn't disclosed, but my guess is that it was around 200 million. So, rather than build a bunch of smart products around Vertica, which I always called the diamond in the rough, Apotheker basically permanently altered HP for years. He kind of ruined HP, in my view, with a 12 billion dollar purchase of Autonomy, which turned out to be one of the biggest disasters in recent M&A history. HP was forced to spin-merge, and ended up selling most of its software to Micro Focus. (laughs) Luckily, during its time at HP, CEO Meg Whitman was largely distracted with what to do with the mess that she inherited from Apotheker, so Vertica was left alone. Now, the upshot is Colin Mahony, who was then the GM of Vertica, and still is. By the way, he's really the CEO, he just doesn't have the title; I actually think they should give that to him. But anyway, he's been at the helm the whole time. And Colin, as you'll see in our interview, is a rock star: he's got technical and business chops, and people love him in the community. Vertica's culture is really engineering-driven, and they're all about data. Despite the fact that Vertica is a 15-year-old company, they've really kept pace and not been polluted by legacy baggage. Vertica, early on, embraced Hadoop and the whole open-source movement, and that helped give it tailwinds. It leaned heavily into cloud, as we're going to talk about further this week, and they've got a good story around machine intelligence and AI. So, whereas many traditional database players are really getting hurt, and some are getting killed, by cloud database providers, Vertica's actually doing a pretty good job of servicing its install base, and is in a reasonable position to compete for new workloads. On its last earnings call, Micro Focus CEO Stephen Murdoch said they're investing 70 to 80 million dollars in two key growth areas, security and Vertica. Now, Micro Focus is running its SUSE play on these two parts of its business. What I mean by that is they're investing and allowing them to be semi-autonomous, spending on R&D and go-to-market, and they have no hardware agenda, unlike when Vertica was part of HP, or HPE, I guess HP, before the spin-out. Now, let me come back to the big trend in the market today, because there's something going on around analytic databases in the cloud. You've got companies like Snowflake and AWS with Redshift, as we've reported numerous times, and they're doing quite well; they're gaining share, especially of new workloads that are emerging, particularly in the cloud-native space. They combine scalable compute, storage, and machine learning, and, importantly, they're allowing customers to scale compute and storage independent of each other. Why is that important? Because you don't have to buy storage every time you buy compute, or vice versa, in chunks. If you can scale them independently, you've got granularity. Vertica is keeping pace. In talking to customers, Vertica is leaning heavily into the cloud, supporting all the major cloud platforms, as we heard from Colin earlier today, adding Google.
And while my research shows that Vertica has some work to do in cloud and cloud-native to simplify the experience, it has a more robust and mature stack, which supports many different environments, deep SQL, ACID properties, and the DNA that allows Vertica to compete with these cloud-native database suppliers. Now, Vertica might lose out in some of those native workloads. But I have to say, in my experience talking with customers, if you're looking for a great MPP column store that scales and runs in the cloud or on-prem, Vertica is in a very strong position. Vertica claims to be the only MPP columnar store to allow customers to scale compute and storage independently, both in the cloud and in hybrid environments on-prem, et cetera, and across clouds as well. So, while Vertica may be at a disadvantage in a pure cloud-native bake-off, its more robust and mature stack, combined with its multi-cloud strategy, gives Vertica a compelling set of advantages. So, we heard a lot of this from Colin Mahony, who announced Vertica 10.0 in his keynote. He really emphasized Vertica's multi-cloud affinity and its Eon Mode, which allows that separation, or scaling, of compute independent of storage, both in the cloud and on-prem. Vertica 10, according to Mahony, is making big bets on in-database machine learning, he talked about that, and AI, along with some advanced regression techniques. He talked about PMML models and Python integration, which was actually something they talked about doing with Uber and some other customers. Now, Mahony also stressed the trend toward object stores. Vertica now supports, let's see, S3 with Eon in Google Cloud, in addition to AWS, and then Pure Storage and HDFS support Eon Mode as well. Mahony also stressed, as I mentioned earlier, a big commitment to on-prem and the whole cloud-optionality thing. So 10.0, according to Colin Mahony, is all about really doubling down on these industry waves: enabling native PMML models, running them in Vertica, and really doing all the work that's required around ML and AI; they also announced support for TensorFlow. So, object store optionality is important, as he talked about with Eon Mode and the news of support for Google Cloud as well as HDFS. And finally, a big focus on deployment flexibility and migration tools, with a critical focus on improving ease of use, and you hear this from a lot of customers. So, these are the critical aspects of Vertica 10.0, an announcement that we're going to be unpacking all week with some of the experts that I talked about. So, I'm going to close with this. My long-time co-host, John Furrier, and I have talked for some time about this new cocktail of innovation. No longer is Moore's Law the, really, mainspring of innovation. It's now about taking all these data troves, bringing machine learning and AI into that data to extract insights, and then operationalizing those insights at scale, leveraging cloud. And one of the things I always look for from cloud is, if you've got a cloud play, you can attract innovation in the form of startups. It's part of the success equation, certainly for AWS, and I think it's one of the challenges for a lot of the legacy on-prem players. Vertica, I think, has done a pretty good job in this regard. And we're going to look this week for evidence of that innovation. One of the interviews that I'm personally excited about this week is a new-ish company, I would consider them a startup, called Zebrium.
What they're doing is applying AI to do autonomous log monitoring for IT ops. I'm interviewing Larry Lancaster, their CEO, this week, and I'm going to press him on why he chose to run on Vertica and not a cloud database. This guy is a hardcore tech guru and I want to hear his opinion. Okay, so keep it right there, stay with us. We're all over the Vertica Virtual Big Data Conference, covering in-depth interviews and following all the news. theCUBE is going to be interviewing these folks, two days, wall-to-wall coverage, so keep it right there. We're going to be right back with our next guest, right after this short break. This is Dave Vellante and you're watching theCUBE. (upbeat music)
Vertica Big Data Conference Keynote
>> Joy: Welcome to the Virtual Big Data Conference. Vertica is so excited to host this event. I'm Joy King, and I'll be your host for today's Big Data Conference Keynote Session. It's my honor and my genuine pleasure to lead Vertica's product and go-to-market strategy. And I'm so lucky to have a passionate and committed team who turned our Vertica BDC event into a virtual event in a very short amount of time. I want to thank the thousands of people, and yes, that's our true number, who have registered to attend this virtual event. We were determined to balance your health, safety and your peace of mind with the excitement of the Vertica BDC. This is a very unique event. Because as I hope you all know, we focus on engineering and architecture, best practice sharing and customer stories that will educate and inspire everyone. I also want to thank our top sponsors for the virtual BDC, Arrow and Pure Storage. Our partnerships are so important to us and to everyone in the audience. Because together, we get things done faster and better. Now for today's keynote, you'll hear from three very important and energizing speakers. First, Colin Mahony, our SVP and General Manager for Vertica, will talk about the market trends that Vertica is betting on to win for our customers. And he'll share the exciting news about our Vertica 10 announcement and how this will benefit our customers. Then you'll hear from Amy Fowler, VP of strategy and solutions for FlashBlade at Pure Storage. Our partnership with Pure Storage is truly unique in the industry, because together, modern infrastructure from Pure powers modern analytics from Vertica. And then you'll hear from John Yovanovich, Director of IT at AT&T, who will tell you about the Pure Vertica Symphony that plays live every day at AT&T. Here we go, Colin, over to you. >> Colin: Well, thanks a lot, Joy. And I want to echo Joy's thanks to our sponsors, and to so many of you who have helped make this happen. This is not an easy time for anyone. We were certainly looking forward to getting together in person in Boston during the Vertica Big Data Conference and Winning with Data. But I think all of you and our team have done a great job scrambling and putting together a terrific virtual event. So really appreciate your time. I also want to remind people that we will make both the slides and the full recording available after this. So for any of those who weren't able to join live, that is still going to be available. Well, things have been pretty exciting here. And in the analytic space in general, certainly for Vertica, there's a lot happening. There are a lot of problems to solve, a lot of opportunities to make things better, and a lot of data that can really make every business stronger, more efficient, and frankly, more differentiated. For Vertica, though, we know that focusing on the challenges that we can directly address with our platform and our people, where we can actually make the biggest difference, is where we ought to be putting our energy and our resources. I think one of the things that has made Vertica so strong over the years is our ability to focus on those areas where we can make a great difference. So for us, as we look at the market and we look at where we play, there are really three recent, and some not so recent but certainly picking up a lot of momentum, market trends that have become critical for every industry that wants to Win Big With Data. We've heard this loud and clear from our customers and from the analysts that cover the market.
If I were to summarize these three areas, this really is the core focus for us right now. We know that there's massive data growth. And if we can unify the data silos so that people can really take advantage of that data, we can make a huge difference. We know that public clouds offer tremendous advantages, but we also know that balance and flexibility are critical. And we all need the benefits that machine learning, all the way up to full data science, can bring to every single use case, but only if they can really be operationalized at scale, accurately and in real time. And the power of Vertica is, of course, how we're able to bring so many of these things together. Let me talk a little bit more about some of these trends. So one of the first industry trends that we've all been following, probably now for over the last decade, is Hadoop and specifically HDFS. So many companies have invested time, money, and more importantly, people in leveraging the opportunity that HDFS brought to the market. HDFS is really part of a much broader storage disruption that we'll talk a little bit more about. But HDFS itself was really designed for petabytes of data, leveraging low cost commodity hardware and the ability to capture a wide variety of data formats from a wide variety of data sources and applications. And I think what people really wanted was to store that data before having to define exactly what structures it should go into. So over the last decade or so, the focus for most organizations has been figuring out how to capture, store and frankly manage that data. And as a platform to do that, I think Hadoop was pretty good. It certainly changed the way that a lot of enterprises think about their data and where it's locked up. In parallel with Hadoop, particularly over the last five years, Cloud Object Storage has also given every organization another option for collecting, storing and managing even more data. That has led to a huge growth in data storage, obviously, up on public clouds like Amazon and their S3, Google Cloud Storage and Azure Blob Storage, just to name a few. And then when you consider regional and local object storage offered by cloud vendors all over the world, the explosion of data leveraging this type of object storage is very real. And I think, as I mentioned, it's just part of this broader storage disruption that's been going on. But with all this growth in the data, in all these new places to put this data, every organization we talk to is facing even more challenges now around the data silo. Sure, the data silos are certainly getting bigger. And hopefully they're getting cheaper per bit. But as I said, the focus has really been on collecting, storing and managing the data. But between the new data lakes and many different cloud object stores, combined with all sorts of data types and the complexity of managing all this, the business value people are getting has been very limited. This actually takes me to big bet number one for Team Vertica, which is to unify the data. Our goal, and you'll see it in some of the announcements we have made today plus roadmap announcements I'll share with you throughout this presentation, is to ensure that all the time, money and effort that has gone into storing that data, all the data, turns into business value. So how are we going to do that?
With a unified analytics platform that analyzes the data wherever it is: HDFS, Cloud Object Storage, External tables in any format, ORC, Parquet, JSON, and of course, our own native ROS Vertica format. Analyze the data in the right place, in the right format, using a single unified tool. This is something that Vertica has always been committed to, and you'll see in some of our announcements today, we're just doubling down on that commitment. Let's talk a little bit more about the public cloud. This is certainly the second trend. It's the second wave, maybe, of data disruption, with object storage. And there are a lot of advantages when it comes to public cloud. There's no question that the public clouds give rapid access to compute and storage, with the added benefit of eliminating the data center maintenance that so many companies want to get out of themselves. But maybe the biggest advantage that I see is the architectural innovation. The public clouds have introduced so many methodologies around how to provision quickly, separating compute and storage and really dialing in the exact needs on demand, as you change workloads. When public clouds began, it made a lot of sense for the cloud providers and their customers to charge and pay for compute and storage in the ratio that each use case demanded. And I think you're seeing that trend proliferate all over the place, not just up in public cloud. That architecture itself is really becoming the next generation architecture for on-premise data centers as well. But there are a lot of concerns. I think we're all aware of them. They're out there. Many times, for different workloads, there are higher costs. Especially for some of the workloads that are being run through analytics, which tend to run all the time. Just like some of the silo challenges that companies are facing with HDFS, data lakes and cloud storage, the public clouds have similar types of siloed challenges as well. Initially, there was a belief that they were cheaper than data centers, and when you added in all the costs, it looked that way. And again, for certain elastic workloads, that is the case. I don't think that's true across the board overall. Even to the point where a lot of the cloud vendors aren't just charging lower costs anymore. We hear from a lot of customers that they don't really want to tether themselves to any one cloud because of some of those uncertainties. Of course, security and privacy are a concern. We hear a lot of concerns with regards to cloud, and even some SaaS vendors, around shared data catalogs across all the customers and not enough separation. But security concerns are out there, you can read about them. I'm not going to jump on that bandwagon. But we hear about them. And then, of course, I think one of the things we hear the most from our customers is that each cloud stack is starting to feel a lot more locked-in than the traditional data warehouse appliance. And as everybody knows, the industry has been running away from appliances as fast as it can. And so they're not eager to get locked into another, quote, unquote, virtual appliance, if you will, up in the cloud. They really want to make sure they have flexibility in which clouds they go to today, tomorrow and in the future. And frankly, we hear from a lot of our customers that they're very interested in eventually mixing and matching, compute from one cloud with, say, storage from another cloud, which I think is something that we'll hear a lot more about.
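As a concrete illustration of that analyze-it-where-it-lives idea, here is a minimal sketch using the open-source vertica-python client to define an external table over Parquet files in an object store and query it in place. The host, credentials, bucket path and schema are all hypothetical placeholders, not a real deployment.

```python
# A minimal sketch of "analyze it where it lives": define a Vertica
# external table over Parquet files in an object store, then query it
# with plain SQL. Host, credentials, bucket path and schema are all
# hypothetical placeholders.
import vertica_python

conn_info = {'host': 'vertica.example.com', 'port': 5433, 'user': 'dbadmin',
             'password': '...', 'database': 'analytics'}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    # No load step: the Parquet files stay in the object store.
    cur.execute("""
        CREATE EXTERNAL TABLE clickstream (
            user_id INT, event_ts TIMESTAMP, url VARCHAR(2048)
        ) AS COPY FROM 's3://my-data-lake/clickstream/*.parquet' PARQUET
    """)
    cur.execute("SELECT COUNT(*) FROM clickstream")
    print(cur.fetchone()[0])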
And so for us, that's why we've got our big bet number two. We love the cloud. We love the public cloud. We love the private clouds, on-premise, and other hosting providers. But our passion and commitment is for Vertica to be able to run in any of the clouds that our customers choose, and to make it portable across those clouds. We have supported on-premises and all public clouds for years. And today, we have announced even more support for Vertica in Eon Mode, the deployment option that leverages the separation of compute from storage, with even more deployment choices, which I'm going to also touch more on as we go. So super excited about our big bet number two. And finally, as I mentioned, for all the hype that there is around machine learning, I actually think that most importantly, this third trend that Team Vertica is determined to address is the need to bring business-critical analytics, machine learning and data science projects into production. For so many years, there just wasn't enough data available to justify the investment in machine learning. Also, processing power was expensive, and storage was prohibitively expensive. So to train and score and evaluate all the different models to unlock the full power of predictive analytics was tough. Today you have those massive data volumes. You have the relatively cheap processing power and storage to make that dream a reality. And if you think about this, I mean, with all the data that's available to every company, the real need is to operationalize the speed and the scale of machine learning so that these organizations can actually take advantage of it where they need to. I mean, we've seen this for years with Vertica, going back to some of the most advanced gaming companies in the early days; they were incorporating this with live data directly into their gaming experiences. Well, every organization wants to do that now. And the accuracy of predictions and real-time actions are all key to separating the leaders from the rest of the pack in every industry when it comes to machine learning. But if you look at a lot of these projects, the reality is that there's a ton of buzz, there's a ton of hype spanning every acronym that you can imagine. But most companies are struggling, due to the separate teams, different tools, silos and the limitations that many platforms are facing: driving downsampling to get a small subset of the data, to try to create a model that then doesn't apply, or compromising accuracy and making it virtually impossible to replicate models and understand decisions. And if there's one thing that we've learned when it comes to data, it's prescriptive data at the atomic level, being able to show the N of one, as we refer to it, meaning individually tailored data. No matter what it is, healthcare, entertainment experiences like gaming or other, being able to get at the granular data and make these decisions, make that scoring work, applies to machine learning just as much as it applies to giving somebody a next-best-offer. And the opportunity has never been greater. The need is to integrate this end-to-end workflow and support the right tools without compromising on that accuracy. Think about it as no downsampling, using all the data; it really is key to machine learning success. Which should be no surprise, then, why the third big bet from Vertica is one that we've actually been working on for years. And we're so proud to be where we are today, helping the data disruptors across the world operationalize machine learning.
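To ground the no-downsampling argument, here's a hedged sketch of what in-database training and scoring can look like. The LINEAR_REG and PREDICT_LINEAR_REG calls follow Vertica's documented in-database ML functions as best I understand them; the connection details, table and columns are invented for illustration.

```python
# A hedged sketch of training and scoring on the full dataset in the
# database, with no downsampling. LINEAR_REG and PREDICT_LINEAR_REG are
# Vertica's documented in-database ML functions as I understand them;
# everything else in this snippet is invented for illustration.
import vertica_python

with vertica_python.connect(host='vertica.example.com', port=5433,
                            user='dbadmin', password='...',
                            database='analytics') as conn:
    cur = conn.cursor()
    # Train on every row; the MPP engine parallelizes the work.
    cur.execute("""
        SELECT LINEAR_REG('spend_model', 'customer_history',
                          'monthly_spend', 'tenure_months, support_calls')
    """)
    # Score in place with the model that was just trained.
    cur.execute("""
        SELECT customer_id,
               PREDICT_LINEAR_REG(tenure_months, support_calls
                                  USING PARAMETERS model_name='spend_model')
        FROM customer_history LIMIT 5
    """)
    print(cur.fetchall())
```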
This big bet has the potential to truly unlock the power of machine learning. And today, we're announcing some very important new capabilities specifically focused on unifying the work being done by the data science community, with their preferred tools and platforms, and the volume of data and performance at scale available in Vertica. Our strategy has been very consistent over the last several years. As I said in the beginning, we haven't deviated from our strategy. Of course, there are always things that we add. Most of the time it's customer driven, it's based on what our customers are asking us to do. But I think we've also done a great job not trying to be all things to all people. Especially as these hype cycles flare up around us, we absolutely love participating in these different areas without getting completely distracted. I mean, there's a variety of query tools and data warehouses and analytics platforms in the market. We all know that. There are tools and platforms that are offered by the public cloud vendors, and by other vendors that support one or two specific clouds. There are appliance vendors, who I was referring to earlier, who can deliver packaged data warehouse offerings for private data centers. And there's a ton of popular machine learning tools, languages and other kits. But Vertica is the only advanced analytic platform that can do all this, that can bring it together. We can analyze the data wherever it is, in HDFS, S3 Object Storage, or Vertica itself. Natively, we support multiple clouds and on-premise deployments. And maybe most importantly, we offer that choice of deployment modes to allow our customers to choose the architecture that works for them right now. It still also gives them the option to change, move and evolve over time. And Vertica is the only analytics database with end-to-end machine learning that can truly operationalize ML at scale. And I know it's a mouthful. But it is not easy to do all these things. It is one of the things that highly differentiates Vertica from the rest of the pack. It is also why our customers, all of you, continue to bet on us and see the value that we are delivering and will continue to deliver. Here's a couple of examples of some of our customers who are powered by Vertica. It's the scale of data. It's the millisecond response times. Performance and scale have always been a huge part of what we are about, though not the only thing. I think the functionality, all the capabilities that we add to the platform, the ease of use, the flexibility, obviously with the deployment. But look at some of the numbers under these customers on this slide. And I've shared a lot of different stories about these customers, which, by the way, still amaze me every time I talk to one and get the updates; you can see the power and the difference that Vertica is making. Equally important, if you look at a lot of these customers, they are the epitome of being able to deploy Vertica in a lot of different environments. Many of the customers on this slide are not using Vertica just on-premise or just in the cloud. They're using it in a hybrid way. They're using it in multiple different clouds. And again, we've been with them on that journey throughout, which is what has made this product, and frankly, our roadmap and our vision, exactly what it is. It's been quite a journey. And that journey continues now with the Vertica 10 release. The Vertica 10 release is obviously a massive release for us.
But if you look back, you can see that it builds on that native columnar architecture that started a long time ago, obviously, with the C-Store paper. We built it to leverage commodity hardware, because it was an architecture that was never tightly integrated with any specific underlying infrastructure. I still remember hearing the initial pitch from Mike Stonebraker about the vision of Vertica as a software-only solution and the importance of separating the company from hardware innovation. And at the time, Mike basically said to me, "There's so much R&D and innovation that's going to happen in hardware, we shouldn't bake hardware into our solution. We should do it in software, and we'll be able to take advantage of that hardware." And that is exactly what has happened. But one of the most recent innovations that we embraced with hardware is certainly that separation of compute and storage. As I said previously, the public cloud providers offered this next generation architecture really to ensure that they can provide the customers exactly what they needed, more compute or more storage, and charge for each, respectively. The separation of compute from storage is a major milestone in data center architectures. If you think about it, it's really not only a public cloud innovation, though. It fundamentally redefines the next generation data architecture for on-premise, and for pretty much every way people are thinking about computing today. And that goes for software too. Object storage is an example of a cost effective means for storing data. And even more importantly, separating compute from storage for analytic workloads has a lot of advantages. Including the opportunity to manage much more dynamic, flexible workloads. And more importantly, truly isolate those workloads from others. And by the way, once you start having something that can truly isolate workloads, then you can have the conversations around autonomic computing, around setting up some nodes, some compute resources on the data, that won't affect any of the other workloads, to do some things on their own, maybe some self-analytics by the system, etc. A lot of things that many of you know we've already been exploring in terms of our own system data in the product. But it was May 2018, believe it or not, it seems like a long time ago, when we first announced Eon Mode. And I want to make something very clear, actually, about Eon Mode. It's a mode, it's a deployment option for Vertica customers. And I think this is another huge benefit that we don't talk about enough. But unlike a lot of vendors in the market who will nickel-and-dime you and charge you for every single add-on, you name it, you get this with the Vertica product. If you continue to pay support and maintenance, this comes with the upgrade. This comes as part of the new release. So any customer who owns or buys Vertica has the ability to set up either Enterprise Mode or Eon Mode, which is a question I know that comes up sometimes. Our first announcement of Eon was obviously for AWS customers, including the Trade Desk and AT&T, most of whom will be speaking here later at the Virtual Big Data Conference. They saw a huge opportunity. Eon Mode not only allowed Vertica to scale elastically with the specific compute and storage that was needed, but it really dramatically simplified database operations, including things like workload balancing, node recovery, compute provisioning, etc.
So one of the most popular functions is that ability to isolate the workloads and really allocate those resources without negatively affecting others. And even though traditional data warehouses, including Vertica Enterprise Mode, have been able to do lots of different workload isolation, it's never been as strong as Eon Mode. Well, it certainly didn't take long for our customers to see that value across the board with Eon Mode, and not just up in the cloud. In partnership with one of our most valued partners, and a platinum sponsor here, as Joy mentioned at the beginning, we announced Vertica Eon Mode for Pure Storage FlashBlade in September 2019. And again, just to be clear, this is not a new product; it's one Vertica, with yet more deployment options. With Pure Storage, Vertica in Eon Mode is not limited in any way by variable cloud network latency. The performance is actually amazing when you take the benefits of separating compute from storage and you run it with a Pure environment on-premise. Vertica in Eon Mode has a super smart cache layer that we call the depot. It's a big part of our secret sauce around Eon Mode. And combined with the power and performance of Pure's FlashBlade, Vertica became the industry's first advanced analytics platform that actually separates compute and storage for on-premises data centers. Something that a lot of our customers are already benefiting from, and we're super excited about it. But as I said, this is a journey. We don't stop, we're not going to stop. Our customers need the flexibility of multiple public clouds. So today with Vertica 10, we're super proud and excited to announce support for Vertica in Eon Mode on Google Cloud. This gives our customers the ability to use their Vertica licenses on Amazon AWS, on-premise with Pure Storage, and on Google Cloud. Now, we were talking about HDFS, and a lot of our customers who have invested quite a bit in HDFS as a place, especially, to store data have been pushing us to support Eon Mode with HDFS. So as part of Vertica 10, we are also announcing support for Vertica in Eon Mode using HDFS as the communal storage. Vertica's own ROS format data can be stored in HDFS, and actually the full functionality of Vertica, its complete analytics, geospatial, pattern matching, time series, machine learning, everything that we have in there, can be applied to this data. And on the same HDFS nodes, Vertica can also analyze data in ORC or Parquet format, using External tables. We can even execute joins between the ROS data and the data the External tables hold, which powers a much more comprehensive view. So again, it's that flexibility, to be able to support our customers wherever they need us to support them, on whatever platform they have. Vertica 10 gives us a lot more ways that we can deploy Eon Mode in various environments for our customers. It allows them to take advantage of Vertica in Eon Mode, and the power that it brings with that separation, with that workload isolation, on whichever platform they are most comfortable with. Now, there's a lot that has come in Vertica 10. I'm definitely not going to be able to cover everything. But we also introduced complex types, as an example. And complex data types fit very well into Eon as well, in this separation. They significantly reduce the data pipeline and the cost of moving data, with much better support for unstructured data, which a lot of our customers have mixed with structured data, of course, and they leverage a lot of the columnar execution that Vertica provides.
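Here's a hedged sketch of that mixed-format join: a native table joined against an external Parquet table on HDFS, in one SQL statement. Table names, paths and schemas are illustrative assumptions, not a real system.

```python
# A sketch of the mixed-format join described above: Vertica-native
# (ROS) fact data joined against an external table over Parquet files
# in HDFS. Paths, table names and schemas are illustrative only.
import vertica_python

with vertica_python.connect(host='vertica.example.com', port=5433,
                            user='dbadmin', password='...',
                            database='analytics') as conn:
    cur = conn.cursor()
    cur.execute("""
        CREATE EXTERNAL TABLE product_dim (sku INT, name VARCHAR(256))
        AS COPY FROM 'hdfs:///warehouse/product_dim/*.parquet' PARQUET
    """)
    # sales is a native Vertica table; product_dim never leaves HDFS.
    cur.execute("""
        SELECT p.name, SUM(s.amount) AS revenue
        FROM sales s JOIN product_dim p ON s.sku = p.sku
        GROUP BY p.name ORDER BY revenue DESC LIMIT 10
    """)
    print(cur.fetchall())
```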
So you get complex data types in Vertica now, a lot more data, stronger performance. It goes great with the announcement that we made with the broader Eon Mode. Let's talk a little bit more about machine learning. We've actually been doing work in and around machine learning, with various regressions and a whole bunch of other algorithms, for several years. We saw the huge advantage that MPP offered, not just as a SQL engine, as a database, but for ML as well. It didn't take long to realize that there's a lot more to operationalizing machine learning than just those algorithms. It's data preparation, it's model training. It's the scoring, the shaping, the evaluation. That is so much of what machine learning, and frankly, data science, is about. You know, everybody always wants to jump to the sexy algorithm, and we handle those tasks very, very well. It makes Vertica a terrific platform to do that. But a lot of work in data science and machine learning is done in other tools. I had mentioned that there are just so many tools out there. We want people to be able to take advantage of all that. We never believed we were going to be the best algorithm company or come up with the best models for people to use. So with Vertica 10, we support PMML. We can now import and export PMML models. It's a huge step for us around operationalizing machine learning projects for our customers. Allowing the models to get built outside of Vertica, yet be imported in and then applied to that full scale of data, with all the performance that you would expect from Vertica. We are also more tightly integrating with Python. As many of you know, we've been doing a lot of open source projects with the community, driven by many of our customers, like Uber. And so now, with Python, we've integrated with TensorFlow, allowing data scientists to build models in their preferred language, to take advantage of TensorFlow, but again, to store and deploy those models at scale with Vertica. I think both these announcements are proof of our big bet number three, and really our commitment to supporting innovation throughout the community by operationalizing ML with the accuracy, performance and scale of Vertica for our customers. Again, there are a lot of steps when it comes to the workflow of machine learning. These are some of them that you can see on the slide, and it's definitely not linear, either. We see this as a circle. And companies that do it well just continue to learn; they continue to rescore, they continue to redeploy, and they want to operationalize all that within a single platform that can take advantage of all those capabilities. And that is the platform, with a very robust ecosystem, that Vertica has always been committed to as an organization and will continue to be. This graphic, many of you have seen it evolve over the years. Frankly, if we put everything and everyone on here, it wouldn't fit on a slide. But it will absolutely continue to evolve and grow as we support our customers where they need the support most. So, again, being able to deploy everywhere, being able to take advantage of Vertica, not just as a business analyst or a business user, but as a data scientist or as an operational or BI person. We want Vertica to be leveraged and used by the broader organization. So I think it's fair to say, and I encourage everybody to learn more about Vertica 10, because I'm just highlighting some of the bigger aspects of it. But we talked about those three market trends.
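As a sketch of that import-and-score path, the snippet below uses the IMPORT_MODELS and PREDICT_PMML functions as described in the Vertica 10 documentation, as I understand it; the model file, its name and the feature columns are hypothetical.

```python
# A sketch of importing an externally built PMML model and scoring with
# it in the database. IMPORT_MODELS and PREDICT_PMML follow the Vertica
# 10 docs as I understand them; the path, model and columns are invented.
import vertica_python

with vertica_python.connect(host='vertica.example.com', port=5433,
                            user='dbadmin', password='...',
                            database='analytics') as conn:
    cur = conn.cursor()
    # Bring in a PMML model trained elsewhere (scikit-learn, Spark, etc.).
    cur.execute("""
        SELECT IMPORT_MODELS('/models/churn_model.pmml'
                             USING PARAMETERS category='PMML')
    """)
    # Apply it at full scale, inside the database.
    cur.execute("""
        SELECT customer_id,
               PREDICT_PMML(tenure_months, support_calls
                            USING PARAMETERS model_name='churn_model')
        FROM customer_history LIMIT 5
    """)
    print(cur.fetchall())
```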
The need to unify the silos, the need for hybrid and multiple cloud deployment options, the need to operationalize business-critical machine learning projects: Vertica 10 has absolutely delivered on those. But again, we are not going to stop. It is our job not to, and this is how Team Vertica thrives. I always joke that the next release is the best release. And, of course, even after Vertica 10, that is also true, although Vertica 10 is pretty awesome. But, you know, from the first line of code, we've always been focused on performance and scale, right? And like any really strong data platform, the optimizer and the execution engine are the two core pieces of that. Beyond Vertica 10, some of the big things that we're already working on include the next generation execution engine. We're already actually seeing incredible early performance from this. And this is just one example of how important it is for an organization like Vertica to constantly go back and re-innovate. Every single release, we do the sit-ups and crunches on our performance and scale. How do we improve? And there are so many parts of the core server, there are so many parts of our broader ecosystem. We are constantly looking at how we can go back to all the code lines that we have and make them better in the current environment. And it's not an easy thing to do when you're doing that and you're also expanding into new environments to take advantage of the different deployments, which is a great segue to this slide. Because if you think about today, we're obviously already available with Eon Mode on Amazon AWS, on Pure, and actually on MinIO as well. As I talked about, in Vertica 10 we're adding Google and HDFS. And coming next, obviously, Microsoft Azure and Alibaba Cloud. So being able to expand into more of these environments is really important for the Vertica team and how we go forward. And it's not just running in these clouds; for us, we want it to be a SaaS-like experience in all these clouds. We want you to be able to deploy Vertica in 15 minutes or less on these clouds. You can also consume Vertica in a lot of different ways on these clouds; as an example, on Amazon, Vertica by the Hour. So for us, it's not just about running, it's about taking advantage of the ecosystems that all these cloud providers offer, and really optimizing the Vertica experience as part of them. Optimization around automation, around self-service capabilities, extending our management console. We now have products like the Vertica Advisor Tool, which our Customer Success team created to actually use our own smarts in Vertica: to take data that customers give to us and help them automatically tune their environment. You can imagine that we're taking that to the next level, in a lot of different endeavors that we're doing around how Vertica as a product can actually be smarter, because we all know that simplicity is key. There just aren't enough people in the world who are good at managing data and taking it to the next level. And of course, other things that we all hear about, whether it's Kubernetes and containerization. You can imagine that that probably works very well with Eon Mode and separating compute and storage. But innovation happens everywhere. We innovate around our community documentation. Many of you have taken advantage of the Vertica Academy. The numbers there are through the roof in terms of the number of people coming in and certifying on it.
So there's a lot that is within the core products, and there's a lot of activity and action beyond the core products that we're taking advantage of. And let's not forget why we're here, right? It's easy to talk about a platform, a data platform; it's easy to jump into all the functionality, the analytics, the flexibility, how we can offer it. But at the end of the day, somebody, a person, she's got to take advantage of this data; she's got to be able to take this data and use this information to make a critical business decision. And that doesn't happen unless we explore lots of different, and frankly, new, ways to get that predictive analytics UI and interface, beyond just the standard BI tools, in front of her at the right time. And so there's a lot of activity, I'll tease you with that, going on in this organization right now about how we can do that and deliver it for our customers. We're in a great position to be able to see exactly how this data is consumed and used, and to start with this core platform that we have and go out. Look, I know the plan wasn't to do this as a virtual BDC. But I really appreciate you tuning in. Really appreciate your support. I think if there's any silver lining to us maybe not being able to do this in person, it's the fact that the reach has actually gone significantly higher than what we would have been able to do in person in Boston. We're certainly looking forward to doing a Big Data Conference in the future. But if I could leave you with anything, know this: since that first release of Vertica, and our very first customers, we have been very consistent. We respect all the innovation around us, whether it's open source or not. We understand the market trends. We embrace those new ideas and technologies. And for us, true north, and the most important thing, is: what does our customer need to do? What problem are they trying to solve? And how do we use the advantages that we have without disrupting our customers? Knowing that you depend on us, we will deliver that unified analytics strategy and that performance at scale, not only today, but tomorrow and for years to come. We've added a lot of great features to Vertica. I think we've said no to a lot of things, frankly, that we just knew we wouldn't be the best company to deliver. When we say we're going to do things, we do them. Vertica 10 is a perfect example of so many of those things that we have heard loud and clear from you, our customers, and we have delivered. I am incredibly proud of this team across the board. I think the culture of Vertica, a customer-first culture, jumping in to help our customers win no matter what, is also something that sets us massively apart. I hear horror stories about support experiences with other organizations. And people always seem to be amazed at Team Vertica's willingness to jump in, or their aptitude for certain technical capabilities, or understanding the business. And I think sometimes we take that for granted. But that is the team that we have as Team Vertica. We are incredibly excited about Vertica 10. I think you're going to love the Virtual Big Data Conference this year. I encourage you to tune in. Maybe one other benefit is, I know some people were worried about not being able to see different sessions because they were going to overlap with each other. Well, now, even if you can't watch them live, you'll be able to take in those sessions on demand. Please enjoy the Vertica Big Data Conference here in 2020.
Please, you and your families and your co-workers, be safe during these times. I know we will get through it. And analytics is probably going to help with a lot of that; we already know it is helping in many different ways. So believe in the data, believe in data's ability to change the world for the better. And thank you for your time. And with that, I am delighted to now introduce Micro Focus CEO Stephen Murdoch to the Vertica Big Data Virtual Conference. Thank you, Stephen. >> Stephen: Hi, everyone, my name is Stephen Murdoch. I have the pleasure and privilege of being the Chief Executive Officer here at Micro Focus. Please let me add my welcome to the Big Data Conference, and also my thanks for your support as we've had to pivot to this being a virtual rather than a physical conference. It's amazing how quickly we all reset to a new normal. I certainly didn't expect to be addressing you from my study. Vertica is an incredibly important part of the Micro Focus family. It's key to our goal of enabling and helping customers become much more data-driven across all of their IT operations. Vertica 10 is a huge step forward, we believe. It allows for multi-cloud innovation and genuinely hybrid deployments, lets you begin to leverage machine learning properly in the enterprise, and also offers the opportunity to unify currently siloed lakes of information. We operate in a very noisy, very competitive market, and there are people in that market who can do some of those things. The reason we are so excited about Vertica is that we genuinely believe we are the best at doing all of those things. And that's why we've announced publicly, and are executing internally, incremental investment into Vertica. That investment is targeted at accelerating the roadmaps that already exist, and getting that innovation into your hands faster. The idea is that speed is key. It's not a question of if companies have to become data-driven organizations, it's a question of when. So that speed now is really important. And that's why we believe that the Big Data Conference gives a great opportunity for you to accelerate your own plans. You will have the opportunity to talk to some of our best architects, some of the best development brains that we have. But more importantly, you'll also get to hear from some of our phenomenal Vertica customers. You'll hear from Uber, from the Trade Desk, from Philips, and from AT&T, as well as many, many others. And just hearing how those customers are using the power of Vertica to accelerate their own businesses is, I think, the highlight. And I encourage you to use this opportunity to its fullest. Let me close by again saying thank you. We genuinely hope that you get as much from this virtual conference as you could have from a physical conference. We look forward to your engagement, and we look forward to hearing your feedback. With that, thank you very much. >> Joy: Thank you so much, Stephen, for joining us for the Vertica Big Data Conference. Your support and enthusiasm for Vertica is so clear, and it makes a big difference. Now, I'm delighted to introduce Amy Fowler, the VP of Strategy and Solutions for FlashBlade at Pure Storage, who is one of our BDC Platinum Sponsors, and one of our most valued partners. It was a proud moment for me when we announced Vertica in Eon Mode for Pure Storage FlashBlade, and we became the first analytics data warehouse that separates compute from storage for on-premise data centers. Thank you so much, Amy, for joining us. Let's get started.
>> Amy: Well, thank you, Joy, so much for having us. And thank you all for joining us today, virtually, as we all are. So, as we just heard from Colin Mahony, there are some really interesting trends happening right now in the big data analytics market: from the end of the Hadoop hype cycle, to the new cloud reality, and even the opportunity to help the many data science and machine learning projects move from labs to production. So let's talk about these trends in the context of infrastructure, and in particular, look at why a modern storage platform is relevant as organizations take on the challenges and opportunities associated with these trends. The fact is, the Hadoop hype cycle left a lot of data in HDFS data lakes, or reservoirs, or swamps, depending upon the level of the data hygiene, but without the ability to get the value that was promised from Hadoop as a platform rather than a distributed file store. And when we combine that data with the massive volume of data in Cloud Object Storage, we find ourselves with a lot of data and a lot of silos, but without a way to unify that data and find value in it. Now, when you look at the infrastructure data lakes are traditionally built on, it is often direct-attached storage, or DAS. The approach that Hadoop took when it entered the market was primarily bound by the limits of networking and storage technologies: one gig Ethernet and slower spinning disk. But today, those barriers do not exist. All-flash storage has fundamentally transformed how data is accessed, managed and leveraged. The need for local data storage for significant volumes of data has been largely mitigated by the performance increases afforded by all-flash. At the same time, organizations can achieve superior economies of scale with the segregation of compute and storage. Compute and storage don't always scale in lockstep. Would you want to add an engine to the train every time you add another boxcar? Probably not. From a Pure Storage perspective, FlashBlade is uniquely architected to allow customers to achieve better resource utilization for compute and storage, while at the same time reducing the complexity that has arisen from the siloed nature of the original big data solutions. The second, and equally important, recent trend we see is something I'll call cloud reality. The public clouds made a lot of promises, and some of those promises were delivered. But cloud economics, especially usage-based pricing and elastic scaling without the control that many companies need to manage the financial impact, are causing a lot of issues. In addition, the risk of vendor lock-in, from data egress charges to integrated software stacks that can't be moved or deployed on-premise, is causing a lot of organizations to back off the all-the-way cloud strategy and move toward hybrid deployments. Which is kind of funny, in a way, because it wasn't that long ago that there was a lot of talk about no more data centers. For example, one large retailer, I won't name them, but I'll admit they are my favorite: several years ago they told us they were completely done with on-prem storage infrastructure, because they were going 100% to the cloud. But they just deployed FlashBlade for their data pipelines, because they need predictable performance at scale, and the all-cloud TCO just didn't add up. Now, that being said, while there are certainly challenges with the public cloud, it has also brought some things to the table that we see most organizations wanting.
First of all, in a lot of cases, applications have been built to leverage object storage platforms like S3. So they need that object protocol, but they may also need it to be fast. "Fast object" may have been an oxymoron only a few years ago, and this is an area of the market where Pure and FlashBlade have really taken a leadership position. Second, regardless of where the data is physically stored, organizations want the best elements of a cloud experience. And for us, that means two main things. Number one is simplicity and ease of use. If you need a bunch of storage experts to run the system, that should be considered a bug. The other big one is the consumption model: the ability to pay for what you need, when you need it, and seamlessly grow your environment over time, totally nondestructively. This is actually pretty huge, and something that a lot of vendors try to solve for with finance programs. But no finance program can address the pain of a forklift upgrade when you need to move to next-gen hardware. To scale nondestructively over long periods of time, five to 10 years plus, crucial architectural decisions need to be made at the outset. Plus, you need the ability to pay as you use it. And we offer something for FlashBlade called Pure as a Service, which delivers exactly that. The third cloud characteristic that many organizations want is the option for hybrid, even if that is just a DR site in the cloud. In our case, that means supporting replication to S3 at AWS. And the final trend, which to me represents the biggest opportunity for all of us, is the need to help the many data science and machine learning projects move from labs to production. This means bringing all the machine learning functions and model training to the data, rather than moving samples or segments of data to separate platforms. As we all know, machine learning needs a ton of data for accuracy, and there is just too much data to retrieve from the cloud for every training job. At the same time, predictive analytics without accuracy is not going to deliver the business advantage that everyone is seeking. You can kind of visualize data analytics, as it is traditionally deployed, as being on a continuum, with the thing we've been doing the longest, data warehousing, on one end, and AI on the other end. But the way this manifests in most environments is a series of silos that get built up. So data is duplicated across all kinds of bespoke analytics and AI environments and infrastructure. This creates an expensive and complex environment. Historically, there was no other way to do it, because some level of performance is always table stakes, and each of these parts of the data pipeline has a different workload profile. A single platform to deliver the multi-dimensional performance that this diverse set of applications requires didn't exist three years ago. And that's why the application vendors pointed you towards bespoke things like the DAS environments that we talked about earlier. The fact that better options exist today is why we're seeing them move towards supporting this disaggregation of compute and storage. And when it comes to a platform that is a better option, one with a modern architecture that can address the diverse performance requirements of this continuum and allow organizations to bring a model to the data instead of creating separate silos, that's exactly what FlashBlade is built for: small files, large files, high throughput, low latency, and scale to petabytes in a single namespace.
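To make the fast-object point concrete, here's a minimal sketch showing that the same S3 client code an application runs against AWS can target an on-prem S3-compatible endpoint simply by changing the endpoint URL. The URL, bucket and key are hypothetical placeholders, and credentials are assumed to come from the usual environment configuration.

```python
# A minimal sketch of "fast object" portability: the same boto3 S3 code
# an application runs against AWS can point at an on-prem S3-compatible
# endpoint. The endpoint URL, bucket and key are hypothetical, and
# credentials are assumed to come from the environment.
import boto3

s3 = boto3.client('s3', endpoint_url='https://flashblade.example.com')

s3.put_object(Bucket='analytics', Key='pipeline/batch-001.parquet',
              Body=b'...parquet bytes...')
obj = s3.get_object(Bucket='analytics', Key='pipeline/batch-001.parquet')
print(obj['ContentLength'])
```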
And that single namespace, importantly, is what we're focused on delivering for our customers. At Pure, we talk about it in the context of the modern data experience, because at the end of the day, that's what it's really all about: the experience for your teams in your organization. And together, Pure Storage and Vertica have delivered that experience to a wide range of customers. From a SaaS analytics company, which uses Vertica on FlashBlade to authenticate the quality of digital media in real time, to a multinational car company, which uses Vertica on FlashBlade to make thousands of decisions per second for autonomous cars, to a healthcare organization, which uses Vertica on FlashBlade to enable healthcare providers to make real-time decisions that impact lives. And I'm sure you're all looking forward to hearing from John Yovanovich from AT&T, to hear how he's been doing this with Vertica and FlashBlade as well. He's coming up soon. We have been really excited to build this partnership with Vertica. We're proud to provide the only on-premise storage platform validated with Vertica Eon Mode, and to deliver this modern data experience to our customers together. Thank you all so much for joining us today. >> Joy: Amy, thank you so much for your time and your insights. Modern infrastructure is key to modern analytics, especially as organizations leverage next generation data center architectures and object storage for their on-premise data centers. Now, I'm delighted to introduce our last speaker in our Vertica Big Data Conference Keynote, John Yovanovich, Director of IT for AT&T. Vertica is so proud to serve AT&T, and especially proud of the harmonious impact we are having in partnership with Pure Storage. John, welcome to the Virtual Vertica BDC. >> John: Thank you, Joy. It's a pleasure to be here. And I'm excited to go through this presentation today, and in a unique fashion, because as I was thinking through how I wanted to present the partnership that we have formed together between Pure Storage, Vertica and AT&T, I wanted to emphasize how well we all work together and how these three components have really driven home my desire for a harmonious, to use your word, relationship. So, I'm going to move forward here. The theme of today's presentation is the Pure Vertica Symphony, live at AT&T. And if anybody is a Westworld fan, you can appreciate the sheet music on the right hand side. What I'm going to highlight here, in a musical fashion, is how we at AT&T leverage these technologies to save money, to deliver a more efficient platform, and to actually just make our customers happier overall. So as we look back, as early as just maybe a few years ago here at AT&T, I realized that we had many musicians to help the company. Or maybe you might want to call them data scientists, or data analysts. For the theme, we'll stay with musicians. None of them were singing or playing from the same hymn book or sheet music. And so what we had was many organizations chasing a similar dream, but not exactly the same dream. And the best way to describe that, and I think with a lot of people this might resonate in your organizations: how many organizations are chasing a customer 360 view in your company? Well, I can tell you that I have at least four in my company. And I'm sure there are many that I don't know of. That is our problem, because what we see is a repetitive sourcing of data. We see a repetitive copying of data.
And there's just so much money to be spent. This is where I asked Pure Storage and Vertica to help me solve that problem with their technologies. What I also noticed was that there was no coordination between these departments. In fact, if you look here, nobody really wants to play with finance. Sales, marketing and care, sure, they all copied each other's data. But they actually didn't communicate with each other as they were copying the data. So the data became replicated and out of sync. This is a challenge throughout, not just my company, but all companies across the world. And that is, the more we replicate the data, the more problems we have at chasing or conquering the goal of a single version of truth. In fact, I kid that at AT&T we have actually adopted the multiple-versions-of-truth theory, which is not where we want to be, but this is where we are. But we are conquering that with the synergies between Pure Storage and Vertica. This is what it leaves us with. And this is where we were challenged, in that each one of our siloed business units had their own storage, their own dedicated storage, and some of them had more money than others, so they bought more storage. Some of them anticipated storing more data than they really did. Others are running out of space, but can't add any more because their budgets aren't being replenished. So if you look at it from this side view here, we have a limited amount of compute, or fixed compute, dedicated to each one of these silos. And that's because of the wanting to own your own. And the other part is that you are limited or wasting space, depending on where you are in the organization. So the synergies aren't just about the data, but actually the compute and the storage. And I wanted to tackle that challenge as well. So I was tackling the data, I was tackling the storage, and I was tackling the compute, all at the same time. So my ask across the company was: can we all just please play together? And to do that, I knew that I wasn't going to tackle this by getting everybody in the same room and getting them to agree that we needed one account table, because they would argue about whose account table is the best account table. But I knew that if I brought the account tables together, they would soon see that they had so much redundancy that I could start retiring data sources. I also knew that if I brought all the compute together, they would all be happy. But I didn't want them to tackle each other. And in fact, that was one of the things that all the business units really enjoy: the silo of having their own compute, and more or less being able to control their own destiny. Well, Vertica's subclustering allows just that. And this is exactly what I was hoping for, and I'm glad they brought it through. And finally, how did I solve the problem of the single account table? Well, you don't need dedicated storage when you can separate compute and storage, as Vertica in Eon Mode does. And we store the data on FlashBlades, which you see on the left and right hand sides of our container, which I can describe in a moment. Okay, so what we have here is a container full of compute, with all the Vertica nodes sitting in the middle, and two loaders, we'll call them loader subclusters, sitting on the sides, which are dedicated to just putting data onto the FlashBlades, which are sitting on both ends of the container.
Now today, I have two dedicated storage, or common, dedicated might not be the right word, but two storage racks, one on the left, one on the right. And I treat them as separate storage racks. They could be one, but I created them separately for disaster recovery purposes, so work can continue in case one rack were to go down. That being said, I'm probably going to add a couple more here in the future, so I can have, say, a five to 10 petabyte storage setup, and I'll have my DR in another container, because the DR shouldn't be in the same container. Okay, so I'll DR outside of this container. So I got them all together, I leveraged subclustering, I leveraged the separation of compute and storage. I was able to convince many of my clients that they didn't need their own account table, that they were better off having one. I reduced latency, and I reduced our data quality issues, AKA ticketing, okay. I was able to expand, and I was able to leverage elasticity within this cluster. As you can see, there are racks and racks of compute. We set up what we'll call the fixed capacity that each of the business units needed. And then I'm able to ramp up and release the compute that's necessary for each one of my clients, based on their workloads, throughout the day. And so with the compute you see to the right, some of the instruments have already, more or less, dedicated themselves, while all the others are free for anybody to use. So in essence, what I have is a concert hall with a lot of seats available. So if I want to run a 10-chair symphony or an 80-chair symphony, I'm able to do that. And all the while, I can also do the same with my loader nodes. I can expand my loader nodes to actually have their own symphony, all to themselves, and not compete with any other workloads of the other clusters. What does that change for our organization? Well, it really changes the way our database administrators actually do their jobs. This has been a big transformation for them. They have actually become data conductors. Maybe you might even call them composers, which is interesting, because what I've asked them to do is morph into less technology and more workload analysis. And in doing so, we're able to write auto-detect scripts that watch the queues, watch the workloads, so that we can help ramp up and trim down the cluster and subclusters as necessary. It has been an exciting transformation for our DBAs, who I now need to classify as something maybe like DCAs. I don't know, I'll have to work with HR on that. But I think it's an exciting future for their careers. And if we bring it all together, our clusters start looking like this, where everything is moving harmoniously, we have lots of seats open for extra musicians, and we are able to emulate a cloud experience on-prem. And so, I want you to sit back and enjoy the Pure Vertica Symphony, live at AT&T. (soft music) >> Joy: Thank you so much, John, for an informative and very creative look at the benefits that AT&T is getting from its Pure Vertica Symphony. I do really like the idea of engaging HR to change the title to Data Conductor. That's fantastic. I've always believed that music brings people together. And now it's clear that analytics at AT&T is part of that musical advantage. So, now it's time for a short break. And we'll be back for our breakout sessions, beginning at 12 pm Eastern Daylight Time.
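Before the session details, a brief aside on the auto-detect scripts John describes: a hedged sketch of that pattern might look like the watcher below, which polls queue depth and ramps a subcluster up or down. The v_monitor.resource_queues table, the thresholds, and the scale_subcluster() helper are all assumptions for illustration; real scaling would shell out to admintools or a provisioning API.

```python
# A hedged sketch of an "auto-detect" workload watcher. The system
# table, thresholds and scale_subcluster() helper are assumptions;
# real scaling would call admintools or a provisioning API.
import time
import vertica_python

def scale_subcluster(name, delta):
    # Hypothetical hook into whatever provisioning tooling adds or
    # removes nodes for the named subcluster.
    print(f"scaling {name} by {delta} nodes")

with vertica_python.connect(host='vertica.example.com', port=5433,
                            user='dbadmin', password='...',
                            database='edw') as conn:
    cur = conn.cursor()
    while True:
        # Queue depth as a rough proxy for "the concert hall is full."
        cur.execute("SELECT COUNT(*) FROM v_monitor.resource_queues")
        queued = cur.fetchone()[0]
        if queued > 50:
            scale_subcluster('marketing_sc', +2)   # add seats
        elif queued == 0:
            scale_subcluster('marketing_sc', -2)   # trim to fixed capacity
        time.sleep(60)
```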
We have some really exciting sessions planned later today, and then again on Wednesday, as you can see. Now, because all of you are already logged in and listening to this keynote, you already know the steps to continue to participate in the sessions that are listed here and on the previous slide. In addition, everyone received an email yesterday and today, and you'll get another one tomorrow, outlining the simple steps to register, log in, and choose your sessions. If you have any questions, check out the emails or go to www.vertica.com/bdc2020 for the logistics information. There are a lot of choices, and that's always a good thing. Don't worry if you want to attend more than one session, or can't listen to the live sessions due to your timezone. All the sessions, including the Q&A sections, will be available on demand, and everyone will have access to the recordings, as well as even more pre-recorded sessions that we'll post to the BDC website. Now, I do want to leave you with two other important sites. First, our Vertica Academy. Vertica Academy is available to everyone, and there's a variety of very technical, self-paced, on-demand training, virtual instructor-led workshops, and Vertica Essentials Certification. And it's all free, because we believe that Vertica expertise helps everyone accelerate their Vertica projects and the advantage that those projects deliver. Now, if you have questions or want to engage with our Vertica engineering team, we're waiting for you on the Vertica forum. We'll answer any questions or discuss any ideas that you might have. Thank you again for joining the Vertica Big Data Conference keynote session. Enjoy the rest of the BDC, because there's a lot more to come.
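As an aside on the auto-detect scripts John's DBAs write: a control loop that watches a queue and resizes a subcluster can be sketched in a few lines of Python. The monitoring and resize hooks below are hypothetical placeholders, not Vertica or AT&T APIs; the point is only the shape of the watch-and-scale loop.

```python
import time

# Hypothetical thresholds: grow a subcluster when its request queue backs up,
# shrink it after the queue drains. None of these names are Vertica APIs.
SCALE_UP_DEPTH = 20
SCALE_DOWN_DEPTH = 2
MIN_NODES, MAX_NODES = 3, 16

def queue_depth(subcluster: str) -> int:
    """Stand-in for a query against whatever request-queue view you monitor."""
    raise NotImplementedError

def resize_subcluster(subcluster: str, nodes: int) -> None:
    """Stand-in for the admintools/REST call that actually adds or removes nodes."""
    raise NotImplementedError

def control_loop(subcluster: str, nodes: int) -> None:
    """Watch the queue and ramp the subcluster up or down, once a minute."""
    while True:
        depth = queue_depth(subcluster)
        if depth > SCALE_UP_DEPTH and nodes < MAX_NODES:
            nodes += 1
            resize_subcluster(subcluster, nodes)
        elif depth < SCALE_DOWN_DEPTH and nodes > MIN_NODES:
            nodes -= 1
            resize_subcluster(subcluster, nodes)
        time.sleep(60)
```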
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Stephen | PERSON | 0.99+ |
Amy Fowler | PERSON | 0.99+ |
Mike | PERSON | 0.99+ |
Amy | PERSON | 0.99+ |
Colin Mahony | PERSON | 0.99+ |
AT&T | ORGANIZATION | 0.99+ |
Boston | LOCATION | 0.99+ |
John Yovanovich | PERSON | 0.99+ |
Vertica | ORGANIZATION | 0.99+ |
Joy King | PERSON | 0.99+ |
Mike Stonebraker | PERSON | 0.99+ |
John | PERSON | 0.99+ |
May 2018 | DATE | 0.99+ |
100% | QUANTITY | 0.99+ |
Wednesday | DATE | 0.99+ |
Colin | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Vertica Academy | ORGANIZATION | 0.99+ |
five | QUANTITY | 0.99+ |
Joy | PERSON | 0.99+ |
2020 | DATE | 0.99+ |
two | QUANTITY | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
Stephen Murdoch | PERSON | 0.99+ |
Vertica 10 | TITLE | 0.99+ |
Pure Storage | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Philips | ORGANIZATION | 0.99+ |
tomorrow | DATE | 0.99+ |
September 2019 | DATE | 0.99+ |
Python | TITLE | 0.99+ |
www.vertica.com/bdc2020 | OTHER | 0.99+ |
One gig | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Second | QUANTITY | 0.99+ |
First | QUANTITY | 0.99+ |
15 minutes | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
Ben Di Qual, Microsoft | Commvault GO 2019
>> Live from Denver, Colorado, it's theCUBE, covering Commvault GO 2019, brought to you by Commvault. >> Hey, welcome back to theCUBE. I'm Lisa Martin with Stu Miniman, and we are coming to you live from Commvault GO '19. Pleased to welcome to theCUBE a gent from Microsoft Azure: we've got Ben Di Qual, principal program manager. Ben, welcome. >> Thank you. Thanks for having me on. >> Thanks for coming on. So, Microsoft and Commvault, what's going on with the partnership? >> They have a great story in the storage and data management space. We've been working with Commvault for 20 years now at Microsoft, and they've been working with us on Azure for as long as I can remember, having been on the Azure business for about seven years now. So that's a long time in cloud terms, like dog years. And they've been doing a huge amount there around getting customer data into the cloud, reducing costs, getting more resiliency, and then also letting customers do more with the data. So they're a pretty good partner to have, and they make it much easier for their customers to go and leverage cloud. >> So Ben, you know, in my career I've had lots of interactions with the Microsoft storage team. Things have changed a little bit now that we're talking about Azure, compared to when it was the interaction with the operating system or the business suite. So maybe bring us up to date, for those people that might not have followed, on where the storage positioning inside of Microsoft is now, when we talk about Azure and your title. >> Yeah, just briefly: we work very heavily with our on-premises brethren; the OS team is actually inside of the Azure engineering org, which is kind of funny, but we do a load of things there. If we start looking, firstly, on that hybrid side, we have things like Azure Files. It's a highly resilient, as-a-service SMB and NFS file share, up to a hundred terabytes, and that interacts directly with Windows Server to give you Azure File Sync. So there are sort of synergies there as well. What I'm doing personally, my team, we work on scale storage. The big thing we have in there is our Blob Storage technology, which really is the underpinning technology for pretty much all storage in Azure, including our SaaS offerings, which are hosted on Azure too. So Disk is on Blob Storage, Files is on Blob Storage. You look at Xbox Live, all these kinds of things, it's a customer to us. So we build that out, and we're doing work there, and that's really, really interesting in how we do it. And that's not a story of going, we're going to buy some compute, we're going to buy some storage, we're going to build it out, we're going to run Windows, or Hyper-V, or maybe VMware with Windows running on the VMware, whatever else. This is more a story about, we're going to provide you storage as a service, where you get a minimum of like 11 nines of durability, and be able to have that scale to petabytes of capacity in one logical namespace, and give you multiple gigabytes, double-digit gigabytes, of throughput to that storage. >> And now we're even taking that to multiple protocols. It's been REST API-centric today, with storage APIs you can go and use, but we give you that consistency of the actual backend storage, and make the objects and the data available via more than just one protocol. You can go and access that via the HDFS API. We talk about data lakes all the time; for us, our Blob Storage is a data lake.
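As a concrete illustration of the multi-protocol story Ben is describing, here is a minimal sketch using the Python azure-storage-blob SDK; the connection string, container, and paths are placeholders, and the HDFS-style URI in the comment assumes an account with hierarchical namespace (ADLS Gen2) enabled.

```python
from azure.storage.blob import BlobServiceClient

# Placeholders: a real connection string and an existing container on an
# account with hierarchical namespace (ADLS Gen2) enabled.
CONN_STR = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;"

service = BlobServiceClient.from_connection_string(CONN_STR)
container = service.get_container_client("telemetry")

# Write an object through the Blob (REST) protocol...
container.upload_blob(name="events/2019/10/run.json", data=b'{"ok": true}')

# ...and the very same object is then addressable through the HDFS-compatible
# endpoint that analytics engines use, for example from Spark:
#   abfss://telemetry@<account>.dfs.core.windows.net/events/2019/10/run.json
```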
We turn on hierarchical namespace, and you can go and access that via other protocols, like, as I mentioned, HDFS as well. So that is a big story about what we want to do. We want to make that data available at crazy scale, have no limits in the end on capacity or throughput or performance, and over any protocol. That's kind of our flag on the hill, about where we want to get to. >> And we've been talking to the Commvault team about some of the solutions that they are putting in the cloud, the new offering Metallic that came out. They said if my customer has Azure storage, or storage from that other cloud provider, you could just go ahead and use that. So maybe tell us how familiar you are, and how much involvement you've had, with Metallic. >> We work pretty tightly with the product team over at Commvault around this, and my team as well, around how do we design it and how do we make it work the best, and we're going to continue working to optimize as they get beyond initial launch, to go, wow, we've got data sets we can analyze, we know how we want to tune it. Now, really, we love the solution, particularly because, you know, the default, if you don't select the storage type where you want to go, is that you will run on Azure. So really, kudos to the relationship there. They chose us as the first place they'll go to, but they've also built in choice for customers. So some customers may want to take it to another cloud; that's fine, it's reasonable. I mean, we totally understand it's going to be a multicloud world, and that's a reality for any large company. Our goal is to make sure we're growing faster than the competitors, not to knock out the competitors altogether, because that just won't happen. So they've got that ability to go, hey, we'll use Azure as default, because they feel we're offering the best support and the best solution there. But then if that same customer wants to turn around and use a competitor of ours, that's fine as well. And I see people talking about that today, where they may want to mitigate risk and say, I'm doing Office 365, and taking an O365 backup. It's cool: you use Metallic, it'll take it maybe to a different region in Asia, and they're backing up. They're still going, well, I'm still all in on Microsoft. They may want to take it to another cloud, or even take it back to on-premises. So that does happen too, because just in case of that moment, we can get that data back in a different location. >> So, Metallic, talking about that: this new venture, right? It's a Commvault venture, and we saw that the other day and thought, that's interesting. So we dug into it a little bit yesterday, and it's like a startup operating within a 20-year-old company, which is very interesting, not just from an incumbent customer perspective, but an incumbent partner perspective. How have you seen, over the last few years, and particularly in the last nine months with big leadership and go-to-market changes for Commvault, how has the partnership with Microsoft evolved as a result of those changes? >> Um, it's always been interesting. I guess when you start looking at a venture, everything changes a little bit; priorities may change, just to be fair. But we've had that tight relationship for a long time, and at a relationship level and an exec leadership level, nothing's really changed.
But in the way they're building this platform, we sit down, out of my team in the Azure engineering group, and we'll do things like ideations: here's where we see gaps in the market, here's what we believe could happen. Back in July, we had Inspire, which is our partner conference in Las Vegas, and we sat down with their team, our team, in a room, talking about these kinds of things. And this was, I think, about two months after they may have started the initial development of Metallic, from what I understand, but we were talking about exactly what they're doing with Metallic, offered as a service on Azure, as, hey, how about we do this? So we think it's really cool. It opens up a new market for Commvault, I think, too. I mean, they're so strong in the enterprise, but they don't do much in smaller businesses, because the full-featured product also has inherent complexity around it. So by making Metallic a click, click, next, done thing, they're really opening, I think, new markets for themselves, and also for us as a partner. >> I was going to, you know, kind of click on that, because they developed this very quickly. This is something that, as Stu and I heard here yesterday, Metallic was kind of conceived, designed, and built in about six months. So in terms of acceleration, that's kind of a new area for Commvault. >> Yeah, and I think they're really embracing the idea of, let's release our code in production for products which are getting to the, hey, the product is at the viable stage now, not minimum viable, viable. Let's release in production, let's find out how customers are using it, and then let's keep optimizing and doing that constant iteration, taking that DevOps approach of, let's get it out there, let's get it launched, and then let's do these small batches of changes based on customer need, based on telemetry we can actually get in. We can't get the telemetry without having customers. So that's how it's going to keep working. So I think this initial product we see today is just going to keep evolving and improving as they get more data, as they get more information, more feedback, which is exactly what we want to see. >> Well, welcome to the cloud era, something you've been living in for a number of years. Ben, I'd love to hear: you've been meeting with customers, they've been asking you questions. Give us a sense of some of the things that are top of mind for customers. What kinds of things do they come to Microsoft with, and how does that all fit together? >> There are many different conversations we get into, and we'll go from talking about, you know, Python, machine learning, or how AI fits in PowerPoint. >> Yeah. >> To things like, you know, when are we going to do incremental snapshots for managed disks, getting into the weeds on very infrastructure-centric stuff. We're seeing a range of conversations there. The big thing I keep seeing people call out, and make assumptions about, is that they're not going to be relevant because of cloud: I don't know cloud yet, I don't know this whole Kubernetes thing, containers, I don't really understand that as well as I think I need to. And then AI, oh my gosh, what do we even do there? Because everyone's throwing the words and terms around. But to be honest, I think what's still really evident is that cloud is still a tiny fraction of enterprise workloads.
So let's be honest, it's growing at a huge rate because it is that small fraction. So again, there's plenty of time for people to learn, but they shouldn't go and try to learn it all at once. It's not like you go and learn everything in the technology stack, from networking, to development, to database management, to running a data center with power and cooling. You learn the things that are applicable to what you're trying to do, and the same thing goes for cloud. With any of these technologies, go and look at what you need to build for your business, take it to that step, and then go and find out the details at the level you want to know. And as someone who's been on Azure for, like I said, almost seven years, which is crazy long, it was literally like being in a startup inside of Microsoft when I joined, and I wasn't sure if I wanted to join a licensing company. It's been very evident to me: I will not say I'm an Azure expert, and I've been seven years on the platform. There are too many things for me to be an expert in everything, and I think people just have to realize that. Anyone saying that, it's bravado, nothing else. >> The goal is, with Microsoft as a platform provider, hopefully you've got the software and the solution to make a lot of this easier for the customer. So hopefully they shouldn't need to become a Kubernetes expert, because it's baked into your platform. They shouldn't have to worry about some of these offerings, because it's SaaS. >> Most customers are there. There are some things you need to learn in going from Exchange to O365, absolutely, some nuances and things like that, but once you get over that initial hurdle, it should be a little easier. And I think, going back to first principles: what is the highest level of abstraction that's viable for your business, or that application, or this workload? That has to always be asked of everything. And if cloud's not even viable, run on premises; you don't need to apologize for not running in cloud. If on-premises is what's working for you, because of security, because of application architecture, run it that way. Don't feel the need and the pressure to have to push it the other way. I think too many people get caught up in this shiny stuff up here, which is what, you know, 1% of people are doing, versus the other 99%, which is still happening in a lot of the areas we work in and have challenges in today. >> That's a great point that you bring up, because there are all the buzzwords, right? AI, machine learning, cloud; you've got to be cloud-ready, you've got to be data-driven. To your point, going: I just need to make sure that what we have set up for our business is going to allow our business, one, to remain relevant, but two, to also be able to harness the power of the data that we have to extract new opportunities, new insights, and not get caught up with, shoot, should we be using automation? Should we be using AI? Everybody's talking about it. I like that you brought up, and I find it very refreshing, that you said, hey, I'm not an Azure expert, and you've been there seven, seven dog years, like you said. And I think what customers probably gain confidence from is hearing the folks like you, that they look to for that guidance and that leadership, saying, no, I don't know everything there is to know.
But it gives them the confidence that, as they're trusting you with that data, they're also trusting you to help them make the right decisions for their business. >> Yeah, and we've got to do that. I mean, as a tech guy, I've loved seeing the changes. When I joined Microsoft, I wasn't exactly going, I really want to join this company; I was going to go join a startup instead. And I got asked at one stage in an interview, why do you want to join Microsoft? We see you've never applied before. I'd never wanted to; a friend told me to come in. And it's just been amazing to see those changes, and I'm pretty proud of that. So when we talk about those things we're doing, I mean, I think there is no shame in going, I'm just going to lift and shift machines, because cloud's about flexibility. If you're doing it just on cost, you're probably doing it for the wrong reason. It's about that flexibility to go and do something, then change within months, and slowly make steps to make things better and better, as you find a need, as you find the ability, whatever it may be. And one of the big things that we focus on right now with customers is a product called Azure Advisor. It'll go and tell people, one, when they don't build things in a resilient manner: hey, do you know this is not HA, because of this, and you can do this. It will also tell you about security vulnerabilities: maybe you should add a gateway here for security, maybe you should do this, or this is not patched. But the big thing is, it also goes and tells you, hey, you're overspending. You don't need this much. You provisioned like a Ferrari, and you just need a Prius; go and run a Prius, because it's going to do what you need. And that's part of being a trusted advisor, getting that understanding. And it's counterintuitive, but it's now coming out of our sales organization too, which is great. Seeing these guys go from contracts and licenses and basically, you know, once every three years I may call the customer, hey, how about a renewal, to now being focused on the customer's actual success, focused on their growth in Azure as a platform, our services growth, like utilization, not just sales, has been a huge change. It scared some people away, but it's brought a lot more people in, and that sort of counterintuitive, spend-less-money thing actually leads in the long term to people using more. >> Absolutely. That's definitely not the shrink-wrap software company of Microsoft that I remember from the '90s. >> Yeah, and it might be similar to, you know, just as the Commvault of 2019 is not the same Commvault that many of us knew from 15 years ago. A good mutual friend of ours, Simon, and myself, before I took this job, he and I sat down over a beer discussing the merits of whether or not I'd go do things like that. It's the same with Commvault there. They're changing a great deal with, you know, what they're putting in the cloud, what they're doing with the data, what they're trying to achieve with things like data management across on-premises and cloud, with microservices applications and stuff, going, hey, this won't work like this anymore now that you're doing it on premises and with containers. It's pretty good to see. I'm interested to see how they take that even further with their current audience, which is predominantly, you know, the IT pro, the data center admin, the storage manager.
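For readers who want to pull the Azure Advisor recommendations Ben mentions programmatically, a hedged sketch with the azure-mgmt-advisor SDK follows; the subscription ID is a placeholder, and the exact model field names can differ by SDK version.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.advisor import AdvisorManagementClient

# Placeholder subscription ID; credentials are resolved from the environment
# (az login, environment variables, or a managed identity).
client = AdvisorManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Walk all Advisor recommendations and keep the cost ones: the "provisioned a
# Ferrari, only need a Prius" category Ben describes.
for rec in client.recommendations.list():
    if rec.category == "Cost":
        print(rec.impact, rec.short_description.problem)
```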
It's funny, when you talked about the choice that customers have, and those saying, hey, we shouldn't be following the trends just because they're the trends: we actually interviewed, a couple of hours ago, one of the customers here, an all on-prem healthcare company, and he's like, I want to make a sticker that says "no cloud and proud." And that's not what we normally hear; we always talk about cloud. But for a company to sit down and look at what's best for their business, whether it's, you know, FedRAMP certification challenges, or HIPAA, or GDPR, or other compelling requirements to keep it on-prem, it was just refreshing to hear this customer say it. >> Yeah, I mean, it's just appropriate for them. You do what's right for you. Yeah, there's no shame in any of it. I mean, you definitely don't get fans by shaming people about not doing something right. And I mean, I'm personally very happy to see, you know, the sort of hype around things like blockchain die down a little bit. It's a slow database, and we should use it for the specific case of that shared ledger, you know, things like that, where people went, I have to know blockchain, now I have to know IoT. It's like, yeah, that hype gets people there, but it also causes a lot of anxiety, and it's good to see someone actually not be ashamed of it. And I agree: when they do take a step and use cloud, because the need is in the business already, they're probably going to do it appropriately, because they have a reason, not just because we think this would be cool, right? >> Well, and how much inherent complexity does that bring in if somebody is really feeling pressured to follow those trends? And maybe that's when you end up with this hodgepodge of technologies that don't work well together, and you're spending way more. And because business IT folks are consumers, you know, consumers in their personal lives, they expect things to be accessible, visible, but also cost-efficient, because they have so much choice. >> Yeah, and choice is hard. Just take a conversation I was having recently, for example; we'll take storage, because of where we are, right? It's like, I'm running something on Azure, I'm using SUSE, I want an NFS mount point, which is available to me. Great, perfect; what do I use? It's like, well, you can use any one of these seven options, but what's the right choice? And that's the thing about being a platform company: we give you a lot of choices, but it's still up to us, and up to our partners, to really help the customers make the most appropriate choice. And I push back really hard on the term "best practices"; I hate it, because again, it's making the assumption that this is the best thing to do. It's not. It's always about, you know, what are the patterns that have worked for other people, what are the anti-patterns, and what's the appropriate path for me to take? And that's actually how we're building our docs now too. We keep focusing on our Azure technology, and one of the biggest things we've done is how we manage our documentation. It's all open-sourced, it's all in markdown on GitHub.
So you can go in and read a document from someone like myself, who's doing product management, going, this is how to use this product, and you can actually say, this bit's wrong, this bit needs to be like this. And you can go in yourself, even now, make a change, and we can go, oh yeah, take that commit in, and do all this kind of stuff in that way. So we're constantly improving those documents and getting real-time feedback from customers who are using them, not just ourselves in an echo chamber. >> So you get this great insight and visibility that you never had before. Well, Ben, thank you for joining Stu and me on theCUBE this afternoon. We're excited to hear what's coming up next for Azure, and we appreciate your time. >> Thank you. >> For Stu Miniman, I'm Lisa Martin. You're watching theCUBE from Commvault GO '19.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Microsoft | ORGANIZATION | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Ben Di Qual | PERSON | 0.99+ |
July | DATE | 0.99+ |
Las Vegas | LOCATION | 0.99+ |
Asia | LOCATION | 0.99+ |
seven years | QUANTITY | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
Ben | PERSON | 0.99+ |
seven | QUANTITY | 0.99+ |
20 years | QUANTITY | 0.99+ |
Commvault | ORGANIZATION | 0.99+ |
99% | QUANTITY | 0.99+ |
PowerPoint | TITLE | 0.99+ |
1% | QUANTITY | 0.99+ |
seven options | QUANTITY | 0.99+ |
2019 | DATE | 0.99+ |
Python | TITLE | 0.99+ |
one | QUANTITY | 0.99+ |
Simon | PERSON | 0.99+ |
today | DATE | 0.99+ |
Ferrari | ORGANIZATION | 0.99+ |
Denver, Colorado | LOCATION | 0.99+ |
GDPR | TITLE | 0.98+ |
yesterday | DATE | 0.98+ |
Today | DATE | 0.98+ |
about seven years | QUANTITY | 0.98+ |
90s | DATE | 0.98+ |
15 years ago | DATE | 0.98+ |
HIPAA | TITLE | 0.98+ |
Azure | TITLE | 0.98+ |
a hundred terabytes | QUANTITY | 0.97+ |
about six months | QUANTITY | 0.96+ |
one protocol | QUANTITY | 0.96+ |
SAS | ORGANIZATION | 0.96+ |
seven dog years | QUANTITY | 0.95+ |
Azure | ORGANIZATION | 0.93+ |
one logical namespace | QUANTITY | 0.92+ |
20 year old | QUANTITY | 0.91+ |
six | QUANTITY | 0.9+ |
this afternoon | DATE | 0.89+ |
one stage | QUANTITY | 0.86+ |
last nine months | DATE | 0.85+ |
almost seven years | QUANTITY | 0.85+ |
three years | QUANTITY | 0.84+ |
FedRAMP | ORGANIZATION | 0.84+ |
couple of hours ago | DATE | 0.83+ |
windows | TITLE | 0.82+ |
firstly | QUANTITY | 0.81+ |
more than | QUANTITY | 0.81+ |
Xbox live | COMMERCIAL_ITEM | 0.81+ |
first place | QUANTITY | 0.81+ |
last few years | DATE | 0.8+ |
about two months | QUANTITY | 0.79+ |
SUSE | ORGANIZATION | 0.77+ |
five | QUANTITY | 0.77+ |
Morgan McLean, Google Cloud Platform & Ben Sigelman, LightStep | KubeCon + CloudNativeCon EU 2019
>> Live from Barcelona, Spain, it's theCUBE, covering KubeCon, CloudNativeCon, Europe 2019. Brought to you by Red Hat, the Cloud Native Computing Foundation, and Ecosystem Partners. >> Welcome back. This is theCUBE's coverage of KubeCon, CloudNativeCon 2019. I'm Stu Miniman; my co-host for two days of wall-to-wall coverage is Corey Quinn. Happy to welcome back to the program, first, Ben Sigelman, who is the co-founder and CEO of LightStep. And welcome to the program, for a first time, Morgan McLean, who's a product manager at Google Cloud Platform. Gentlemen, thanks so much for joining us. >> Thanks for having us. >> Yeah. >> All right, so this was a last-minute add for us, because you guys had some interesting news in the keynote. I think the feedback everybody's heard is there's too many projects and everything's overlapping, and how do I make a decision. But the interesting piece is OpenCensus, which Morgan was doing, and OpenTracing, which Ben and LightStep were doing, are now moving together as OpenTelemetry, if I got it right. >> Yup. >> So, is it just everybody holding hands and singing Kumbaya around the Kubernetes campfire, or is there something more to this? >> Well, I mean, it started when the CNCF locked us in a room and told us there were too many projects. (Stu and Ben laughing) Really wouldn't let us leave. No, to be fair, they did actually take us to a room and really start the ball rolling, but conversations have picked up over the last few months, and personally I'm just really excited that it's gone so well. If you had told me six or nine months ago that this would happen, I would've been, given just the way the projects were going, both were growing very quickly, I would've been a little skeptical. But seriously, this merger's gone beyond my wildest dreams. It's awesome, both to unite the communities, and it's awesome to unite the projects together. >> What has the response been from the communities on this merger? >> Very positive. >> Yeah. >> Very positive. I mean, OpenTracing and OpenCensus are both projects with healthy user bases that are growing quickly and all that, but the reason people adopt them is to future-proof their own software, because they want to adopt something that's going to be here to stay. And by having these two things out in the world that are both successful and were overlapping in terms of their goals, I think the presence of two projects was actually really problematic for people. So, the fact that they're merging is a net positive, absolutely for the end user community, and also for the vendor community; it's almost exactly the same parallel thought process. When we met, the CNCF did broker an in-person meeting where they gave us some space, and we all got together, and, I don't know how many people were there, like 20 or 30 people in that room. >> They did let us leave the room though, yesterday, yeah, that was nice. >> They did let us leave the room, that's true. We were not locked in there, (Morgan laughing) but they asked us in the beginning, essentially they asked everyone, to state what their goals were. And almost all of us really had the same goal, which is just to try and make it easy for end users to adopt a telemetry project that they can stick with for the long haul. And so when you think of it in that respect, the merger seems completely obvious. It is true that it doesn't happen very often, and we could speculate about why that is.
But I think in this case it was enabled by the fact that we had pretty good social relationships with the OpenCensus people. I think Twitter tends to amplify negativity in the world in general, as I'm sure people know; not a controversial statement. >> News alert: the negatives get amplified, it's something in the algorithm, I think. >> Yeah, yeah. >> Maybe they should fix that. >> Yeah, yeah, (laughs) exactly. And it was funny, there was a lot of perceived animosity between OpenTracing and OpenCensus a year ago, nine months ago, but when you actually talked to the principals in the projects, and even just the general-purpose developers who are doing a huge amount of work for both projects, that wasn't a sentiment that was widely held or widely felt, I think. So, it has been a very kind of happy, it's a huge relief, frankly; this whole thing has been a huge relief for all of us, I think. >> Yeah, it feels like the general ask has always been for tracing that doesn't suck. And that tends to be a bit of a tall order. The way that they seem to have responded to it is a credit to the maturity of the community. And I think it also speaks to a growing realization that no one wants a monoculture of just one option, any color you want so long as it's black. (Ben laughing) Versus, there's 500 different things you can pick that all stand in that same spot, and at that point analysis paralysis kicks in. So this feels like it's a net positive for absolutely everyone involved. >> Definitely. Yeah, one of the anecdotes that Ben and I have shared throughout a lot of these interviews is there were a lot of projects that wanted to include distributed tracing in them. So various web frameworks, and I think, was it Hadoop or HBase that was-- >> HBase and HDFS were jointly deciding what to do about instrumentation. >> Yeah, and so they would publish an issue on GitHub, and someone from OpenTracing would respond saying, hey, OpenTracing does this. And they'd be like, oh, that's interesting, we can go build an implementation and file an issue, and then someone from OpenCensus would respond and say, no wait, you should use OpenCensus. And with these being very similar yet incompatible APIs, these groups like HBase would sit back and be like, this isn't mature enough, I don't want to deal with this, I've got more important things to focus on right now. And rather than even picking one and ignoring the other, they just ignored tracing, right? With things moving to microservices, with Kubernetes being so popular, I mean, just look at this conference, distributed tracing is no longer this kind of nice-to-have for when you're a big company; you need it to understand how your app works and understand the cause of an outage, the cause of a problem. And when you had organizations like this looking at tracing instrumentation and saying, this is a bit of a joke with two competing projects, no one was being served well. >> All right, so you talked about there being incompatible APIs; how do we get from where we were to where we're going? >> So I can talk about that a little bit. The APIs are conceptually incredibly similar. And part of the criteria for any new language for OpenTelemetry is that we are able to build a software bridge to both OpenTracing and OpenCensus that will translate existing instrumentation alongside OpenTelemetry instrumentation, and emit the correct data at the end. And we've built that out in Java already, and have started working on a few other languages.
It's not a tremendously difficult thing to do if that's your goal. I've worked on this stuff, I started working on Dapper in 2004, so it's been 15 years that I've been working in this space, and I have a lot of regrets about what we did with OpenTracing. And I had this unbelievably tempting thing, to start greenfield: let's do it right this time. And I'm suppressing every last impulse to do that. The only goal for this project, technically, is backwards compatibility. >> Yeah. >> 100% backwards compatibility. There's the famous XKCD comic where you have 14 standards, and someone says, we need to create a new standard that will unify across all 14 standards, and now you have 15 standards. So, we don't want to follow that pattern. And by having the leadership from OpenTracing and OpenCensus involved wholesale in this new effort, as well as having these compatibility bridges, we can avoid the fate of IPv6, of Python 3, and things like that, where the new thing is very appealing, but it's so far from the old thing that you literally can't get there incrementally. So our entire design constraint is: make sure that backwards compatibility works, get to one project, and then we can think about the grand unifying theory of observability-- >> Ben, you are ruining the best thing about standards, which is that there are so many of them to choose from. (everyone laughing) >> There's still plenty more growing in other areas, (laughs) just in this particular space it's smaller. >> One could argue that your approach is nonstandard in its own right. (Ben laughing) And in my own experiments with distributed tracing, it seems like step one is, first you have to go back and instrument everything you've built. And step two, hey, come back here, because that's a lot of work. The idea of an organization going back and reinstrumenting everything they've already instrumented the first time... >> It's unlikely. >> Unless they build things very modularly and very portably, to do exactly that is a bit of a heavy lift. >> I agree, yeah, yeah. >> So going forward, are people who have deployed one or the other of your projects going to have to go back and do a reinstrumentation, or will they unify and continue to work as they are? >> So, I'd put a number on it, except, I don't know, I would be making up the statistic, so I shouldn't. But let's say a vast majority, I'm thinking like 95, 98%, of instrumentation is actually embedded in frameworks and libraries that people depend on. So you need to get Dropwizard, and Spring, and Django, and Flask, and Kafka, things like that, instrumented. For the application code, the instrumentation burden is a bit lower. We announced something called SpecialAgent at LightStep last week, separate from all of this. It's kind of a funny combination: a typical APM agent will interpose on individual function calls, which is a very complicated and heavyweight thing. This doesn't do any of that. Instead, it basically surveys what you have in your process, looks for OpenTracing, and in the future OpenTelemetry, instrumentation that matches that, and then installs it for you. So you don't have to do any manual work, basically gluing tab A into slot B or whatever, you don't have to do any of that stuff, which is what most OpenTracing instrumentation actually looks like these days. And you can get off the ground without doing any code modifications. So, I think that direction, which is totally portable and vendor-neutral as well, as a layer on top of telemetry, makes a ton of sense.
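To give a sense of what the backwards-compatibility bridge looks like in practice, here is a minimal Python sketch using the OpenTracing shim that ships alongside OpenTelemetry. The module path and function reflect the opentelemetry-python layout, which has shifted across releases, so treat the exact names as assumptions to verify against your version.

```python
# pip install opentelemetry-sdk opentelemetry-opentracing-shim
# (package and module names have moved between releases; verify for your version)
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.shim.opentracing_shim import create_tracer

provider = TracerProvider()

# An OpenTracing-compatible tracer backed by OpenTelemetry: spans created by
# legacy OpenTracing instrumentation flow into the same pipeline as new
# OpenTelemetry spans, which is the bridge being described above.
opentracing_tracer = create_tracer(provider)

with opentracing_tracer.start_active_span("legacy-operation") as scope:
    scope.span.set_tag("component", "existing-opentracing-code")
```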
There are also data translation efforts that are part of OpenCensus that are being ported into OpenTelemetry, which also serve to repurpose existing sources of correlated data. So, all of these things are ways to take existing software and get it into the new world without requiring any code changes or redeploys. >> The long-term goal of this has always been that, because web framework and client library providers will go and build the instrumentation into those, when you're writing your own service that you're deploying in Kubernetes or somewhere else, by linking one of the OpenTelemetry implementations you get all of that tracing and context propagation, everything, out of the box. You, as an individual developer, are only using the APIs to define custom metrics, custom spans, things that are specific to your business. >> So Ben, you didn't name LightStep the same as your project. But that being said, a major piece of your business is going through a change here; what does this mean for LightStep? >> That's actually not the way I see it, for what it's worth. LightStep as a product, since you're giving me an opportunity to talk about it, (laughs) foolish move on your part. No, I'm just kidding. But LightStep as a product is totally omnivorous; we don't really care where the data comes from. And translating any source of data that has a correlation ID and a timestamp is a pretty trivial exercise for us. So we do support OpenTracing; we also support OpenCensus, for what it's worth. We'll support OpenTelemetry; we support a bunch of weird in-house things people have already built. We don't care about that at all. The reason that we're pursuing OpenTelemetry is twofold. One is that we do want to see high quality data coming out of projects. We said it at the keynote this morning: observability literally cannot be better than your telemetry. If your telemetry sucks, your observability will also suck. It's just definitionally true, if you go back to the definition of observability from the '60s. And so we want high quality telemetry so our product can be awesome. Also, just as an individual, I'm a nerd about this stuff and I just like it. I mean, a lot of my motivation for working on this is that I personally find it gratifying. It's not really a commercial thing, I just like it. >> Do you find that, as you start talking about this more and more with companies that are becoming cloud-native rapidly, either through digital transformation or from springing fully formed from the forehead of some god, however these born-in-the-cloud companies tend to be, that they intuitively are starting to grasp the value of tracing? Or does this wind up being a much heavier lift as you start showing them the golden path, as it were? >> It's definitely grown, like I-- >> Well, I think the value of tracing, you see that after you see the negative value of a really catastrophic outage. >> Yes. >> I mean, I was just talking to a bank, I won't name the bank, but a bank at this conference, and they were talking about their own adoption of tracing, which was pretty slow, until they had a really bad outage where they couldn't transact for an hour, and they didn't know which of the 200 services was responsible for the issue. And that really put some muscle behind their tracing initiative. So, typically it's inspired by an incident like that, and then it's a bit reactive. Sometimes it's not, but either way you end up in that place eventually.
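The point about application developers only writing business-specific spans looks roughly like this with the OpenTelemetry Python API and SDK; the console exporter is just for illustration, and a real deployment would export to a tracing backend instead.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up the SDK once at startup; framework auto-instrumentation rides on this.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")

# The only tracing code an application developer writes: business-specific spans.
with tracer.start_as_current_span("apply-discount") as span:
    span.set_attribute("discount.code", "SPRING19")
```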
>> I'm a strong proponent of distributed tracing, and I feel very seen by your last answer. (Ben laughing) >> But it's definitely made a big impact. If you came to conferences like this two years ago, you'd have Adrian, or Yuri, or someone doing a talk on distributed tracing. And they would always start by asking the 100 to 200 person audience, who here knows what distributed tracing is? And like five people would raise their hand, and everyone else would be like, no, that's why I'm here at the talk, I want to find out about it. And you go to ones now, or even last year, and now they have 400 people at the talk, and you ask, who knows what distributed tracing is? And last year over half the people would raise their hand; now it's going to be even higher. And I think, beyond even anecdotes, clearly businesses are finding the value, because they're implementing it. And you can see that through the number of companies that have an interest in OpenTracing, OpenTelemetry, OpenCensus. You can see that in the growth of startups in this space, LightStep and others. >> The other thing I like about OpenTelemetry as a name, and it's a bit of a mouthful, is that it's important for people to understand the distinction between telemetry and tracing data on the one hand, and actual solutions on the other. I mean, OpenTelemetry stops when the correct data is being emitted. And then what you do with that data is your own business. And I also think that people are realizing that tracing is more than just visualizing a single distributed trace. The traces have an enormous amount of information in there about resource usage, security patterns, access patterns, large-scale performance patterns that are embedded in thousands of traces; that sort of data is making its way into products as well. And I really like that OpenTelemetry has clearly delineated
I meet with them, Ben meets with 'em, we all meet with 'em all the time, we work with them. And the biggest challenge we have is just the data we get is bad, right? Either we don't support certain platforms, we'll get traces that dead end at certain places, we don't get metrics with the same name for certain types of telemetry. And so this project is going to fix that and it's going to solve this problem for a lot of vendors who have this, frankly, a really strong economic incentive to play ball, and to contribute to it. >> Do you see that this, I guess merging of the two projects, is offering an opportunity to either of you to fix some, or revisit if not fix, some of the mistakes, as they were, of the past? I know every time I build something I look back and it was frankly terrible because that's the kind of developer I am. But are you seeing this, as someone who's probably, presumably much better at developing than I've ever been, as the opportunity to unwind some of the decisions you made earlier on, out of either ignorance or it didn't work out as well as you hoped? >> There are a couple of things about each project that we see an opportunity to correct here without doing any damage to the compatibility story. For OpenTracing it was just a bit too narrow. I mean I would talk a lot about how we want to describe the software, not the tracing system. But we kind of made a mistake in that we called it OpenTracing. Really people want, if a request comes in, they want to describe that request and then have it go to their tracing system, but also to their metric system, and to their logging stack, and to anywhere else, their security system. You should only have to instrument that once. So, OpenTracing was a bit too narrow. OpenCensus, we've talked about this a lot, built a really high quality reference implementation into the product, if OpenCensus, the product I mean. And that coupling created problems for vendors to adopt and it was a bit thick for some end users as well. So we are still keeping the reference implementation, but it's now cleanly decoupled. >> Yeah. >> So we have loose coupling, a la OpenTracing, but wider scope a la OpenCensus. And in that aspect, I think philosophically, this OpenTelemetry effort has taken the best of both worlds from these two projects that it started with. >> All right well, Ben and Morgan thank you so much for sharing. Best of luck and let us know if CNCF needs to pull you guys in a room a little bit more to help work through any of the issues. (Ben laughing) But thanks again for joining us. >> Thank you so much. >> Thanks for having us, it's been a pleasure. >> Yeah. >> All right for Corey Quinn, I'm Stu Miniman we'll be back to wrap up our day one of two days live coverage here from KubeCon, CloudNativeCon 2019, Barcelona, Spain. Thanks for watching theCUBE. (soft instrumental music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Ben Sigelman | PERSON | 0.99+ |
2004 | DATE | 0.99+ |
Corey Quinn | PERSON | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
Morgan | PERSON | 0.99+ |
20 | QUANTITY | 0.99+ |
Ben | PERSON | 0.99+ |
Red Hat | ORGANIZATION | 0.99+ |
Cloud Native Computing Foundation | ORGANIZATION | 0.99+ |
Stu | PERSON | 0.99+ |
100 | QUANTITY | 0.99+ |
Python 3 | TITLE | 0.99+ |
two projects | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
last year | DATE | 0.99+ |
Java | TITLE | 0.99+ |
five people | QUANTITY | 0.99+ |
15 years | QUANTITY | 0.99+ |
thousands | QUANTITY | 0.99+ |
LightStep | ORGANIZATION | 0.99+ |
Adrian | PERSON | 0.99+ |
last week | DATE | 0.99+ |
both | QUANTITY | 0.99+ |
400 people | QUANTITY | 0.99+ |
two days | QUANTITY | 0.99+ |
KubeCon | EVENT | 0.99+ |
30 people | QUANTITY | 0.99+ |
Morgan McLean | PERSON | 0.99+ |
two | QUANTITY | 0.99+ |
200 services | QUANTITY | 0.99+ |
each project | QUANTITY | 0.99+ |
CNCF | ORGANIZATION | 0.99+ |
nine months ago | DATE | 0.99+ |
Yuri | PERSON | 0.99+ |
two things | QUANTITY | 0.99+ |
OpenCensus | TITLE | 0.99+ |
Both | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
one | QUANTITY | 0.99+ |
OpenCensus | ORGANIZATION | 0.99+ |
Barcelona, Spain | LOCATION | 0.99+ |
OpenTracing | TITLE | 0.99+ |
CloudNativeCon | EVENT | 0.98+ |
two years ago | DATE | 0.98+ |
95, 98% | QUANTITY | 0.98+ |
200 person | QUANTITY | 0.98+ |
Ecosystem Partners | ORGANIZATION | 0.98+ |
one option | QUANTITY | 0.98+ |
one project | QUANTITY | 0.98+ |
first time | QUANTITY | 0.98+ |
two-fold | QUANTITY | 0.98+ |
both projects | QUANTITY | 0.97+ |
six | DATE | 0.97+ |
ORGANIZATION | 0.97+ | |
two years ago | DATE | 0.97+ |
15 standards | QUANTITY | 0.97+ |
first | QUANTITY | 0.97+ |
LightStep | TITLE | 0.96+ |
GitHub | ORGANIZATION | 0.96+ |
CloudNativeCon 2019 | EVENT | 0.96+ |
'60s | DATE | 0.96+ |
OpenTracing | ORGANIZATION | 0.96+ |
Zipkin | ORGANIZATION | 0.96+ |
Steve Wooledge, Arcadia Data & Satya Ramachandran, Neustar | DataWorks Summit 2018
(upbeat electronic music) >> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. (electronic whooshing) >> Welcome back to theCUBE's live coverage of DataWorks, here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests in this segment: we have Steve Wooledge, he is the VP of Product Marketing at Arcadia Data, and Satya Ramachandran, who is the VP of Engineering at Neustar. Thanks so much for coming on theCUBE. >> Our pleasure, and thank you. >> So let's start out by setting the scene for our viewers. Tell us a little bit about what Arcadia Data does. >> Arcadia Data is focused on getting business value from these modern scale-out architectures, like Hadoop and the cloud. We started in 2012 to solve the problem of how do we get value into the hands of the business analysts that understand a little bit more about the business, in addition to empowering the data scientists to deploy their models and value to a much broader audience. So I think that's been, in some ways, the last mile of value that people need to get out of Hadoop and data lakes: to get it into the hands of the business. So that's what we're focused on. >> And start seeing the value, as you said. >> Yeah, seeing is believing, a picture is a thousand words, all those good things. And what's really emerging, I think, is companies are realizing that traditional BI technology won't solve the scale and user concurrency issues, because architecturally, big data's different, right? We're on the scale-out, MPP architectures now, like Hadoop; the data complexity and variety has changed; but the BI tools are still the same, and you pull the data out of the system to put it into some little micro-cube to do some analysis. Companies want to go after all the data, and view the analysis across a much broader set, and that's really what we enable. >> I want to hear about the relationship between your two companies, but Satya, tell us a little about Neustar, what you do. >> Neustar is an information services company; we are built around identity. We are the premier identity provider, the most authoritative identity provider, for the US. And we've built a whole bunch of services around that identity platform. I am part of the marketing solutions group, and I head the analytics engineering for marketing solutions. The product that I work on helps marketers do their annual planning, as well as their campaign or tactical planning, so that they can fine-tune their campaigns on an ongoing basis. >> So how do you use Arcadia Data's primary product? >> So we are a predictive analytics platform, and we use Arcadia for the reporting part of it. We have multiple terabytes of advertising data in our platform, and we use Arcadia to provide fast access to our customers, and also very granular and explorative analysis of this data: high-speed (mumbles) and explorative analysis of this data.
And how does it affect the revenue? So all of this modeling, the modeling platform, is built by Neustar, but the last mile of taking these reports and providing this explorative analysis of the results is provided by the reporting solution, which is Arcadia. >> Well, I mean, the thing about data analytics is that it really is going to revolutionize marketing. That famous marketing adage of, I know half my advertising works, I just don't know which half; now we're really going to be able to figure out which half. Can you talk a little bit about return on investment and what your clients see? >> Sure, we've got some major Fortune 500 companies that have said publicly that they've realized over a billion dollars of incremental value. And that could be across both marketing analytics, and how we better treat our messaging, our brand, to reach our intended audience. There's things like supply chain, and being able to run what-if analyses for different routes in more of a realtime way; it's things like cyber security, and stopping fraud and waste and things like that at a much grander scale than what was really possible in the past. >> So we're here at DataWorks and it's the Hortonworks show. Give us a sense of the degree of your engagement or partnership with Hortonworks and participation in their partner ecosystem. >> Yeah, absolutely. Hortonworks is one of our key partners, and what we did that's different architecturally is we built our BI server directly into the data platforms. So what I mean by that is, we take the concept of a BI server, and we install it and run it on the data nodes of Hortonworks Data Platform. We inherit the security directly out of systems like Apache Ranger, so that all that administration and scale is done at Hadoop economics, if you will, and it leverages the things that are already in place. So that has huge advantages both in terms of scale, but also simplicity, and then you get the performance and the concurrency that companies need to deploy out to, like, 5,000 users directly on that Hadoop cluster. So, Hortonworks is a fantastic partner for us, and a large number of our customers run on Hortonworks, as well as other platforms, such as Amazon Web Services, where Satya's got his system deployed. >> At the show they announced Hortonworks Data Platform 3.0. There's containerization there, there's updates to Hive to enable it to be more of a realtime analytics engine, and also a data warehousing engine. In Arcadia Data, do you follow their product enhancements, in terms of your own product roadmap, with any specific, fixed cycle? Are you going to be leveraging the new features in HDP 3.0 going forward to add value to your customers' ability to do interactive analysis of this data in close to realtime? >> Sure, yeah, no, because we're a native-- >> 'Cause marketing campaigns are often in realtime increasingly, especially when you're using, you know, you got a completely digital business. >> Yeah, absolutely. So we benefit from the innovations happening within the Hortonworks Data Platform. Because we're a native BI tool that runs directly within that system, you know, with changes in Hive, or different things within HDFS, in terms of performance or compression and things like that, our customers generally benefit from that directly, so yeah. >> Satya, going forward, what are some of the problems that you want to solve for your clients? What are their biggest pain points, and where do you see Neustar? >> So, data is the new oil, right?
So, for marketers too, data is now the biggest thing, it's what they're going after. They want faster analysis, they want to be able to get to insights as fast as they can, and they obviously want to work on as large an amount of data as possible. The variety of sources is becoming higher and higher and higher in terms of marketing. There used to be a few channels in the '70s and '80s; the '90s kind of increased that; now you have hundreds of channels, if not thousands of channels. And they want visibility across all of that. It's the ability to work across this variety of data, at increasing volume and very high speed; those are the high-level challenges that we have at Neustar. >> Great. >> So marketing attribution analysis, you say, is one of the core applications of your solution portfolio. How is that more challenging now than it had been in the past? We have far more marketing channels, digital and so forth. So how is the state of the art of marketing attribution analysis changing to address this multiplicity of channels and media for advertising and for influencing the customer on social media and so forth? And then, you know, can you give us a sense for what the necessary analytical tools are for that? We often hear about social graph analysis, or semantic analysis, or behavioral analytics and so forth; all of this makes it very challenging. How can you determine exactly what influences a customer now, in this day and age where, you think, you know, Twitter is an influencer over the conversation? How can you nail that down to specific, you know, KPIs or specific things to track? >> So, like you pointed out, the variety is increasing, right? And the marketers now have a lot more options than they used to, and that's a blessing, and it's also a curse. Because then I don't know where I'm going to move my marketing spending to. So, attribution right now is still sitting at the headquarters; it's kind of sitting at a very high level, and it is answering questions. Like we said, with the Fortune 100 companies, it's still answering questions to the CMOs, right? Where attribution will take us next is to then go lower down, where it's able to answer the regional headquarters on what needs to happen, and more importantly, for every store, I'm able to then answer and tailor my attribution model to a particular store. Let's take Ford for an example, right? Now, instead of the CMO suite, if I'm able to go to every dealer, and I'm able to personalize my attribution to that particular dealer, then it becomes a lot more useful. The challenge there is it all needs to be connected. Whatever model we are running for the dealer needs to be connected up to the headquarters. >> Yes, and that personalization very much leverages the kind of things that Steve was talking about at Arcadia. Being able to analyze all the data to find those micro, micro, micro segments that can be influenced to varying degrees, so yeah. I like where you're going with this, 'cause it very much relates to the power of distributed, federated big data fabrics like Hortonworks offers. >> And so streaming analytics is coming to the fore; it's been talked about for the longest period of time, but we have real use cases for streaming analytics right now. Similarly, data volumes are indeed becoming a lot larger. So both of them are doing a lot more right now. >> Yes. >> Great.
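A minimal sketch of the what-if analysis Satya described: move budget from paid search to TV and read off the modeled revenue change. The response curves, coefficients, and channel names are illustrative assumptions, not Neustar's actual activation model, which is fit from historical spend and revenue data.

```python
import math

# Illustrative diminishing-returns response curves per channel; the
# coefficients are invented here, a real activation model is fit from data.
RESPONSE = {
    "paid_search": lambda spend: 4.2 * math.log1p(spend / 1_000),
    "tv":          lambda spend: 6.8 * math.log1p(spend / 10_000),
}

def expected_revenue(budget):
    """Sum each channel's modeled revenue contribution (arbitrary units)."""
    return sum(RESPONSE[ch](spend) for ch, spend in budget.items())

def what_if_shift(budget, src, dst, amount):
    """Revenue delta from moving `amount` of spend from src to dst."""
    shifted = dict(budget)
    shifted[src] -= amount
    shifted[dst] += amount
    return expected_revenue(shifted) - expected_revenue(budget)

budget = {"paid_search": 250_000, "tv": 500_000}
delta = what_if_shift(budget, "paid_search", "tv", 100_000)
print(f"Modeled revenue change from shifting $100K: {delta:+.2f}")
```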
>> Well, Satya and Steve, thank you so much for coming on theCUBE, this was really, really fun talking to you. >> Excellent. >> Thanks, it was great to meet you. Thanks for having us. >> I love marketing talk. >> (laughs) It's fun. I'm Rebecca Knight, for James Kobielus; stay tuned to theCUBE, we will have more coming up from our live coverage of DataWorks, just after this. (upbeat electronic music)
Matthew Baird, AtScale | Big Data SV 2018
>> Announcer: Live from San Jose. It's theCUBE, presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media, and its ecosystem partners. (techno music) >> Welcome back to theCUBE, our continuing coverage on day one of our event, Big Data SV. I'm Lisa Martin with George Gilbert. We are down the street from the Strata Data Conference. We've got a lot of cool stuff going on; you can see the cool set behind me. We are at Forager Tasting Room & Eatery. Come down and join us, be in our audience today. We have a cocktail event tonight, who doesn't want to join that? And we have a nice presentation tomorrow morning of Wikibon's 2018 Big Data Forecast and Review. Joining us next is Matthew Baird, the co-founder of AtScale. Matthew, welcome to theCUBE. >> Thanks for having me. Fantastic venue, by the way. >> Isn't it cool? >> This is very cool. >> Yeah, it is. So, talking about big data, you know, Gartner says, "85% of big data projects have failed." I often say failure is not a bad F word, because it can spawn the genesis of a lot of great business opportunities. Data lakes were big a few years ago, then turned into swamps. AtScale has this vision of Data Lake 2.0, what is that? >> So, you're right. There have been a lot of failures, there's no doubt about it. And you're also right that is how we evolve, and we're a Silicon Valley based company. We don't give up when faced with these things. It's just another way to not do something. So, what we've seen and what we've learned through our customers is they need to have a solution that is integrated with all the technologies that they've adopted in the enterprise. And it's really about, if you're going to make a data lake, you're going to have data on there that is the crown jewels of your business. How are you going to get that into the hands of your constituents, so that they can analyze it, and they can use it to make decisions? And how can we, furthermore, do that in a way that supplies governance and auditability on top of it, so that we aren't just sending data out into the ether and not knowing where it goes? We have a lot of customers in the insurance and health insurance space, and financial customers, where the data absolutely must be managed. I think one of the biggest changes is around that integration with the current technologies. There's a lot of movement into the Cloud. The new data lake is kind of focused more on these large data stores, where it was HDFS with Hadoop; now it's S3, Google's object storage, and Azure ADLS. Those are the sorts of things that are backing the new data lake, I believe. >> So if we take these, where the data lake store didn't have to be an open source HDFS implementation, it could even be accessed just through an HDFS API. >> Matthew: Yeah, absolutely. >> What are some of the, how should we think about the data sources and feeds for this repository, and then what is it on top that we need to put to make the data more consumable? >> Yeah, that's a good point. S3, Google object storage, and Azure all have a characteristic of being large stores: you can store as much as you want. On the Clouds, and in open source for on-prem, the software for streaming the data and landing it exists, and the important thing there is it's cost-effective. S3 is a cost-effective storage system. HDFS is a mostly cost-effective storage system.
You have to manage it, so it has a slightly higher cost, but the advice has been: get the data to the place you're going to store it, and store it in a unified format. You get a halo effect when you have a unified format, and I think the industry is coalescing around... I'd probably say Parquet's in the lead right now, and once Parquet can be read by, let's take Amazon for instance, Athena, Redshift Spectrum, and EMR, now you have this halo effect where your data's always there, always available to be consumed by a tool or a technology that can then deliver it to your end users. >> So when we talk about Parquet, we're talking about a columnar serialization format, >> Matthew: Yes. >> but there's more on top of that that needs to be layered, so that you can, as we were talking about earlier, combine the experience of a data warehouse, and the curated >> Absolutely. >> data access where there's guard rails, >> Matthew: Yes. >> and it's simple, versus sort of the wild west where I capture everything in a data lake. How do you bring those two together? >> Well, specifically for AtScale, we allow you to integrate multiple data access tools in AtScale, and then we use the appropriate tool to access the data for the use case. So let me give you an example: in the Amazon case, Redshift is wonderful for accessing interactive data, which BI users want, right? They want fast queries, sub-second queries. They don't want to pay to have all the raw data necessarily stored in Redshift, 'cause that's pretty expensive. So they have Redshift Spectrum; the data's sitting in S3, and that's cost-effective. So when we go and read raw data to build these summary tables, to deliver the data fast, we can read from Spectrum, put it all together, and drop it into Redshift, a much smaller volume of data, so it has faster characteristics for being accessed. We do that in Hadoop when we access via Hive for building aggregate tables, but Spark or Impala is a much faster interactive engine, so we use those. As I step back and look at this, I think the Data Lake 2.0, from a technical perspective, is about abstraction, and abstraction's sort of what separates us from the animals, right? It's a concept where we can pack a lot of sophistication and complexity behind an interface that allows people to just do what they want to do. You don't know how, or maybe you do know how a car engine works, I don't really, kind of, a little bit, but I do know how to press the gas pedal and steer. >> Right. >> I don't need to know these things, and I think the Data Lake 2.0 is about, well, I don't need to know how Sentry, or Ranger, or Atlas, or any of these technologies work. I need to know that they're there, and when I access data, they're going to be applied to that data, and they're going to deliver me the stuff that I have access to and that I can see. >> So a couple things. It sounded like I was hearing abstraction, and you said really that's kind of the key; that sounds like a differentiator for AtScale, giving customers that abstraction they need. But I'm also curious from a data value perspective; you talked about Redshift from an expense perspective. Do you also help customers gain abstraction by helping them evaluate the value of data and where they ought to keep it, and then you give them access to it? Or is that something that they need to, kind of, bring to the table?
>> We don't really care, necessarily, about the source of the data, as long as it can be expressed in a way that can be accessed by whatever engine it is. Lift and shift is an example. There's a big move from Teradata or from Netezza into a Cloud-based offering. People want to lift it and shift it; it's the easiest way to do this. Same table definitions, but that's not necessarily optimized for the underlying data store. Take BigQuery, for example. BigQuery's an amazing piece of technology; I think there's nothing like it out there in the market today. But if you really want BigQuery to be cost-effective, and perform, and scale up to concurrency of... one of our customers is going to roll out about 8,000 users on this. You have to do things in BigQuery that are BigQuery-friendly. The data structures, the way that you store the data, repeated values, those sorts of things need to be taken into consideration when you build your schema out for consumption. With AtScale they don't need to think about that, they don't need to worry about it, we do it for them. They drop in the schema the same way that it exists on their current technology, and then behind the scenes what we're doing is looking at signals, we're looking at queries, we're looking at all the different ways that people access the data naturally, and then we restructure those summary tables using algorithms and statistics, and I think people would broadly call it ML-type approaches, to build out something that answers those questions, and adapts over time to new questions and new use cases. So it's really about, imagine you had the best data engineering team in the world, in a box: they're never tired, they never stop, and they're always interacting with what the customers really want, which is, "Now I want to look at the data this way." >> It sounds actually like what you're talking about is you have a whole set of sources and targets, and you understand how they operate; and when I say you, I mean your software. And so you can take data from wherever it's coming in, and then you apply machine learning or whatever other capabilities to learn, from the access methods, how to optimize that data for that engine. >> Matthew: Exactly. >> And then the end users have an optimal experience, and it's almost like the data migration service that Amazon has; it's like, you give us your Postgres or Oracle database, and we'll migrate it to the cloud. It sounds like you add a lot of intelligence to that process for decision support workloads. >> Yes. >> And figure out, so now you're going to... It's not Postgres to Postgres, but it might be Teradata to Redshift, or S3 that's going to be accessed by Athena or Redshift, and then let's put that in the right format. >> I think you sort of hit on something that we've noticed is very powerful, which is if you can set up the abstraction layer that is AtScale on your on-prem data, literally in, say, hours, you can move it into the Cloud; obviously you have to actually move the data into the Cloud, but once it's there you take the same AtScale instance, you re-point it at that new data source, and it works. We've done that with multiple customers, and it's fast and effective, and it lets you actually try out things that you may not have had the agility to do before, because there's differences in how the SQL dialects work, and there's differences in, potentially, how the schema might be built.
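A toy sketch of the adaptive summary-table idea Matthew describes: watch the query patterns, count which dimension combinations analysts actually group by, and materialize aggregates for the hottest ones. AtScale's real engine uses far richer signals and statistics; the log format and threshold here are invented for illustration.

```python
from collections import Counter

# Hypothetical query log: the dimension sets users grouped by recently.
query_log = [
    ("store", "week"), ("store", "week"), ("region", "month"),
    ("store", "week"), ("store", "product"), ("region", "month"),
]

def pick_summary_tables(log, min_hits=2):
    """Materialize an aggregate for any grouping seen at least min_hits times."""
    hits = Counter(frozenset(dims) for dims in log)
    return [sorted(dims) for dims, count in hits.items() if count >= min_hits]

for dims in pick_summary_tables(query_log):
    # A real system would emit CREATE TABLE ... AS SELECT ... GROUP BY here.
    print("materialize aggregate grouped by", ", ".join(dims))
```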
>> So a couple things I'm interested in. I'm hearing two A-words: the abstraction that we've talked about a number of times, and you also mentioned adaptability. So when you're talking with customers, what are some of the key business outcomes they need to drive where adaptability and abstraction are concerned, in terms of, like, cost reduction and revenue generation? What are some of those C-suite business objectives that AtScale can help companies achieve? >> So, looking at, say, a customer, a large retailer on the East Coast; everybody knows the stores, they're everywhere, they sell hardware. They have a 20-terabyte cube that they use for day-to-day revenue analytics. So they do period-over-period analysis. When they're looking at stores, they're looking at things like, we just tried out a new marketing approach... I was talking to somebody there last week about how they have these special stores where they completely redo one area and just see how that works. They have to be able to look at those analytics, and they run those for a short amount of time. So if your window for getting data, refreshing data, building cubes, which in the old world could take a week, you know, my co-founder at Yahoo, he had a week-and-a-half build time. That data is now two weeks old, maybe three weeks old. There might be bugs in it-- >> And the relevance might be, pshh... >> And the relevance goes down, or you can't react as fast. I've been at companies where... Speed is so important these days, and the new companies that are grasping data aggressively, putting it somewhere where they can make decisions on it on a day-to-day basis, they're winning. And they're spending... I was at a company that was spending three million dollars a month on pay-per-click data. If you can't get data every day, you're on the wrong campaigns, and everything goes off the rails, and you only learn about it a week later; that's 25% of your spend, right there, gone. >> So the biggest thing, sorry George, it really sounds to me like what AtScale can facilitate, for customers in probably any industry, is the ability to truly make data-driven business decisions that can directly affect revenue and profit. >> Yes, and in an agile format. So, you can build-- >> That's the third A; agile, adaptability, abstraction. >> There ya go, the three A's. (Lisa laughs) We had the three V's, now we have the three A's. >> Yes. >> The fact that you're building a curated model... so in retail, the calendars are complex. I'm sure everybody that uses Tableau is good at analyzing data, but they might not know what your rules are around your financial calendar, or around the hierarchies of your product. There's a lot of things that happen where you want an enterprise group of data modelers to build it, bless it, and roll it out, but then you're a user, and you say, wait, you forgot x, y, and z. I don't want to wait a week, I don't want to wait two weeks, three weeks, a month, maybe more. I want that data to be available in the model an hour later, 'cause that's what I get with Tableau today. And that's where we've taken the two approaches of enterprise analytics and self-service, and tried to create a scenario where you get the best of both worlds. >> So, we know that an implication of what you're telling us is that insights are perishable, and latency is becoming more and more critical. How do you plan to work with streaming data, where you've got a historical archive, but you've got fresh data coming in? But fresh could mean a variety of things.
Tell us what some of those scenarios look like. >> Absolutely. I think there are two approaches to this problem, and I'm seeing both used in practice, and I'm not exactly sure which one's going to win, although I have some theories. In one case, you are streaming everything into, sort of, like I talked about, this data lake, S3, and you're putting it in a format like Parquet, and then people are accessing it. The other way is to access the data where it is. Maybe it's already in, and this is a common BI scenario, you have a big data store, and then you have a dimensional data store; like, Oracle has your customers, and Hadoop has machine data about those customers accessing on their mobile devices or something. If there was some way to access that data without having to move the Oracle stuff into the big data store, that's a federation story that I think we've talked about in the Bay Area for a long time, or around the world for a long time. I think we're getting closer to understanding how we can do that in practice, and have it be tenable. You don't move the big data around, you move the small data around. For data coming in from outside sources it's probably a little bit more difficult, but it is kind of a degenerate version of the same story. I would say that streaming is gaining a lot of momentum, and with what we do, we're always mapping, because of the governance piece that we've built into the product; we're always mapping where did the data come from, where did it land, and how did we use it to build summary tables. So if we build five summary tables, 'cause we're answering different types of questions, we still need to know that it goes back to this piece of data, which has these security constraints and these audit requirements, and we always track it back to that, and we always apply those to our derived data. So when you're accessing these automatically ETLed summary tables, it just works. So I think that there are two ways that this is going to expand, and I'm excited about federation because I think the time has come. I'm also excited about streaming. I think they can serve two different use cases, and I don't actually know what the answer will be, because I've seen both in customers; it's some of the biggest customers we have. >> Well, Matthew, thank you so much for stopping by, and four A's: AtScale can facilitate abstraction, adaptability, and agility. >> Yes. Hashtag four A's. >> There we go. I don't even want credit for that. (laughs) >> Oh wow, I'm going to get five more followers, I know it! (George laughs) >> There ya go! >> We want to thank you for watching theCUBE. I am Lisa Martin, we are live in San Jose at our event, Big Data SV, and I'm with George Gilbert. Stick around, we'll be back with our next guest after a short break. (techno music)
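A compressed illustration of the federation rule of thumb Matthew lands on: move the small data to where the big data lives and join there, rather than moving the big data. Pandas stands in for the big-data engine, and the table contents and column names are invented for the example.

```python
import pandas as pd

# Small dimension table (imagine it pulled from Oracle).
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["gold", "silver", "gold"],
})

# Large fact table (imagine machine data sitting in Hadoop or S3).
events = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "bytes": [120, 80, 200, 50, 75, 60],
})

# Broadcast the small side to the big side's engine, then aggregate
# where the data already lives instead of moving the big table.
usage_by_segment = (
    events.merge(customers, on="customer_id")
          .groupby("segment")["bytes"]
          .sum()
)
print(usage_by_segment)
```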
Yaron Haviv, iguazio | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, we're live in New York City. This is theCUBE's coverage of BigData NYC, our own event, which we've been running for five years now; we've been at Hadoop World since 2010, so it's our eighth year covering Hadoop World, which has evolved into Strata Conference, Strata Hadoop, and is now called Strata Data, and of course it's bigger than just Strata, it's about big data in NYC. A lot of big players here inside theCUBE, thought leaders, entrepreneurs, and great guests. I'm John Furrier, the cohost this week with Jim Kobielus, who's the lead analyst for big data on our Wikibon team. Our next guest is Yaron Haviv, who's with iguazio; he's the founder and CTO, a hot startup here at the show, making a lot of waves with their new platform. Welcome to theCUBE, good to see you again, congratulations. >> Yes, thanks, thanks very much. We're happy to be here again. >> You're known in theCUBE community as the guy on Twitter who's always pinging me and Dave and the team, saying, "Hey, you know, you guys got to get that right." You really are one of the smartest guys on the network in our community, you're super-smart, your team has great tech chops, and in the middle of all that is the hottest market, which is cloud native: cloud native as it relates to the integration of how apps are being built, and essentially new ways of engineering around these solutions, not just repackaging old stuff. It's really about putting things in a true cloud environment, with application development and data at the center of it. You've got a whole complex platform you've introduced. So before we get into some of my pointed questions, and I know Jim's got a ton of questions, give us an update on what's going on. You guys have some news here at the show; let's get to that first. >> So since the last time we spoke, we've had tons of news. We're making revenues, we have customers, we've just recently GA'ed, and we recently got significant investment from major investors; we raised about $33 million from companies like Verizon Ventures, Bosch, you know, for IoT, Chicago Mercantile Exchange, which is Dow Jones and other properties, and Dell EMC. So pretty broad. >> John: So customers, pretty much. >> Yeah, so that's the interesting thing. Usually, you know, investors are sort of strategic investors or partners or potential buyers, but here it's essentially our customers; it's so strategic to the business, we want to... >> Let's go with the GA of the product, just get into what's shipping, what's available: what's the general availability, what are you now offering? >> So iguazio is trying to, you know, you alluded to cloud native and all that. Usually when you go to events like Strata and BigData, it's nothing to do with cloud native: a lot of hard labor, not really continuous development and integration, it's like continuous hard work. And essentially what we did, we created a data platform which is extremely fast and integrated; it has all the different forms of state, streaming and events and documents and tables and all that, in a very unique architecture, I won't dive into that today. And on top of it we've integrated cloud services like Kubernetes and serverless functionality and others, so we can essentially create a hybrid cloud.
So some of our customers even deploy portions as an opex-based setting in the cloud, and some portions at the edge or in the enterprise, as deployed software or even a prepackaged appliance. So we're the only ones that provide a full hybrid experience. >> John: Is this a SaaS product? >> So it's a software stack, and it can be delivered in three different options. One, if you don't want to mess with the hardware, you can just rent it, and it's deployed in an Equinix facility; we have very strong partnerships with them globally. If you want to have something on-prem, you can get a software reference architecture, and you go and deploy it. If you're a telco or an IoT player that wants it in a manufacturing facility, we have a very small 2U box: four servers, four GPUs, all the analytics tech you could think of. You just put it in the factory instead of, like, two racks of Hadoop. >> So you're not general purpose; you deploy the stack however the customer wants, the flexibility is on them. >> Yeah. Now it is an appliance >> You have a hosting solution? >> It is an appliance even when you deploy it on-prem; it's a bunch of Docker containers inside, and you don't even touch them, you don't SSH to the machine. You have APIs and you have UIs, and just like the cloud experience when you go to Amazon, you don't open the kimono, you know, you just use it. So that's the experience we're telling customers to expect. No root-access problems, no security problems. It's a hardened system. Give us servers, we'll deploy it, and you go through consoles and UIs. >> You don't host anything for anyone? >> We host for some customers, including >> So you do whatever the customer is interested in doing? >> Yes. (laughs) >> So you're flexible, okay. >> We just want to make money. >> You're pretty good, sticking to the product. So on the GA, here in the big data world, you mentioned that there are data layers, like a data piece. So I got to ask you the question, so pretend I'm an idiot for a second, right. >> Yaron: Okay. >> Okay, yeah. >> No, you're a smart guy. >> What problem are you solving? So we'll just go to the simple. I love what you're doing, I assume you guys are super-smart, which I can say you are, but what's the problem you're solving, what's in it for me? >> Okay, so there are two problems. One is the challenge that everyone wants to transform; you know, there is this digital transformation mantra. And it means essentially two things. One is, I want to automate my operations environment so I can cut costs and be more competitive. The other one is, I want to improve my customer engagement. You know, I want to do mobile apps which are smarter, get more direct content to the user, get more targeted functionality, et cetera. These are the two key challenges for every business, in any industry, okay? So they go and they deploy Hadoop and Hive and all that stuff, and it takes them two years to productize it. And then they get to the data science bit. And by the time they've finished, they understand that this Hadoop thing can only do one thing: it's queries, and reporting and BI, and data warehousing. How do you get actionable insights from that stuff, okay? 'Cause actionable insights means I get information from the mobile app, and then I translate it into some action. I have to enrich the vectors, the machine learning, all those details. And then I need to respond. Hadoop doesn't know how to do it.
So the first generation is people that pulled a lot of stuff into a data lake, and started querying it and generating reports. And the boss said >> Low-cost data lake, basically, is what you're saying. >> Yes, and the boss said, "Okay, what are we going to do with this report? Is it generating any revenue for the business?" No. The only revenue generation is if you take this data >> You're fired, exactly. >> No, not all fired, but now >> John: Look at the budget >> Now they're starting to buy our stuff. So now the point is, okay, how can I put in all this data and at the same time generate actions, and also deal with the production aspects of, I want to develop in a beta phase, I want to promote it into production. That's cloud native architectures, okay? Hadoop is not cloud. How do I take a Spark, Zeppelin, you know, a notebook, and turn it into production? There's no way to do that. >> By the way, depending on which cloud you go to, they have a different mechanism and elements for each cloud. >> Yeah, so the cloud providers do address that, because they are selling the package, >> It spans all the clouds, yeah. >> Yeah, so cloud providers are starting to have their own offerings, which are all proprietary, around: this is how you would, you know, forget about HDFS, we'll have S3, and we'll have Redshift for you, and we'll have Athena, and again you're starting to consume that as a service. It still doesn't address the continuous analytics challenge that people have. And if you're looking at what we've done with Grab, which is amazing, they started with using Amazon services, S3, Redshift, you know, Kinesis, all that stuff, and it took them about two hours to generate the insights. Now the problem is they want to do driver incentives in real time. So they want to incent the driver to go and make more rides or other things, so they have to analyze the event of the location of the driver and the event of the location of the customers, and just throw messages back based on analytics. So that's real-time analytics, and that's not something that you can do >> They'd have to build that from scratch right away. I mean, they can't do that with the existing stack. >> No, and Uber invested tons of energy around that, and they don't get the same functionality. Another unique feature that we talk about in our PR >> This is for the use case you're talking about, this is the Grab, which is the car >> Grab is the number one ride-sharing company in Asia, which is bigger than Uber in Asia, and they're using our platform. By the way, even Uber doesn't really use Hadoop; they use MemSQL for that stuff, so it's not really using open source and all that. But the point is, for example, with Uber, when they monetize the rides, they do it just based on demand, okay. And with Grab, now what they do, because of the capability that we can intersect tons of data in real time, they can also look at the weather, or was there a terror attack or something like that. They don't want to raise the price >> A lot of other data points, could be traffic >> They don't want to raise the price if there was a problem, you know, and have all the customers get aggravated. This is actually intersecting data in real time, and no one today can do that in real time beyond what we can do. >> A lot of people have semantic problems with real time; they don't even know what they mean by real time. >> Yaron: Yes. >> The data could be a week old, but they can get it to them in real time.
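A stripped-down sketch of the Grab-style decision Yaron describes: intersect a driver-location event with current demand and external context, and suppress surge-style incentives when something has gone wrong. All field names and thresholds are invented for illustration.

```python
def incentive_decision(event, demand_index, context):
    """Decide, per driver-location event, whether to push an incentive."""
    # Never raise prices during an incident or severe weather (the Grab example).
    if context.get("incident") or context.get("severe_weather"):
        return {"driver": event["driver_id"], "action": "none"}
    # More ride requests than nearby drivers: nudge drivers into the zone.
    if demand_index > 1.5:
        return {"driver": event["driver_id"], "action": "bonus",
                "zone": event["zone"]}
    return {"driver": event["driver_id"], "action": "none"}

print(incentive_decision(
    {"driver_id": "d42", "zone": "downtown"},
    demand_index=1.8,
    context={"incident": False, "severe_weather": False},
))
```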
>> But every decision, if you generalize around the problem, okay, and we have slides on that, which I explain to customers: every time I run analytics, I need to look at four types of data. The context, the event: what happened, okay. The second type of data is the previous state. Like, I have a car; was it up or down, or what's the previous state of that element? The third element is the time aggregation: what happened in the last hour, the average temperature, the average, you know, ticker price for the stock, et cetera, okay? And the fourth thing is enriched data: like, I have a car ID, but what's the make, what's the model, who's driving it right now. That's secondary data. So every time I run a machine learning task or any decision, I have to collect all those four types of data into one vector, it's called a feature vector, and take a decision on that. You take Kafka, it's only the event part, okay; you take MemSQL, it's only the state part; you take Hadoop, it's only, like, historical stuff. How do you assemble and stitch a feature vector? >> Well, you talked about a complex machine learning pipeline, so clearly you're talking about a hybrid >> It's a prediction. And actions based on just dumb things, like the car broke and I need to send it to a garage, I don't need machine learning for that. >> So within your environment then, do you enable the machine learning models to execute across the different data platforms of which this hybrid environment is composed, and then do you aggregate the results of those models' runs into some larger model that drives the real-time decision? >> In our solution, everything is a document, so even a picture is a document, a lot of things. So you can essentially throw in a picture, run TensorFlow, embed more features into the document, and then query those features on another platform. So that's really what makes this continuous analytics extremely flexible; that's what we give customers. The first thing is simplicity. They can now build applications; you know, we have a tier-one automotive customer now, the CIO coming, meeting us. You know, where they have a project of one year and dozens of people hired, hugely complex, we say: tell us what's the use case, and we'll build a prototype. >> John: All right, well I'm going to >> One week. We gave them a prototype, and he was amazed how in one week we created an application that analyzed all the streams of data from the cars, did enrichment, did machine learning, and provided predictions. >> Well, we're going to have to come in and test you on this, because I'm skeptical, but here's why. >> Everyone is. >> We'll get to that; I mean, I'm probably not skeptical, but I kind of am, because the history is pretty clear. If you look at some of the big ideas out there, like OpenStack: I mean, that thing just morphed into a beast. Hadoop was a cost-of-ownership nightmare, as you mentioned early on. So people have been conceptually correct on what they were trying to do, but trying to get it done was always hard, and then it took a long time to kind of figure out the operational model. So how are you different, if I'm going to play the skeptic here? You know, I've heard this before. How are you different than, say, OpenStack or Hadoop clusters? 'Cause that was a nightmare: cost of ownership, I couldn't get the type of value I needed, lost my budget. Why aren't you the same? >> Okay, that's interesting.
I don't know if you know, but I ran a lot of development for OpenStack when I was at Mellanox, and for Hadoop, so I patched a lot of those >> So do you agree with what I said? That that was a problem? >> They are extremely complex, yes. And I think one of the things is that OpenStack first tried to bite off too much, and it's sort of a huge tent; everyone tries to push his agenda. OpenStack is still an infrastructure layer, okay. And Hadoop is sort of something in between an infrastructure and an application layer, but it was designed 10 years ago, where the problem that Hadoop tried to solve is how do you do web ranking, okay, on tons of batch data. And then the ecosystem evolved into real time, and streaming, and machine learning. >> A data warehousing alternative or whatever. >> So it doesn't fit the original model of batch processing, 'cause if an event comes from the car or an IoT device, and you have to do something with it, you need a table with an index. You can't just go and build a huge Parquet file. >> John: That's why he's different. >> Go ahead. >> So what we've done with our team, after knowing OpenStack and all those >> John: All the scar tissue. >> And all the scar tissues; and my role was also working with all the cloud service providers, so I know their internal architecture, and I worked on SAP HANA and Exadata and all those things, so we learned from the bad experiences. We said, let's forget about the lower layers, which is what OpenStack is trying to provide, providing you infrastructure as a service. Let's focus on the application, and build from the application all the way to the flash, and the CPU instruction set, and the adapters, and the networking, okay. That's what's different. So what we provide is an application and service experience. We don't provide infrastructure. If you go buy VMware and Nutanix, all those offerings, you get infrastructure. Now you go and build, with a dozen dev-ops guys, all the stack above. You go to Amazon, you get services. It's just that they're not the most optimized in terms of the implementation, because they also have dozens of independent projects, where each one takes a VM and starts writing some >> But they're still a good service, but you got to put it together. >> Yeah, right. But also the way they implement: because in order for them to scale, they have a common layer, they spawn VMs, and then they start to build up applications, so it's inefficient. And also a lot of it is built on 10-year-old baseline architecture. We've designed for a very modern architecture; it's all parallel CPUs with 30 cores, you know, flash and NVMe. And so we've avoided a lot of the hardware challenges, and serialization, and we just provide an abstraction layer, pretty much like a cloud, on top. >> Now, in terms of abstraction layers in the cloud, they're efficient, and they provide a simplification experience for developers. Serverless computing is up and coming; it's an important approach. Of course we have the public clouds from AWS and Google and IBM and Microsoft, and there's a growing range of serverless computing frameworks for prem-based deployment. I believe you are behind one. Can you talk about what you're doing at iguazio on serverless frameworks for on-prem or public? >> Yes, and it's the first time I'm very active in the CNCF, the Cloud Native Computing Foundation.
I'm one of the authors of the serverless white paper, which tries to normalize the definitions of all the vendors and come up with a proposal for an interoperable standard. So I spent a lot of energy on that, 'cause we don't want to lock customers into an API. What's unique, by the way, about our solution: we don't have a single proprietary API. We just emulate all the other guys' stuff. We have all the Amazon APIs for data services, like Kinesis, Dynamo, S3, et cetera. We have the open source APIs, like Kafka. So also on the serverless side, my agenda is to promote the idea that if I'm writing for Azure or AWS or iguazio, I don't need to change my app; I can use any developer tools. So that's my effort there. And recently, a few weeks ago, we launched our open source project, which is a sort of second generation of something we had before, called Nuclio. It's designed for real time >> John: How do you spell that? >> N-U-C-L-I-O. I even have the logo >> He's got a nice sticker here. >> It's really fast because it's >> John: Nuclio, so that's open source that you guys sponsor, and it's all code out in the open? >> All the code is in the open. It's pretty cool, and it has a lot of innovative ideas on how to do stream processing, 'cause the original serverless functionality was designed around web hooks and HTTP, and even many of the open source projects are really designed around HTTP serving. >> I have a question. I'm doing research for Wikibon in the area of serverless; in fact, we've recently published a report on serverless, and in terms of hybrid cloud environments, I'm not seeing yet any hybrid serverless clouds that involve public serverless, you know, like AWS Lambda, and private on-prem deployment of serverless. Do you have any customers who are doing that, or interested in hybridizing serverless across public and private? >> Of course, and we have some patents I don't want to go into, but the general idea is, what we've done in Nuclio is also the decoupling of the data from the computation, which means that things can sort of be disjoined. You can run a function on a Raspberry Pi, and the data will be in a different place, and those things can sort of move, okay. >> So the persistence has to happen outside the serverless environment, like in the application itself? >> Outside of the function; the function accesses the persistence layer through APIs, okay. And how this data persistence is materialized, that's a separate thing. So you can actually write the same function that will run against Kafka or Kinesis or RabbitMQ or HTTP without modifying the function, and ad hoc, through what we call function bindings, you define what's going to be the thing driving the data, or storing the data. So you can actually write the same function that does an ETL job from table one to table two. You don't need to put the table information in the function, which is not how Lambda does it. And it's about a hundred times faster than Lambda; we do 400,000 events per second in Nuclio. So if you write your serverless code in Nuclio, it's faster than writing it yourself, because of all those low-level optimizations. >> Yaron, thanks for coming on theCUBE. We want to do a deeper dive, love to have you out in Palo Alto next time you're in town. Let us know when you're in Silicon Valley for sure, we'll make sure we get you on camera for multiple sessions. >> And more information at re:Invent. >> Go to re:Invent. We're looking forward to seeing you there.
Love the continuous analytics message; I think continuous integration is going through a massive renaissance right now, you're starting to see new approaches, and I think the things that you're doing are exactly along the lines of what the world wants, which is alternatives, innovation, and thanks for sharing on theCUBE. >> Great. >> That's very great. >> This is theCUBE's coverage of the hot startups here at BigData NYC, live from New York. I'm John Furrier, with Jim Kobielus; we'll be back after this short break.
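Pulling Yaron's four data types together: a hedged sketch of a serverless handler that assembles a feature vector from the incoming event, the previous state, a rolling time aggregation, and enrichment data. The handler(context, event) signature follows Nuclio's Python convention, but the lookups here are mocked with plain dicts; on the real platform they would go through the data bindings Yaron describes.

```python
import json

# Mock stand-ins for state, time-aggregation, and enrichment stores.
STATE = {"car-17": {"status": "up"}}
AVG_TEMP_LAST_HOUR = {"car-17": 71.2}
CAR_CATALOG = {"car-17": {"make": "Ford", "model": "Focus"}}

def handler(context, event):
    """Assemble the four data types into a single feature vector."""
    evt = json.loads(event.body)                        # 1. the event itself
    car_id = evt["car_id"]
    features = {
        "event": evt,
        "prev_state": STATE.get(car_id, {}),            # 2. previous state
        "avg_temp_1h": AVG_TEMP_LAST_HOUR.get(car_id),  # 3. time aggregation
        "enrichment": CAR_CATALOG.get(car_id, {}),      # 4. enriched data
    }
    # A model (or a simple rule) would take its decision on this vector.
    return json.dumps(features)

class _Event:  # minimal stand-in for the framework's event object
    def __init__(self, body):
        self.body = body

print(handler(None, _Event(json.dumps({"car_id": "car-17", "temp": 75}))))
```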
Tim Smith, AppNexus | BigData NYC 2017
>> Announcer: Live, from Midtown Manhattan, it's theCUBE. Covering Big Data, New York City, 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back, everyone. Live in Manhattan, New York City, in Hell's Kitchen, this is theCUBE's special event, our annual CUBE-Wikibon Research Big Data event. Alongside Strata Hadoop, formerly Hadoop World, now called Strata Data, as the world continues. This is our annual event; it's our fifth year here, sixth overall, and we wanted to kind of move from uptown. I'm John Furrier, the co-host of theCUBE, with Peter Burris, Head of Research at SiliconANGLE and GM of Wikibon Research. Our next guest is Tim Smith, who's the SVP of technical operations at AppNexus; and to say his technical operations run at large scale is an understatement. But before we get going: Tim, just talk about AppNexus as a company, what you guys do, what's the core business? >> Sure. AppNexus is the second-largest digital advertising marketplace after Google. We're an internet technology company that harnesses data and machine learning to power the companies that comprise the open internet. We began by building a powerful technology platform, in which we embedded core capabilities, tools, and features. With me so far? >> Yeah, we got it. >> Okay. On top of that platform, we built a core suite of cloud-based enterprise products that enable the buying and selling of digital advertising, and a scaled, transparent, and low-cost marketplace where other companies can transact, either using our enterprise products or those offered by other companies. If you want to hear a little about the daily peak feeds and speeds; it is Strata, we should probably talk about that. We do about 11.8 billion impressions transacted on a daily basis. Each of those is a real-time auction conducted in a fraction of a second, well under half a second. We see about 225 billion impressions per day, and we handle about 5 million queries per second at peak load. We produce about 150 terabytes of data each day, and we move about 400 gigabits per second into and out of the internet at peak; all those numbers are daily peaks. Makes sense? >> Yep. >> Okay, so by way of comparison, which might be useful for people: I believe the NYSE currently does roughly 2 million trades per day. So if we round that up to 3 million trades a day, and assume the NYSE were to conduct that volume every single day of the year, 7 days a week, 365 days a year, that'd be about a billion trades a year. Similarly, I believe Visa did about 28-and-a-half billion transactions in their fiscal third quarter. I'll round that up to 30 billion, average it out to about 333 million transactions per day, and annualize it to about 120 billion transactions per year. A little bit of math, but as I mentioned, AppNexus does in excess of 10 billion transactions per day. And so it seems reasonable to say that AppNexus does roughly 10 times the transaction volume in one day that the NYSE does in a year. And similarly, it seems reasonable to say that AppNexus does in roughly two weeks the transaction volume that Visa does in a year. Obviously, these are all just very rough numbers based on publicly available information about the NYSE and Visa, and both the NYSE and Visa do far, far more volume than AppNexus when measured in terms of dollars.
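Tim's back-of-napkin comparison, reproduced as a quick calculation from his rounded figures:

```python
appnexus_daily = 10e9   # ~10 billion transactions per day
nyse_daily = 3e6        # ~3 million trades per day, rounded up
visa_quarter = 30e9     # ~30 billion transactions per fiscal quarter

nyse_yearly = nyse_daily * 365   # about a billion trades a year
visa_daily = visa_quarter / 90   # about 333 million transactions a day
visa_yearly = visa_daily * 365   # about 120 billion transactions a year

print(f"AppNexus day vs NYSE year: {appnexus_daily / nyse_yearly:.1f}x")              # ~9x
print(f"Days for AppNexus to match Visa's year: {visa_yearly / appnexus_daily:.0f}")  # ~12
```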
So given our volumes, it's imperative that AppNexus does each transaction with the maximum efficiency and lowest reasonably possible cost, and that is one of the most challenging aspects of my job. >> So thanks for spending the time to give the overview. There's a lot of data; I mean 10 billion a day is massive volume. I mean the internet, and you see the scale, is insane. We're in a new era right now of web-scale. We've seen it in Facebook, and it's enormous. It's only going to get bigger, right? So on the online ad tech side, you guys are essentially doing like a Google model; it's not everything Google does, but it's still huge numbers. Then you include Microsoft and everybody else. Really heavy lifting, IT-wise. What's the environment like? And just talk about, you know, what it's like for you guys. Because you've got a lot of ops, I mean in terms of DevOps. You can't break anything, because at 10 billion transactions a day, or near it, any break has a significant impact. So you have to have everything buttoned-up super tight, yet you've got to innovate and build for future growth. What's the IT environment like? >> It's interesting. We have about 8,000 servers spread across about seven data centers on three continents, and we run, as you mentioned, around the clock. There's no closing bell; downtime is not acceptable. So when you look at our environment, you're talking about four major categories of server complexes. We have real-time processing, which is the actual ad serving. We have a data pipeline, which is what we call our big data environment. We also have a client-facing environment and an infrastructure environment. So we use a lot of different tools and applications, but I think the most relevant ones to this discussion are Hadoop and its friends HDFS, Hive, and Spark. And then we use the Vertica Analytics Platform. And together, Hadoop and its friends and Vertica comprise our entire data pipeline. They're both very disk-intensive. They're cluster-based applications, and it's a challenge to keep them up and running. >> So what are some of those challenges? Just explain a little bit, because you also have a lot of opportunity. I mean, it's money flowing through the air, basically; digital air, if you will. I mean, they've got a lot of stuff happening. Take us through the challenges. >> You know, our biggest apps are all clustered. And all of our clusters are built with commodity servers, just like a lot of other environments. The big data app clusters traditionally have had internal disks, while almost all of our other servers are very light on disk. One of the biggest challenges is, since the server is the fundamental building block of a cluster, then regardless of whether you need more compute or more storage, you always have to add more servers to get it. That really limits flexibility and creates a lot of inefficiencies, and I really, really am obsessive about reducing and eliminating inefficiencies. So, with me so far? >> Yep. >> Great. The inefficiencies result from two major factors. First, not all workloads require the same ratio of compute to storage. Some workloads are more compute-intensive and really less dependent on storage, while other workloads require a lot more storage. But we have to use standard server configurations, and as a result, we wind up with underutilized compute and storage. This is undesirable, it's inefficient, yet given our scale, we have to use standardized configurations. So that's the first big challenge.
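Before the second challenge, a small calculation makes that fixed-ratio inefficiency concrete. The node shape and workload numbers below are invented for illustration, not AppNexus figures:

```python
from math import ceil

# Illustrative only: shows how a fixed core-to-disk ratio strands
# resources when every addition must be a whole standard server.

NODE_CORES, NODE_DISKS = 24, 12      # one standard server configuration

def nodes_needed(cores_req, disks_req):
    return max(ceil(cores_req / NODE_CORES), ceil(disks_req / NODE_DISKS))

# A storage-heavy workload: modest compute, lots of disk.
cores_req, disks_req = 200, 600
n = nodes_needed(cores_req, disks_req)
print(f"nodes required: {n}")                           # 50, disk-bound
print(f"stranded cores: {n * NODE_CORES - cores_req}")  # 1000 cores sit idle
```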
The second is the compute to disk ratio. It's generally fixed when you buy the servers. Yes, we can certainly add more disks in the field, but that's labor intensive, it's complicated from a logistics and an asset management standpoint, and you're fundamentally limited by the number of disk slots in the server. So now you're right back into the trap of more storage requiring more servers, regardless of whether you need more compute or not. And then you compound the inefficiencies. >> Couldn't you just move the resources, unused resources, from one cluster to the other? >> I've been asked that a lot; and no, it's just not that simple. Each application cluster becomes a silo due to its configuration of storage and compute. This means you just can't move servers between clusters, because the clusters are optimized for their workloads, and the fact that you can't move resources from one cluster to another creates more inefficiencies. And they're compounded over time, since workloads change and the ideal ratio of compute-to-storage changes. The end result is unused resources trapped in silos, and configurations that are no longer optimized for your workload. And there's only really one solution that we've been able to find. And to paraphrase an orator far, far more talented than I am, namely Ronald Reagan: we need to open this gate, tear down these silos. The silos just have to go away. They fundamentally limit flexibility and efficiency. >> What were some of the other issues caused by using servers with internal drives? >> You have more maintenance, and you've got to deal with the logistics. But the biggest problem is servers and storage have significantly different life cycles. Servers typically have a three year life cycle before they're obsolete. Storage typically is four to six years; you can sometimes stretch that a little further with the storage. Since the storage sits inside servers that are replaced every three years, we end up replacing it before the end of its effective lifetime; that's inefficient. Further, since the storage is inside the servers, we have to do massive data migrations when we replace servers. Migrations are time consuming, logistically difficult, and high risk. >> So how did DriveScale help you guys? Because you certainly have a challenging environment, you laid out the story, and we appreciate that. How did DriveScale help you with the challenges? >> Well, what we really wanted to do was disaggregate storage from servers, and DriveScale enables us to do that. Disaggregating resources is a new term in the industry, but I think a lot of people are focusing on it. I can explain it if you think that would make sense. >> What do you mean by disaggregating resources? Can you explain that, and how it works? >> Sure, so instead of buying servers with internal drives, we now buy diskless servers with JBODs. And DriveScale lets us easily compose servers with whatever amount of disk storage we need, from the server resource pool and the disk resource pool; and they're separate pools. This means we have the right balance of compute and storage for each workload, and we can easily adjust it over time. And all of this is done via software, so it's easy to do with a GUI or, in our case, at our scale, scripting. And it's done on demand, and it's much more efficient (a hypothetical sketch of this composition step appears after this interview). >> How does it help you with the underutilized resource challenge you mentioned earlier?
>> Well, since we can add and remove resources from each cluster, we can manage exactly how much compute power and storage is deployed for each workload. Since this is all done via software, it can be done quickly and easily. We don't have to send a technician into a data center to physically swap drives, add drives, move drives. It's all done via software, and it's very, very efficient. >> Can you move resources between silos? >> Well, yes and no. First off, our goal is no more silos. That said, we still have clusters, and once we completely migrate to DriveScale, all of our compute and storage resources will be consolidated into just a few common pools. And disk storage will no longer differentiate pools; thus, we have fewer pools. What's more, we can use the resources in each pool for more workloads. And when our needs change, and they always do, we can reallocate resources as needed. >> What about the life cycle management challenge? How do you guys address that? >> Well, that's addressed with DriveScale. The compute and the storage are now disaggregated, or separated, into diskless servers and JBODs, so we can upgrade one without touching the other. When we want to upgrade servers to take advantage of new processors or new memory architectures, we just replace the servers, recombine the disks with the new servers, and we're back up and operating. It saves the cost of buying new disks when we don't need to, and it also simplifies logistics and reduces risk, as we no longer have to run the old plant and the new plant concurrently and do a complicated data migration. >> What about qualifying server and storage vendors? Do you still do that? Or how does that impact-- >> We actually don't have to do it. We're still using the same server vendor. We've used Dell for many, many years, and we continue to use them. We are using them for storage, and there was no real work; we just had to add DriveScale into the mix. >> What's it like working with DriveScale? >> They're really wonderful to work with. They have a really seasoned team. They were at Sun Microsystems and Cisco; they built some of the really foundational products that changed the internet, that the internet was built on. They're really talented, they're really bright, and they're really focused on customer success. >> Great story, thanks for sharing that. My final question for you is, you guys have a very big, awesome environment; you've got a lot of scale there. It's great for a startup to get into an environment like this, because one, they can get access to the data and work with a good team like you have. What's it like working with a startup? >> You know, it's always challenging at first; too many things to do. >> They've got talented guys. Most of those early-stage startups have all their A players out there. >> They have their A players, and we've been very pleased working with them. We're dealing with some of the top talent in the industry, the talent that created the industry. They have a proven track record. We really don't have any concerns; we know they're committed to our success, and they have a great team and great investors. >> A final, final question. For your friends out there who are watching, and other practitioners who are trying to run things at scale with a cloud: what's your advice to them? You've been operating at scale, billions of transactions, I mean huge; it's only going to get bigger. Put your IT friendly advice hat on.
What's the mindset for operators out there, technical ops, as DevOps comes in? We're seeing a lot of that. What do people need to be thinking about to run at scale? >> There's no magic silver bullet. There are no magic answers. The public cloud is very helpful in a lot of ways, but you really have to think hard about your economics, and you have to think about your scale. You just have to be sure that you're going into each decision knowing that you've looked at the costs and the benefits, the performance, the risks, and you don't expect there to be simple answers. >> Yeah, there are no magic beans, as they say. You've got to make it work for the business. >> No magic beans, I wish there were. >> Tim, thanks so much for the story. Appreciate the commentary. Live coverage at Big Data NYC, it's theCUBE. Be back with more after this short break. (upbeat techno music)
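Looking back at the server-composition step Tim described: a purely hypothetical sketch. DriveScale's actual interface is not shown in this conversation, so the client class, method names, and endpoint below are invented to illustrate composing diskless servers with JBOD drives from separate pools, in software, on demand:

```python
# Hypothetical sketch, not the DriveScale API: everything here is an
# invented stand-in for a compose-via-software interface.

class ComposerClient:
    """Stand-in for a real server-composition API."""
    def __init__(self, endpoint):
        self.endpoint = endpoint

    def compose(self, name, cores, drives):
        # A real system would bind JBOD drives to a diskless server over
        # the fabric; here we only record the request.
        print(f"[{self.endpoint}] {name}: {cores} cores + {drives} drives")

dc = ComposerClient("composer.dc1.example.com")
# Right-size each role independently, and rebalance later without
# physically touching servers or disks.
dc.compose("hadoop-worker-001", cores=24, drives=20)   # storage-heavy
dc.compose("spark-worker-001",  cores=48, drives=2)    # compute-heavy
```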
Vikram Bhambri, Dell EMC - Dell EMC World 2017
>> Narrator: Live from Las Vegas, it's theCUBE. Covering Dell EMC World 2017, brought to you by Dell EMC. >> Okay, welcome back everyone, we are live in Las Vegas for Dell EMC World 2017. This is theCUBE's eighth year of coverage of what was once EMC World, now Dell EMC World 2017. I'm John Furrier at SiliconANGLE, and also my cohost from SiliconANGLE, Paul Gillin. Our next guest is Vikram Bhambri, who is the Vice President of Product Management at Dell EMC. Formerly with Microsoft Azure, knows cloud, knows ViPR, knows the management, knows storage up and down, in the Emerging Technologies Group, formerly of EMC. Good to see you on theCUBE again. >> Good to see you guys again. >> Okay, so Elastic Cloud Storage, this is going to be the game changer. We're so excited; one of our favorite interviews was your colleague we had on earlier. Unstructured data, object store, is becoming super valuable. And it was once the throwaway: "yeah, store it, deal with it later." Now, with data-driven enterprises, having access to data is the value proposition that they're all driving towards. >> Absolutely. >> Where are you guys with making that happen and bringing that data to life? >> So, when I think about object storage in general, people talk about it as the S3 protocol, or the object protocol versus the file protocol. I think the conversation is not about that. The conversation is about the data of the universe increasing, and increasing tremendously. We're talking about 44 zettabytes of data by 2020. You need an easier way to consume and store that data in a meaningful way, and not just that, but to be able to derive meaningful insights out of it, either when the data is coming in or, when the data is stored, being able to drive value on a periodic basis. So having access to the data at any point of time, anywhere, is the most important aspect of it. And with ECS we've been able to actually attack the market from both sides. Whether it's talking about moving data from higher cost storage arrays or higher performance tiers down to more accessible, cheaper storage that is available geographically, that's one market. And then also you have tons of data that's available on tape drives, but that data is so difficult to access, so not available. And if you want to go put that tape back on an actual active system, the turnaround time is so long. So being able to turn all of that storage into an active storage system that's accessible all the time is the real value proposition that we have to talk about. >> Well now help me understand this, because we have all these different ways to make sense of unstructured data now. We have NoSQL databases, we have JSON, we have HDFS, and we've got object storage. Where does it fit into the hierarchy of making sense of unstructured data? >> The simplest way to think about it is, we talk about a data ocean, with the amount of data that's growing: having the capability to store data in a global content repository. That is accessible-- >> Meaning one massive repository. >> One massive repository. And not necessarily in one data center, right? It's spread across multiple data centers, it's accessible, available with a single, global namespace, regardless of whether you're trying to access data from location A or location B. But having that data be available through a single global namespace is the key value proposition that object storage brings to bear.
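Because ECS speaks the S3 protocol Vikram mentions, a stock S3 client can exercise that single global namespace. A minimal, hedged sketch; the endpoint, credentials, bucket, and key below are placeholders for illustration, not ECS documentation:

```python
import boto3

# Write one object into the global content repository through an
# S3-compatible endpoint; the endpoint and credentials are invented.
s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.internal",   # assumed on-prem endpoint
    aws_access_key_id="OBJECT_USER",
    aws_secret_access_key="OBJECT_SECRET",
)
s3.put_object(Bucket="content-repo",
              Key="archive/2017/05/10/report.json",
              Body=b'{"status": "archived"}')

# The same key is then readable from any site sharing the namespace.
obj = s3.get_object(Bucket="content-repo", Key="archive/2017/05/10/report.json")
print(obj["Body"].read())
```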
The other part is the economics, which we're able to provide consistently better than what the public clouds offer. You're talking about anywhere between 30 to 48% cheaper TCO than what public clouds are able to offer, in your own data center, with all the constraints that you want to apply to it, whether it's regulated environments or country-specific clouds and such; that's where it fits well together. But exposing that same data out, whether through HDFS or file, is where ECS differentiates itself from other cloud platforms. Yes, you can go to a Hadoop cluster and do separate data processing, but then you're creating more copies of the same data that you have in your primary storage. So things like that essentially help position object as the global content repository where you can just dump data and forget about the storage needs. >> Vikram, I want to ask you about Elastic Cloud Storage; as you mentioned, ECS, it's been around for a couple of years. You just announced the ECS Dedicated Cloud. Can you tell me what that is and more about it? Because some people think of elastic, they think Amazon: "I'll just throw it in object storage in the cloud." What are you guys doing specifically, 'cause you have this hybrid offering. >> Absolutely. >> What is this about, can you explain that? >> Yeah, so if you look at it, there are two extremes, or two paradigms, that people are attracted by. On one side you have public clouds, which give you ease of use; you just swipe your credit card and you're in business. You don't have to worry about the infrastructure, you don't have to worry about, like, "Where is my data going to be stored?" It's just there. And then on the other side you have regulated environments, or you just have environments where you cannot move to public clouds, so customers end up putting in ECS, or other object storage for that matter, though ECS is the best. >> John: Biased, but that's okay. >> Yeah, now we are starting to see customers saying, "Can I have the best of both worlds? "Can I have a situation where I like the ease of use "of the public cloud but I don't want to "be in a shared bathtub environment. "I don't want to be in a public cloud environment. "I like the privacy that you are able to provide me "with this ECS in my own data center, "but I don't want to take on the infrastructure management." So for those customers we have launched the ECS Dedicated Cloud service. And this is specifically targeted at scenarios where customers have maybe one data center, two data centers, but they want to use the full strength and capabilities of ECS. So what we're telling them is, we will actually put the ECS they bought in our data centers; the ECS team will operate and manage that environment for the customer, but they're the only dedicated customer on that cloud. So that means they have their own environment-- >> It's completely secure for their data. >> Vikram: Exactly. >> No multi-tenant issues at all. >> No, and you can have either partial capabilities in our data center, or you can fully host in our data center. So you can do various permutations and combinations, thus giving customers a lot of flexibility to start at one point and move to another. It lets them start with a private cloud; if they want to move to a hybrid version, they can; or if they start from hybrid and want to go back to their own data centers, they can do that as well. >> Let's change gears and talk about IoT.
You guys had launched Project Nautilus; we also heard that from your boss earlier, two days ago. What is that about? Explain, specifically, what is Project Nautilus? >> So as I was mentioning earlier, there is a whole universe of data that is now being generated by these IoT devices. Whether you're talking about connected cars, or wind sensors, you're talking about anything that collects a piece of data that needs to be not only stored, but that people want to do realtime analysis on. And today people end up using a combination of 10 different things. They're using Kafka, Spark, HDFS, Cassandra, DAS storage to build together a makeshift solution that sort of works but doesn't really. Or, if you're in the public cloud, you'll end up using some implementation of Lambda Architecture. But the challenge there is you're storing the same data in a few different places, and not only that, there is no consistent way of managing and processing that data effectively. So Project Nautilus is our attempt to essentially streamline all of that. It allows streams of data coming from these IoT devices to be processed in realtime, or in batch, in the same solution. And then once you've done that processing, you essentially push that data down to a tier, whether it's Isilon or ECS, depending on the use case that you are trying to address. So it simplifies the whole story on realtime analytics. And we don't want to do it in a closed-source way. What we've done is created this new paradigm, or new primitive, called streaming storage, and we are open sourcing it as Project Pravega, which is in the Apache Foundation. We want the whole community: just like there is a common sense of awareness for object and file, we want that same thing for streaming storage-- >> So you guys are active in open source. Explain quickly, many might not know that. Talk about that. >> So, yeah, as I mentioned, Project Pravega is something we announced at the Flink Forward conference. It's a streaming storage layer which is completely open source in the Apache Foundation, and we just open sourced it today. We're giving customers the capability to contribute code to it, take their own version, or do whatever they want to do, like build additional innovation on top. And the goal is to make streaming storage a common paradigm, just like everything else. And in addition, we're partnering with another open source player: there is a company called data Artisans, based out of Berlin, Germany, and they have a project called Flink, and we're working with them pretty closely to bring Nautilus to fruition. >> theCUBE was there by the way, we covered Flink Forward again, one of the-- >> Paul: True streaming engine. >> Very good, very big open source project. >> Yeah, we were talking with Jeff Woodrow earlier about software-defined storage, self-driving storage as he calls it. >> Where does ECS fit in the self-driving storage? Is this an important part of what you're doing right now, or is it a different use? >> Yeah, our vision right from the beginning, when we built this next generation object storage system, was that it has to be software first. Not only software first, where a customer can choose the commodity hardware to bring to bear or we can supply the commodity hardware, but over time building intelligence into that layer of software, so that you can tier data off smartly, from SSDs to more SATA-based drives.
Or you can bring in smarts around metadata search capabilities, which we've introduced recently. Because you now have billions and billions of records being stored on ECS, you want ease of search for what specifically you're looking for, so we introduced the metadata search capability. So we're making the storage system and all of the data services that were usually outside of the platform be part of the core platform itself. >> Are you working with Elasticsearch? >> Yes, we are using Elasticsearch, more to enable customers who want to get insights about ECS itself. And Nautilus, of course, is also going to integrate with Elasticsearch as well. >> Vikram, let's wrap this up. Thank you for coming on theCUBE. Bottom line, what's the bottom line message? Quickly, summarize the value proposition: why customers should be using ECS, what's the big aha moment, what's the proposition? >> I would say the value proposition is very simple. Sometimes people talk about lots of complex terms; it's very simple. Sustainable, low-cost storage for storing a wide variety of content in a global content repository is the key value proposition. >> And it's for application developers to tap into? The whole DevOps, data as code, infrastructure as code movement. >> Yeah; what we have seen in the majority of the use cases is customers start with the one use case of archiving. And then they very quickly realize it's like a Swiss Army knife. You start with archiving, then you move on to application development, more modern applications, or cloud-native application development. And now with IoT and Nautilus, being able to leverage data from these IoT devices onto these-- >> As I said two days ago, I think this is a huge, important area for agile developers. Having access to data in less than a hundred milliseconds, from any place in the world, is going to be table stakes. >> ECS, or in general, object storage, has to be part of every important conversation that is happening about digital IT transformation. >> It sounds like eventually most of the data's going to end up there. >> Absolutely. >> Okay, so I'll put ya on the spot. When are we going to see data in less than a hundred milliseconds from any database, anywhere in the fabric of a company, for a developer to call a data ocean and get data back from any database, from any transaction, in less than a hundred milliseconds? Can we do that today? >> We can do that today, it's available today. The challenge is how quickly enterprises are adopting the technology. >> John: So they've got to architect it? >> Yeah. >> They have to architect it. >> Paul: If it's all on Isilon. >> They can pull it, they can cloud-pull it down from Isilon to ECS. >> True. >> Yeah. >> Speed, low latency, is the key to success. Congratulations. >> Thank you so much. >> And I love this new object store, love this tier two value proposition. It's so much more compelling for developers, certainly in cloud native. >> Vikram: Absolutely. >> Vikram, here on theCUBE, bringing you more action from Las Vegas. We'll be right back as day three coverage continues here at Dell EMC World 2017. I'm John Furrier with Paul Gillin; we'll be right back.
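Before moving on, a self-contained toy of the streaming-storage primitive Vikram described: one append-only stream serving both a realtime tail reader and a batch replay over the same data. This illustrates the idea only; it is not the Pravega API:

```python
# Conceptual sketch of "streaming storage": a single append-only log
# that supports batch replay from any offset and realtime tail reads.
# Invented for illustration; not Pravega's actual interface.

class Stream:
    def __init__(self):
        self._log = []                  # ordered, append-only event log

    def append(self, event):
        self._log.append(event)

    def read_from(self, offset):        # batch: replay history from any point
        return self._log[offset:]

    def tail(self, position):           # realtime: only events after position
        return self._log[position:], len(self._log)

s = Stream()
for reading in (3.1, 3.4, 9.8):         # e.g. sensor events from IoT devices
    s.append(reading)

print(s.read_from(0))                   # batch pass over the full history
events, pos = s.tail(2)                 # tail reader picks up the newest event
print(events)
```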
Carlo Vaiti | DataWorks Summit Europe 2017
>> Announcer: You are CUBE Alumni. Live from Munich, Germany, it's theCUBE. Covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Hello, everyone, welcome back to live coverage at DataWorks 2017, I'm John Furrier with my cohost, Dave Vellante. Two days of coverage here in Munich, Germany, covering Hortonworks and Yahoo presenting Hadoop Summit, now called DataWorks 2017. Our next guest is Carlo Vaiti, who's the HPE chief technology strategist, EMEA Digital Solutions, Europe, Middle East, and Africa. Welcome to theCUBE. >> Thank you, John. >> So we were just chatting before we came on about your historic background at IBM, Oracle, and now HPE, and now back into the saddle there. >> Don't forget Sun Microsystems. >> Sun Microsystems, sorry, Sun, yeah. I mean, great, great run. >> It was a long run. >> You've seen the computer revolution happen. I worked at HP for nine years, from '88 to '97. Again, Dave was a premier analyst during that run of client-server. We've seen the computer revolution happen. Now we're seeing the digital revolution, where the iPhone is now 10 years old, Cloud is booming, data's at the center of the value proposition; a completely new disruptive capability. >> Carlo: Sure, yes. >> So what are you doing as the CTO, chief technologist for HPE? How are you guys bringing this story together? 'Cause there's so much going on at HPE. You got the services split, you got the software split, and HP's focusing on the new style of IT, as Meg Whitman calls it. >> So, yeah. My role in EMEA is basically a visionary strategy role for what HP is going to be in the future, in terms of IT. And one of the things we are looking at specifically: we split our strategy into three different aspects, three transformation areas. The first one, which we usually talk about, is what I call hybrid IT, which is basically making services around either on-premise or on cloud for our customer base. The second one is actually powering the Intelligent Edge, so it's looking after our collaboration and the Aruba components we acquired. And the third one, which is in the middle, and that's why I'm here at the DataWorks Summit, is actually the data-analytics aspects. And we have a couple of solutions in there. One is the Enterprise-grade Hadoop, which is part of this. This is actually how we frame the whole picture and the strategy for HP. >> It's interesting, Dave and I were talking yesterday: being in Europe, it's obviously a different show, it's smaller than the DataWorks or Hadoop Summit in North America in San Jose, but there's a ton of Internet of things, IoT or IIoT, 'cause here in Germany, obviously, an industrial nation, but in Europe in general, a lot of smart cities initiatives, a lot of mobility, a ton of Internet of things opportunity, more than in the US. >> Absolutely. >> Can you comment on how you guys are tackling the IoT? Because it's an Intelligent Edge, certainly, but it's also data; it's in your wheelhouse. >> Yes, sure. It's a good question, because I'm actually working on a couple of projects in Eastern Europe where it's all about Industrial IoT Analytics, IIoTA. That's the new terminology we use. So what we do is, we analyze from a business perspective what the business pain points are, in an oil and gas company for example. And we understand, for example, what kind of things they need and must have.
And what I'm saying here is, one of the aspects, for example, is the drilling opportunity: how much oil you can extract from a specific rig in the middle of the North Sea, for example. This is one of the key questions, because the customer wants to understand, in the future, how much oil they can extract. The other one is, for example, the upstream business, on the retail side: say, when my customer stops at a gas station and goes into the shop, immediately giving them, I don't know, a kind of campaign for a Barbie for my daughter, because she likes Barbie. So IoT, Industrial IoT, helps us make a much better customer experience, and that's the case of the upstream business, but it's also helping us get to much faster business outcomes. And that's what the customer wants, right? 'Cause, as I was saying with your colleague before, I'm talking to the business guy. I'm not talking to the IT anymore in these kinds of places, and that's how IoT gives us a chance to change the conversation at the industry level. >> These are first-time conversations too. You're getting at the kinds of business conversations that weren't possible five years ago. >> Carlo: Yes, sure. >> I mean, and 10 years ago they would have seemed fantasy. Now they're reality. >> The role of analytics, in my opinion, is becoming extremely key, and I said this morning, for me the best sentence is that data is the cornerstone of the digital economy. I continue to repeat this terminology, because it's actually where everything starts from. So what I mean is, let's take a look at the analytics aspect. If I'm able to analyze the data close to the shop floor, okay, close to the shop manufacturing floor; if I'm able to analyze my data on the rig, in the oil and gas industry; if I'm able to do preprocessing analytics with Kafka, Druid, these kinds of open-source software, close to the Intelligent Edge, then my customer is going to be happy, because I give them a very fast response, and the decision-maker can get to a decision in a faster time. Today, it takes a long time to make these types of decisions. So that's why we want to power the Intelligent Edge. >> So you're saying data's foundational, but if you get to the Intelligent Edge, it's dynamic. So you have dynamic, reactive, realtime time series or presence data, but you need the foundational data. >> Perfect. >> Is that kind of what you're getting at? >> Yes, that's the first step. Preprocessing analytics is what we do. In the next generation, which we think is going to be Industrial IoT Analytics, we're going to actually put massive amounts of compute close to the shop manufacturing floor. We call it, internally and actually externally, Converged Plant Infrastructure. And that's the key point, right? >> John: Converged plant? >> Converged Plant Infrastructure, CPI. If you look it up in Google, you will find it. It's a solution we brought to the market a few months ago. We announced it in December last year. >> Yeah, Antonio's smart. He also had converged systems as well. One of the first ones. >> Yeah, so that's converged compute at the edge, basically. >> Correct, converged compute-- >> Very powerful. >> Very powerful, and we run analytics on the edge. That's the key point. >> Which we love, because that means you don't have to send everything back to the Cloud, because it's too expensive, it's going to take too long, it's not going to work. >> Carlo: The bandwidth on the network is much less.
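A hedged sketch of that edge-preprocessing pattern using Kafka, one of the tools Carlo names: aggregate raw readings locally and forward only compact summaries, so far less data has to cross the constrained network back to the data center. The topic names, broker address, and window size are illustrative assumptions:

```python
import json
from kafka import KafkaConsumer, KafkaProducer

# Illustrative edge-preprocessing loop; broker address, topics, and the
# 100-reading window are invented for this sketch.
consumer = KafkaConsumer(
    "rig.sensors.raw",
    bootstrap_servers=["edge-broker:9092"],
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers=["edge-broker:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

window = []
for msg in consumer:                       # runs continuously at the edge
    window.append(msg.value["pressure"])
    if len(window) == 100:                 # summarize every 100 readings
        summary = {
            "min": min(window),
            "max": max(window),
            "avg": sum(window) / len(window),
        }
        producer.send("rig.sensors.summary", summary)  # forward summary only
        window.clear()
```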
>> There's no way that's going to be successful unless you go to the edge and-- >> It takes time. >> With a cost. >> Now the other thing is, of course, you've got the Aruba asset to be able to, I always joke, connect the windmill. But, Carlo, can we go back to the IoTA example? >> Carlo: Correct, yeah. >> I want to help our audience understand, sort of, the new HP, post these spin merges. So previously you would say, okay, we have Vertica. You still have a partnership, or you still own Vertica, but after September 1st-- >> Absolutely, absolutely. It's part of the columnar side-- >> Right, yes, absolutely, but, so. But the new strategy is to be more of a platform for a variety of technology. So how, for instance, would you solve, or did you solve, that problem that you described? What did you actually deliver? >> So again, as I said, especially in the Industrial IoT, we are an ecosystem, okay? We're one element of the ecosystem solution. For oil and gas specifically, we're working with other system integrators. We're working with oil and gas industry expertise, like DXC, right, the company that we just split off a few days ago, and we're working with them. They're providing the industry expertise. We are the infrastructure provider around that, and the services around that for the infrastructure element. But for the industry expertise, we try to have a little bit of knowledge, to start the conversation with the customer. But again, my role in the strategy is actually to be an ecosystem digital integrator. That's the new terminology we like to bring to the market, because we really believe that's what the HP role is going to be. And the relevance of HP totally depends on whether we are going to be successful in these types of things. >> Okay, now a couple other things you talked about in your keynote. I'm just going to list them, and then we can go wherever we want. There was Data Lake 3.0, storage disaggregation, which is kind of interesting 'cause it's been a problem, Hadoop as a service, realtime everywhere, and then analytics at the edge, which we kind of just talked about. Let's pick one. Let's start with Data Lake 3.0. What is that? John doesn't like the term data lake. He likes data ocean. >> I like data ocean. >> Is Data Lake 3.0 becoming an ocean? >> It's becoming an ocean. So, Data Lake 3.0 for us is actually following what is going to be the future of HDFS 3.0. So we have three elements. The erasure coding feature, which is coming in HDFS. The second element is around having an HDFS multi-data tier. So we're going to have faster SSD drives, we're going to have big memory nodes, we're going to have GPU nodes. And the reason why I say disaggregation is because some of the workload will be only compute, and some of the workload will be only storage, okay? So we're going to bring, and the customer requires this because it's getting more data, the ability to have, for example, YARN applications running on compute nodes and, at the same level, storage components, sorry, storage components running on the storage side, like HBase for example, like HDFS 3.0 with the multi-tier option. So that's why the data disaggregation, or disaggregation between compute and storage, is the key point. We call this asymmetric, right? Hadoop is becoming asymmetric. That's what it means.
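The storage arithmetic behind that erasure-coding element, sketched in a few lines. RS(6,3), six data blocks plus three parity, is the stock HDFS 3 policy; the 10 PB dataset size is an assumed figure for illustration:

```python
# Raw data vs. on-disk footprint: 3x replication against Reed-Solomon
# RS(6,3), the default HDFS 3 erasure-coding policy (6 data + 3 parity).
# The 10 PB dataset size is an assumption for illustration.

raw_pb = 10
rs_data, rs_parity = 6, 3

replicated_pb = raw_pb * 3                             # 30 PB on disk
erasure_pb = raw_pb * (rs_data + rs_parity) / rs_data  # 15 PB on disk

print(f"3x replication: {replicated_pb} PB on disk (3.0x)")
print(f"RS(6,3) coding: {erasure_pb} PB on disk ({erasure_pb / raw_pb:.1f}x)")
```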
>> And the problem you're solving there is, when I add a node to a cluster, I don't have to add compute and storage together, I can disaggregate and choose whatever I need, >> Every one that we did. >> based on the workload. >> They are all multitenancy kinds of workloads, and they are independent and they scale out. Of course, it's much more complex, but we have actually proved that this is the way to go, because that's what the customer is demanding. >> So, 3.0 is actually functional. It's erasure coding, you said. There's a data tier. You've got different memory levels. >> And I forgot to mention the containerization of the application. Having dockerized applications, for example, using Mesosphere, for example, right? So having the containerization of the application is what all of that means, because what we do in Hadoop is actually build the different clusters, and they need to talk to each other and exchange data in a faster way. And a solution like, a product like SQL Manager from Hortonworks is actually helping us get this connection between the clusters faster and faster. And that's what the customer wants. >> And then Hadoop as a service, is that an on-premise solution, is that a hybrid solution, is it a Cloud solution, all three? >> I can offer all of them. Hadoop as a service could be run on-premise, could be run on a public cloud, could be run on Azure, or could be a mix of them, partially on-premise and partially on public. >> And what are you seeing with regard to customer adoption of Cloud, and specifically around Hadoop and big data? >> I think the way I see that adoption is, all the customers want to start very small. The maturity is actually better from a technology standpoint. If you had asked me the same question maybe a year ago, I would say it's difficult. Now I think they've got the point. Every large customer wants to build this big data, not data lake, ocean, whatever you want to call it. >> John: Love that. (laughs) >> All right. They want to build this data ocean, and the point I want to make is, they want to start small, but they want to think very high. Very big, right, from their perspective. And the way they approach us is, we have a kind of methodology. We establish the maturity assessment. We do a kind of capability maturity assessment, where we find out if the customer is actually a pioneer, or actually a very traditional one, so very slow-going. Once we determine what stage the customer is at, we propose some specific proof of concept. And in three months, usually, we're putting this in place. >> You also talked about realtime everywhere. We in our research talk about, historically, you had batch, then interactive, and now you have what we call continuous, or realtime streaming, workloads. How prevalent is that? Where do you see it going in the future?
>> I have a question, 'cause you, you've lived in the United States, you're obviously global, and spent a lot of time in Europe as well, and a lot of times, people want to discuss the differences between, let's make it specific here, the European continent and North America, and from a sophistication standpoint, same, we can agree on that, but there are still differences. Maybe, more greater privacy concerns. The whole thing with the Cloud and the NSA in the United States, created some concerns. What do you see as the differences today between North America and Europe? >> From my perspective, I think we are much more for example take IoT, Industrial IoT. I think in Europe we are much more advanced. I think in the manufacturing and the automotive space, the connected car kind of things, autonomous driving, this is something that we know already how to manage, how to do it. I mean, Tesla in the US is a good example that what I'm saying is not true, but if I look at for example, large German manufacturing car, they always implemented these type of things already today. >> Dave: For years, yeah. >> That's the difference, right? I think the second step is about the faster analytic approach. So what I mentioned before. The Power the Intelligent Edge, in my opinion at the moment, is much more advanced in the US compared to Europe. But I think Europe is starting to run back, and going on the same route. Because we believe that putting compute capacity on the edge is what actually the customer wants. But that's the two big differences I see. >> The other two big external factors that we like to look at, are Brexit and Trump. So (laughs) how 'about Brexit? Now that it's starting to sort of actually become, begin the process, how should we think about it? Is it overblown? It is critical? What's your take? >> Well, I think it's too early to say. UK just split a few days ago, right, officially. It's going to take another 18 months before it's going to be completed. From a commercial standpoint, we don't see any difference so far. We're actually working the same way. For me it's too early to say if there's going to be any implication on that. >> And we don't know about Trump. We don't have to talk about it, but the, but I saw some data recently that's, European sentiment, business sentiment is trending stronger than the US, which is different than it's been for the last many years. What do you see in terms of just sentiment, business conditions in Europe? Do you see a pick up? >> It's getting better, it is getting better. I mean, if I look at the major countries, the P&L is going positive, 1.5%. So I think from that perspective, we are getting better. Of course we are still suffering from the Chinese, and Japanese market sometimes. Especially in some of the big large deals. The inclusion of the Japanese market, I feel it, and the Chinese market, I feel that. But I think the economy is going to be okay, so it's going to be good. >> Carlo, I want to thank you for coming on and sharing your insight, final question for you. You're new to HPE, okay. We have a lot of history, obviously I was, spent a long part of my career there, early in my career. Dave and I have covered the transformation of HP for many, many years, with theCUBE certainly. What attracted you to HP and what would you say is going on at HP from your standpoint, that people should know about? >> So I think the number one thing is that for us the word is going to be hybrid. 
It means that some of the services that you can implement, either on-premise or on cloud, could be done very well by the new Pointnext organization. I'm not part of Pointnext; I'm in EG, the Enterprise Group division. But I am a fan of Pointnext, because I believe this is the future of our company; it's on the services side, that's where it's going. >> I would just point out, Dave and I, our commentary on the spin merge has been: create these highly cohesive entities, very focused. Antonio now running EG, big fans; it's actually an efficient business model. >> Carlo: Absolutely. >> And Chris Hsu is running Micro Focus, a CUBE Alumni. >> Carlo: It's a very efficient model, yes. >> Well, congratulations, and thanks for coming on and sharing your insights here in Munich. And certainly it is an IoT world, IIoT. I love the analytics story, foundational services. It's going to be great, open source powering it, and this is theCUBE, opening up our content and sharing that with you. I'm John Furrier, Dave Vellante. Stay with us for more great coverage, here from Munich after the short break.
Steve Roberts, IBM | DataWorks Summit Europe 2017 #DW17 #theCUBE
>> Narrator: Covering DataWorks Summit, Europe 2017, brought to you by Hortonworks. >> Welcome back to Munich everybody. This is theCUBE. We're here live at DataWorks Summit, and we are the live leader in tech coverage. Steve Roberts is here as the offering manager for big data on power systems for IBM. Steve, good to see you again. >> Yeah, good to see you Dave. >> So we're here in Munich, a lot of action, good European flavor. It's my second European one; formerly Hadoop Summit, now DataWorks. What's your take on the show? >> I like it. I like the size of the venue. It's the ability to interact and talk to a lot of the different sponsors and clients and partners, the ability to network with a lot of people from a lot of different parts of the world in a short period of time. So it's been great so far, and I'm looking forward to building upon this towards the next DataWorks Summit in San Jose. >> Terri Virnig, a VP in your organization, was up this morning with a keynote presentation, so IBM got a lot of love in front of a fairly decent-sized audience, talking a lot about the ecosystem that's evolving, and the openness. Talk a little bit about open generally at IBM, but specifically what it means to your organization in the context of big data. >> Well, I am from the power systems team. We have an initiative that we launched a couple of years ago called OpenPOWER. And OpenPOWER is a foundation of participants innovating from the POWER processor through all aspects: accelerators, IO, GPUs, advanced analytics packages, system integration; all to the point of being able to drive OpenPOWER capability into the market and have power servers delivered not just through IBM, but through a whole ecosystem of partners. This complements quite well the Apache Hadoop and Spark philosophy of openness as it relates to the software stack. So our story's really about being able to marry the benefits of an open ecosystem for OpenPOWER, as it relates to the system infrastructure technology, which drives the same time to innovation, community value, and choice for customers in a multi-vendor ecosystem, coupled with the same premise as it relates to Hadoop and Spark. And of course, IBM is making significant contributions to Spark as part of the Apache Spark community, and we're a key active member, as is Hortonworks with the ODPi organization, forwarding the standards around Hadoop. So this is a one-two combo of open Hadoop and open Spark, either from Hortonworks or from IBM, sitting on the OpenPOWER platform built for big data. No other story really exists like that in the market today: open on open. >> So Terri mentioned cognitive systems. Bob Picciano has recently taken over, and obviously has some cognitive chops and some systems chops. Is this a rebranding of power? Is it sort of a layer on top? How should we interpret this? >> No, think of it more as a layer on top. So power will now be one of the assets, one of the member families of the cognitive systems portfolio at IBM. System z can also be used as another great engine for cognitive with certain clients, certain use cases where they want to run cognitive close to the data and they have a lot of data sitting on System z. So power systems is a server really built for big data and machine learning, in particular our S822LC for high performance computing. This is a server which is landing very well in the deep learning, machine learning space.
It offers the Tesla P100 GPU, and with NVIDIA NVLink technology it can offer up to 2.8x CPU-to-GPU bandwidth over what would be available through a PCIe Intel combination today. So this drives immediate value when you need to ensure not just that you're exploiting GPUs, but that you can move your data quickly from the processor to the GPU. >> So I was going to ask you actually, what makes Power so well suited for big data and cognitive applications, particularly relative to Intel alternatives. You touched on that. IBM talks a lot about Moore's Law starting to hit its limits, that innovation is going to come from other places. I love that narrative 'cause it's really combinatorial innovation that's going to lead us in the next 50 years, but can we stay on that thread for a bit? What makes Power so substantially unique, uniquely suited and qualified to run cognitive systems and big data? >> Yeah, it actually starts with even more of the fundamentals of the Power processors. The Power processor has eight threads per core, in contrast to Intel's two threads per core. So this just means more ability to parallelize your workloads, and workloads that come up in the cognitive space, whether you're running complex queries and need to drive SQL over a lot of parallel pipes, or you're running iterative computation over the same data set as when you're doing model training, these can all benefit from highly parallelized execution, which can benefit from this 4x thread advantage. But of course to do this, you also need large, fast memory, and we have six times more cache per core versus Broadwell, so this just means you have a lot of memory close to the processor, driving that throughput that you require. And then on top of that, now we get to the ability to add accelerators, and unique accelerators, such as I mentioned the NVIDIA NVLink scenario for GPU, or using OpenCAPI as an approach to attach FPGA or flash and get processor-memory access speeds, but with an attached acceleration device. And so this is economies of scale in terms of being able to offload specialized compute processing to the right accelerator at the right time, so you can drive way more throughput. The upper bounds of driving workload through individual nodes, and being able to balance your I/O and compute on an individual node, are far superior with the Power Systems server. >> Okay, so multi-threaded, giant memories, and this OpenCAPI gives you primitive-level access, I guess, to a memory extension, instead of having to-- >> Yeah, pluggable accelerators through this high speed memory extension. >> Instead of going through what I often call the horrible storage stack, aka SCSI. And so that's cool, some good technology discussion there. What's the business impact of all that? What are you seeing with clients? >> Well, the business impact is not everyone is going to start with souped-up accelerated workloads, but they're going to get there. So part of the vision that clients need to understand, to begin to get more insights from their data, is that it's hard to predict where your workloads are going to go. So you want to start with a server that provides you some of that upper room for growth. You don't want to keep scaling out horizontally by having to add nodes every time you need to add storage or more compute capacity.
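A quick back-of-the-envelope sketch of the thread math Steve describes, using the threads-per-core figures from his comments; the core and socket counts are illustrative assumptions, not numbers from the interview:

```python
# Hardware-thread comparison per node, using the threads-per-core figures
# quoted above (8 on Power via SMT8, 2 on Intel via hyper-threading).
# Core and socket counts below are assumptions for illustration only.
power_threads_per_core = 8
intel_threads_per_core = 2
cores_per_socket = 10        # assumption
sockets_per_node = 2         # assumption

power_threads = power_threads_per_core * cores_per_socket * sockets_per_node
intel_threads = intel_threads_per_core * cores_per_socket * sockets_per_node

print(f"Power node hardware threads: {power_threads}")                # 160
print(f"Intel node hardware threads: {intel_threads}")                # 40
print(f"Parallel-slot advantage: {power_threads // intel_threads}x")  # 4x
```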
So firstly, it's the flexibility: being able to bring versatile workloads onto a node or a small number of nodes and exploit some of these memory and acceleration advantages without necessarily having to build large scale-out clusters. Ultimately, it's about improving time to insights. So with accelerators and with large memory, running workloads on similarly configured clusters, you're simply going to get your results faster. For example, in a recent benchmark we did with a representative set of TPC-DS queries on Hortonworks running on Linux on Power servers, we were able to drive 70% more queries per hour over a comparable Intel configuration. So this is just getting more work done on what is now similarly priced infrastructure. 'Cause the Power family is a broad family that now includes 1U and 2U scale-out servers, along with our 192-core horsepower for enterprise grade. So we can directly price-compete on a scale-out box, but we offer a lot more flexible choice as clients want to move up in the workload stack or bring accelerators to the table as they start to experiment with machine learning. >> So if I understand that right, I can turn two knobs. I can do the same amount of work for less money, a TCO play. Or, for the same amount of money, I can do more work. >> Absolutely. >> Is that fair? >> Absolutely. Now in some cases, especially in the Hadoop space, the size of your cluster is somewhat gated by how much storage you require. And if you're using the classic local storage model, you're going to have so many nodes no matter what, 'cause you can only put so much storage on a node. So in that case, >> You're scaling storage. >> Your clusters can look the same, but you can put a lot more workload on that cluster, or you can bring in a solution like IBM Spectrum Scale, our Elastic Storage Server, which allows you to essentially pull that storage off the nodes and put it in a storage appliance. At that point you still have high speed access to storage, 'cause of course network bandwidth has increased to the point that the performance benefit of local storage is no longer really a driving factor in a classic Hadoop deployment. You can get that high speed access in a storage appliance mode, with the resiliency, at far less cost, 'cause you don't need 3x replication; you just have about a 30% overhead for the software erasure coding. And now with your compute nodes, you can really choose and scale those nodes just for your workload purposes. So you're not bound by number of nodes equals total storage required divided by storage per node, which is the classic how-big-is-my-cluster calculation. That just doesn't work once you get over 10 nodes, 'cause now you're starting to get to the point where you're wasting something, right? You're either wasting storage capacity or, typically, compute capacity, 'cause you're over-provisioned on one side or the other. >> So you're able to scale compute and storage independently, tune that for the workload, and grow that resource more efficiently? >> You can right-size the compute and storage for your cluster, but also important is the flexibility you gain with that storage tier: that data plane can be used for other non-HDFS workloads.
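The how-big-is-my-cluster arithmetic Steve refers to is easy to make concrete. A minimal sketch, with every input figure an illustrative assumption rather than a number from the interview:

```python
# Classic Hadoop sizing: node count is dictated by storage (3x replication
# on local disk), so compute rides along whether you need it or not.
# Shared-storage sizing: ~30% erasure-coding overhead on an appliance,
# with compute nodes sized independently. All inputs are assumptions.
import math

raw_data_tb = 500          # data to store (assumption)
disk_per_node_tb = 48      # local disk per worker node (assumption)

hdfs_nodes = math.ceil(raw_data_tb * 3.0 / disk_per_node_tb)

appliance_tb = raw_data_tb * 1.3   # ~30% erasure-coding overhead
compute_nodes = 6                  # sized for the workload alone (assumption)

print(f"Classic HDFS: {hdfs_nodes} nodes, node count fixed by storage needs")
print(f"Shared storage: {appliance_tb:.0f} TB appliance + {compute_nodes} compute nodes")
```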
You can still have classic POSIX applications, or you may have new object-based applications, and with a single copy of the data, one virtual file system, which could also be geographically distributed, you're serving both Hadoop and non-Hadoop workloads. So you're saving additional replicas of the data from being required, by being able to onboard them onto a common data layer. >> So that's a return-on-asset play. You got an asset that's more fungible across the application portfolio. You can get more value out of it. You don't have to dedicate it to this one workload and then over-provision for another one when you got extra capacity sitting here. >> It's a TCO play, but it's also a time saver. It's going to get you to insight faster 'cause you don't have to keep moving that data around. The time you spend copying data is time you should be spending getting insights from the data, so having a common data layer removes that delay. >> Okay, 'cause it's HDFS-ready, I don't have to essentially move data from my existing systems into this new stovepipe. >> Yeah, we just present it through the HDFS API as it lands in the file system from the original application. >> So now, all this talk about flexibility, agility, etc., what about cloud? How does cloud fit into this strategy? What are you guys doing with your colleagues and cohorts at Bluemix, aka SoftLayer? You don't use that term anymore, but we do. When we get our bill it says SoftLayer still, but at any rate, you know what I'm talking about. The cloud with IBM, how does it relate to what you guys are doing in Power Systems? >> Well, the born-on-the-cloud philosophy of the IBM software analytics team is still very much the motto. So as you see in the Data Science Experience, which was launched last year, born in the cloud, all our analytics packages, whether it be our BigInsights software or our business intelligence software like Cognos, our future generations are landing first in the cloud. And of course we have our whole arsenal of Watson-based analytics and APIs available through the cloud. So what we're now seeing as well is we're taking those born-in-the-cloud offerings and now also offering a lot of them in an on-premises model. So they can also participate in the hybrid model: Data Science Experience is now coming on premises, and we're showing it at the booth here today. Bluemix has an on-premises version as well, and the same software library, BigInsights, Cognos, SPSS, are all available for on-prem deployment. So Power is still an ideal place for hosting your on-prem data and running your analytics close to the data, and now we can federate that through hybrid access to these elements running in the cloud. So the focus is really on cloud applications being able to leverage the Power- and System z-based data through high speed connectors, and being able to build hybrid configurations where you're running your analytics where they make the most sense, based upon your performance, data security, and compliance requirements. And a lot of companies, of course, are still not comfortable putting all their jewels in the cloud, so typically there's going to be a mix and match. We are expanding the footprint for cloud-based offerings both in terms of Power servers offered through SoftLayer, but also through other cloud providers; Nimbix is a partner we're working with right now who is actually offering our PowerAI package.
PowerAI is a package of open source deep learning frameworks, packaged by IBM, optimized for Power, in an easily deployed package with IBM support available. And that could be deployed on premises on a Power server, but it's also available on a pay-per-drink basis through the Nimbix cloud. >> All right, we covered a lot of ground here. We talked strategy, we talked strategic fit, which I guess is sort of an adjunct to strategy, we talked a little bit about the competition and where you differentiate, some of the deployment models, like cloud, other bits and pieces of your portfolio. Can we talk specifically about the announcements that you have here at this event, just maybe summarize them for us? >> Yeah, no, absolutely. As it relates to IBM, and Hadoop, and Spark, we really have the full stack support, the rich analytics capabilities that I was mentioning: deep insight, prescriptive insights, streaming analytics with IBM Streams, Cognos Business Intelligence. So this set of technologies is available for both IBM's Hadoop stack and Hortonworks' Hadoop stack today. Our BigInsights and IOP offering, the next release, the 4.3 release, is now out for technical preview and will be available for both Linux on Intel and Linux on Power towards the end of this month, so that's one piece of new Hadoop news at the analytics layer. As it relates to Power Systems, as Hortonworks announced this morning, HDP 2.6 is now available for Linux on Power, so we've been partnering closely with Hortonworks to ensure that we have an optimized story for HDP running on Power Systems servers, as the data point I shared earlier with the 70% improved queries per hour. At the storage layer, we have a work in progress with Hortonworks to certify the Spectrum Scale file system, which really now unlocks the ability to offer this converged storage alternative to the classic Hadoop model. Spectrum Scale actually supports and provides advantages in a classic Hadoop model with local storage, or it can provide the flexibility of offering the same sort of multi-application support in a scale-out model for storage. It also has the ability to form part of a storage appliance that we call Elastic Storage Server, which is a combination of Power servers and high-density storage enclosures, SSD, flash, or spinning disk depending on the configuration, and that certification will now have that as an available storage appliance, which could underpin either IBM Open Platform or HDP as a Hadoop data lake. But as I mentioned, not just for Hadoop, really for building a common data plane behind mixed analytics workloads that reduces your TCO through a converged storage footprint but, more importantly, provides you that flexibility of not having to create data copies to support multiple applications. >> Excellent. IBM opening up its portfolio to the open source ecosystem. You guys have always had, well not always, but in the last 20 years, major, major investments in open source. They continue on, and we're seeing it here. Steve, people are filing in. The evening festivities are about to begin. >> Steve: Yeah, yeah, the party will begin shortly. >> Really appreciate you coming on theCUBE, thanks very much. >> Thanks a lot Dave. >> You're welcome. >> Great to talk to you. >> All right, keep it right there everybody. John and I will be back with a wrap up right after this short break, right back.
Chandra Mukhyala, IBM - DataWorks Summit Europe 2017 - #DW17 - #theCUBE
>> Narrator: theCUBE, covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Welcome back to the DataWorks Summit in Munich everybody. This is theCUBE, the leader in live tech coverage. Chandra Mukhyala is here. He's the offering manager for IBM Storage. Chandra, good to see you. It always comes back to storage. >> It does, it's the foundation. >> We're here at a data show, and you got to put the data somewhere. How's the show going? What are you guys doing here? >> The show's going good. We have lots of participation. I didn't expect this big a crowd, but there's a good crowd. Storage, people don't look at it as the most sexy thing, but I still see a lot of people coming and asking "What do you have to do with Hadoop?" kind of questions, which is exactly the kind of question I expect. So, going good, we're able to-- >> It's interesting, in the early days of Hadoop and big data, I remember we interviewed, John and I interviewed Jeff Hammerbacher, founder of Cloudera, and he was at Facebook and he said, "My whole goal at Facebook when we were working with Hadoop was to eliminate the storage container, the expensive storage container." They succeeded, but now you see guys like you coming in and saying, "Hey, we have better storage." Why does the world need anything different than HDFS? >> This has been happening for the last two decades, right? In storage, every few years a startup comes along, addresses one problem very well, and creates a whole storage solution around that. Everybody understands the benefit of it, and it becomes part of mainstream storage. I say that because these new point solutions address one problem, but what about all the rest of the features storage has been developing for decades? Same thing happened with other solutions, for example deduplication. Very popular at one point, dedupe appliances. Nowadays every storage solution has dedupe in it. I think it's the same thing with HDFS, right? HDFS is purpose-built for Hadoop. It solves that problem in terms of giving local-access storage, scalable storage, parallel storage. But it's missing many things, you know. One of the biggest problems with HDFS is it's siloed storage, meaning the data in HDFS is only available to Hadoop. What about the rest of the applications in the organization, which may need it through traditional protocols like NFS or SMB, or may need it through new applications with S3 or Swift interfaces? So, you don't want that siloed storage. That's one of the biggest problems we have. >> So, you're putting forth a vision of some kind of horizontal infrastructure that can be leveraged across your application portfolio... >> Chandra: Yes. >> How common is that? And what's the value of that? >> It's not really common; that's one of the stories, the messages, we're trying to get out. And I've been talking to data scientists over the last year, a lot of them. One of the first things they do when they are implementing a Hadoop project is they have to copy a lot of data into HDFS, because the analytics can only run against HDFS; they can't run in place on the existing datasets. That copy process takes days. >> Dave: That's a big move, yeah. >> It's not only wasting a data scientist's time, it also makes the data stale. I tell them you don't have to do that if your data is on something like IBM Spectrum Scale. You can run Hadoop straight off that; why do you even have to copy it into HDFS?
You can use the same existing applications, MapReduce applications, with zero change, and point them at Spectrum Scale; they can still use the HDFS API. You don't have to copy the data. And every data scientist I talk to is like, "Really? I don't know how to do this, I'm wasting time?" Yes. So it's not very well known. Most people think there's only one way to run Hadoop applications, on HDFS. You don't have to. And the advantages there are, one, you don't have to copy; you can share the data with the rest of the applications, and there's no more stale data. But there's also one other big difference, between HDFS-type storage and shared storage. In the share-nothing model, which is what HDFS is, the way you scale is by adding new nodes, which adds both compute and storage. What about applications which don't necessarily need more compute, and all they need is more throughput? You're wasting compute resources, right? So there are certain applications where shared storage is a better architecture. Now, the solution which IBM has will allow you to deploy it either way, share-nothing or shared storage, but that's one of the main reasons people, data scientists especially, want to look at these alternative solutions for storage. >> So when I go back to my Hammerbacher example, it worked for the Facebook of the early days because they didn't have a bunch of legacy data hanging around; they could start with, pretty much, a blank piece of paper. >> Yes. >> Re-architect, plus they had such scale, they probably said, "Okay, we don't want to go to EMC and NetApp or IBM, or whomever, and buy storage, we want to use commodity components." Not every enterprise can do that, is what you're saying. >> Yes, exactly. It's probably okay for somebody like a very large search engine, when all they're doing is analytics, nothing else. But if you go to any large commercial enterprise, the whole point around analytics is they want to pool all of the data and look at that. So, find the correlations, right? It's not about analyzing one small dataset from one business function. It's about pooling everything together and seeing what insights can I get out of it. So that's one of the reasons it's very important to have support to access the data from your legacy enterprise applications, too, right? Yeah, so NFS and SMB are pretty important, so are S3 and Swift, but also for these analytics applications, one of the advantages of the IBM solution here is we provide local access to the file system. Not just through NAS protocols, we do that, but we also have POSIX access to give local access to the data in the file system. With HDFS, you have to first copy the file into HDFS, and bring it back to do anything with it. All those copy operations go away. And this is important, again, in the enterprise, not just for data sharing but also to get local access. >> You're saying your system is Hadoop-ready. >> Chandra: It is. >> Okay. And then, the other thing you hear a lot from IT practitioners anyway, not so much from the lines of business, is that when people spin up these Hadoop projects, big data projects, they go outside of the edicts of the organization in terms of governance and compliance, and often, security. How do you solve, do you solve that problem? >> Yeah, that's one of the reasons to consider, again, enterprise storage, right?
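A minimal sketch of the "no ingest copy" idea Chandra describes: one file on a POSIX file system that also exposes an HDFS-compatible API (as Spectrum Scale's Hadoop connector does) can be read in place by a plain POSIX application and by a Hadoop job. The mount point and URI below are illustrative assumptions; a temp directory stands in for the real file system so the sketch runs anywhere:

```python
# One copy of the data, two access paths: POSIX in place, HDFS API for
# Hadoop jobs. All paths and the URI are hypothetical.
import os, tempfile

mount = tempfile.mkdtemp()                 # stand-in for a /gpfs-style mount
path = os.path.join(mount, "sensors.csv")
with open(path, "w") as f:
    f.write("ts,temp_c\n2017-04-05,21.4\n")

# Legacy POSIX application: read the data in place, no ingest step.
with open(path) as f:
    print("POSIX read:", f.readline().strip())

# Hadoop application: the same file addressed through the HDFS API,
# e.g. spark.read.csv("hdfs://namenode:8020/analytics/sensors.csv") in a
# Spark job; no copy into a separate HDFS silo is needed.
```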
It's not just that you're able to share the data with the rest of the applications, but also a whole bunch of data management features, including data governance features. You can talk about encryption there, you can talk about auditing there, you can talk about features like WORM, right, write once read many, so data, especially archival data, once you write it you can't modify it. There are a whole bunch of features around data retention and data governance; those are all part of the data management stack we have. You get that for free. You not only get universal access, unified access, but you also get data governance. >> So is this one of the situations where, on the face of it, when you look at the CapEx, you say, "Oh, wow, I can use commodity components, save a bunch of money." You know, you remember the client server days. "Oh, wow, cheap, cheap, cheap, microprocessor-based solution," and then all of a sudden, people realize we have to manage this. Have we seen a similar sort of trend with Hadoop, where the complexity of managing all of this infrastructure is so high that it actually drives costs up? >> Actually there are two parts to it, right? There is actually value in utilizing commodity hardware, industry standards. That does reduce your costs, right? If you can just buy a standard x86 server, a storage server, and utilize that, why not? That's kind of a given. But the real value in any kind of storage or data management solution is in the software stack. Now, you can reduce CapEx by using industry standards. It's a good thing to do and we should, and we support that, but in the end the data management is there in the software stack. What I'm saying is HDFS is solving one problem while dismissing the whole set of data management problems we just touched on. And that all comes in software, which runs on industry-standard servers. >> Well, and you know, it's funny, I've been saying for years that if you peel back the onion on any storage device, the vast majority anyway, they're all based on standard components. It's the software that you're paying for. So it's sort of artificial in that a company like IBM will say, "Okay, we've got all this value in here, but it's on top of commodity components, we're going to charge for the value." >> Right. >> And so if you strip that out, sure, you do it yourself. >> Yeah, exactly. And it's all standard servers. It's been like that always. Now, one difference is ten years ago people used proprietary array controllers. Now all of the functionality is coming into software-- >> ASICs, >> Erasure coding. >> Yeah, 3PAR still has an ASIC, but most don't. >> Right, that's funny. Almost everybody has some kind of software-based erasure coding and they're able to utilize standard servers. Now, there is still an advantage in an appliance, because yes, it can run on industry standard hardware, but this is storage, and that's the foundation of all of your infrastructure. And you want RAS, you want reliability and availability. The only way to get that is a fully integrated, tight solution, where you're doing a lot of testing on the software and the hardware. Yes, it's supposed to work, but what really happens when it fails, how does the subsystem react? And that's where I think there is still value in integrated systems. If you're a large customer and you have a lot of storage-savvy administrators who know how to build solutions and validate them, yes, software-based storage is the right answer for you.
And you're the offering manager for Spectrum Scale, which is the file offering, right, that's right? >> Yes, right, yes. >> And it includes object as well, or-- >> Spectrum Scale is a file and object storage platform. It supports file protocols, and it also supports object protocols. The thing about object storage is it means different things to different people. To some people, it's the object interface. >> Yeah, to me it means get put. >> Yeah, if that's what the definition is, then it is object storage. But the fact is that everybody supports S3 now. But to some of the people, it's not about the protocol, because they're going to still access it through file protocols; to them, it's about the object store, which means it's a flat namespace and there's no hierarchical name structure, and you can get into billions of files without having any scalability issues. That's an object store. But to some other people it's neither of those, it's about erasure coding, which object storage brings, so it's cheap storage. It allows you to run on standard servers, and you get cheap storage. So it's three different things. So if you're talking about protocols, yes, but by the scalability definition it's object storage also. >> So in thinking about, well, let's start with Spectrum Scale generally. But specifically, your angle in big data and Hadoop, and we talked about that a little bit, but what are you guys doing here, what are you showing, what's your partnership with Hortonworks? Maybe talk about that a little bit. >> So we've been supporting what we call a Hadoop connector on Spectrum Scale for almost a year now, which allows our existing Spectrum Scale customers to run Hadoop straight on it. But if you look at the Hadoop distributions, there are two or three major ones, right? Cloudera, Hortonworks, maybe MapR. One of the first questions we get is, we tell our customers you can run Hadoop on this: "Oh, is this supported by my distribution?" So that has been a problem. So what we announced is we formed a partnership with Hortonworks, so now Hortonworks is certifying IBM Spectrum Scale. It's not new code changes, it's not new features, but it's a validation and a stamp from Hortonworks; that's in the process. The result of that is a Hortonworks certified reference architecture, which is what we announced, about a month ago. We should be publishing that soon. Now customers can have more confidence in the joint solutions. It's not just IBM saying that it's Hadoop-ready, but it's Hortonworks backing that up. >> Okay, and your scope, correct me if I'm wrong, is sort of on-prem and hybrid, >> Chandra: Yes. >> Not cloud services. That's kind of, you might sell your technology internally, but-- >> Correct, so IBM Storage is primarily focused on on-prem storage. We do have a separate cloud division, but almost every IBM storage product, and Spectrum Scale especially is what I can speak of, we treat as hybrid cloud storage. What we mean by that is we have built-in capabilities, a feature most of our products have called transparent cloud tiering, which allows you to set a policy on when data should be automatically tiered to the cloud. Everybody wants public, everybody wants on-prem. Obviously there are pros and cons of on-premises storage versus off-premises storage, but basically it boils down to: if you want performance and security, you want to be on premises.
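A minimal sketch of the "get/put" object interface and the flat namespace Chandra distinguishes here; a toy in-memory store stands in for a real S3 or Swift endpoint, and all names are illustrative:

```python
# Object storage in miniature: opaque keys in a flat namespace, accessed
# only by put/get. Keys like "raw/2017/04/events.json" look like paths,
# but there is no directory hierarchy behind them.
class ToyObjectStore:
    def __init__(self):
        self._objects = {}                    # flat namespace: key -> bytes

    def put(self, bucket: str, key: str, body: bytes) -> None:
        self._objects[(bucket, key)] = body

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]

store = ToyObjectStore()
store.put("analytics", "raw/2017/04/events.json", b'{"id": 1}')
print(store.get("analytics", "raw/2017/04/events.json"))
# "Listing raw/2017/" would be a key-prefix scan, not a directory walk,
# which is why object stores scale to billions of objects.
```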
But there's always some data which is better off in the cloud, and we try to automate that with our transparent cloud tiering feature. You set a policy based on age, based on the type of data, based on the ownership. The system will automatically tier the data to the cloud, and when a user accesses that data, it comes back automatically, too. It's all transparent to the end user. So yes, we're an on-premises storage business, but our solutions are hybrid cloud storage. >> So, as somebody who knows the file business pretty well, let's talk about the file business and where it's headed. There are some megatrends and dislocations. There's obviously software defined. You guys made a big investment in software defined a year and a half, two years ago. There's cloud; Amazon with S3 sort of shook up the world. I mean, at first it was sort of small, but now it's really catching on. Object obviously fits in there. What do you see as the future of file? >> That's a great question. When it comes to data layout, there's really block, file, or object. Software defined and cloud are various ways of consuming storage. If you're a large shop, probably you would prefer a software-based solution so you can run it on your existing servers; otherwise you may prefer an appliance. Depending on the organization's preferences, and how concerned they are about security and performance needs, they will prefer to run some of the applications in the cloud. These are different ways of consuming storage. But coming back to file and object, right? So object is perfect if you are not going to modify the data. You're done writing that data, and you're not going to change it. It just belongs in an object store, right? It's more scalable storage, and I say scalable because file systems are hierarchical in nature. Because it's a file system tree, you have to traverse the various subdirectory trees. Beyond a few million entries, it slows you down. But file systems have a strength. When you want to modify the file, any application which is going to edit the file, which is going to modify the file, that application belongs on file storage, not on object. But let's say you are dealing with medical images. You're not going to modify an x-ray once it's done. That's better suited to object storage. So file storage will always have a place. Take video editing and all these videos they are doing, you know, we do a lot of video editing. That belongs on file storage, not on object. If you care about file modifications and file performance, file is your answer, but if you're done and you just want to archive it, you know, you want scalable storage, billions of objects, then object is the answer. Now, either of these can be software-based storage or it could be an appliance. That's again an organization's preference: if you want an integrated, robust, ready-made solution, then an appliance is the answer. "Ah, no, I'm a large organization, I have a lot of storage administrators," and they can build something on their own, then software-based is the answer. Having both models gives you a choice. >> What brought you to IBM? You used to be at NetApp. IBM's buying The Weather Company. Dell's buying EMC. What attracted you to IBM? >> Storage is the foundation which we have, but it's really about data, and it's really about making sense of it, right? And everybody's saying data is the new oil, right? And IBM is probably the only company I can think of which has the tools and the IP to make sense of all this.
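A minimal sketch of the age/type/ownership tiering policy Chandra describes. This is illustrative logic only, not the product's actual policy syntax (Spectrum Scale expresses such rules in its own SQL-like ILM policy language); all thresholds and attributes are assumptions:

```python
# Toy decision function for transparent cloud tiering: cold data moves
# to the cloud tier unless its type or ownership pins it on premises.
from datetime import datetime, timedelta

COLD_AFTER = timedelta(days=90)           # assumption: age threshold
ALWAYS_LOCAL = {"database", "hot-index"}  # assumption: types pinned on-prem

def should_tier_to_cloud(last_access: datetime, data_type: str, owner: str) -> bool:
    """Decide whether a file should migrate to the cloud tier."""
    if data_type in ALWAYS_LOCAL:
        return False                      # performance-sensitive data stays local
    if owner == "compliance":             # assumption: compliance data stays on-prem
        return False
    return datetime.now() - last_access > COLD_AFTER

# Example: a 120-day-old log file owned by "analytics" gets tiered.
print(should_tier_to_cloud(datetime.now() - timedelta(days=120), "log", "analytics"))
```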
NetApp was great in the early 2000s. But even as a storage foundation they have issues with scale-out, true scale-out, not just a single namespace. EMC is purely a storage company. In the future it's all about, and the reason we are here at this conference is about, analyzing the data. What tools do you have to make sense of that? And that's where machine learning and then deep learning come in. Watson is very well known for that. IBM has the IP and it has a lot of research going on behind that, and I think storage will make more sense here. And also, IBM is doing the right thing by investing almost a billion dollars in software defined storage. They are one of the first companies who did not hesitate to take the software from integrated systems, for example XIV, and make the software available as software only. We did the same thing with Storwize. We took the software off it and made it available as Spectrum Virtualize. We did not hesitate at all to take that same software and make it available; some other vendors say, "I can't do that, I'm going to lose all my margins." We didn't hesitate. We made it available as software, 'cause we believe that's an important need for our customers. >> So the vision of the company, cognitive, the halo effect of that business, that's the future, is going to bring a lot of storage action, is sort of the premise there. >> Chandra: Yes. >> Excellent, well Chandra, thanks very much for coming on theCUBE. It was great to have you, and good luck with attacking the big data world. >> Thank you, thanks for having me. >> You're welcome. Keep it right there everybody. We'll be back with our next guest. We're live from Munich. This is DataWorks 2017. Right back. (techno music)
Shaun Connolly, Hortonworks - DataWorks Summit Europe 2017 - #DW17 - #theCUBE
>> Announcer: Covering DataWorks Summit Europe 2017, brought to you by Hortonworks. >> Welcome back everyone. Live here in Munich, Germany for theCUBE's special presentation of Hortonworks Hadoop Summit, now called DataWorks 2017. I'm John Furrier, with my co-host Dave Vellante. Our next guest is Shaun Connolly, Vice President of Corporate Strategy, Chief Strategy Officer. Shaun, great to see you again. >> Thanks for having me guys. Always a pleasure. >> Super exciting. Obviously we're always pontificating on the status of Hadoop, and Hadoop is dead, long live Hadoop, but rumors of its demise are greatly exaggerated. The reality is that there are no major shifts in the trends, other than the fact that the amplification of AI and machine learning has upleveled the narrative around data to the mainstream; big data's gen-one story was written on Hadoop, DevOps, culture, open-source. Starting with Hadoop, you guys certainly have been way out in front of all the trends, in how you guys have been rolling out the products. But now, with IoT and AI as the sizzle, the future of self-driving cars, smart cities, you're starting to really see demand for comprehensive solutions that involve data-centric thinking. Okay, that's one. Two, open-source continues to dominate: MuleSoft went public, you guys went public years ago, Cloudera filed their S-1. A crop of public companies that are open-source, we haven't seen that since Red Hat. >> Exactly. 99 is when Red Hat went public. >> Data-centric, a big megatrend with open-source powering it, you couldn't be happier for the stars lining up. >> Yeah, well, we definitely placed our bets on that. We went public in 2014, and it's nice to see that graduating class of Talend and MuleSoft, and Cloudera coming out. That just, I think, helps socialize the movement of enterprise open-source, whether it's for on-prem or powering cloud solutions pushed out to the edge, and technologies that are relevant in IoT. That's the wave. We had a panel earlier today where Dahl Jeppe from Centrica, British Gas, was talking about his ... the digitization of energy and virtual power plant notions. He can't achieve that without open-source powering and fueling that. >> And the thing about it is, just kind of ... For me personally, being my age in this generation of the computer industry, since I was 19, to see open-source go mainstream the way it has, it even gets better every time, but it really is the let-a-thousand-flowers-bloom strategy: throwing the seeds of innovation out there. I want to ask you a strategy question: you guys, from a performance standpoint, I would say kind of got hammered in the public market. Cloudera's valuation privately is 4.1 billion, you guys are close to 700 million. Certainly Cloudera's going to get a haircut, it looks like. The public market is based on the multiples, from Dave's and my intro, but there's so much value being created. Where's the value for you guys as you look at the horizon? You're talking about white spaces that are really developing, with use cases that are creating value. The practitioners in the field are creating value, real value for customers. >> So you covered some of the trends, but I'll translate 'em into how the customers are deploying. Cloud computing and IoT are somewhat related. One is centralization, the other is decentralization, so it actually calls for a connected data architecture, as we refer to it. We're working with a variety of IoT-related use cases. Coca-Cola East Japan spoke at Tokyo Summit about beverage replenishment analytics.
Getting vending machine analytics from vending machines even on Mount Fuji, and optimizing their flow-through of inventory in just-in-time delivery. That's an IoT-related use case that runs on Azure. It's a cloud-related story and it's a big data analytics story that's actually driving better margins for the business, and actually better revenues, 'cause they're getting the inventory where it needs to be so people can buy it. Those are really interesting use cases that we're seeing being deployed, and it's at this convergence of IoT, cloud, and big data. Ultimately that leads to AI, but I think that's what we're seeing the rise of. >> Can you help us understand that sort of value chain? You've got the edge, you got the cloud, you need something in between; you're calling it connected data platform. How do you guys participate in that value chain? >> When we went public, our primary workhorse platform was Hortonworks Data Platform. We had first-class cloud services with Azure HDInsight and Hortonworks Data Cloud for AWS, curated cloud services, pay-as-you-go, and Hortonworks DataFlow, which I call our connective tissue. It manages all of your data motion; it's a data logistics platform, it's like FedEx for data delivery. It goes all the way out to the edge. There's a little component called MiNiFi, mini and fi, which does secure intelligent analytics at the edge and in transmission. These smart manufacturing lines, you're gathering the data, you're doing analytics on the manufacturing lines, and then you're bringing the historical stuff into the data center, where you can do historical analytics across manufacturing lines. Those are the use cases where a connected data architecture-- >> Dave: A subset of that data comes back, right? >> A subset of the data, yep. The key events of that data; it may not be full-- >> 10%, half, 90%? >> It depends. If you have operational events that you want to store, sometimes you may want to bring full fidelity of that data, so you can do ... As you manufacture stuff, and when it gets deployed and you're seeing issues in the field, like Western Digital hard drives, failures in the field, they want that data at full fidelity, to connect the data architecture and analytics around that data. You need to ... One of the terms I use is, in the new world, you need to play it where it lies. If it's out at the edge, you need to play it there. If it makes a stop in the cloud, you need to play it there. If it comes into the data center, you also need to play it there. >> So a couple years ago, you and I were doing a panel at our Big Data NYC event, and I used the term "profitless prosperity." I got the hairy eyeball from you, but nonetheless, we talked about you guys as a steward of the industry; you have to invest in open-source projects. And it's expensive. I mean HDFS itself, YARN, Tez, you guys lead a lot of those initiatives. >> Shaun: With the community, yeah, but we-- >> With the community, yeah, but you provided contributions and co-leadership, let's say. You're there at the front of the pack. How do we project it forward, without making forward-looking statements, but how does this industry become a cashflow-positive industry? >> For public companies, since the end of 2014 the markets turned, beginning of 2016. Prior to that, high growth with some losses was palatable; after that, losses were not palatable. That hit us, Splunk, Tableau, most of the IT sector. That's just the nature of the public markets.
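A minimal sketch of the edge pattern Shaun describes: analyze locally, forward only the key events upstream, and keep full fidelity only when policy demands it. This illustrates the concept, not actual MiNiFi/NiFi configuration; the thresholds and fields are assumptions:

```python
# Edge-side filtering: only "key events" are transmitted back to the
# data center for historical analytics. Threshold and schema assumed.
TEMP_ALERT_C = 80.0   # assumption: readings past this are key events

def key_events(readings):
    """Yield only the events worth transmitting from the edge."""
    for r in readings:
        if r["temp_c"] >= TEMP_ALERT_C or r.get("error_code"):
            yield r   # send upstream; everything else stays at the edge

readings = [
    {"machine": "line-7", "temp_c": 54.2},
    {"machine": "line-7", "temp_c": 91.7},                  # key event
    {"machine": "line-9", "temp_c": 60.1, "error_code": 3}, # key event
]
print(list(key_events(readings)))  # 2 of 3 events go back to the data center
```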
As more public open-source, data-driven companies come in, I think it will better educate the market on the value. There's only so much I can do to control the stock price. What I can do from a business perspective is hit key measures on a path to profitability. At the end of Q4 2016, we hit what we call adjusted EBITDA breakeven, which is a stepping stone. On our earnings call at the end of 2016 we reported 185 million in revenue for the year, only five years into this journey, so that's a hard revenue growth pace, and we basically stated that in Q3 or Q4 of '17 we will hit operating cashflow neutrality. So we are operating the business-- >> John: But you guys also hit a 100 million at record pace too, I believe. >> Yeah, in four years. So revenue is one thing, but operating margins: if you look at our margins on our subscription business, for instance, we've got an 84% margin on that. It's a really nice margin business. We can make those margins better, but that's a software margin. >> You know what's ironic, we were talking about Red Hat off camera. Here's Red Hat kicking butt, really hitting all cylinders, three billion dollars in bookings; one would think, okay, hey, I can maybe project forward some of these open-source companies. Maybe the flip side of this is, oh wow, we want it now. To your point, the market kind of flipped, but you would think that Red Hat is an indicator of how an open-source model can work. >> By the way, Red Hat went public in 99, so it was a different trajectory; you know, I charted their trajectory out. Oracle's trajectory was different. Even in inflation-adjusted dollars, they didn't hit a 100 million in four years; I think it was seven or eight years or what have you. Salesforce did it in five. So these SaaS models and these subscription models and the cloud services, which is an area that's near and dear to my heart-- >> John: Goes faster. >> You get multiple revenue streams across different products. We're a multi-product cloud service company, not just a single platform. >> So we were actually teasing this out on our-- >> And that's how you grow the business, and that's how Red Hat did it. >> Well, I want to get your thoughts on this while we're just kind of ripping live here, because Dave and I were talking in our intro segment about the business model, and how there's some camouflage out there, at least from my standpoint. One of the main areas that I was kind of pointing at and trying to poke at, and want to get your reaction to, is in the classic enterprise go-to-market you have an expansive sales force; you guys pay handsomely for that today. Incubating that market, getting the profitability for it, is a good thing, but there are also channels, VARs, ISVs, and so on. You guys have an open-source channel that's kind of not a VAR or an ISV; these are entrepreneurs and/or businesses themselves. There's got to be a monetization shift there for you guys in the subscription business, certainly. When you look at these partners, they're co-developing, they're in open-source, you can almost see the dots connecting. Is this the new ecosystem? There's always been an ecosystem, but now you have kind of a monetization inherently in a pure open distribution model. >> It forces you to collaborate. IBM was on stage talking about our system certified on their Power Systems. Many may look at IBM as competitive; we view them as a partner. Amazon, some may view them as a competitor with us; they've been a great partner with our offering for AWS.
So it forces you to think about how you collaborate around deeply engineered systems and value, and we get great revenue streams that are pulled through, that they can sell into the market to their ecosystems. >> How do you envision monetizing the partners? Let's just say Dave and I start this epic idea and we create some connective tissue with that orchestrator, the Data Platform you have, and we start making some serious bang. We make a billion dollars. Do you get paid on that if it's open-source? I mean, would we be more subscriptions? I'm trying to see how the tide comes in, whose boats float on the rising tide of the innovation in these white spaces. >> Platform thinking is you provide the platform. You provide the platform for 10x value that rides atop that platform. That's how the model works. So if you're riding atop the platform, I expect you and that ecosystem to drive at least 10x above and beyond what I would make as a platform provider in that space. >> So you expect some contributions? >> That's how it works. You need a thousand flowers to be running on the platform. >> You saw that with VMware. They hit 10x and ultimately got to 15 or 16, 17x. >> Shaun: Exactly. >> I think they don't talk about it anymore. I think it's probably trading the other way. >> You know, in my days at JBoss and Red Hat it was somewhere between 15 and 20x. That was the value that was created on top of the platforms. >> What about the ... I want to ask you about the forking of the Hadoop distros. I mean, there was a time when everybody was announcing Hadoop distros. John Furrier announced SiliconANGLE was announcing a Hadoop distro. So we saw consolidation, and then you guys announced the ODP, then the ODPi initiative, but there seems to be a bit of a forking in Hadoop distros. Is that a fair statement? Unfair? >> I think if you look at how the Linux market played out, you have clearly Red Hat, you had Canonical Ubuntu, you had SUSE. You're always going to have curated platforms for different purposes. We have a strong opinion and a strong focus in the area of IoT, fast analytic data from the edge, and a centralized platform with HDP in the cloud and on-prem. Others in the market, Cloudera is running sort of a different play, where they're curating different elements and investing in different elements. Doesn't make either one bad or good; we are just going after the markets slightly differently. The other point I'll make there is, in 2014 if you looked at the Venn diagrams then, there was a lot of overlap. Now if you draw the areas of focus, there's a lot of white space that we're going after that they aren't going after, and they're going after other places, and other new vendors are going after others. With the market dynamics of IoT, cloud, and AI, you're going to see folks chase the market opportunities. >> Is that dispersion not a problem for customers now, or is it challenging? >> There has to be a core level of interoperability, and that's one of the reasons why we're collaborating with folks in the ODPi, as an example. When it comes to some of the core components, there has to be a level of predictability, because if you're an ISV riding atop, you're slowed down by death by infinite certification and choices. So ultimately it has to come down to a much more sane approach to what you can rely on. >> When you guys announced ODP, then ODPi, the extension, Mike Olson wrote a blog saying it's not necessary; people came out against it. Now we're three years in, looking back. Was he right or not?
>> I think the ODPi takeaway this year is there's more we can do above and beyond the Hadoop platform. It's expanded to include SQL and other things recently, so there's been some movement on the spec, but frankly, you talk to John Mertic at ODPi, you talk to SAS and others, I think we want to be a bit more aggressive in the areas that we go after and try and drive there from a standardization perspective. >> We had Wei Wang on earlier-- >> Shaun: There's more we can do and there's more we should do. >> We had Wei on with Microsoft at our Big Data SV event a couple weeks ago. Talk about the Microsoft relationship with you guys. It seems to be doing very well. Comments on that. >> Microsoft was one of the two companies we chose to partner with early on; in 2011, 2012, Microsoft and Teradata were the two. Microsoft was: how do I democratize and make this technology easy for people. That's manifested itself as an Azure cloud service, Azure HDInsight-- >> Which is growing like crazy. >> Which is globally deployed, and we just had another update. It's fundamentally changed our engineering and delivery model. This latest release was a cloud-first delivery model, so one of the things that we're proud of is the interactive SQL and the LLAP technology that's in HDP: that went out through Azure HDInsight and Hortonworks Data Cloud first. Then it was certified in HDP 2.6, and it went on Power at the same time. It's that cadence of delivery and cloud-first delivery model. We couldn't do it without a partnership with Microsoft. I think we've really learned what it takes-- >> If you look at Microsoft at that time. I remember interviewing you on theCUBE. Microsoft was trading something like $26 a share at that time, around their low point. Now the stock is performing really well. Satya Nadella, very cloud oriented-- >> Shaun: They're very open-source. >> They're very open-source friendly, they've been donating a lot to the OCP, to the data center piece. Extremely different Microsoft, so you slipped into that beautiful spot, reacted on that growth. >> I think as one of the stalwarts of enterprise software providers, they've done a really great job of bending the curve towards cloud while still having a mixed portfolio, but incenting a field, and incenting a channel, and selling cloud and growing that revenue stream, that's nontrivial, that's hard. >> They know the enterprise sales motions too. I want to ask you how that's going overall within Hortonworks. What are some of the conversations that you're involved in with customers today? Again, we were saying in our opening segment, it's on YouTube if you're not watching, but the customer is the forcing function right now. They're really putting the pressure on the suppliers, you're one of them, to get tight, reduce friction, lower costs of ownership, get into the cloud, flywheel. And so you see a lot-- >> I'll throw in another aspect: some of the more late-majority adopters traditionally, over and over I hear, by 2025 they want to power down the data center and have more things running in the public cloud, if not most everything. That's another eight years or what have you, so it's still a journey, but they're making that an imperative because of the operational side, because of the agility, because of better predictability, ease of use. That's fundamental.
>> As you get into the connective tissue, I love that example. With Kubernetes containers, you've got developers, a big open-source participant, and you got all the stuff you have; you just start to see some coalescing around cloud native. How do you guys look at that conversation? >> I view container platforms, whether they're container services that are running on a cloud or what have you, as the new lightweight rail that everything will ride atop. The cloud currently plays a key role in that; I think that's going to be the de facto way. Particularly if you go cloud-first models, particularly for delivery, you need that packaging notion and you need the agility of updates that that's going to provide. I think Red Hat as a partner has been doing great things on hardening that, making it secure. There's others in the ecosystem, as well as the cloud providers. All three cloud providers actually are investing in it. >> John: So it's good for your business? >> It removes friction of deployment ... and I ride atop that new rail. It can't get here soon enough, from my perspective. >> So I want to ask about clouds. You were talking about the Microsoft shift; personally I think Microsoft realized, holy cow, we could actually make a lot of money if we're selling hardware services. We can make more money if we're selling the full stack. It was sort of an epiphany, and so Amazon seems to be doing the same thing. You mentioned earlier, you know, Amazon is a great partner, even though a lot of people look at them as a competitor; it seems like Amazon, Azure, etc., they're building out their own big data stack and offering it as a service. People say that's a threat to you guys. Is it a threat, or is it a tailwind, or is it what it is? >> This is why I bring up that industry-wide we always have waves of centralization and decentralization. They're playing out simultaneously right now with cloud and IoT. The fact of the matter is that you're going to have multiple clouds, on-prem data, and data at the edge. That's the problem I am looking to facilitate and solve. I don't view them as competitors, I view them as partners, because we need to collaborate, because there's a value chain of the flow of the data, and some of it's going to be running through and on those platforms. >> The cloud's not going to solve the edge problem. Too expensive. It's just physics. >> So I think that's where things need to go. I think that's why we talk about this notion of connected data. I don't talk about hybrid cloud computing, that's for compute. I talk about how do you connect to your data, how do you know where your data is, and are you getting the right value out of the data by playing it where it lies. >> I think IoT has been a sweet trend for the big data industry. It really accelerates the value proposition of the cloud too, because now you have a connected network; you can have your cake and eat it too, central and distributed. >> There are different dynamics in the US versus Europe, as an example. In the US we're definitely seeing cloud adoption that's independent of IoT. Here in Europe, I would argue the smart mobility initiatives, the smart manufacturing initiatives, and the connected grid initiatives are bringing cloud in, so it's IoT and cloud, and that's opening up the cloud opportunity here. >> Interesting. So on the prospects for Hortonworks, cashflow positive in Q4, you guys have made a public statement; any other thoughts you want to share?
>> Just continue to grow the business, focus on these customer use cases, get them to talk about them at things like DataWorks Summit, and then the more the merrier, the more data-oriented open-source driven companies that can graduate in the public markets, I think is awesome. I think it will just help the industry. >> Operating in the open, with full transparency-- >> Shaun: On the business and the code. (laughter) >> Welcome to the party baby. This is theCUBE here at DataWorks 2017 in Munich, Germany. Live coverage, I'm John Furrier with Dave Vellante. Stay with us. More great coverage coming after this short break. (upbeat music)
Scott Gnau | DataWorks Summit Europe 2017
>> (soothing technological music) >> Announcer: Live from Munich, Germany, it's theCUBE. Covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. (soft technological music) >> Okay, welcome back everyone, we're here in Munich, Germany for DataWorks Summit 2017, formerly Hadoop Summit, powered by Hortonworks. It's their event, but it's now called DataWorks because data is at the center of the value proposition — Hadoop plus all the data and storage. I'm John, with my cohost David. Our next guest is Scott Gnau, he's the CTO of Hortonworks, joining us again from the keynote stage. Good to see you again. >> Thanks for having me back, great to be here. >> Good having you back. Let's get down and dirty and get technical. I'm super excited about the conversations that are happening in the industry right now, for a variety of reasons. One is you can't get more excited about what's happening in the data business. Machine learning and AI have really brought up the hype around it — people can visualize AI, see the self-driving cars, and understand how software's powering all this. But still it's data driven, and Hadoop is extending into data science — that natural extension — and Cloudera has filed their S-1 to go public. So it brings back the conversations of this open source community that's been doing all this work in the big data industry, originally riding in on the horse of Hadoop. You guys have an update to your Hadoop data platform, which we'll get to in a second, but I want to ask you about the stories around Hadoop. I say Hadoop was the first horse that everyone rode in on in the big data industry... When I say big data, I mean DevOps, cloud, the whole open source ethos — but it's evolving, it's not being replaced. So I want you to clarify your position on this, because we were just talking about some of the false premises: a lot of stories being written about the demise of Hadoop. Long live Hadoop. >> Yeah, well, how long do we have? (laughing) I think you hit it first: we're at DataWorks Summit 2017, and we rebranded — it was previously Hadoop Summit. We rebranded it to really recognize that there's this bigger thing going on, and it's not just Hadoop. Hadoop is a big contributor, a big driver, a very important part of the ecosystem, but it's more than that. It's really about being able to manage and deliver analytic content on all data, across that data's lifecycle: from when it gets created at the edge, to it moving through networks, to it being landed and stored in a cluster, to analytics being run and decisions going back out. It's that entire lifecycle, and you mentioned some of the megatrends I talked about this morning in the opening keynote. With AI and streaming and IoT all kind of converging, they're creating a much larger problem set — and frankly, an opportunity for us as an industry to go solve. So that's the context that we're really looking-- >> And there's real demand there. I mean, there's certainly a hype factor on AI, but IoT is real. You have data now — not just a back office concept, you have a front-facing, business-centric... I mean there's real customer demand here. >> There's real customer demand, and it really creates the ability to dramatically change a business. A simple example that I used onstage this morning: think about the electric utility business.
25, 30 years ago — by the way, I studied to be an electrical engineer — that business, not entirely simple, was about building a big power plant and distributing electrons out to all the consumers of electrons. One direction, and optimization of that grid, network, and business was very hard, with billions of dollars at stake. Fast forward to today: you've still got those generating plants online, but you've also got folks like me generating their own power and putting it back into the grid. So now you've got bidirectional electrons. The optimization is totally different. Then how do you figure out how most effectively to create capacity and distribute that capacity? Because created capacity that's not consumed is 100% spoiled. So it's a huge data problem — but it's a huge data problem meeting IoT, right? Devices — smart meter devices out at the edge — creating data, doing it in realtime. A cloud blew over, my generating capacity on my roof went down, so I've got to pull from the grid. Combining all of that data to make realtime decisions — we're talking hundreds of billions of dollars — and it's being done today, in an industry that's not a high-tech Silicon Valley kind of industry. Electric utilities are taking advantage of this technology today. >> So we were talking off-camera about some commentary that Hadoop has failed, and obviously you take exception to that. You also made the point that it's not just about Hadoop, but in a way it is, because Hadoop was the catalyst of all this open source. Why has Hadoop not failed, in your view? >> Well, because we have customers. You know, the great thing about conferences like this is we're actually able to get a lot of folks to come in and talk about what they're doing with the technology, how they're driving business benefit, and share that business benefit with their colleagues — so we see that business benefit coming along. In any hype cycle, people can go down a path where maybe they had false expectations early on. Six years ago we were talking about, hey, is open source Hadoop going to come along and replace the EDW? Complete fallacy, right? The opportunity I talked about — being able to store all kinds of disparate data, being able to manage and maneuver analytics in real time — that value proposition is very different from some of the legacy tech. So if you view it as, hey, this thing is going to replace that thing — okay, maybe not. But the point is, it's very successful for what it does-- >> Just to clarify what you just said there: you guys never took that position. Cloudera did, with their Impala — that was their initial position. Or do you not agree with that? >> Publicly they would say it's not a replacement, but you're right, I mean, the actions were maybe designed to do that. >> And to set in the marketplace that that might be one of the outcomes. >> Yeah, but they pivoted quickly when they realized that was a failed strategy. But that became a premise that people locked in on.
>> If that becomes your yardstick for measuring, then so-- >> Oh, but wouldn't you agree that Hadoop in many respects was designed to solve some of the problems that the EDW never could? >> Exactly. So again, when you think about the variety of data, when you think about the analytic content — doing time series analysis is very hard to do in a relational model — it's a new tool in the workbench to go solve analytic problems. And when you look at it from that perspective — and I use the utility example, the manufacturing example, financial, consumer finance, telco — all of these companies are using this technology, leveraging this technology, to solve problems they couldn't solve, and frankly to build new businesses that they couldn't build before, because they didn't have access to that real time-- >> And so money did shift from pouring money into the EDW with limited returns — because you were at the steep part, or the flat part, of the S-curve — to, hey, let's put it over here in this so-called big data thing. And that's why the market, I think, was conditioned to come to that simple conclusion. But the dollars, the spending, did shift, did it not? >> Yeah, I mean, if you subscribe to that kind of herd mentality — and, you know, the net increase, the net new expenditure in the new technology, is always going to outpace the growth of the existing, kind of plateaued, technologies. That's just math. >> The growth, yes, but not the size, not the absolute dollars. And so you have a lot of companies right now struggling in the traditional legacy space, and you've got this rocket ship going in-- >> And again, I think if you think about the converging forces that are out there — in addition to, you know, IoT and streaming — frankly, Hadoop is an enabler of AI. When you think about the success of AI and machine learning, it's about having massive, massive, massive amounts of data, right? And I think back 25 years ago: my first data mart was 30 gigabytes, and we thought that was all the data in the world. Now that fits on your phone. So when you think about just having the utter capacity, and the ability to actually process that capacity of data — these are technology breakthroughs that have been driven in the open source Hadoop community. When combined with the ability to execute in clouds and ephemeral kinds of workloads, you put all that stuff together, and now, instead of going to the capital committee for 20 million dollars for a bunch of hardware to do an exabyte kind of study where you may not get an answer that means anything, you can spin that up in the cloud, and for a couple of thousand dollars get the answer, take that answer, and go build a new system of insight that's going to drive your business. This is a whole new area of opportunity, driven by the convergence of all that. >> So I agree — I mean, it's absurd to say Hadoop and big data have failed, it's crazy.
>> Okay, but despite the growth — I call it profitless prosperity — can the industry fund itself? I mean, you've got to make big bets: YARN, Tez, different clouds. How does the industry turn into one that is profitable and growing? >> Well, I mean, obviously it creates new business models and new ways of monetizing and deploying software. You know, one of the key things that is core to our belief system is that really leveraging, working with, and nurturing the community is going to be a key success factor for our business, right? Nurturing that innovation and collaboration across the community, to keep up with the rate and pace of change, is one of the aspects of being relevant as a business. And then obviously creating a great service experience for our customers, so that they know they can depend on enterprise-class support, enterprise-class security and governance, and operational management, in the cloud and on-prem. Creating that value proposition, along with the advanced and accelerated delivery of innovation, is where I think we intersect uniquely in the industry. >> And one of the things that I think people point out — and I have this conversation all the time with people who try to squint through the, you know, Wall Street implications of the value proposition of the industry and this and that — I want to get your thoughts on, because open source in this era that we're living in today is bringing so much value outside of just Hortonworks, your company. Dave made a comment in the intro package we did: the practitioners are getting a lot of value, people out in the field, so these white spaces are the value, and they're actually transformative. Can you give some examples where things are getting done that are of real value, use cases you guys can highlight? I think that's the unwritten story that no one thought about — that rising tide floating all boats. Is that happening? >> Yeah. I mean, most of those white space use cases really involve integrating legacy, traditional transactional information — right, very valuable information about a company, its operations, its customers, its products, and all this kind of thing — and being able to combine that with the ability to do real-time sensor management, and ultimately to have a technology stack that enables the connection of all of those sources of data for an analytic. And that's an important differentiation. You know, for the first 25 years of my career, right, it was all about pulling all this data into one place, and then let's do something with it, and then we can push analytics back out. Not an entirely bad model, but a model that breaks in the world of IoT and connected devices: there just frankly isn't enough money to spend on bandwidth to make that happen, and as fast as the speed of light is, it creates latency, so those decisions aren't going to be able to be made in time. So we're seeing, even in traditional industries — I mentioned the utility business; think about manufacturing, oil and gas, right, sensors everywhere — being able to take advantage, not of collecting all the sensor data centrally and all of that, but being able to actually create analytics based on sensor data, and push those analytics out to the sensors to make real-time decisions that can affect hundreds of millions of dollars of production or equipment. Those are the use cases we're seeing deployed today, and that's complete white space that was unavailable before.
>> Yeah, and customer demand too. I mean, Dave and I were also debating about this not being a new trend — this is just big data happening. The customers are demanding production workloads, so you've seen a lot more forcing function driven by the customer. And you guys have some news I want to get to and get your thoughts on: Hortonworks Data Platform 2.6. What's the key news there? And how's it doing on real time — you're talking about real time. >> Yeah, it's about real time, flexibility, and choice — you know, motherhood and apple pie. >> And the major highlights of that update? >> So the upgrades are really inside of Hive. We now have operational analytic query capabilities, with tactical response times — second, sub-second kind of response times. You know, Hadoop and Hive weren't previously known for that kind of tactical response. We've now been able to add, inside of that technology, the ability to do that workload. We have customers building these white space applications who have hundreds or thousands of users or applications that depend on the consistency of very quick analytic response times, and we now deliver that inside the platform. What's really cool about it — in addition to the fact that it works — is that we did it inside of Hive. So we didn't create yet another project, or yet another thing that a customer has to integrate with or rewrite their application for; any Hive-based application can now take advantage of this performance enhancement. And that's part of our thinking of it as a platform. The second thing we've done inside of that, which really caters to those kinds of workloads, is we've really enhanced the ability to do incremental data acquisition — whether it be streaming, whether it be batch upserts, right, on the SQL side doing upserts — being able to do that data maintenance in an ACID-compliant fashion, completely automatically and behind the scenes, so that those applications can just kind of run without any heavy lifting. >> Just data staying in motion, kind of thing going on. >> Right, it's anywhere from data in motion, to batch, to mini batch, and anywhere kind of in between. But when we're doing those incremental data loads — you know, it's easy to get the same file twice by mistake; you don't want to double count, you want to have sanctity of the transactions — we now handle that inside of Hive, with ACID compliance.
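As an aside, here is a minimal HiveQL sketch of the kind of incremental, ACID-compliant upsert being described, assuming a Hive 2.x / HDP 2.6-class environment with transactions enabled; the table and column names are illustrative, not from the interview:

```sql
-- ACID targets in Hive must be bucketed, stored as ORC, and marked transactional.
CREATE TABLE policies (
  policy_id  BIGINT,
  holder     STRING,
  premium    DECIMAL(10,2),
  updated_at TIMESTAMP
)
CLUSTERED BY (policy_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- MERGE (added around Hive 2.2) applies an incremental batch in one ACID
-- operation: updates for keys that already exist, inserts for new ones.
-- Replaying the same staging batch therefore does not double count.
MERGE INTO policies AS t
USING policies_staging AS s
ON t.policy_id = s.policy_id
WHEN MATCHED THEN UPDATE SET
  holder = s.holder, premium = s.premium, updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT VALUES
  (s.policy_id, s.holder, s.premium, s.updated_at);
```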
>> So a layperson question for the CTO, if I may. You mentioned Hadoop was not known for that sort of real-time response, and you just mentioned ACID — it was never, in the early days, known for ACID compliance. Others would say, you know, Hadoop, the original big data platform, is not designed for the matrix math of AI, for example. Are these misconceptions? And — like Tim Berners-Lee, when we met Tim Berners-Lee: web 2.0, this is what the web was designed for — would you say the same thing about Hadoop? >> Yeah. Ultimately, from my perspective, and kind of netting it out, Hadoop was designed for the easy acquisition of data, the easy onboarding of data, and then, once you've onboarded that data, it was also known for enabling new kinds of analytics that could be plugged in — certainly starting out with MapReduce and HDFS at the core. But the whole idea is: I now have a flexible way to easily acquire data in its native form, without having to apply schema, without having to do any formatting up front. I can get it exactly as it was and store it, and then I can apply whatever schema, whatever rules, whatever analytics on top of that I want. So the center of gravity, to my mind, has really moved up to YARN, which enables a multi-tenancy approach to having pluggable, multiple different kinds of file formats, and pluggable kinds of analytics and data access methods — whether it be SQL, whether it be machine learning, whether it be HBase lookups and indexing, and anywhere kind of in between. It's that Swiss Army knife, as it were, for handling all of this new stuff, which is changing constantly — every second we sit here, data has changed.
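A small, hedged illustration of that schema-on-read idea — land the raw files first, then project a structure onto them later with an external Hive table; the paths and columns are invented for the example:

```sql
-- The raw files already sit in HDFS exactly as they arrived; no schema was
-- applied at load time. An EXTERNAL table simply overlays a structure on them.
CREATE EXTERNAL TABLE clickstream_raw (
  event_time STRING,
  user_id    STRING,
  url        STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/raw/clickstream';

-- Query immediately, with no up-front modeling; another team could overlay a
-- different schema on the very same files.
SELECT url, COUNT(*) AS hits
FROM clickstream_raw
GROUP BY url
ORDER BY hits DESC
LIMIT 10;
```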
>> And just a quick follow-up, if I can, for clarification: so you said new types of analytics can be plugged in by design, because of its openness — is that right? >> By design, because of its openness and the flexibility that the platform was built for. In addition, on the performance side, we've also got a new update to Spark, and usability, consumability, and collaboration for data scientists using the latest versions of Spark inside the platform. We've got a whole lot of other features and functions that our customers have asked for. And then on the flexibility and choice side, it's available on public cloud infrastructure as a service, on public cloud platform as a service, on-prem, and net new on-prem with Power. >> Just got a final question for you. As the industry evolves, what are some of the key areas that open source can pivot to that really take advantage of the machine learning and AI trends going on? Because you start to see that really increase the narrative around the importance of data, and a lot of people are scratching their heads going, okay, I need to do the back office, set up my IT to have all those open source projects, the Hadoop data platform — but then I've got to get down and dirty: I might do multiple clouds, I've got hybrid cloud going on, I might want to leverage the cool new containers and Kubernetes and microservices, almost DevOps. Where's that transition happening? As a CTO, how do you talk to customers about that — this transition, this evolution of how the data business is getting more and more mainstream? >> Yeah, I mean, I think the big thing that people had to get over is that we've reversed polarity from, again, 30 years of "I want a stack vendor to have an integrated stack of everything, plug and play; it's integrated, and it might not be a hundred percent of what I want, but look at the cost leverage I get out of the stack versus going to do something that's perfect." In this world it's the opposite: it's about enabling the ecosystem. And — by the way, it's a combination of open source and proprietary software; some of our partners have proprietary software, and that's okay — but it's really about enabling the ecosystem. And I think the biggest service that we as an open source community can do is to continue to kind of keep that standard kernel for the platform, and make it very usable and very easy for many apps and software providers and other folks. >> A thousand flowers bloom kind of concept — and that's what you've done with the white spaces, as these use cases are evolving very rapidly, and then the bigger apps are kind of settling into a workload, with realtime. >> Yeah, realtime. You know, think about the next generation of IT professional, the next generation of business professional: they grew up with iPhones, and here they come — they grew up in a mini-app world. I mean, download an app, I'm going to try it, it's a widget, boom, and it's going to help me get something done. But it's not a big stack that I'm going to spend 30 years implementing. And then I want to take those widgets and connect them together to do things that I haven't been able to do before, and that's how this ecosystem is really-- >> Great DevOps culture, very agile — that's their mindset. So Scott, congratulations on your 2.6 upgrade. >> Scott: We're thrilled about it. >> Great stuff. ACID compliance — really big deal, because little things are important in the enterprise. Great. Alright, thanks for coming on theCUBE at DataWorks in Munich, Germany. I'm John — thanks for watching. More coverage live here in Germany after this short break.
Martin Lidl, Chris Murphy & Itamar Ankorion - BigData SV - #BigDataSV - #theCUBE
>> Announcer: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. >> Good afternoon everyone. This is George Gilbert. We're at Silicon Valley Big Data, in conjunction with Strata and Hadoop World. We've been here every year for six years, and I'm pleased to bring with us today a really interesting panel, with our friends from Attunity: Itamar Ankorion — we were just discussing, it's an Israeli name, but some of us could be forgiven for thinking it Italian or Turkish; Itamar is CMO of Attunity. We have Chris Murphy, who is from a very large insurance company that we can't name right now, and then Martin Lidl from Deloitte. We're going to be talking about their experience building a data lake — a high value data lake — and some of the technology choices they made, including how Attunity fits in that. Maybe kicking that off, Chris, perhaps you can tell us what the big objectives were for the data lake, in terms of what outcomes you were seeking. >> Okay, I'd start off by saying there wasn't any single objective. It was very much about putting in a key enterprise component that would facilitate many, many things. When I look at it now and I look back — with wisdom, hopefully — I see it as trying to put in data as a service within the company. We very much built it as an operational data lake first and foremost, because we wanted to generate value for the company. I very much conveyed to people that this was something that was worth investing in on an ongoing basis. And then on the back of that, of course, once you've actually pulled all the data together and started to curate it and make it available, then you can start doing the research work as well. We were trying to get the best of both worlds from that perspective. >> Let me follow up with that really quickly. It sounds like if you're doing data as a service, it's where central IT as a function created a platform on which others would build applications, and you had to make that platform mature at a certain level — not just the software but the data itself. Then at that point, did you show prototype applications to different departments and business units, or how did the uptake — you know, how organically did that move? >> Not so much. It was very much a fast-delivering, agile set of projects working together. We used to call it the holy trinity of the projects we were doing: putting in a new customer portal that would be getting all of its data from the data lake, putting in a new CRM system getting all of its data from the data lake and talking to the customer portal, and then of course behind that, the data lake itself feeding all the data to these systems. We weren't developing in parallel to those projects — and those were not small projects, those were sizable beasts — but side by side with that, we were still able to use the data lake to do some proof of concept work around analytics. Interestingly, one of the first things we used the data lake for, on the analytics side, was actually meeting a government regulatory requirement, where they needed us to get an amount of data together for them very quickly. When I say quickly, I mean within two weeks. We went to our typical suppliers and said, "How long will this take?" About three months, they thought.
In terms of actually using the data lake, we pulled the data together in about two days, and most of the delays were due to the lack of strict requirements — we were just figuring out exactly what people wanted. That really helped demonstrate the benefit of having a data lake in place. >> So Martin, tell us how Deloitte, with its deep bench of professional services skills, could help make that journey easier for Chris and for others. >> There were actually a number of areas where we engaged. We were engaged all the way from the very beginning, working on the business case creation, and it really came to life when we brought our technology people in to work out a roadmap of how to deal with it. As Chris said, there were many moving parts, and therefore many teams within Deloitte were engaged, with different areas of specialization: from a big data development perspective on the one hand, to Salesforce CRM in the background, and then obviously my team of, sort of, data ninjas that came in and built the data lake. What we also did is we actually partnered with other third parties on the testing side, so that we covered, really, the full lifecycle there. >> To follow up with that: it sounds like, because there were other systems being built out in parallel that depended on this, you probably had fewer degrees of freedom in terms of what the data had to look like when you were done. >> I think that's true, to a degree, but when you look at the delivery model we deployed, it was very much agile delivery, and during the elaboration phase we were working together very closely across these three teams, right? So there was a certain amount of — well, not freedom in terms of what to deliver in the end, but coming to an agreement as to what good would look like at the end of a sprint or for a release, so there were no surprises as such. Still, through the flexible architecture that we had built and the flexible model we had for delivering, we could also respond to changes very quickly: if the product owner made priority calls and changed items in the backlog, we could respond quite quickly. >> So Itamar, maybe you can help us understand how Attunity added value in ways other products couldn't, and how it made the overall pipeline more performant. >> Okay, absolutely. The project that, again, this Fortune 100 company was putting together was an operational data lake. It was very important for them to get data from a lot of different data sources, so they could merge it together for analytic purposes, and also to get data in real time, so they could support real time analytics using information that is very fresh. That data, as in many financial services and insurance companies, came from the mainframe — multiple systems on the mainframe as well as other systems — and they needed an efficient way to get the data ingested into their data lake. That's where Attunity came in, as part of the overall data lake architecture, to support an incremental, continuous, universal data ingestion process. Attunity Replicate lends itself to being able to load the data directly into the data lake — into Hadoop, in this case — or, if they opt to, through mechanisms like Kafka and others. So it provided a lot of flexibility, architecturally, to capture data as it changes in their many different databases and feed that into the data lake, so it can be used for different types of analytics.
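To picture what log-based CDC output can look like once it lands in the lake, here is a hedged HiveQL sketch; the operation codes, columns, and paths are illustrative assumptions, not Attunity's actual delivery format:

```sql
-- Each captured change arrives as a row: an operation code plus the new values.
CREATE EXTERNAL TABLE policy_changes (
  op          STRING,        -- 'I' = insert, 'U' = update, 'D' = delete
  change_ts   TIMESTAMP,
  policy_id   BIGINT,
  holder_name STRING,
  premium     DECIMAL(10,2)
)
STORED AS AVRO
LOCATION '/data/cdc/policy_changes';

-- Reconstruct current state: the latest change per key wins; deletes drop out.
SELECT policy_id, holder_name, premium
FROM (
  SELECT pc.*,
         ROW_NUMBER() OVER (PARTITION BY policy_id
                            ORDER BY change_ts DESC) AS rn
  FROM policy_changes pc
) latest
WHERE rn = 1
  AND op <> 'D';
```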
>> So just to drill down on that one level, 'cause many of us would assume that, you know, the replication log that Attunity sort of models itself after would be similar to the event log that Kafka sort of models itself after. Is it that if you use Kafka you have to modify the source systems, and therefore it puts more load on them, whereas with Attunity you are sort of piggybacking on what's already happening, and so you don't add to the load on those systems? >> Okay, great question. Let me clarify. >> Okay. >> First of all, Kafka is a great technology that we're seeing more and more customers adopt as part of their overall big data management architectures. It's basically a publish-subscribe infrastructure that allows you to scale up the messaging and storage of data as events, as messages, so you can easily move it around and process it in a more real-time, streaming fashion. Attunity complements Kafka and is actually very well integrated with it, as well as with other streaming types of ingestion and data processing technologies. What Attunity brings to the picture here is primarily a key technology function, CDC — change data capture — which is the ability, the technology, to capture the data as it changes in many different databases; to do that in a manner that has very little impact, if any, on the source system and the environment; and to deliver it in real time. So what Attunity does, in a sense, is turn the databases into live feeds that can then stream either directly into platforms such as Hive and HDFS, or into Kafka for further processing and integration. So again, it's very complementary in that sense. >> Okay. So maybe give us, Chris, a little more color on the before and after state — before these multiple projects happened, and then with the data lake as sort of a data foundation for these other systems that you're integrating. What business outcomes changed, and how did they change? >> Oof, that's a tough question. I've been asked many flavors of that question before, and the analogy I always come back to is: it's like we were moving from candle power to electricity. There's no single use case that shows this is why you need a data lake. There were many, many things we wanted to do. In the before picture, again, things were always just very challenging. Like many companies, we'd outsourced the mainframe support, operation, and running of our systems to third parties, and we were constrained by that. You know, we were in that crazy situation where we couldn't get to our own data. By implementing the data lake, we've broken down that barrier. We now have things back in our control. I mentioned before that POC we did with the regulatory reporting: again, three months ... two days. It was night and day in terms of what we were now able to do.
We want to do things like nudges to demonstrate to the customers how there are products that could be a very good fit for them, because once you understand your customer, you understand what their gaps are, what their needs, what their wants are. Again, very much in the roadmap, just not at that part of the map yet. >> So help us maybe understand some of the near term steps you want to take on that roadmap towards that nirvana. >> So, those >> And what the role Attunity as a vendor might play, and Deloitte, you know as a professional service organization, to help get you there. >> So Attunity was obviously was all about getting the data there as efficiently as possible. Unfortunately like many things, in your first iteration it's still, our data lake is still running on a batch basis, but we'd like to evolve that as time goes by. In terms of actually making use of the lake, one of the key things that we were doing on that was actually implementing a client matching solution, so we didn't actually have a MDM system in place for managing our customers. We had 12 different policy admin systems in place. Customers could be coming to us being enrolled, they could be a beneficiary, they could be the policy holder, they could be a power of attorney, and we could talk to someone on the phone and not really understand who they were. You get them into the data lake, you start to build up that 360 view about who people are, then you start to understand what can I do for this person. That was very much the journey we're going on. >> And Martin, have you worked with ... Are you organized by industry line and is there a sort of capability maturity level where you know, you can say, okay, you have to master these skills and at that skill level then you can do these richer business offerings? >> Yeah, absolutely. First of all, yes, we are organized by industry groups and we have sort of a common model across industry store that describe what you just said. When we talk about inside strength in organization, this is really where you are sort of moving to on the maturity curve, as you become more mature in using your analytical capabilities and turning data from just data into information, into a real asset you can actually monetize, right? Where we went with Chris' organization and actually there's many other life insurers, is actually sort of the first step on this journey, right? What Chris described around for the first time being able to see a customer centric view and see what a customer has in terms of product, and therefor what they don't have, right? And where there's opportunities for cross selling, this is sort of a first step into becoming more proactive, right? There's actually a lot more that can follow on after that, but yeah, we've got maturity models that we assess against and we sort of gradually move people, organizations to the right place for them, because it's not going to be right for every organization to be an inside driven organization, to make this huge investment, to get there, but most companies will benefit will benefit from nudging them in that direction. >> Okay, and on that note we're going to have to leave it here. I will say that I think that there's a session at 2:30 today with the Deloitte and the unnamed insurance team talking in greater depth about the case study, with Attunity. On that, we'll be taking a short break. We'll be back at Big Data Silicon Valley. This is George Gilbert and we'll see you in a few short minutes.
Jean-Pierre Dijcks, Oracle - On the Ground - #theCUBE
>> Narrator: The Cube presents, On the Ground. (techno music) >> Hi, I'm Peter Burris. Welcome to an On the Ground here at Oracle headquarters, with SiliconANGLE Media's theCUBE. Today we're talking to JP Dijcks, who is a master product manager — one of the master product managers — inside Oracle's big data product group. Welcome, JP. >> Thank you, Peter. >> Well, we're going to talk about how developers get access to this plethora, this miasma, this unbelievable complexity of data that's being made possible by IoT, traditional applications, and other sources. How are developers going to get access to this data? >> That's a good question, Peter. I still think that one of the key aspects to getting access to that data is SQL, and so that's one of the things we are driving: trying to figure out, can we get the Oracle SQL engine, and all the richness of SQL analytics, enabled on all of that data, no matter what the format is or where it lives? How can I enable those SQL analytics on that? And then, obviously, we've all seen the shift in APIs and languages — people don't necessarily always want to speak SQL and write SQL queries. So how do we then enable things like R, how do we enable Perl, how do we enable Python, all sorts of things like that? How do we do that? And so the thought we had was: can we use SQL as the common meta-data interface, and the common structure around some of this, and enable all of these languages on top of that through the database? So that's kind of the baseline of how we're thinking of enabling this for developers and large communities of users. >> So that's SQL as an access method. Do you also envision that SQL will be a data creation language, as we think about how big data comes together from a modeling perspective? >> So I think from a modeling perspective — the meta-data part — we certainly look at it as a creation, or definition, language; that's probably the better word. How do I do structured queries — 'cause that's what SQL stands for — on JSON documents? How do I do that on IoT data, as you said? How do I get that done? And so we certainly want to create the meta-data in a very traditional database catalog, or, if you compare it to a Hive catalog, very much like that. The execution is very different: it uses the mechanisms under the covers that NoSQL databases have, or that Hadoop and HDFS offer, and we certainly have no real interest in doing "insert into Hadoop," 'cause the transaction mechanisms work very, very differently. So it's really focused on the meta-data areas: how do I expose that, how do I classify and categorize that data in ways people know and have seen for years. >> So the data manipulation will be handled by native tools, and some of the creation, some of the generation, some of the modeling will now be handled inside SQL — and there are a lot of SQL folks out there that have a pretty good affinity for how to work with data. >> That's absolutely correct. >> So that's what it is; now how does it work? Tell us a bit about how this Big Data SQL is going to work, in a practical world. >> Okay. So we talked about the modeling already. The first step is that we extend the Oracle database and the catalog to understand things like Hive objects or HDFS — kind of, where does stuff live. So we expanded that, and we found a way to classify the meta-data first and foremost.
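As a rough sketch of what that catalog extension can look like in practice: Big Data SQL surfaces Hadoop-resident data through Oracle external tables and access drivers such as ORACLE_HIVE. The access parameter below is written from memory of the documentation, and every object name is invented, so treat this as illustrative rather than definitive:

```sql
-- Declare a Hive-managed dataset to the Oracle catalog as an external table.
CREATE TABLE iot_readings (
  device_id  VARCHAR2(64),
  reading_ts TIMESTAMP,
  metric     NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS (com.oracle.bigdata.tablename=iot.readings)
)
REJECT LIMIT UNLIMITED;

-- Ordinary Oracle SQL can now join warehouse data with Hadoop-resident data;
-- the filter on iot_readings is a candidate for pushdown to the Hadoop nodes.
SELECT f.account_id, SUM(r.metric) AS total_metric
FROM   finance_txns f
JOIN   iot_readings r ON r.device_id = f.device_id
WHERE  r.reading_ts > SYSTIMESTAMP - INTERVAL '1' DAY
GROUP  BY f.account_id;
```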
The real magic is leveraging the Hadoop stack. So you ask a BI question and you want to join data in Oracle — transactions, finance information, let's say — with IoT data, which you'd reach out to HDFS for. Big Data SQL runs on the Hadoop nodes, so it's local processing of that data, and it works exactly as HDFS and Hadoop work. In other words, I'm going to do processing locally, I'm going to ask the name node which blocks I'm supposed to read, and that'll get run: we generate that query and push it down to the Hadoop nodes. And that's when some of the magic of SQL kicks in, which is really focused on performance. It's performance, performance, performance — that's always the problem with federated data, how do I get it to perform across the board. And so what we took was-- >> Predictably. >> Predictably, that's an interesting one, predictable performance, 'cause sometimes it works, sometimes it doesn't. So what we did is we took the Exadata storage server software, with all the magic of how do I get performance out of a file system, out of IO, and we put that on the Hadoop nodes, and then we push the queries all the way down to that software. It does filtering, it does predicate pushdown, it leverages features like Parquet and ORC on the HDFS side, and at the end of the day, it takes the IO requests — which is what a SQL query generates — feeds them to the Hadoop nodes, runs them locally, and then sends the results back to the database. And so we filter out a lot of the gunk we don't need, 'cause you said, oh, I only need yesterday's data, or whatever the predicates are. And so that's how we think we can get an architecture in place that allows for global optimization, 'cause we can see the entire ecosystem in its totality — IoT, Oracle, all of it combined. We optimize the queries, push everything down as far as we can — algorithms to data, not data to algorithms — and that's how we're going to get this predictable performance on all of these pieces of data. >> So we end up with, if I got this right, let me recap: we've got this notion that for data creation, data modeling, we can now use SQL, understood by a lot of people, which doesn't preclude us from using native tools. We continue to use local tools for the actual manipulation elements. >> Absolutely. >> We are now using Exadata-like structures so we can push the algorithm down to the data — so we're moving a small amount of data to a large amount of data, which keeps cost down and improves predictability. But at the same time, we've got meta-data objects that allow us to anticipate, with some degree of predictability, how this whole thing will run, and how this will come together back at the database. Got that right? >> Got that right. >> Alright, so, the next question is: what's the impact of doing it this way? Talk a bit, if you can, about how it's helping folks who run data, who build applications, and who are trying to get business value out of this whole process. >> So if we start with the business value, I think the biggest thing we bring to the table is simplicity and standardization. If I have to understand how this object is represented in NoSQL, how in HDFS, how somebody put a JSON file in here, I now have to spend time literally digging through that — and then, does it conform, do I have to modify it, what do I do? So I think the business value comes out of the SQL layer on top of it. It all looks exactly the same.
It's well known, it's well understood, and it's far quicker to get from "I've got a bunch of data" to actually building a BI report, building a dashboard, building KPIs, and integrating that data. There's nothing new to learn — it's a level of abstraction we put on top of this, whether you use an API or, in this case, SQL, 'cause that's the most common analytics language. So that's one part of how it will impact things. The second is — and I think that's where the architecture is completely unique — we keep complete control of the query execution, from the meta-data we just talked about, and that enables us to do global optimization. And if you think this through a little bit and go, oh, global optimization sounds really cool, what does that mean? I can now actually start pushing processing around, I can move data — it's what we've done in the Exadata platform for years. Data lives on disk; oh, Peter likes to query it very frequently, let's move it up to flash, let's move it up to in-memory, let's twist the data around. So all of a sudden we've got control: we understand what gets queried, we understand where data lives, and we can start to optimize exactly for the usage pattern the customer has — and that's always the performance aspect. And that goes to the old saying of, how can I get data to a customer as quickly as possible when he really needs it? That's what this does, right — how can I optimize this? I've got thousands of people querying certain elements: move them up in the stack, get the performance, and all these queries come back in, like, seconds. Regulatory stuff that needs to go through, like, five years of data: let's put it in cheap areas, and let's optimize that. And so the impact is cheaper and faster at the end of the day, and all 'cause there's almost a singular entity that governs the data, governs the queries, governs the usage patterns. That's what we uniquely bring to the table with this architecture. >> So I want to build on the notion of governance, because one of the interesting things you said was the idea that if it's all under a common sort of interface, then you have greater visibility: where the data is, who owns it, et cetera. If you do this right — one of the biggest challenges businesses are having is the global sense of how you govern your data — if you do this right, are you that much closer to having competent overall data governance? >> I think we were able to take a big step forward on it, and it sounds very simple, but we now have a central catalog that actually understands what your data is and where it lives, in kind of a well-known way. Again, it sounds very simple, but if you look at silos, that's the biggest problem: you have multiple silos, multiple things in there, and nobody really knows what's in there. So here we start to publish this in a common structural layer, we have all the technical meta-data, we track who queries what, who does all those things — so that's a tremendous help in governance. The other side, of course, because we still use native tools to, let's say, manipulate some data, or augment or add new data: we're now going to tie a lot of the meta-data that comes from, say, the Hadoop ecosystem into this catalog. And while we're probably not there yet today on end-to-end governance — everything out of the box, here we go--
>> And we probably never will, you're right, and I think we set a major step forward with just consolidating it, and exposing people to all the data the have, and you can run all the other tools like, crawl my data and check box anything that says SSN, or looks like a social security number, all of those tools are are still relevant. We just have a consolidated view, dramatically improved governance. >> So I'm going to throw you a curve ball. >> Sure. >> Not all data I want to use is inside my business, or is being generated by sensors that I control, how does big data SQL and related technologies play a role in the actual contracting for additional data sources, and sustaining those relationships that are very very fundamental, how data's shared across organizations. Do you see this information being brought in under this umbrella? Do you see Oracle facilitating those types of relationships, introducing standards for data sharing across partnerships becomes even easier? >> I'm not convinced that big data SQL as a technology is going to solve all the problems we see there, I'm absolutely convinced that Oracle is going to work towards that, you see it in so many acquisitions we've done, you see it in the efforts of making data as a service available to people, and to some extent big data SQL will be a foundation layer to make BI queries run smoother across more and more and more pillars of data. If we can integrate database, Hadoop, and NoSQL, there's nothing that says, oh and by the way, storage cloud. >> And we have relatively common physical governance, that I have the same physical governance, and you have the same physical governance, now its easier for us to show how we can introduce governance across our instances. >> Absolutely, and today we focus a lot on HDFS or Hadoop as the next data pillar, storage cloud, ground to cloud, all of those are on the roadmap for big data SQL to catch up with that, and so if you have data as a service, let's declare that cloud for a second, and I have data in my database in my Hadoop cluster, again, all now becomes part of the same ecosystem of data, and it all looks the same to me from a BI query perspective, from an analytics perspective. And then the, how do I get the data sharing standards set up and all that, part of that is driving a lot of it into cloud, and making it all as a service, 'cause again you put a level of abstraction on top of it, that makes it easier to consume, understand where it came from, and capture the meta-data. >> So JP one last question. >> Sure. >> Oracle opens worlds on the horizon, what are you looking for, or what will your customers be looking for as it pertains to this big data SQL and related technologies? >> I think specifically from a big data SQL perspective, is we're going to drive the possible adoption scope much much further, today we work with HDFS an we work with Oracle database, we're going to announce certain things like exadata, Hadoop will be supportive, we hold down super cluster support, we're going to dramatically expand the footprint big data SQL will run on, people who come for big data SQL or analytics sessions you'll see a lot of the roadmap looking far more forward. 
I already mentioned some things like ground to cloud: how can I run Big Data SQL when my Exadata is on-premises and the rest of my HDFS data is in the cloud? We're going to be talking about how we're going to do that, and what we think the evolution of Big Data SQL is going to be. I think that's going to be a very fun session to go to. >> JP Dijcks, a master product manager inside the Oracle big data product group, thank you very much for joining us here On the Ground, at Oracle headquarters. This is theCUBE.
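To make the tiering idea above concrete: the pattern JP describes—a central catalog that tracks who queries what, promotes hot data up toward flash and memory, and pushes cold regulatory data down to cheap storage—can be sketched in a few lines. This is a minimal illustration only, not Oracle's implementation; the tier names, thresholds, and the `Catalog` structure are all invented for the example.

```python
from dataclasses import dataclass

# Hypothetical storage tiers, fastest and most expensive first.
TIERS = ["in-memory", "flash", "disk", "cheap-archive"]

@dataclass
class Dataset:
    name: str
    tier: str = "disk"
    query_count: int = 0   # maintained by the central catalog

class Catalog:
    """Toy stand-in for a unified metadata catalog: it knows where every
    dataset lives and records every query made against it."""
    def __init__(self):
        self.datasets = {}

    def register(self, name, tier="disk"):
        self.datasets[name] = Dataset(name, tier)

    def record_query(self, name):
        self.datasets[name].query_count += 1

    def rebalance(self, hot=1000, cold=10):
        # Invented thresholds: promote frequently queried datasets toward
        # memory; demote rarely touched ones toward cheap storage.
        for ds in self.datasets.values():
            idx = TIERS.index(ds.tier)
            if ds.query_count >= hot and idx > 0:
                ds.tier = TIERS[idx - 1]      # move it up the stack
            elif ds.query_count <= cold and idx < len(TIERS) - 1:
                ds.tier = TIERS[idx + 1]      # put it in a cheap area
            ds.query_count = 0                # reset for the next window

catalog = Catalog()
catalog.register("sales_current")
catalog.register("regulatory_5yr")
for _ in range(2500):                 # thousands of people querying
    catalog.record_query("sales_current")
catalog.rebalance()
print({d.name: d.tier for d in catalog.datasets.values()})
# {'sales_current': 'flash', 'regulatory_5yr': 'cheap-archive'}
```

Because one entity sees both the query traffic and the data placement, the same loop covers JP's "cheaper and faster" point: hot elements climb the stack for speed, and five-year regulatory data drifts to the cheapest tier.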
Joel Horwitz, IBM & David Richards, WANdisco - Hadoop Summit 2016 San Jose - #theCUBE
>> Narrator: From San Jose, California, in the heart of Silicon Valley, it's theCUBE. Covering Hadoop Summit 2016. Brought to you by Hortonworks. Here's your host, John Furrier. >> Welcome back everyone. We are here live in Silicon Valley at Hadoop Summit 2016, actually San Jose. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. Our next guest, David Richards, CEO of WANdisco. And Joel Horwitz, strategy and business development, IBM Analytics. Guys, welcome back to theCUBE. Good to see you guys. >> Thank you for having us. >> It's great to be here, John. >> Give us the update on WANdisco. What's the relationship with IBM and WANdisco? 'Cause, you know, I can just almost see it, but I'm not going to predict. Just tell us. >> Okay, so, I think the last time we were on theCUBE, I was sitting with Re-ti-co, who works very closely with Joel. And we began to talk about how our partnership was evolving. And of course, we were negotiating an OEM deal back then, so we really couldn't talk about it very much. But this week, I'm delighted to say that we announced—I think it's called IBM Big Replicate? >> Joel: Big Replicate, yeah. We have a big everything, and Replicate's the latest addition. >> So it's going really well. It's OEM'd into IBM's analytics, big data products, and cloud products. >> Yeah, I'm smiling and smirking because we've had so many conversations, David, on theCUBE with you, following your business through the bumpy road, or the wild seas, of big data. And it's been a really interesting tossing and turning of the industry. I mean, Joel, we've talked about it too: the innovation around Hadoop and then the massive slowdown and realization that cloud is now on top of it. The consumerization of the enterprise created a little shift in the value proposition, and then a massive rush to build enterprise grade, right? And you guys had that enterprise grade piece of it. IBM, certainly you're enterprise grade. You have enterprise everywhere. But the ecosystem had to evolve really fast. What happened? Share with the audience this shift. >> So, it's a classic product adoption lifecycle, and the buying audience has changed over that time continuum. In the very early days, when we first started talking at these events, when we were talking about Hadoop, we all really cared about whether it was Pig and Hive. >> You once had a distribution. That's a throwback. Today's Thursday, we'll do that tomorrow. >> And the buying audience has changed, and consequently, the companies involved in the ecosystem have changed. So where we once used to really care about all of those different components, we don't really care about the machinations below the application layer anymore. Some people do, yes, but by and large, we don't. And that's why cloud, for example, is so successful: because you press a button, and it's there. And that, I think, is where the market is going, very, very quickly. So it makes perfect sense for a company like WANdisco, who've got 20, 30, 40, 50 sales people, to move to a company like IBM that has 4 or 5,000 people selling our analytics products. >> Yeah, and so this is an OEM deal. Let's just get that news on the table. So, you're an OEM. IBM's going to OEM their product and brand it IBM Big Replicate? >> Yeah, it's part of our Big Insights portfolio. We've done a great job at growing this product line over the last few years, with last year talking about how we decoupled all the value-adds from the core distribution.
So I'm happy to say that we're both part of the ODPi; it's an ODPi-certified distribution. That is Hadoop that we offer today for free. But then we've been adding, not just in terms of the data management capabilities—the partnership here that we're announcing with WANdisco, and how we branded it as Big Replicate, is squarely aimed at the data management market today. But where we're headed, as David points out, is really much bigger, right? We're talking about support for not only distributed storage and data, but we're also talking about a hybrid offering that will get you to the cloud faster. So not only does Big Replicate work with HDFS, it also works with the Swift object store, which, as you know, is kind of the underlying storage for our cloud offering. So what we're hoping to see from this great partnership is, as you see around you, Hadoop is a great market. But there's a lot more here when you talk about managing data that you need to consider. And I think hybrid is becoming a lot larger of a story than simply distributing your processing and your storage. It's becoming a lot more about, okay, how do I offset different regions? How do you think through that? There's this idea that there's one Hadoop cluster in an enterprise. I think that's factually wrong. I think what we're observing is that there's actually people who are spinning up, you know, multiple Hadoop distributions at the line of business, for maybe a campaign, or for maybe doing fraud detection, or maybe doing log files, whatever. And managing all those clusters—they'll have Cloudera. They'll have Hortonworks. They'll have IBM. They'll have all of these different distributions that they're having to deal with. And what we're offering is sanity. It's like, give me sanity for how I can actually replicate that data. >> I love the name Big Replicate, fantastic. Big Insights, Big Replicate. And so, go to market, you guys are going to have a bigger sales force. It's a nice pop for you guys. I mean, it's a good deal. >> We were just talking before we came on air about sort of the deal flow coming through. It's coming through—this potential deal flow coming through—which has been off the charts. I mean, obviously, when you turn on the tap, you suddenly enable thousands and thousands of sales people to start selling your products. I mean, IBM are doing a great job. And I think IBM are in a unique position where they own both cloud and on-prem. There are very few companies that own both the on-prem-- >> They're going to need to have that connection for the companies that are going hybrid. So hybrid cloud becomes interesting right now. >> Well, actually, there's a theory that says, okay—and we were just discussing this—the value of data lies in analytics, not in the data itself. It lies in being able to pull out information from that data. Most CIOs-- >> If you can get the data. >> If you can get the data. Let's assume that you've got the data. So then it becomes a question of, >> That's a big assumption. Yes, it is. (laughs) I just had Nancy Handling on about metadata. No, that's an issue. People have data they store that they can't do anything with. >> Exactly. And that's part of the problem, because what you actually have to have is CPU, slash, processing power for an unknown amount of data at any one moment in time. Now, that sounds like an elastic use case, and you can't do elastic on-prem. You can only do elastic in cloud.
That means that virtually every distribution will have to be a hybrid distribution. IBM realized this years ago and began to build this hybrid infrastructure. We're going to help them to move data—completely consistent data—between on-prem and cloud, so when you query things in the cloud, it's exactly the same results, and the correct results, you get. >> And also the stability on that. There are so many potential issues, as we've discussed in the past. That sounds simple and logical, but to do it enterprise grade is pretty complex. And so it just gives a nice, stable, enterprise-grade component. >> I mean, the volumes of data that we're talking about here are just off the charts. >> Give me a use case of a customer that you guys are working with, or has there been any go-to-market activity, or an ideal scenario that you guys see as a use case for this partnership? >> We're already seeing a whole bunch of things come through. >> What's the number one pattern that bubbles up to the top? Use-case-wise. >> As Joel pointed out, he doesn't believe that any one company just has one version of Hadoop behind their firewall. They have multiple vendors. >> 100% agree with that. >> So how do you create one, single cluster from all of those? >> John: That's one problem you solved. >> That's of course a very large problem. The second problem that we're seeing in spades is: I have to move data to cloud to run analytics applications against it. That's huge. That requires completely guaranteed consistent data between on-prem and cloud. And I think those two use cases alone account for pretty much every single company. >> I think there's even a third here. I think the third is actually—frankly, there's a lot of inefficiency in managing just HDFS and how many times you have to actually copy data. If I look across, I think the standard right now is having like three copies. And actually, working with Big Replicate and WANdisco, you can actually have more assurances and actually make fewer copies across the cluster, and actually across multiple clusters. If you think about that, you have three copies of the data sitting in this cluster. Likely, analysts have dragged a bunch of the same data into other clusters, so that's another multiple of three. So there's an amount of waste in terms of the same data living across your enterprise. So I think there's a huge cost-savings component to this as well. >> Does this involve anything with Project Atlas at all? You guys are working with, >> Not yet, no. >> That project? It's interesting. We're seeing a lot of opening up the data, but all they're doing is creating versions of it. And so then it becomes version control of the data. Do you see a master, or a centralization, of data? Actually, not centralized—pull all the data in one spot—but why replicate it? Do you see that going on? I guess I'm not following the trend here. I can't see the mega trend going on. >> It's cloud. >> What's the big trend? >> The big trend is: I need an elastic infrastructure. I can't build an elastic infrastructure on-premise. It doesn't make economic sense to build massive redundancy—maybe three or four times the infrastructure I need—on premise when I'm only going to use it maybe 10, 20% of the time. So the mega trend is: cloud provides me with a completely economic, elastic infrastructure. In order to take advantage of that, I have to be able to move data—transactional data, data that changes all the time—into that cloud infrastructure and query it. That's the mega trend.
It's as simple as that. >> So moving data around at the right time? >> And that's transactional. Anybody can say, okay, press pause, move the data, press play. >> So if I understand this correctly—and just, sorry, I'm a little slow, end of the day today—so instead of staging the data, you're moving data via the analytics engines. Is that what you're getting at? >> You use data that's being transformed. >> I think you're accessing data differently. I think today with Hadoop, you're accessing it maybe through like Flume or through Oozie, where you're building all these data pipelines that you have to manage. And I think that's obnoxious. I think really what you want is to use something like Apache Spark. Obviously, we've made a large investment in that earlier—actually, last year. To me, what I think I'm seeing is people who have very specific use cases. So they want to do analysis for a particular campaign, and so they may just pull a bunch of data into memory from across their data environment. And that may be on the cloud, it may be from a third party, it may be from a transactional system—it may be from anywhere. And that may be done in Hadoop. It may not, frankly. >> Yeah, this is the great point, and again, one of the themes on the show—this is a question that's kind of been talked about in the hallways, and I'd love to hear your thoughts on this—is there are some people saying that there's really no traction for Hadoop in the cloud. And that customers are saying, you know, it's not about just Hadoop in the cloud; I'm going to put it in S3 or an object store. >> You're right. I think-- >> Yeah, I'm right as in what? >> Every single-- >> There's no traction for Hadoop in the cloud? >> I'll tell you what customers tell us. Customers look at what they actually need from storage, and they compare whatever it is—Hadoop or any on-premise proprietary storage array—and then look at what S3 and Swift and so on offer to them. And if you do a side-by-side comparison, there isn't really a difference between those two things. So I would argue that it's a fact that, functionally, storage in cloud gives you all the functionality that any customer would need. And therefore, the relevance of Hadoop in cloud probably isn't there. >> I would add to that. So it really depends on how you define Hadoop. If you define Hadoop by the storage layer, then I would say for sure: HDFS versus an object store—that's going to be a difficult one to find some sort of benefit there. But if you look at Hadoop—like, I was talking to my friend Blake from Netflix, and I was asking him, so I hear you guys are kind of replatforming on Spark now. And he was basically telling me, well, sort of. I mean, they've invested a lot in Pig and Hive. So if you think now about Hadoop as this broader ecosystem—which, you brought up Atlas; we talk about Ranger and Knox and all the stuff that keeps coming out—there's a lot of people who are still invested in the peripheral ecosystem around Hadoop as that central point. My argument would be that I think there's still going to be a place for distributed computing kind of projects. And now, whether those will continue to interface through YARN and then down to HDFS, or whether that'll be YARN on, say, an object store or something, and those projects will persist on their own—to me that's kind of more of how I think about the larger discussion around Hadoop.
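Joel's point about accessing data differently—pulling just what a campaign needs into memory with Spark, rather than maintaining standing Flume/Oozie pipelines—translates roughly into the PySpark sketch below. The paths, formats, and column names are invented stand-ins; the point is only that the sources don't have to live in one cluster, or in Hadoop at all.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("campaign-analysis").getOrCreate()

# Invented locations -- one source on an on-prem Hadoop cluster, one in
# a cloud object store. Neither required a standing pipeline to land here.
clicks = spark.read.json("hdfs:///logs/clicks/")
orders = spark.read.parquet("s3a://acme-dw/orders/")

# Pull just what this campaign needs into memory and analyze it there.
result = (clicks.join(orders, "customer_id")
                .groupBy("campaign_id")
                .count()
                .cache())
result.show()
```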
I think people have made a lot of investments in terms of that ecosystem around Hadoop, and that's something that they're going to have to think through. >> Yeah. And Hadoop wasn't really designed for cloud. It was designed for commodity servers, deployment with ease, and at low cost. It wasn't designed for cloud-based applications. Storage in cloud was designed for storage in cloud. Right—that's what S3, that's what Swift and so on were designed specifically to do, and they fulfill most of those functions. But Joel's right, there will be companies that continue to use-- >> That's my whole argument. My whole argument is: why would you want to use Hadoop in the cloud when you can just do that? >> Correct. >> There's object store out there. There's plenty of great storage opportunities in the cloud. They're mostly shoe-horning Hadoop, and I think that's—anyway. >> There are two classes of customers. There are customers that were born in the cloud, and they're not going to suddenly say, oh, you know what, we need to build our own server infrastructure behind our own firewall, 'cause they were born in the cloud. >> I'm going to ask you guys this question. You can choose to answer or not. Joel may not want to answer it 'cause he's from IBM and gets his wrist slapped. This is a question I got on DM—a Hadoop ecosystem consolidation question. People are mailing in the questions. Now, keep sending me your questions if you don't want your name on it. Hold on—Hadoop ecosystem: when will this start to happen? What is holding back the M&A? >> So, that's a great question. First of all, consolidation happens when you sort of reach that tipping point or leveling off, that inflection point where the market levels off, and we've reached market saturation. So there's no more market to go after. And the big guys like IBM and so on come in-- >> Or there was never a market to begin with. (laughs) >> I don't think that's the case, but yes, I see the point. Now, what's stopping that from happening today—and you're a naughty boy, by the way, for asking this question—is a lot of these companies are still very well funded. So while they still have cash on the balance sheet, of course, it's very, very hard for that to take place. >> You picked up my next question. But that's a good point. The VCs held back in 2009 after the crash of 2008. Sequoia's memo, you know, the good times roll, or "R.I.P. Good Times." They stopped funding companies. Companies are getting funded, continually getting funding. Joel. >> So I don't think you can look at this market as an isolated market, like there's the Hadoop market and then there's a Spark market, and then even there's like an AI or cognitive market. I actually think this is all the same market. Machine learning would not be possible if you didn't have Hadoop, right? Or, I wouldn't say that—it wouldn't have the resurgence that it has had. Mahout was one of the first machine learning libraries that caught fire, from Ted Dunning and others. And that kind of brought it back to life. And then Spark—I mean, if you talk to-- >> John: I wouldn't say it created it. Incubated. >> Incubated, right. >> And created that Renaissance-like experience. >> Yeah, deep learning—some of those machine learning algorithms require you to have a distributed kind of framework to work in. And so I would argue that it's less of a consolidation, and more of an evolution of people going, okay, there's distributed computing.
Do I need to do that on-premise in this Hadoop ecosystem, or can I do that in the cloud, or in a growing Spark ecosystem? But I would argue there's other things happening. >> I would agree with you. I love both areas. My snarky comment, "there was never a market to begin with"—what I'm saying there is that the monetization of commanding the hill that everyone's fighting for was just one of many hills in a bigger field of hills. And so you could be in a cul-de-sac of being your own champion with no paying customers. >> What you have-- >> John: Or a free open-source product. >> Unlike the dotcom era, where most of those companies were in the public markets and you could actually see proper valuations, most of the companies—the unicorns now—most are not public. So the valuations are really difficult, and the valuation metrics are hard to come by. There are only a few of those companies that are in the public market. >> The cash story's right on. I think, to Joel's point, it's easy to pivot in a market that's big and growing. Just 'cause you're in the wrong corner of the market, pivoting or vectoring into the value is easier now than it was 10 years ago. Because, one, if you have a unicorn situation, you have cash in the bank—so they're flush with cash, your runway's so far out, you can still do your thing. If you're a startup, you can get time to value pretty quickly with the cloud. So again, I still think it's very healthy. In my opinion, I kind of think you guys have good analysis on that point. >> I think we're going to see some really cool stuff happen working together, and especially from what I'm seeing from IBM, in the fact that in the IT crowd, there is a behavioral change happening that Hadoop opened the door to, that we're starting to see more and more IT professionals walk through. In the sense that Hadoop has opened the door to not thinking of data as a liability, but actually thinking about data differently, as an asset. And I think this is where this market does have an opportunity to continue to grow, as long as we don't get carried away with trying to solve all of the old problems that we solved for on-premise data management. Like, if we do that, then there will just be a consolidation. >> Metadata is a huge issue. I think that's going to be a big deal. And on the M&A, my feeling on the M&A is that you've got to buy something of value, so you either have revenue, which means customers, and/or intellectual property. So, in a market of open source, it comes back down to the valuation question. If you're IBM or Oracle or HP, they can pivot too. And they can be agile—now, slower agile, but you know, they can literally throw some engineers at it. So if there's no customers and IP, they can replicate >> Exactly. >> that product. >> And we're seeing IBM do that. >> They don't know what they're buying. My whole point is, if there's nothing to buy... >> I think it depends on—ultimately it depends on—where we see people deriving value, and clearly with WANdisco, there's a huge amount of value that we're seeing our customers derive. So I think it comes down to that, and there is a lot of IP there, and there's a lot of IP in a lot of these companies. I think it's just a matter of widening their view, and I think WANdisco is probably the earliest to do this, frankly—to recognize that, for them to succeed, it couldn't just be about Hadoop. It actually had to expand to talk about cloud and talk about other data environments, right?
>> Well, congratulations on the OEM deal. IBM—great name, Big Replicate. Love it, fantastic name. >> We're excited. >> It's a great product, and we've been following you guys for a long time, David. Great product, great energy. So I'm sure there's going to be a lot more deals coming your way. Good strategy, this OEM strategy thing, huh? >> Oh yeah. >> It reduces sales cost. >> Gives us tremendous operational leverage, getting 4,000, 5,000-- >> You get a great partner in IBM. They know the enterprise—great stuff. This is theCUBE, bringing you all the action here at Hadoop Summit. The IBM OEM deal with WANdisco, all happening right here on theCUBE. Be back with more live coverage after this short break.
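For readers wondering what "completely guaranteed consistent data between on-prem and cloud" means mechanically: WANdisco's actual engine is built on a Paxos-family coordination protocol, which is well beyond a short example, but the end state David describes can be caricatured as below—every endpoint applies the same totally ordered log of changes, so a query against either side returns identical results. All class names and paths here are invented; a real system would agree on the log order with a consensus protocol rather than a single coordinator.

```python
class Cluster:
    """Toy stand-in for one storage endpoint (an on-prem Hadoop
    cluster, a cloud object store, and so on)."""
    def __init__(self, name):
        self.name = name
        self.files = {}

    def apply(self, op):
        action, path, data = op
        if action == "put":
            self.files[path] = data
        elif action == "delete":
            self.files.pop(path, None)

class Replicator:
    """One totally ordered log of operations, applied to every cluster
    in the same order. Real systems (like WANdisco's) agree on that
    order with Paxos; this sketch fakes it with one coordinator."""
    def __init__(self, clusters):
        self.clusters = clusters
        self.log = []

    def submit(self, op):
        self.log.append(op)       # a single global order for all writes
        for c in self.clusters:
            c.apply(op)           # same ops, same order, everywhere

onprem = Cluster("onprem-hdfs")
cloud = Cluster("cloud-swift")
rep = Replicator([onprem, cloud])
rep.submit(("put", "/data/tx/0001", "row-batch-A"))
rep.submit(("put", "/data/tx/0002", "row-batch-B"))
rep.submit(("delete", "/data/tx/0001", None))

# A query against either endpoint now returns identical results.
assert onprem.files == cloud.files
print(cloud.files)   # {'/data/tx/0002': 'row-batch-B'}
```

This also frames Joel's cost point: if the same dataset is not independently copied into each cluster by analysts, but replicated once under a single consistent view, the "three copies times N clusters" waste he describes collapses toward one managed multiple.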
Irfan Khan, SAP | SAP SapphireNow 2016
>> Voiceover: It's theCUBE, covering Sapphire Now. Headline sponsored by SAP HANA Cloud, the leader in platform as a service. With support from Console Inc., the cloud internet company. Now, here are your hosts: John Furrier and Peter Burris. >> Okay, welcome back, everyone. We are here live in Orlando, Florida, for exclusive coverage of SAP Sapphire Now. This is theCUBE, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with Peter Burris. I want to thank our sponsors for allowing us to get down here—SAP HANA Cloud Platform, Console Inc., Capgemini, and EMC—thanks so much for supporting us. Our next guest is Irfan Khan, who is the SVP and General Manager of digital enterprise platforms, which includes HANA, end-to-end. Welcome back to theCUBE. >> Thank you. >> John: Good to see you. >> Lovely to be back here again. >> John: So, you know theCUBE history. We go way back—we've done pretty much every Hadoop World up until 2013; now we have an event the same day, same week as Strata in New York—and we've been to every Sapphire since 2010 except for 2014, 2015. We had a little conflict of events, but it's been great. It's been big data. I remember Bill McDermott got up there when HANA was announced, kind of, or pre-built before Hadoop hit. So, you had HANA coming out of the oven, Hadoop hits the scene, Hadoop gets all the press, HANA's now rolling, so then you roll forward four more years, we're here. What's your take on this? Because it's been an interesting shift. Hadoop, some are saying, is hard to use—total cost of ownership. Now HANA's rising, Hadoop is sliding. That's my opinion, but what's your opinion? >> Well, that's a well, sort of, summarized history lesson there, so to speak. Well, firstly, great to be on theCUBE again. It's always lovely to see you gentlemen here; you do a wonderful job. What I'd perhaps just highlight is maybe some of the key milestones that I've observed over the last four or five years. Ironically, 2010, when I arrived at SAP, is when the entire, sort of, if you like, trajectory of HANA started going in that direction. And Hadoop was sort of there, but it was maybe petering out a little bit because it was the unknown: the uncertainty of scale, and whether or not this was going to be only batch, or whether it was ever going to become real-time. So, I would maybe mark the two or three milestones from the SAP side. HANA started off as a disruptive technology, which was perhaps conceived as a response to a lot of internal challenges that we were running into using the systems of record of yesteryear. They were incapable of dealing with SAP applications, incapable of giving us what we now refer to as a digital core, and they were incapable of giving our customers truly what they needed. As a response, HANA was introduced into the market, but it wasn't limited in scope to, if you like, the historical baggage of the relational era, or even the Hadoop era, so to speak. It was completely reimagined technology, built around in-memory computing and a columnar architecture, and therefore it gave us an opportunity to project ultimately what we could achieve with this as a foundation. So, HANA came into the market focusing on analytics to start with, going full circle into being able to do transactionality as well. And where are we today? I think Hadoop is now being recognized, I would say, probably as a de facto data operating system.
So, HDFS is a very significant sort of extension to most IT organizations, but it's still lacking the compute capabilities. This is what's given rise to Spark, and of course with HANA—HANA is, within itself, a very significant computing engine. >> John: And Vora. And Vora a-- >> Irfan: Of course, and Vora as well. Now you're finishing off my sentences. Thank you. >> (laughs) This is what theCUBE is all about; we've got a good cadence going here. Alright, so, but now the challenge. HANA also, by the way, was super fast when it came out, but then it didn't really fire, in my opinion, in its swim lane. It seems now it's so clear that the fruit is coming off the tree. You're seeing it blossom beautifully. You've got S/4 HANA, you've got the core... Explain that, because people get confused. Am I buying HANA Cloud? Am I buying HANA Cloud Platform? Share how this is all segmented, to the buyer, to the customer. >> Sure. I mean, firstly, SAP applications need to have a system of record. HANA is a system of record. It has a database capability, but ultimately HANA is not just a database. It's an entire platform, with integration and application services and, of course, with data services. Now, as a consequence, when we talk about the HANA Cloud Platform, this is taking HANA as a core technology, as a platform, and embedding it inside of a cloud deployment environment called the HANA Cloud Platform. It gives customers who are perhaps implementing on-premise S/4, or even a public S/4 instance, an opportunity to extend those applications as perhaps they may need or require for their business requirements. So, in layman's terms: you have a system of record requirement with SAP applications—that is HANA. It is only HANA now in the case of S/4. And in order to extend the application, as customers want to customize those applications, there is one definitive extension venue, and that's called the HANA Cloud Platform. >> John: And that mainly is for developers, too. I call it the developer cloud, for lack of a better description, or a more generic one. That's the Cloud Foundry. Basically the platform as a service is actually bolting on, I guess, a developer on-ramp, if you will. Is that a safe way to look at it? >> Irfan: Yeah, I mean, I think the developer interaction point with SAP now certainly becomes HCP, but it also is a significant ecosystem enabler as well. Only last week—or week-before-last, in fact—we announced the relationship with Apple, which is a phenomenal extension of what we do with business applications, and HCP is the definitive venue for the Apple relationship, in effect. >> So, tell us a little bit about borrowing or building upon that. How should an executive, when they think about digitalization, how should they think about it? Is this something that is a new set of channels, or the ability to reach new customers, or is there something more fundamental going on here? Is it really about trying to translate more of your business into data in a way that it's accessible, so it can be put to use and put to work in more and different ways? >> Sure, it's a great question. So, what is digitalization? Well, firstly, it's not new. I mean, SAP didn't invent digitalization, but I think we know a fair bit about where digitalization is going to take many businesses in the next three to five years. So, I would say that there's five prevailing trends that are fueling the need to go digital. The first thing is about hyperconnectivity.
If we understand that data and information is not only just consumed, it's created, in a variety of places—and geographically, just about anywhere now is connected. I mean, in fact, I read one statistic that 90 percent of the world's inhabitable land masses have either cellular or wireless reception. So, truly, we're hyperconnected. The second thing is about the scale of the cloud, right? The cloud gives us compute not just on the desktop, but anywhere—and by definition of anywhere, we're saying if you have a smart appliance at the edge, that is, in fact, supercomputing, because it gives you an extension to be able to get to any compute device. And then you've got cloud, on top of which you have cybersecurity, and a variety of other things like IoT. These things are all fueling the need to become digitally aware enterprises, and what's ultimately happening is that business transformation is happening because somebody without any premises, without any assets, comes along and disrupts a business. In fact, one study from Capgemini and, of course, from MIT, back in 2013, revealed that in the year 2020, out of the S&P 500, approximately 40 percent of the businesses are going to cease to exist—for the simple reason that those business transformations going on, disrupting their classical business models, are going to change the way that they operate. So, in a concatenated way of answering your question: digital transformation at the executive level is about not just surviving, it's about thriving. It's about taking advantage of the digital trends. It's about making sure that, as you reinvent your businesses, you're not just looking at what you do today—you're always looking at that as a line that's been deprecated. What are you going to do in addition to that? That's where your growth is going to come from, and SAP's all about helping customers become digitally aware and transform their organizations. >> Peter: So, you're having conversations with customers all the time about the evolution of data management technologies, your argument being that HANA is more advanced—a columnar database, in-memory, speed, more complexity in the IO—all kinds of wonderful things that it makes possible can then be reflected in more complex, or more rich, value-creating applications. But the data is often undervalued. >> Irfan: Of course. >> The data itself. We haven't figured out how to look at that data and start treating it literally as capital. We talk about a business problem, we talk about how much money we want to put there, how many people we want to put there, but we don't yet talk about how much data is going to be required, either to go there and make it work, or that we're going to capture out of it. How are you working with customers to think that problem through? Are they thinking it through differently, in your experience? >> Yeah, that's a great question. So, firstly, if I was to look at the value association with data, we can borrow from the airline industry, perhaps, as an analogy. If you look at data, it's very equivalent to passengers. The businesses that we typically operate with are working on first- and business-class data. They've actually made significant investments around how to securely store, access, process, and manage all of this business-class and first-class data.
But there's an economy class of data which is significant and very pervasive, and if you look at it from the airline's point of view, an individual economy-class passenger doesn't really equate to an awful lot. But if you aggregate all the economy-class passengers, it's significant—it's actually more than your business- and first-class revenue, so to speak. So, consequently, large organizations have to start looking at data, monetizing the data, and not ignoring all of the noise signals that come out of the sensors, out of the various machinery, and making sure that they can aggregate that data and build context around it. So, we have to start thinking along those ways. >> John: Yes, I love that analogy, so good. But let's take that one step further. I want to make sure I get on the right plane, right? So, one, that's the data-aware part. So, digital assets are the data, so valuation techniques come into play. But having a horizontally traversable data plane, really, in real time, is a big thing, because not only do I go through security—put my shoes through, my laptop out, that's just IT—the plane is where the action is. I want to be on the right plane. That's making data aware; the alchemy behind it, that's the trick. What are your thoughts on that? Because this is a cutting-edge area. You hear AI ontologies and stuff going on there now—machine learning, certainly. Surely not advanced to the point where it's really working yet. It's getting there, but what are your thoughts on all this? >> Yeah, so I think, with the vehicle that you're referring to—whether it's a plane or whatever the mode of transportation is, at a metaphor level—we have to understand that there is a value associated with making decisions at the right time, when you have all the information that you need. And, by definition, we have created a culture in IT where we segregate data. We create this almost two-swim-lane approach: this is my now data, this is my transactional data, and here's my data that will then feed into some other environment, and I may look to analyze it after the event. Now, getting back to the HANA philosophy from day one: it was about creating a simplified model where you can do live analytics on transactional data. This is a big, significant shift. So, using your aircraft analogy: as I'm on there, I don't want to suddenly worry that I didn't pick up my magazine from duty-free or whatever, from the newspaper stand—I've got no content now, I can't do anything, and for the next nine hours I'm on a plane with nothing to do. I've got no internet, I've got no connectivity. The idea is that you want to have all of the right information readily available and make real-time decisions. That calls for simplified architectures; that's all about HANA. >> We're getting the signal here. I know you're super busy. Thanks so much for coming on theCUBE. I want to get one final question in. What's your vision around your plans? I'll say it's cutting-edge—you've got a great area, the ecosystem's developing nicely. What are your goals for the next year? What are you looking to do? What are your key KPIs? What are you trying to knock down this year? What are your plans? >> I mean, first and foremost, we've spent an awful lot of time talking about SAP transformations, and around SAP customer landscape transformations. S/4 is all about that. That is a digital core. The translation of digital core to SAP should not be inhibiting other customers who don't have an SAP transaction or application foundation.
We want to be able to take SAP to every single platform usage out there, and most customers will have a need for HANA-like technology. So, at the top of my agenda is: let's increase the full use and actual value of HANA, and we're seeing an awful lot of traction there. The second thing is, we're now driving towards the cloud. HCP is the definitive venue not just for the ecosystem and the developer, but also for the traditional SAP customers, and we're going to be promoting an awful lot more exciting relationships. I'd love to be able to speak to you again in the future about how the evolution is taking place. >> John: We wish we had more time. You're a super guest, great insight. Thank you for sharing the data here-- >> Irfan: Thank you for having me. >> John: --on theCUBE. We'll be right back with more live coverage here inside theCUBE at Sapphire Now. You're watching theCUBE. (techno music) (calm music) >> Voiceover: There'll be millions of people in the near future that want to be involved in their own personal well-being and well--
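Irfan's "two swim lanes" remark—transactional data in one store, a copy shipped elsewhere for after-the-event analysis—versus HANA's model of live analytics on transactional data can be illustrated with any database that lets you query a transaction the instant it lands. The sketch below uses Python's built-in SQLite purely as a stand-in; it is not HANA, and the schema is invented for the example.

```python
import sqlite3

# One in-memory store serving both roles: no second copy shipped off
# to a warehouse, no after-the-event analysis lane.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (region TEXT, amount REAL)")

# Transactional writes land here...
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("EMEA", 1200.0), ("APJ", 450.0), ("EMEA", 310.5)])

# ...and live analytics runs against exactly the same data, immediately.
for region, total in db.execute(
        "SELECT region, SUM(amount) FROM orders "
        "GROUP BY region ORDER BY region"):
    print(region, total)
# APJ 450.0
# EMEA 1510.5
```

What HANA adds beyond this toy is the columnar, in-memory engine that makes such mixed workloads fast at enterprise scale; the architectural point—one system of record queried live, rather than two segregated swim lanes—is the same.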
George Mathew, Alteryx - BigDataSV 2014 - #BigDataSV #theCUBE
>> theCUBE at BigDataSV 2014 is brought to you by headline sponsors WANdisco—we make Hadoop invincible—and Actian, accelerating big data 2.0. >> Okay, we're back here, live in Silicon Valley. This is big data; this is SiliconANGLE and Wikibon's theCUBE coverage of big data in Silicon Valley and all around the world, covering the Strata conference. All the latest news and analysis here in Silicon Valley. theCUBE is our flagship program; we go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE, with my co-host Dave Vellante, co-founder of Wikibon.org. George Mathew, CEO, Alteryx, on theCUBE again, back from Big Data NYC just a few months ago—our two events. Welcome back. >> Great to be here. >> So, what fruit has dropped into the blender to change the colors of the big data space this time? So we were in New York; we saw what happened there. A lot of talk about financial services, you know, big business. Silicon Valley Kool-Aid is more about innovation. Partnerships are being formed, channel expansion. Obviously the market's hot, growth is still racing, valuations are high. What's your take on the current state of the market? >> Yeah, great question. So, John, when we see this market today—I remember even a few years ago when I first visited theCUBE, particularly when it came to Hadoop World and Strata a few years back—it was amazing that we talked about the early innings of a ballgame, right? We said it was like, man, we're probably in the second or third inning of this ball game. And what has progressed, particularly these last few years, has been how much the actual productionization, the actual industrialization, of this activity—particularly from a big data analytics standpoint—has emerged. And that's amazing, right? In a short span, two, three years, we're talking about technologies and capabilities that were kind of considered things that you play with. And now these are things that are keeping the lights on and running, you know, major portions of how better decision-making and analytics are done inside of organizations. So I think that industrialization is a big shift forward. In fact, if you've listened to guys like Narendra Mulani, who runs most of analytics at Accenture, he'll actually highlight that as one of the key elements of how not only the transformation is occurring among organizations, but even the people that are servicing large companies today are going through this big shift. And we're right in the middle of it. >> We saw—you mentioned Accenture. We look at CSC, with ServiceMesh on the cloud side. You're seeing the consulting firms really seeing build-out mandates, not just POCs—like, let's go and lock down now, for the vendors. What that means is, people are looking for reference accounts right now. So to me, I'm kind of seeing the tea leaves say, okay, who's going to knock down the reference accounts, and what is that going to look like? You know, how do you go in and say, I'm going to tune up this database against SAP, or this against that incumbent legacy vendor, with this new scale-out? All these things are now in play. So we're seeing that focus of, okay, tire kicking is over—real growth, real referenceable deployments, not like a, you know, POC on steroids, but full-on game-changing deployments. Do you see that? And if you do, what versions of that do you see happening, and what inning of that is it—like the first pitch of the sixth inning?
What do you—how would you benchmark that? >> Yeah, so I would say we're definitely in the fourth or fifth inning of a nine-inning ballgame now. What we're seeing—I describe this as a new analytic stack that's emerged, right? And that started years ago, when particularly the major Hadoop distro vendors started to rethink how data management was effectively being delivered. And once that data management layer started to be rethought—particularly in terms of, you know, what schema-on-read was, what the ability to do MPP and scale-out was, in terms of how much cheaper it is to bring storage and compute closer to data—what's now coming above that stack is, you know, how do I blend data? How am I able to give solutions to data analysts who can make better decisions off of what's being stored inside of that petabyte-scale infrastructure? So we're seeing this new stack emerge where, you know, Cloudera, Hortonworks, MapR are kind of that underpinning, underlying infrastructure, where now the R-based analytics that Revolution provides, Alteryx for data blending and analytic work that's in the hands of data analysts, Tableau for visual analysis and dashboarding—those are basically the solutions that are moving forward as capabilities that are packaged and productized. >> Is that the game-changing feature right now, do you think—that integration of the stack? Or is that the big game-changer, this shift? >> That's the hardening that's happening as we speak right now. If you think about the industrialization of big data analytics—as I think of it, the fourth or fifth inning of the ballgame—that hardening is the ability to take solutions that either, you know, the Accentures, the KPMGs, the Deloittes of the world deliver to their clients, but also how people build stuff internally, right? They have much better solutions that work out of the box, as opposed to fumbling with, you know, things that aren't stitched as well together because of the baling wire and bubblegum that was involved for the last few years. >> I got it. I got to ask you—one of the big trends you saw, certainly in the tech world, you mentioned stacks, and that's the success of Amazon, the cloud. You're seeing integrated stacks being a key part of, kind of, the formation of—you said hardening of the stack. But the word "horizontally scalable" is a term that's used in a lot of these open source environments, where you have commodity hardware, you have open source software. So, you know, everything is horizontally scalable. Now, that's very easy to envision, but thinking about the implementation in an enterprise or a large organization, horizontally scalable is not a no-brainer. What's your take on that? And how does that hyperscale infrastructure mindset of scale-out, which is a big benefit of the current infrastructure—how does that fit into big data? >> Well, I think it fits extremely well, right? Because when you look at the capabilities of the last stack, as we describe it, we almost think of it as vertical hardware and software that's vertically built up. But right now, for anyone who's building scale in this world, it's all about scale-out, and really being able to build that stack on a horizontal basis. So if you look at examples of this—say, for instance, what Cloudera recently announced with their enterprise data hub.
And so when you look at that capability of the enterprise data hub, a lot of it is about taking what YARN has become as a resource manager, what HDFS has become as a scale-out storage infrastructure, what the new pluggable engines that have emerged beyond MapReduce are as a capability for engines to come into Hadoop. And that is a very horizontal description of how you can do scale-out, particularly for data management. When we built a lot of the work that was announced at Strata a few years ago—particularly around how the analytics architecture for Gallery emerged at Alteryx—now we have hundreds of apps, thousands of users in that infrastructure. And when we built that out, it was actually scaling out on Amazon, where the worker nodes and the capability for us to manage workload were very horizontally built out. If you look at servers today at any layer of that stack, it is really about that horizontal scale-out: less about throwing more hardware, more, you know, high-end infrastructure at it, and more about how commodity hardware can be leveraged and used up and down that stack very easily. >> So George, I got to ask you a question: why is analytics so hard for so many companies? You've been in this big data space—we've been talking to you since the beginning—and when's it going to get easier? And what are you guys specifically doing, you know, to facilitate that? >> Sure. So, a few things that we've seen to date is that a lot of the analytics work that many people do, internal and external to organizations, is very rote, hand-driven coding, right? And I think that's been one of the biggest challenges, because the two end points in analytics have been: either you hard-code stuff that you push into, you know, a C++ or a Java function, and you push it in-database, or you're doing lightweight analytics in Excel. And really there needs to be a middle ground where someone can do effective scale-out and have repeatability and ease of use in what's being done—where you don't have to necessarily be a C++ or Java programmer to push an analytic function in-database, and you certainly don't have to deal with the limitations of Excel today. And really, that middle ground is what Alteryx serves. We look at it as an opportunity for analysts to start work with a very repeatable, reasonable workflow of how they would build their initial constructs around an analytic function that they would want to deploy. And then the scale-out happens because all of the infrastructure works on that analyst's behalf—whether that be the infrastructure on Hadoop, whether that be the infrastructure for the scale-out of how we would publish an analytic function, or whether that be how the visualizations would occur inside of a product like Tableau. And that, I think, Dave, is one of the biggest things that needs to shift—where the only options in front of you for analytics aren't either Excel or hard-coding a bunch of code in C++ or Java and pushing it in-database. >> Yeah. And you correct me if I'm wrong, but it seems you're building your partnerships and your ecosystem really around driving that solution, and really driving a revolution in the way in which people think about analytics. >> Ease of use. The idea is that ultimately, if you can't get data analysts to not only create work that they can actually self-describe, deploy, and deliver success with inside of an organization,
And scale that out at the petabyte scale information that exists inside of most organizations you fail. And that's the job of folks like ourselves to provide great software. >>Well, you mentioned Tableau, you guys have a strong partnership there, and Christian Chabot, I think has a good vision. And you talked about sort of, you know, the, the, the choices of the spectrum and neither are good. Can you talk a little bit more about that, that, that partnership and the relationship and what you guys are doing together? Yeah. >>Uh, I would say Tableau's our strongest and most strategic partner today. I mean, we were diamond sponsors of their conference. I think I was there at their conference when I was on the cube the time before, and they are diamond sponsors of our conference. So our customers and particular users are one in the same for Tablo. It really becomes a, an experience around how visual analysis and dashboard, and can be very easily delivered by data analysts. And we think of those same users, the same exact people that Tablo works with to be able to do data blending and advanced analytics. And so that's why the two software products, that's why the two companies, that's where our two customer bases are one in the same because of that integrated experience. So, you know, Tableau is basically replacing XL and that's the mission that thereafter. And we feel that anyone who wants to be able to do the first form of data blending, which I would think of as a V lookup in Excel, should look at Altryx as a solution for that one. >>So you mentioned your conference it's inspire, right? It >>Is inspiring was coming up in June, >>June. Yeah. Uh, how many years have you done inspire? >>Inspire is now in its fifth year. And you're gonna bring the >>Cube this year. Yeah. >>That would be great. You guys, yeah, that would be fun. >>You should do it. So talk about the conference a little bit. I don't know much about it, but I mean, I know of it. >>Yeah. It's very centered around business users, particularly data analysts and many organizations that cut across retail, financial services, communications, where companies like Walmart at and T sprint Verizon bring a lot of their underlying data problems, underlying analytic opportunities that they've wrestled with and bring a community together this year. We're expecting somewhere in the neighborhood of 550 600 folks attending. So largely to, uh, figure out how to bring this, this, uh, you know, game forward, really to build out this next rate analytic capability that's emerging for most organizations. And we think that that starts ultimately with data analysts. All right. We think that there are well over two and a half million data analysts that are underserved by the current big data tools that are in this space. And we've just been highly focused on targeting those users. And so far, it's been pretty good at us. >>It's moving, it's obviously moving to the casual user at some levels, but I ended up getting there not soon, but I want to, I want to ask you the role of the cloud and all this, because when you have underneath the hood is a lot of leverage. You mentioned integrates that's when to get your perspective on the data cloud, not data cloud is it's putting data in the cloud, but the role of cloud, the role of dev ops that intersection, but you're seeing dev ops, you know, fueling a lot of that growth, certainly under the hood. 
>>It's moving, obviously, toward the casual user at some level, but I want to ask you about the role of the cloud in all this, because underneath the hood there's a lot of leverage; you mentioned integration. I want to get your perspective on the role of cloud and the role of DevOps, that intersection, because you're seeing DevOps fueling a lot of that growth under the hood. Then on top of the stack you have, I guess, this middle layer, for lack of a better description, to use an old metaphor: the enablement piece. Ultimately the end game is fully turnkey data science and personalization; that's the holy grail, we all know. So how do you see that collision between cloud and big data? >>Yeah. So cloud has basically become three things for a lot of folks in our space. One is what we talked about, scale-up and scale-out, which is much more feasible when you can spin up and spin down infrastructure as needed, particularly on an elastic basis. Many of us who built our solutions leverage Amazon, one of the de facto solutions for cloud-based deployment, because it just makes the necessary scale-out easy (a sketch of that spin-up, run, tear-down pattern follows this exchange). The second thing it enables us, and many of our friends and partners, to do is bring a lower cost basis to how infrastructure is stood up, because at the end of the day, the challenge for the last generation of analytics and data warehousing in this space was that your starting conversation is two to three million dollars in infrastructure alone, before you even buy software and services. And now, if you can rent everything involved with the infrastructure, and the software is actually working within days or hours of starting the effort, as opposed to a 14-month life cycle, it really compresses the time to success and value. We see a real similarity to how Salesforce disrupted the market ten years ago; I happened to be at Salesforce when that disruption occurred, and the analytics movement underway is really shaped by cloud. The ability to scale out in the cloud is driving an economic basis that's unheard of. >>And with that comes a developer market that's robust, right? I mean, you have easy, kind of turnkey development to tap into. >>That's right, because there's a robust economy surrounding the APIs that are now available for cloud services. So it's not even just at the starting point of infrastructure; there are definite higher-level services, all the way up to software as a service. >>How much growth do you see in that vein of wealth and opportunity, not only for the companies involved but for their customers? They have a top-line focus, and with the movement we've seen in analytics, you're seeing the CIO with less of a role, and more of the CEO and the chief data officer wanting most of the top-line drivers to be app-focused. Are you seeing a big shift there? >>Yeah. One of the real upshots of the cloud is that business analysts, business users, and the line of business can now make an impact on how decisions are made, faster, without the infrastructure underpinnings that used to be needed inside the four walls of an organization. So the decision maker and the buyer has effectively become, to your point, the chief analytics officer or the chief marketing officer, less so the chief information officer. And I think that is accelerating at a tremendous pace, because even if you look at the statistics out there today, the buying power of the CMO now outstrips the buying power of the CIO, probably by 1.2 to 1.3x. And that used to be a whole different calculus before.
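(A hedged sketch of the elasticity being described: rent worker capacity for the life of a job, then release it. The boto3 calls below are from the current AWS SDK for Python, not what existed at the time of this interview, and the AMI ID, instance type, and counts are placeholders, so read this as the shape of the pattern rather than a recipe.)

```python
# Hypothetical sketch: elastic spin-up, run, tear-down on EC2 with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Spin up a small fleet of workers that exists only for the life of the job.
fleet = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI with worker software baked in
    InstanceType="m5.large",           # placeholder size
    MinCount=8,
    MaxCount=8,
)
instance_ids = [i["InstanceId"] for i in fleet["Instances"]]

try:
    ...  # dispatch the scale-out analytic job to the workers here
finally:
    # The cost-basis point: capacity is paid for only while the job runs.
    ec2.terminate_instances(InstanceIds=instance_ids)
```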
>>So yeah, just to pick this out here in real time: we all know this from the IT world. For a long time you've had the service catalog, self-service, service-oriented architectures, whatever you want to call it, evolving in the modern era. That's good. But on the business side there's still a need for the same kind of cataloging of tooling and platform analytics. Do you agree with that? Do you see it happening that way, where there's still some connection, but it's not a complete dependency? That's what we're rethinking in real time. >>Yeah, I think that's pretty spot on, because when you look at what businesses are doing today, they're selecting software that enables them to be more self-reliant. The reason we have been growing as much among business analysts as we have is that we deliver self-reliance in software, and in some ways that's what Tableau does too. The winners in this space are going to be the ones that really help users get to results faster through self-reliance, and that's really where companies like Alteryx stand today. >>So I want to ask you a follow-up on that CMO-CIO discussion. Given that CMOs are spending a lot more, who owns the data? I don't know if I asked you this before, but do you see the role of a chief data officer emerging? And is that individual part of the marketing organization, part of IT, or a separate, parallel role? >>One of the things I will tell you is that as I've seen chief analytics and chief data officers emerge, and that is a real category and a real title, with folks who have real responsibilities in the organization, the one place it's not is in IT, which is interesting to see. Oftentimes those individuals report straight to the CEO, or they have very close access to line-of-business owners, general managers, or the heads of marketing and sales. So I'm seeing that shift: wherever that chief data officer sits, whether reporting to the CEO, to line-of-business managers, or to general managers of large strategic business units, it's not in the information office; it's not in the CIO's purview anymore. And that is telling for how people are thinking about their data. Data is becoming much more of an asset and a weapon for how companies grow and build their scale, less so something they just have to deal with. >>Yeah, and that role is clearly emerging in certain industry sectors, financial services, government, and healthcare, but slowly, as we have been saying. >>Yeah, and it's going to cross the board. One of the reasons I wrote the article at the end of last year, which I literally titled 'Analytics is eating the world,' is this exact idea: you are no longer locked down, with data and infrastructure holding you back. This is now much more in the hands of the people responsible for making better decisions inside their organizations, using data to drive those decisions. And it doesn't matter the size and shape of the data that's coming in. >>Yeah.
>>Data is like the food that just spilled out from the truck, and analytics is the Pac-Man eating it up. Sorry. Okay, final question in this segment: summarize Big Data SV for us this year, from your perspective, knowing what's going on now. What's the big game changer? What should the folks watching know and take note of? What's the big story at this moment? >>There are definite swim lanes being created, as you can see, now that the bigger distribution providers, particularly on the Hadoop side of the world, have started to call out what they each stand for. You can tell that MapR is definitely about creating a fast, slightly proprietary Hadoop distro for the enterprise. You can tell that the folks at Cloudera are focusing on enterprise scale and really building out that hub for the enterprise. And you can tell Hortonworks is basically embedding and enabling open source for anyone to take advantage of; certainly the previous announcements and some of the recent ones give you an indicator of that. So I see the swim lanes forming in that layer. Now focus and attention are going to move away from how that layer has evolved and toward what I would think of as advanced analytics: being able to do visual analysis and blending of information. That's where the next battleground is going to be, particularly in the Strata space. We're really looking forward to that, because it puts us in a great position, as a company and a market leader in advanced analytics, to serve customers as this new battleground emerges. >>Well, we really appreciate you taking the time. You're an awesome guest on theCUBE. You have a company that you're running and a great team, and you come and share your knowledge with our fans and audience. We appreciate it. What's next for you this year and for the company? Share some of your goals. >>Yeah, a few things. We mentioned Inspire coming up in June, and there's a big product release; most of our product team is actually here. We have a release coming up at the beginning of Q2, which is Alteryx 9.0. That has quite a bit in it, including an expansion of connectivity, and a fair degree of modeling capability, so that the R-based modeling we do scales out very well, with Revolution and Cloudera in mind, as well as being able to package and deploy analytic apps very quickly, with those data analysts in mind. It's a release that's been almost a year in the works, and we're very much looking forward to a big launch at the beginning of Q2. >>George, thanks so much. You've got Inspire coming up, a lot of great success, a growing market, valuations are high, and the good news is this is just the beginning: call it mid-innings for the industry, but for the customers I'd call it the top of the first inning, with real build-out, real deployments, real budgets, real-deal big data. It's going to collide with cloud, and a lot of innovation is happening right here. Big Data SV, all the big data Silicon Valley coverage, here on theCUBE. I'm John Furrier with Dave Vellante. We'll be right back with our next guest after this short break.
Dr. Amr Awadallah - Interview 1 - Hadoop World 2011 - theCUBE
>>Okay, we're back live in New York City for Hadoop World 2011. I'm John Furrier, founder of SiliconANGLE.com, and we have a special walk-in guest, Amr Awadallah, the VP of Engineering and co-founder of Cloudera, who's going to be on at 2:30 Eastern on theCUBE to go more in depth. But since we saw him in the hallway, we grabbed him for a quick spot. This is theCUBE, our flagship telecast, where we go out to the events and tap the smartest people, and I'm here with my co-host, Dave Vellante of Wikibon. Welcome back; you're a longtime CUBE alum, so appreciate you coming back on and doing a quick drive-by here. >>Thanks for the nice welcome. >>You know, we go talk to the smart people in the room, and you're one of the smartest guys I know; we've been friends for years. It was my tweet heard around the world, to you, to find space, and we've been sharing office space at Cloudera for a year. Now we're going to be trying to find new space, because you're expanding so fast we have to get a new home. >>Sorry about that. >>But I wanted to really thank you personally, here on air. You've enabled SiliconANGLE and Wikibon; we figured it out early because of you. We had our noses sniffing around the big data area before it was called big data, and when we met and talked we had been tracking the social web, and it has exploded in an amazing way. I'm just really thankful, because I've had a front-row seat in the trenches with you guys, and it's been amazing. So I want to thank you. >>You're welcome, and it's great to have you on board. >>So you've been evangelizing in the trenches at Yahoo, then as an EIR at Accel Partners, which announced the hundred-million-dollar big data fund today, all great news. But you've been the real spark at Cloudera, or one of the main sparks, a co-founder. >>One of them. A lot of it is Jeff Hammerbacher, my co-founder from Facebook. We both, and we've said this before, saw the future at our companies; we saw where everybody is going to go next. >>And now Jeff's going to be on as well; he's taking this whole data science thing and building out a team. You've got to drill that down with him. But what do you think about all this? Right now, how do you feel, personally and emotionally, looking at the marketplace? >>Yeah, I'm very emotional today, actually. Lots of good news: you heard about the funding news, the hundred million dollars for startups, but also our own round, which was supposed to come out today and came out a bit early. I'm very, very emotional because of that; it's a testament from very big-name investors to how well we are doing, and recognition of how big this wave really is. The hundred-million-dollar fund from Accel is also a huge testament, and hopefully lots of new innovations and startups will come out of that. So I'm very emotional about that, but also a bit overwhelmed by the size of this event and how many people are gravitating toward the technology, which shows how much work we still have to do going forward. >>Mike Olson was great on stage as CEO, a great guy; we love Mike, he's geeky and he's pragmatic, a real strategist, and you've got Kirk, who's the operator. He showed a slide at his keynote that showed the evolution of Hadoop: the core Hadoop, and then, year by year, the columns extending with new components coming out. Take us through that progression. Go back a few years and walk us through: why is this going on so fast, what's the community doing, and what happened since 2008 when you started?
>>When we started in 2008, who was believing us back then that this thing was going to be big? We had the belief because we saw it happen firsthand, but many folks were dismissive: no, this big data thing is a fad and nobody will care about it. And lo and behold, today it's obviously proving not to be the case. In terms of the maturity of the platform, you're absolutely right. The slide Mike showed shows that only thirty percent of the contributions happening today are in the Hadoop core layer, and the overall vision is very similar to an operating system, except what this really is, is a data operating system: how to operate large amounts of data in a big data center. It's an operating system for many machines, as opposed to Linux, which is an operating system for a single machine. When Hadoop came out, Hadoop was only the kernel, only that inner layer. If you look at any operating system, like Windows or Linux, the core functionality is two things: storing files, and running applications on top of those files. That's what Windows does, that's what Linux does, and that's what Hadoop does at its heart. But to really get an operating system to work, you need many ancillary components around it that make it functional: libraries, applications, integrations, IO devices, and so on. That's really what's happening in the Hadoop world. It started with the core OS layer, which is Hadoop: HDFS for storage, MapReduce for computation. Now all of these other things are showing up around that core kernel to make it a fully functional, extensible data operating system. >>Let me hit the replay button and put the pause on that, because this is an important point for folks out there. A lot of metaphors get used in this business: it's the Linux of data, it's just like Red Hat, and we kind of use those terms for the business model. Unpack that a little deeper for us. What's the difference? Can you replay what you just said about Linux? >>So I was actually talking about the similarity, and then I can talk about the difference. The similarity is that the heart of Hadoop is a system for storing files, which is HDFS, and a system for running applications on top of those files, which is MapReduce. The heart of Linux is the same thing: a system for storing files, which is ext4, and a system for scheduling applications on top of those files. That's the same heart as Windows, and so on. The difference is that Linux is made to run on a single node, while Hadoop is really made to run on many, many nodes. Hadoop essentially cares about taking a rack of servers, or a data center of servers, and having them look like one big, massive mainframe built out of commodity hardware, one that can store arbitrary amounts of data and run any type of application on top of it.
>>Hence the new components, like the Hives of the world. >>Yes. So now these new components are coming up. Hive, for example, makes it easier to write queries for Hadoop: it's a SQL language for writing queries on top of Hadoop, so you don't have to write them in MapReduce, which we call the assembly language of Hadoop. If you write in MapReduce, you get the most flexibility and the most performance, but only if you know what you're doing. It's very similar to machine code: if you write assembly, you can do anything, but you can also shoot yourself in the foot. Same thing with MapReduce. When you use Hive, Hive abstracts that out for you: you write SQL, and Hive takes care of all the plumbing work to compile that down to MapReduce for you (a sketch of that contrast follows below). So that's Hive. HBase, for example, is a very nice system that augments Hadoop: it makes it low latency, and it supports the update, insert, and delete transactions that HDFS does not support out of the box. >>So it's more like a database, like MySQL? >>Yeah, the analogy of MySQL to Linux is very similar to HBase to HDFS. >>And what's your take, wearing your founder's hat now, on the business-model similarities and differences with Red Hat? >>So actually, they are different; the similarity stops at open source. We are both open source, in the sense that the core system is open source, available out there; you can look at the source code, and so on. The difference is that Red Hat has a license on their bits. There is the source code, and then there are the bits, and when Red Hat compiles the source code into bits, you cannot deploy those bits without a Red Hat license. With us it's very different. There's the source code, which is Apache, all in Apache. We compile the source code into a bunch of bits, which is our distribution, called CDH. Those bits are one hundred percent open source and free; anyone can deploy them and use them, and you don't have to pay us anything. The only reason you come back and pay us is for Cloudera Enterprise, which is really for when you go operational and mission-critical. Cloudera Enterprise gives you two things. First, a proprietary management suite that we built, which is very unique to us; nobody in the market has anything close to it right now. It makes it easier for you to deploy, configure, monitor, and provision Hadoop, and to do capacity planning, security management, and so on. That management suite is unique to Cloudera and not part of the Apache open source; you only get it as a subscriber to Cloudera Enterprise. We do have a free version of it available for download that can run up to 50 nodes, just for you to get up and running quickly. It has a very simple installer: you fire off the software, say "install Hadoop on these servers," and it takes care of everything else for you. It's like the installers when Windows first came out, with the nice progress bar that made applications easy to install; imagine that for a cluster of servers. The other reason people subscribe to Cloudera Enterprise, in addition to the management suite, is getting our support services.
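(To make the "assembly language" point concrete, here is a hedged sketch of the classic word count as a pair of Hadoop Streaming-style Python scripts, which read stdin and emit tab-separated pairs the way Streaming expects, followed by the rough HiveQL one-liner that expresses the same job. The table name is a placeholder.)

```python
# mapper.py: emit (word, 1) for every word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py: Streaming delivers input sorted by key, so equal words arrive
# in runs and can be summed in a single pass.
import sys

current, count = None, 0
for line in sys.stdin:
    word, n = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(n)
if current is not None:
    print(f"{current}\t{count}")

# The rough Hive equivalent of both scripts plus the shuffle between them,
# assuming a table `words` with one word per row:
#   SELECT word, COUNT(*) FROM words GROUP BY word;
```

Run under Hadoop Streaming, the framework handles the partitioning, shuffle, and sort between the two scripts across the cluster; that plumbing is exactly what Hive generates from the SQL on the analyst's behalf.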
>>Support is necessary for any software, even if it's free, and even for hardware. Think of it this way: if I give you a free airplane, just like that, here you go, here's an airplane, you can run that airplane and make money from passengers, but you still need somebody to maintain the plane for you. You can go hire your own mechanics, but we tell you: if you subscribe with us as the mechanics for your airplane, the support you get will be way better than anything else, and the economics will be way better than having your own staff do the maintenance. >>Okay, final question, and we've got one minute because we slid you in real quick; Amr is going to come back at 2:30 Eastern, so come back, and we'll have a more in-depth conversation. Just share with the folks watching your view of what's going on in Apache, because there's all this kind of weird FUD being thrown around, that Cloudera is not this or that, when you guys are clearly the leader. We talked with Kirk about that, so we don't need to go into it, but what's the real deal happening with Apache and the code? And you have a unique offering. >>For the real deal, I advise people to go look at the blog post our CEO Mike Olson wrote, called "The Community Effect." The real deal is that there is a very big, healthy community developing the source code for Hadoop, the core system, which is HDFS and MapReduce, and all the components around that core system. We at Cloudera employ a very large engineering organization, bigger than many of the other companies in this space, and the company itself is much, much bigger than any of the other players, so we make a lot of contributions to the core system and to the projects around it. However, we are part of the community, and we're definitely doing this with the community; it's not just a Cloudera thing for the core platform. That's the real deal. >>All right. So here we are with Amr, the co-founder. Congratulations on the great funding, and on the hundred million from Accel Partners, who invested in you guys. You're part of the community, we all know that, just clarifying that for the record, and you have unique differentiators: the management suite, the enterprise offering, and the experience. >>Yes, the experience. A huge differentiation we have is that we have been doing this for three years, ahead of everybody else. We have the experience across all the industries that matter, so when you come to us, we know how to do this in the finance industry, in the retail industry, in the health industry, and in government. >>So, for the audience out there, Amr is coming back at 2:30, when we'll go deeper. Thanks for the quick drive-by, and nice touch with the Cloudera logo on the shirt. Great to see you again, our great, great friend.