Basil Faruqui, BMC | theCUBE NYC 2018
(upbeat music) >> Live from New York, it's theCUBE. Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Okay, welcome back everyone to theCUBE NYC. This is theCUBE's live coverage covering CubeNYC Strata Hadoop Strata Data Conference. All things data happen here in New York this week. I'm John Furrier with Peter Burris. Our next guest is Basil Faruqui, lead solutions marketing manager for digital business automation within BMC. He returns; he was here last year with us and also at Big Data SV, which has been renamed CubeNYC and Cube SV because it's not just big data anymore. We're hearing words like multi cloud, Istio, all those Kubernetes things. Data now is so important, it's now up and down the stack, impacting everyone. We talked about this last year with Control M, how you guys are automating in a hurry, the four pillars of pipelining data. The setup days are over; welcome to theCUBE. >> Well thank you and it's great to be back on theCUBE. And yeah, what you said is exactly right, so you know, big data has really, I think, now been distilled down to data. Everybody understands data is big, and it's important, and it is really, you know, it's quite a cliche, but to a large degree, data is the new oil, as some people say. And I think what you said earlier is important in that we've been very fortunate to be able to not only follow the journey of our customers but be a part of it. So about six years ago, some of the early adopters of Hadoop came to us and said, look, we use your products for traditional data warehousing and on the ERP side for orchestration workloads. We're about to take some of these projects on Hadoop into production and really feel that the Hadoop ecosystem is lacking enterprise-grade workflow orchestration tools. So we partnered with them, and some of the earliest goals they wanted to achieve were to build a data lake and provide richer and wider data sets to the end users to be able to do some dashboarding, customer 360, and things of that nature. Very quickly, in about five years' time, we have seen a lot of these projects mature from how do I build a data lake to now applying cutting-edge ML and AI, and cloud is a major enabler of that. You know, it's really, as we were talking about earlier, it's really taking away excuses for not being able to scale quickly from an infrastructure perspective. Now you're talking about is it Hadoop or is it S3 or is it Azure Blob Storage, is it Snowflake? And from a Control M perspective, we're very platform and technology agnostic, so some of our customers who had started with Hadoop as a platform, they are now looking at other technologies like Snowflake, so one of our customers describes it as kind of the spine or a power strip of orchestration where, regardless of what technology you have, you can just plug and play in and not worry about how do I rewire the orchestration workflows, because Control M is taking care of it. >> Well you probably always will have to worry about that to some degree. But I think where you're going, and this is where I'm going to test with you, is that as data is increasingly recognized as a strategic asset, as analytics is increasingly recognized as the way that you create value out of those data assets, and as a business becomes increasingly dependent upon the output of analytics to make decisions and ultimately, through AI, to act differently in markets, you are embedding these capabilities or these technologies deeper into business. They have to become capabilities.
They have to become dependable. They have to become reliable, predictable, cost, performance, all these other things. That suggests that ultimately, the historical approach of focusing on the technology and trying to apply it to a periodic or series of data science problems has to become a little bit more mature so it actually becomes a strategic capability. So the business can say we're operating on this, but the technologies to take that underlying data science technology to turn into business operations that's where a lot of the net work has to happen. Is that what you guys are focused on? >> Yeah, absolutely, and I think one of the big differences that we're seeing in general in the industry is that this time around, the pull of how do you enable technology to drive the business is really coming from the line of business, versus starting on the technology side of the house and then coming to the business and saying hey we've got some cool technologies that can probably help you, it's really line of business now saying no, I need better analytics so I can drive new business models for my company, right? So the need for speed is greater than ever because the pull is from the line of business side. And this is another area where we are unique is that, you know, Control M has been designed in a way where it's not just a set of solutions or tools for the technical guys. Now, the line of business is getting closer and closer, you know, it's blending into the technical side as well. They have a very, very keen interest in understanding are the dashboards going to be refreshed on time? Are we going to be able to get all the right promotional offers at the right time? I mean, we're here at NYC Strata, there's a lot of real-time promotion happening here. The line of business has direct interest in the delivery and the timing of all of this, so we have always had multiple interfaces to Control M where a business user who has an interest in understanding are the promotional offers going to happen at the right time and is that on schedule? They have a mobile app for them to do that. A developer who's building up complex, multi-application platform, they have an API and a programmatic interface to do that. Operations that has to monitor all of this has rich dashboards to be able to do that. That's one of the areas that has been key for our success over the last couple decades, and we're seeing that translate very well into the big data place. >> So I just want to go under the hood for a minute because I love that answer. And I'd like to pivot off what Peter said, tying it back to the business, okay, that's awesome. And I want to learn a little bit more about this because we talked about this last year and I kind of am seeing it now. Kubernetes and all this orchestration is about workloads. You guys nailed the workflow issue, complex workflows. Because if you look at it, if you're adding line of business into the equation, that's just complexity in and of itself. As more workflows exist within its own line of business, whether it's recommendations and offers and workflow issues, more lines of business in there is complex for even IT to deal with, so you guys have nailed that. How does that work? Do you plug it in and the lines of businesses have their own developers, so the people who work with the workflows engage how? >> So that's a good question, with sort of orchestration and automation now becoming very, very generic, it's kind of important to classify where we play. 
So there's a lot of tools that do release and build automation. There's a lot of tools that'll do infrastructure automation and orchestration. All of this infrastructure and release management process is done ultimately to run applications on top of it, and the workflows of the application need orchestration, and that's the layer that we play in. And if you think about how the end user, the business, and the consumer interact with all of this technology, it's through applications, okay? So the orchestration of the workflows inside the applications, whether you start all the way from an ERP or a CRM and then you land into a data lake and then do an ML model, and then out come the recommendations and analytics, that's the layer we are automating today. Obviously, all of this-- >> By the way, the technical complexity for the user's in the app. >> Correct, so the line of business obviously has a lot more control. You're seeing roles like chief digital officers emerge, you're seeing CTOs that have mandates like, okay, you're going to be responsible for all applications that are customer facing, where the CIO is going to take care of everything that's inward facing. It's not a settled structure or science yet. >> It's evolving fast. >> It's evolving fast. But what's clear is that the line of business has a lot more interest and influence in driving these technology projects, and it's important that technologies evolve in a way where the line of business can not only understand but take advantage of that. >> So I think it's a great question, John, and I want to build on that and then ask you something. So the way we look at the world is we say the first fifty years of computing were known process, unknown technology. The next fifty years are going to be unknown process, known technology. It's all going to look like a cloud. But think about what that means. Known process, unknown technology: Control M and related types of technologies tended to focus on how you put in place predictable workflows in the technology layer. And now, unknown process, known technology, driven by the line of business, now we're talking about controlling process flows that are being created as bespoke, strategic, differentiating ways of doing business. >> Well, dynamic, too, I mean, dynamic. >> Highly dynamic, and those workflows in many respects, those technologies, piecing applications and services together, become the process that differentiates the business. Again, you're still focused on the infrastructure a bit, but you've moved it up. Is that right? >> Yeah, that's exactly right. We see our goal as abstracting the complexity of the underlying application, data, and infrastructure. So, I mean, it's quite amazing-- >> So it could be easily reconfigured to a business's needs. >> Exactly, so whether you're on Hadoop and now you're thinking about moving to Snowflake, or tomorrow something else that comes up, the orchestration or the workflow, you know, as a business, as a product, our goal is to continue to evolve quickly and in a manner that we continue to abstract the complexity, so from-- >> So I've got to ask you, we've been having a lot of conversations around Hadoop versus Kubernetes on multi cloud, so as cloud has certainly come in and changed the game, there's no debate on that. How it changes is debatable, but we know that multiple clouds is going to be the modus operandi for customers. >> Correct.
>> So I got a lot of data and now I've got pipelining complexities and workflows are going to get even more complex, potentially. How do you see the impact of the cloud, how are you guys looking at that, and what are some customer use cases that you see for you guys? >> So, what I mentioned earlier, that being platform and technology agnostic, is actually one of the unique differentiating factors for us, so whether you are on AWS or Azure or Google or On-Prem or still on a mainframe, a lot of, we're in New York, a lot of the banks, insurance companies here still do some of the most critical processing on the mainframe. The ability to abstract all of that, whether it's cloud or legacy solutions, is one of our key enablers for our customers, and I'll give you an example. So Malwarebytes is one of our customers and they've been using Control M for several years. Primarily the entire structure is built on AWS, but they are now utilizing Google cloud for some of their recommendation analysis and sentiment analysis, because their goal is to pick the best-of-breed technology for the problem they're looking to solve. >> Service, the best-of-breed service is in the cloud. >> The best-of-breed service is in the cloud to solve the business problem. So from Control M's perspective, transcending from AWS to Google cloud is completely abstracted for them, so if it runs on Google today and tomorrow it's Azure, or they decide to build a private cloud, they will be able to extend the same workflow orchestration. >> But you can build these workflows across whatever set of services are available. >> Correct, and you bring up an important point. It's not only being able to build the workflows across platforms but being able to define dependencies and track the dependencies across all of this, because none of this is happening in silos. If you want to use Google's API to do the recommendations, well, you've got to feed it the data, and the data pipeline, like we talked about last time, data ingestion, data storage, data processing, and analytics, has very, very intricate dependencies, and these solutions should be able to manage not only the building of the workflow but the dependencies as well. >> But you're defining those elements as fundamental building blocks through a control model. >> Correct. >> That allows you to treat the higher level services as reliable, consistent capabilities. >> Correct, and the other thing I would like to add here is to not only build complex multiplatform, multiapplication workflows, but never lose focus of the business service or the business process there, so you can tie all of this to a business service, and then, these things are complex, there are problems; let's say there's an ETL job that fails somewhere upstream. Control M will immediately be able to predict the impact and be able to tell you this means the recommendation engine will not be able to make the recommendations. Now, the staff that's going to work on remediation understands the business impact, versus looking at a screen where there's 500 jobs and one of them has failed. What does that really mean? >> Set priorities and focal points and everything else. >> Right. >> So I just want to wrap up by asking you how your talk went at the Strata Hadoop Data Conference. What were you talking about, what was the core message? Was it Control M, was it customer presentations? What was the focus?
So the focus of yesterday's talk was actually, you know, one of the things is academic talk is great, but it's important to, you know, show how things work in real life. The session was focused on a real use case from a customer, Navistar. They have IoT data-driven pipelines where they are predicting failures of parts inside trucks and buses that they manufacture, you know, reducing vehicle downtime. So we wanted to simulate a demo like that, so that's exactly what we did. It was very well received. In real time, we spun up an EMR environment in AWS, automatically provisioned the Control M infrastructure there, we applied Spark and machine learning algorithms to the data, and out came the recommendation at the end, which was, you know, here are the vehicles that are-- >> Fix their brakes. (laughing) >> Exactly, so it was very, very well received. >> I mean, there's a real-world example, there's real money to be saved, maintenance, scheduling, potential liability, accidents. >> Liability is a huge issue for a lot of manufacturers. >> And Navistar has been at the leading edge of how to apply technologies in that business. >> They really have been a poster child for digital transformation. >> They sure have. >> Here's a company that's been around for 100 plus years, and when we talk to them they tell us that we have every technology under the sun that has come since the mainframe, and for them to be transforming and leading in this way, we're very fortunate to be part of their journey. >> Well we'd love to talk more about some of these customer use cases. That's what people love about theCUBE; we want to do more of them, share those examples. People love to see proof in real-world examples, not just talk, so appreciate you sharing. >> Absolutely. >> Thanks for sharing, thanks for the insights. We're here, Cube live in New York City, part of CubeNYC. We're getting all the data, sharing that with you. I'm John Furrier with Peter Burris. Stay with us for more day two coverage after this short break. (upbeat music)
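The Navistar-style demo Basil describes, spinning up Spark on EMR and scoring IoT telemetry to flag vehicles at risk, maps onto a fairly standard Spark ML pipeline. The sketch below is illustrative only; the paths, column names, and model choice are hypothetical stand-ins, not the actual demo code, and in the demo the cluster provisioning and the scheduling of each step would be handled by an orchestrator such as Control M rather than by hand.

```python
# Hypothetical sketch: score vehicle telemetry for failure risk with Spark ML.
# Column names (engine_temp, brake_wear, ...) and the S3 paths are invented
# for illustration; they are not from the Navistar demo.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("vehicle-failure-risk").getOrCreate()

# Historical telemetry labeled with whether the part later failed (0/1).
telemetry = spark.read.parquet("s3://example-bucket/telemetry/history/")

features = VectorAssembler(
    inputCols=["engine_temp", "brake_wear", "vibration", "mileage_since_service"],
    outputCol="features")

model = Pipeline(stages=[
    features,
    RandomForestClassifier(labelCol="part_failed", featuresCol="features")
]).fit(telemetry)

# Score the latest readings and keep the vehicles predicted to fail,
# e.g. to feed a maintenance-scheduling job downstream.
latest = spark.read.parquet("s3://example-bucket/telemetry/latest/")
at_risk = (model.transform(latest)
           .filter("prediction = 1.0")
           .select("vehicle_id", "probability"))
at_risk.write.mode("overwrite").parquet("s3://example-bucket/telemetry/at_risk/")
```

In a production pipeline each of these stages, ingest, train, score, and publish, would be a separate job with explicit dependencies, which is exactly the layer the workflow orchestration discussed above sits at.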
ENTITIES
Entity | Category | Confidence |
---|---|---|
John | PERSON | 0.99+ |
Basil Faruqui | PERSON | 0.99+ |
Peter Burris | PERSON | 0.99+ |
BMC | ORGANIZATION | 0.99+ |
Peter | PERSON | 0.99+ |
500 jobs | QUANTITY | 0.99+ |
Google | ORGANIZATION | 0.99+ |
New York | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
New York City | LOCATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Hadoop | TITLE | 0.99+ |
first fifty years | QUANTITY | 0.99+ |
theCUBE | ORGANIZATION | 0.99+ |
Navistar | ORGANIZATION | 0.99+ |
tomorrow | DATE | 0.98+ |
yesterday | DATE | 0.98+ |
one | QUANTITY | 0.98+ |
this week | DATE | 0.97+ |
Malwarebytes | ORGANIZATION | 0.97+ |
Cube | ORGANIZATION | 0.95+ |
Control M | ORGANIZATION | 0.95+ |
NYC | LOCATION | 0.95+ |
Snowflake | TITLE | 0.95+ |
Strata Hadoop Data Conference | EVENT | 0.94+ |
100 plus years | QUANTITY | 0.93+ |
CubeNYC Strata Hadoop Strata Data Conference | EVENT | 0.92+ |
last couple decades | DATE | 0.91+ |
Azure | TITLE | 0.91+ |
about five years | QUANTITY | 0.91+ |
Istio | ORGANIZATION | 0.9+ |
CubeNYC | ORGANIZATION | 0.89+ |
day | QUANTITY | 0.87+ |
about six years ago | DATE | 0.85+ |
Kubernetes | TITLE | 0.85+ |
today | DATE | 0.84+ |
NYC Strata | ORGANIZATION | 0.83+ |
Hadoop | ORGANIZATION | 0.78+ |
one of them | QUANTITY | 0.77+ |
Big Data SV | ORGANIZATION | 0.75+ |
2018 | EVENT | 0.7+ |
Kubernetes | ORGANIZATION | 0.66+ |
fifty years | DATE | 0.62+ |
Control M | TITLE | 0.61+ |
four pillars | QUANTITY | 0.61+ |
two | QUANTITY | 0.6+ |
On-Prem | ORGANIZATION | 0.6+ |
Cube SV | COMMERCIAL_ITEM | 0.58+ |
a minute | QUANTITY | 0.58+ |
S3 | TITLE | 0.55+ |
Azure | ORGANIZATION | 0.49+ |
cloud | TITLE | 0.49+ |
2018 | DATE | 0.43+ |
Ronen Schwartz, Informatica | theCUBE NYC 2018
>> Live from New York, it's theCUBE covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. (techy music) >> Welcome back to the Big Apple, everybody. This is theCUBE, the leader in live tech coverage. My name is Dave Vellante, I'm here with my cohost Peter Burris, and this is our week-long coverage of CUBENYC. It used to be, really, a big data theme. It sort of evolved into data, AI, machine learning. Ronan Schwartz is here, he's the senior vice president and general manager of cloud, big data, and data integration at data integration company Informatica. Great to see you again, Ronan, thanks so much for coming on. >> Thanks for inviting me, it's a good, warm day in New York. >> Yeah, the storm is coming and... Well, speaking of storms, the data center is booming. Data is this, you know, crescendo of storms (chuckles) have occurred, and you guys are at the center of that. It's been a tailwind for your business. Give us the update, how's business these days? >> So, we finished Q2 in a great, great success, the best Q2 that we ever had, and the third quarter looks just as promising, so I think the short answer is that we are seeing the strong demand for data, for technologies that supports data. We're seeing more users, new use cases, and definitely a huge growth in need to support... To support data, big data, data in the cloud, and so on, so I think very, very good Q2 and it looks like Q3's going to be just as good, if not better. >> That's great, so there's been a decades-long conversation, of course, about data, the value of data, but more often than not over the history of recent history, when I say recent I mean let's say 20 years on, data's been a problem for people. It's been expensive, how do you manage it, when do you delete it? It's sort of this nasty thing that people have to deal with. Fast forward to 2010, the whole Hadoop movement, all of a sudden data's the new oil, data's... You know, which Peter, of course, disagrees with for many reasons. >> No, it's... >> We don't have to get into it. >> It's subtlety. >> It's a subtlety, but you're right about it, and well, maybe if we have time we can talk about that, but the bromide of... But really focused attention on data and the importance of data and the value of data, and that was really a big contribution that Hadoop made. There were a lot of misconceptions. "Oh, we don't need the data warehouse anymore. "Oh, we don't need old," you know, "legacy databases." Of course none of those are true. Those are fundamental components of people's big data strategy, but talk about the importance of data and where Informatica fits. >> In a way, if I look into the same history that you described, and Informatica have definitely been a player through this history. We divide it into three eras. The first one is when data was like this thing that sits below the application, that used the application to feed the data in and if you want to see the data you go through the application, you see the data. We sometimes call that as Data 1.0. Data 2.0 was the time that companies, including Informatica, kind of froze and been able to give you a single view of the data across multiple systems, across your organization, and so on, because we're Informatica we have the ETL with data quality, even with master data management, kind of came into play and allowed an organization to actually build analytics as a system, to build single view as a system, et cetera. 
I think what is happening, and Hadoop was definitely a trigger, but I would say the cloud is just as big of a trigger as the big data technologies, and definitely everything that's happening right now with Spark and the processing power, et cetera, is contributing to that. This is the time of the Data 3.0 when data is actually in the center. It's not a single application like it was in the Data 2.0. It's not this thing below the application in Data 1.0. Data is in the center and everything else is just basically have to be connected to the data, and I think it's an amazing time. A big part of digitalization is the fact that the data is actually there. It's the most important asset the organization has. >> Yeah, so I want to follow up on something. So, last night we had a session Peter hosted on the future of AI, and he made the point, I said earlier data's the new oil. I said you disagreed, there's a nuance there. You made the point last night that oil, I can put oil in my car, I can put oil in my house, I can't do both. Data is the new currency, people said, "Well, I can spend a dollar or I can spend "a dollar on sports tickets, I can't do both." Data's different in that... >> It doesn't follow the economics of scarcity, and I think that's one of the main drivers here. As you talk about 1.0, 2.0, and 3.0, 1.0 it's locked in the application, 2.0 it's locked in a model, 3.0 now we're opening it up so that the same data can be shared, it can be evolved, it can be copied, it can be easily transformed, but their big issue is we have to sustain overall coherence of it. Security has to remain in place, we have to avoid corruption. Talk to us about some of the new demands given, especially that we've got this, more data but more users of that data. As we think about evidence-based management, where are we going to ensure that all of those new claims from all of those new users against those data sources can be satisfied? >> So, first, I truly like... This is a big nuance, it's not a small one. (laughs) The fact that you have better idea actually means that you do a lot of things better. It doesn't mean that you do one thing better and you cannot do the other. >> Right. I agree 100%, I actually contribute that for two things. One is more users, and the other thing is more ways to use the data, so the fact that you have better data, more data, big data, et cetera, actually means that your analytics is going to be better, right, but it actually means that if you are looking into hyperautomation and AI and machine learning and so on, suddenly this is possible to do because you have this data foundation that is big enough to actually support machine learning processes, and I think we're just in the beginning of that. I think we're going to see data being used for more and more use cases. We're in the integration business and in the data management business, and we're seeing, within what our customers are asking us to support, this huge growth in the number of patterns of how they want the data to be available, how they want to bring data into different places, into different users, so all of that is truly supporting what you just mentioned. I think if you look into the Data 2.0 timeframe, it was the time that a single team that is very, very strong with the right tools can actually handle the organization needs. In what you described, suddenly self-service. Can every group consume the data? Can I get the data in both batch and realtime? 
Can I get the data in a massive amount as well as in small chunks? These are all becoming very, very central. >> And very use case, but also user and context, you know, we think about time, dependent, and one of the biggest challenges that we have is to liberate the data in the context of the multiple different organization uses, and one of the biggest challenges that customers have, or that any enterprise has, and again, evidence-based management, nice trend, a lot of it's going to happen, but the familiarity with data is still something that's not, let's say broadly diffused, and a lot of the tools for ensuring that people can be made familiar, can discover, can reuse, can apply data, are modestly endowed today, so talk about some of these new tools that are going to make it easier to discover, capture, catalog, sustain these data assets? >> Yeah, and I think you're absolutely right, and if this is such a critical asset, and data is, and we're actually looking into more user consuming the data in more ways, it actually automatically create a bottleneck in how do I find the data, how do I identify the data that I need, and how am I making this available in the right place at the right time? In general, it looks like a problem that is almost unsolvable, like I got more data, more users, more patterns, nobody have their budget tripled or quadrupled just to be able to consume it. How do you address that, and I think Informatica very early have identified this growing need, and we have invested in a product that we call the enterprise data catalog, and it's actually... The concept of a catalog or a metadata repository, a place that you can actually identify all the data that exists, is not necessarily a new concept-- >> No, it's been around for years. >> Yes, but doing it in an enterprise-unified way is unique, and I think if you look into what we're trying to basically empower any user to do I basically, you know, we all using Google. You type something and you find it. If you're trying to find data in the organization in a similar way, it's a much harder task, and basically the catalog and Informatica unified, enterprise-unified catalog is doing that, leveraging a lot of machine learning and AI behind the scenes to basically make this search possible, make basically the identification of the data possible, the curation of the data possible, and basically empowering every user to find the data that he wants, see recommendation for other data that can work with it, and then basically consume the data in the way that he wants. I totally think that this will change the way IT is functioning. It is actually an amazing bridge between IT and the business. If there is one place that you can search all your data, suddenly the whole interface between IT and the business is changing, and Informatica's actually leading this change. >> So, the catalog gives you line-of-sight on all, (clears throat) all those data sources, what's the challenge in terms of creating a catalog and making it performant and useful? >> I think there are a few levels of the challenge. I chose the word enterprise-unified intelligent catalog deliberately, and I think each one of them is kind of representing a different challenge. The first challenge is the unified. There is technical metadata, this is the mapping and the processes that move data from one place to the other, then there is business metadata. 
These are the definitions the business is using, and then there is the operational metadata as well, as well as the physical location and so on. Unifying all of them so that you can actually connect and see them in one place is a unique challenge that at this stage we have already completely addressed. The second one is enterprise, and when talking about enterprise metadata it means that you want all of your applications, you want applications in the cloud, you want your cloud environment, your big data environment. You want, actually, your APIs, you want your integration environment. You want to be able to collect all of this metadata across the enterprise, so unified is all the types; enterprise is the second one. The third challenge is actually the most exciting one, is how can you leverage intelligence so it's not limited by the human factor, by the amount of people that you have to actually put the data together, right? >> Mm-hm. >> And today we're using a very, very sophisticated, interesting algorithm to run on the metadata and be able to tell you that even though you don't know how the data got from here to here, it actually did get from here to here. >> Mm-hm. >> It's a dotted line, maybe somebody copied it, maybe something else happened, but the data is so similar that we can actually tell you it came from one place. >> So, actually, let me see, because I think there's... I don't think you missed a step, but let me reveal a step that's in there. One of the key issues in the enterprise side of things is to reveal how data's being used. The value of data is tied to its context, and having catalogs that can do, as you said, the unified, but also the metadata becomes part of how it's used, makes that opportunity, that ability to then create audit trails and create lineage, possible. >> You're absolutely right, and I think it actually is one of the most important things, is to see where the data came from and what steps did it go through. >> Right. >> There's also one other very interesting value of lineage that I think sometimes people tend to ignore, which is who else is using it? >> Right. >> Who else is consuming it, because that is actually, like, a very good indicator of how good the data is or how common the data is. The ability to actually leverage and create this lineage is a mandatory thing. The ability to create lineage that is inferred, and not actually specifically defined, is also very, very interesting, but we're now doing, like, things that are, I think, really exciting. For example, let's say that a user is looking into a data field in one source and he is actually identifying that this is a certain, specific ID that his organization is using. Now we're able to actually automatically understand that this field actually exists in 700 places, and actually leverage the intelligence that he just gave us and actually ask him, "Do you want it to be automatically updated everywhere? Do you want to do it in a step-by-step, guided way?" And this is how you actually scale to handle the massive amount of data, and this is how organizations are going to learn more and more and get the data to be better and better the more they work with the data. >> Now, Ronan, you have hard news this week, right? Why don't you update us on what you've announced? >> So, I think in the context of our discussion, Informatica announced here, actually today, this morning at Strata, a few very exciting pieces of news that are actually helping the customer go on this data journey.
The first one is basically supporting data across, big data across multi-clouds. The ability to basically leverage all of these great tools, including the catalog, including the big data management, including data quality, data governance, and so on, on AWS, on Azure, on GCP, basically without any effort needed. We're even going further and we're empowering our user to use it in a serverless mode where we're actually allowing them full control over the resources that are being consumed. This is really, really critical because this is actually allowing them to do more with the data in a lower cost. I think the last part of the news that is really exciting is we added a lot, a lot of functionality around our Spark processing and the capabilities of the things that you can do so that the developers, the AI and machine learning can use their stuff, but at the same time we actually empower business users to do more than they ever did before. So, kind of being able to expand the amount of users that can access the data, wanting a more sophisticated way, and wanting a very simple but still very powerful way, I think this is kind of the summary of the news. >> And just a quick followup on that. If I understand it, it's your full complement of functionality across these clouds, is that right? You're not neutering... (chuckles) >> That is absolutely correct, yes, and we are seeing, definitely within our customers, a growing choice to decide to focus their big data efforts in the cloud, it makes a lot of sense. The ability to scale up and down in the cloud is significantly superior, but also the ability to give more users access in the cloud is typically easier, so I think Informatica have chosen as the market we're focusing on enterprise cloud data management. We talked a lot about data management. This is a lot about the cloud, the cloud part of it, and it's basically a very, very focused effort in optimizing things across clouds. >> Cloud is critical, obviously. That's how a lot of people want to do business. They want to do business in a cloud-like fashion, whether it's on-prem or off-prem. A lot of people want things to be off-prem. Cloud's important because it's where innovation is happening, and scale. Ronan, thanks so much for coming on theCUBE today. >> Yeah, thank you very much and I did learn something, oil is not one of the terms that I'm going to use for data in the future. >> Makes you think about that, right? >> I'm going to use something different, yes. >> It's good, and I also... My other takeaway is, in that context, being able to use data in multiple places. Usage is a proportional relationship between usage and value, so thanks for that. >> Excellent. >> Happy to be here. >> And thank you, everybody, for watching. We will be right back right after this short break. You're watching theCUBE at #CUBENYC, we'll be right back. (techy music)
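Ronen's point about inferring lineage, being able to say the data "actually did get from here to here" because two fields are so similar, can be illustrated with a much simpler idea than Informatica's actual catalog: fingerprint each column by hashing a sample of its values and flag pairs whose fingerprints overlap heavily. The sketch below is only a toy version of that concept; the dataset names, sample size, and 0.8 threshold are made up for the example, and the enterprise data catalog's algorithms are considerably more sophisticated.

```python
# Toy illustration of similarity-based lineage inference: columns whose value
# fingerprints overlap strongly are flagged as likely copies or derivations.
# Dataset names and the 0.8 threshold are invented for this example.
import hashlib
from itertools import combinations

def fingerprint(values, sample_size=1000):
    """Hash a sample of a column's values into a set of short fingerprints."""
    sample = list(values)[:sample_size]
    return {hashlib.md5(str(v).encode()).hexdigest()[:12] for v in sample}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Imagine these came from profiling jobs over different systems.
columns = {
    "crm.customers.customer_id": fingerprint(range(10_000)),
    "lake.orders.cust_id":       fingerprint(range(10_000)),         # a copy
    "erp.invoices.invoice_no":   fingerprint(range(50_000, 60_000)),  # unrelated
}

for (name_a, fp_a), (name_b, fp_b) in combinations(columns.items(), 2):
    score = jaccard(fp_a, fp_b)
    if score > 0.8:
        print(f"probable lineage link: {name_a} <-> {name_b} ({score:.2f})")
```

Running it prints a probable link between the CRM customer ID and the data lake copy while leaving the unrelated invoice column alone, which is the "dotted line" kind of lineage described above.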
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Ronan | PERSON | 0.99+ |
Ronan Schwartz | PERSON | 0.99+ |
Informatica | ORGANIZATION | 0.99+ |
Peter | PERSON | 0.99+ |
New York | LOCATION | 0.99+ |
100% | QUANTITY | 0.99+ |
Peter Burris | PERSON | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
20 years | QUANTITY | 0.99+ |
Ronen Schwartz | PERSON | 0.99+ |
700 places | QUANTITY | 0.99+ |
2010 | DATE | 0.99+ |
third challenge | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
two things | QUANTITY | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
a dollar | QUANTITY | 0.99+ |
Google | ORGANIZATION | 0.99+ |
one source | QUANTITY | 0.99+ |
first challenge | QUANTITY | 0.98+ |
first one | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
one | QUANTITY | 0.98+ |
last night | DATE | 0.98+ |
first | QUANTITY | 0.98+ |
this week | DATE | 0.97+ |
this morning | DATE | 0.97+ |
second one | QUANTITY | 0.97+ |
one place | QUANTITY | 0.97+ |
Spark | TITLE | 0.97+ |
3.0 | OTHER | 0.96+ |
single application | QUANTITY | 0.96+ |
New York City | LOCATION | 0.95+ |
single team | QUANTITY | 0.93+ |
decades | QUANTITY | 0.92+ |
2.0 | OTHER | 0.91+ |
each one | QUANTITY | 0.91+ |
Hadoop | TITLE | 0.9+ |
theCUBE | ORGANIZATION | 0.89+ |
1.0 | OTHER | 0.89+ |
single view | QUANTITY | 0.89+ |
third quarter | DATE | 0.88+ |
Data 3.0 | TITLE | 0.85+ |
Data 2.0 | TITLE | 0.85+ |
Data 1.0 | OTHER | 0.84+ |
Q2 | DATE | 0.83+ |
Data 2.0 | OTHER | 0.83+ |
Azure | TITLE | 0.82+ |
both batch | QUANTITY | 0.81+ |
Big Apple | LOCATION | 0.81+ |
NYC | LOCATION | 0.78+ |
one thing | QUANTITY | 0.74+ |
three eras | QUANTITY | 0.74+ |
GCP | TITLE | 0.65+ |
Q3 | DATE | 0.64+ |
Hadoop | PERSON | 0.64+ |
David Richards, WANdisco | theCUBE NYC 2018
Live from New York, it's theCUBE. Covering theCUBE, New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Okay, welcome back everyone. This is theCUBE live in New York City for our CUBE NYC event, #cubenyc. This is our ninth year covering the big data ecosystem going back to the original Hadoop world, now it's evolved to essentially all things AI, future of AI. Peter Burris is my cohost. He gave a talk two nights ago on the future of AI presented in his research. So it's all about data, it's all about the cloud, it's all about live action here in theCUBE. Our next guest is David Richards, who's been in the industry for a long time, seen the evolution of Hadoop, been involved in it, has been a key enabler of the technology, certainly enabling cloud recovery replication for cloud, welcome back to theCUBE. It's good to see you. >> It's really good to be here. >> I got to say, you've been on theCUBE pretty much every year, I think every year, we've done nine years now. You made some predictions and calls that actually happened. Like five years ago you said the cloud's going to kill Hadoop. Yeah, I think you didn't say that off camera, but it might (laughing) maybe you said it on camera. >> I probably did, yeah. >> [John] But we were kind of pontificating but also speculating, okay, where does this go? You've been right on a lot of calls. You also were involved in the Hadoop distribution business >>back in the day. Oh god. >> You got out of that quickly. (laughing) You saw that early, good call. But you guys have essentially a core enabler that's been just consistently performing well in the market both on the Hadoop side, cloud, and as data becomes the conversation, which has always been your perspective, you guys have had a key in part of the infrastructure for a long time. What's going on? Is it still doing deals, what's? >> Yes, I mean, the history of WANdisco's play and big data in Hadoop has been, as you know because you've been with us for a long time, kind of an interesting one. So we back in sort of 2013, 2014, 2015 we built a Hadoop-specific product called Non-Stop NameNode and we had a Hadoop distribution. But we could see this transition, this change in the market happening. And the change wasn't driven necessarily by the advent of new technology. It was driven by overcomplexity associated with deploying, managing Hadoop clusters at scale because lots of people, and we were talking about this off-camera before, can deploy Hadoop in a fairly small way, but not many companies are equipped or built to deploy massive scale Hadoop distributions. >> Sustain it. >> They can't sustain it, and so the call that I made you know, actions speak louder than words. The company rebuilt the product, built a general purpose data replication platform called WANdisco Fusion that, yes, supported Hadoop but also supported object store and cloud technologies. And we're now seeing use cases in cloud certainly begin to overtake Hadoop for us for the first time. >> And you guys have a patent that's pretty critical in all this, right? >> Yeah. So there's some real IP. 
>> Yes, so people often make the mistake of calling us a data replication business, which we are, but data replication happens post-consensus or post-agreement, so the very heart of WANdisco of 35 patents are all based around a Paxos-based consensus algorithm, which wasn't a very cool thing to talk about now with the advent of blockchain and decentralized computing, consensus is at the core of pretty much that movement, so what WANdisco does is a consensus algorithm that enables things like hybrid cloud, multi cloud, poly cloud as Microsoft call it, as well as disaster recovery for Hadoop and other things. >> Yeah, as you have more disparate parts working together, say multi cloud, I mean, you're really perfectly positioned for multi cloud. I mean, hybrid cloud is hybrid cloud, but also multi cloud, they're two different things. Peter has been on the record describing the difference between hybrid cloud and multi cloud, but multi cloud is essentially connecting clouds. >> We're on a mission at the moment to define what those things actually are because I can tell you what it isn't. A multi cloud strategy doesn't mean you have disparate data and processes running in two different clouds that just means that you've got two different clouds. That's not a multi cloud strategy. >> [Peter] Two cloud silos. >> Yeah, correct. That's kind of creating problems that are really going to be bad further down the road. And hybrid cloud doesn't mean that you run some operations and processes and data on premise and a different siloed approach to cloud. What this means is that you have a data layer that's clustered and stretched, the same data that's stretched across different clouds, different on-premise systems, whether it's Hadoop on-premise and maybe I want to build a huge data lake in cloud and start running complex AI and analytics processes over there because I'm, less face it, banks et cetera ain't going to be able to manage and run AI themselves. It's already being done by Amazon, Google, Microsoft, Alibaba, and others in the cloud. So the ability to run this simultaneously in different locations is really important. That's what we do. >> [John] All right, let me just ask this directly since we're filming and we'll get a clip out of this. What is the definition of hybrid cloud? And what is the definition of multi cloud? Take, explain both of those. >> The ability to manage and run the same data set against different applications simultaneously. And achieve exactly the same result. >> [John] That's hybrid cloud or multi cloud? >> Both. >> So they're the same. >> The same. >> You consider hybrid cloud multi cloud the same? >> For us it's just a different end point. It's hybrid kind of mean that you're running something implies on-premise. A multi cloud or poly cloud implies that you're running between different cloud venues. >> So hybrid is location, multi is source. >> Correct. >> So but let's-- >> [David] That's a good definition. >> Yes, but let's unpack this a little bit because at the end of the day, what a business is going to want to do is they're going to want to be able to run apply their data to the best service. >> [David] Correct. >> And increasingly that's what we're advising our clients to think about. >> [David] Yeah. >> Don't think about being an AWS customer, per se, think about being a customer of AWS services that serve your business. Or IBM services that serve your business. 
But you want to ensure that your dependency on that service is not absolute, and that's why you want to be able to at least have the option of being able to run your data in all of these different places. >> And I think the market now realizes that there is not going to be a single, dominant vendor for cloud infrastructure. That's not going to happen. Yes, it happened before: Oracle dominated in relational data, SAP dominated for ERP systems. For cloud, it's democratized; that's not going to happen. So everybody knows that Amazon probably have the best serverless compute lambda functions available. They've got millions of those things already written or in the process of being written. Everybody knows that Microsoft are going to extend the wonderful technology that they have on the desktop and move that into cloud for analytics-based technologies and so on. Google have been working on artificial intelligence for an elongated period of time, so vendors are going to arbitrage between different cloud vendors. They're going to choose the best-of-breed approach. >> [John] They're going to go to Google for AI and scale, they're going to go to Amazon for robustness of services, and they're going to go to Microsoft for the Suite. >> [Peter] They're going to go for the services. They're looking at the services, that's what they need to do. >> And the thing that we'll forget, that we don't at WANdisco, is that that requires guaranteed consistent data sets underneath the whole thing. >> So where does Fusion fit in here? How is that getting traction? Give us some update. Are you working with Microsoft? I know we've been talking about Amazon, what about Microsoft? >> So we've been working with Microsoft; we announced a strategic partnership with them in March where we became a tier zero vendor, which basically means that we're partnered with them in lockstep in the field. We've executed extremely well since that point and we've done a number of fairly large, high-profile deals. A retailer, for example, that was based in Amazon didn't really like being based in Amazon, so they had to build a poly cloud implementation to move petabyte-scale data from AWS into Azure, and that went seamlessly. It was an overnight success. >> [John] And they're using your technology? >> They're using our technology. There's no other way to do that. I think what the world has now, what Microsoft and others have realized, is that CDC technology, change data capture, doesn't work at this kind of scale, where you batch up a bunch of changes and then you ship them, block shipping or whatever, every 15 minutes or so. We're talking about petabyte-scale ingest processes. We're talking about huge data lakes, and that technology simply doesn't work at this kind of scale. >> [John] We've got a couple minutes left, I want to just make sure we get your views on blockchain, you mentioned consensus, I want to get your thoughts on that because we're seeing blockchain is certainly experimental, it's got, it's certainly powering money, Bitcoin and the international markets, it's certainly becoming a money backbone for countries to move billions of dollars out. It's certainly in the tank right now, about 600 million below its mark in January, but blockchain is fundamentally supply chain, you're seeing consensus, you're seeing some of these things that are in your realm, what's your view? >> So first of all, at WANdisco, we separate the notion of cryptocurrency and blockchain. We see blockchain as something that's been around for a long time.
It's basically that the world is moving to decentralization. We're seeing this with airlines, with supermarkets, and so on. People actually want to decentralize rather than centralize now. And the same thing is going to happen in the financial industry, where we don't actually need a central transaction coordinator anymore; we don't need a clearinghouse, in other words. Now, how do you do that? At the very heart of blockchain is an incorrect assumption. So most people think that Satoshi's invention, whoever that may be, was based around the blockchain itself. Blockchain is pieced-together technologies that don't actually scale, right? So it takes a game-theoretic approach to consensus. And I won't get, we don't have enough time for me to delve into exactly what that means, but our consensus algorithm has already proven to scale, right? So what does that mean? Well, it means that if you want to go and buy a cup of coffee at the Starbucks next door, and you want to use a Bitcoin, you're going to be waiting maybe half an hour for that transaction to settle, right? Because the-- >> [John] The buyer's got to create a block, you know, all that step's in one. >> The game-theoretic approach basically-- >> Bitcoin's running 500,000 transactions a day. >> Yeah, that's about eight a second. >> There's two transactions per second, right? Between two and eight transactions per second. We've already proven that we can achieve hundreds of thousands, potentially millions of agreements per second. Now the argument against using Paxos, which is what our technology's based on, is it's too complicated. Well, no shit, of course it's too complicated. We've solved that problem. That's what WANdisco does. So we've filed a patent. >> So you've abstracted the complexity, that's your job. >> We've extracted the complexity. >> So you solve the complexity problem by being a complex solution, but you're making and abstracting it even easier. >> We have an algorithmic, not a game-theoretic, approach. >> Solving the scale problem. >> Correct. >> Using Paxos in a way that allows real developers to be able to build consensus algorithm-based applications. >> Yes, and 90% of blockchain is consensus. We've solved the consensus problem. We'll be launching a product based around Hyperledger very soon; we're already in tests and we're already showing tens of thousands of transactions per second. Not two, not 2,000, two transactions. >> [Peter] The game theory side of it is still going to be important because when we start talking about machines and humans working together, programs don't require incentives. Human beings do, and so there will be very, very important applications for this stuff. But you're right, from the standpoint of the machine-to-machine, when there is no need for incentive, you just want consensus, you want scale. >> Yeah, and there are two approaches to this world of blockchains. There's public, which is where the Bitcoin guys are and the anarchists who firmly believe that there should be no oversight or control, and then there's the real world, which is permissioned blockchains, and permissioned blockchains are where the banks, where the regulators, where NASDAQ will be when we're trading shares in the future. That will be a permissioned blockchain that will be overseen by a regulator like the SEC, NASDAQ, or the London Stock Exchange, et cetera. >> David, always great to chat with you.
Thanks for coming on, again, always on the cutting edge, always having a great vision while knocking down some good technology and moving your IP on the right waves every time, congratulations. >> Thank you. >> Always on the next wave, David Richards here inside theCUBE. Every year, doesn't disappoint, theCUBE bringing you all the action here. Cube NYC, we'll be back with more coverage. Stay with us; a lot more action for the rest of the day. We'll be right back; stay with us for more after this short break. (upbeat music)
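David's distinction between a game-theoretic (proof-of-work) approach and an algorithmic, Paxos-based one comes down to how agreement is reached: Paxos acceptors make explicit promises, and a value is chosen once a majority accepts it, with no mining or incentives involved. The sketch below is a toy, single-decree Paxos round simulated in one process with reliable in-memory "messages"; WANdisco's own algorithm and any production Paxos implementation handle many things omitted here (multiple decrees, failures, leader election, real networking), so treat this only as an illustration of the majority logic.

```python
# Toy single-decree Paxos round, simulated in-process with reliable "messages".
# Real deployments must handle networking, failures, and many rounds; this
# only shows the prepare/promise and accept majority logic.

class Acceptor:
    def __init__(self):
        self.promised = -1          # highest proposal number promised so far
        self.accepted = (-1, None)  # (proposal number, value) accepted so far

    def prepare(self, n):
        if n > self.promised:
            self.promised = n
            return True, self.accepted   # promise, plus any previously accepted value
        return False, None

    def accept(self, n, value):
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    """Run one proposal; return the value chosen by a majority, or None."""
    majority = len(acceptors) // 2 + 1

    promises = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in promises if ok]
    if len(granted) < majority:
        return None

    # If any acceptor already accepted a value, we must propose that one.
    prior = max((acc for acc in granted if acc[1] is not None), default=None)
    chosen_value = prior[1] if prior else value

    accepts = sum(a.accept(n, chosen_value) for a in acceptors)
    return chosen_value if accepts >= majority else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, n=1, value="replicate-block-42"))    # chosen
print(propose(acceptors, n=2, value="conflicting-proposal"))  # still block-42
```

The second, competing proposal ends up re-proposing the already-accepted value rather than overwriting it, which is the safety property that makes deterministic, quorum-based consensus usable for data replication without the settlement delays of proof-of-work.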
ENTITIES
Entity | Category | Confidence |
---|---|---|
David | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Alibaba | ORGANIZATION | 0.99+ |
Peter Burris | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Peter | PERSON | 0.99+ |
Google | ORGANIZATION | 0.99+ |
David Richards | PERSON | 0.99+ |
SEC | ORGANIZATION | 0.99+ |
NASDAQ | ORGANIZATION | 0.99+ |
March | DATE | 0.99+ |
two | QUANTITY | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
January | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
2014 | DATE | 0.99+ |
millions | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
90% | QUANTITY | 0.99+ |
2013 | DATE | 0.99+ |
WANdisco | ORGANIZATION | 0.99+ |
London Stock Exchange | ORGANIZATION | 0.99+ |
2015 | DATE | 0.99+ |
New York City | LOCATION | 0.99+ |
nine years | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
two transactions | QUANTITY | 0.99+ |
eight | QUANTITY | 0.99+ |
five years ago | DATE | 0.99+ |
New York | LOCATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
half an hour | QUANTITY | 0.99+ |
35 patents | QUANTITY | 0.99+ |
hundreds of thousands | QUANTITY | 0.99+ |
2,000 | QUANTITY | 0.99+ |
Both | QUANTITY | 0.99+ |
ninth year | QUANTITY | 0.98+ |
first time | QUANTITY | 0.98+ |
billions of dollars | QUANTITY | 0.98+ |
Hadoop | TITLE | 0.98+ |
SAP | ORGANIZATION | 0.98+ |
Starbucks | ORGANIZATION | 0.98+ |
Paxos | ORGANIZATION | 0.98+ |
two nights ago | DATE | 0.97+ |
single | QUANTITY | 0.97+ |
two approaches | QUANTITY | 0.97+ |
500,000 transactions a day | QUANTITY | 0.97+ |
about 600 million | QUANTITY | 0.96+ |
theCUBE | ORGANIZATION | 0.96+ |
Satoshi | PERSON | 0.92+ |
two different clouds | QUANTITY | 0.91+ |
NYC | LOCATION | 0.89+ |
one | QUANTITY | 0.88+ |
theCUBE | EVENT | 0.87+ |
Sreesha Rao, Niagara Bottling & Seth Dobrin, IBM | Change The Game: Winning With AI 2018
>> Live, from Times Square, in New York City, it's theCUBE covering IBM's Change the Game: Winning with AI. Brought to you by IBM. >> Welcome back to the Big Apple, everybody. I'm Dave Vellante, and you're watching theCUBE, the leader in live tech coverage, and we're here covering a special presentation of IBM's Change the Game: Winning with AI. IBM's got an analyst event going on here at the Westin today in the theater district. They've got 50-60 analysts here. They've got a partner summit going on, and then tonight, at Terminal 5 on the West Side Highway, they've got a customer event, a lot of customers there. We've talked earlier today about the hard news. Seth Dobrin is here. He's the Chief Data Officer of IBM Analytics, and he's joined by Shreesha Rao who is the Senior Manager of IT Applications at California-based Niagara Bottling. Gentlemen, welcome to theCUBE. Thanks so much for coming on. >> Thank you, Dave. >> Well, thanks Dave for having us. >> Yes, always a pleasure Seth. We've known each other for a while now. I think we met in the snowstorm in Boston; that sparked something a couple years ago. >> Yep. When we were both trapped there. >> Yep, and at that time, we spent a lot of time talking about your internal role as the Chief Data Officer, working closely with Inderpal Bhandari, and what you guys are doing inside of IBM. I want to talk a little bit more about your other half, which is working with clients and the Data Science Elite Team, and we'll get into what you're doing with Niagara Bottling, but let's start there; in terms of that side of your role, give us the update. >> Yeah, like you said, we spent a lot of time talking about how IBM is implementing the CDO role. While we were doing that internally, I spent quite a bit of time flying around the world, talking to our clients over the last 18 months since I joined IBM, and we found a consistent theme with all the clients, in that they needed help learning how to implement data science, AI, machine learning, whatever you want to call it, in their enterprise. There's a fundamental difference between doing these things at a university or as part of a Kaggle competition and doing them in an enterprise, so we felt really strongly that it was important for the future of IBM that all of our clients become successful at it, because what we don't want is, in two years, for them to go "Oh my God, this whole data science thing was a scam. We haven't made any money from it." And it's not because the data science thing is a scam. It's because the way they're doing it is not conducive to business, and so we set up this team we call the Data Science Elite Team, and what this team does is we sit with clients around a specific use case for 30, 60, 90 days, it's really about 3 or 4 sprints, depending on the material, the client, and how long it takes, and we help them learn, through this use case, how to use Python, R, Scala in our platform obviously, because we're here to make money too, to implement these projects in their enterprise. Now, because it's written completely in open source, if they're not happy with what the product looks like, they can take their toys and go home afterwards. It's on us to prove the value as part of this, but there's a key point here. My team is not measured on sales. They're measured on adoption of AI in the enterprise, and so it creates a different behavior for them. So they're really about "Make the enterprise successful," right, not "Sell this software."
>> Yeah, yeah. >> So, at this point, I ask, "Well, do you have any examples?" so Sreesha, let's turn to you. (laughing softly) Niagara Bottling -- >> As a matter of fact, Dave, we do. (laughing) >> Yeah, so you're not a bank with a trillion dollars in assets under management. Tell us about Niagara Bottling and your role. >> Well, Niagara Bottling is the biggest private label bottled water manufacturing company in the U.S. We make bottled water for Costcos, Walmarts, major national grocery retailers. These are our customers whom we service, and as with all large customers, they're demanding, and we provide bottled water at relatively low cost and high quality. >> Yeah, so I used to have a CIO consultancy. We worked with every CIO up and down the East Coast. I really got into a lot of organizations, and I always observed that it was really the heads of applications that drove AI, because they were the glue between the business and IT, and that's really where you sit in the organization, right? >> Yes. My role is to support the business and business analytics, as well as I support some of the distribution technologies and planning technologies at Niagara Bottling. >> So take us through the project if you will. What were the drivers? What were the outcomes you envisioned? And we can kind of go through the case study. >> So the current project that we leveraged IBM's help on was a stretch wrapper project. Each pallet that we produce--- we produce obviously cases of bottled water. These are stacked into pallets and then shrink wrapped or stretch wrapped with a stretch wrapper, and this project is to be able to save money by trying to optimize the amount of stretch wrap that goes around a pallet. We need to be able to maintain the structural stability of the pallet while it's transported from the manufacturing location to our customer's location, where it's unwrapped and then the cases are used. >> And over breakfast we were talking. You guys produce 2833 bottles of water per second. >> Wow. (everyone laughs) >> It's enormous. The manufacturing line is a high speed manufacturing line, and we have a lights-out policy where everything runs in an automated fashion with raw materials coming in from one end and the finished goods, pallets of water, going out. It's called pellets to pallets. Pellets of plastic coming in through one end and pallets of water going out through the other end. >> Are you sitting on top of an aquifer? Or are you guys using sort of some other techniques? >> Yes, in fact, we do bore wells and extract water from the aquifer. >> Okay, so the goal was to minimize the amount of material that you used but maintain its stability? Is that right? >> Yes, during transportation, yes. So if we use too much plastic, we're not optimal, I mean, we're wasting material, and cost goes up. We produce almost 16 million pallets of water every single year, so that's a lot of shrink wrap that goes around those, so what we can save in terms of maybe 15-20% of shrink wrap costs will amount to quite a bit. >> So, how does machine learning fit into all of this? >> So, machine learning is a way to understand what kind of profile to use: if we can measure what is happening as we wrap the pallets, whether we are wrapping it too tight or stretching it too much, that results in either a conservative way of wrapping the pallets or an aggressive way of wrapping the pallets. >> I.e. too much material, right? 
>> Too much material is conservative, and aggressive is too little material, and so we can achieve some savings if we were to alternate between the profiles. >> So, too little material means you lose product, right? >> Yes, and there's a risk of breakage, so essentially, while the pallet is being wrapped, if you are stretching it too much there's a breakage, and then it interrupts production, so we want to try and avoid that. We want continuous production; at the same time, we want the pallet to be stable while saving material costs. >> Okay, so you're trying to find that ideal balance, and how much variability is in there? Is it a function of distance and how many touches it has? Maybe you can share that with us. >> Yes, so each pallet takes about 16-18 wraps of the stretch wrapper going around it, and that's how much material is laid out. About 250 grams of plastic goes on there. So we're trying to optimize the gram weight, which is the amount of plastic that goes around each of the pallets. >> So it's about predicting how much plastic is enough without having breakage and disrupting your line. So they had labeled data that was, "if we stretch it this much, it breaks; if we don't stretch it this much, it doesn't break," but then it was about predicting what's good enough, avoiding both of those extremes, right? >> Yes. >> So it's a truly predictive and iterative model that we've built with them. >> And, you're obviously injecting data in terms of the trip to the store as well, right? You're taking that into consideration in the model, right? >> Yeah, that's mainly to make sure that the pallets are stable during transportation. >> Right. >> And that has already determined how much containment force is required when you stretch and wrap each pallet. So that's one of the variables that is measured, but the inputs and outputs are-- the input is the amount of material that is being used in terms of gram weight. We are trying to minimize that. So that's what the whole machine learning exercise was. >> And the data comes from where? Is it observation, maybe instrumented? >> Yeah, the instruments. Our stretch-wrapper machines have an Ignition platform, which is a SCADA platform that allows us to measure all of these variables. We would be able to get machine variable information from those machines and then be able to, hopefully, one day, automate that process, so the feedback loop that says "On this profile, we've not had any breaks. We can continue," or, if there have been frequent breaks on a certain profile or machine setting, then we can change that dynamically as the product is moving through the manufacturing process. >> Yeah, so think of it as, it's kind of a traditional manufacturing production line optimization and prediction problem, right? It's minimizing waste, right, while maximizing the output and then throughput of the production line. When you optimize a production line, the first step is to predict what's going to go wrong, and then the next step would be to include precision optimization to say "How do we maximize? Using the constraints that the predictive models give us, how do we maximize the output of the production line?" This is not a unique situation. 
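The predict-then-optimize pattern described above can be made concrete with a short sketch. This is an illustrative example only: the feature names, the synthetic data, and the 5% risk budget are assumptions invented for the example, not Niagara's actual variables or the model the Data Science Elite Team built. The idea is simply that a classifier learns break risk from labeled wrap history, and a second step searches for the lightest film profile the model still considers safe.

```python
# Hypothetical sketch of predict-then-optimize for stretch-wrap profiles.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000
history = pd.DataFrame({
    "gram_weight": rng.uniform(180, 280, n),     # grams of film per pallet
    "containment_force": rng.normal(28, 3, n),   # a measured wrap variable
    "wrap_count": rng.integers(16, 19, n),       # roughly 16-18 wraps per pallet
})
# Synthetic label: lighter film and lower containment force break more often.
p_break = 1 / (1 + np.exp(0.08 * (history["gram_weight"] - 230)
                          + 0.3 * (history["containment_force"] - 28)))
history["film_break"] = rng.binomial(1, p_break)

features = ["gram_weight", "containment_force", "wrap_count"]
X_train, X_test, y_train, y_test = train_test_split(
    history[features], history["film_break"], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout accuracy:", round(model.score(X_test, y_test), 3))

# "Prescribe": scan candidate gram weights at typical settings and keep the
# lightest profile whose predicted break risk stays under a chosen budget.
candidates = pd.DataFrame({
    "gram_weight": np.arange(180, 281, 5),
    "containment_force": 28.0,
    "wrap_count": 17,
})
risk = model.predict_proba(candidates[features])[:, 1]
print(candidates[risk < 0.05].nsmallest(1, "gram_weight"))  # 5% budget, arbitrary
```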
It's a unique material that we haven't really worked with, but they had some really good data on this material, how it behaves, and that's key, as you know, Dave, and probably most of the people watching this know, labeled data is the hardest part of doing machine learning, and building those features from that labeled data, and they had some great data for us to start with. >> Okay, so you're collecting data at the edge essentially, then you're using that to feed the models, which is running, I don't know, where's it running, your data center? Your cloud? >> Yeah, in our data center, there's an instance of DSX Local. >> Okay. >> That we stood up. Most of the data is running through that. We build the models there. And then our goal is to be able to deploy to the edge where we can complete the loop in terms of the feedback that happens. >> And iterate. (Sreesha nods) >> And DSX Local, is Data Science Experience Local? >> Yes. >> Slash Watson Studio, so they're the same thing. >> Okay now, what role did IBM and the Data Science Elite Team play? You could take us through that. >> So, as we discussed earlier, adopting data science is not that easy. It requires subject matter expertise. It requires understanding of data science itself, the tools and techniques, and IBM brought that as a part of the Data Science Elite Team. They brought both the tools and the expertise so that we could get on that journey towards AI. >> And it's not a "do the work for them." It's a "teach to fish," and so my team sat side by side with the Niagara Bottling team, and we walked them through the process, so it's not a consulting engagement in the traditional sense. It's how do we help them learn how to do it? So it's side by side with their team. Our team sat there and walked them through it. >> For how many weeks? >> We've had about two sprints already, and we're entering the third sprint. It's been about 30-45 days between sprints. >> And you have your own data science team. >> Yes. Our team is coming up to speed using this project. They've been trained, but they needed help from people who have done this, been there, and have handled some of the challenges of modeling and data science. >> So it accelerates that time to --- >> Value. >> Outcome and value and is a knowledge transfer component -- >> Yes, absolutely. >> It's occurring now, and I guess it's ongoing, right? >> Yes. The engagement is unique in the sense that IBM's team came to our factory, understood what that process, the stretch-wrap process, looks like, so they had an understanding of the physical process and how it's modeled with the help of the variables, and understand the data science modeling piece as well. Once they know both sides of the equation, they can help put the physical problem and the digital equivalent together, and then be able to correlate why things are happening with the appropriate data that supports the behavior. >> Yeah, and the constraints are the one use case and up to 90 days; there's no charge for those. Like I said, it's paramount that our clients like Niagara know how to do this successfully in their enterprise. >> It's a freebie? >> No, it's no charge. Free makes it sound too cheap. (everybody laughs) >> But it's part of obviously a broader arrangement with buying hardware and software, or whatever it is. 
>> Yeah, it's a strategy for us to help make sure our clients are successful, and I want to minimize the activation energy to do that, so there's no charge, and the only requirements from the client are that it's a real use case, they at least match the resources I put on the ground, and they sit with us and do things like this and act as a reference and talk about the team and our offerings and their experiences. >> So you've got to have skin in the game obviously, an IBM customer. There's got to be some commitment for some kind of business relationship. How big was the collective team for each, if you will? >> So IBM had 2-3 data scientists. (Dave takes notes) Niagara matched that, 2-3 analysts. There were some working with the machines who were familiar with the machines, and others who were more familiar with the data acquisition and data modeling. >> So each of these engagements, they cost us about $250,000 all in, so they're quite an investment we're making in our clients. >> I bet. I mean, 2-3 data scientists over many, many weeks of super geek time. So you're bringing in hardcore data scientists, math wizzes, stat wizzes, data hackers, developer--- >> Data viz people, yeah, the whole stack. >> And the level of skills that Niagara has? >> We've got actual employees who are responsible for production, our manufacturing analysts who help aid in troubleshooting problems. If there are breakages, they go analyze why that's happening. Now they have data to tell them what to do about it, and that's the whole journey that we are in, in trying to quantify with the help of data, and be able to connect our systems with data, systems and models that help us analyze what happened and why it happened and what to do before it happens. >> Your team must love this because they're sort of elevating their skills. They're working with rock star data scientists. >> Yes. >> And we've talked about this before. A point that was made here is that it's really important in these projects to have people acting as product owners if you will, subject matter experts, that are on the front line, that do this every day, not just for the subject matter expertise. I'm sure there's executives that understand it, but when you're done with the model, bringing it to the floor, and talking to their peers about it, there's no better way to drive this cultural change of adopting these things than having one of your peers that you respect talk about it instead of some guy or lady sitting up in the ivory tower saying "thou shalt." >> Now you don't know the outcome yet. It's still early days, but you've got a model built that you've got confidence in, and then you can iterate that model. What's your expectation for the outcome? >> We're hoping that preliminary results help us get up the learning curve of data science and how to leverage data to be able to make decisions. So that's our idea. There are obviously optimal settings that we can use, but it's going to be a trial and error process. And through that, as we collect data, we can understand what settings are optimal and what we should be using in each of the plants. And if the plants decide, hey, they have a subjective preference for one profile versus another, with the data we are capturing we can measure when they deviated from what we specified. We have a lot of learning coming from the approach that we're taking. You can't control things if you don't measure them first. >> Well, your objectives are to transcend this one project and to do the same thing across. 
>> And to do the same thing across, yes. >> Essentially pay for it with a quick return. That's the way to do things these days, right? >> Yes. >> You've got more narrow, small projects that'll give you a quick hit, and then leverage that expertise across the organization to drive more value. >> Yes. >> Love it. What a great story, guys. Thanks so much for coming to theCUBE and sharing. >> Thank you. >> Congratulations. You must be really excited. >> No, it's a fun project. I appreciate it. >> Thanks for having us, Dave. I appreciate it. >> Pleasure, Seth. Always great talking to you, and keep it right there, everybody. You're watching theCUBE. We're live from New York City here at the Westin Hotel. #CubeNYC. Check out ibm.com/winwithai and Change the Game: Winning with AI tonight. We'll be right back after a short break. (minimal upbeat music)
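For a sense of scale, the figures quoted in this conversation imply a substantial amount of film. The arithmetic below uses only the numbers mentioned above (roughly 16 million pallets a year, about 250 grams of film per pallet, and the 15-20% savings target); actual volumes and costs will differ.

```python
# Rough scale check built only from figures quoted in the interview.
pallets_per_year = 16_000_000            # "almost 16 million pallets"
film_grams_per_pallet = 250              # "about 250 grams of plastic"
total_tonnes = pallets_per_year * film_grams_per_pallet / 1_000_000
print(f"total film: ~{total_tonnes:,.0f} tonnes per year")
for savings in (0.15, 0.20):             # the 15-20% range mentioned
    print(f"{savings:.0%} savings: ~{total_tonnes * savings:,.0f} tonnes per year")
```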
Scott Hebner, IBM | Change the Game: Winning With AI
>> Live from Times Square in New York City, it's theCUBE. Covering IBM's Change the Game: Winning With AI. Brought to you by IBM. >> Hi, everybody, we're back. My name is Dave Vellante and you're watching theCUBE, the leader in live tech coverage. We're here with Scott Hebner, who's the VP of marketing for IBM analytics and AI. Scott, it's good to see you again, thanks for coming back on theCUBE. >> It's always great to be here, I love doing these. >> So one of the things we've been talking about for quite some time on theCUBE now, we've been following the whole big data movement since the early Hadoop days. And now AI is the big trend and we always ask, is this old wine, new bottle? Or is it something substantive? And the consensus is, it's real, it's real innovation because of the data. What's your perspective? >> I do think it's another one of these major waves, and if you kind of go back through time, there's been a series of them, right? We went from sort of centralized computing into client server, and then we went from client server into the whole world of e-business and the internet, back around the 2000 time frame or so. Then we went from internet computing to cloud. Right? And I think the next major wave here, that next step, is AI. And machine learning, and applying all this intelligent automation to the entire system. So I think it's not just an evolution, it's a pretty big change that's occurring here. Particularly the value that it can provide businesses is pretty profound. >> Well it seems like that's the innovation engine for at least the next decade. It's not Moore's Law anymore, it's applying machine intelligence and AI to the data and then being able to actually operationalize that at scale. With the cloud-like model, whether it's on-prem or off-prem, your thoughts on that? >> Yeah, I mean I think that's right on, 'cause if you kind of think about what AI's going to do, in the end it's going to be about just making much better decisions. Evidence based decisions, your ability to get to data that is previously unattainable, right? 'Cause it can discover things in real time. So it's about decision making and it's about fueling better, and more intelligent business processing. Right? But I think, what's really driving, sort of under the covers of that, is this idea that, are clients really getting what they need from their data? 'Cause we all know that the data's exploding in terms of growth. And what we know from our clients and from studies is only about 15% of business leaders believe that they're getting what they need from their data. Yet most businesses are sitting on about 80% of their data that's either inaccessible, un-analyzed, or un-trusted, right? So, what they're asking themselves is how do we first unlock the value of all this data. And they know they have to do it in new ways, and I think the new ways start to talk about cloud native architectures, containerization, things of that nature. Plus, artificial intelligence. So, I think what the market is starting to tell us is, AI is the way to unlock the value of all this data. And it's time to really do something significant with it; otherwise, it's just going to be marginal progress over time. They need to make big progress. >> But data is plentiful, insights aren't. And part of your strategy has always been to bring insights out of that data and obviously focus on client outcomes. 
But a big part of your role is not only communicating IBM's analytics and AI strategy, but also helping shape that strategy. How do you sort of summarize that strategy? >> Well, we talk about the ladder to AI, 'cause one thing when you look at the actual clients that are ahead of the game here, and the challenges that they've faced to get to the value of AI, what we've learned, very, very clearly, is that the hardest part of AI is actually making your data ready for AI. It's about the data. It's sort of this notion that there's no AI without an information architecture, right? You have to build that architecture to make your data ready, 'cause bad data will be paralyzing to AI. And actually there was a great MIT Sloan study that they did earlier in the year that really dives into all these challenges and, if I remember correctly, about 81% of them said that the number one challenge they had is their data. Is their data ready? Do they know what data to get to? And that's really where it all starts. So we have this notion of the ladder to AI; it's several very prescriptive steps that we believe, through best practices, you need to actually take to get to AI. And once you get to AI, then it becomes about how you operationalize it in a way that it scales, that you have explainability, you have transparency, you have trust in what the model is. But it really is a systematic approach here that we believe clients are going to get there in a much faster way with. >> So the picture of the ladder here, it starts with collect, and that's kind of what we did with Hadoop, we collected a lot of data 'cause it was inexpensive, and then organizing it, it says, create a trusted analytics foundation. Still building that sort of framework, and then analyze and actually start getting insights on demand. And then automation, that seems to be the big theme now. Is, how do I get automation? Whether it's through machine learning, infusing AI everywhere. And blockchain is part of that automation, obviously. And then ultimately getting to the outcome, you call it trust, achieving trust and transparency, that's the outcome that we want here, right? >> I mean, I think it all really starts with making your data simple and accessible. Which is about collecting the data. And doing it in a way you can tap into all types of data, regardless of where it lives. So the days of trying to move data around all over the place or heavy duty replication and integration, let it sit where it is, but be able to virtualize it and collect it and containerize it, so it can be more accessible and usable. And that kind of goes to the point that 80% of the enterprise data is inaccessible, right? So it all starts first with, are you getting all the data collected appropriately, and getting it into a way that you can use it. And then we start feeding things in like IOT data, and sensors, and it becomes real time data that you have to do this against, right? So, notions of replicating and integrating and moving data around become not very practical. So that's step one. Step two is, once you collect all the data, that doesn't necessarily mean you trust it, right? So when we say trust, we're talking about business ready data. Do people know what the data is? Are there business entities associated with it? Has it been cleansed, right? Has all the duplicate data been taken out? What do you do in a situation with data where you have sources of data that are telling you different things? 
Like, I think we've all been on a treadmill where the phone, the watch, and the treadmill will actually tell you different distances, I mean, what's the truth? The whole notion of organizing is getting it ready to be used by the business, applying the policies, the compliance, and all the protections that you need for that data. Step three is the ability to build out all this, the ability to analyze it. To do it at scale, right, and to do it in a way that everyone can leverage the data. So not just the business analysts, but you need to enable everyone through self-service. And that's the advancements that we're getting in new analytics capabilities that make mere mortals able to get to that data and do their analysis. >> And if I could inject, the challenge with the sort of traditional decision support world is you had maybe two or three people that were like the data gods. You had to go through them, and they would get the analysis. And it's just, the agility wasn't there. >> Right. >> So you're trying to democratize that, putting it in the hands. >> Absolutely. >> Maybe the business user's not as much of an expert as the person who can build the cube, but they could find new use cases, and drive more value, right? >> Actually, from a developer that needs to get access, and analytics infused into their applications, to the other end of the spectrum, which could be a marketing leader, a finance planner, someone who's planning budgets, a supply chain planner. Right, so it's that whole spectrum, not only allowing them to tap into and analyze the data and gain insights from it, but allowing them to customize how they do it and do it in a more self-service way. So that's the notion of scaling insights on demand. It's really a cultural thing enabled through the technology. With that foundation, then you have the ability to start to infuse, and that's where I think the real power starts to kick in here. So I mean, all that's kind of making your data ready for AI, right? Then you start to infuse machine learning everywhere. And that's when you start to build these models that are self-learning, that start to automate the ability to get to these insights, and to the data. And uncover what has previously been unattainable, right? And that's where the whole thing starts to become automated and more real time and more intelligent. And that's where those models then allow you to do things you couldn't do before with the data they're saying they're not getting access to. And then of course, once you get the models, just because you have good models doesn't mean that they've been operationalized, that they've been embedded in applications, embedded in business process. That you have trust and transparency and explainability of what it's telling you. And that's that top tier of the ladder, it is really about embedding it, right, into your business process in a way that you trust it. So, we have a systematic set of approaches to that, best practices. And of course we have the portfolio that would help you step up that ladder. >> So the fat middle of this bell curve, something kind of this maturity curve, is kind of the organize and analyze phase, and that's probably where most people are today. And what's the big challenge of getting up that ladder, is it the algorithms, what is it? >> Well, I think, clearly with most movements like this, it starts with culture and skills, right? And the ability to just change the game within an organization. 
But putting that aside, I think what's really needed here is an information architecture that's based in the agility of a cloud native platform, that gives you the productivity, and truly allows you to leverage your data wherever it resides. So whether it's in the private cloud, the public cloud, on premise, dedicated, no matter where it sits, you want to be able to tap into all that data. 'Cause remember, the challenge with data is it's always changing. I don't mean the sources, but the actual data. So you need an architecture that can handle all that. Once you stabilize that, then you can start to apply better analytics to it. And so yeah, I think you're right. That is sort of the bell curve here. And with that foundation, that's when the power of infusing machine learning and deep learning and neural networks, I mean those kinds of AI technologies and models, into it all just takes it to a whole new level. But you can't do those models until you have those bottom tiers under control. >> Right, setting that foundation. Building that framework. >> Exactly. >> And then applying. >> What developers of AI applications, particularly those that have been successful, have told us pretty clearly, is that building the actual algorithms is not necessarily the hard part. The hard part is making all the data ready for that. And in fact I was reading a survey the other day of actual data scientists and AI developers, and 60% of them said the thing they hate the most is all the data collection, data prep. 'Cause it's so hard. And so, a big part of our strategy is just to simplify that. Make it simple and accessible so that you can really focus on what you want to do and where the value is, which is building the algorithms and the models, and getting those deployed. >> Big challenge and hugely important, I mean IBM is a 100 year old company that's going through its own digital transformation. You know, we've had Inderpal Bhandari on talking about how to essentially put data at the core of the company; it's a real hard problem for a lot of companies who were not born, you know, five or seven years ago. And so, putting data at that core and putting human expertise around it, as opposed to maybe having whatever as the core, humans or the plant or the manufacturing facility, that's a big change for a lot of organizations. Now at the end of the day, IBM sells strategy, but the analytics group, you're in the software business, so what offerings do you have to help people get there? >> Well, in the collect step, it's essentially our hybrid data management portfolio. So think DB2, DB2 Warehouse, DB2 Event Store, which is about IOT data. So there's a set of those, and that's where big data and Hadoop and all that with Hortonworks, that's where that all fits in. So building the ability to access all this data, virtualize it, do things like Queryplex, things of that nature, is where that all sits. >> Queryplex being the data virtualization capability. >> Yeah. >> Get to the data no matter where it is. >> You define a query and don't worry about where it resides; we'll figure that out for you, kind of thing, right? In the organize step, that is InfoSphere, so that's basically our unified governance and integration part of our portfolio. So again, that is collecting all this, taking the collected data and organizing it, and making sure you're compliant with whatever policies. And making it, you know, business ready, right? And so InfoSphere is where you should look to understand that portfolio better. 
When you get into scale and analytics on demand, that's Cognos Analytics and our Planning Analytics portfolio. And that's essentially our business analytics part of all this. And some data science tools like SPSS, if we're doing statistical analysis, and SPSS Modeler, if we're doing statistical modeling, things of that nature, right? When you get into the automate and the ML everywhere, that's Watson Studio, which is the integrated development environment, right? Not just for IBM Watson; it also has a huge array of open technologies in it like TensorFlow and Python, and all those kinds of things. So that's the development environment, and Watson Machine Learning is the runtime that will allow you to run those models anywhere. So those are the two big pieces of that. And then from there you'll see IBM building out more and more of what we already have. But we have Watson applications, like Watson Assistant, Watson Discovery. We have a huge portfolio of Watson APIs for everything from tone to speech, things of that nature. And then the ability to infuse that all into the business processes. Sort of where you're going to see IBM heading in the future here. >> I love how you brought that home, and we talked about the ladder and it's more than just a PowerPoint slide. It actually is fundamental to your strategy, it maps with your offerings. So you can get the heads nodding with the customers. Where are you on this maturity curve, here's how we can help with products and services. And then the other thing I'll mention, you know, we kind of learned when we spoke to some others this week, and we saw some of your announcements previously, the Red Hat component which allows you to bring that cloud experience no matter where you are, and you've got technologies to do that, obviously, you know, Red Hat, you guys have been sort of birds of a feather in open source. Because your data is going to live wherever it lives, whether it's on prem, whether it's in the cloud, whether it's in the edge, and you want to bring sort of a common model. Whether it's containers, Kubernetes, being able to bring that cloud experience to the data, your thoughts on that? >> And this is where the big deal comes in, is for each one of those tiers, so the DB2 family, InfoSphere, business analytics, Cognos and all that, and Watson Studio, you can get started, purchase those technologies and start to use them, right, as individual products or software as a service. What we're also doing, and this is the more important step into the future, is we're building all those capabilities into one integrated, unified cloud platform. That's called IBM Cloud Private for Data. Think of that as a unified, collaborative team environment for AI and data science. Completely built on a cloud native architecture of containers and microservices. That will support a multi cloud environment. So, IBM Cloud, other clouds, you mentioned Red Hat with OpenShift, so, over time, by adopting IBM Cloud Private for Data, you'll get those steps of the ladder all integrated into one unified environment. So you have the ability to buy the unified environment, get involved in that, and it's all integrated, no assembly required, kind of thing. Or you could assemble it by buying the individual components, or some combination of both. So a big part of the strategy is a great deal of flexibility in how you acquire these capabilities and deploy them in your enterprise. There's no one size fits all. We give you a lot of flexibility to do that. 
>> And that's a true hybrid vision, I don't have to have just IBM and IBM cloud, you're recognizing other clouds out there, you're not exclusive like some companies, but that's really important. >> It's a multi cloud strategy, it really is, it's a multi cloud strategy. And that's exactly what we need, we recognize that most businesses, there's very few that have standardized on only one cloud provider, right? Most of them have multiples clouds, and then it breaks up of dedicated, private, public. And so our strategy is to enable this capability, think of it as a cloud data platform for AI, across all these clouds, regardless of what you have. >> All right, Scott, thanks for taking us through the strategies. I've always loved talking to you 'cause you're a clear thinker, and you explain things really well in simple terms, a lot of complexity here but, it is really important as the next wave sets up. So thanks very much for your time. >> Great, always great to be here, thank you. >> All right, good to see you. All right, thanks for watching everybody. We are now going to bring it back to CubeNYC so, thanks for watching and we will see you in the afternoon. We've got the panel, the influencer panel, that I'll be running with Peter Burris and John Furrier. So, keep it right there, we'll be right back. (upbeat music)
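The separation Scott describes between a development environment where models are built and a runtime that runs them anywhere reduces, in open-source terms, to splitting training from serving through a portable model artifact. The sketch below uses plain scikit-learn and joblib rather than any Watson Studio or Watson Machine Learning API, so the file name and the scoring function are illustrative assumptions only.

```python
# Build step (e.g. in a notebook): train a model and serialize it as an artifact.
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
joblib.dump(RandomForestClassifier(n_estimators=100).fit(X, y), "model.joblib")

# Runtime step (e.g. in a container a scheduler can place anywhere):
# load the artifact and score new rows, independent of where it was trained.
def score(rows):
    model = joblib.load("model.joblib")
    return model.predict(rows).tolist()

print(score([[5.1, 3.5, 1.4, 0.2]]))
```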
Yaron Haviv, Iguazio | theCUBE NYC 2018
Live from New York, it's theCUBE! Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hey welcome back, and we're live in theCUBE in New York City. It's our 2nd day of two days of coverage, CUBE NYC. The hashtag CUBENYC, formerly Big Data NYC, renamed because it's about big data, it's about the server, it's about Kubernetes' multi-cloud data. It's all about data, and that's the fundamental change in the industry. Our next guest is Yaron Haviv, who's the CTO of Iguazio, a CUBE alumni, always coming out with some good commentary, smart analysis. Kind of a guest host as well as an industry participant supplier. Welcome back to theCUBE. Good to see you. >> Thank you John. >> Love having you on theCUBE because you always bring some good insight and we appreciate that. Thank you so much. First, before we get into some of the comments, because I really want to delve into comments that David Richards said a few years ago, CEO of WANdisco. He said, "Cloud's going to kill Hadoop." And people were looking at him like, "Oh my God, who is this heretic? He's crazy. What is he talking about?" But you might not need Hadoop if you can run serverless Spark, TensorFlow.... You talk about this off camera. Is Hadoop going to be the OpenStack of the big data world? >> I don't think cloud necessarily killed Hadoop, although it is working on that, you know, because you go to Amazon and, you know, you can consume a bunch of services and you don't really need to think about Hadoop. I think cloud native is starting to kill Hadoop, 'cause Hadoop is three layers, you know: it's a file system, HDFS, then you have resource scheduling, YARN, then you have applications, starting with MapReduce, and then you evolve into things like Spark. Okay, so, the file system I don't really need in the cloud. I use S3, I can use a database as a service, as you know, a pretty efficient way of storing data. For scheduling, Kubernetes is a much more generic way of scheduling workloads, and not confined to Spark and specific workloads. I can run with TensorFlow, I can run with data science tools, etc., just containerized. So essentially, why would I need Hadoop? If I can take the traditional tools people are now evolving in and using, like Jupyter Notebooks, Spark, TensorFlow, you know, those packages, with Kubernetes on top of a database as a service and some object store, I have a much easier stack to work with. And I could mobilize that whether it's in the cloud, you know, on different vendors. >> Scale is important too. How do you scale it? >> Of course, you have independent scaling between data and computation, unlike Hadoop. So I can just go to Google and use BigQuery, or use, you know, DynamoDB on Amazon or Redshift, or whatever, and automatically scale it down and then, you know >> That's a unique position, so essentially, Hadoop versus Kubernetes is a top-line story. And wouldn't that be ironic for Google, because Google essentially created MapReduce and Cloudera ran with it and went public, but we're talking about the 2008 timeframe, 2009 timeframe, back when cloud ventures were just emerging in the mainstream. So wouldn't it be ironic if Kubernetes, which is being driven by Google, ends up taking over Hadoop? In terms of running things on Kubernetes and cloud, vis-a-vis on premise with Hadoop. >> People tend to give that credit to Google, but essentially Yahoo started Hadoop. 
Google started the technology, and a couple of years after Hadoop started, Google essentially moved to a different architecture, with something called Percolator. So Google's not too associated with Hadoop. They haven't really been using this approach for a long time. >> Well, they wrote the MapReduce paper, and the internal conversations we report on theCUBE about Google were, they just let that go. And Yahoo grabbed it. (cross-conversation) >> The companies that had the most experience were the first to leave. And I think, in many respects, that's what you're saying. As the marketplace realizes the outcomes that Hadoop is associated with, they will find other ways of achieving those outcomes. It might be more adept. >> There's also a fundamental shift in the consumption, where Hadoop was about ranking pages in a batch form. You know, just collecting logs and ranking pages, okay. The challenges that people have today revolve around applying AI to business applications. It needs to be a lot more concurrent, transactional, real-time-ish, you know? It's nothing to do with Hadoop, okay? So that's why you'll see more and more workloads mobilizing to serverless functions, to pre-canned services, etc. And Kubernetes is playing a good role here in providing the transport for migrating workloads across cloud providers, because I can use GKE, the Google Kubernetes Engine, or Amazon Kubernetes, or Azure Kubernetes, and I could write a similar application and deploy it on any cloud, or on-prem on my own private cluster. It makes the infrastructure agnostic, really application focused. >> Question about Kubernetes we heard on theCUBE earlier: the VP of Product at BlueData said that the Kubernetes ecosystem and community needs to do a better job with stateful. They nailed stateless; stateful application support is something that they need help on. Do you agree with that comment, and then if so, what alternatives do you have for customers who care about state? >> They should use our product. (laughing) >> (mumbling) Is Kubernetes struggling there? And if so, talk about your product. >> So, I think the challenge is around that; there are many solutions in that space, and I think that they are attacking it from a different approach. Many of them are essentially providing some block storage to different containers, but that's not really cloud native. What you want to be able to do is have multiple containers access the same data. That means sharing, either through file systems, through objects, or through databases, because one container is generating, for example, ingestion or __________. Another container is manipulating that same data. A third container may look for something in the data, and generate a trigger or an action. So you need shared access to data from those containers. >> The rest of the data synchronizes all three of those things. >> Yes, because the data is the form of state. The form of state cannot be confined to a single container, which is where I am very active in those committees, and you have all the storage guys in the committees, and they think block storage is the solution, 'cause they still think like virtual machines, okay? But the general idea is that, if you think about it, Kubernetes is like the new OS, where you have many processes, they're just scattered around. In an OS, the way for us to share state between processes is either through files or through databases, those forms. And that's really what-- >> Threads and databases as a positive engagement. 
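The shared-state pattern described here, with one container ingesting, another transforming, and a third watching for a condition, can be sketched generically with any shared data service. The example below uses Redis purely as a stand-in for that shared layer; it is not Iguazio's API, and the host name, stream and key names, and threshold are invented for illustration.

```python
# Three roles that would normally run as three separate containers, all sharing
# state through an external store rather than inside any one container.
import redis

r = redis.Redis(host="shared-db", port=6379, decode_responses=True)

def ingest(reading: float) -> None:
    """Container 1: append raw sensor readings to a shared stream."""
    r.xadd("sensor:readings", {"value": str(reading)})

def enrich() -> None:
    """Container 2: read the readings and publish a rolling aggregate."""
    for _, entries in r.xread({"sensor:readings": "0-0"}, count=100) or []:
        values = [float(fields["value"]) for _, fields in entries]
        if values:
            r.set("sensor:rolling_avg", sum(values) / len(values))

def alert(threshold: float = 80.0) -> None:
    """Container 3: watch the aggregate and fire an action on a condition."""
    avg = r.get("sensor:rolling_avg")
    if avg is not None and float(avg) > threshold:
        r.publish("alerts", f"rolling average {avg} exceeded {threshold}")
```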
>> So essentially I gave a session, maybe two years ago, at KubeCon in Europe about what we're doing on storing state. It's really high-performance access from those container processes to our database, whether it's objects, files, streams or time series data, etc. And then essentially, all those workloads just mount on top of it and we can all share state. We can even control the access for each one. >> Do you think you nailed the state problem? >> Yes, and by the way, we have a managed service. Anyone could go today to our cloud, to our website; that's in our cloud. It gets its own Kubernetes cluster, provisioned within less than 10 minutes, five to 10 minutes. With all of those services pre-integrated with Spark, Presto, ______________, real-time serverless functions. All that pre-configured on its own. I figured all of these- >> 100% compatible with Kubernetes, it's a good investment. >> Well, we're just expanding it to the Kubernetes providers; now it's working on Amazon Kubernetes, EKS I think, and we're working on AKS and GKE. We partner with Azure and Google. And we're also building an edge solution that is essentially exactly the same stack. It can run on an edge appliance in a factory. You can essentially mobilize data and functions back and forth. So you can go and develop your workloads, your application in the cloud, test it under simulation, push a single button and teleport the artifacts into the edge factory. >> So is it like a real-time Kubernetes? >> Yes, it's a real-time Kubernetes. >> If you _______like the things we're doing, it's all real-time. >> Talk about real-time in the database world, because you mentioned time-series databases. You gave object store versus blob. Talk about time series. You're talking about data that is very relevant in the moment. And also understanding time series data. And then, it's important post-event, if you will, meaning how do you store it? Do you care? I mean, it's important to manage the time series. At the same time, it might not be as valuable as other data, or valuable at certain points in time, which changes its relationship to how it's stored and how it's used. Talk about the dynamic of time series. >> We figured out in the last six or 12 months that real-time is about time series. Everything you think about in real-time is sensor data, and even video is a time series of frames, okay. And what everyone wants to do is ingest huge amounts of time series. They want to cross-correlate it, because, for example, you think about stock tickers, you know, the stock has an impact from news feeds or Twitter feeds, of a company or a segment. So essentially, what they need to do is something called multivariate analysis of multiple time series to be able to extract some meaning, and then decide if you want to sell or buy a stock, in that application example. And there is a huge gap in the solutions in that market, because most of the time series databases were designed for operational databases, you know, things that monitor apps. Nothing that ingests millions of data points per second, and cross-correlates and runs real-time AI analytics. Ah, so we've essentially extended, because we have a programmable database essentially under the hood. We've extended it to support time series data with about a 50 to 1 compression ratio compared to some other solutions. You know, we worked with a customer, we've done sizing; they told us they need half a petabyte. 
After a small sizing exercise, it came to about 10 to 20 terabytes of storage for the same data they stored in Cassandra in 500 terabytes. Plus huge ingestion rates, and, what's very important, we can do all those cross-correlations in-flight, so that's something that's working very well for us. >> This could help on smart mobility. Connectivity, 5G comes on, certainly. Intelligent edge. >> So the customers we have, the cases that we're applying this to right now are in financial services, two or three main applications. One is tick data and analytics; everyone wants to be smarter about how to buy and sell stocks or manage risk. The second one is infrastructure monitoring, critical infrastructure monitoring, SLA monitoring: being able to monitor network devices, latencies, applications, you know, transaction rates, and be able to predict potential failures or escalations. We have similar applications; we have about three Telco customers using it for real-time time-series analytics on metric data, cybersecurity attacks, congestion avoidance, SLA management, and also automotive. Fleet management, file linking, they are also essentially feeding huge data sets of time series analytics. They're running cross-correlation and AI logic, so now they can generate triggers. Now compare that to Hadoop. What does Hadoop have to do with those kinds of applications? It cannot feed huge amounts of data sets, it cannot react in real-time, it doesn't store time-series efficiently. >> Hapoop (laughing) >> You said that. >> Yeah. That's good. >> One, I know we don't have a lot of time left. We're running out of time, but I want to make sure we get this out here. How are you engaging with customers? You guys got great technical support. We can vouch for the tech chops that you guys have. We've seen the solution. If it's compatible with Kubernetes, certainly this is an alternative to have really great analytical infrastructure. Cloud native, the goodness you're building. You do POCs, they go to your website, and how do you engage, how do you get deals? How do people work with you? >> So because now we have a cloud service, we also engage through the cloud. Mainly, we're going after customers and leads from webinars and activities on the internet, and we sort of follow up with those customers we know. >> Direct sales? >> Direct sales, but through a lead generation mechanism. Marketplace activity, Amazon, Azure, >> Partnerships with Azure and Google now. And Azure joint selling activities. They can actually resell and get compensated. Our solution is an edge solution for Azure. Working on a similar solution for Google. Very focused on retailers. That's the current market focus, since, you think about stores, a single supermarket will have more than a 1,000 cameras. Okay, just because they're monitoring shelves in real-time, think about an Amazon Go kind of replication. Real-time inventory management. You cannot push a 1,000 camera feeds into the cloud in order to analyze it and then decide on inventory levels. Proactive action, so, those are the kind of applications. >> So bigger deals, you've had some big deals. >> Yes, we're really not a Raspberry Pi kind of solution. That's where the bigger customers >> Got it. Yaron, thank you so much. The CTO of Iguazio, check him out. It's actually been great commentary. The Hadoop versus Kubernetes narrative. Love to explore that further with you. Stay with us for more coverage after this short break. We're live in day 2 of CUBE NYC. Part Strata, Hadoop Strata, Hadoop World. 
CUBE Hadoop World, whatever you want to call it. It's all because of the data. We'll bring it to ya. Stay with us for more after this short break. (upbeat music)
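The multivariate point in this conversation, that the signal often sits in the relationship between time series rather than in any single one, can be illustrated in a few lines of pandas. The data below is synthetic and the 10-second lag is arbitrary; a real deployment would be fed by the kind of high-rate ingestion described above.

```python
# Synthetic cross-correlation of two time series, e.g. returns versus sentiment.
import numpy as np
import pandas as pd

idx = pd.date_range("2018-09-01", periods=1000, freq="s")
rng = np.random.default_rng(0)
sentiment = pd.Series(rng.normal(size=1000), index=idx).rolling(30).mean()
# Make "returns" echo sentiment ten seconds later, plus noise.
returns = sentiment.shift(10) * 0.5 + pd.Series(rng.normal(scale=0.1, size=1000), index=idx)

frame = pd.DataFrame({"sentiment": sentiment, "returns": returns}).dropna()
print("same-time correlation: ", round(frame["sentiment"].corr(frame["returns"]), 2))
print("10s-lagged correlation:", round(frame["sentiment"].corr(frame["returns"].shift(-10)), 2))
```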
DD, Cisco + Han Yang, Cisco | theCUBE NYC 2018
>> Live from New York, It's the CUBE! Covering theCUBE, New York City 2018. Brought to you by SiliconANGLE Media and its Ecosystem partners. >> Welcome back to the live CUBE coverage here in New York City for CUBE NYC, #CubeNYC. This coverage of all things data, all things cloud, all things machine learning here in the big data realm. I'm John Furrier and Dave Vellante. We've got two great guests from Cisco. We got DD who is the Vice President of Data Center Marketing at Cisco, and Han Yang who is the Senior Product Manager at Cisco. Guys, welcome to the Cube. Thanks for coming on again. >> Good to see ya. >> Thanks for having us. >> So obviously one of the things that has come up this year at the Big Data Show, used to be called Hadoop World, Strata Data, now it's called, the latest name. And obviously CUBE NYC, we changed from Big Data NYC to CUBE NYC, because there's a lot more going on. I heard hallway conversations around blockchain, cryptocurrency, Kubernetes has been said on theCUBE already at least a dozen times here today, multicloud. So you're seeing the analytical world try to be, in a way, brought into the dynamics around IT infrastructure operations, both cloud and on premises. So interesting dynamics this year, almost a dev ops kind of culture to analytics. This is a new kind of sign from this community. Your thoughts? >> Absolutely, I think data and analytics is one of those things that's pervasive. Every industry, it doesn't matter. Even at Cisco, I know we're going to talk a little more about the new AI and ML workload, but for the last few years, we've been using AI and ML techniques to improve networking, to improve security, to improve collaboration. So it's everywhere. >> You mean internally, in your own IT? >> Internally, yeah. Not just in IT, in the way we're designing our network equipment. We're storing data that's flowing through the data center, flowing in and out of clouds, and using that data to make better predictions for better networking application performance, security, what have you. >> The first topic I want to talk to you guys about is around the data center. Obviously, you do data center marketing, that's where all the action is. The cloud, obviously, has been all the buzz, people going to the cloud, but Andy Jassy's announcement at VMworld really is a validation that we're seeing, for the first time, hybrid multicloud validated. Amazon announced RDS on VMware on-premises. >> That's right. This is the first time Amazon's ever done anything of this magnitude on-premises. So this is a signal from the customers voting with their wallet that on-premises is a dynamic. The data center is where the data is, that's where the main footprint of IT is. This is important. What's the impact of that dynamic, of data center, where the data is with the option of a cloud. How does that impact data, machine learning, and the things that you guys see as relevant? >> I'll start and Han, feel free to chime in here. So I think those boundaries between this is a data center, and this a cloud, and this is campus, and this is the edge, I think those boundaries are going away. Like you said, data center is where the data is. And it's the ability of our customers to be able to capture that data, process it, curate it, and use it for insight to take decision locally. A drone is a data center that flies, and boat is a data center that floats, right? >> And a cloud is a data center that no one sees. >> That's right. So those boundaries are going away. 
We at Cisco see this as a continuum. It's the edge cloud continuum. The edge is exploding, right? There's just more and more devices, and those devices are cranking out more data than ever before. Like I said, it's the ability of our customers to harness the data to make more meaningful decisions. So Cisco's take on this is the new architectural approach. It starts with the network, because the network is the one piece that connects everything- every device, every edge, every individual, every cloud. There's a lot of data within the network which we're using to make better decisions. >> I've been pretty close with Cisco over the years, since '95 timeframe. I've had hundreds of meetings, some technical, some kind of business. But I've heard that term edge the network many times over the years. This is not a new concept at Cisco. Edge of the network actually means something in Cisco parlance. The edge of the network >> Yeah. >> that the packets are moving around. So again, this is not a new idea at Cisco. It's just materialized itself in a new way. >> It's not, but what's happening is the edge is just now generating so much data, and if you can use that data, convert it into insight and make decisions, that's the exciting thing. And that's why this whole thing about machine learning and artificial intelligence, it's the data that's being generated by these cameras, these sensors. So that's what is really, really interesting. >> Go ahead, please. >> One of our own studies pointed out that by 2021, there will be 847 zettabytes of information out there, but only 1.3 zettabytes will actually ever make it back to the data center. That just means an opportunity for analytics at the edge to make sense of that information before it ever makes it home. >> What were those numbers again? >> I think it was like 847 zettabytes of information. >> And how much makes it back? >> About 1.3. >> Yeah, there you go. So- >> So a huge compression- >> That confirms your research, Dave. >> We've been saying for a while now that most of the data is going to stay at the edge. There's no reason to move it back. The economics don't support it, the latency doesn't make sense. >> The network cost alone is going to kill you. >> That's right. >> I think you really want to collect it, you want to clean it, and you want to correlate it before ever sending it back. Otherwise, sending that information, of useless information, that status is wonderful. Well that's not very valuable. And 99.9 percent, "things are going well." >> Temperature hasn't changed. (laughs) >> If it really goes wrong, that's when you want to alert or send more information. How did it go bad? Why did it go bad? Those are the more insightful things that you want to send back. >> This is not just for IoT. I mean, cat pictures moving between campuses cost money too, so why not just keep them local, right? But the basic concepts of networking. This is what I want to get in my point, too. You guys have some new announcements around UCS and some of the hardware and the gear and the software. What are some of the new announcements that you're announcing here in New York, and what does it mean for customers? Because they want to know not only speeds and feeds. It's a software-driven world. How does the software relate? How does the gear work? What's the management look like? Where's the control plane? Where's the management plane? Give us all the data. >> I think the biggest issues starts from this. 
Data scientists, their task is to explore different data sources and find out the value. But at the same time, IT is somewhat lagging behind, because as the data scientists go from data source A to data source B, it could be 3 petabytes of difference. IT is like, 3 petabytes? That's only from Monday through Wednesday? That's a huge infrastructure requirement change. So Cisco's way to help the customer is to make sure that we're able to come out with blueprints, blueprints enabling the IT team to scale, so that the data scientists can work beyond their own laptop. As they work through the petabytes of data that come in from all these different sources, they're able to collaborate well together and make sense of that information. Only by scaling, with IT helping the data scientists work at that scale, can they succeed. So that's why we announced a new server. It's called the UCS C480 ML. It happens to have 8 GPUs from Nvidia inside, helping customers that want to do those deep learning kinds of capabilities. >> What are some of the use cases on these products? It's got some new data capabilities. What are some of the impacts? >> Some of the things that Han just mentioned. For me, I think the biggest differentiation in our solution is the things that we put around the box. So the management layer, right? I mean, this is not going to be one server in one data center. It's going to be multiples of them. You're never going to have one data center. You're going to have multiple data centers. And we've got a really cool management tool called Intersight, and this is supported in Intersight, day one. And Intersight also uses machine learning techniques to look at data from multiple data centers. And that's really where the innovation is. Honestly, I think every vendor is bending sheet metal around the latest chipset, and we've done the same. But the real differentiation is how we manage it, how we use the data for more meaningful insight. I think that's where some of our magic is. >> Can you add some color to that, in terms of infrastructure for AI and ML, how is it different than traditional infrastructures? So is the management different? The sheet metal is not different, you're saying. But what are some of those nuances that we should understand? >> I think especially for deep learning, multiple scientists around the world have pointed out that if you're able to use GPUs, they're able to run the deep learning frameworks faster by roughly two orders of magnitude. So that's part of the reason why, from an infrastructure perspective, we want to bring in those GPUs. But for the IT teams, we didn't want them to just add yet another infrastructure silo just to support AI or ML. Therefore, we wanted to make sure it fits in with a UCS-managed unified architecture, enabling the IT team to scale without adding more infrastructures and silos just for that new workload. Having that unified architecture helps IT be more efficient and, at the same time, better supports the data scientists. >> The other thing I would add is, again, the things around the box. Look, this industry is still pretty nascent. There are lots of start-ups, there are lots of different solutions, and when we build a server like this, we don't just build a server and toss it over the fence to the customer and say "figure it out." No, we've done validated design guides with Google and with some of the leading vendors in the space to make sure that everything works as we say it would.
And so it's all of those integrations, those partnerships, all the way through our systems integrators, to really understand a customer's AI and ML environment and can fine tune it for the environment. >> So is that really where a lot of the innovation comes from? Doing that hard work to say, "yes, it's going to be a solution that's going to work in this environment. Here's what you have to do to ensure best practice," etc.? Is that right? >> So I think some of our blueprints or validated designs is basically enabling the IT team to scale. Scale their stores, scale their CPU, scale their GPU, and scale their network. But do it in a way so that we work with partners like Hortonworks or Cloudera. So that they're able to take advantage of the data lake. And adding in the GPU so they're able to do the deep learning with Tensorflow, with Pytorch, or whatever curated deep learning framework the data scientists need to be able to get value out of those multiple data sources. These are the kind of solutions that we're putting together, making sure our customers are able to get to that business outcome sooner and faster, not just a-- >> Right, so there's innovation at all altitudes. There's the hardware, there's the integrations, there's the management. So it's innovation. >> So not to go too much into the weeds, but I'm curious. As you introduce these alternate processing units, what is the relationship between traditional CPUs and these GPUs? Are you managing them differently, kind of communicating somehow, or are they sort of fenced off architecturally. I wonder if you could describe that. >> We actually want it to be integrated, because by having it separated and fenced off, well that's an IT infrastructure silo. You're not going to have the same security policy or the storage mechanisms. We want it to be unified so it's easier on IT teams to support the data scientists. So therefore, the latest software is able to manage both CPUs and GPUs, as well as having a new file system. Those are the solutions that we're putting forth, so that ARC-IT folks can scale, our data scientists can succeed. >> So IT's managing a logical block. >> That's right. And even for things like inventory management, or going back and adding patches in the event of some security event, it's so much better to have one integrated system rather than silos of management, which we see in the industry. >> So the hard news is basically UCS for AI and ML workloads? >> That's right. This is our first server custom built ground up to support these deep learning, machine learning workloads. We partnered with Nvidia, with Google. We announced earlier this week, and the phone is ringing constantly. >> I don't want to say godbot. I just said it. (laughs) This is basically the power tool for deep learning. >> Absolutely. >> That's how you guys see it. Well, great. Thanks for coming out. Appreciate it, good to see you guys at Cisco. Again, deep learning dedicated technology around the box, not just the box itself. Ecosystem, Nvidia, good call. Those guys really get the hot GPUs out there. Saw those guys last night, great success they're having. They're a key partner with you guys. >> Absolutely. >> Who else is partnering, real quick before we end the segment? >> We've been partnering with software sci, we partner with folks like Anaconda, with their Anaconda Enterprise, which data scientists love to use as their Python data science framework. 
We're working with Google, with their Kubeflow, which is an open source project integrating TensorFlow on top of Kubernetes. And of course we've been working with folks like Cloudera as well as Hortonworks to access the data lake from a big data perspective. >> Yeah, I know you guys didn't get a lot of credit. Google Cloud, we were certainly amplifying it. You guys were co-developing the Google Cloud servers with Google. I know they were announcing it, and you guys had Chuck on stage there with Diane Greene, so it was pretty positive. Good integration with Google can make a... >> Absolutely. >> Thanks for coming on theCUBE, thanks, we appreciate the commentary. Cisco here on theCUBE. We're in New York City for theCUBE NYC. This is where the world of data is converging with IT infrastructure, developers, operators, all running analytics for future business. We'll be back with more coverage after this short break. (upbeat digital music)
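To make the "TensorFlow on top of Kubernetes" point a little more concrete, here is a rough, hypothetical sketch of how an IT team might schedule a GPU-backed training job through the Kubernetes Python client. The image, namespace, and resource numbers are placeholders, and it assumes a cluster where the NVIDIA device plugin exposes the nvidia.com/gpu resource; this is not Cisco's or Kubeflow's actual tooling.

```python
# Hypothetical sketch: submit a GPU-backed training pod via the Kubernetes Python client.
# Assumes the NVIDIA device plugin is installed so nvidia.com/gpu is a schedulable resource.
from kubernetes import client, config

def launch_training_pod(name="tf-train-demo", namespace="ml", gpus=2):
    config.load_kube_config()  # use load_incluster_config() when running inside the cluster
    container = client.V1Container(
        name="trainer",
        image="tensorflow/tensorflow:latest-gpu",   # placeholder image
        command=["python", "/workspace/train.py"],  # placeholder entrypoint
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": str(gpus), "cpu": "8", "memory": "32Gi"}
        ),
    )
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name=name, labels={"workload": "deep-learning"}),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )
    return client.CoreV1Api().create_namespaced_pod(namespace=namespace, body=pod)
```

In practice a framework like Kubeflow wraps this plumbing in higher-level resources, but the effect is the point Han makes above: the GPU servers become one more pool that the same scheduler, security policies, and management plane already handle.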
Brent Compton, Red Hat | theCUBE NYC 2018
>> Live from New York, it's theCUBE, covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hello, everyone, welcome back. This is theCUBE live in New York City for theCUBE NYC, #CUBENYC. This is our ninth year covering the big data ecosystem, which has now merged into cloud. All things coming together. It's really about AI, it's about developers, it's about operations, it's about data scientists. I'm John Furrier, my co-host Dave Vellante. Our next guest is Brent Compton, Technical Marketing Director for Storage Business at Red Hat. As you know, we cover Red Hat Summit and great to have the conversation. Open source, DevOps is the theme here. Brent, thanks for joining us, thanks for coming on. >> My pleasure, thank you. >> We've been talking about the role of AI and AI needs data and data needs storage, which is what you do, but if you look at what's going on in the marketplace, kind of an architectural shift. It's harder to find a cloud architect than it is to find diamonds these days. You can't find a good cloud architect. Cloud is driving a lot of the action. Data is a big part of that. What's Red Hat doing in this area and what's emerging for you guys in this data landscape? >> Really, the days of specialists are over. You mentioned it's more difficult to find a cloud architect than find diamonds. What we see is the infrastructure, it's become less about compute as storage and networking. It's the architect that can bring the confluence of those specialties together. One of the things that we see is people bringing their analytics workloads onto the common platforms where they've been running the rest of their enterprise applications. For instance, if they're running a lot of their enterprise applications on AWS, of course, they want to run their analytics workloads in AWS and that's EMRs long since in the history books. Likewise, if they're running a lot of their enterprise applications on OpenStack, it's natural that they want to run a lot of their analytics workloads on the same type of dynamically provisioned infrastructure. Emerging, of course, we just announced on Monday this week with Hortonworks and IBM, if they're running a lot of their enterprise applications on a Kubernetes substrate like OpenShift, they want to run their analytics workloads on that same kind of agile infrastructure. >> Talk about the private cloud impact and hybrid cloud because obviously we just talked to the CEO of Hortonworks. Normally it's about early days, about Hadoop, data legs and then data planes. They had a good vision. They're years into it, but I like what Hortonworks is doing. But he said Kubernetes, on a data show Kubernetes. Kubernetes is a multi-cloud, hybrid cloud concept, containers. This is really enabling a lot of value and you guys have OpenShift which became very successful over the past few years, the growth has been phenomenal. So congratulations, but it's pointing to a bigger trend and that is that the infrastructure software, the platform as a service is becoming the middleware, the glue, if you will, and Kubernetes and containers are facilitating a new architecture for developers and operators. How important is that with you guys, and what's the impact of the customer when they think, okay I'm going to have an agile DevOps environment, workload portability, but do I have to build that out? You mentioned people don't have to necessarily do that anymore. The trend has become on-premise. 
What's the impact on the customer as they hear Kubernetes and containers and the data conversation? >> You mentioned an agile DevOps environment and workload portability, so one of the things that customers come to us for is having that same thing, but infrastructure agnostic. They say, I don't want to be locked in. Love AWS, love Azure, but I don't want to be locked into those platforms. I want to have an abstraction layer, my Kubernetes layer, that sits on top of those infrastructure platforms. As I bring my workloads onto that substrate one by one, from custom DevOps apps to a lift and shift of legacy apps, I want it to be independent, private cloud or public cloud. And, time permitting, we'll go into more detail about what we've seen happening in the private cloud with analytics as well, which is effectively what brought us here today. The pattern that we've discovered with a lot of our large customers, who are saying, hey, we're running OpenStack, they're large institutions that for lots of reasons store a lot of their data on-premises, is that they want to use the utility compute model that OpenStack gives them as well as the shared data context that Ceph gives them, and they want to use that same thing for their analytics workloads. So effectively some of our large customers taught us this pattern. >> So they're building infrastructure for analytics essentially. >> That's what it is. >> One of the challenges with that is the data is everywhere. It's all in silos, it's locked in some server somewhere. First of all, am I overstating that problem, and how are you seeing customers deal with that? What are some of the challenges that they're having and how are you guys helping? >> Perfect lead-in. In fact, one of our large government customers recently sent us an unsolicited email after they deployed the first 10 petabytes in a deca-petabyte solution. It's OpenStack based as well as Ceph based. Three taglines in their email. The first was releasing the lock on data. The second was releasing the lock on compute. And the third was releasing the lock on innovation. Now, that sounds a bit buzzword-y, but when it comes from a customer to you... >> That came from a customer? Sounds like a marketing department wrote that. >> In the details, as you know, with traditional HDFS clusters, traditional Hadoop clusters, Spark clusters or whatever, HDFS is not shared between clusters. One of our large customers has 50-plus analytics clusters. Their data platforms team employs a maze of scripts to copy data from one cluster to the other. And if you are a scientist or an engineer, you'd say, I'm trying to obtain these types of answers, but I need access to data sets A, B, C, and D, and data sets A and B are only on this cluster. I've got to go contact the data platforms team and have them copy it over and ensure that it's up to date and in sync, so it's messy. >> It's a nightmare. >> Messy. So that's why the one customer said releasing the lock on data, because now it's in a shared context. It's a similar paradigm to AWS with EMR. The data is in a shared context, in S3, and you spin up your analytics workloads on EC2. Same paradigm with OpenStack: you're spinning up your analytics workloads via OpenStack virtualization, and they're sourcing a shared data context inside of Ceph, S3-compatible Ceph, so it's the same architecture. I love his last tagline, the one that sounds the most buzzword-y, which was releasing the lock on innovation. And this individual, English was not this person's first language, so I love the wording.
He said, our developers no longer fear experimentation because it's so easy. In minutes they can spin up an analytics cluster with a shared data context, they get the wrong mix of things they shut it down and spin it up again. >> In previous example you used HDFS clusters. There's so many trip wires, right. You can break something. >> It's fragile. >> It's like scripts. You don't want to tinker with that. Developers don't want to get their hand slapped. >> The other thing is also the recognition that innovation comes from data. That's what my takeaway is. The customer saying, okay, now we can innovate because we have access to the data, we can apply intelligence to that data whether it's machine intelligence or analytics, et cetera. >> This the trend in infrastructure. You mentioned the shared context. What other observations and learnings have you guys come to as Red Hat starts to get more customer interactions around analytical infrastructure. Is it an IT problem? You mentioned abstracting the way different infrastructures, and that means multi-cloud's probably setup for you guys in a big way. But what does that mean for a customer? If you had to explain infrastructure analytics, what needs to get done, what does the customer need to do? How do you describe that? >> I love the term that industry uses of multi-tenant workload isolation with shared data context. That's such a concise term to describe what we talk to our customers about. And most of them, that's what they're looking for. They've got their data scientist teams that don't want their workloads mixed in with the long running batch workloads. They say, listen, I'm on deadline here. I've got an hour to get these answers. They're working with Impala. They're working with Presto. They iterate, they don't know exactly the pattern they're looking for. So having to take a long time because their jobs are mixed in with these long MapReduce jobs. They need to be able to spin up infrastructure, workload isolation meaning they have their own space, shared context, they don't want to be placing calls over to the platform team saying, I need data sets C, D, and E. Could you please send them over? I'm on deadline here. That phrase, I think, captures so nicely what customers are really looking to do with their analytics infrastructure. Analytics tools, they'll still do their thing, but the infrastructure underneath analytics delivering this new type of agility is giving that multi-tenant workload isolation with shared data context. >> You know what's funny is we were talking at the kickoff. We were looking back nine years. We've been at this event for nine years now. We made prediction there will be no Red Hat of big data. John, years ago said, unless it's Red Hat. You guys got dragged into this by your customers really is how it came about. >> Customers and partners, of course with your recent guest from Hortonworks, the announcement that Red Hat, Hortonworks, and IBM had on Monday of this week. Dialing up even further taking the agility, okay, OpenStack is great for agility, private cloud, utility based computing and storage with OpenStack and Ceph, great. OpenShift dials up that agility another notch. Of course, we heard from the CEO of Hortonworks how much they love the agility that a Kubernetes based substrate provides their analytics customers. >> That's essentially how you're creating that sort of same-same experience between on-prem and multi-cloud, is that right? 
>> Yeah, OpenShift is deployed pervasively on AWS, on-premises, on Azure, on GCE. >> It's a multi-cloud world, we see that for sure. Again, the validation was at VMworld. AWS CEO, Andy Jassy announced RDS which is their product on VMware on-premises which they've never done. Amazon's never done any product on-premises. We were speculating it would be a hardware device. We missed that one, but it's a software. But this is the validation, seamless cloud operations on-premise in the cloud really is what people want. They want one standard operating model and they want to abstract away the infrastructure, as you were saying, as the big trend. The question that we have is, okay, go to the next level. From a developer standpoint, what is this modern developer using for tools in the infrastructure? How can they get that agility and spinning up isolated, multi-tenant infrastructure concept all the time? This is the demand we're seeing, that's an evolution. Question for Red Hat is, how does that change your partnership strategy because you mentioned Rob Bearden. They've been hardcore enterprise and you guys are hardcore enterprise. You kind of know the little things that customers want that might not be obvious to people: compliance, certification, a decade of support. How is Red Hat's partnership model changing with this changing landscape, if you will? You mentioned IBM and Hortonworks release this week, but what in general, how does the partnership strategy look for you? >> The more it changes, the more it looks the same. When you go back 20 years ago, what Red Hat has always stood for is any application on any infrastructure. But back in the day it was we had n-thousand of applications that were certified on Red Hat Linux and we ran on anybody's server. >> Box. >> Running on a box, exactly. It's a similar play, just in 2018 in the world of hybrid, multi-cloud architectures. >> Well, you guys have done some serious heavy lifting. Don't hate me for saying this, but you're kind of like the mules of the industry. You do a lot of stuff that nobody either wants to do or knows how to do and it's really paid off. You just look at the ascendancy of the company, it's been amazing. >> Well, multi-cloud is hard. Look at what it takes to do multi-cloud in DevOps. It's not easy and a lot of pretenders will fall out of the way, you guys have done well. What's next for you guys? What's on the horizon? What's happening for you guys this next couple months for Red Hat and technology? Any new announcements coming? What's the vision, what's happening? >> One of the announcements that you saw last week, was Red Hat, Cloudera, and Eurotech as analytics in the data center is great. Increasingly, the world's businesses run on data-driven decisions. That's great, but analytics at the edge for more realtime industrial automation, et cetera. Per the announcements we did with Cloudera and Eurotech about the use of, we haven't even talked about Red Hat's middleware platforms, such as AMQ Streams now based on Kafka, a Kafka distribution, Fuze, an integration master effectively bringing Red Hat technology to the edge of analytics so that you have the ability to do some processing in realtime before back calling all the way back to the data center. That's an area that you'll also see is pushing some analytics to the edge through our partnerships such as announced with Cloudera and Eurotech. >> You guys got the Red Hat Summit coming up next year. theCUBE will be there, as usual. It's great to cover Red Hat. 
Thanks for coming on theCUBE, Brent. Appreciate it, thanks for spending the time. We're here in New York City live. I'm John Furrier, Dave Vallante, stay with us. All day coverage today and tomorrow in New York City. We'll be right back. (upbeat music)
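For readers who want to see what a "shared data context" looks like in practice, here is a small, hypothetical sketch: every short-lived analytics cluster or notebook reads the same objects from an S3-compatible Ceph endpoint instead of keeping its own HDFS copy. The endpoint, bucket, key, and credentials are placeholders, not Red Hat's reference configuration.

```python
# Hypothetical sketch: ephemeral analytics jobs share one S3-compatible Ceph object store
# instead of copying data between per-cluster HDFS silos. All names are placeholders.
import boto3

def shared_context_client():
    return boto3.client(
        "s3",
        endpoint_url="https://ceph-rgw.example.internal:8443",  # Ceph RADOS Gateway endpoint
        aws_access_key_id="ANALYTICS_TENANT_KEY",
        aws_secret_access_key="ANALYTICS_TENANT_SECRET",
    )

def read_dataset(bucket="datalake", key="trades/2018/09/ticks.parquet"):
    s3 = shared_context_client()
    obj = s3.get_object(Bucket=bucket, Key=key)
    return obj["Body"].read()  # hand the bytes to Spark, Presto, pandas, and so on
```

Because every ephemeral cluster points at the same endpoint, the maze of copy scripts between clusters goes away, which is the "releasing the lock on data" idea from the customer email.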
Rob Bearden, Hortonworks | theCUBE NYC 2018
>> Live from New York, it's theCUBE, covering theCUBE, New York City, 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> And welcome to theCUBE here in New York City. We're live from CUBE NYC, this is our big data now: AI, now all things cloud 9 years covering the beginning of Hadoop. Now into cloud and data as the center of the value I'm John Furrier with David Vellante. Our special guest is Rob Bearden, CEO of Hortonworks CUBE alumni, been on many times Great supporter of theCUBE, legend in OpenSource Great to see you. >> It's great to be here, thanks. Yes, absolutely. >> So one of the things I wanted to talk to you about is that OpenSource certainly has been a big part of the Ethos, just seeing it in all sectors, again, growing even in Blockchain, Open Ethos is growing. The role of data now certainly in the center. You guys have been on this vision of open data, if you will and making data, and move and flight, maybe rest all these things are going on. Certainly the Hadoop world has changed, not just Hadoop and data lakes anymore, it's data. All things data, it's happening. This is core to your business, you guys have been banging this drum for a long time. Stock's at an all-time high. Congratulations on the business performance. So it's working, things are working for you guys. >> I think the model in this strategy are really coming together nicely. And to your point, it's about all the data. It's about the entire life-cycle of the data and bringing all data under management through its entire life-cycle. And being able to give the enterprise that accessibility to that data across each tier on-prem, private cloud, and across all the multi-clouds. And that's really changed, really in many regards, the overall core architecture of Hadoop and how it needs to manage data. And how it needs to interact with other data sources. And our model and strategy is been about not going above the Hadoop stack, but actually going out to the edge, and bringing data under management from the point of origination through its entire movement life-cycle until it comes at rest, and then have the ability, to deploy and access that data across each tier and across a multi-cloud environment. And it's a hybrid architecture world now. >> You guys have been on this trend for a while now, it's kind of getting lift obviously you're seeing the impact that cloud, impact AI cause the faster computer you have, the faster you can process data, the faster the data can be used, machine learning it's a nice flywheel. So again, that flywheel is being recognized. So I have to ask you, what is in your opinion, been the impact of cloud computing, specifically the Amazons, and the Azures, and now Google where certainly AI is in the center of their proposition, now hybrid cloud is validated with Amazon announcing RDS on the premises on VMWARE. That's the first Amazon ever, ever on premises activity. So this is clearly a validation of hybrid cloud. How has the cloud impacted the data space, and if you will, it used to be data warehousing, cloud has changed that. What's your opinion? >> Well what's it's done is given a, an architectural extension to the enterprise of what their data architecture needs to be, and the real key is, it's now, it's not about hybrid or cloud or on-prem, it's about having a data strategy overall. And how do I bring all my different assets, and bring a connected community together, in real-time? 
because what enterprise is trying to do is, connect and have higher velocity and faster visibility between the enterprise, the product, their customer, and their supply chain. And to do that, they need to be able to aggregate data into the best economic platform from the point of origination, maybe starting from the component on their product, a single component, and be able to bring all that data together through its life-cycle, aggregate it, and then deploy it on the most economically feasible tier. Whether that's on-prem, or a private cloud, or across multiple public clouds. And our platform with HDF, HDP, and data plane and complete that hybrid data architecture. And by doing that, the real value is then the cloud, AI and machine learning capabilities have the ability now to access all data across the enterprise, whether it be their tier in the cloud, or whether that be on-prem. And our strategy is around bringing that and being that fabric, to bring all the interconnectivity irrespective of whether it sits on the edge and the cloud is somewhere in between. Because the more accessibility AI has to data, the faster velocity of driving value back in to that AI cycle. >> Yeah, people don't want to move data if they don't have to And so, and we've been on this for a while, that this idea that you want to bring the cloud model to your data, and not the data to the cloud always. And so, how do you do that? How do you make it this kind of same, same environment? What role does HortonWorks play in it? >> Well the first thing we want to do is, bring the data under management from and through its life-cycle where HDF goes to the edge, brings the data through its movement cycle, aggregates the streams. HDP is the data at rest platform that can sit on-prem and a public cloud or a private cloud. And then data plains that fabric, that ensures that we have connectivity to all types of data across all tiers. And then serves as the common security and governance framework, irrespective of which tier that is. And that's very very important. And then that then gives the AI platforms the ability to bring AI onto a broader array of data, that they can then have a higher and better impact on it than just having an isolated AI impact on just a single tier I data in the cloud. >> Well that messages seems to be resonating, we talked earlier about the stock price, but also I think Neil Bushery and Frank Sluben popularized the metric of number of seven-figure deals. You guys are closing some big deals, and remember in the early days Robert Vor Breath, people are like how these guys going to sell anything, it's all open-source and you're doing a lot of a million plus dollar deals. So it's resonating not only with the streep but also with enterprises, your thoughts. >> Last quarter we, I think the key is that the industry really understands, the investors understand, the enterprises really now understand the importance of hybrid and hybrid cloud. And it's not going to be all about managing data lakes on-prem. All the data's not going to go and have this giant line of demarkation and now all reside in the cloud. It has to coexist across each tier and our role is to be that aggregation point. >> And you've seen the big cloud players now, all it's the big three, all have on-prem strategies. Azure with Azure Stack, Google we saw Kubernetes on-prem, and even AWS now, the last load up putting RDS on-prem announced that VMWorld. So they've all sort of recognized that not everything's going to go into the cloud. 
So that's got to be, you know good confirmation for you guys >> It's great validation. What is also says though is, we must have cloud first architecture and a cloud first approach with all of our tech. And the key to that is, from our standpoint, within our strategy is to containerize everything. And we had an announcement earlier this week that was really a three-way announcement between us, Red Hat, and IBM; and the essence of that announcement is we've adopted the Kubernetes distro from Red Hat. To where we're are containerizing all of our platforms with Red Hat's Kubernetes distribution. And what that does, is gives us the ability to optimize our platforms for OpenShift, the Red Hat pass, and optimize then the deployment of that and the IBM private cloud, right. And naturally data plane will also then give us the ability, to extend those workloads; those very granular workloads up in to the public clouds, and we can even leverage their native objects stores. >> So that's an interesting love triangle right? You and Red Hat are kind of birds of a feather with open-source. IBM has always been a big proponent of open-source, you know funded Linux in the early days. And then brings this, a massive channel and brand, you know to that world. >> Yes. And you know this is really going to accelerate our movement into a cloud first architecture, with pure containerization. And the reason that's so important is, it gives us that modularity to move those applications and those workloads, across whichever tiers most appropriate architecturally for it to run and be deployed. >> You know we said this on theCUBE many many years ago, and continues to be this theme, enterprise is one really wanting hardened solutions, but they don't mind experimenting. And Stu Miniman and I, were always talking about and comparing OpenStack ecosystem to what's happened in the Hadoop ecosystem. There's some pockets of relevance and it's a lot of work to build your own, and OpenStack has a great solution for certain use cases, now mostly on the infrastructure side But when cloud came in and changed the game, because you saw things like Kubernetes. I mean we're here at the Hadoop show that started with Hadoop, now it's AI, the word Kubernetes is being talked about. You mentioned hybrid cloud, these aren't words that were spoken at an event like this. So the IT problem in multi-cloud has always been a storage issue. So you do some storage work, you got to store the data somewhere, but now you're talking about Kubernetes. You're talking about orchestration around workloads, the role of data in workloads. This is what enterprise IT actually cares about right now. This is not like, a small little thing, it's a big deal because data is not only in the workloads, they're using instrumentation with containers, with service meshes around the coin. You're starting to see policy, this is hardcore B2B enterprise features. >> This is where with what we're seeing is a massive transformational shift of how the IT architecture's going to look for the next 20 years. Right. The IT world it is been horribly constrained from this very highly configured, very procedural-based applications and now they want to create high velocity engagement between the enterprise, their product, their customer and supply chain. They were so constrained with these very procedural-based applications and containerization gives the ability now to create that velocity and to move those workloads, and those interactions between that four pillars. 
>> Now let's talk about the edge, 'cause the pendulum is clearly swinging back, there's some decentralization going on, and the edge to us is a data play. We talk about it all the time. What are your thoughts on the edge, where does Hortonworks fit? What's your vision of the data modeling and how that evolves? >> The insight into that goes back to our strategy and what we did. We had the great fortune, quite frankly, of being able to merge Onyara and Hortonworks back in 2015. And the whole goal of that, besides working with the great team Joe Witt had built, was being able to get to the edge. What we wanted to have the ability to do was to operate on every sensor, on every device at the edge for the customer, so that they could bring the data under management wherever it may be, through its entire life-cycle; so from point of origination through its movement until it comes to rest. So our belief is that if we can bring enough intelligence and faster insights as that data is being generated, and as events or conditions are happening, moving, or changing, then before it ever comes to rest we can process it and take prescriptive action. Leveraging AI and machine learning while the data is in its life-cycle, we can dramatically decrease the amount of data we have to bring to rest. We can just bring the provenance, the metadata, to rest and have that insight. And we try to get to these high-velocity, real-time insights starting with the data on the edge. And that's why we think it's so important to manage the entire life-cycle. And then, what's even more important, is to then put that data onto whatever tier makes sense. That may be bringing it back to rest in a data lake on-prem, right, to aggregate with other like data structures. Or it may be taking it into cold storage on a native object store in a cloud, which has the lowest-cost storage structure for a particular time. >> Or take an action on the edge and leave it there. >> Yeah. You guys definitely think about the edge in a big way, that's pretty obvious. But what I want to get your thoughts on is an emerging area we're watching, and I'll call it, for lack of a better description, programmable data. And you mentioned data architecture being set up, probably for a 10- or 20-year run, as enterprises set up their data architecture with their cloud architects. Making data programmable is kind of a DevOps concept, right? And this is something that you guys have thought about with the data plane. What's your reaction to this notion of making data programmable? When you start talking about Kubernetes, you're going to have stateful applications, stateless applications, you have new dynamics, I call it API 2.0 happening. Whole new infrastructure happening, data has to be programmable, it's going to need policy around it, and the role of data is certainly changing rather than just storing it somewhere. What's your view of programmable data, making it programmable? >> Well, to truly have programmable data, you can't have slices of accessibility or windows. You have to understand the lineage of that entire data set, and the context of that data through its entire life-cycle. That's point number one. Point number two is, you have to be able to have it containerized, so that you can take the module of data that you want to take prescriptive action against, or create action against a condition, and be able to do that in granular bites or chunks, right.
And then you've got to have accessibility to all the other contextual data, which means whether that's as its in motion as its at rest or, as its contextual cousin if you will, that sits up in an object store on another tier in a public cloud. Right. But what's important is that you have to be able to control and understand the entire lineage of that. And therefore, that's where our second step in this is data plane. And having the ability to have a full security model through that entire architectural chain, as well as the entire governance and lineage leveraging, leveraging atlas through data plane. And that then gives you the ability to take these very prescriptive actions that are driven through AI and machine learning insights. >> And that makes you very agile, love it. I mean the ethos of open-source and dev-ops is literally being applied to every thing. We see it with at the network layer, you see it at the data layer, you're starting to see this concept of dev and ops being applied in a big way. >> The next you know, previous years we've talked about what we're trying to accomplish. And we've started HortonWorks, it was about changing the data architecture for the next 20 years and how data was going to be managed. And that's had, to your earlier point we opened up the show, that's had twists and turns. Hadoop's evolved, the nature and velocity of data has evolved in the last five, six, seven, eight years you know. It's about going to the edge, it's about leveraging the cloud and we're very excited about where we're positioned as this massive transformation's happening. And what we're seeing is the iteration of change, is happening at an incredibly fast pace. Even much more so than it was two, three years ago. >> Yeah, the clock speeds definitely up, their data is working. People putting it to work. What works... >> They're able to get more value faster because of it. >> The AI is great. >> The data economy is here and now. And the enterprise understands it. So they want to now move aggressively to change and transform their business model to take advantage of what their data is giving them the ability to do. >> That's great. They always want the value, and they want it fast and anything gets in the way they'll remove the blockers as what we say. >> Alright, it's theCUBE here Rob Bearden, CEO of Hortonworks giving his vision but also an update on the company; data at the center of the value proposition. This is about AI, it's about big data, it's about the cloud. It's theCUBE bringing you, theCUBE data here in New York City. CUBENYC, that's the hashtag; check us out on Twitter. Stay with us for a live coverage all day today and tomorrow here in New York City. We'll be right back after this short break. (upbeat music)
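As a rough illustration of the pattern Rob describes, where data is processed in motion and only the provenance plus the interesting records come to rest, here is a small hypothetical sketch. In Hortonworks' stack this is the territory of HDF (NiFi and MiNiFi); the loop below is generic Python with made-up field names and thresholds, not the HDF APIs.

```python
# Hypothetical sketch: score sensor events at the edge, forward only anomalies,
# and send a lightweight provenance summary for everything else.
# Field names, the threshold, and the send() target are all placeholders.
import json
import time

ANOMALY_THRESHOLD = 0.9

def score(event):
    # Stand-in for a small model or rule deployed at the edge.
    baseline = max(event["baseline"], 1e-9)
    return min(1.0, abs(event["vibration"] - baseline) / baseline)

def process_at_edge(events, send):
    summary = {"count": 0, "anomalies": 0, "first_ts": None, "last_ts": None}
    for event in events:
        summary["count"] += 1
        summary["first_ts"] = summary["first_ts"] or event["ts"]
        summary["last_ts"] = event["ts"]
        if score(event) >= ANOMALY_THRESHOLD:
            summary["anomalies"] += 1
            send(json.dumps(event))  # full record travels onward only for interesting cases
    send(json.dumps({"provenance": summary, "sent_at": time.time()}))  # metadata for the rest
```

Of a potentially huge raw stream, only the anomalous records and one provenance summary travel on to the data lake or object store, which is how the volume brought to rest drops so dramatically.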
SUMMARY :
Brought to you by SiliconANGLE Media Now into cloud and data as the center of the value It's great to be here, thanks. So one of the things I wanted to talk to you about above the Hadoop stack, but actually going out to the edge, How has the cloud impacted the data space, and if you will, have the ability now to access all data across the and not the data to the cloud always. HDP is the Well that messages seems to be resonating, And it's not going to be So that's got to be, you know good confirmation for you guys And the key to that is, from our standpoint, And then brings this, a massive channel and brand, And the reason that's because data is not only in the workloads, they're using containerization gives the ability now to create going on, and the edge to us is a data play. the metadata to rest and have that insight. And this is something that you guys have thought about And having the ability to have a full security model And that makes you very agile, love it. And that's had, to your earlier point we opened up the show, Yeah, the clock speeds definitely up, their data And the enterprise understands it. and they want it fast and anything gets in the way it's about the cloud.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
David Vellante | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
Frank Sluben | PERSON | 0.99+ |
2015 | DATE | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
10 | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
New York City | LOCATION | 0.99+ |
Yara | ORGANIZATION | 0.99+ |
New York | LOCATION | 0.99+ |
Joe Witt | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
tomorrow | DATE | 0.99+ |
Amazons | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
Kubernetes | TITLE | 0.99+ |
CUBE | ORGANIZATION | 0.99+ |
second step | QUANTITY | 0.99+ |
VMWorld | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
Last quarter | DATE | 0.99+ |
HortonWorks | ORGANIZATION | 0.99+ |
Robert Vor Breath | PERSON | 0.98+ |
first | QUANTITY | 0.98+ |
Neil Bushery | PERSON | 0.98+ |
six | QUANTITY | 0.98+ |
each tier | QUANTITY | 0.98+ |
Hadoop | TITLE | 0.97+ |
seven-figure deals | QUANTITY | 0.97+ |
Point number two | QUANTITY | 0.97+ |
two | DATE | 0.97+ |
seven | QUANTITY | 0.97+ |
theCUBE | ORGANIZATION | 0.97+ |
OpenShift | TITLE | 0.97+ |
OpenSource | ORGANIZATION | 0.96+ |
each tier | QUANTITY | 0.96+ |
2018 | DATE | 0.96+ |
three years ago | DATE | 0.95+ |
earlier this week | DATE | 0.95+ |
first thing | QUANTITY | 0.93+ |
eight years | QUANTITY | 0.93+ |
single component | QUANTITY | 0.93+ |
VMWARE | TITLE | 0.93+ |
Linux | TITLE | 0.92+ |
first approach | QUANTITY | 0.92+ |
point number one | QUANTITY | 0.9+ |
first architecture | QUANTITY | 0.9+ |
Red Hat | ORGANIZATION | 0.88+ |
NYC | LOCATION | 0.88+ |
Ethos | ORGANIZATION | 0.88+ |
CEO | PERSON | 0.88+ |
Open Ethos | ORGANIZATION | 0.88+ |
one | QUANTITY | 0.87+ |
three-way announcement | QUANTITY | 0.87+ |
next 20 years | DATE | 0.86+ |
Red Hat | TITLE | 0.84+ |
single tier | QUANTITY | 0.83+ |
OpenStack | TITLE | 0.82+ |
20 year | QUANTITY | 0.82+ |
Azure Stack | TITLE | 0.79+ |
9 years | QUANTITY | 0.77+ |
many years ago | DATE | 0.77+ |
Hortonworks CUBE | ORGANIZATION | 0.76+ |
three | QUANTITY | 0.76+ |
Arun Murthy, Hortonworks | theCUBE NYC 2018
>> Live from New York, it's theCUBE, covering theCUBE New York City 2018, brought to you by SiliconANGLE Media and its ecosystem partners. >> Okay, welcome back everyone, here live in New York City for CubeNYC, formerly Big Data NYC, now called CubeNYC. The topic has moved beyond big data. It's about cloud, it's about data, it's also about potentially blockchain in the future. I'm John Furrier with Dave Vellante. We're happy to have a special guest here, Arun Murthy. He's the cofounder and chief product officer of Hortonworks, been in the ecosystem from the beginning, at Yahoo, already been on theCUBE many times, but great to see you, thanks for coming in, >> My pleasure, >> appreciate it. >> thanks for having me. >> Super smart to have you on here, because a lot of people have been squinting through the noise of the marketplace. You guys have been on this DataPlane idea for a few years now. Cloudera launched Hadoop commercially first; you came after, out of Yahoo, the second of the two big players. You evolved it quickly; you guys saw early on that this is bigger than Hadoop. And now all the conversations are about what you guys were talking about three years ago. Give us the update, what's the product update? How is hybrid a big part of that, what's the story? >> We started off being the Hadoop company, and Rob, our CEO who was here on theCUBE a couple of hours ago, he calls it sort of the phase one of the company, where we were the Hadoop company. We very quickly realized we had to help enterprises manage the entire life cycle of data, all the way from the edge to the data center, to the cloud, and in between, right. Which is why we did the acquisition of Onyara, which we've been talking about, and which became the basis of our Hortonworks DataFlow product. And then as we went through the phases of that journey it was quickly obvious to us that enterprises had to manage data and applications in a hybrid manner, which is both on-prem and public cloud and, increasingly, edge, which is really where we spend a lot of time these days, with IoT and everything from autonomous cars to video monitoring, all these aspects coming in. Which is why we wanted to get to the DataPlane architecture; it allows you to get to a consistent security and governance model. There's a lot of, I'll call it a lot of fighting about the cloud being insecure and so on; I don't think there's anything inherently insecure about the cloud. The issue that we see is lack of skills. Our enterprises know how to manage the data on-prem, they know how to do LDAP, groups, and Kerberos, and AAD, and what have you; they just don't have the skill sets yet to be able to do it on the public cloud, which leads to mistakes occasionally. >> Um-hm. >> And data breaches and so on. So we recognized really early that part of DataPlane was to get that consistent security and governance model, so you don't have to worry about how you set up IAM roles on Amazon versus LDAP on-prem versus something else on Google. >> It's operating consistency. >> It's operating consistency, exactly. I've talked about this in the past. So getting to DataPlane was that journey, and what we announced this week was that we wanted to take that a step further; we've been able to allow enterprises to manage this hybrid architecture on-prem and across multiple public clouds. >> And the edge. >> In a connected manner. The issue we saw early on, and it's something we've been working on for a long while,
is that we've been able to connect the architectures. Hadoop, when it started, was more of an on-premise architecture, right, and I was there in 2005, 2006 when it started. When Hadoop started, it was built for the web infrastructure of the time: we had a gigabit of Ethernet up to each node in the rack, and from the rack up we had only eight gigs, so if you have a 2,000-node cluster you're dealing with eight gigs of connection. >> Bottleneck. >> Huge bottleneck. Fast forward to today, you have at least ten if not one hundred gigabits, moving from one hundred gig toward terabit architectures, and what that does is give you the opportunity to rethink the assumptions we had in Hadoop. And then the good news is that when the cloud came along, the cloud already had decoupled storage and compute architectures. As we've helped customers navigate the two worlds with DataPlane, it's been a journey that's been reasonably successful, and I think we have an opportunity to provide identical, consistent architectures both on-prem and in the cloud. So it's almost like we took Hadoop and adapted it to the cloud; I think we can adapt the cloud architecture back on-prem, too, to have consistent architectures. >> So talk about the cloud native architecture. So you have a post that just got published: cloud native architecture for big data and the data center. No, cloud native architecture for big data in the data center. That's hybrid; explain the hybrid model, how do you define that? >> Like I said, for us it's really important to be able to have consistent architectures, consistent security, consistent governance, a consistent way to manage data, and a consistent way to actually develop and port applications. So portability for data is important, which is why having security and governance consistently is key. And then portability for the applications themselves is important, which is why we are so excited to be kind of first to embrace the whole containerize-the-ecosystem initiative. We've announced the Open Hybrid Architecture Initiative, which is about decoupling storage and compute and then leveraging containers for all the big data apps, for the entire ecosystem. And this is where we are really excited to be working with both IBM and Red Hat, especially Red Hat given their investments in Kubernetes and OpenShift. We see that much like you have S3 and EC2, S3 for storage, EC2 for compute, and the same thing with ADLS and Azure compute, you'll actually have next-gen HDFS and Kubernetes. >> So is this a massive architectural rewrite, or is it more sort of management around the core? >> Great question. So part of it is evolution of the architecture. We have to take Spark or Kafka or any of these open source projects and do some evolution in the architecture to make them work in the ecosystem, in the containerized world. So we are containerizing every one of the 28, 30 animals in the zoo, right. That's a lot of work, but we know how to do it, we've done it in the past. And to your point, it's not enough to just have the architecture; you need to have a consistent fabric to be able to manage and operate it, which is really where DataPlane comes in again. That was really the point of DataPlane all along. This is a multi-year roadmap; you know, when we sit down we are thinking about what we'll do in '22 and '23, but we really have to execute on a multi-year roadmap.
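As a rough sketch of the storage and compute decoupling described above, the same analytics code can be pointed at on-prem HDFS or at a cloud object store by changing only the storage URI. This is a generic PySpark illustration, not Hortonworks product code; the paths and bucket name are placeholders, and it assumes a Spark installation with the appropriate object-store connector on the classpath.

```python
# Minimal PySpark sketch of storage/compute decoupling: the compute logic is
# identical whether the data lives in on-prem HDFS or in cloud object storage.
# Paths, bucket name, and credentials are placeholders (assumptions), and this
# is not Hortonworks product code.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("storage-compute-decoupling-sketch")
         .getOrCreate())

# On-prem: coupled storage and compute, data read from HDFS inside the cluster.
events_hdfs = spark.read.parquet("hdfs:///data/events")

# Cloud: same code, storage decoupled into an object store (S3 here; ADLS or
# GCS would only change the URI scheme). "my-bucket" is a made-up name.
events_s3 = spark.read.parquet("s3a://my-bucket/data/events")

# The analytic itself does not care where the bytes came from.
daily_counts = events_s3.groupBy("event_date").count()
daily_counts.show()
```

The point of the sketch is that the compute logic never changes; only the storage endpoint does, which is what makes a consistent on-prem and cloud architecture plausible.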
>> And DataPlane was a linchpin. >> Well, it was just like the sharp edge of the sword. Right, it was the tip of the spear, but really the idea was always that we have to get DataPlane in to kind of get that hybrid product out there, and then we can get to an intergenerational DataPlane which would work with the next generation of the big data ecosystem itself. >> Do you see Kubernetes and things like Kubernetes, you've got Istio and a few service meshes up the stack-- >> Absolutely. >> --are going to play a pretty instrumental role around orchestrating workloads and providing new stateless and stateful applications with data? So now you've got more data being generated there. So this is a new dynamic; it sounds like that's a fit for what you guys are doing. >> Which is something we've seen for a while now. Containers are something we've tracked for a long time, and we're really excited to see Docker and Red Hat, all the work that they are doing with Red Hat containers, the security and so on. It's the maturing of that ecosystem, and now the ability to build and port applications. And the really cool part for me is that we will definitely see Kubernetes and OpenShift on-prem, but even if you look at the cloud, the really nice part is that each of the cloud providers themselves provides a Kubernetes service. Whether it's GKE on Google or Fargate on Amazon or AKS on Microsoft, we will be able to take identical architectures and leverage them. When we containerize Hive or Spark we will be able to do this with Kubernetes and OpenShift, and OpenShift is available on-prem but also in the public cloud, alongside GKE and Fargate and AKS. >> What's interesting about the Red Hat relationship, and I think you guys are smart to do this, is that by partnering with Red Hat, customers can run their workloads, analytical workloads, in the same production environment that Red Hat is in, but with kind of a differentiation, if you will. >> Exactly, with DataPlane. >> DataPlane is just a wonderful thing there. So again, good move there. Now, around the ecosystem, who else are you partnering with? What else do you see out there? Who is in your world that is important? >> You know, again, our friends at IBM; we've had a long relationship with them. We are doing a lot of work with IBM to integrate DataPlane and also ICPD, which is IBM Cloud Private for Data, which brings along all of the IBM ecosystem, whether it's DBT or IGC, the Information Governance Catalog; all of that comes back into this world. What we also believe this will give a fillip to is the whole continued standardization of security and governance. So you guys remember the old ODPi, it caused a bit of a flutter a few years ago. (anxious laughing) >> We know how that turned out. >> What we did was we kind of said, the old ODPi was based on the old distributions; now it's ODPi's turn to be more about metadata and governance. So we are collaborating with IBM on ODPi, more on metadata and governance, because again we see that as being very critical in this sort of multi-cloud, on-prem, edge world. >> Well, the narrative was always why do you need it, but it's clear that these three companies have succeeded dramatically when you look at the financials; there have been statements made about IBM's contribution of seven-figure deals to you guys. We had Red Hat on, and you guys are birds of a feather. [Murthy] Exactly. >> It certainly worked for you three, which presumably means it confers value to your customers.
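A hypothetical sketch of the containerized, run-anywhere pattern discussed above: one Spark container image submitted to whichever Kubernetes control plane is in use, whether that is OpenShift on-prem or a managed service such as GKE or AKS. The cluster endpoint, namespace, and image name below are placeholders, and in practice such jobs are usually launched with spark-submit in cluster mode; the client-mode builder is shown only to make the configuration knobs visible.

```python
# Hypothetical sketch: the same containerized Spark job pointed at any
# conformant Kubernetes cluster. The API-server URL, namespace, image name,
# and data path are placeholders, not values from the interview.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("containerized-ecosystem-sketch")
         # Only this endpoint changes between clouds and the data center.
         .master("k8s://https://my-cluster.example.com:6443")
         .config("spark.kubernetes.namespace", "analytics")
         # One container image for the engine, built once, run anywhere.
         .config("spark.kubernetes.container.image", "example/spark:3.x")
         .config("spark.executor.instances", "4")
         .getOrCreate())

# Storage stays decoupled: executors pull data from an object store or HDFS.
df = spark.read.parquet("s3a://my-bucket/data/events")
print(df.count())
```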
>> Which is really important, right. From a customer standpoint, what we really focus on is that the benefit of the bargain is that now they understand that some of their key vendor partners, that's us and IBM and Red Hat, have a shared roadmap, so now they can be much more sure about the fact that they can go to containers and Kubernetes and so on, because all of the tools that they depend on and all the partners they depend on are working together. >> So they can place bets. >> So they can place bets, and the important thing is that they can place longer-term bets. Not a quarter bet; we hear about customers talking about building the next-gen data centers with Kubernetes in mind. >> They have to. >> They have to, right, and it's more than just building machines up, because what happens in this world, we talked about things like networking: the way you do networking in this world with Kubernetes is different than the way you did it before. So now they have to place longer-term bets, and they can do this now with the guarantee that the three of us will work together to deliver on the architecture. >> Well, Arun, great to have you on theCUBE, great to see you. Final question for you: you guys have a good long-term plan, which is very cool. Short term, customers are realizing the set-up phase is over, okay, now they're in usage mode. So the data has got to deliver value, so there is a real pressure for ROI. We would give people a little bit of a pass earlier on, because you had to set up everything, set up the data lakes, do all this stuff, get it all operationalized, but now, with AI and machine learning front and center, that's a signal that people want to start putting this to work. What have you seen customers gravitate to from the product side? Where are they going, is it the streaming, is it the Kafka, what products are they gravitating to? >> Yeah, definitely. I look at these, in my role, in terms of use cases, right. We are certainly seeing a continued push towards the real-time analytics space, which is why we placed a longer-term bet on HDF and Kafka and so on. What's been really heartening, kind of back to your sentiment, is we are seeing a lot of push right now on security and governance. That's why, for GDPR, we introduced a bunch of capabilities in DataPlane with DSS, and James Cornelius wrote about this earlier in the year; we are seeing customers really push us for key aspects like GDPR. This is a reflection, for me, of the maturing of the ecosystem. It means that it's no longer something on the side that you play with; the whole ecosystem is now more a system of record instead of a system of augmentation, so that is really heartening, but it also brings a sharper focus and more responsibility onto our shoulders. >> Awesome, well, congratulations, you guys have the stock price at a 52-week high. Congratulations. >> Those things take care of themselves. >> Good products, and stock prices take care of themselves. >> Okay, theCUBE coverage here in New York City. I'm John Furrier with Dave Vellante; stay with us for more live coverage, all things data happening here in New York City. We will be right back after this short break. (digital beat)
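In the spirit of the HDF and Kafka real-time use cases mentioned above, a minimal streaming round trip might look like the following. This assumes a reachable Kafka broker and uses the kafka-python client; the broker address, topic, and event fields are placeholders, and this is an illustrative sketch rather than HDF product code.

```python
# Hypothetical real-time sketch: produce events and consume them as they
# arrive. Broker address and topic are assumptions, not values from the text.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"   # placeholder broker address
TOPIC = "clickstream"       # placeholder topic

producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"))
producer.send(TOPIC, {"user": "u123", "action": "view", "sku": "A-42"})
producer.flush()

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,  # stop polling after 10s so the sketch ends
    value_deserializer=lambda b: json.loads(b.decode("utf-8")))

for message in consumer:
    event = message.value
    # A real pipeline would update features or dashboards here.
    print(event["user"], event["action"])
    break  # stop after one event so the sketch terminates
```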
Byron Banks, SAP Analytics | theCUBE NYC 2018
>> Live from New York, it's theCUBE covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. (techy music) >> Hey, welcome back, everyone. It's theCUBE live in New York City for CUBENYC, formerly Big Data NYC. Now it's turned from big data into a much broader conversation. CUBENYC is exploring all these around data, data intelligence, cloud computing, devops, application developers, data centers, the whole range, all things data. I'm John Furrier here with Peter Burris, cohost and analyst here on the session. Our next guest is Byron Banks, who's the vice president of product marketing at SAP Analytics. No stranger to enterprise analytics. Welcome to theCUBE, thanks for joining us. >> Thank you for having us. >> So, SAP is, you know, a brand that's been doing business analytics for a long, long time, certainly powering-- >> Mm-hm, sure. >> The software for larger enterprises. Supply chain, you name it-- >> Sure. >> ERP, everyone kind of knows the history of SAP, but you guys really have been involved in analytics. HANA's been tailor-made for some speed. We've been covering that, but now as the world turns into a cloud native-- >> Mm-hm. >> SAP has a global cloud platform that is multi-cloud driven you guys kind of see this picture of a horizontally scalable computing environment. Analytics is a big, big piece of that, so what's going on with machine learning and AI, and as analytical software and infrastructure need to be provisioned dynamically. >> Sure, sure. >> This is an opportunity for people who love to get into the data. >> Absolutely. >> This is a great opportunity. What's the uptake? >> Great opportunity for us. We firmly believe that the era of optimization and digitization is over. It's not enough, it's certainly important. It has given a lot of benefits, but just overwhelming every user, every customer with more data, more optimization, faster data, better data, it's not enough. So, we believe that the concept to switch to intelligence, so how do you make customers, how do you serve customers exactly what they need in the moment? How do you give them an offer that is relevant? Not spam them, give them a great offer. How do you motivate your employees to be the best at what they do, whether it's in HR or whether it's in sales, and we think technology's key to that, but at the end of the day, the customer, the organization is the driver. They are the driver, they know their business best, so what we want to do is be the pit crew, if you will, to use a racing analogy, if they're the driver of the race car we want to bring the technology to them with some best practices and advice, because again, we're SAP, we've been in the business for 45 years, so we have a very good perspective of what works based on the companies we see, and serve over 300,000 of them, but it's really enabling them to be their best, and the customers that are doing the best, we call those intelligent enterprises, and that means three components. It needs intelligent applications, what we call the intelligent suite. So, how do we make an HR application that is great at retaining the best employees and also attracting great ones? How do we enable a sales system to give the best offers and do the best forecasts? So, all of that is the intelligent applications. The middle layer for that is called intelligent technologies. So, how do we use these great technologies that we've been developing as an industry over the last three to five years? 
Things like big data, IoT, sensors, machine learning, and analytics. That intelligent technology layer, how do we make that available, and then finally, it's the digital core, the digital platform for that. So, how do we have this scalable platform, ideally in the cloud, that can pull data from both cloud sources, SAP sources, non-SAP sources, and give the right data to those applications-- >> Yeah. >> And technologies in realtime. >> I love the pit crew example of the race car on the track, because you want to get as much data in the system as possible because more data is, you know, more opportunities to understand and get insights, but at the end of the day, you want to make sure that the car not only runs well on the track, (chuckles) and is cost effective, but it's performing. It actually wins the race or stays in the race. So, customers want revenue, I mean, the big thing we're hearing is, "Okay, let's get some top line benefit, not just "good cost effectiveness." >> Right, right. >> So, the objective of the customer, and whatever, that can be applications, it could be, you know, insight into operational efficiency. The revenue piece of growth is a big part of the growth strategy-- >> Right. >> For companies to have a data-centric system. >> Absolutely. >> This is part of the intelligence. >> But it's not just presenting the data. We introduced a product a couple of years ago, and I promise this isn't going to be a marketing pitch, (chuckles) but I think it's very relevant to what you just said. So, the SAP Analytics Cloud, that's one of those technologies I talked about, intelligent technologies. So, it is modern, built from the ground for SAS applications, cloud-based, built on the SAP cloud platform, and it has three major components. It has planning, so what are my KPIs? If I'm in HR am I recruiting talent or am I retraining talent? What are my KPIs if I'm in sales? Am I trying to drive profitability or am I trying to track new customers? And if I'm in, you know, again, in marketing how effective are we on campaigns? Tied to that is all the data visualization we can do so that we can mix and match data to discover new insights about our business, make it very, very easy, again, to connect with both SAP and non-SAP sources, and then provide the machine learning capabilities. All of that predictive capability, so not just looking at what happened in the past, I'm also looking at what's likely to happen in the next week, and the key point to all of that is when you open the application and start, the first thing it asks you is, "What are you trying to do? "What is the business problem you're trying to solve?" It's a story, so it's designed from the get-go to be very business outcome focused, not just show you 50 different data sources or 100 different data sources and then leave it to you to figure out what you should be doing. >> Yeah. >> So, it is designed to be very much a business outcome driven environment, so that, again, people like me, a marketer, can logon to that product and immediately start to work in campaigns-- >> Yeah. >> And in the language that I want to work in, not in IT speak or geek speak. Nothing wrong with geek speak, but again-- >> Yeah, I want to get into a conversation, because one of the things, we're very data driven as a media company because we have data that's out there, consumption data, but some platforms don't have measurement capability, like LinkedIn doesn't finance any analytics. >> Sure. 
>> So, this data that's out there that I need, I want, that might be available down the road, but not today, so I want to get to that conversation around, okay, you can measure what you're looking at, so everything that's measurable you've got dashboards for, but-- >> Sure. >> There's some elusive gaps between what's available that could help the data model. These are future data sets, or things that aren't yet instrumented properly. >> Correct. >> As new technology comes in with cloud native the need for instrumentation's critical. How do you guys think about that from a product standpoint, because you know, customers aren't going to say, "Well, create a magic linkage between something "that doesn't exist yet," but soon data will be existing. You know, for instance, network effect or other things that might be important for people that aren't yet measurable but might be in the future. >> Sure. >> They want to be set up for that, they don't want to foreclose that. >> Sure, well and I think one of the balances we have as SAP, because we're a technology company and we built a lot of great tools, but we also work a lot with our customers around business processes, so as I said, when we introduce our products we don't want to give them just a black box, which is a bunch of feeds and speeds technologies-- >> Yeah. >> That they need to figure it out. As we see patterns in our customers, we build an end-to-end process that is analytics driven and we provide that back to our customers to give them a headstart, but we have to have all of the capabilities in our solutions that allow them to build and extend in any way possible, because again, at the end of the day, they have a very unique business, but we want to give them a jumping off point so that they're not just staring at a blank screen. It's kind of like writing a speech. You don't want to start with just a blank screen. If you're in sales and marketing and you want to do a sales forecast, we will provide out-of-the-box, what we call embedded analytics, a fully complete dashboard that will take them through a guided workflow that says, "Hey, you want to do a sales forecast. "Here's the data we think you want to pull, "do you want to pull that? "Here's some additional inference we've seen "from some of our machine learning algorithms "based on what has happened in the last six weeks "of selling and make a projection as to what "we expect will happen between now and next quarter." >> You get people started quickly, that's the whole goal. Get people started quickly. >> Exactly, but we don't lock them into only doing it the one way, the right way. We're not preaching >> Yeah. >> We want to give them the flexibility. >> But this is an important point, because every, almost every decision at some point in time comes back to finance. >> Sure. >> And so, being able to extend your ability to learn something about data and act on data as measurements improve, you still want to be able to bring it back to what it means from a return standpoint, and that requires some agreement, not just some, a lot of agreement-- >> Sure. >> With a core financial system, and I think that this could be one of the big opportunities that you guys have, is because knowing a lot about how the data works, where it is, sustaining that so that the transactional integrity remains the same but you can review it through a lot of different analytics systems-- >> Right. >> Is a crucial element of this, would you agree? 
>> I fully agree, and I think if you look at the analytics cloud that I talked about, the very first solution capability we built into it was planning. What are my KPIs that I'm trying to measure? Now, yes, of course if you're in a business it all turns into dollars or euros at the end of the end of the day, but customer satisfaction, employee engagement, all of those things are incredibly important, so I do believe there is a way to put measurements, not always at a dollar value, that are important for what you're trying to do, because it will ultimately translate into dollars down the road. >> Right, and I want to get the news. You guys have some hard news here in New York this week on your analytics and the stuff you're working on. What's the hard news? >> Absolutely. Absolutely, so today we announced a bunch of updates to our analytics cloud platform. We've had it around for three or four years, thousands of customers, a lot of great innovation, and what we were doing today, what we announced today, is the update since our SAPPHIRE, our big, annual conference in June this year, so we have built a number of machine learning capabilities that, again, speak in the language of the business user, give them the tools that allow them to quickly benefit from things like correlations, things like regressions, patterns we've seen in the data to guide them through a process where they can do forecasting, retainment, recruiting, maybe even looking for bias, and unintended bias, in things like campaigns or marketing campaigns. Give them a guided approach to that, speaking in their terms, using very natural language processing, so for example, we have things like Smart Insights where you can ask questions about, "Give me the sales forecast for Japan," and you can say it, just type it that way and the analytic platform will start to construct and guide you through it, and it will build all the queries, it will give you, again, you're still in control, but it's a very guided process-- >> Yep. >> That says, "Do you want to run a forecast? "Here's how we recommend a forecast. "Here are some variables we find very, very interesting." That says, "Oh, in Japan this product sold "really well two quarters ago, "but it's not selling well this quarter." Maybe there's been a competitive action, maybe we need to look at pricing, maybe we need to retrain the sales organization. So, it's giving them information, again, in a very guided business focus, and I think that's the key thing. Like data scientists, we love them. We want to use them in a lot of places, but can't have data scientists involved in every single analytic that you're trying to do. >> Yeah. >> There are just not enough in the world. >> I mean, I love the conversation, because this exact conversation goes down the road of devops-like conversation. >> Right. >> Automation, agility, these are themes that we're talking about in cloud platforms, (chuckles) say data analytics. >> Absolutely. >> So, now you're bringing data down. Hey, we're automating things, so it could look like a Siri or voice activated construct for interaction. >> Yeah, absolutely, and in their language, again, in the language that the end user wants to speak, and it doesn't take the human out of it. It's actually making them better, right? We want to automate things and give recommendations so that you can automate things. >> Yeah. >> A great example is like invoice matching. 
We have customers that use, you know, spent hundreds of people, thousands of hours doing invoice matching because the address wouldn't line up or the purchase order had a transposed number in it, but using machine learning-- >> Yeah, yeah. >> Or using algorithms, we can automate all of that or go, "Hey, here's a pattern we see." >> Yeah. >> "Do you want us to automate "this matching process for you?" And customers that have-- >> Yeah. >> Implemented, they've found 70% of the transactions could be automated. >> I think you're right on, I personally believe that humans are more valuable, certainly in the media business that people think is, you know, sliding down, but humans, huge role. Now, data and automation can surface and create value that humans can curate on top of, so same with data. The human role is pretty critical in this because the synthesis is being helped by the computers, but the job's not going away, it's just shortcutting to the truth. >> And I think if you do it right machine learning can actually train the users on the job. >> Yeah. >> I think about myself and I think about unintended bias, right, and you look at a resume that you put out or a job posting, if you use the term I want somebody to lead a team, you will get a demographic profile of the people that apply to that job. If you use the term build a team, you'll get a different demographic profile, so I'm not saying one's better or the other, but me as a hiring manager, I'm not aware of that. I'm not totally on top of that, but if the tool is providing me information saying, "Hey, we've seen these keywords "in your marketing campaign," or in your recruiting, or even in your customer support and the way you speak with your customers, and it's starting to see patterns, just saying, "Hey, by the way, "we know that if you use these kinds of terms "it's more likely to get this kind of a response." That helps me become a better marketer. >> Yeah. >> Or be more appropriate in the way I engage with my customers. >> So, it assists you, it's your pit crew example, it's efficiency, all kind of betterment. >> Absolutely. >> Byron, thanks for coming on theCUBE, appreciate the time, coming to share and the insights on SAP's news and your vision on analytics. Thanks for coming on, appreciate it. It's theCUBE live in New York City for CUBENYC. I'm John Furrier with Peter Burris. Stay with us, day one continues. We're here for two days, all things data here in New York City. Stay with us, we'll be right back. (techy music)
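The invoice-matching example above can be sketched with generic fuzzy matching: score candidate purchase orders against an invoice despite typos and transposed digits, auto-match above a confidence threshold, and route the rest to a person. This is an illustrative sketch, not SAP's actual matching logic; the records, field weights, and threshold are made up.

```python
# Hypothetical invoice-matching sketch: tolerate small typos and transposed
# digits, then auto-match only when the blended score clears a threshold.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1], tolerant of small typos."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(invoice: dict, po: dict) -> float:
    """Blend vendor, address, and PO-number similarity; weights are assumed."""
    return (0.4 * similarity(invoice["vendor"], po["vendor"])
            + 0.3 * similarity(invoice["address"], po["address"])
            + 0.3 * similarity(invoice["po_number"], po["po_number"]))

invoice = {"vendor": "Acme Corp.", "address": "100 Main Stret",   # typo
           "po_number": "PO-10234"}
purchase_orders = [
    {"vendor": "Acme Corp",  "address": "100 Main Street",
     "po_number": "PO-10243"},                                     # transposed digits
    {"vendor": "Blue Ridge", "address": "7 Elm Ave",
     "po_number": "PO-55512"},
]

best = max(purchase_orders, key=lambda po: match_score(invoice, po))
score = match_score(invoice, best)
if score > 0.85:
    print("auto-matched:", best["po_number"], round(score, 2))
else:
    print("route to a human reviewer:", round(score, 2))
```

Whatever the real implementation looks like, the design point is the same one made in the conversation: automate the high-confidence matches and keep people on the exceptions.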
Kickoff | theCUBE NYC 2018
>> Live from New York, it's theCUBE covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. (techy music) >> Hello, everyone, welcome to this CUBE special presentation here in New York City for CUBENYC. I'm John Furrier with Dave Vellante. This is our ninth year covering the big data industry, starting with Hadoop World and evolved over the years. This is our ninth year, Dave. We've been covering Hadoop World, Hadoop Summit, Strata Conference, Strata Hadoop. Now it's called Strata Data, I don't know what Strata O'Reilly's going to call it next. As you all know, theCUBE has been present for the creation at the Hadoop big data ecosystem. We're here for our ninth year, certainly a lot's changed. AI's the center of the conversation, and certainly we've seen some horses come in, some haven't come in, and trends have emerged, some gone away, your thoughts. Nine years covering big data. >> Well, John, I remember fondly, vividly, the call that I got. I was in Dallas at a storage networking world show and you called and said, "Hey, we're doing "Hadoop World, get over there," and of course, Hadoop, big data, was the new, hot thing. I told everybody, "I'm leaving." Most of the people said, "What's Hadoop?" Right, so we came, we started covering, it was people like Jeff Hammerbacher, Amr Awadallah, Doug Cutting, who invented Hadoop, Mike Olson, you know, head of Cloudera at the time, and people like Abi Mehda, who at the time was at B of A, and some of the things we learned then that were profound-- >> Yeah. >> As much as Hadoop is sort of on the back burner now and people really aren't talking about it, some of the things that are profound about Hadoop, really, were the idea, the notion of bringing five megabytes of code to a petabyte of data, for example, or the notion of no schema on write. You know, put it into the database and then figure it out. >> Unstructured data. >> Right. >> Object storage. >> And so, that created a state of innovation, of funding. We were talking last night about, you know, many, many years ago at this event this time of the year, concurrent with Strata you would have VCs all over the place. There really aren't a lot of VCs here this year, not a lot of VC parties-- >> Mm-hm. >> As there used to be, so that somewhat waned, but some of the things that we talked about back then, we said that big money and big data is going to be made by the practitioners, not by the vendors, and that's proved true. I mean... >> Yeah. >> The big three Hadoop distro vendors, Cloudera, Hortonworks, and MapR, you know, Cloudera's $2.5 billion valuation, you know, not bad, but it's not a $30, $40 billion value company. The other thing we said is there will be no Red Hat of big data. You said, "Well, the only Red Hat of big data might be "Red Hat," and so, (chuckles) that's basically proved true. >> Yeah. >> And so, I think if we look back we always talked about Hadoop and big data being a reduction, the ROI was a reduction on investment. >> Yeah. >> It was a way to have a cheaper data warehouse, and that's essentially-- Well, what did we get right and wrong? I mean, let's look at some of the trends. I mean, first of all, I think we got pretty much everything right, as you know. We tend to make the calls pretty accurately with theCUBE. Got a lot of data, we look, we have the analytics in our own system, plus we have the research team digging in, so you know, we pretty much get, do a good job. 
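The "no schema on write" idea mentioned above can be illustrated with a small, generic sketch: raw, semi-structured records are landed first with no table definition, and the schema is inferred only when the data is read. The file path and fields below are made up, and any JSON-capable engine behaves similarly; PySpark is used here only as one common example.

```python
# Hypothetical schema-on-read sketch: write raw events with no DDL, then let
# the engine figure out the schema at read time. Paths and fields are made up.
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# Write: no table definition, no schema -- just append raw events as they arrive.
raw_events = [
    {"user": "u1", "action": "click", "ts": "2018-09-12T10:00:00"},
    {"user": "u2", "action": "view", "ts": "2018-09-12T10:00:03", "sku": "A-42"},
]
with open("/tmp/events.json", "w") as f:
    for event in raw_events:
        f.write(json.dumps(event) + "\n")

# Read: the schema (including the optional "sku" column) is inferred only now.
df = spark.read.json("/tmp/events.json")
df.printSchema()
df.groupBy("action").count().show()
```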
I think one thing that we predicted was that Hadoop certainly would change the game, and that did. We also predicted that there wouldn't be a Red Hat for Hadoop; that was a prediction. The other prediction was that we said Hadoop won't kill data warehouses, and it didn't, and then data lakes came along. You know my position on data lakes. >> Yeah. >> I've always hated the term. I always liked data ocean because I think it was much more about fluidity of the data, so I think we got that one right, and data lakes still don't look like they're going to pan out well. I mean, most people that deploy data lakes, it's really either not a core thing or it's part of something else, and it's turning into a data swamp, so I think the data lake piece is not panning out the way people thought it would. I think one thing we did get right, also, is that data would be the center of the value proposition, and it continues and remains to be, and I think we're seeing that now, and we said data's the development kit back in 2010 when we said data's going to be part of programming. >> Some of the other things, our early data, and we went out and we talked to a lot of practitioners, who were hard to find in the early days. They were just a select few, I mean, other than inside of Google and Yahoo! But what they told us is that things like SQL and the enterprise data warehouse were key components of their big data strategy, so to your point, you know, it wasn't going to kill the EDW, but it was going to surround it. The other thing we called was cloud. Four years ago our data showed clearly that much of this work, the modeling, the big data wrangling, et cetera, was being done in the cloud, and Cloudera, Hortonworks, and MapR, none of them at the time really had a cloud strategy. Today that's all they're talking about: cloud and hybrid cloud. >> Well, it's interesting, I think it was like four years ago, I think, Dave, when we actually were riffing on the notion of, you know, Cloudera's name. It's called Cloudera, you know. If you spell it out, in Cloudera we're in a cloud era, and I think we were very aggressive at that point. I think Amr Awadallah even made a comment on Twitter. He was like, "I don't understand where you guys are coming from." We were actually saying at the time that Cloudera should actually leverage more cloud at that time, and they didn't. They stayed on their IPO track, and they had to, because they had everything bet on Impala and this data model that they had as the business model, and then they went public, but I think clearly cloud is now part of Cloudera's story, and I think that's a good call, and it's not too late for them. It never was too late, but you know, Cloudera has executed. I mean, if you look at what's happened with Cloudera, they were the only game in town. When we started theCUBE we were in their office, as most people know in this industry; we were there with Cloudera when they had like 17 employees. I thought Cloudera was going to run the table, but then what happened was Hortonworks came out of Yahoo! That, I think, changed the game, and I think that competitive battle between Hortonworks and Cloudera, in my opinion, changed the industry, because if Hortonworks did not come out of Yahoo! Cloudera would've had an uncontested run. I think the landscape of the ecosystem would look completely different had Hortonworks not competed, because you think about it, Dave, they had that competitive battle for years.
The Hortonworks-Cloudera battle, and I think it changed the industry. I think it could've been a different outcome. If Hortonworks wasn't there, I think Cloudera probably would've taken Hadoop and made it so much more, and I think they would've gotten more done. >> Yeah, and I think the other point we have to make here is complexity really hurt the Hadoop ecosystem, and it was just bespoke, new projects coming out all the time, and you had Cloudera, Hortonworks, and maybe to a lesser extent MapR, doing a lot of the heavy lifting, particularly, you know, Hortonworks and Cloudera. They had to invest a lot of their R&D in making these systems work and integrating them, and you know, complexity just really broke the back of the Hadoop ecosystem, and so then Spark came in, and everybody said, "Oh, Spark's going to basically replace Hadoop." You know, yes and no; the people who got Hadoop right, you know, embraced it and they still use it. Spark definitely simplified things, but now the conversation has turned to AI, John. So, I got to ask you, I'm going to use your line on you in kind of the ask-me-anything segment here. AI, is it same wine, new bottle, or is it really substantively different in your opinion? >> I think it's substantively different. I don't think it's the same wine in a new bottle. I'll tell you... Well, it's kind of, it's like the bad wine... (laughs) Is going to be kind of blended in with the good wine, which is now AI. If you look at this industry, the big data industry, if you look at what O'Reilly did with this conference, I think O'Reilly really has not done a good job with the big data conference. I think they blew it, I think that they made it a, you know, monetization, closed system when the big data business could've been all about AI in a much deeper way. I think AI is subordinate to cloud, and you mentioned cloud earlier. If you look at all the action within the AI segment, Diane Greene talking about it at Google Next, Amazon, AI is a software layer substrate that will be underpinned by the cloud. Cloud will drive more action, you need more compute, that drives more data, more data drives the machine learning, machine learning drives the AI, so I think AI is always going to be dependent upon cloud, or some sort of high-compute resource base, and all the cloud analytics are feeding into these AI models, so I think cloud takes over AI, no doubt, and I think this whole ecosystem of big data gets subsumed under either an AWS, VMworld, Google, or Microsoft cloud show, and then also I think specialization around data science is going to go off on its own. So, I think you're going to see the breakup of the big data industry as we know it today. Strata Hadoop, Strata Data Conference, that thing's going to crumble into multiple, fractured ecosystems. >> It's already starting to be forked. I think the other thing I want to say about Hadoop is that it actually brought such great awareness to the notion of data, putting data at the core of your company, data and data value, the ability to understand how data at least contributes to the monetization of your company. AI would not be possible without the data. Right, and we've talked about this before. You call it the innovation sandwich. The innovation sandwich, for the last three decades, has been Moore's law. The innovation sandwich going forward is data, machine intelligence applied to that data, and cloud for scale, and that's the sandwich of innovation over the next 10 to 20 years.
>> Yeah, and I think data is everywhere, so this idea of being a categorical industry segment is a little bit off, I mean, although I know data warehouse is kind of its own category and you're seeing that, but I don't think it's like a Magic Quadrant anymore. Every quadrant has data. >> Mm-hm. >> So, I think data's fundamental, and I think that's why it's going to become a layer within a control plane of either cloud or some other system, I think. I think that's pretty clear, there's no, like, one. You can't buy big data, you can't buy AI. I think you can have AI, you know, things like TensorFlow, but it's going to be a completely... Every layer of the stack is going to be impacted by AI and data. >> And I think the big players are going to infuse their applications and their databases with machine intelligence. You're going to see this, you're certainly, you know, seeing it with IBM, the sort of Watson heavy lift. Clearly Google, Amazon, you know, Facebook, Alibaba, and Microsoft, they're infusing AI throughout their entire set of cloud services and applications and infrastructure, and I think that's good news for the practitioners. People aren't... Most companies aren't going to build their own AI, they're going to buy AI, and that's how they close the gap between the sort of data haves and the data have-nots, and again, I want to emphasize that the fundamental difference, to me anyway, is having data at the core. If you look at the top five companies in terms of market value, US companies, Facebook maybe not so much anymore because of the fake news, though Facebook will be back with its two billion users, but Apple, Google, Facebook, Amazon, who am I... And Microsoft, those five have put data at the core and they're the most valuable companies in the stock market from a market cap standpoint, why? Because it's a recognition that that intangible value of the data is actually quite valuable, and even though banks and financial institutions are data companies, their data lives in silos. So, these five have put data at the center, surrounded it with human expertise, as opposed to having humans at the center and having data all over the place. So, how do they, how do these companies close the gap? How do the companies in the flyover states close the gap? The way they close the gap, in my view, is they buy technologies that have AI infused in them, and I think the last thing I'll say is I see cloud as the substrate, and AI, and blockchain and other services, as the automation layer on top of it. I think that's going to be the big tailwind for innovation over the next decade.
You've got to store the data, so you can't not talk about workloads and how the data moves with workloads, so you're starting to see data and workloads kind of be tossed in the same conversation, that's a cloud conversation. That is all about multi-cloud. That's why you're seeing Kubernetes, a term I never thought I would be saying at a big data show, but Kubernetes is going to be key for moving workloads around, of which there's data involved. (chuckles) Instrumenting the workloads, data inside the workloads, data driving data. This is where AI and machine learning's going to play, so again, cloud subsumes AI, that's the story, and I think that's going to be the big trend. >> Well, and I think you're right, now. I mean, that's why you're hearing the messaging of hybrid cloud and from the big distro vendors, and the other thing is you're hearing from a lot of the no-SQL database guys, they're bringing ACID compliance, they're bringing enterprise-grade capability, so you're seeing the world is hybrid. You're seeing those two worlds come together, so... >> Their worlds, it's getting leveled in the playing field out there. It's all about enterprise, B2B, AI, cloud, and data. That's theCUBE bringing you the data here. New York City, CUBENYC, that's the hashtag. Stay with us for more coverage live in New York after this short break. (techy music)