
Search Results for Jim Kobielus:

Jim Kobielus | Action Item Quick Take - March 30, 2018


 

>> Hi, I'm Peter Burris, and welcome to a Wikibon Action Item Quick Take. Jim Kobielus, lots going on in the world of AI and storage. If we think about what happened in storage over the years, it used to be, for disk-based systems, get data into a persistent state, and for some of the flash-based systems, it's get data out faster. What happened this week between Pure and NVIDIA to make it easier to get data out faster, especially for AI applications? >> Yeah, Peter, this week at NVIDIA's annual conference, the GPU Technology Conference, they announced a partnership with Pure Storage. In fact, they released a jointly developed product called AIRI - A-I-R-I, standing for AI-Ready Infrastructure. What's significant about AIRI is that it is... well, I'll tell you, years ago - I'm showing my age - there was this concept of the data warehousing appliance: a pre-bundled, pre-integrated assembly of storage and compute and software for specific workloads. Though I wouldn't use the term appliance here, it's a similar concept. In the AI space, there's a need for pre-integrated storage and compute devices - racks - for training workloads and other core, very compute- and data-intensive workloads for AI, and that's what the Pure Storage-NVIDIA AIRI is all about. It includes Pure Storage's FlashBlade storage technology, plus four NVIDIA DGX-1 supercomputers running the latest GPUs, the Tesla V100, as well as a fast NVIDIA interconnect. It also bundles software: NVIDIA's AI frameworks for modeling, plus a management tool from Pure Storage. This is a harbinger of what we at Wikibon expect will be a broader range, from these vendors and others, of pre-built, optimized AI storage products for premises-based deployment, really for complex AI pipelines involving data scientists, data engineers, and others. We're very excited about this particular product, we think it has great potential, and we believe there's a lot of pent-up demand for these kinds of pre-built hardware products. And that, in many ways, was by far the most significant story in the AI space this week. >> All right, thanks very much for that, Jim. So, more to come: moving more compute closer to the data, part of a bigger trend. This has been a Wikibon Action Item Quick Take. >> (smooth techno music)

Published Date : Mar 30 2018

SUMMARY :

At NVIDIA's GPU Technology Conference, Pure Storage and NVIDIA announced AIRI, a jointly developed, pre-integrated rack combining Pure Storage FlashBlade with four NVIDIA DGX-1 supercomputers for AI training workloads; Wikibon expects a broader range of pre-built, optimized AI infrastructure products to follow.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
NVIDIA | ORGANIZATION | 0.99+
Peter Burris | PERSON | 0.99+
March 30, 2018 | DATE | 0.99+
Jim | PERSON | 0.99+
Peter | PERSON | 0.99+
Pure | ORGANIZATION | 0.98+
this week | DATE | 0.97+
Wikibon | ORGANIZATION | 0.97+
Pure Storage | ORGANIZATION | 0.97+
DCX | COMMERCIAL_ITEM | 0.95+
years | DATE | 0.93+
AIRI | ORGANIZATION | 0.84+
V100 | COMMERCIAL_ITEM | 0.8+
Pure Storage | COMMERCIAL_ITEM | 0.6+
Tesla | ORGANIZATION | 0.57+
AIRI | TITLE | 0.56+

Action Item Quick Take | Jim Kobielus - Mar 2018


 

(upbeat music) (coughs) >> Hi, I'm Peter Burris with another Wikibon Action Item Quick Take. Jim Kobielus, IBM's up to some good with new tooling for managing data. What's going on? >> Yes, Peter, it's not brand-new tooling, but it's important because it's a foreshadowing of what's going to be a universal capability, I think, for programming the UniGrid, as we've been discussing. Essentially, this week at the IBM Signature event, Sam Lightstone of IBM discussed with Dave Vellante a product they have called Queryplex. Essentially, it's a data virtualization environment for distributed query processing in a mesh fabric. And what's important to understand about Queryplex, in a UniGrid context, is that it dynamically binds distributed computation to find the lowest-latency path across fairly complex edge clouds, to speed up queries no matter where the data may reside, in a fairly real-time, dynamic fashion. So I think the important things to know about Queryplex are, A, that it prioritizes connections with the lowest latency, based on ongoing computations that are performed, and B, that it is able to distribute this computation across the network to prevent the query... the computation controller from becoming a bottleneck. I think that's a fundamental architectural capability we're going to see more of with the growth of the UniGrid as a broad concept for building out a distributed cloud computing environment. >> And very importantly, there are still a lot of applications that run the business on top of IBM machines. Jim Kobielus, thanks very much, talking about IBM Queryplex and some of the next steps coming. This is Peter Burris with another Wikibon Action Item Quick Take. (upbeat music)
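To make the routing idea concrete, here is a minimal sketch in Python of the kind of latency-aware path selection a Queryplex-style mesh performs. The node names, latency figures, and function are hypothetical illustrations for this Quick Take, not IBM's actual API.

```python
import heapq

# Hypothetical mesh: link latencies in milliseconds between nodes.
# In a Queryplex-style fabric these would be measured and refreshed
# continuously, so the chosen route can change as conditions change.
LINKS = {
    "controller": {"edge-a": 5, "edge-b": 12},
    "edge-a": {"controller": 5, "edge-b": 3, "edge-c": 7},
    "edge-b": {"controller": 12, "edge-a": 3, "edge-c": 4},
    "edge-c": {"edge-a": 7, "edge-b": 4},
}

def lowest_latency_path(source, target):
    """Dijkstra's algorithm: find the minimum-latency route through the mesh."""
    queue = [(0, source, [source])]
    seen = set()
    while queue:
        latency, node, path = heapq.heappop(queue)
        if node == target:
            return latency, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, link_ms in LINKS.get(node, {}).items():
            if neighbor not in seen:
                heapq.heappush(queue, (latency + link_ms, neighbor, path + [neighbor]))
    return None

# Route a query from the computation controller to the node holding the data.
latency, path = lowest_latency_path("controller", "edge-c")
print(f"route {' -> '.join(path)} at {latency} ms total")
```

The point of the sketch is the behavior described above: the controller does not hard-code a route; it recomputes the cheapest one from current latency measurements, which also keeps any single node from becoming a bottleneck.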

Published Date : Mar 2 2018

SUMMARY :

At the IBM Signature event, IBM discussed Queryplex, a data virtualization environment for distributed query processing that dynamically finds the lowest-latency path through a mesh fabric, so queries speed up no matter where the data resides; Wikibon sees it as a foreshadowing of broader distributed cloud computing capabilities.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
Jim Kabielus | PERSON | 0.99+
Sam Whitestone | PERSON | 0.99+
Dave Valente | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Mar 2018 | DATE | 0.99+
Peter | PERSON | 0.99+
this week | DATE | 0.96+
IBM Signature | EVENT | 0.93+
Wikibooks | ORGANIZATION | 0.69+
Queryplex | TITLE | 0.59+
Wikibooks | TITLE | 0.55+

Action Item Quick Take | Jim Kobielus - Feb 2018


 

(upbeat music) >> Hi, this is Peter Burris with another Wikibon Action Item Quick Take. Jim Kobielus, where are we with the next step in AI and deep learning? >> Yeah, there's a big to-do going on, a big buzz around reinforcement learning. I think it will become an ever higher priority for working data scientists building applications for robotics and adaptive analytics. And what I'm encouraged by is the fact that there are open frameworks coming into being - development frameworks for reinforcement learning - that are integrated to some degree with the investments companies are making in deep learning. In particular, a fair number of frameworks now support TensorFlow or integration with TensorFlow. So I advise our listeners, especially the developers, to look at and evaluate frameworks like TensorFlow Agents, Ray RLlib, Roboschool, Unity Machine Learning Agents, and Coach. These are not well known yet, but I think at least one of these will become a standard component of the data scientist's workbench for building the next generation of robotics and other applications that are used for adaptive control. >> Excellent, Jim. This is Peter Burris with another Wikibon Action Item Quick Take. (upbeat music)
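For readers new to the technique these frameworks implement, here is a minimal tabular Q-learning sketch in pure Python (no TensorFlow dependency). The toy corridor environment and every name in it are invented for illustration; none of this code is drawn from TensorFlow Agents, RLlib, or the other frameworks mentioned.

```python
import random

# Toy environment: a corridor of 5 cells. The agent starts at cell 0 and
# earns a reward of +1 only when it reaches the goal at cell 4.
N_STATES, ACTIONS = 5, ["left", "right"]
ALPHA, GAMMA, EPSILON, EPISODES = 0.5, 0.9, 0.1, 500

q_table = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

for _ in range(EPISODES):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_table[(state, a)])
        nxt, reward, done = step(state, action)
        best_next = max(q_table[(nxt, a)] for a in ACTIONS)
        # The Q-learning update rule.
        q_table[(state, action)] += ALPHA * (
            reward + GAMMA * best_next - q_table[(state, action)]
        )
        state = nxt

# The learned policy should be "right" in every non-terminal state.
print([max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES)])
```

The frameworks named above wrap the same loop - an agent, an environment, a reward signal, a value update - in scalable, deep-network form.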

Published Date : Feb 24 2018

SUMMARY :

Open development frameworks for reinforcement learning, many with TensorFlow integration, are emerging; Wikibon advises developers to evaluate them, expecting at least one to become a standard component of the data scientist's workbench for robotics and adaptive control applications.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
Feb 2018 | DATE | 0.99+
Jim | PERSON | 0.99+
LLB Global School | ORGANIZATION | 0.99+
TensorFlow | TITLE | 0.98+
one | QUANTITY | 0.83+
Wikibon | ORGANIZATION | 0.8+
Machine | ORGANIZATION | 0.63+

Breaking Analysis: VMworld 2019 Containers in Context


 

>> From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Dave Vellante. >> Hi everybody, welcome to this breaking analysis, where we try to provide you some insights, on theCUBE. My name is Dave Vellante. I'm here with Jim Kobielus, and Jim, we just came off of VMworld 2019. Big show, lots of energy, lots of announcements. I specifically want to focus on containers and the impact that containers are having on VMware, and specifically the broader ecosystem and the industry at large. So, first of all, what was your take on VMworld 2019? >> Well, my take was that VMware is growing fast, and they're investing in the future, which is fairly clearly cloud-native computing on containers, with Kubernetes and all that. But really, that's the future, and so what VMware is doing is making significant bets that containers will rule the roost in cloud computing and application infrastructures going forward. But in fact, virtual machines - VMs, hypervisors - are hotter than ever, and that was well established last week by the fact that the predominant announcement was VMware Tanzu, which is not yet a production solution but is in a limited preview. It's the new platform for coexistence of containers and vSphere: a container runtime embedded in vSphere, so that customers can run containerized workloads in a highly isolated VM environment. In other words, VMware is saying to their customers, "You don't have to migrate away from VMs until you're good and ready. You can continue to run whatever containers you build on vSphere, but we more than encourage you to continue to run VMs until you're good and ready to migrate, if ever." >> All right. So, I want to come back and unpack that a little bit, but does your data, does your analysis, when you're talking to customers and the industry at large - is there any evidence from what you see that containers are hurting VMware's business? >> I don't get any sense that containers are hurting VMware's business. I get the strong sense that containers - and of course they've just acquired Pivotal, which is very additive to the revenue mix at VMware - and most of VMware's announcements last week were in fact all around Kubernetes and containers, and products that are very much for those customers who are going deep down the container road. >> So that was a setup question. >> You've got lots of products for them. >> So that was a setup question. So I have some data on this. >> Go ahead. >> Right answer. So, I want to show you this. So, Alex, if you wouldn't mind bringing up that slide. We shared this with you last week when we were prepping for VMworld. This is data from Enterprise Technology Research, ETR, and they have a panel of 4500 end user customers that they go out and do spending surveys with. So what this shows is container customers' spending on VMware. You can see it goes back to early January. Now, it's a little deceiving here. You see that big spike, but what it shows is that, A, that big spike is the number of shared customers. So you really didn't have many customers back then that were doing both containers and VMware that ETR found. But as the N gets bigger - 186, 248, 257, 361, across those 461 customers - those are the shared customers in the green. And you can see that it's kind of a flat line. It's holding very well in the high-30s percent range, which is their sort of proprietary spending metric.
So, there's absolutely no evidence, Jim, that containers, thus far anyway, are hurting VMware's business - which of course was the narrative: containers are going to kill VMware. No evidence of that. But then why would they acquire Pivotal? Are they concerned about the future? What's your-- >> Well, they're concerned about cross-selling their existing customer base, who are primarily on vSphere hypervisors - cross-selling them on the new world of Kubernetes-based products for cloud computing, and so forth and so on. In other words, it's all about how they grow their revenue base. VMware's been around for more than 20 years now. They rule the roost on hypervisors. Where do they go from here, in terms of their product mix? Well, Kubernetes, and beyond that things like serverless, will clearly be in the range of things they could add on - that their customers could add on to their existing deployments. I mean, look at Pivotal. Pivotal has a really strong Kubernetes distribution, which of course VMware co-developed with them. Pivotal also has a strong functions-as-a-service backplane, the Pivotal Function Service, for serverless environments. So this acquisition of Pivotal very much positions VMware to capitalize on those opportunities, to sell those products when that market actually develops. But I see some evidence that virtual machines are going like gangbusters in terms of customer deployments. Last week on theCUBE at VMworld, Mark Lohmeyer, an SVP at VMware for one of their cloud business units, said that in the last year, for example, among customers using VMware Cloud on AWS, VMware grew the customer base by 400%, and grew the number of VMs running in VMware Cloud on AWS by 900%, which would imply that on average each customer more than doubled the number of VMs they're running on that particular cloud service. That means VMs are very much relevant now, and probably will be going forward. And why is that? That's a good question; we can debate that. >> Well, so the naysayers at VMworld in the audience were tweeting that, "Oh, I thought we started Pivotal. We launched Pivotal so that we didn't have to run containers on VMs, so we could run them on bare metal." Are people running containers on virtual machines? >> Well, they are, yes. In fact, there's a broad range of industry initiatives, not just Tanzu at VMware, to do just that - to run containers on VMs. I mean, there is the KubeVirt open source project over at CNCF, which has been going for a couple of years now. But also, Google has gVisor, Intel has the Kata Containers initiative, and I believe there are a few others. Oh yes, AWS with Firecracker, from last year's re:Invent. All this would strongly indicate that these large cloud and tech vendors wouldn't be investing heavily in the convergence of containers and VMs and hypervisors if there weren't strong demand from customers for hybrid environments where they're going to run both stacks, as it were, in parallel. Why? Well, one of the strong advantages of VMs is workload isolation at the hardware level, which is something that container runtimes typically don't offer. For example, workload isolation seems to be one of the strong features that VMware's touting for Tanzu going forward. >> So the centerpiece of VMware's strategy is obviously multicloud, with Kubernetes as a linchpin to enable running applications on different platforms. Will, in your opinion - and of course VMware is hardcore enterprise, right?
Will VMware, two things: will they be able to attract the developers, number one? And number two, will those developers build on top of VMware's platform, or are they going to look to their cloud? >> That's a very important question. Last week at VMworld, I didn't get a sense that VMware has a strong developer story, and I think that's a really open issue going forward for them. Why would a developer turn to VMware as their core solution provider when they don't offer a strong workbench for building these hybridized VM/container/serverless applications that seem to be springing up all over? AWS and Microsoft and Google are much stronger in that area with their respective portfolios. >> So I guess the obvious answer there is Pivotal - is Pivotal their answer to the developer quandary? >> Yes. >> And so, let's talk about that. So, Pivotal was struggling. I talked last week in my analysis - you saw the IPO price, and then it dipped down; it never made it back up. Essentially, the price that VMware paid the public shareholders for Pivotal was about half of its initial IPO price. So, okay, the stock was struggling, the company didn't have the kind of momentum that, I think, it wanted, so VMware picks it up. Can VMware fold in Pivotal and use its go-to-market and its largess to really prop up Pivotal and make it a leader? >> Well, possibly, because Cloud Foundry - Pivotal Cloud Foundry - could be the linchpin of VMware's emerging developer story, if they position it that way and really invest in the product in that regard. So yeah, in other words, this could very much make VMware a go-to vendor for the developers who are building the new generation of applications that present serverless functional interfaces, but will have containers under the covers, and also VMs under the covers providing strong workload isolation in a multi-tenant environment. That would be the promise. >> Now, a couple of things. You mentioned Microsoft, of course Azure in the cloud, and Google. The ETR data that I dug into when I wanted to better understand multicloud - who's got the multicloud momentum? Well, guess who has the most multicloud momentum? It's the cloud guys. Now, AWS doesn't specifically say they participate in multicloud; certainly their marketing suggests that multicloud is for somebody else, that really they want to have a uni-cloud. Whereas Google and Azure are kind of embracing multicloud, and Kubernetes specifically. Now, of course, AWS has a Kubernetes offering, but I suspect it's not something they want to promote hard in the marketplace, because it makes it easier for people to get off of AWS. Your thoughts on multicloud generally, but specifically Kubernetes and containers as they relate to the big cloud providers? >> Yeah, well, my thought on multicloud generally is that multicloud is the strategy of the second-tier cloud vendors, obviously. If they can't dominate the entire space, at least they can provide a strong connective tissue for the clouds that actually are deployed in their customers' environments. So, in other words, the Ciscos of the world, the VMwares of the world, IBM - these are not among the top tier of the public cloud players, hence where do they go to remain relevant? Well, they provide the connective tissue, they provide the virtualized networking backbones, and they provide the AIOps that enables end-to-end, automated monitoring and management of the entire mesh.
The whole notion of a mesh architecture is something that grew up with IBM and Google for lots of reasons, especially due to the fact that they themselves, as vendors, didn't dominate the public cloud. >> Well, so I agree with you. The only issue I would take is I think Microsoft is a leader in public cloud, but because it has a big on-prem presence, it's in its best interest to push containers and Kubernetes and so forth. But you're right about the others. Cisco doesn't have a public cloud, VMware doesn't have a public cloud, IBM has a public cloud but it's really small market share, and Google is behind, so it's in those companies' best interest really to promote multicloud, to try to use it as a bulwark against AWS, who's obviously got awesome market momentum. The other thing that's interesting in the ETR data, when I poke in there, is that it seems like there are more people looking at Google. Now, maybe that's 'cause they have such strength in data and analytics, maybe it's 'cause they're looking for a hedge on AWS, but the spending data suggests that more and more people are kicking the tires, and more than kicking the tires, on Google - who of course is behind Kubernetes and that container movement, and open source. Your thoughts? >> Yeah, well, in many ways you have to think that Google has developed the key pieces of the new stack for application development in the multicloud. Clearly they developed Kubernetes, which is open source, and they also developed TensorFlow, open source as well - it's the predominant AI workbench, essentially, for the new generation of AI-driven applications, which is everything. But also, if you look at it, Google developed Node JS for web applications and so forth. So really, Google now is the go-to vendor for the new generation of open source application development, and increasingly DevOps, in a multicloud environment, running over Istio meshes and so forth. So, look at one of the announcements last week at VMworld: VMware and NVIDIA announced a collaboration, a joint offering to enable AI workloads - training workloads - to run on GPUs in an optimal, high-performance fashion within a distributed VMware cloud, end to end. So really, I think VMware recognizes that the new workloads in the multicloud are increasingly AI workloads, and as the market goes towards those kinds of workloads, VMware very much recognizes they need to have a strong developer play - and they do with NVIDIA, in a sense. Very much so, because NVIDIA, with the RAPIDS framework and so forth, and NVIDIA being the predominant GPU vendor, is a very strategic partner for VMware going forward, as they hope to line up the AI developers. But Google is still the vendor to beat as regards the AI developers of the world, in that regard, so-- >> So we're entering a world we sometimes call the post-virtual-machine world. John Furrier, kind of tongue in cheek, in a play on Web 2.0, calls it Cloud 2.0, which is a world of multiple clouds. As I've said many times, I'm not sure multicloud is necessarily a coherent strategy yet, as opposed to sort of a multi-vendor situation, shadow IT, >> Yes. >> lines of business, et cetera. But Jim, thanks very much-- >> Sure. >> For coming on and breaking down the container market, and VMworld 2019. It was great to see you. >> Likewise. >> All right, thank you for watching everybody. This is Dave Vellante with Jim Kobielus.
We'll see you next time on theCUBE. (upbeat music)

Published Date : Sep 3 2019

SUMMARY :

ETR spending data across 461 shared customers shows no evidence that containers are hurting VMware's business, even as VMware bets on Kubernetes with Tanzu and the Pivotal acquisition; VMs and containers are converging, multicloud remains the strategy of the second-tier cloud vendors, and VMware still needs a stronger developer story.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
Mark Lohmeyer | PERSON | 0.99+
Jim | PERSON | 0.99+
NVIDIA | ORGANIZATION | 0.99+
IBM | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
Google | ORGANIZATION | 0.99+
Cisco | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
VMware | ORGANIZATION | 0.99+
Dave Vellante | PERSON | 0.99+
900% | QUANTITY | 0.99+
400% | QUANTITY | 0.99+
John Furrier | PERSON | 0.99+
last week | DATE | 0.99+
last year | DATE | 0.99+
Last week | DATE | 0.99+
461 customers | QUANTITY | 0.99+
Pivotal | ORGANIZATION | 0.99+
Alex | PERSON | 0.99+
Boston Massachusetts | LOCATION | 0.99+
early January | DATE | 0.99+
vSphere | TITLE | 0.99+
today | DATE | 0.99+
CNCF | ORGANIZATION | 0.99+
4500 end user customers | QUANTITY | 0.99+
Node JS | TITLE | 0.99+
more than 20 years | QUANTITY | 0.98+
two things | QUANTITY | 0.98+
Silicon Angle Media Office | ORGANIZATION | 0.98+
Kubernetes | TITLE | 0.98+
second tier | QUANTITY | 0.98+
Ciscos | ORGANIZATION | 0.98+
Intel | ORGANIZATION | 0.98+
both stacks | QUANTITY | 0.97+
VMworld | ORGANIZATION | 0.97+

AI and Hybrid Cloud Storage | Wikibon Action Item | May 2019


 

Hi, I'm Peter Burris, and this is Wikibon's Action Item. We're joined here in the studio by David Floyer. Hi, David. >> Hi there. >> And remote, we've got Jim Kobielus. Hi, Jim. >> Hi everybody. >> Now, Jim, you probably can't see this, but for those who are watching: when we do see the broad set, notice that David Floyer's got his Game of Thrones coffee cup with us. Now, that has nothing to do with the topic. David and Jim, we're going to be talking about this challenge that businesses, that enterprises, have as they think about making practical use of AI. The presumption for many years was that we were going to move all the data up into the Cloud, in a central location, and all workloads were going to be run there. As we've gained experience, it's very clear that we're actually going to see a greater distribution of function, partly in response to a greater distribution of data. But what does that tell us about the relationship between AI, AI workloads, storage, and hybrid Cloud? David, why don't you give us a little clue as to where we're going to go from here? >> Well, I think the first thing we have to do is separate out the two types of workload. There's the development of the AI solution - the inference code, et cetera - and dealing with all of the data required for that. And then there is the execution of that code, the inference code itself. And the two are very different in characteristics. For the development, you've got a lot of data; it's very likely to be data-bound, and storage is a very important component of that, as well as compute and the GPUs. The inference is much more compute-bound - again, compute, neural networks, GPUs are very, very relevant to that portion. Storage is much more ephemeral, in the sense that the data will come in and you will need to execute on it. The compute will be part of that sensor environment, and you will want the storage to be actually in the DIMM itself, or a non-volatile DIMM, right up close as part of the processing. And you'll want to share that data only locally, in real time, through some sort of mesh computing. So: very different compute requirements, storage requirements, and architectural requirements. >> Yeah, let's go back to that notion of the different storage types in a second, but Jim, David described how the workloads are going to play out. Give us a sense of what the pipelines are going to look like, because that's what people are building right now - the pipelines for actually executing these workloads. How will they differ? How do they differ in the different locations? >> Yeah, so there's the entire DataOps pipeline for data science, data analytics - AI, in other words. What you're looking at here is all the processes, from discovering and ingesting the data, to transforming and preparing and correcting it, cleansing it, to modeling and training the AI models, to serving them out for inferencing, along the lines of what David's describing. There are different types of AI models, and one builds them from different data to do different types of inferencing. And each of these different pipelines might be - often is - highly specific to a particular use case. You know, AI for robotics is a very different use case from AI for natural language processing, embedded, for example, in an e-commerce portal environment. So what you're looking at here is different pipelines that all share a common sort of flow of activities and phases.
And you need a data scientist to build and test, train and evaluate, and serve out the various models to the consuming end devices or applications. >> So, David, we've got 50 or so years of computing where the primary role of storage was to persist a transaction and the data associated with that transaction that has occurred. And that's, you know, disk, and then you have all the way out to tape if we're talking about archive. Flash changes that equation. >> Absolutely changes it. >> AI absolutely demands a different way of thinking. Here we're not talking about persisting our data, we're talking about delivering data, really fast. As you said, sometimes very ephemeral. And so it requires a different set of technologies. What are some of the limitations that storage has historically been putting on some of these workloads? And how are we breaching those limitations to make them possible? >> Well, if we take only 10 years ago, the start of big data was Hadoop, and that was spreading the data over very cheap hard disks, with the compute there, so you spread that data and you did it all in parallel on very cheap nodes. That was the initial approach, but that is a very expensive way of doing it now, because you're tying the data to that set of nodes. A more modern way of doing it is to use flash, to use multiple copies of that data - but logical copies, or snapshots, of that flash - and to be able to apply as many processing nodes as is appropriate for that particular workload. And that is a far more efficient and faster way of getting through that sort of workload. It really does make a difference of tenfold in terms of elapsed time and the ability to get through it, and the overall cost is very similar. >> So that's true in the inferencing - or, I'm sorry, in the modeling. What about on the inferencing side of things? >> Well, the inferencing side is again very different, because you are dealing with the data coming in from the sensors, or coming in from other sensors or smart sensors. So what you want to do there is process that data with the inference code as quickly as you can - in real time, most of the time. When you're doing that, you're holding the current data actually in memory, or maybe in what's called non-volatile DIMM, NVDIMM, which gives you a larger amount. But you almost certainly don't have the time to go and store that data, and you certainly don't want to store it if you can avoid it, because it is a large amount of data, and if I open my... >> Has limited derivative use. >> Exactly. >> Yeah. >> So you want to quickly get all the value out of that data, compact it right down using whatever techniques you can, and then take just the results of that inference up to other nodes. Now, at the beginning of the cycle you may need more, but at the end of the cycle you'll need very little. >> So Jim, the AI world has built algorithms over many, many years, many of which still persist today, but they were building these algorithms with the idea that they were going to use kind of slower technologies. How is the AI world rethinking algorithms, architectures, pipelines, and use cases as a consequence of these new storage capabilities that David's describing? >> Well yeah, AI has become widely distributed in terms of its architecture, and increasingly it's running over containerized, Kubernetes-orchestrated fabrics.
And a lot of this is going on in the area of training of models, and distributing pieces of those models out to various nodes within an edge architecture. It may not be edge in the internet of things sense, but widely distributed, highly parallel environments, as a way of speeding up the training, speeding up the modeling, and really speeding up the evaluation of many models running in parallel, in an approach called ensemble modeling, to be able to converge on a predictive solution more rapidly. So that's very much what David's describing: it's leveraging the fact that memory is far faster than any storage technology we have out there. Being able to distribute pieces of the overall modeling, training, and even data prep workloads speeds up the deployment of highly optimized and highly sophisticated AI models for the cutting-edge challenges we face - like the Event Horizon Telescope, for example, which we're all aware of from when they were able to essentially make a visualization of a black hole. That relied on a form of highly distributed AI called grid computing. Challenges like that demand a highly distributed, memory-centric, orchestrated approach. >> So, you're essentially moving the code to the data, as opposed to moving all of the data all the way out to one central point. >> Well, if we think about that notion of moving code to the data - and I started off by suggesting that - in many respects, the Cloud is an architectural approach to how you distribute your workloads, as opposed to an approach to centralizing everything in some public Cloud. I think increasingly, application architects and IT organizations and service providers are all seeing things in that way. This is a way of more broadly distributing workloads. Now, we talked briefly about the relationship between storage and AI workloads, but we don't want to leave anyone with the impression that we're at a device level. We're really talking about a network of data that has to be associated with a network of storage. >> Yes. >> Now, that suggests a different way of thinking about data and storage administration. We're not thinking about devices; we're really trying to move that conversation up into data services. What kind of data services are especially crucial to supporting some of these distributed AI workloads? >> Yes. So there are the standard ones that you need for all data, which are backup and safety, encryption, security, and control. >> Primary storage allocation. >> All of that - you need that in place. But on top of that, you need other things as well, because you need to understand the mesh, the distributed hybrid Cloud that you have; you need to know what the capabilities are of each of those nodes; you need to know the latencies between each of those nodes - >> Let me stop you here for a second. When you say "you need to know," do you mean "I as an individual need to know" or "the system needs to know"? >> It needs to be known, and it's too complex - far too complex - for an individual ever to solve problems like this, so it needs, in fact, its own little AI environment to be able to optimize and check the SLAs, so that a particular piece of inference code can be executed in the way that it's set up. >> So it sounds like - >> It's a mesh type of computing.
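Jim's mention of ensemble modeling can be made concrete with a small, hedged sketch. In the Python below, several simple models are trained on bootstrap resamples of the same data and their predictions are averaged; the data, the models, and all names are invented for the example, and a real pipeline would train the members in parallel across nodes.

```python
import random

random.seed(7)

# Synthetic data: y = 3x + noise. Each ensemble member sees a different
# bootstrap resample, so the fitted models disagree slightly.
data = [(x, 3.0 * x + random.gauss(0, 1.0)) for x in [i / 10 for i in range(100)]]

def fit_line(sample):
    """Ordinary least squares for y = a*x + b on one bootstrap sample."""
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    cov = sum((x - mx) * (y - my) for x, y in sample)
    var = sum((x - mx) ** 2 for x, _ in sample)
    a = cov / var
    return a, my - a * mx

# Conceptually these 25 members train in parallel on separate nodes.
ensemble = [fit_line(random.choices(data, k=len(data))) for _ in range(25)]

def predict(x):
    """Ensemble prediction: average the members' outputs."""
    return sum(a * x + b for a, b in ensemble) / len(ensemble)

print(round(predict(5.0), 2))  # close to 3 * 5 = 15
```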
>> Yeah, so it sounds like one of the first practical, commercial use cases for AI will be AI within the data plane itself, because the AI workloads are going to drive such a complex model and utilization of data that, if you don't have that, the whole thing will probably just fold in on itself. Jim, how would you characterize this relationship with AI inside the system? How should people think about that, and is that really going to be a practical, near-term commercial application that folks should be paying attention to? >> Well, looking at the Cloud-native world, what we need, and what we're increasingly seeing out there, are solutions, tools - really data planes - that are able to associate a distributed storage infrastructure of a very hybridized nature, in terms of disk and flash and so forth, with a highly distributed, containerized application environment. So, for example, just last week at an industry event I met with the folks from Robin Systems, and they're one of the solution providers delivering those capabilities to associate, like I said, the storage cloud with the containerized applications, or cloud applications, that are out there. What we need there, like you've indicated, is the ability to use AI to continually look for patterns of performance issues, bottlenecks, and so forth, and to drive the ongoing placement of data across storage nodes and servers and clusters and so forth, as a way of making sure that storage resources are always used efficiently, and that SLAs, as David indicated, are always observed in an automated fashion as the data placement and workload placement decisions are being made - so that, ultimately, the AI itself, whatever it's doing, like recognizing faces or recognizing human language, is able to do it as efficiently and really as cheaply as possible. >> Right, so let me summarize what we've got so far. We've got that there is a relationship between storage and AI: the workload suggests that we're going to have centralized modeling with large volumes of data, and we're going to have distributed inferencing - smaller data, more complex computing. Flash is crucial, mesh is crucial, and increasingly, because of the distributed nature of these applications, there's going to have to be very specific and specialized AI in the infrastructure, in that mesh itself, to administer a lot of these data resources. >> Absolutely. >> So, but we want to be careful here, right, David? Just as we don't want to suggest that everything goes into a centralized Cloud under a central administrative effort, we also don't want to suggest this notion of a broad, heterogeneous, common, democratized, every-service-available-everywhere world. Let's bring hybrid Cloud into this. >> Right. >> How will hybrid Cloud ultimately evolve to ensure that we get common services where we need them, and know where we don't have common services so that we can factor those constraints? >> So it's useful to think about the hybrid Cloud from the point of view of the development, which will be fairly normal types of computing done in really large centers, and the edges themselves, which will be what we call autonomous Clouds. Those are the ones at the edge which need to be self-sufficient. So if you have an autonomous car, you can't guarantee that you will have communication to it. And a lot of IoT is in distant places - on ships, or in remote locations - where, again, you can't guarantee communication. So they have to be able to run much more by themselves.
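As a hedged sketch of the automated, SLA-driven placement Jim describes, the Python below scores candidate storage nodes on measured latency and free capacity and picks a home for a dataset. All node names, numbers, and the scoring rule are invented for illustration; a real data plane would learn these from telemetry rather than hard-code them.

```python
# Hypothetical telemetry for candidate storage nodes in the mesh.
NODES = {
    "rack-a": {"latency_ms": 2.1, "free_tb": 4.0},
    "rack-b": {"latency_ms": 0.9, "free_tb": 0.3},
    "edge-7": {"latency_ms": 5.5, "free_tb": 9.0},
}

SLA_LATENCY_MS = 3.0  # placement must keep reads under this bound

def place(dataset_tb):
    """Choose a node that meets the SLA and has room, preferring low latency."""
    candidates = [
        (stats["latency_ms"], name)
        for name, stats in NODES.items()
        if stats["latency_ms"] <= SLA_LATENCY_MS and stats["free_tb"] >= dataset_tb
    ]
    if not candidates:
        raise RuntimeError("no node satisfies the SLA; rebalance or alert")
    latency, name = min(candidates)
    NODES[name]["free_tb"] -= dataset_tb  # account for the new placement
    return name, latency

print(place(1.0))  # -> ('rack-a', 2.1): rack-b is faster but too full
```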
So that's one important characteristic: that autonomous Cloud needs to be self-sufficient in itself and have within it all the capabilities for running that particular code, and then passing up data when it can. >> Now, you gave examples where it's physically required to do that, but there are also OT examples. >> Exactly. >> Operational technologies, where you need to have that air gap to ensure that bad guys can't get into your data. >> Yes, absolutely. I mean, if you think about a boat, a ship, it has multiple very clear air gaps, and a nuclear power station has a total air gap around it. You must have those sorts of air gaps. So it's a different architecture for different uses, for different areas. But of course data is going to come up from those autonomous Clouds, upwards, though it will be a very small amount of the data that's actually being processed. And there'll be requests down to those autonomous Clouds for additional processing of one sort or another. So there still will be a discussion, a communication, between them, to ensure that the final outcome, the business outcome, is met. >> All right, so I'm going to ask each of you guys to give me a quick prediction. David, I'm going to ask you about storage, and then, Jim, I'm going to ask you about AI in light of David's prediction about storage. So David, as we think about where these AI workloads seem to be going, how is storage technology going to evolve to make AI applications easier to deal with, easier to run, cheaper to run, more secure? >> Well, the fundamental move is towards larger amounts of flash. And the new thing is larger amounts of non-volatile DIMM - the memory in the computer itself. Those are going to get much, much bigger, and those are going to help with the execution of these real-time applications, and there's going to be high-speed communication over short distances between the different nodes in this mesh architecture. So that's on the inference side; there's a big change happening in that space. On the development side, the storage will move towards sharing data: having a copy of the data which is available to everybody, and that data will be distributed. Sharing that data, and having that data distributed, will then enable the sorts of ways of using that data which retain context - which is incredibly important - and avoid the cost and the loss of value that come from the time taken to move that data from A to B. >> All right, so to summarize: we've got a new level in the storage hierarchy that sits between flash and memory to really accelerate things, and then secondly, we've got this notion that increasingly we have to provide a way of handling time and context so that we sustain fidelity, especially in more real-time applications. Jim, given that this is where storage is going to go, what does that say about AI? >> What it says about AI is that, first of all, we're talking about, like David said, meshes of meshes. Every edge node is increasingly becoming a mesh in its own right, with disparate CPUs and GPUs and whatever, doing different inferencing on each device. But every one of these, like a smart car, will have plenty of embedded storage to process a lot of data locally - data that may need to be kept locally for lots of very good reasons, like a black box in case of an accident, but also in terms of e-discovery of the data and the models that might have led up to an accident that might have caused fatalities and whatnot.
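The autonomous-cloud pattern David describes - process locally, keep the raw data at the edge, pass only compact results upstream - can be sketched in a few lines of Python. The sensor readings, the alert rule, and the upload function are hypothetical placeholders, not any vendor's API.

```python
import statistics

def local_inference(window):
    """Edge-resident logic runs against the raw data; a made-up alert rule."""
    return "ALERT" if max(window) > 90.0 else "OK"

def summarize(window):
    """Reduce a large raw window to a tiny record worth transmitting."""
    return {
        "n": len(window),
        "mean": round(statistics.fmean(window), 2),
        "max": max(window),
    }

def upload(record):
    # Placeholder for the constrained, sometimes-unavailable uplink.
    print("sent upstream:", record)

# The raw window stays on the edge node; only the verdict and summary go up.
raw_window = [71.2, 69.8, 93.4, 70.1, 68.9] * 200  # e.g., the last hour of readings
verdict = local_inference(raw_window)
upload({"verdict": verdict, **summarize(raw_window)})
```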
So when we look at where AI is going, AI is going into these meshes of meshes, where there's AI running in each of the nodes within the meshes, and the meshes themselves will operate as autonomous decisioning nodes within a broader environment. Now, in terms of the context: the context that increasingly surrounds all of the AI within these distributed architectures will be in the form of graphs, and graphs are something distinct from the statistical algorithms that we built AI out of. We're talking about knowledge graphs, we're talking about social graphs, we're talking about behavioral graphs - graph technology is just getting going. For example, Microsoft recently made a big, continued push into threading graph technology - contextual graph technology - into everything they do. So where I see AI going is up from statistical models to graph models as the broader metadata framework for binding everything together. >> Excellent. All right guys, so Jim, I think another topic for another time might be the mesh mess. (laughs) But we won't do that now. All right, let's summarize really quickly. We've talked about how the relationship between AI, storage, and hybrid Clouds is going to evolve. Number one, AI workloads are at least differentiated by where we handle modeling: large volumes of data still need a lot of compute, but we're really focused on large amounts of data and moving that data around very, very quickly, and therefore proximate to where the workload resides - a great application for Clouds, large, public as well as private. On the other side, where the inferencing work is done, that's going to be very compute-bound: smaller data volumes, but very, very fast data. Lots of flash everywhere. The second thing we observed is that these new AI applications are going to be used and applied in a lot of different domains, within human interaction as well as real-time domains within IoT, et cetera, but that as we evolve, we're going to see a greater relationship between the nature of the workload and the class of the storage, and that is going to be a crucial feature for storage administrators and storage vendors over the next few years: to ensure that that specialization is reflected in what's known and what's needed. Now, the last point we'll make very quickly is that as we look forward, the whole concept of hybrid Cloud - where we can have greater predictability into the nature of the data-oriented services that are available for different workloads - is going to be really, really important. We're not going to have all data services common in all places, but we do want to make sure, whether it's a container-based application or some other structure, that the data that is required will be there in the context, form, and metadata structures that are required. Ultimately, as we look forward, we see new classes of storage evolving that bring data even closer to the compute side, and we see new data models emerging, such as graph models, that are a better overall reflection of how this distributed data is going to evolve within hybrid Cloud environments. David Floyer, Jim Kobielus, Wikibon analysts; I'm Peter Burris. Once again, this has been Action Item.
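As a toy illustration of the graph-shaped context Jim points to, here is a minimal knowledge-graph sketch in Python. The entities and relations are invented; real knowledge graphs live in dedicated stores with far richer schemas.

```python
# Edges are (subject, relation, object) triples: the simplest knowledge graph.
TRIPLES = [
    ("device-42", "located_in", "plant-3"),
    ("plant-3", "operated_by", "acme-corp"),
    ("device-42", "monitored_by", "model-7"),
    ("model-7", "trained_on", "dataset-q2"),
]

def neighbors(entity):
    """Everything one hop away: the immediate context for an AI decision."""
    outgoing = [(r, o) for s, r, o in TRIPLES if s == entity]
    incoming = [(r, s) for s, r, o in TRIPLES if o == entity]
    return {"out": outgoing, "in": incoming}

# A model scoring device-42 can pull in its surrounding graph context.
print(neighbors("device-42"))
```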

Published Date : May 16 2019

SUMMARY :

AI development and inference workloads place very different demands on storage: training is data-bound and benefits from flash with shared logical copies, while inference is compute-bound and holds ephemeral data in memory or NVDIMM at the edge. Administering this distributed mesh will itself require AI, and hybrid Clouds will span self-sufficient autonomous edge clouds that send only compact results upstream.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
David | PERSON | 0.99+
David Floyer | PERSON | 0.99+
Jim | PERSON | 0.99+
Jim Kobielus | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
Robin Systems | ORGANIZATION | 0.99+
May 2019 | DATE | 0.99+
Microsoft | ORGANIZATION | 0.99+
two | QUANTITY | 0.99+
Game of Thrones | TITLE | 0.99+
each | QUANTITY | 0.99+
last week | DATE | 0.99+
Wikibon | ORGANIZATION | 0.99+
two types | QUANTITY | 0.99+
second thing | QUANTITY | 0.98+
both | QUANTITY | 0.98+
one | QUANTITY | 0.98+
each device | QUANTITY | 0.98+
Flash | TITLE | 0.97+
10 years ago | DATE | 0.96+
Jeredhad | ORGANIZATION | 0.95+
today | DATE | 0.9+
first use cases | QUANTITY | 0.85+
first thing | QUANTITY | 0.84+
one important characteristic | QUANTITY | 0.76+
secondly | QUANTITY | 0.76+
one central point | QUANTITY | 0.74+
Event Horizon | COMMERCIAL_ITEM | 0.72+
many years | QUANTITY | 0.71+
50 or so years | QUANTITY | 0.7+
Cloud | TITLE | 0.67+
first | QUANTITY | 0.66+
next few year | DATE | 0.65+
lot of data | QUANTITY | 0.62+
VDIMM | OTHER | 0.59+
every one | QUANTITY | 0.58+
second | QUANTITY | 0.57+
DataOps | TITLE | 0.46+
Kubernetes | TITLE | 0.44+

Jim Kobelius HPE5


 

(upbeat music) >> From our studios in the heart of Silicon Valley, Palo Alto, California, this is a Cube Conversation. >> Hi, I'm Peter Burris, and welcome to another Cube Conversation from our spectacular studios here in beautiful Palo Alto, California. As enterprises move forward with transformation plans that include AI, that include new classes of data-first applications, and, very importantly, the new storage resources necessary to support all that, you have to ask the questions: Where is the talent gonna come from? How are we gonna organize that talent? What tasks are gonna be especially important? Ultimately, what will be the roles and responsibilities of storage and AI administrators in the future? And to have that conversation, we've got my colleague, Jim Kobielus, from Wikibon, here to talk about this. Jim, welcome to the Cube. >> Hello, Peter, nice to be here. >> So let's start with the first question: what is it about AI that's catalyzing these new classes of roles? >> What's catalyzing new classes of roles as regards AI is the fact that AI is not just one thing. AI is a range of applications and approaches that, to be done right, have to be built out in an organization with specialization - a fairly fine degree of specialization. We've all heard about data scientists and data engineers; you also need various analysts and subject matter experts. But you also need people who are masters of natural language processing and conversational UIs, robotics - a lot of other specialties come in, depending on the type of AI project you're undertaking. It can be a fairly complex engineering process to get this stuff built and trained and working right in the real world.
You need lots of textual unstructured data and so forth, and that requires a fair amount of storage, 'cause that's a lot of data; it just takes up petabytes and beyond, depending on the level to which you want your natural language processing model to be able not just to understand text, but to generate text, or speech, with a high degree of verisimilitude. So when we look at the new world of AI, natural language processing has actually been the killer app for AI ever since AI was coined as a term in the 1950s. What we're seeing is that in the last 10-plus years, natural language processing - textual data in very large databases in the cloud - really catalyzed this whole big data revolution. And a big part of the big data revolution is this notion of embedded analytics, or in-database analytics, and in-database analytics has gotten ever more parallelized over time, to the point where you have these massive Hadoop clusters and object storage clusters and so forth. >> Jim, let me interrupt you. But doesn't that suggest, ultimately, that the storage administrator - who used to have to know a lot about the complexities of the underlying physical device, especially when that device was spinning - now has to know more about the data services, and how those data services are being consumed, in service to a broader set of application issues? The storage administrators have to get more knowledgeable about how data and business come together, so they can do a better job of providing and administering and optimizing data services. Is that kind of a decent summary of where the maturity's gonna be in a few years? >> Yeah, it comes down to this: storage administrators now have to master storage virtualization, which means many different types of storage devices, storage platforms, and database platforms - and these categories are blurring into each other - have to play together within fairly complex hybrid data environments, to be able to drive an application such as a conversational UI inside, I would say, a chat bot that's a front end to an e-commerce system, one that is built on being able to do heavy transactional and analytical processing in real time. What I've just laid out, at the highest level, is an architecture that's mainstream in the database world: you have RDBMSs, you have columnar data storage, you have distributed file stores like the Hadoop file store and so forth, and you have stream computing, with some degree of persistence, occasionally, in a low-latency form. All these different storage and data management approaches have to play together within a broader data or storage virtualization environment that very much has to be geared to what the business needs. You don't just plug in new databases as a data engineer 'cause you like them, or you like the open source project they come out of. You do it because each of those is optimized for a particular type of data, which is associated with a particular type of data source, which is also associated with a particular type of downstream usage of that data in a given business application. So the storage administrators have to make sure this entire storage virtualization architecture is able to align with the sources and the uses of the data, and they need a high-level virtualization abstraction layer that enables the people who build the data analytic applications - the data scientists and so forth - to find the data they need to put into their machine learning models to drive the magic of AI.
So there has to be a greater degree of alignment with the business among the people inside of storage who define the storage architecture, now more than ever before. Understanding the underlying formatting of the data on the spinning disk - well, it used to be spinning disk; in, say, a flash storage environment - becomes less important in terms of its relationship to the business. >> So, but I would suggest, and you tell me if I've got this right: it does seem to me that as we move forward, there's going to be a tighter correspondence between what the business needs and what the actual storage can deliver. Now, we've talked about the skills gap in the data-driven world - in security, in AI, in data science, et cetera. It seems also that as we try to do a better job of matching, we're gonna see a skills gap in how data services are conceived and operated within a business. Some of that can be filled in by new products, some of that can be filled in by new tooling, but talk to me a little bit about how partners - companies that might historically have been associated with just moving pieces of hardware, and very focused on that kind of a transaction - can step up, or are going to have to step up, to be more cognizant and aware, and able to participate in the process of closing that skills gap between what we need storage to do and what the business outcomes are. >> Yeah, lemme go back to AI, and lemme go back to sort of a leading-edge AI use case that is everywhere, and it's gonna increasingly dominate the industrial world: robotics. I mean, it's not just factory-floor robotics; robotics is being embedded in so many different business environments and consumer environments now, and the magic of robotics going forward is all about AI - the AI is the brains that drive them. Autonomous vehicles are a type of robot, and so are drones and so forth. And so, in terms of where partners come in: not that many enterprises have robotics specialists just sort of on staff, on call. This is a specialized discipline that you would bring in. There's any number of robotics firms you'd bring in to a robotics-related project that involves AI. You might have your staff of data scientists building out the AI; they need to call in the robotics partners, contractors, whoever they are, to provide that piece of the overall, as it were, application scenario. You might also need to call out a separate set of partners who are masters of building conversational UIs, to enable human beings to interact more naturally with the robots that are being built and infused with intelligence through AI. So what I'm getting at is, I'm starting to sketch an ecosystem where more companies have internal AI - or I should say data science - expertise. They may increasingly call on robotics partners to provide that piece for drone-related projects or whatnot. They may call on conversational UI or natural language processing partners to handle that piece - more of the UI, the interactions and so forth. And there are other specialties that are brought into these projects; based on the extent to which geospatial intelligence is required, you might bring in, you know, mapping firms and so forth. Partners will provide various pieces of the overall application ecosystem needed to stand up a working AI application. Now, the storage becomes important, because every one of the components in these increasingly physical projects - based in, say, robotics or drones or autonomous vehicles - will have its own local storage.
It'll persist data locally for lots of good reasons. A, it acquires the data there 'cause it's got sensors. B, it needs to cache history, the last hour or day or month worth of data, for lots of reasons, in terms of doing trend analysis. So the storage architecture that needs to be built out needs to span all these disparate assets, some of which are provided by partners, the suppliers, some of which are provided in house. So how do you build out a storage architecture that has enough flexibility to bring in more of these edge storage requirements in a unified way, so that the clouds and the gateways and the edge devices, and all of their disparate flash and spinning disk and what not, all play together as a unified storage resource? Partners have to be part of setting that overall architecture, in terms of providing some degree of input into the process by which the end-to-end storage requirements get mapped out, as you're starting to build out these more complex projects. >> Alright Jim, lemme stop you there. So it sounds as though we are in a position where there is an enormous opportunity for partners to establish a presence by helping their customers better associate what the business needs with application characteristics and the data requirements for those applications, and make it simpler and faster to associate storage resources to those, so that they can accelerate these outcomes. Jim Kobielus of Wikibon, thanks very much once again for being on The Cube. >> It's a pleasure. >> And once again, I'm Peter Burris, and this has been another Cube conversation, until next time.
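A quick, hypothetical illustration of the local persistence pattern Kobielus describes: an edge node keeps a raw sample window for itself and ships rollups upstream. Every name here is an assumption made for the sketch, not a product API.

```python
import time
from collections import deque
from typing import Callable, Optional

class EdgeBuffer:
    """Keep the last window_s seconds of raw sensor samples on the device;
    older samples are summarized and handed to a (hypothetical) uplink."""

    def __init__(self, window_s: float, uplink: Callable[[dict], None]):
        self.window_s = window_s
        self.uplink = uplink
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def ingest(self, value: float, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.samples.append((now, value))
        self._evict(now)

    def _evict(self, now: float) -> None:
        expired = []
        while self.samples and now - self.samples[0][0] > self.window_s:
            expired.append(self.samples.popleft()[1])
        if expired:
            # Ship a rollup, not the raw points: trend analysis happens
            # upstream, fine-grained history stays at the edge.
            self.uplink({"count": len(expired),
                         "mean": sum(expired) / len(expired)})

# Example: a one-hour raw window whose rollups are simply printed.
buf = EdgeBuffer(window_s=3600, uplink=print)
buf.ingest(21.5)
```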

Published Date : May 1 2019

SUMMARY :

From our studios in the heart of Silicon of storage and AI administrators in the future? to be done right, has to be built out and the classes of data required to support that process, to the point where you have these massive hood dupe clusters in terms of its relationship to the business. and able to participate in the process of So the storage architecture that needs to be built out and the data requirements for those applications, and this has been another Cube conversation,

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Jim KobielusPERSON

0.99+

JimPERSON

0.99+

PeterPERSON

0.99+

Peter BurrisPERSON

0.99+

Jim KobeliusPERSON

0.99+

1950sDATE

0.99+

Palo Alto, CaliforniaLOCATION

0.99+

first fifty yearsQUANTITY

0.98+

eachQUANTITY

0.98+

first questionQUANTITY

0.97+

WikibonORGANIZATION

0.97+

one thingQUANTITY

0.94+

Silicon Valley, Palo Alto, CaliforniaLOCATION

0.92+

petabytesQUANTITY

0.89+

CubeORGANIZATION

0.81+

first applicationsQUANTITY

0.77+

CalverORGANIZATION

0.72+

last 10 plus yearsDATE

0.64+

The CubeTITLE

0.61+

StoreTITLE

0.56+

HidupeORGANIZATION

0.53+

Cube ConversationCOMMERCIAL_ITEM

0.51+

oneQUANTITY

0.5+

Jim Kobielus HPE3


 

(jazzy techno music) >> From our studios, in the heart of Silicon Valley, Palo Alto, California, this is a Cube Conversation. >> Hi, I'm Peter Burris, and welcome to another Cube Conversation. Everybody talks about AI, and how it's going to dramatically alter both the nature and the productivity of different classes of business outcomes, and it's clear that we're on a variety of different vectors and road maps to achieve that. One of the most important conversations is the role that AI's gonna play within the IT organization, within digital operations, to improve the productivity of the resources that we have put in place to make these broader, more complex business outcomes possible and operationally efficient. One of the key places where this is gonna be important is in storage itself. How will AI improve productivity, both from a cost standpoint, but even more importantly, from the standpoint of the amount of work that storage resources can do? Now, to have that conversation we've got Jim Kobielus, my colleague from Wikibon and our key AI guy, to talk about his vision of how AI technologies will be embedded in storage services, data services, and the new classes of products that are gonna make possible these new types of data-driven, AI-driven outcomes. Jim, welcome back to theCUBE. >> Thanks, Peter. >> All right, so let's start, Jim. As you think about it, what is it about AI that makes it relevant to improving storage productivity? >> Well, AI is a broad term, but let me net it out to the core of what AI's all about. Core AI is what is called machine learning, and machine learning is being able to find patterns in the data using algorithmic methods, in a way that can be automated, and also in a way that humans, mortal humans, can't usually do. In other words you have complex datasets. Machine learning is very good at doing such things as looking for anomalies, looking for trends, looking for broader statistical patterns among (mumbles) elements within a broader dataset. So, when you talk about storage resources, and you talk about storage resources in (mumbles) environment, you have many tables, and you have records, and you have indices, keys and so forth. >> Logs. >> Yeah, yeah. You have, yeah. So, you have a lot of entities in various, and quite often complex, relationships. Storage exists, if you will, for a number of things: to persist data, you know, as a historical artifact, but also to facilitate queries and access to the data, to answer questions; that's what analytics is all about. If you can shorten the path that a query takes to assemble the relevant tables, records, and so forth, and deliver a result back to whoever posed the query, then storage becomes ever more efficient in serving the end user. The more complex your storage resources get, and they can be across different servers, different clusters, different clouds, they can be highly distributed across the internet of things and what not, the more complex and distributed your storage architecture becomes, the more critically you need machine learning to be able to detect the high level patterns, to be able to identify, you know, at any point in time, what is the path that a given query might take to be able to respond in real time to some kind of requirement from a business user who's sitting there at a dashboard trying to call up some complex metrics.
So, machine learning is able to not only identify the current patterns within distributed datasets, but also to build predictive models on top of them. That's what AI often does, is build predictive models: to identify and predict how, under various scenarios, if the data were placed in different storage volumes, or devices, or cached here, now there, distributed and (mumbles) in a particular way, you might be able to speed up queries. So, machine learning is increasingly used in storage architectures to, A, identify the current patterns, B, identify query paths, and C, predict, recommend, and automatically move data around so that the performance, whether it be queries, or reporting, or data transfers, or all that, so the performance of that data transaction or analytic, is as good or as fast as it can possibly be. >> More predictable, right? >> And automated, predictably, so that humans don't have to muck around with query plans, and so forth. The architecture, the infrastructure, takes care of that problem. That's why these capabilities, autonomous operations, are built into things like the Oracle database. That's just the way database computing has to be done. There's less of a need for human data engineers to do that. I think human data engineers everywhere are saying hallelujah, that is way too complex for us, especially in the era of distributed edge computing. We can't do that in a finite amount of time. Let the infrastructure automate that function. >> So, if we look back, storage used to be machine-attached. Then we went to network classes of storage. Now, we're increasingly distributing data. I think one of the big misnomers in the industry is that cloud was a tactic for centralizing resources. In fact, it's turning out that cloud is a tactic for more broad distribution of compute, data, and related resources. All of those patterns in this increasingly distributed, cloud service oriented world have to be accommodated, have to be understood, and as you said, to improve predictability and confidence in the system, we have to have some visibility into what it's gonna take to perform, and AI can help us do that. >> Exactly. >> One thing you didn't mention, Jim, I want to pick up on something though, is the idea, as we move to Kubernetes, as we move to container-based, transient, even serverless types of application forms, where the data is where all the state is, really baked into the data and not residing in the application, this notion of data assurance is important. Assuring that the data that's required by an instance of a Kubernetes cluster is available, can be made available, or will be available when that cluster spins it up. Talk about how that becomes a use case for more AI in the storage subsystems, to ensure that storage can assure that the data that's required is available in the form it needs to be, when it needs to be, and with the policies that are required to secure it and ensure its integrity. >> Oh yeah, that requirement for that level of data protection requires an end-to-end data replication architecture: infrastructure that's able to assure that all the critical data, or data that's tagged by its real criticality, is always available, with backup copies that are always available and close enough to the applications and the users at any point in time. Continuously, so that nobody ever need worry that the data that they need will not be available because a given server or storage device is down, or a given network is down.
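Stepping back to the query-placement idea for a moment, here is a toy sketch of it: fit a model on observed latencies under different placements, then score candidate placements and pick the predicted fastest. The features and numbers are invented for the sketch, and it assumes scikit-learn is available; a real system would learn continuously from live telemetry.

```python
from sklearn.linear_model import LinearRegression

# Observed history: [data_size_gb, network_hops, cache_hit_ratio] -> latency (ms).
X = [[10, 1, 0.9], [10, 3, 0.2], [50, 1, 0.8],
     [50, 4, 0.1], [100, 2, 0.5], [100, 5, 0.0]]
y = [120, 640, 300, 2100, 900, 3500]

model = LinearRegression().fit(X, y)

# Candidate placements for the same dataset and query.
candidates = {
    "keep_in_place":    [100, 5, 0.0],
    "cache_at_edge":    [100, 1, 0.7],
    "replicate_to_hub": [100, 2, 0.4],
}
best = min(candidates, key=lambda name: model.predict([candidates[name]])[0])
print("predicted fastest placement:", best)  # the data mover would act on this
```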
End-to-end data replication architecture is automated to a degree that it's always assured, and it will (mumbles) AI, as a (mumbles), first of all, making sure that the end-to-end infrastructure always has a high-level, and a very fine-grained, (mumbles) depiction of what is where at every point in time. Also, that the paths between all applications and the critical data sources that they require, that those paths always include backups that are hot backups. They're just available without having to worry about it; the infrastructure predictively takes care of caching, and replicating, and storing the data wherever it needs to be, to assure that degree of end-to-end data protection and assurance. Once again, that's automated, and it needs to be an automated capability, especially in the era of edge computing, where the storage resources are everywhere. In hyperconverged architecture really, storage is everywhere, it's just baked into everything. It's the very nature of HCI. >> Right. >> So, you know, yeah. >> So, Jim. We've always, you use the term anomalous behavior, and in the storage world, the storage administrator world, regarded that, or associated that, with anticipating or predicting the possibility of a failure somewhere within the subsystem. But as we move to a more broadly distributed use of storage, feeding a richer and more complex set of applications, supporting a more varied and unknown set of user and business activities, the role of anomalous behavior, even within these data patterns, and security start to come together. Talk a little about how AI, security, and storage are likely to conflate over the course of the next few years. >> Okay. AI, security, and storage. Well, when you look at security now, data security, where everything is being pushed to the edge, you need each device now, you just know that, in the internet of things, whether it be an actual edge device, or a gateway, and so forth, to be able to protect the local data resources in an autonomous or semi-autonomous fashion, without necessarily having to round trip back to the cloud center, if there is a cloud center, because the critical data is being stored at the edges. So what's happening more and more is that we see something that's called, I forgot the name, zero perimeter, or perimeterless-- >> Oh, the zero-trust perimeterless security. >> Yeah, there you go, thank you. Where the policies follow the data all the way to wherever it's stored, in a zero trust environment, the permissions, the crypto keys, and so forth, and this is just automatic, so that no matter where the data happens to move, the entire security context follows it. So, what's happening now is that we're seeing that more autonomous operation become part of the architecture of end-to-end data management in this new world. So, to enable that, what's happening is that, in terms of protecting that data from any number of thefts, or, you know, denial of service attacks, and so forth, AI becomes critically important, machine learning in particular, to be able to detect intrusions autonomously at those edge devices, using embedded machine learning models that are persistent within the edge nodes themselves, to be able to look for patterns that might be indicative of security threats, because fixed rules are becoming less and less relevant in terms of security rules in an era where the access patterns become more or less 360 degree, in terms of every data resource is being bombarded from all sides by all possible threats.
So machine learning is the tool for looking at what access requests, or attempts on a given data resource, are anomalous, in terms of, they've not been seen before. They're unusual, they fall outside the confidence intervals that would normally be expected in terms of the access request. So, those edge nodes then need to be able to take action autonomously based on those patterns, according to the (mumbles). So we're seeing more of that pattern-based security. The edge nodes have zero trust. They're not trusting any access attempt. Any access attempt, even from local applications on the same device, is treated as if it were coming from a remote party, and it has to come through a gateway that's erected through machine learning, machine learning that learns in real time to adapt to the threat patterns that are seen at that node. >> All right Jim, let's wrap it up there. Once again, Jim Kobielus and I have been talking about the role that AI's going to play inside storage, the storage and data services that enterprises are gonna use to improve their business outcomes. Jim, thank you very much for being on The Cube. >> Thank you very much, Peter. >> Once again, I'm Peter Burris. 'Till next time. (techno music)
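As an illustration of the anomaly-gating idea from this conversation, the sketch below scores access attempts with scikit-learn's IsolationForest and denies the outliers. The feature choices and thresholds are assumptions made for the example, not a reference design.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Train on what "normal" access looks like at this node:
# [hour_of_day, bytes_requested, requests_in_last_minute]
normal = np.column_stack([
    rng.normal(13, 2, 500),      # mostly daytime access
    rng.normal(2e6, 5e5, 500),   # typical payload sizes
    rng.normal(5, 2, 500),       # modest request rates
])
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

def gate(request):
    """Zero-trust check: score every attempt, local or remote alike."""
    verdict = detector.predict([request])[0]  # 1 = looks normal, -1 = anomaly
    return "allow" if verdict == 1 else "deny-and-alert"

print(gate([14, 1.8e6, 4]))   # routine daytime read
print(gate([3, 9e8, 400]))    # 3 a.m. bulk pull at a high rate
```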

Published Date : May 1 2019

SUMMARY :

in the heart of Silicon Valley, To improve the productivity of the resources that we have relevant to improving storage productivity? and machine learning is being able to find so that the performance, especially in the era of distributed edge computing. and competence in the system, in the form it needs to be, and the critical data sources that they require. and in the storage world, You just know that in the internet of things, in terms of the access request. Jim Kobielus and I have been talking about the role Once again, I'm Peter Burris.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
JimPERSON

0.99+

Jim KobielusPERSON

0.99+

PeterPERSON

0.99+

Peter BurrisPERSON

0.99+

Jim KobeliusPERSON

0.99+

each deviceQUANTITY

0.99+

OneQUANTITY

0.98+

360 degreeQUANTITY

0.98+

bothQUANTITY

0.98+

OracleORGANIZATION

0.97+

WikibonORGANIZATION

0.94+

FirstQUANTITY

0.92+

Palo Alto, CaliforniaLOCATION

0.91+

zeroQUANTITY

0.85+

KubernetesTITLE

0.84+

HPE3ORGANIZATION

0.84+

oneQUANTITY

0.83+

Silicon Valley,LOCATION

0.81+

nodesTITLE

0.81+

zero trustQUANTITY

0.8+

ZeroQUANTITY

0.78+

CubeORGANIZATION

0.71+

next few yearsDATE

0.59+

Cube ConversationORGANIZATION

0.54+

Basil Faruqui, BMC Software | BigData NYC 2017


 

>> Live from Midtown Manhattan, it's theCUBE. Covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (calm electronic music) >> Basil Faruqui, who's the Solutions Marketing Manager at BMC, welcome to theCUBE. >> Thank you, good to be back on theCUBE. >> So first of all, heard you guys had a tough time in Houston, so hope everything's gettin' better, and best wishes to everyone down in-- >> We're definitely in recovery mode now. >> Yeah, and so hopefully that can get straightened out quick. What's going on with BMC? Give us a quick update in context to BigData NYC. What's happening, what is BMC doing in the big data space now, the AI space now, the IoT space now, the cloud space? >> So like you said, you know, the data lake space, the IoT space, the AI space, there are four components of this entire picture that literally haven't changed since the beginning of computing. If you look at those four components of a data pipeline, it's ingestion, storage, processing, and analytics. What keeps changing around it is the infrastructure, the types of data, the volume of data, and the applications that surround it. And the rate of change has picked up immensely over the last few years, with Hadoop coming into the picture, public cloud providers pushing it. It's obviously creating a number of challenges, but one of the biggest challenges that we are seeing in the market, and we're helping customers address, is the challenge of automating this, and, obviously, the benefit of automation is in scalability as well as reliability. So when you look at this rather simple data pipeline, which is now becoming more and more complex, how do you automate all of this from a single point of control? How do you continue to absorb new technologies and not re-architect your automation strategy every time, whether it's Hadoop, whether it's bringing in machine learning from a cloud provider? And that is the issue we've been solving for customers-- >> Alright, let me jump into it. So, first of all, you mention some things that never change: ingestion, storage, and what's the third one? >> Ingestion, storage, processing, and eventually analytics. >> And analytics. >> Okay, so that's cool, totally buy that. Now if you move in and say, hey okay, I believe that standard, but now in the modern era that we live in, which is complex, you want breadth of data, but you also want the specialization when you get down to the machine level, highly bounded; that's where the automation is right now. We see the trend essentially making that automation broader as it goes into the customer environments. >> Correct. >> How do you architect that? If I'm a CXO, or I'm a CDO, what's in it for me? How do I architect this? 'Cause that's really the number one thing: I know what the building blocks are, but their dynamics in the marketplace have changed. >> So the way I look at it is that what defines success and failure, particularly in big data projects, is your ability to scale. If you start a pilot, and you spend three months on it, and you deliver some results, but you cannot roll it out worldwide, nationwide, whatever it is, essentially the project has failed. The analogy I often give is Walmart has been testing the pick-up tower, I don't know if you've seen it. This is basically a giant ATM for you to go pick up an order that you placed online. They're testing this at about a hundred stores today.
Now if that's a success, and Walmart wants to roll this out nationwide, how much time do you think their IT department's going to have? Is this a five year project, a ten year project? No, the management's going to want this done in six months, ten months. So essentially, this is where automation becomes extremely crucial, because it is now allowing you to deliver speed to market, and without automation you are not going to be able to get to an operational stage in a repeatable and reliable manner. >> But you're describing a very complex automation scenario. How can you automate in a hurry without sacrificing the details of what needs to be done? In other words, this would seem to call for repurposing or reusing prior automation scripts and rules, and so forth. How can the Walmarts of the world do that fast, but also do it well? >> Yeah, so we go about it in two ways. One is that out of the box we provide a lot of pre-built integrations to some of the most commonly used systems in an enterprise, all the way from the mainframes, Oracles, SAPs, Hadoops, Tableaus of the world; they're all available out of the box for you to quickly reuse these objects and build an automated data pipeline. The other challenge we saw, particularly when we entered the big data space four years ago, was that automation was something that was considered close to the project becoming operational. Okay, and that's where a lot of rework happened, because developers had been writing their own scripts using point solutions, so we said alright, it's time to shift automation left, and allow companies to build automation artifacts very early in the development life cycle. About a month ago, we released what we call Control-M Workbench; it's essentially a community edition of Control-M, targeted towards developers, so that instead of writing their own scripts, they can use Control-M in a completely offline manner, without having to connect to an enterprise system. As they build, and test, and iterate, they're using Control-M to do that, so as the application progresses through the development life cycle, all of that work can then translate easily into an enterprise edition of Control-M. >> Just want to quickly define what shift left means for the folks that might not know software methodologies; they don't think of left as political, left or right. >> Yeah, so. >> So, we're not shifting Control-M-- >> Alt-left, alt-right, I mean, this is software development, so quickly take a minute and explain what shift left means, and the importance of it. >> Correct. So if you think of software development as a straight line continuum, you've got, you will start with building some code, you will do some testing, then unit testing, then user acceptance testing. As it moves along this chain, there was a point right before production where all of the automation used to happen. Developers would come in and deliver the application to Ops, and Ops would say, well hang on a second, all this Crontab and these other point solutions we've been using for automation, that's not what we use in production, and we need you to now go right in-- >> So test early and often. >> Test early and often. So the challenge was that the tools the developers used were not the tools that were being used on the production side. And there was good reason for it, because developers don't need something really heavy and with all the bells and whistles early in the development lifecycle.
Now Control-M Workbench is a very light version, which is targeted at developers and focuses on the needs that they have when they're building and developing. So as the application progresses-- >> How much are you seeing waterfall-- >> But how much can they, go ahead. >> How much are you seeing waterfall, and then people shifting left, becoming more prominent now? What percentage of your customers have moved to Agile and shifting left, percentage wise? >> So we survey our customers on a regular basis, and the last survey showed that eighty percent of the customers have either implemented a more continuous integration and delivery type of framework, or are in the process of doing it. And that's the other-- >> And getting as close to 100 as possible, pretty much. >> Yeah, exactly. The tipping point is reached. >> And what is driving... >> What is driving all of this is the need from the business. The days of the five year implementation timelines are gone. This is something that you need to deliver every week, two weeks, an iteration. >> Iteration, yeah, yeah. >> And we have also innovated in that space, with the approach we call jobs as code, where you can build entire complex data pipelines in code format, so that you can enable the automation in a continuous integration and delivery framework. >> I have one quick question, Jim, and I'll let you take the floor and get a word in soon, but I have one final question on this BMC methodology thing. You guys have a history, obviously BMC goes way back. Remember Max Watson, CEO, and Bob Beach, back in '97 we used to chat with him; BMC dominated that landscape. But we're kind of going back to a systems mindset. The question for you is, how do you view the issue of this holy grail, the promised land of AI and machine learning, where end-to-end visibility is really the goal, right? At the same time, you want bounded experiences at the root level so automation can kick in to enable more activity. So there's a trade-off between going for the end-to-end visibility out of the gate, and having bounded visibility and data to automate. How do you guys look at that market? Because customers want the end-to-end promise, but they don't want to try to get there too fast. There are diseconomies of scale, potentially. How do you talk about that? >> Correct. And that's exactly the approach we've taken with Control-M Workbench, the community edition, because earlier on you don't need capabilities like SLA management and forecasting and automated promotion between environments. Developers want to be able to quickly build and test and show value, okay, and they don't need something with all the bells and whistles. We're allowing you to handle that piece, in that manner, through Control-M Workbench. As things progress and the application progresses, the needs change as well. Well, now I'm closer to delivering this to the business, I need to be able to manage this within an SLA, I need to be able to manage this end-to-end and connect this to other systems of record, and streaming data, and clickstream data, all of that. So, we believe that it doesn't have to be a trade-off, that you don't have to compromise speed and quality for end-to-end visibility and enterprise-grade automation. >> You mentioned trade-offs, so the Control-M Workbench, the developer can use it offline, so what amount of testing can they possibly do on a complex data pipeline automation when the tool's offline?
I mean it seems like the more development they do offline, the greater the risk that it simply won't work when they go into production. Give us a sense for how they mitigate, the mitigation of risk, in using Control-M Workbench. >> Sure. So we spend a lot of time observing how developers work, right? And very early in the development stage, all they're doing is working off of their Mac or their laptop, and they're not really connected to anything. And that is where they end up writing a lot of scripts, because whatever code, business logic, they've written, the way they're going to make it run is by writing scripts. And that, essentially, becomes the problem, because then you have scripts managing more scripts, and as the application progresses, you have this complex web of scripts and Crontabs and maybe some opensource solutions, trying to simply make all of this run. And by doing this in an offline manner, that doesn't mean that they're losing all of the other Control-M capabilities. Simply, as the application progresses, whatever automation they've built in Control-M can seamlessly now flow into the next stage. So when you are ready to take an application into production, there's essentially no rework required from an automation perspective. All of that that was built can now be translated into the enterprise-grade Control-M, and that's where operations can then go in and add the other artifacts, such as SLA management and forecasting and other things that are important from an operational perspective. >> I'd like to get both your perspectives, 'cause, so you're like an analyst here, so Jim, I want you guys to comment. My question to both of you would be, lookin' at this time in history, obviously on the BMC side we mentioned some of the history, you guys are transforming on a new journey in extending that capability into this world. Jim, you're covering state-of-the-art AI and machine learning. What's your take on this space now? Strata Data, which is what Hadoop World became; Cloudera went public, Hortonworks is now public; kind of the big Hadoop guys kind of grew up, but the world has changed around them, it's not just about Hadoop anymore. So I'd like to get your thoughts on this kind of perspective, that we're seeing a much broader picture in big data in NYC, versus the Strata Hadoop show, which seems to be losing steam, but I mean in terms of the focus; the bigger focus is much broader, horizontally scalable. And your thoughts on the ecosystem right now? >> Let Basil answer first, unless Basil wants me to go first. >> I think that the reason the focus is changing is because of where the projects are in their lifecycle. Now what we're seeing is most companies are grappling with, how do I take this to the next level? How do I scale? How do I go from just proving out one or two use cases to making the entire organization data driven, and really inject data driven decision making in all facets of decision making? So that is, I believe, what's driving the change that we're seeing, that now you've gone from Strata Hadoop to being Strata Data, and the focus on that element. And, like I said earlier, the difference between success and failure is your ability to scale and operationalize. Take machine learning, for example. >> Good, that's where there's no, it's not a hype market; it's show me the meat on the bone, show me scale, I got operational concerns of security and what not. >> And machine learning, that's one of the hottest topics.
A recent survey I read, which polled a number of data scientists, revealed that they spent less than 3% of their time training the data models, and about 80% of their time in data manipulation, data transformation and enrichment. That is obviously not the best use of a data scientist's time, and that is exactly one of the problems we're solving for our customers around the world. >> That needs to be automated to the hilt, to help them be more productive, to deliver faster results. >> Correct. >> Ecosystem perspective, Jim, what's your thoughts? >> Yeah, everything that Basil said, and I'll just point out that many of the core use cases for AI are automation of the data pipeline. It's driving machine learning driven predictions, classifications, abstractions and so forth, into the data pipeline, into the application pipeline, to drive results in a way that is contextually and environmentally aware of what's goin' on: the history, historical data, what's goin' on in terms of current streaming data, to drive optimal outcomes, using predictive models and so forth, inline to applications. So really, fundamentally then, what's goin' on is that automation is an artifact that needs to be driven into your application architecture as a repurposable resource for a variety of-- >> Do customers even know what to automate? I mean, that's the question, what do I-- >> You're automating human judgment. You're automating effort, like the judgments that a working data engineer makes to prepare data for modeling and whatever. More and more, that can be automated, 'cause those are pattern-structured activities that have been mastered by smart people over many years. >> I mean we just had a customer on, GlaxoSmithKline, GSK, with that scale, and his attitude is, we see the results from the users, then we double down and pay for it and automate it. So the automation question, it's an option question, it's a rhetorical question, but it just begs the question, which is: who's writing the algorithms as machines get smarter and start throwing off their own real-time data? What are you looking at? How do you determine? You're going to need machine learning for machine learning? Are you going to need AI for AI? Who writes the algorithms for the algorithms? >> It's actually, that's... automated machine learning is a hot, hot, not only research focus, but we're seeing it in more and more solution providers; Microsoft and Google and others are goin' deep down, doubling down in investments in exactly that area. That's a productivity play for data scientists. >> I think the data market's going to change radically in my opinion. You're startin' to see some things with blockchain and some other things that are interesting: data sovereignty, data governance are huge issues. Basil, just give your final thoughts for this segment as we wrap this up. Final thoughts on data and BMC: what should people know about BMC right now? Because people might have a historical view of BMC. What's the latest, what should they know? What's the new Instagram picture of BMC? What should they know about you guys? >> So I think what I would say people should know about BMC is that all the work that we've done over the last 25 years, in virtually every platform that came before Hadoop, we have now innovated to take into things like big data and cloud platforms. So when you are choosing Control-M as a platform for automation, you are choosing a very, very mature solution, an example of which is Navistar.
Their CIO's actually speaking at the keynote tomorrow. They've had Control-M for 15, 20 years, and they've automated virtually every business function through Control-M. And when they started their predictive maintenance project, where they're ingesting data from about 300,000 vehicles today to figure out when a vehicle might break, and to predict maintenance on it; when they started their journey, they said that they always knew that they were going to use Control-M for it, because that was the enterprise standard, and they knew that they could simply extend that capability into this area. And when they started, about three, four years ago, they were ingesting data from about 100,000 vehicles. That has now scaled to over 325,000 vehicles, and they have not had to re-architect their strategy as they grow and scale. So I would say that is one of the key messages that we are taking to market: that we are bringing innovation that spans over 25 years, and evolving it-- >> Modernizing it, basically. >> Modernizing it, and bringing it to newer platforms. >> Well congratulations, I wouldn't call that a pivot, I'd call it an extensibility issue, kind of modernizing the core things. >> Absolutely. >> Thanks for coming on and sharing the BMC perspective inside theCUBE here on BigData NYC. This is theCUBE, I'm John Furrier. Jim Kobielus here in New York City. More live coverage, for three days we'll be here, today, tomorrow and Thursday, at BigData NYC; more coverage after this short break. (calm electronic music) (vibrant electronic music)
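For readers curious what "jobs as code," as Faruqui describes it, can look like in practice, here is a rough sketch of a pipeline expressed as data rather than as scattered shell scripts. The field names below are invented for illustration and are not the actual Control-M Automation API schema.

```python
import json

# A data pipeline expressed as reviewable data rather than Crontab glue.
# Every field name here is a hypothetical stand-in.
pipeline = {
    "name": "vehicle-telemetry-daily",
    "jobs": {
        "ingest":  {"type": "file_transfer", "source": "s3://telemetry/raw"},
        "process": {"type": "spark", "script": "enrich_readings.py"},
        "report":  {"type": "command", "command": "./generate_report.sh"},
    },
    # Dependencies instead of cron timing: process waits on ingest, and so on.
    "flow": [["ingest", "process"], ["process", "report"]],
    "on_failure": {"notify": "data-ops@example.com"},
}

# Because the definition is plain data, it can be linted, diffed, code
# reviewed, and promoted through dev, test, and prod like any other artifact.
print(json.dumps(pipeline, indent=2))
```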

Published Date : Feb 11 2019

SUMMARY :

Brought to you by SiliconANGLE Media who's the Solutions Marketing Manger at BMC, in the big data space now, the AI space now, And that is the issue we've been solving for customers-- So, first of all, you mention some things that never change, and eventually analytics. but now in the modern era that we live in, 'Cause that's really the number one thing, No, and the management's going to How can the Walmart's of the world do that fast, One is that out of the box we provide a lot of left political, left or right. Alt-left, alt-right, I mean, this is software development, and we need you to now go right in-- and focuses on the needs that they have And getting close to a 100 The tipping point is reached. The days of the five year implementation timelines are gone. and the approach we call jobs as code, At the same time, you want bounded experiences at root level And that's exactly the approach I mean it seems like the more development and as the application progresses, kind of the big, the Hadoop guys kind of grew up, Let the Basil answer fist, and focus on that element. it's not a hype market, it's show me the meat of the problems we're solving That needs to be automated to the hilt. to be more productive, to deliver faster results. and I'll just point out that many of the core uses cases like the judgments that a working data engineer makes So the automation question, it's an option question, for the algorithm? doubling down in investments in exactly that area. What's the latest, what should they know? should know about BMC is that all the work kind of modernizing kind of the core things. Thanks for coming and sharing the BMC perspective

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
JimPERSON

0.99+

Jim KobielusPERSON

0.99+

WalmartORGANIZATION

0.99+

BMCORGANIZATION

0.99+

GoogleORGANIZATION

0.99+

NYCLOCATION

0.99+

MicrosoftORGANIZATION

0.99+

oneQUANTITY

0.99+

Basil FaruquiPERSON

0.99+

five yearQUANTITY

0.99+

ten monthsQUANTITY

0.99+

two weeksQUANTITY

0.99+

three monthsQUANTITY

0.99+

six monthsQUANTITY

0.99+

John FurrierPERSON

0.99+

15QUANTITY

0.99+

BasilPERSON

0.99+

HoustonLOCATION

0.99+

HortonworksORGANIZATION

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

MacCOMMERCIAL_ITEM

0.99+

BMC SoftwareORGANIZATION

0.99+

two waysQUANTITY

0.99+

bothQUANTITY

0.99+

tomorrowDATE

0.99+

Midtown ManhattanLOCATION

0.99+

OneQUANTITY

0.99+

ten yearQUANTITY

0.99+

over 25 yearsQUANTITY

0.99+

over 325,000 vehiclesQUANTITY

0.99+

about 300,000 vehiclesQUANTITY

0.99+

third oneQUANTITY

0.99+

three daysQUANTITY

0.99+

about 100,000 vehiclesQUANTITY

0.99+

about 80%QUANTITY

0.98+

BigDataORGANIZATION

0.98+

ThursdayDATE

0.98+

eighty percentQUANTITY

0.98+

todayDATE

0.98+

20 yearsQUANTITY

0.98+

one quick questionQUANTITY

0.98+

single pointQUANTITY

0.98+

Bob BeachPERSON

0.97+

four years agoDATE

0.97+

two use casesQUANTITY

0.97+

one final questionQUANTITY

0.97+

'97DATE

0.97+

InstagramORGANIZATION

0.97+

AgileTITLE

0.96+

New York cityLOCATION

0.96+

About a month agoDATE

0.96+

OraclesORGANIZATION

0.96+

HadoopTITLE

0.95+

about a hundred storesQUANTITY

0.94+

less than 3%QUANTITY

0.94+

2017DATE

0.93+

Glass'GimORGANIZATION

0.92+

aboutQUANTITY

0.92+

firstQUANTITY

0.91+

OpsORGANIZATION

0.91+

HadoopORGANIZATION

0.9+

Max WatsonPERSON

0.88+

100QUANTITY

0.88+

theCUBEORGANIZATION

0.88+

MainframesORGANIZATION

0.88+

NavistarORGANIZATION

0.86+

VMworld 2018 Show Analysis | VMworld 2018


 

(upbeat techno music) >> Live, from Las Vegas, it's theCUBE covering VMworld 2018, brought to you by VMware and its ecosystem partners. >> Okay, welcome back everyone, we're here live in Las Vegas for VMworld 2018 coverage. It's the final analysis, the final interview of three days, 94 interviews, two CUBE sets, amazing production, our ninth year covering VMworld. We've seen the evolution, we've seen the trials and tribulations of VMware and its ecosystem, and as it moves into the modern era, the dynamics are changing. We heard quotes like, "From playing tennis to playing soccer," it's a lot of complicated things, the cloud certainly a big part of it. I'm John Furrier your host, Stu Miniman couldn't be here for the wrap, he had an appointment. I'm here with Dave Vellante and Jim Kobielus, who's with Wikibon and SiliconANGLE and theCUBE team. >> Guys, great job, I want to say thanks to you guys and thanks to the crew on both sets. Amazing production, we're just going to have some fun here. We've analyzed this event ten different ways from Sunday. >> So many people working so hard for such a steady clip as we have here the last three days, amazing. >> Just to give some perspective, I want to just lay out kind of what's going on with theCUBE. I get a lot of people come up and ask me, hey what's going on, you guys are amazing. It's gotten so much bigger, there's two sets. But every year, Dave, we always try, at VMworld, to make VMworld our show to up our value. We always love to innovate, but we got a business to run. We have no outside finance, we have a great set of partners. I'm proud of the team; what Jeff Frick did and the team has done is amazing work. Sonia's here, the whole analyst team's here, our whole team's here. But we have an orchestrated system now; we have the blogging at SiliconANGLE.com and Rob Hof leading the editorial, working on a content immersion program. Jim, you were involved with Rob and Peter and the team, bringing content on the written word side, as fast as possible, the best quality, fast as possible, the analysts getting the pre-briefings and the NDAs, theCUBE team setting it up. Pretty unique formula at full stride right now, only going to get better. New photography, better pictures, better video, better guests, more content. Now with the video clipper tool and our video cloud service, and we did a tech preview of our blockchain, token economics, with a lot of the insiders of VMworld, the senior executives and the community, all with great results; they all loved it, they want to do more. Opening up our platform, opening up the content's been a big success, I want to thank you guys for that. >> And I agree. I should point out that one of the things we have that, say, an agency doesn't offer, I used to be with a large multinational solutions provider doing kind of similar work but in a thought leadership marketing kind of way, let me just state something here: what we've got is unique because we have analysts, market researchers, who know this stuff, at the core of our business model, including, especially, the content immersion program. Peter Burris did a bit, I did a fair amount on this one. You need subject matter experts to curate and really define the themes that the entire editorial team, and I'm including theCUBE people on the editorial team, are basically... so we're all aligned around, we know what the industry is all about, the context, the vendor, and somebody's just curating, making sure that the subject matter is on target with what the community wants to see.
>> So I got to say, first of all, VMware set us up with two stages here, two sets, amazing. They've been unbelievable partners. They really put their money where their mouth is. They allow us to bring in the ecosystem, do our own thing, so that's phenomenal, and our goal is to give back to the community. We had two sets, 94 guests this week, 70 interview segments, hundreds and hundreds of assets coming out, all free. >> It was amazing. >> SiliconANGLE.com, Wikibon.com, theCUBE.net, all free content; it was really incredible. >> It's good free content. >> It's great free content. >> We dropped a true private cloud report with market shares, that's all open and free. Floyer did a piece on VMware's hybrid cloud strategy, near-term momentum, icebergs ahead. Jim Kobielus, first of all, every day here you laid out here's what happened today with your analysis, plus you had previews, plus you have a trip report coming. >> Plus I had a Wikibon research note that had been in the pipeline for about a month, and I held off on publishing until Monday at the show, the AI-ready IT infrastructure piece, because it's so aligned with what's going on. >> And then Paul Gillin and Rob Hof and their team did a series on the future of the data center. Paul Gillin, the walls are tumbling down, I mean that thing got amazing play, check that out. There's just a lot of detail in there. >> And more importantly, that's our content. We're linking, we're open, we're linking to other people's content, from Tech Field Day, what Foskett's doing, to vBrownBag, to linking to stories, sharing, quoting other analysts, Patrick Moorhead, for more insights. Anyone who has content, we can get it in fast, in real time, out to the marketplace; that's our mission and we love doing it, so I think the formula of open is working. >> Yeah, Charles King, this morning I saw Charles, I thanked him; he had great quotes. >> Yeah, great guy. >> He's like, "I love when Paul Gillin calls me." John, talk about the tech preview, because the tech preview was an open community project that's all about bringing the community together, helping them, and helping get content out into the marketplace. >> Well our goal for this event was to use VMworld to preview some of our innovations, and you're going to start to hear more from the SiliconANGLE Media, CUBE and SiliconANGLE team around concepts like the CUBE cloud. We have technology we're going to start to surface and bring out to the marketplace, and we want to make it free and open and allow people to use and share in what we do, and make theCUBE a community brand and a community concept, and continue this mission and treat theCUBE like an upstream project. Let's all co-create together, because the downstream benefits in communities are significantly better when there's co-creation and self-governance. Highest quality content, from highly reputable people, whether it's news, analysis, opinion, commentary, pontification, we love it all; let the content stand on its own and let the benefits come down, so if you're a sponsor, if you're a thought leader, you're a news maker, you're an analyst, we love to do that, and we love talking with the executives, so that's great. The tech preview is about showcasing how we want to create a new network. As communities are growing and changing, VMware's community is robust, Dave, it's its own subnet, but as the world grows in those multiple clouds, Azure has a community, Google has a community, and people have been trained to sit in these silos, okay? >> Mm-hmm.
>> We go to so many events and we engage with so many communities; we want to connect them all through the CUBE coin concept of blockchain, where if someone's in a community, they can download the wallet and join theCUBE network. Today there's no mechanism to join theCUBE network. You can go to theCUBE.net and subscribe, you can go to YouTube and subscribe, you can get e-mail marketing, but that's not acceptable to us; we want a subscribe button that's going to add value, so people who contribute value can capture it. That was the tech preview: it's a blockchain-based community. We're calling it the Open Community Project. >> Wow. >> The Open Community Project is the first upstream content software model that's free to use, where if the community uses it, they can capture the value that they create. It's a new concept and it's radical and revolutionary. >> In some ways we're analogous to what VMware has evolved into, where they bridge clouds, and they say that, "We bridge clouds." We bridge communities, all around thought leadership, and provide a forum for conversations that bridge the various siloed communities. >> Well Jim, you and I talked about this, we've seen the movie in media. In the old school media days of search engine marketing and e-mail marketing and starting a blog, which we were part of, the blogging was the first generation of the sharing economy, where you linked to other bloggers and shared your traffic, because you were working together against the mainstream media. >> It's my major keyboard, by the way, I love blogs. >> And if you were funded you had to build an audience. Audience development, audience development. Not anymore; the audience is already there. They are now in networks, so the new ethos, like blogging, is joining networks and not making it an ownership, lock-in walled garden. So the new ethos is not just link sharing, it's community sharing, co-creation, and merging networks. This is something that we're seeing across all event communities, and content is the nutrient and the glue for those communities. >> You got multi-cloud, you got multi content networks. Making it together, it's exciting. I mean there were some people that I saw this week, I mean Alan Cohen as a guest host, amazingly articulate, super smart guy, plugged in to Silicon Valley. Christophe Bertrand, analyst at ESG, a great analysis today on theCUBE; bringing those guys in, nominate them into the community for the Open Community Project. >> You know what I liked, Dave, was also that Jeff Frick, Sonia and Gabe were all at the front there, greeting the guests. We had great speakers, it all worked. The stages worked, but it's for the community, by the community, this is the model, right? This is what we want to do, and it was a lot of fun. I had a lot of great interviews, from Andy Bechtolsheim, Michael Dell, Pat Gelsinger, to practitioners and to the vendors and suppliers, all co-creating here in real time; it was really a lot of fun. >> Oh yes, amen. >> Well Dave, thanks for everything. Thanks to the crew, great job everybody. >> Awesome. >> Jim, well done. >> Thanks to Stu Miniman, Peter Burris and all the guests, Justin Warren, John Troyer, guest host Alan Cohen, great community participation. This is theCUBE signing off from Las Vegas, this is VMworld 2018 final analysis, thanks for watching. (upbeat techno music)

Published Date : Aug 29 2018

SUMMARY :

covering VMworld 2018, brought to you and as it moves into the modern era, and thanks to the crew on both sets. as we have here the last three days, amazing. and the team has done amazing work. And I agree, I should point out that one of the things and our goal is to give back to the community. all free content was really incredible. near to momentum, ice bergs ahead. at the show, the AI ready IT infrastructure Paul Gillan, the walls are tumbling down, and we love doing it so I think the formula of open this morning I saw Charles, I thanked him for, because the tech preview was an open community project and allow people to use and share in what we do We're calling it the Open Community Project. Open Community Project is the first that bridge the various siloed communities. In the old school media days and search engine marketing is the nutrients and the glue for those communities. for the Open Community Project. by the community, this is the model, right? Thanks for the crew, great job everybody. Thanks to Stu Miniman, Peter Burris and all the guests,

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Andy BechtolsheimPERSON

0.99+

Justin WarrenPERSON

0.99+

Christophe BertrandPERSON

0.99+

RobPERSON

0.99+

Alan CohenPERSON

0.99+

Jeff FrickPERSON

0.99+

Jim KobeliusPERSON

0.99+

Peter BurrisPERSON

0.99+

Michael DellPERSON

0.99+

Jim KobielusPERSON

0.99+

Pat GellsingerPERSON

0.99+

DavePERSON

0.99+

John TroyerPERSON

0.99+

Paul GillanPERSON

0.99+

PeterPERSON

0.99+

Rob HofPERSON

0.99+

Dave VallentePERSON

0.99+

Stu MinimanPERSON

0.99+

Patrick MooreheadPERSON

0.99+

JimPERSON

0.99+

John FurrierPERSON

0.99+

94 guestsQUANTITY

0.99+

94 interviewsQUANTITY

0.99+

Silicon ValleyLOCATION

0.99+

Las VegasLOCATION

0.99+

two setsQUANTITY

0.99+

CharlesPERSON

0.99+

Charles KingPERSON

0.99+

VMwareORGANIZATION

0.99+

SundayDATE

0.99+

Peter BoroughsPERSON

0.99+

ninth yearQUANTITY

0.99+

hundredsQUANTITY

0.99+

JohnPERSON

0.99+

VMworldORGANIZATION

0.99+

both setsQUANTITY

0.99+

three daysQUANTITY

0.99+

FloyerPERSON

0.99+

CUBEORGANIZATION

0.99+

GoogleORGANIZATION

0.99+

70 interview segmentsQUANTITY

0.98+

ESGORGANIZATION

0.98+

SoniaPERSON

0.98+

YouTubeORGANIZATION

0.98+

MondayDATE

0.98+

TodayDATE

0.98+

firstQUANTITY

0.98+

WikibonORGANIZATION

0.97+

two stagesQUANTITY

0.97+

theCUBEORGANIZATION

0.97+

SiliconANGLEORGANIZATION

0.97+

theCUBE.netOTHER

0.97+

VMworld 2018EVENT

0.95+

GabePERSON

0.95+

ten different waysQUANTITY

0.95+

todayDATE

0.95+

this weekDATE

0.94+

oneQUANTITY

0.93+

Wikibon.comORGANIZATION

0.93+

first generationQUANTITY

0.91+

AzureTITLE

0.88+

about a monthQUANTITY

0.86+

SilconANGLE.comOTHER

0.86+

VMworldEVENT

0.83+

Rob Bearden, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Rob Bearden. He is the CEO of Hortonworks. So thanks so much for coming on theCUBE again, Rob. >> Thank you for having us. >> So you just got off of the keynote on the main stage. The big theme is really about modern data architecture. So we're going to have this modern data architecture. What is it all about? How do you think about it? What's your approach? And how do you walk customers through this process? >> Well, there's a lot of moving parts in enabling a modern data architecture. One of the first steps in what we're trying to do is unlock the siloed transactional applications, and get that data into a central architecture so you can get real time insights around the inclusive dataset. But what we're really trying to accomplish then, within that modern data architecture, is to bring all types of data, whether it be real time streaming data, whether it be sensor data, IoT data, whether it be data that's coming from a connected core across the network, and to be able to bring all that data together in real time, and give the enterprise the ability to take best-in-class action so that you get a very prescriptive outcome of what you want. So if we bring that data under management from point of origination, out on the edge, and then have the platforms that move that through its entire lifecycle, and that's our HDF platform, it gives the customer the ability, after they capture it at the edge, to move it, and then to process it as an event happens, a condition changes, various conditions come together, to take the exact action that you want to see performed against that, and then bring it to rest, and that's where our HDP platform comes into play, where then all that data can be aggregated so you can have a holistic insight and have real time interactions on that data. But then it becomes about deploying those datasets and workloads on the tier that's most economically and architecturally pragmatic. So if that's on-prem, we make sure that we are architected for that on-prem deployment, or private cloud, or even across multiple public clouds simultaneously, and give the enterprise the ability to support each of those native environments. And so we think hybrid cloud architecture is really where the vast majority of our customers, today and in the future, are going to want to be able to run and deploy their applications and workloads. And that's where our DataPlane Service offering gives them the ability to have that hybrid architecture and the architectural latitude to move workloads and datasets across each tier, transparent to whatever storage file format they use or where that application is, and we provide all the tooling to mask the complexity of doing that, and then we ensure that it has one common security framework, one common governance through its entire lifecycle, and one management platform to handle that entire data lifecycle.
And that's the modern data architecture: to be able to bring all data under management, all types of data under management, and manage that in real time through its lifecycle til it comes at rest, and deploy that across whatever architecture tier is most appropriate financially and from a performance standpoint, on-cloud or on-prem. >> Rob, this morning at the keynote here on day one at DataWorks San Jose, you presented this whole architecture, which you described in the context of what you call hybrid clouds, to enable connected communities, and with HDP, Hortonworks Data Platform 3.0, as one of the prime announcements, you brought containerization into the story. Could you connect those dots: containerization, connected communities, and HDP 3.0? >> Well, HDP 3.0 is really the foundation for enabling that hybrid architecture natively, and what it's done is separated the storage from the compute, and so now we have the ability to deploy those workloads via a container strategy across whichever tier makes the most sense, and to move those applications and datasets around, and to be able to leverage each tier in the deployment architectures that are most pragmatic. And then what that lets us do is be able to bring together all of the different data types, whether it be customer data, supply chain data, product data. So imagine an industrial piece of equipment, say an airplane, flying from Atlanta, Georgia to London, and you want to be able to make sure you really understand how well each component is performing, so that if that plane is going to need service when it gets there, it doesn't miss the turnaround and leave 300 passengers stranded or delayed, right? Now with our Connected platform, we have the ability to take every piece of data from every component that's generated, see that in real time, and let the airlines act on it in real time. >> Data lineage, essentially. >> And ensure that we know every person that touched it and looked at that data through its entire lifecycle, from the ground crew to the pilots to the operations team to the service folks on the ground to the reservation agents, and we can prove, if somehow that data has been breached, that we know exactly at what point it was breached and who did or didn't get to see it, and can prevent that because of the security models that we put in place. >> And that relates to compliance and mandates such as the General Data Protection Regulation, GDPR, in the EU. At DataWorks Berlin a few months ago you laid out, Hortonworks laid out, announced a new product called the Data Steward Studio to enable GDPR compliance. Can you give our listeners who may not have been following the Berlin event a bit of an update on Data Steward Studio, how it relates to the whole data lineage set of requirements that you're describing, and then, going forward, what is Hortonworks's roadmap for supporting the full governance lifecycle for the connected community, from data lineage through, like, model governance and so forth? Can you just connect a few dots that will be helpful? >> Absolutely. What's important, certainly driven by GDPR, is the requirement to be able to prove that you understand who's touched that data and who has not had access to it, and that you ensure that you're in compliance with the GDPR regulations, which are significant, but essentially what they say is you have to protect the personal data and attributes of that data of the individual.
And so what's very important is that you've got to have the systems that not just secure the data, but understand who has had access to it at any point in time that you've ever maintained that individual's data. And so it's not just about when you've had a transaction with that individual; it's the rest of the history that you've kept, or the multiple datasets that you may try to correlate to try to expand the relationship with that customer, and you need to make sure not only that you've secured their data, but that you're protecting and governing who has access to it and when. And, as importantly, that you can prove, in the event of a suspected breach, that you had control of that data and who did or did not access it, because if you can't prove that it was secure, and that no one who wasn't supposed to access it breached it, you can be opened up to hundreds of thousands of dollars, or even multiple millions of dollars, of fines just because you can't prove that it was not accessed. And that's what the variety of our platforms, you mentioned Data Steward Studio, is part of. DataPlane is one of the capabilities that gives us that ability. The core engine that does that is Atlas, and that's the open source governance platform that we developed through the community, which really drives all the capabilities for governance that move through each of our products, HDP and HDF. And then, of course, DataPlane and Data Steward Studio take advantage of that as data moves and replicates, and manage that process for us. >> One of the things that we were talking about before the cameras were rolling was this idea of data-driven business models, how they are disrupting current contenders, new rivals coming on the scene all the time. Can you talk a little bit about what you're seeing, and what are some of the most exciting, and maybe also some of the most threatening, things that you're seeing? >> Sure. In the traditional legacy enterprise, it's very procedurally driven. You think about classic core ERP: it's worked very hard to have a very rigid, very structured, procedural order-to-cash cycle that doesn't have a great deal of flexibility. It takes you through a design process, it builds product, then you sell product to a customer, then you service that customer, and then you learn from that transaction different ways to automate or improve efficiencies in the supply chain. But it's very procedural, very linear. And in the new world of connected data models, you want to bring transparency and real-time understanding and connectivity between the enterprise, the customer, the product, and the supply chain, so that you can take real-time, best-practice action. So, for example, you understand how well your product is performing. Is your customer using it correctly? Are they frustrated with it? Are they using it in the patterns and with the frequency that they should be if they are going to expand their use and buy more, and if they're not, how do we engage in that cycle? How do we understand if they're going through a review and another buying cycle for something similar that may not be with you, for whatever reason? And when we have real-time visibility into our customers' interactions and understand our product's performance through its entire lifecycle, then we can bring real-time efficiency, linking those together with our supply chain, into the various relationships we have with our customers.
To do that, it requires the modern data architecture: bringing data under management from the point it originates, whether it's from the product, or the customer interacting with the company, or the customer interacting potentially with our ecosystem partners, mutual partners, and then letting best-practice supply chain techniques make sure that we're bringing the highest level of service and support to that entire lifecycle. And when we bring data under management, manage it through its lifecycle, have the historical view at rest, and leverage that across every tier, that's when we get this high velocity, deep transparency, and connectivity between each of the constituents in the value chain, and that's what our platforms give them the ability to do. >> Not only your platform; you guys have been in business now for, I think, seven years or so, and you've shifted, in the minds of many and in your own strategy, from being the premier data-at-rest company in terms of a Hadoop platform to being one of the premier data-in-motion companies. Is that really where you're going, to be more of a completely streaming-focused solution provider in a multi-cloud environment? And I hear a lot of Kafka in your story now; it's like, oh yeah, that's right, Hortonworks is big on Kafka. Can you give us just a quick sense of how you're making that shift towards low-latency, real-time streaming, big data, or small data for that matter, with embedded analytics and machine learning? >> So, we have evolved from certainly being the leader in global data platforms, with all the work that we do collaboratively in and through the community to make Hadoop an enterprise-viable data platform that has the ability to run mission-critical workloads and apps at scale, ensuring that it has all the enterprise facilities from security and governance and management. But you're right, we have expanded our footprint aggressively. And we saw the opportunity to actually create more value for our customers by giving them the ability not to have to wait until they bring data under management to gain an insight, because in that case they can only be reactive, post-event, post-transaction. We want to give them the ability to shift their business model to being interactive: pre-event, pre-condition. The way to do that, we learned, was to be able to bring the data under management from the point of origination, and that's what we use MiNiFi and NiFi for, and then HDF, to move it through its lifecycle. And to your point, we have the intellect, we have the insight, and then we have the ability to process toward the best-in-class outcome based on the variables we know we're trying to solve for, as it's happening. >> And there's the word, the phrase, ACID, which of course is a transactional data paradigm; I hear that all over your story now in streaming. So, what you're saying is it's a completely enterprise-grade streaming environment, from end to end, for the new era of edge computing. Would that be a fair way of-- >> It's very much so. And our model and strategy has always been to bring the other best-in-class engines for what they do well for their particular dataset. A couple of examples of that: one, you brought up Kafka; another is Spark. And they do what they do really well.
But what we do is make sure that they fit inside an overall data architecture that then gives them access to a much broader central dataset that goes from point of origination to point of rest on a whole central architecture, and that they then benefit from our security, governance, and operations model being able to manage those engines. So what we're trying to do is eliminate the silos for our customers, and the siloed datasets that just do particular functions. We give them the ability to have an enterprise modern data architecture; we manage the things that bring it forward for the enterprise to have the modern data-driven business models, by bringing the governance, the security, the operations management, and ensuring that those workflows go from beginning to end seamlessly. >> Do you, go ahead. >> So I was just going to ask about the customer concerns. So here you are, you've now given them this ability to make these real-time changes; what's sort of next? What's on their mind now, and what do you see as the future of what you want to deliver next? >> First and foremost, we've got to make sure we get this right, and we really bring this modern data architecture forward, and make sure that we truly have the governance correct, the security models correct, one pane of glass to manage this. And really enable that hybrid data architecture, and let them leverage the cloud tier where it's architecturally and financially pragmatic to do it, and give them the ability to leg into a cloud architecture without risk of either being locked in or misunderstanding where the lines of demarcation of workloads or datasets are, and not getting the economies or efficiencies they should. And we solved that with DataPlane. So we're working very hard with the community, with our ecosystem and strategic partners, to make sure that we're enabling the ability to bring each type of data from any source and deploy it across any tier, with a common security, governance, and management framework. So then, what's next: now that we have this high velocity of data through its entire lifecycle on one common set of platforms, we can start enabling the modern applications to function. And we can go look back into some of the legacy technologies that are very procedurally based and are dependent on a transaction or an event happening before they can run their logic to get an outcome, which grounds the customer in post-event activity. We want to make sure that we're bringing that kind of, for example, supply chain functionality to the modern data architecture, so that we can do real-time inventory allocation based on the patterns our customers are in, whether it's how they're using the product, or frustrations they've had, or successes they've had. And we know through artificial intelligence and machine learning that there's a high probability that not only will they buy or use or expand their consumption of whatever they have of our product or service, but they will probably do these other things as well if we do those things. >> Predictive logic, as opposed to procedural. Yes, AI. >> And very much so. And so what's next will be bringing those modern applications on top of this, applications that become very predictive and enabling versus very procedural and post-transaction. We're a little ways downstream. That's looking out. >> That's next year's conference. >> That's probably next year's conference. >> Well, Rob, thank you so much for coming on theCUBE, it's always a pleasure to have you.
>> Thank you both for having us, and thank you for being here, and enjoy the summit. >> We're excited. >> Thank you. >> We'll do. >> I'm Rebecca Knight for Jim Kobielus. We will have more from DataWorks Summit just after this. (upbeat music)

Published Date : Jun 20 2018

ENTITIES

Entity | Category | Confidence
James Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Rob Bearden | PERSON | 0.99+
Jim Kobielus | PERSON | 0.99+
London | LOCATION | 0.99+
300 passengers | QUANTITY | 0.99+
San Jose | LOCATION | 0.99+
Rob | PERSON | 0.99+
Silicon Valley | LOCATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
seven years | QUANTITY | 0.99+
hundreds of thousands of dollars | QUANTITY | 0.99+
San Jose, California | LOCATION | 0.99+
each component | QUANTITY | 0.99+
GDPR | TITLE | 0.99+
DataWorks Summit | EVENT | 0.99+
one | QUANTITY | 0.99+
One | QUANTITY | 0.98+
millions of dollars | QUANTITY | 0.98+
Atlas | TITLE | 0.98+
first steps | QUANTITY | 0.98+
HDP 3.0 | TITLE | 0.97+
One pane | QUANTITY | 0.97+
both | QUANTITY | 0.97+
DataWorks Summit 2018 | EVENT | 0.97+
First | QUANTITY | 0.96+
next year | DATE | 0.96+
each | QUANTITY | 0.96+
DataPlane | TITLE | 0.96+
theCUBE | ORGANIZATION | 0.96+
Hadoop | TITLE | 0.96+
DataWorks | ORGANIZATION | 0.95+
Spark | TITLE | 0.95+
today | DATE | 0.94+
EU | LOCATION | 0.93+
this morning | DATE | 0.91+
Atlanta, | LOCATION | 0.91+
Berlin | LOCATION | 0.9+
each type | QUANTITY | 0.88+
Global Data Protection Regulation GDPR | TITLE | 0.87+
one common | QUANTITY | 0.86+
few months ago | DATE | 0.85+
NiFi | ORGANIZATION | 0.85+
Data Platform 3.0 | TITLE | 0.84+
each tier | QUANTITY | 0.84+
Data Studio | ORGANIZATION | 0.84+
Data Studio | TITLE | 0.83+
day one | QUANTITY | 0.83+
one management platform | QUANTITY | 0.82+
MiNiFi | ORGANIZATION | 0.82+
San | LOCATION | 0.71+
DataPlane | ORGANIZATION | 0.69+
Kafka | TITLE | 0.67+
Encore ERP | TITLE | 0.66+
one common set | QUANTITY | 0.65+
Data Steward Studio | ORGANIZATION | 0.65+
HDF | ORGANIZATION | 0.59+
Georgia | LOCATION | 0.55+
announcements | QUANTITY | 0.51+
Jose | ORGANIZATION | 0.47+

Arun Murthy, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost, Jim Kobielus. We're joined by Aaron Murphy, Arun Murthy, sorry. He is the co-founder and chief product officer of Hortonworks. Thank you so much for returning to theCUBE. It's great to have you on. >> Yeah, likewise. It's been a fun time getting back, yeah. >> So you were on the main stage this morning in the keynote, and you were describing the journey, the data journey, that so many customers are on right now, and you were talking about the cloud, saying that the cloud is part of the strategy but it really needs to fit into the overall business strategy. Can you describe a little bit about your approach to that? >> Absolutely, and the way we look at this is we help customers leverage data to actually deliver better capabilities, better services, better experiences to their customers, and that's the business we are in. Now, with that, obviously we look at cloud as a really key part of the overall strategy in terms of how you want to manage data on-prem and on the cloud. We kind of joke that we ourselves live in a world of real-time data. We just live in it, and data is everywhere. You might have trucks on the road, you might have drones, you might have sensors, and you have it all over the world. At that point, we've kind of got to a point where enterprises understand that they'll manage all the infrastructure, but in a lot of cases it will make a lot more sense to actually lease some of it, and that's the cloud. It's the same way, if you're delivering packages, you don't go buy planes and lay out roads; you go to FedEx and actually let them handle that for you. That's kind of what the cloud is. So that is why we really fundamentally believe that we have to help customers leverage infrastructure wherever it makes sense pragmatically, both from an architectural standpoint and from a financial standpoint, and that's kind of why we talked about how your cloud strategy is part of your data strategy, which is actually fundamentally part of your business strategy. >> So how are you helping customers to leverage this? What is on their minds, and what's your response? >> Yeah, it's really interesting. Like I said, cloud is cloud, and infrastructure management is certainly something that's at the foremost, at the top of the mind, for every CIO today. And what we've consistently heard is they need a way to manage all this data and all this infrastructure in a hybrid, multi-tenant, multi-cloud fashion. Because in some geos you might not have your favorite cloud provider. You know, parts of Asia is a great example; you might have to use one of the Chinese clouds. You go to parts of Europe, especially with things like GDPR, the data residency laws and so on, and you have to be very, very cognizant of where your data gets stored and where your infrastructure is present. And that is why we fundamentally believe it's really important to give enterprises a fabric with which they can manage all of this, and hide the details of all of the underlying infrastructure from them as much as possible. >> And that's DataPlane Services. >> And that's DataPlane Services, exactly. >> The Hortonworks DataPlane Services we launched in October of last year. Actually I was on theCUBE talking about it back then too.
We see a lot of interest, a lot of excitement around it, because now they understand that, again, this doesn't mean that we drive it down to the least common denominator. It is about helping enterprises leverage the key differentiators of each of the cloud providers' products. For example, Google, with which we announced a partnership: they are really strong on AI and ML. So if you are running TensorFlow and you want to deal with things like Kubernetes, GKE is a great place to do it. And, for example, you can now go to Google Cloud and get TPUs, which work great for TensorFlow. Similarly, a lot of customers run on Amazon for a bunch of the operational stuff, Redshift as an example. So in the world we live in, we want to help the CIO leverage the best pieces of the cloud, but then give them a consistent way to manage and govern that data. We were joking on stage that IT has just about learned how to deal with Kerberos and Hadoop, and now we're telling them, "Oh, go figure out IAM on Google," which is also IAM on Amazon, but they are completely different. The only thing that's consistent is the name. So I think we have a unique opportunity, especially with the open source technologies like Atlas, Ranger, Knox and so on, to be able to draw a consistent fabric over this for security and governance, and help the enterprise leverage the best parts of the cloud to put a best-fit architecture together, which also happens to be a best-of-breed architecture. >> So the fabric is everything you're describing: all the Apache open source projects, in which Hortonworks is a primary committer and contributor, are able to apply schemas and policies and metadata and so forth across this distributed, heterogeneous fabric of public and private cloud segments within a distributed environment. >> Exactly. >> That's increasingly being containerized, in terms of the applications, for deployment to edge nodes. Containerization is a big theme in HDP 3.0, which you announced at this show. >> Yeah. >> So, if you could give us a quick sense for how that containerization capability plays into more of an edge focus for what your customers are doing. >> Exactly, great point. And again, the core parts of the fabric are the open source projects, but we've also done a lot of net new innovation with DataPlane, which, by the way, is also open source. It's a new product and a new platform that you can actually leverage to lay out over the open source ones you're familiar with. And again, like you said, containerization is what is actually driving the fundamentals of this. The details matter: at the scale at which we operate, we're talking about thousands of nodes, terabytes of data, and the details really matter, because a 5% improvement at that scale leads to millions of dollars in optimization for capex and opex. So all of those details are being fueled and driven by the community, which is kind of what we've delivered with HDP 3.0. And the key ones, like you said, are containerization, because now we can actually get complete agility in terms of how you deploy the applications. You get isolation not only at the resource management level with containers, but you also get it at the software level, which means if two data scientists wanted to use different versions of Python or Scala or Spark or whatever it is, they get that consistently and holistically, so that now they can actually go from the test/dev cycle into production in a completely consistent manner.
So that's why containers are so big, because now we can actually leverage them across the stack, and with things like MiNiFi showing up, we can actually-- >> Define MiNiFi before you go further. What is MiNiFi, for our listeners? >> Great question. Yeah, so we've always had NiFi-- >> Real-time. >> Real-time data flow management, and NiFi was still sort of within the data center. What MiNiFi is, is actually a really, really small layer, a small, thin library if you will, that you can throw on a phone, a doorbell, a sensor, and that gives you all the capabilities of NiFi, but at the edge. >> Mmm. >> Right? And it's actually not just data flow; what is really cool about NiFi is it's actually command and control. So you can actually do bidirectional command and control, so you can actually change, in real time, the flows you want, the processing you do, and so on. So what we're trying to do with MiNiFi is actually not just collect data from the edge, but also push the processing as much as possible to the edge, because we really do believe a lot more processing is going to happen at the edge, especially with ASICs and so on coming out. There will be custom hardware that you can throw out there and essentially leverage at the edge to actually do this processing. And we believe, you know, we want to do that even at the cost of the data not actually landing at rest, because at the end of the day we're in the insights business, not in the data storage business. >> Well, I want to get back to that. You were talking about innovation and how so much of it is driven by the open source community, and you're a veteran of the big data open source community. How do we maintain that? How does that continue to be the fuel? >> Yeah, and a lot of it starts with just being consistent. From day one, James was around back then, in 2011 when we started, we've always said, "We're going to be open source," because we fundamentally believed that the community is going to out-innovate any one vendor, regardless of how much money they have in the bank. So we really do believe that's the best way to innovate, mostly because there is a sense of shared ownership of the product. It's not just one vendor throwing some code out there and trying to shove it down the customer's throat. And we've seen this over and over again, right? We talk about a lot of the DataPlane stuff coming from Atlas and Ranger and so on; three years ago, none of these existed. These actually came from the fruits of the collaboration with the community, with actually some very large enterprises being a part of it. So it's a great example of how we continue to drive it, because we fundamentally believe that that's the best way to innovate, and we continue to believe so. >> Right. And the community, the Apache community as a whole, so many different projects. For example, in streaming, there is Kafka, >> Okay. >> and there are others that address a core set of common requirements, but in different ways, >> Exactly. >> supporting different approaches, for example, doing streaming with stateless transactions and so forth, or stateless semantics and so forth. Seems to me that Hortonworks is shifting towards being more of a streaming-oriented vendor, away from data at rest. Though, I should say, HDP 3.0 has got great scalability and storage efficiency capabilities baked in.
I wonder if you could just break down a little bit what the innovations or enhancements are in HDP 3.0 for your core customers, which is most of them, who are managing massive multi-terabyte, multi-petabyte, distributed, federated big data lakes. What's in HDP 3.0 for them? >> Oh, lots. Again, like I said, we obviously spend a lot of time on the streaming side, because that's where we see it; we live in a real-time world. But again, we don't do it at the cost of our core business, which continues to be HDP. And as you can see, the community continues to drive it; we talked about containerization, a massive step up for the Hadoop community. We've also added support for GPUs; again, think about true at-scale machine learning. >> Graphics processing units, >> Graphical-- >> AI, deep learning. >> Yeah, it's huge. Deep learning, TensorFlow and so on really, really need a custom sort of GPU, if you will. So that's coming; that's in HDP 3.0. We've added a whole bunch of scalability improvements with HDFS. We've added federation, because now you can go over a billion files, a billion objects, in HDFS. We also added capabilities for-- >> But you indicated yesterday, when we were talking, that very few of your customers need that capacity yet, but you think they will, so-- >> Oh, for sure. Again, part of this is, as we enable more sources of data in real time, that's the fuel which drives it, and that was always the strategy behind the HDF product. It was about, can we leverage the synergies between the real-time world, feed that into what you do today in your classic enterprise with data at rest, and that is what is driving the necessity for scale. >> Yes. >> Right. We've done that. We've spent a lot of work, again, on lowering the total cost of ownership, the TCO, so we added erasure coding. >> What is that exactly? >> Yeah, so erasure coding is a classic sort of storage concept. You know, HDFS has always been three replicas, for redundancy, fault tolerance, and recovery. Now, it sounds okay having three replicas because it's cheap disk, right? But when you start to think about our customers running 70, 80, a hundred terabytes of data, those three replicas add up, because you've now gone from 80 terabytes of effective data to actually a quarter of a petabyte in terms of raw storage. So now what we can do with erasure coding is, instead of storing the three blocks, we actually store parity. We store the encoding of it, which means we can actually go down from three to, like, two, one and a half, whatever we want to do. So, if we can get from three blocks to one and a half, especially for your core data, >> Yeah. >> the ones you're not accessing every day, it results in massive savings in terms of your infrastructure costs. And that's kind of what we're in the business of doing: helping customers do better with the data they have, whether it's on-prem or on the cloud. We want to help customers be comfortable getting more data under management, along with security and a lower TCO. The other sort of big piece I'm really excited about in HDP 3.0 is all the work that's happened in the Hive community for what we call the real-time database. >> Yes. >> As you guys know, you follow the whole SQL-on-Hadoop space. >> And Hive has changed a lot in the last several years; this is very different from what it was five years ago.
>> The only thing that's the same from five years ago is the name. (laughing) >> So again, the community has done a phenomenal job, really taking what we used to call a SQL engine on HDFS and driving it forward. Now, with Hive 3, which is part of HDP 3.0, it's a full-fledged database. It's got full ACID support. In fact, the ACID support is so good that writing ACID tables is at least as fast as writing non-ACID tables now. And you can do that not only on-- >> Transactional database. >> Exactly. Now, not only can you do it on-prem, you can do it on S3. So you can actually drive the transactions through Hive on S3. We've done a lot of work, you were there yesterday when we were talking about some of the performance work we've done with LLAP and so on, to actually give consistent performance both on-prem and in the cloud, and this is a lot of effort, simply because the performance characteristics you get from the storage layer with HDFS versus S3 are significantly different. So now we have been able to bridge those with things like LLAP. We've done a lot of work to sort of enhance the security model around it, governance and security. So now you get things like column-level masking, row-level filtering, all the standard stuff that you would expect, and more, from an enterprise data warehouse. We talked to a lot of our customers; they're doing literally tens of thousands of views, because they don't have the capabilities that exist in Hive now. >> Mmm-hmm. And I'm sitting here kind of being amazed that for an open source set of tools to have the best security and governance at this point is pretty amazing, coming from where we started off. >> And it's absolutely essential for GDPR compliance, and compliance with HIPAA and every other mandate and sensitivity that requires you to protect personally identifiable information, so very important. So in many ways Hortonworks has one of the premier big data catalogs for all manner of compliance requirements that your customers are chasing. >> Yeah, and James, you wrote about it in the context of Data Steward Studio, which we introduced. >> Yes. >> You know, things like consent management, having-- >> A consent portal. >> A consent portal, >> in which the customer can indicate the degree to which >> Exactly. >> they require controls over the management of their PII, possibly to be forgotten, and so forth. >> Yeah, the right to be forgotten, and it's consent even for analytics. Within the context of GDPR, you have to allow the customer to opt out of analytics, of them being part of an analytic itself, right? >> Yeah. >> So things like those are now something we enable through the enhanced security models that are done in Ranger. So now, sort of the really cool part of what we've done with GDPR is that we can get all these capabilities on existing data and existing applications by just adding a security policy, not rewriting them. It's a massive, massive, massive deal, which I cannot tell you how much customers are excited about, because they now understand. They were sort of freaking out that, I have to go to 30, 40, 50 thousand enterprise apps and change them to actually provide consent and the right to be forgotten. The fact that you can do that now by changing a security policy with Ranger is huge for them. >> Arun, thank you so much for coming on theCUBE. It's always so much fun talking to you. >> Likewise. Thank you so much. >> I learned something every time I listen to you. >> Indeed, indeed.
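Murthy's erasure-coding arithmetic from a few exchanges back is easy to verify. The sketch below compares 3x replication against a Reed-Solomon RS(6,3) layout (six data blocks plus three parity blocks, the commonly cited HDFS default erasure-coding policy); the 80-terabyte figure is the one used in the conversation.

```python
# Checking the erasure-coding arithmetic discussed above. With 3x replication,
# raw storage is three times the effective data (one data block plus two
# redundant copies); with Reed-Solomon RS(6,3), six data blocks plus three
# parity blocks, the overhead drops to 1.5x. Data sizes are illustrative.

def raw_storage_tb(effective_tb: float, data_blocks: int, parity_blocks: int) -> float:
    """Raw terabytes stored for a given effective (logical) size."""
    return effective_tb * (data_blocks + parity_blocks) / data_blocks

effective = 80.0  # terabytes of effective data, as in the conversation

replicated = raw_storage_tb(effective, data_blocks=1, parity_blocks=2)  # 3 copies
erasure    = raw_storage_tb(effective, data_blocks=6, parity_blocks=3)  # RS(6,3)

print(f"3x replication: {replicated:.0f} TB raw")  # 240 TB, ~a quarter petabyte
print(f"RS(6,3):        {erasure:.0f} TB raw")     # 120 TB, half the footprint
```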
I'm Rebecca Knight for James Kobielus; we will have more from theCUBE's live coverage of DataWorks just after this. (Techno music)
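For readers who want the MiNiFi idea from this conversation in code form, the sketch below mimics a thin edge agent that both filters and forwards readings and accepts bidirectional command-and-control updates from the center. It only illustrates the concept; it is not the Apache MiNiFi implementation, and every name in it is invented.

```python
# A toy illustration of the MiNiFi concept defined above: a thin agent at the
# edge that pushes sensor data toward the data center and accepts
# command-and-control updates (e.g., a changed filtering rule) flowing back.
import queue
from typing import Dict

class EdgeAgent:
    def __init__(self) -> None:
        self.flow_config: Dict[str, float] = {"min_temp_to_report": 0.0}
        self.outbound: "queue.Queue[dict]" = queue.Queue()

    def apply_command(self, command: Dict[str, float]) -> None:
        """Bidirectional control: the center can change the flow in place."""
        self.flow_config.update(command)

    def on_reading(self, device_id: str, temp_c: float) -> None:
        """Push processing to the edge: filter before anything leaves."""
        if temp_c >= self.flow_config["min_temp_to_report"]:
            self.outbound.put({"device": device_id, "temp_c": temp_c})

agent = EdgeAgent()
agent.on_reading("doorbell-1", 21.0)               # reported
agent.apply_command({"min_temp_to_report": 50.0})  # pushed from the center
agent.on_reading("doorbell-1", 21.0)               # now filtered at the edge
print(agent.outbound.qsize())  # 1
```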

Published Date : Jun 19 2018

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
James | PERSON | 0.99+
Aaron Murphy | PERSON | 0.99+
Arun Murphy | PERSON | 0.99+
Arun | PERSON | 0.99+
2011 | DATE | 0.99+
Google | ORGANIZATION | 0.99+
5% | QUANTITY | 0.99+
80 terabytes | QUANTITY | 0.99+
FedEx | ORGANIZATION | 0.99+
two | QUANTITY | 0.99+
Silicon Valley | LOCATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Arun Murthy | PERSON | 0.99+
HortonWorks | ORGANIZATION | 0.99+
yesterday | DATE | 0.99+
San Jose, California | LOCATION | 0.99+
three replicas | QUANTITY | 0.99+
James Kobeilus | PERSON | 0.99+
three blocks | QUANTITY | 0.99+
GDPR | TITLE | 0.99+
Python | TITLE | 0.99+
Europe | LOCATION | 0.99+
millions of dollars | QUANTITY | 0.99+
Scala | TITLE | 0.99+
Spark | TITLE | 0.99+
theCUBE | ORGANIZATION | 0.99+
five years ago | DATE | 0.99+
one and a half | QUANTITY | 0.98+
Enprise | ORGANIZATION | 0.98+
three | QUANTITY | 0.98+
Hive 3 | TITLE | 0.98+
Three years ago | DATE | 0.98+
both | QUANTITY | 0.98+
Asia | LOCATION | 0.97+
50 thousand | QUANTITY | 0.97+
TCO | ORGANIZATION | 0.97+
MiNiFi | TITLE | 0.97+
Apache | ORGANIZATION | 0.97+
40 | QUANTITY | 0.97+
Altas | ORGANIZATION | 0.97+
Hortonworks DataPlane Services | ORGANIZATION | 0.96+
DataWorks Summit 2018 | EVENT | 0.96+
30 | QUANTITY | 0.95+
thousands of nodes | QUANTITY | 0.95+
A6 | COMMERCIAL_ITEM | 0.95+
Kerberos | ORGANIZATION | 0.95+
today | DATE | 0.95+
Knox | ORGANIZATION | 0.94+
one | QUANTITY | 0.94+
hive | TITLE | 0.94+
two data scientists | QUANTITY | 0.94+
each | QUANTITY | 0.92+
Chinese | OTHER | 0.92+
TensorFlow | TITLE | 0.92+
S3 | TITLE | 0.91+
October of last year | DATE | 0.91+
Ranger | ORGANIZATION | 0.91+
Hadoob | ORGANIZATION | 0.91+
HIPA | TITLE | 0.9+
CUBE | ORGANIZATION | 0.9+
tens of thousands | QUANTITY | 0.9+
one vendor | QUANTITY | 0.89+
last several years | DATE | 0.88+
a billion objects | QUANTITY | 0.86+
70, 80 hundred terabytes of data | QUANTITY | 0.86+
HTP3.0 | TITLE | 0.86+
two 1/4 of an exobyte | QUANTITY | 0.86+
Atlas and | ORGANIZATION | 0.85+
DataPlane Services | ORGANIZATION | 0.84+
Google Cloud | TITLE | 0.82+

Wrap | Informatica World 2018


 

>> Narrator: Live from Las Vegas, it's theCUBE, covering Informatica World 2018. Brought to you by Informatica. >> Okay, welcome back everyone. This is theCUBE, here at Informatica World 2018 in Las Vegas. CUBE's exclusive coverage. I'm John Furrier, here for the wrap-up of day two of Informatica World, wrapping up the show coverage. Peter Burris has been my co-host all week, chief analyst at Wikibon.org, SiliconANGLE and theCUBE. And Jim Kobielus, lead researcher on AI, analytics, big data for Wikibon, SiliconANGLE and theCUBE as well. Guys, let's kind of analyze and dissect what we heard from the conversations. Peter and Jim, we heard from the customers, we heard from the executive management, top partners and top executives. So interesting, and Jim, you've been at the analyst one-on-ones, the keynotes. Good show, I thought it was well done, the messaging, again, continuing the brand. The 25th anniversary of Informatica. Which, that's okay for me, but it's really not 25 years old. It's really like five years old. When the private equity came in, they took the legacy and made it new. >> Well, they're a continually renewed company. They're a very different company from what they were even ten years ago, and they've got a fairly aggressive roadmap in terms of evolving into the world of AI and so forth. So they continually renew, as every vendor that hopes to survive inflection points must. >> Jim, what was your takeaway from your sessions? I mean, you saw the keynote, you saw the messaging, you had a chance to sit down one-on-one and ask some tough questions. You heard the hallway conversations amongst the other analysts and customers. What's your personal takeaway? >> A personal takeaway is that Informatica understands that their future must be in the cloud and a subscription model. That means they need to get closer to their core established cloud partners: Microsoft Azure, AWS, Google. At this show, Microsoft had the most important new announcements; they were all about further integration of the new IICS, which is the Informatica-- >> Intelligent cloud service. >> Integration and platform service offerings, into the Azure cloud. That was the most important new piece of news in terms of enabling their customers, they have many joint customers already, to bring all of their Informatica assets more completely into the Azure cloud. That was quite important. But there was a lot of showing from AWS here on the main stage and so forth, and we expect further deepening of the Informatica footprint on AWS from those customers. So, Informatica's future and their customers' future is in public clouds, and I think Informatica knows that the prem-based deployments will decline over time. But this will be-- >> Still good now, so the migration-- >> Well, it's a hybrid cloud story. They have, Informatica, a strong hybrid cloud story, in the same way that an IBM does, or that a Hortonworks does, because most of their customers will have hybridized, multi-cloud models for deployment of this technology for the long term, really, with an emphasis on more public deployments, and I think it's understood. >> Peter, what's your thoughts? You had some great observations and questions. I was listening as you highlighted some of the digital business imperatives that you've been observing and researching and reporting on with the team, but also these guys have been doing it themselves. Any takeaways from you on any change of landscape on digital business, the role of data, the role of the asset.
What's your thoughts on that? >> Yeah, I think if we look at the 25-year history, and Jim mentioned there've been a lot of inflection points, the thing that's distinguished Informatica for years is that it always was a company that sought to serve underserved data requirements. So it started out when relational database was the rage, started out doing OLAP and new types of analytics. And then, when the data warehouse became what it was, it became a data integration issue. And you can kind of see Informatica's always tried to be one step ahead of the needs of hardcore data people. And I think we're seeing that here too. They have got really, really smart people, they went private so that they could re-tool the company, and they are introducing a portfolio that is very focused on the next needs, the next rounds of needs, of data people. >> That's a lot of cloud too. >> They're a data pipeline power-- >> Well, I would say they're a data pipeline pure play, I think you're doing a-- >> The closest of anybody out there. >> But I think the key thing is, right now, they're at the vanguard of talking about data as an asset: what it means to present data as an asset, tools that should provide for managing data as an asset. And they have the pipeline and all the other stuff; the catalog story that they have is very tied to that. The CLAIRE story that they have is very tied to that. Data is very, very complex. And often it takes an enormous amount of manual labor. >> I think they're checking the boxes on some of the things that I've observed over the years, going back to the early Hadoop days: streaming data requires some machine intelligence, obviously machine learning, AI, CLAIRE, check. Ingestion of data, managing it, getting it all in an intelligent, not a data lake or data swamp, but a fabric that's going to be horizontally scalable-- >> Yeah, absolutely. >> With APIs-- >> Well, horizontally scalable actually means something. It means expanding out through APIs and finding new ways of leveraging data. And I think we can make a prediction here, based on four years of being here, that Informatica will probably be at the vanguard of the next round of data needs. So today, we're talking about cloud versus on-premise. I wouldn't be surprised if in a year to two years Informatica isn't talking more about how IoT data gets incorporated-- >> And blockchain. >> Yeah, IoT was not mentioned, nor was blockchain, and I think those are kind of significant deficiencies in terms of what we're hearing at this show from Informatica in terms of strategic-- >> Well, hold on-- >> But I think they've got a great team, and I expect to see more of that in coming years. >> Well, that's a double-edged sword; when the hype's not there, they have a lot of sizzle at stake. >> When I say deficiencies, I mean in terms of strategic discussions of where they're going. I would have liked to have heard more of Peter's discussion. >> I would too; let's get to that in a second. But I want to get your reaction on the whole enterprise catalog piece. Pretty much promoted by Jerry Held, founder of INGRES, legend in the industry, and Bruce Chizen, really pumping that up. Their quote was, "This is probably the most important product." Now, is that a board-perspective bias, or is that really something that you guys believe? >> That's really organic. Metadata management is their core competency, and really their core asset inside of all their applications at Informatica, and that's what the big data catalog is all about.
It's not just a data catalog, it's a metadata catalog for data discovery and so forth. Everything that is done inside of the Informatica portfolio requires a central metadata repository, and I think we at Wikibon, in our recent report on the big data market, focused on the big data catalog as being one of the key pieces of infrastructure going forward in multi-cloud. You know, there's not just Informatica; there's Alation, and there's Cloudera, Hortonworks, and IBM and others that are going deep on their big data catalogs. >> So you see that's a flagship product for these companies. >> Well, let's put it this way: AI has been around since the late 1940s. The algorithms for doing AI have been around, '40s, '50s. The algorithms have been around for years. But the point is, what's occurred recently is the introduction of technology that can actually run these algorithms, that can actually sustain the algorithms against very large volumes of data. So the technology's gotten to the point where you can actually do some of this stuff. The catalog concept has been around for as long as database managers have been around. The problem was you could only build a catalog for just that database manager. The promise of building enterprise-wide catalogs, that dream, has been in place for years. One of the worst two days of my life was flying back from Japan, into New York, and sitting in an IBM information model meeting for analysts. It was absolutely-- >> Was that the 40s or 50s? (laughter) >> That was in the 80s. It was absolute hell. But the point is that Informatica is now-- >> You were the prodigy. >> Yeah, I was a prodigy. Informatica is now bringing together a combination of technologies, including CLAIRE, to make it possible to actually do catalog in a very active way. And that's trend-setting. >> I think they're right too. I think they clearly make a good product, because, I've got to say, you know, we've been watching them for five years. This is our fourth year coming to Informatica World. Our first meeting with Anil, when he was chief product officer, was 2014, and so we've seen the progression. They're right on track, and I think they have an opportunity with IoT and blockchain, but the question I want to ask you guys is: this event is about 4,000 people, not a huge big data show, but it's really all about data. There's no distractions. The fact that they can't even get a lot of IoT airtime means that there's been a lot of core discussions. >> They're really focused. >> This is not like a Strata-- >> No. >> Where everyone's marketing some tool or platform. >> These guys are down and dirty with the products. >> They are really focused on their core opportunities, and like Peter was saying, they're really focused; they're the premier, I'd say, data pipeline solution or platform vendor. The data pipeline is the center of the AI revolution. And so in many ways, all of the forces, all of the trends, have converged to the advantage of Informatica as being the core, go-to vendor for a complete data pipeline for all your requirements, including machine learning development. >> There's one more thing. We didn't hear blockchain, we didn't hear IoT, although I bet you there's a lot of conversation, one-on-ones, between customers and Informatica about some of those things. But there's one other thing we didn't hear, which I think is very telling, and speaks to some of our trends. We didn't hear open source. Open source was not once mentioned on theCUBE, except maybe you mentioned it once. >> John: You're right.
>> Now, if we think about where the big data market was forged, and where it was always going to remain: it was going to be this big, huge, open source play. And that has not happened. Informatica, by saying, "We're going to have a great individual product, and a great portfolio that works together," is demonstrating that the way to show how the new compute model is going to work is to take a coherent, integrated, focused approach on how to do it. >> It's interesting, I mean, we could dissect this. Open source is a great observation, because is open source really needed if you have a pipeline thing? I'd much rather have a discussion about open data, which I think, as your deal points to, is about getting into hybrid cloud as fast as possible in a console. To me, that's so much more powerful than open source. >> Jim: Open APIs. >> Open APIs, where I cannot get locked into Azure. >> I think open source is still important, but I'll bet you that, if you start looking at what these guys are doing and others like them are doing, my guess is that we'll see open source vendors saying, "Oh, so that's how you're going to do catalog. Okay, great, so let's take an open source approach to doing that." And, you know, Informatica's going to have to stay in front of that. >> They might be using some open source. It might not be a top-line message. But let's go the next level, let's do critical analysis on Informatica. What does Informatica need to do? Obviously they've got a tailwind, they've got great timing with GDPR; you couldn't ask for a better time to showcase engineering data, governance, and application integration across clouds than now. So they're in a good spot. Where are they strong? What do they need to work on? >> Well, okay, let's just focus on GDPR, because it is three days from now for that compliance date. GDPR, I mean, Informatica's had some good announcements at this show and prior to this show, in terms of tools for discovery of all your PII and so forth, so you can catalog it in the big data catalog. What needs to be built up by them, and other vendors as well, is a more fully fleshed-out GDPR compliance platform, or portfolio, or ecosystem. There's a lot of things that are needed, like a standardized consent portal, so your customers can go in, look up their PII in your big data catalog, and indicate their consent, or their withdrawal of consent, for you to use particular pieces of data. Hortonworks, a few weeks ago at their DataWorks in Berlin, made an announcement related to such a portal. What I'm getting at is that more vendors, every big data catalog vendor including this one, need to have in their portfolio, and will, I predict, within the next two years, a consent portal as one of several important components to enable not just GDPR compliance, but really compliance with any such privacy-- >> A subject portal that offers consent but then is verified. >> Jim: For example, but it needs to be open source. >> Here's what I'll say, John. And we had a conversation about it with Amit, the present chief product officer. I think that if Informatica, similar to what we think, is on the right path, the world is moving to an acknowledgment that data has to be treated as an asset, that tooling is required so that you can do so, and that you have to re-institutionalize work, re-organize work, and re-think, culturally, what it means to use data as an asset. >> With penalties down the road, obviously on the horizon.
>> Well, there are penalties, you know, proximate ones like GDPR's, but also you're out of business if you don't do these kinds of things. But one of the things that's going to determine what's going to gate their growth is how many people will actually end up utilizing these technologies. And so if I were to have one thing that I think they absolutely have to do: we're coming out of a world that's focused on using process, and process models and process-oriented tools, to build applications. We're moving into a world where we use data, data methods, data models to build applications. This notion of a data-first world, as opposed to a process-first world. Informatica has to take a lead on what it means to be data-first, tooling for data-first, building applications that are data-first, and, very importantly, that's how you're going to grow your user base. >> Sajit was talking about data value, data value chains or whatever it's called. >> Supply chains. >> Data supply chains. I think there's going to be a series of data supply chains that are going to be well-formed, well-defined, and ones that are going to be dynamic. Seeing it happening now. >> And actually that's an interesting discussion: data value chains, data supply chains, but really, data monetization chains. The whole GDPR phenomenon is that your customer's PII is their property, and you need their consent to use it, and only to the extent that they give you consent. On some level, the customer's expecting a return of value to them. You know, maybe monetization. Maybe they make money, but more enterprises have to start thinking of data as a product. And then they need to license the IP from whoever owns it. >> Peter: This is a huge issue. >> And vendors like Informatica need to understand that phenomenon and bake it, as it were, into their solution portfolio. >> Either they're going to be on the right side of history on that, or the wrong side, because you're right, and you just highlighted Peter's point, which is the data direction, not the process, to your point. >> Data first. >> If I own the data, it's got to be very dynamic. Okay, my final comment would be, and I mentioned this last night when we were talking, is that I think that things are clicking for them. I think they've got tailwinds, I think they're smart enough on the product side. The trend is their friend. They've got the cloud deals in place. They're in a nice layer in the stack where they can be that Switzerland. You've got storage vendors underneath, there's a nice data layer, so they're in the position, with cloud-native Kubernetes and containers coming over the top-- >> This is going to get messy fast. >> John: I didn't hear Kubernetes at all this show. >> Hold on, let me finish. This is going to be a robust Switzerland model, where I don't think they can handle the onboarding of partners. I think they have a lot of partners now from their standpoint, but I think they might have an AWS factor where they're going to have to start thinking really hard about how to be efficient about onboarding partners. To your point about adoption, this is going to be a huge issue that could make or break them. They could scale the partnership model through the APIs, they could have a robust ecosystem. That could show us 15,000-- >> If they could be a magnet brand inside Azure, or a magnet brand inside AWS, for how you think about building new classes of value, applications and others, with a data-first approach, then a lot of interesting things could happen.
Yeah, they could be a magnet brand to avoid getting disintermediated by their public cloud partners, because Microsoft's got a portfolio they could place with theirs. AWS has built one. >> Everybody wants this. >> Yeah, everybody wants them. >> Guys, great job. Peter, great to host with you. Jim, great to have you on, making an appearance in between your meetings, one-on-ones and the analyst stuff. >> I'm a busy man. >> That's theCUBE here, wrapping up day two of coverage at Informatica World 2018. The trend is their friend. Data's at the center of the value proposition, and more strategic than ever: data engineering, governance, application integration. This is all happening right now. Regulations on the horizon. A cultural shift happening. And we're out here in the open doing it, sharing the data with you. Thanks for watching Informatica World 2018. (energetic music)
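The consent portal Kobielus describes in this wrap is straightforward to sketch: subjects can look up what PII is held about them, and grant or withdraw consent per use, with every analytic checking consent before including them. The minimal illustration below uses invented names and in-memory storage; a real portal would sit on top of a big data catalog, not application code like this.

```python
# A minimal sketch of the consent-portal idea discussed above: subjects see
# what PII is held about them and grant or withdraw consent per use
# (analytics, marketing, ...). Purely illustrative; not any vendor's product.
from typing import Dict, Set

class ConsentPortal:
    def __init__(self) -> None:
        self._pii: Dict[str, Dict[str, str]] = {}   # subject -> attributes
        self._consents: Dict[str, Set[str]] = {}    # subject -> allowed uses

    def register(self, subject: str, attributes: Dict[str, str]) -> None:
        self._pii[subject] = attributes
        self._consents[subject] = set()

    def my_data(self, subject: str) -> Dict[str, str]:
        """GDPR-style access: a subject can see what is held about them."""
        return dict(self._pii.get(subject, {}))

    def grant(self, subject: str, use: str) -> None:
        self._consents[subject].add(use)

    def withdraw(self, subject: str, use: str) -> None:
        self._consents[subject].discard(use)

    def allowed(self, subject: str, use: str) -> bool:
        """Check consent before any processing, including analytics."""
        return use in self._consents.get(subject, set())

portal = ConsentPortal()
portal.register("cust-42", {"email": "a@example.com"})
portal.grant("cust-42", "analytics")
portal.withdraw("cust-42", "analytics")
print(portal.allowed("cust-42", "analytics"))  # False -> exclude from analytics
```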

Published Date : May 23 2018

ENTITIES

Entity | Category | Confidence
Jim | PERSON | 0.99+
John | PERSON | 0.99+
Jim Kobielus | PERSON | 0.99+
Peter | PERSON | 0.99+
Jerry Held | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Japan | LOCATION | 0.99+
IBM | ORGANIZATION | 0.99+
New York | LOCATION | 0.99+
Peter Burris | PERSON | 0.99+
Berlin | LOCATION | 0.99+
2014 | DATE | 0.99+
Informatica | ORGANIZATION | 0.99+
Wikibon | ORGANIZATION | 0.99+
five years | QUANTITY | 0.99+
Bruce Chizen | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
SiliconANGLE | ORGANIZATION | 0.99+
INGRES | ORGANIZATION | 0.99+
Google | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
fourth year | QUANTITY | 0.99+
Codero | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
Wikibon.org | ORGANIZATION | 0.99+
15,000 | QUANTITY | 0.99+
two years | QUANTITY | 0.99+
theCUBE | ORGANIZATION | 0.99+
GDPR | TITLE | 0.99+
25 year | QUANTITY | 0.99+
Las Vegas | LOCATION | 0.99+
CUBE | ORGANIZATION | 0.99+
two days | QUANTITY | 0.99+
late 1940s | DATE | 0.98+
four years | QUANTITY | 0.98+
today | DATE | 0.98+
Informatica World 2018 | EVENT | 0.98+
about 4,000 people | QUANTITY | 0.98+
Alation | ORGANIZATION | 0.98+
Switzerland | LOCATION | 0.98+
one | QUANTITY | 0.97+

Vira Shanty, Lippo Digital Group | Informatica World 2018


 

>> Announcer: Live from Las Vegas, it's theCUBE, covering Informatica World 2018. Brought to you by Informatica. >> Okay, welcome back everyone. This is theCUBE, live here in Las Vegas for Informatica World 2018, exclusive coverage from theCUBE. I'm John Furrier, co-host of theCUBE, with Jim Kobielus, my co-host this segment, and with that we'll continue with theCUBE. Our next guest is Vira Shanty, who is the chief data officer at Lippo Digital Group. Welcome to theCUBE. >> Thank you so much, very excited to be here. >> Thank you for coming on. People don't know that before we came on camera, you and Jim were talking in the native tongue. Thanks for coming on. I know you're chief data officer; we've got a lot of questions. We love these conversations because we love data, but take a minute to explain what you guys are doing, what the company is, what the size is, and the data challenges. >> Okay, maybe let me introduce myself first. My name is Vira, and my role is chief data officer. My responsibility actually covers the big data transformation for the Lippo group data. Lippo group is actually one of the largest groups in Indonesia; we serve the middle class with consumer services, so we are connecting, I think, more than 120 million customers. What Lippo as a group is doing is actually many things. We have the largest hospital network in Indonesia, also supermarkets; we do department stores, coffee shops, cinemas, data centers. We own a bank as well, news, cable TV, what else? >> You have a lot of digital assets. >> What you do is you drive to any state in Indonesia and you see Lippo everywhere. >> Yeah, education as well, from the kindergarten to the university. That's why there's a lot of diversity in the businesses owned by Lippo. But recently we're investing a lot in digital transformation, so we're releasing a new mobile app; it is called OVO, O-V-O. It's actually like a centralized loyalty and e-money app, providing benefits to all the Lippo group customers, so they're not going to maintain their own membership loyalty programs; it's going to be just OVO. And it's not only accepted by the Lippo ecosystem, but also by the external ecosystem as well. We've started to engage with merchant partners; as of today we've reached something like 30,000 merchant outlets. >> Let's get Jim's perspective. I want you to connect the dots for me, because of the size and scope of data; you talk about deep learning a lot. And let's connect the dots, 'cause we've heard a lot of customers here talking about having data all over the place. How does deep learning fit, and why do you catalog everything? If you have diverse assets, I'm sure there are different silos. Is there a connection? How are you handling it? >> Okay, definitely it's not an easy job to do, implementing big data for this kind of diversity of businesses, because bringing all of this data, coming from the different sources, coming from the different ecosystems, into a single analytical platform is quite challenging. The thing is, we also need to learn first about the business: what kind of business it is, how they operate, how they run the hospital, how they run the supermarket, how they run the cinema, how they run the coffee shop. By understanding these things, my team is responsible for the transformation, starting from collecting the data, cleansing the data, transforming the data, then generating the insight. It has to be an actionable insight.
We're also not only doing BI things, but developing analytical products from that data on top of the big data technology we own today. What we deliver is actually beyond BI. Of course we do a lot of things; for example, we're really focused on building the customer 360-degree profile, because that's the only way we can really understand our customers. Today, we have more than 100 customer attributes attached to each individual customer. I can understand your purchasing-behavior profile, what kind of products you like. Say, from the data coming from the supermarket, I know your favorite brands, whether your spending is declining, how you spend your points as part of the loyalty program, and many other things. By understanding these very deeply, we can engage with customers in a better way and provide a new customer experience — not only providing them with the right deals, but also knowing the right time to connect with them and offer something they might need. This is how we try to connect with our customers through data. >> Yeah, providing a more organic experience across the entire portfolio of Lippo brands throughout the ecosystem. To the customer, it doesn't feel like simply a federation of brands; it's one unified brand, to some degree, from the customer's point of view, delivering value that each of the individual components of the Lippo portfolio may not be able to provide on its own. >> Yes, yes, there are so many things we can do on top of that 360-degree view of the customers. Our big data outcomes are delivered in the form of APIs. Why does it have to be APIs? Because when we interact with the customer, there could be unlimited customer touch points calling these APIs. It could be the mobile apps or other smart customer touch points, or the dashboards we develop for Lippo's internal business. It could be anything; we can even connect to other industries and other businesses, connecting with each other using that big data API. >> Is it an ecosystem, is it one API — one unified API for accessing all the back-end data and services? >> For something like this, there are two types of API that we develop. Number one is the API that belongs to the customer 360 degree: every attribute attached to your profile, we can convert to the API. Say a smart app, as a customer touch point — for example OVO — would like to engage with our customers. The app can design its online business orchestration, then call a specific API to understand, say, your loyalty standing or the product preferences you like, so it knows what kind of offers to push to the customer touch point through the OVO app. Or say another supermarket has its own app; that app can also follow our API, based on their data, to understand which brands or preferences customers probably like, so when the customer connects in their app, it's going to be something really personalized. That's why, in order to manage the future, it's very important for us to deliver these big data outcomes in the form of APIs. >> It scales too — not a lot of custom work, you don't have to worry about connecting people and making sure it works; expose an API and say, there it is.
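A minimal sketch can make the customer-360-as-API idea concrete. The attribute names, the function, and the customer record below are illustrative assumptions, not Lippo's actual implementation — the point is simply that every touch point calls the same lookup and therefore sees the same lens on the customer.

# Minimal sketch of a customer-360 profile lookup; attribute names are invented.
import json

CUSTOMER_360 = {
    "cust-001": {
        "favorite_brands": ["BrandA", "BrandB"],   # e.g. derived from supermarket data
        "spending_trend": "declining",
        "loyalty_points": 12450,
    }
}

def get_profile(customer_id):
    # Every touch point (mobile app, dashboard, partner app) calls the same
    # API, so each channel sees the same view of the customer.
    profile = CUSTOMER_360.get(customer_id)
    if profile is None:
        return 404, {"error": "unknown customer"}
    return 200, {"customer_id": customer_id, "profile": profile}

status, body = get_profile("cust-001")
print(status, json.dumps(body))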
>> Different countries, in terms of privacy and the use of personally identifiable information — different countries and regions have their own policies and regulations. Clearly the European Union is fairly strict, with GDPR coming along, and the US has its own privacy mandates. In Indonesia, are there equivalent privacy regulations or laws that require you, for example, to ask customers to consent to particular uses of their data that you're managing with the big data system that sits behind OVO? Is that something you reflect in your overall program? >> Yes, there are some regulations in Indonesia set by the government, so yes, there is specific regulation. But regulation for retail is not really that clear yet, so we hold ourselves to a stricter standard, which we put in place as part of our data protection and our data governance compliance. Even when we do data monetization or consolidate this data, there is no data being shared outside the entity of the organization, because when we do that monetization, everything is done system to system through API calls, so there is no handoff of any individual customer's data. Say a partner — an FMCG company, a digital agency, or even an advertiser — would like to call our API in the future: the target list of customers they would like to reach is not individual-level data. It's going to be in aggregated format. Whatever segmentation we deliver is not going to expose any individual customer. >> You have a lot of use cases you can handle because of that control and governance piece. That's fantastic, and I know how hard the challenge must be, but you have it set up nicely. Now, with the Informatica setup and the work you're doing, how are you interfacing with developers, because now you have the API? Is it just API based, or are you looking at containers, Kubernetes, cloud technologies down the road — or is it just a matter of exposing the API to the developers? >> For today, who is actually going to consume our API? Definitely the internal Lippo ecosystem, with the customer touch points leveraging the API. Then for the external parties — for example, FMCG companies or digital agencies — when they call our API, they can subscribe, and there can be a business model defined there; but once again, as I mentioned, it's not going to reveal any individual customer information. As for how we deliver this API: we developed our own API system, our own API gateway — in simple terms, how to set the permissions and grant access for any kind of digital channel when they consume our API, and under what kind of subscription model. For big data, we're not only investing in a lot of technology for us to use; what makes my team so excited about this transformation is that we like to create things — we created our own API gateway, and we create analytic products on top of the technology we have today. >> When they subscribe to the API, you're setting policy for the data they can get, and you're done. >> Something like that. >> You automated that.
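The aggregate-only gateway Vira describes could look roughly like the sketch below. The subscriber name, endpoint, and spending figures are invented for illustration — the pattern is that access is checked per subscription and only aggregated figures, never individual records, ever leave the platform.

# Hypothetical sketch of a partner API gateway that exposes aggregates only.
from statistics import mean

SUBSCRIPTIONS = {"fmcg-partner": {"segment_stats"}}   # per-subscriber grants

CUSTOMERS = [
    {"id": "c1", "segment": "coffee", "monthly_spend": 35.0},
    {"id": "c2", "segment": "coffee", "monthly_spend": 52.5},
    {"id": "c3", "segment": "grocery", "monthly_spend": 210.0},
]

def segment_stats(segment):
    # Only aggregated figures are returned; no individual customer data.
    spends = [c["monthly_spend"] for c in CUSTOMERS if c["segment"] == segment]
    return {"segment": segment, "customers": len(spends),
            "avg_monthly_spend": round(mean(spends), 2)}

def gateway_call(subscriber, endpoint, **kwargs):
    # Access is granted system to system, per the subscription model.
    if endpoint not in SUBSCRIPTIONS.get(subscriber, set()):
        raise PermissionError(f"{subscriber} is not subscribed to {endpoint}")
    return segment_stats(**kwargs)

print(gateway_call("fmcg-partner", "segment_stats", segment="coffee"))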
Cool. Well, we see a lot of AI — any machine learning in your future? Are you guys doing any automation? How are you thinking about some of the tools we've been seeing here at the show around automation and AI — CLAIRE — are you tapping into any of that goodness? >> Yes, everybody likes to talk about AI, right? >> John: You've got APIs, you're good, you don't need anything. >> Many organizations, when they're implementing big data, sometimes start by jumping straight in: I need to start doing the AI things. From our point of view, yes, AI is very important, and we will definitely go there, but what's important for us now is to really bring the data onto a single analytical platform and develop that 360-degree customer profile, because we really need to understand our customers better. Then we think about how we can connect with them, how we can bring a new experience — and especially at the right time. >> Actually, let me break down AI, because I cover AI for Wikibon and it's such an enormous topic. I break it down into specific things — for example, speech recognition for voice-activated access to digital assistants that might be embedded in mobile phones. Indonesia is a huge, diverse country — it's an archipelago; you have many groups living under the unitary national structure, speaking different languages with different dialects. Do you use, or are you considering, speech recognition? How would you tailor speech recognition in a country as diverse as Indonesia? Is that an application of AI you're considering for your user interface? >> Okay, for now we're not really there yet, because you are definitely correct: developing that kind of library for Indonesia, with different dialects and different accents, is tough. The AI thing we're looking at is actually a product recommendation engine, because there are a lot of things we can do on top of this customer 360 degree, right? It opens unlimited opportunities to engage the customers with the right offer, and there are a lot of brand owners, like FMCG companies, that would like to connect and reach out to our customers. By developing this kind of product recommendation engine — say, using typical machine learning — we can learn that when we introduce this thing, customers like it, and when we introduce that thing, they don't. >> Let me ask the next logical question there. It's such a big, diverse country — in modeling the customer profile, are you able to encode cultural sensitivities? Once again, in a very diverse country, there are probably products you could recommend to some people that other people might find offensive or insensitive. Is that something you take into consideration in modeling the customer? It doesn't just apply to Indonesia; it applies here too, or anywhere else where you have many peoples. >> Of course we can do that modeling, but what we're doing right now — speaking about the personalized offer — is to create the definitions based on customer spending power first, buying power. We need to understand which level of buying power this customer is actually at. By understanding this buying-power level, we really can understand whether we should introduce this kind of offer or not — because it may be too expensive. And customer spending levels can also differ.
Let's say when our customers spend in our supermarket, maybe they're at a medium spending level, but when they spend their money on coffee, maybe it's on a regular basis, so the spending is different. We need to learn this kind of thing, because low, medium, or high spending isn't something we can set at the same level for everything; it can be different per business. This is the thing that is very exciting for us — understanding this kind of spending, this buying power. >> Great to have you on theCUBE, thanks for coming. So I've got to ask one final question. I heard you are an Informatica Innovation Award honoree — congratulations. >> Thank you. >> What advice would you have for your peers who might aspire to get the award next year? >> The thing is, our big data journey only started last year, really from zero, so when yesterday we got an award for analytics, what we really focused on was actually very simple. Some organizations, when they're implementing big data, try to do everything in phase one. What we planned to do, number one, is bring the data in very fast, then understand what kind of value from the data we can bring to the organization. Our favorite is developing the customer 360-degree profile, because once you really understand your customer from every point of view, it opens unlimited opportunities for how you can engage with your customers. It also opens opportunities to bring other ecosystems into our business to engage with our customers — that one point of view already opens up a lot; it's huge. Then we think about what the next step would be. Of course, the API is going to simplify your business for future scale and so on. That's become our main focus, allowing us to deliver a lot of quick, low-hanging-fruit results at the same time. I think that's what has let us deliver a lot of things within a short period of time. >> The chief data officer at Lippo Digital Group, thanks for sharing your story. This is theCUBE, live in Las Vegas — stay with us for continuing day two coverage of Informatica World 2018. We'll be right back.
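Vira's point that buying power differs per business line maps naturally onto a simple per-line bucketing step; the thresholds and labels below are invented for illustration and are not Lippo's actual segmentation.

# Sketch of per-business-line spending-tier classification (thresholds invented).
TIERS = [(0, "low"), (100, "medium"), (500, "high")]   # monthly-spend cutoffs

def spending_tier(monthly_spend):
    tier = "low"
    for threshold, label in TIERS:
        if monthly_spend >= threshold:
            tier = label
    return tier

# The same customer can sit in different tiers per business line,
# so offers are filtered by the tier in the relevant context.
customer = {"supermarket": 220.0, "coffee": 45.0}
print({line: spending_tier(spend) for line, spend in customer.items()})
# -> {'supermarket': 'medium', 'coffee': 'low'}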

Published Date : May 23 2018

SUMMARY :

Vira Shanty, chief data officer at Lippo Digital Group, joins theCUBE at Informatica World 2018 in Las Vegas. She explains how Lippo — one of Indonesia's largest groups, serving more than 120 million customers across hospitals, retail, banking, media, and education — consolidates data from its diverse businesses onto a single analytical platform; how the OVO loyalty and e-money app connects the ecosystem; how 360-degree customer profiles are delivered as APIs; how governance ensures partners see only aggregated data; and why the roadmap prioritizes customer understanding before AI.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
John | PERSON | 0.99+
Lippo | ORGANIZATION | 0.99+
Vira Shanti | PERSON | 0.99+
Indonesia | LOCATION | 0.99+
Jim | PERSON | 0.99+
Vira Shanty | PERSON | 0.99+
Lippo Digital Group | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
Informatica | ORGANIZATION | 0.99+
Clair | PERSON | 0.99+
360 degree | QUANTITY | 0.99+
Las Vegas | LOCATION | 0.99+
last year | DATE | 0.99+
next year | DATE | 0.99+
360 degree | QUANTITY | 0.99+
Today | DATE | 0.99+
yesterday | DATE | 0.99+
European union | ORGANIZATION | 0.99+
more than 120 million | QUANTITY | 0.99+
30000 machine outlets | QUANTITY | 0.99+
today | DATE | 0.98+
Vira | PERSON | 0.98+
one final question | QUANTITY | 0.98+
more than 100s | QUANTITY | 0.97+
OVO | ORGANIZATION | 0.97+
first | QUANTITY | 0.96+
GDPR | TITLE | 0.95+
Informatica world 2018 | EVENT | 0.95+
Informatica World 2018 | EVENT | 0.95+
one | QUANTITY | 0.94+
US | ORGANIZATION | 0.93+
Cube | COMMERCIAL_ITEM | 0.93+
OVO | TITLE | 0.92+
day two | QUANTITY | 0.9+
each | QUANTITY | 0.89+
zero | QUANTITY | 0.88+
one point | QUANTITY | 0.85+
single analytical platform | QUANTITY | 0.83+
2018 | DATE | 0.81+
single analytical platform | QUANTITY | 0.76+
FMCG | ORGANIZATION | 0.76+
phase one | QUANTITY | 0.7+
Cube | ORGANIZATION | 0.59+
Informatica World | EVENT | 0.54+
Wiki bond | ORGANIZATION | 0.43+

Tracy Ring, Deloitte Consulting | Informatica World 2018


 

>> Announcer: Live from Las Vegas, it's theCUBE! Covering Informatica World 2018. Brought to you by Informatica. Okay, welcome back everyone, this is theCUBE, live here in Las Vegas at The Venetian for theCUBE's exclusive coverage of Informatica World 2018. I'm John Furrier, with my co-host Jim Kobielus, analyst at Wikibon, SiliconANGLE, and theCUBE. Our next guest is Tracy Ring, Vice President at Deloitte Consulting. Great to see you again. >> You as well! >> So, love havin' you on. Last year, you know, we went through all the interviews and it always comes up, and this is important: we are passionate about women in tech, inclusion and diversity. Huge topic, and the job's never done. In fact, I was in New York last week for a blockchain event, and I wore a shirt that said: Satoshi's Female. (Tracy laughing) And I literally was getting so many high fives. But it's not just women in tech; there's a role that men play. This is sort of an ongoing conversation. What's the state of the industry, from your perspective? How do you see it? Obviously the data world is indiscriminate — data is data. >> Tracy: Absolutely. >> It should be 50/50. >> Yeah, you know, I think the opportunity is multi-faceted, right? We're in a place where technology is changing unbelievably fast, and we're graduating nearly as many women as men in fields of science, data analytics, computer engineering, etc. But what we're not seeing is a comparable share of women in leadership roles, and we're not seeing the retention of women in those roles. And for me, I'm really passionate about the fact that supporting, attracting, and keeping women in those roles is really critical, right? There's an interesting facet to how this all plays together: Deloitte has had a women's initiative for 20 years — 20 years of supporting women, embracing them, helping them into leadership roles — and I think the time is now, if not long overdue, to really support them within this field. I also think that Women in Data, an initiative we're launching this year, with our launch event today, is super timely, because women in data are not only women who will become CIOs or CDOs; these are women who will be the Chief Marketing Officers and the CHROs, using data to tell their stories.
We've democratized data integration, we have, you know, made self-service analytics, we've put data in the cloud in everyone's hands, right? So technology is out there more, every single day, and I think the unique part is, is that, when we think about diversity wholistically, and I think of diversity from ages, and geographic, and gender, etc. I think really being able to take all of that diverse experience, and be able to listen to business user's requirements in a way that they can hear it! And listen for something different, right? And brings skills to bare, that aren't necessarily there. I think if we can build better technology, that's more future-proofed, based on having a diverse crowd listening, and trying to build something that's far more compelling than, you know, I asked for X, build me X. I think when we really do our clients, and the world of justice is when we, you know, someone asks for X, and you ask them 10 more questions, and heavy--what about this? And what, and what, and what? And I think really being much more inquisitive, giving people the ability to be inquisitive, and bringing more opinions to the table to be inquisitive. >> And bringing more diversity of practice, makes the applications better, so that's clear. We see that in some of the conversations we have, but I got to ask about the question of roles, what are you seeing, kind of, you look at the trends, are there certain roles that are, that are being adopted with women in tech more than others? Less, trending down, up? What are some of the trend lines on, either roles in tech, for women? >> Yeah, you know, I think that over all, when I had the opportunity, so when we decided, we're going to launch a program within Informatica. We want the women who are going to be the Chief Data Officers of tomorrow. And it was a great question because, actually what we ended up saying is, the Chief Data Officers of tomorrow could be so many different current roles right now, right? And how do we really, kind of, attract the right women into this cohort, support them for a long year and, provide them the forum to network, connect with others, understand different career paths. You know, looking at what we're seeing, you know, with GDPR, and regulations, and all these other things happening, you know, the concepts and roles that didn't even exist years ago, right, so data governance leads and, Chief Analytic Officers, and all of these-- >> James: Or Chief AI Officers! >> Exac--(laughing) >> How do we bring women into the hottest fields like AI, deep learning? If you look at the research literature, out of, both the commercial and the academic world, many of the authors of the papers are men, I mean, more than the standard ratio of men to women in the corporate space, near as I can tell, from my deep reading. How do you break women into AI, for example, when they haven't been part of that overall research community? That's just a, almost like a rhetorical question. >> Yeah, how do you not, you know, it's just impossible to not bring them to bear, the skills, the talent, the ingenuity, I think it's absolutely mandatory, and someone said to me, they said well, why are the men not invited to this event? Why are they not in the cohort? And I said, you know, because there's a component of all this, that we want to grow and foster and support, and create opportunities. 
You know, one of the women who sat on our board today said: I'm not somebody who's going to golf, I'm not someone who's going to go to a sports game — I'm going to meet you in the board room, and we're going to talk about compelling topics there. And so I think it's about encouraging and fostering a new way of networking that's more aligned with what women are interested in and what, sometimes, we do best. Creating an opportunity for a different type of everything in the way that we operate is important. >> I think self-awareness for men matters, and also creating a good vibe, right? Having a good vibe is critical, in my opinion — and also not judging people. You have some women say, hey, I like to get dressed up and that's who I am; some people don't want to go to sports; some guys want this. So I think, generally, there needs to be kind of a reset: hey, let's just have an open mind and a good vibe. >> It's like lunch and learns, you know — lunch and learns are a great enabler for centers of excellence to get together on a regular basis to talk about business and technical things, but it's also a social environment. How can you build more of those kinds of opportunities into the corporate culture, where the socializing isn't skewed toward traditionally male-dominated hobbies or interests, or traditionally female-dominated ones? How can you have a balance of those kinds of socialization opportunities in a professionally appropriate environment that also involves a fair amount of shop talk? Because that's what gets people bonding and promoted in their careers — deep shop talk in the appropriate settings. >> Yeah, it's interesting. One of the women I personally consider a mentor said: if it wasn't for data, I wouldn't be where I am today. She said, I grew up in an industry where, unfortunately, I really didn't have a voice at the table, and my voice at the table came from data. It came from my ability to see connections and patterns and detect things, and also from my ability to create networks of people, make connections, and pull things together in a way that my colleagues weren't doing. And when she tells that story, I think that's the template, right? >> John: That's the empowerment. >> We want to say: use everything at your disposal to bring the best value to your business end-users. She was connecting the dots in a way that no one else had, and using data as the impetus to solidify everything she was saying — it's inarguable. >> That's a great story, it's a phenomenal story. >> It's just amazing. >> Once she got into power she really drove that hard, that's awesome. Well, let's take that to the next level. So, you know, I have a daughter who's a junior at UC Berkeley, and she's a STEM girl, so she's got a good vibe in there. >> James: STEM girl — I have a STEM girl too, mine's 28 now. >> You know, and so, kind of an aside, but she turned away from computer science because, in middle school, the vibe wasn't there, right? And it was kind of a social thing — we mentioned social. Your advice to young women now?
Because we're seeing, with the democratization — you see YouTube, you see all these tools, you've got robots, you've got makers, and of course you've got data — a lot more touch points where people can get immersed in tech in an unthreatening way. You're starting to give people a taste of it without them being tracked into it. So, what's your advice for young folks trying to navigate? Is it networking groups, is it mentoring? What's the playbook in your mind? >> Yeah, I think it's a combination of everything you've mentioned, right? I absolutely think it's your network — and what one of my mentors calls your sleeper network: the network that's out there, the people I worked with five years ago, when we were in a war room till two a.m., and then, you know, I just got busy, right? Reactivating your sleeper networks, having the courage to keep people apprised, using social media well — you know, the number of people who say, oh, I didn't know you were up to this, that, or the other, thank goodness you posted. So I think it's about using all of the technology to your advantage. And I also think there's a component of — I mean, I had an MIS degree in undergrad, and I started out as a developer. >> You might have to explain what that is for the younger generation. (laughing) >> Oh, I know, how crazy is that! Oh my gosh. >> Was that in the DP department? >> Can you imagine. But I wasn't interested in technology that much; it was what was going to get me a job, and I thought I would become a business analyst. I've stayed with it, and now I'm really passionate about tech. But I think there's a component of all this where every job — the CHROs, the CAOs, all of the roles that roll up — you know, every finance person I know who's exceptional is phenomenal with data! Right? So I think it's not only about creating a network of people in the industry; it's about telling the stories outside the industry — telling the "oh my gosh, you'll never believe what we learned today." I think that's the magic of the stories, and being transparent. >> Well, Tracy, you're an inspiration. Thanks so much for coming on theCUBE — really love the story. I've got to ask, what are you up to now? Tell us what's up with you. Obviously you've moved on from MIS, Management Information Systems, part of the DP, Data Processing department — that's minicomputer days. >> Tracy: Oh my. >> Oh my God, we're goin' throwback there. >> Tracy: Absolutely. >> What're you up to now? What are you havin' fun with? >> Yeah, so in my day job I have the luxury of working across our cognitive analytics and our PA alliances — which is an insane mouthful — but it means I get to work with some of our most exciting alliance partners where Deloitte is building solutions, going to market, and getting really great customer stories under our belt. And I think we're really blowing the doors off of what we did three years ago, five years ago, and 20 years ago, when MIS degrees were still being handed out. >> A lot more exciting now, isn't it. >> (laughing) It's way better now! So. >> I wish I was 23 again, you know, havin' a good time. (Tracy laughing) >> Yeah, so, really holistically, seeing what we consider ecosystems and alliances — that's my day job.
>> Tracy Ring, Vice President at Deloitte, great story, fun to have on theCUBE, also doing some great work, super exciting time, you got cloud, you got data, it really is probably one of the most creative times in the tech industry, it's super fun to get involved. This is theCUBE, here out in the open, at Informatica World in Las Vegas. I'm John Furrier with Jim Kobielus, be back with more, stay with us! From Vegas, we'll be right back. >> Tracy: Thank you. (bubbly music)

Published Date : May 23 2018

SUMMARY :

Tracy Ring, Vice President at Deloitte Consulting, joins theCUBE at Informatica World 2018 to discuss women in tech and the launch of a Women in Data initiative. She covers Deloitte's 20-year women's initiative, why diverse teams build better technology, new data-centric career paths emerging around governance and analytics, and her advice to young women: activate your networks, tell your stories, and use data to earn a voice at the table.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
Tracy | PERSON | 0.99+
John | PERSON | 0.99+
Tracy Ring | PERSON | 0.99+
James | PERSON | 0.99+
Deloitte | ORGANIZATION | 0.99+
Ireland Bank | ORGANIZATION | 0.99+
Deloitte Consulting | ORGANIZATION | 0.99+
New York | LOCATION | 0.99+
Vegas | LOCATION | 0.99+
Informatica | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
Las Vegas | LOCATION | 0.99+
28 | QUANTITY | 0.99+
20 years | QUANTITY | 0.99+
Wikibon | ORGANIZATION | 0.99+
last year | DATE | 0.99+
tomorrow | DATE | 0.99+
last week | DATE | 0.99+
today | DATE | 0.99+
23 | QUANTITY | 0.99+
10 more questions | QUANTITY | 0.99+
five years ago | DATE | 0.99+
three years ago | DATE | 0.98+
20 years ago | DATE | 0.98+
UCAL Berkeley | ORGANIZATION | 0.98+
SiliconANGLE | ORGANIZATION | 0.98+
GDPR | TITLE | 0.98+
YouTube | ORGANIZATION | 0.98+
both | QUANTITY | 0.98+
this year | DATE | 0.97+
theCUBE | ORGANIZATION | 0.97+
one | QUANTITY | 0.97+
two a.m. | DATE | 0.96+
Informatica World 2018 | EVENT | 0.94+
CDO | ORGANIZATION | 0.8+
50/50 | QUANTITY | 0.78+
Vice President | PERSON | 0.77+
Satoshi | PERSON | 0.69+
2018 | DATE | 0.68+
single day | QUANTITY | 0.64+
Informatica World | EVENT | 0.6+
Venetian | LOCATION | 0.55+
World | LOCATION | 0.54+
theCUBE | EVENT | 0.53+
Worlds | EVENT | 0.45+

Matthew Cox, McAfee | Informatica World 2018


 

(techy music) >> Announcer: Live from Las Vegas, it's theCUBE, covering Informatica World 2018. Brought to you by Informatica. >> Hello, and welcome back to theCUBE. We are broadcasting from Informatica World 2018 at The Venetian in Las Vegas. I'm Peter Burris; once again, my cohost is Jim Kobielus of Wikibon/SiliconANGLE. And for this segment, we're joined by Matthew Cox, who's the director of Data & Technology Services at McAfee. Welcome to theCUBE, Matthew. >> Thank you very much. Glad to be here. >> So, you're a user — you're on the practitioner side. Tell us a little bit about what you're doing at McAfee, then. >> So, from a technology standpoint, my role is to create and deliver an end-to-end vision and strategy for data, data platforms, and the services around those, always aligning to measurable business outcomes. My goal is to leverage data, bring the meaning of data to the business, and help them make more data-driven decisions in service of business outcomes and business goals. >> So you're working both with the people who are managing or administering the data, and with the consumers of the data, trying to arbitrate and match. >> Absolutely, absolutely. For the first part of my career I was in IT for many years, and then I moved into the business. For probably the last 10 years I've been in sales and marketing in various roles, which gives me kind of a unique perspective: I've lived their life and, probably more importantly, I understand the language of business. Too often in our IT roles we get into IT-speak and don't translate it into the world of the business, and I've been able to do that. So I'm really acting as a liaison — bringing what I've seen of the business to IT, and helping us deliver solutions that drive business outcomes and goals. >> What strategic initiatives are you working on at McAfee that involve data? >> Well, we have a handful. Number one, I would say our first goal is to build out our hub-and-spoke model with MDM, and really delivering our-- >> Jim: Master data management? >> Our master data management, that's correct. MDM is where we define our accounts and contacts, where we build our upward-linking parents and our account hierarchies, and where we create that customer master. That's the one lens through which we want to see our customers across our whole ecosystem. So we're finishing out that hub-and-spoke model, which is kind of an industry best practice, for both realtime and batch-type integrations. On top of that, MDM is a great platform, but the end-to-end data flow is another area we've really prioritized: making sure that as we move data throughout the ecosystem, we're looking at the transformations, the data quality, and the governance, so that what started at one end of the spectrum looks the same — or has been appropriately transformed — by the time it gets to the other side. I'll say data quality three times: data quality, data quality, data quality. For us, it's really about mastering the domain of data quality, and then looking at other areas of compliance, GDPR just being one. There are a number of compliance areas around data, but GDPR's the most relevant one at this time. >> There's compliance, there's data quality, but also, there must be operational and analytical insights to be gained from using MDM.
Can you describe what kind of insights McAfee is gaining from utilization of that technology in your organization? >> Sure. Well, MDM's a piece of that, so I can talk about how the account hierarchy gives us a full view. You've got other products, like data quality, that bolt on and allow us to filter through and make sure the data looks correct and is augmented and appended correctly, but MDM gives us that wonderful foundation of understanding the lens of an account, no matter what landscape or platform we're leveraging. So whether I'm looking at reporting, my CRM system, or my marketing automation platform, I can see Account A consistently. That allows me not only to build analytics that give the same answers — because if I get a different number for Company A on every platform, we've got a problem; I should be seeing the same information across the landscape — but, importantly, it also drives the conversation between the different business units, so marketing can talk to sales and to operations about Company A, and they all know who we're talking about. Historically, that's been a problem for a lot of companies, because each source system would have Company A a little bit differently, or would have the data around it differently, or see it differently from one spectrum to the next. We're trying to make that one lens consistent. >> So MDM allows you to have one consistent lens based on the customer, but McAfee, I'm sure, is also in the midst of finding new sources of data and new ways of using data — product information, how products are being used, improving products, improving service quality. How is that hub-and-spoke approach able to accommodate some of the evolving challenges, definitions, and needs of data, since so much of that data is often localized to specific activities after they're performed? >> In business, there's a lot of data that is very specific to a silo. I have certain data within, say, marketing that really is only marketing data, so one of the things we do is differentiate data. This goes to governance: some data is, as an organization, our treasure, which we want to manage consistently across the landscape of the ecosystem; and some data is very specific to a business function and doesn't need to proliferate around — it doesn't necessitate the level of governance that an ecosystem-level data attribute would. MDM, in that hub-and-spoke, is really powerful as it relates to the account domain. Since you mention product: products are another area we may look at at some point, adding a product domain into MDM. But today, with our customer domain, and our partners as well, the hub-and-spoke topology gives us the ability to do realtime and batch, whereas before there may have been latency as we moved information around, and things could get out of sync or there'd be a delay. With that hub-and-spoke, we now have a realtime integration, a realtime interaction, so I can see changes made-- >> At the spoke? >> Peter: At the spoke, right.
So the spoke pops back to the hub, the hub delivers that back out again, and I can have something happening in marketing translate to sales very quickly, and translate out to service and support. That gives me clarity, consistency, and timeliness across my ecosystem, and the hub-and-spoke helps drive that. >> Tell us about — you just alluded to it — sales and marketing: how is customer data, as an asset you manage through your MDM environment, driving better engagement with your customers? >> Well, it drives better engagement — and first of all, you said an important word, which is asset. We are very keen on treating data as an asset. Systems come and go, platforms come and go — it's CRM tool number one today, CRM tool number two tomorrow — but data always is. One of the things we've done is put a label on data as an asset: something that needs to be managed, that needs to be maintained, that needs to-- >> Governed. >> have investment. Right, governed — because if you don't, it's going to decline in value over time, just like a physical asset, like a building. If you don't maintain and invest, it deteriorates. It's the same with data. What's really important, from a customer standpoint, is that we align quality data — and again, not all data; trying to govern all data is very difficult, but there's a treasure of data that helps us make decisions about our customers — consistently to the lens of an account driven by MDM, proliferated across the ecosystem, so that everyone knows how to act and react accordingly, regardless of their function. That gives us a very powerful process to engage our customers, so the customer experience becomes consistent as well. If someone in sales understands me differently than someone in support or someone in marketing, it creates a disjointed customer experience. If I can house that customer data aligned to one lens of the customer, that provides ubiquity and consistency of view in dealing with our customers. >> Talk to us about governance and stewardship with the data. Who owns the customer data? Is it sales, is it marketing, or is there another specified data steward who manages that data? >> Well, there are several different roles you're going to hit. On stewardship: within my Data & Technology Services organization we have a stewardship function, so we steward data and act on data, and there are processes we put in place — that's your default process for how we steward and augment data over time. We do take very specific requests from sales and marketing; more likely, when it comes to an account, from sales, who will guide us: move this, change this, alter that. From a domain perspective, one of the things we're working through right now is data domains — I don't know if you're familiar with RACI models, but: who is responsible, who is accountable, who is consulted, and who is informed, just receiving information about it. Understanding how those domains play against data is very, very important. We're working through some of that now, but typically, for customer data, we align more toward sales, because they have that direct engagement. Part of it, also, is that differentiated view.
Who has the most authority and the most knowledge about the top 500, top 1,000, top 2,000 customers is different than for customer number 10,000, so you usually have different audiences in play who help us govern and steward that data. >> So, one of the tensions that's existed for years, as we've tried to codify and capture information about engagement, is who put the data in and what the level of quality was. In many respects, the whole CRM thing took a long time to work precisely because we moved data entry jobs from administrators onto salespeople, and they rebelled. So as you think about the role that quality plays, and how you guide your organization to become active participants in data quality, what types of challenges do you face in communicating with the business about how to go about doing that, and then having your systems reflect what is practical and real in the rest of your organization? >> Well, it's a number of things. First of all, you have to make data relevant. If the data these people are entering is not relevant and isn't meaningful to them, the quality isn't going to be there, because they haven't had a purpose or a reason to engage. So the first thing is to make the data relevant to the people who are your data creators — and that goes for your business leaders too. You want the business leaders coming to you and talking about data, not just systems; that's one of the things we're working toward as well. Part of that, though, is giving them tools that ease the data-create process. If, instead of typing in a new account, I can click on a tool and say, hey, send to CRM or add to CRM, it becomes a click and an action that moves data, and I ensure that a good-quality source feeds my data store. That removes the person in the middle making those typing mistakes, those error mistakes. So it's really about the data-create process and putting a standard there, which is very important, but then also having your cleansing tools and capabilities in your back end, like the MDM or a data stewardship function. >> So by making the activity valuable, you create an incentive for them to stay very close to quality considerations? >> Absolutely — because at the end of the day, they use that old term, garbage in, garbage out, and we try to be very clear with them: listen, someday you're going to want to see this data, and you probably should take the time to put quality effort in to begin with. >> Got it. One last quick question: if you think ahead five years, how is your role going to change? 30 seconds. >> I think the role is going to change from an IT-centric view, where I'm looking at tools and systems, to driving business outcomes and addressing business goals — really talking to the business about how they leverage data as a meaningful asset to move their business forward, versus just how I'm deploying stewardship, governance, systems, and tools. >> Excellent. Matthew Cox, McAfee — data quality and utilization. >> Absolutely. >> Once again, you're watching theCUBE. We'll be back in a second. (techy music)
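The hub-and-spoke flow Matthew describes — a change at any spoke propagating through the hub to every other spoke in near-real time — can be sketched as a toy model. The class names, attributes, and spoke systems below are illustrative assumptions, not McAfee's actual MDM stack.

# Toy sketch of hub-and-spoke master data propagation (names are invented).
class Hub:
    def __init__(self):
        self.master = {}    # golden customer records keyed by account id
        self.spokes = []    # e.g. CRM, marketing automation, support

    def register(self, spoke):
        self.spokes.append(spoke)

    def publish(self, source, account_id, attrs):
        # A change made at any spoke flows to the hub...
        record = self.master.setdefault(account_id, {})
        record.update(attrs)
        # ...and the hub pushes the consistent view back out to every other spoke.
        for spoke in self.spokes:
            if spoke is not source:
                spoke.cache[account_id] = dict(record)

class Spoke:
    def __init__(self, name, hub):
        self.name, self.hub, self.cache = name, hub, {}
        hub.register(self)

    def update(self, account_id, **attrs):
        self.hub.publish(self, account_id, attrs)

hub = Hub()
crm, marketing = Spoke("crm", hub), Spoke("marketing", hub)
crm.update("acct-42", name="Company A", parent="Company A Holdings")
print(marketing.cache["acct-42"])   # marketing sees the same Company A lens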

Published Date : May 22 2018

SUMMARY :

Matthew Cox, director of Data & Technology Services at McAfee, explains how McAfee uses a hub-and-spoke master data management (MDM) model to maintain one consistent lens on customer accounts across sales, marketing, service, and support. He describes treating data as a governed asset, the primacy of data quality, RACI-style ownership of data domains, compliance areas like GDPR, and predicts his role will shift from deploying tools and systems to driving business outcomes with data.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
Peter | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
Matthew Cox | PERSON | 0.99+
McAfee | ORGANIZATION | 0.99+
Informatica | ORGANIZATION | 0.99+
Matthew | PERSON | 0.99+
Jim | PERSON | 0.99+
today | DATE | 0.99+
30 seconds | QUANTITY | 0.99+
first | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
both | QUANTITY | 0.99+
three times | QUANTITY | 0.98+
First | QUANTITY | 0.98+
first goal | QUANTITY | 0.97+
GDPR | TITLE | 0.97+
McAffee | ORGANIZATION | 0.97+
one | QUANTITY | 0.97+
Las Vegas | LOCATION | 0.97+
Data & Technology Services | ORGANIZATION | 0.95+
theCUBE | ORGANIZATION | 0.94+
Company A | ORGANIZATION | 0.93+
first part | QUANTITY | 0.92+
about five years | QUANTITY | 0.92+
one spectrum | QUANTITY | 0.91+
Informatica World 2018 | EVENT | 0.87+
top | QUANTITY | 0.86+
one consistent | QUANTITY | 0.84+
MDM | TITLE | 0.84+
one last quick question | QUANTITY | 0.77+
Wikibon | ORGANIZATION | 0.76+
2,000 customers | QUANTITY | 0.74+
Informatica World | EVENT | 0.72+
last 10 | DATE | 0.7+
MDM | ORGANIZATION | 0.7+
SiliconANGLE | ORGANIZATION | 0.69+
2018 | DATE | 0.68+
one end | QUANTITY | 0.68+
second | QUANTITY | 0.62+
top 500 | QUANTITY | 0.6+
10,000 | QUANTITY | 0.51+
years | QUANTITY | 0.51+
Venetian | LOCATION | 0.5+
1,000 | QUANTITY | 0.49+
two | QUANTITY | 0.42+

Wikibon Action Item | The Roadmap to Automation | April 27, 2018


 

>> Hi, I'm Peter Burris and welcome to another Wikibon Action Item. (upbeat digital music) >> Cameraman: Three, two, one. >> Hi. Once again, we're broadcasting from our beautiful Palo Alto studios, theCUBE studios, and this week we've got another great group: David Floyer in the studio with me along with George Gilbert, and on the phone we've got Jim Kobielus and Ralph Finos. Hey, guys. >> Hi there. >> So we're going to talk about something that's going to become a big issue — it's only now starting to emerge — and that is: what will be the roadmap to automation? Automation is going to be absolutely crucial to the success of IT in the future and the success of any digital business. At its core, many people have presumed that automation was about reducing labor: by introducing software and other technologies, we would effectively substitute for administrative, operator, and related labor. And while that is absolutely a feature of what we're talking about, the bigger issue ultimately is that we cannot conceive of the more complex workloads — capable of providing better customer experience, superior operations, and all the other things a digital business wants to achieve — if we don't have a capability for simplifying how the underlying resources get put together, configured, organized, orchestrated, and their delivery sustained. So the other part of automation is allowing much more work to be performed on the same resources, much faster. It's a basis for how we think about plasticity and the ability to reconfigure resources very quickly. Now, the challenge is that the IT industry has always used standards as a weapon. We use standards as a basis for creating ecosystems, or scale, or mass — even for something like mainframes, where there weren't hundreds of millions of potential users, IBM was successful at using standards as a basis for driving its costs down and providing a superior product. That's clearly what Microsoft and Intel did many years ago: they achieved that kind of scale by driving more and more volume of the technology, and they won. But along the way, each generation has featured a significant amount of competition over how those interfaces came together and how they worked. And this is going to be the mother of all standards-oriented competitions: how does one automation framework fit together with another — one creating value in a way that serves the other, but ultimately, for many companies, a way of creating more scale and more volume on their own platform? So this notion of how automation is going to evolve is crucially important. David Floyer, are APIs going to be enough to solve this problem? >> No — that's the short answer. This is a very complex problem, and I think it's worthwhile spending a minute just on the component parts that need to be brought together. We're going to have a multi-cloud environment — multiple private clouds, multiple public clouds — and they've got to work together in some way, and you've got the edge as well. So you've got a huge amount of data across all of these different areas, and automation and orchestration across that are, as you said, not just about efficiency; they're about making it work — making it able to work and to be available.
So all of the issues of availability, security, and compliance — all of these difficult issues are subject to getting this whole environment to work together, through a set of APIs, yes, but a lot more than that. And in particular, when you think about it, to me, volume of data is critical — it's who has access to that data. >> Peter: Now, why is that? >> Because if you're dealing with AI and any form of automation like this, the more data you have, the better your models are. And if you can increase that amount of data — as Google shows every day — you will maintain your handle on, and control over, that area. >> So you said something really important, because the implied assumption — and obviously it's a major feature of what's going on — is that we've been talking about doing more automation for a long time. But what's different this time is the availability of AI and machine learning, for example, >> Right. as a basis for recognizing patterns and taking remedial action, or taking predictive action to avoid the need for remedial action. And it's the availability of that data that's going to improve the quality of those models. >> Yes. Now, George, you've done a lot of work around this whole notion of ML for ITOM. What are the different approaches? If there are two ways we're looking at it right now, what are the two ways? >> So there are two ends of the extreme. One is: I want to see, end to end, what's going on across my private cloud or clouds, as well as across different applications in different public clouds. But that's very difficult — you get end-to-end visibility, but you have to relax a lot of assumptions about what's where. >> And that's called the-- >> Breadth first. So the pro is end-to-end visibility; the con is you don't know how all the pieces fit together quite as well, so you get less fidelity in terms of diagnosing root causes. >> So you're trying to optimize at a macro level while recognizing that you can't optimize at a micro level. >> Right. Now the other approach, the other end of the spectrum, is depth first, where you constrain the set of workloads and services you're building — the ones you know about — and how they fit together. Then the models, based on the data you collect there, can become so rich that you have very high-fidelity root cause determination, which allows you to make very precise recommendations or even perform automated remediation. What we haven't figured out how to do yet is marry the depth first with the breadth first, so that you have multiple foci of depth first. That's very tricky. >> Now, if you think about how the industry has evolved, we wrote some stuff about what I call the iron triangle, which is basically a very tight relationship between specialists in technology — the people responsible for a particular asset, be it storage, or the system, or the network; the vendors, who provided a lot of the knowledge about how that worked, and therefore made that specialist more or less successful and competent; and the automation technology that that vendor ultimately provided. Now, that was not automation technology associated with AI or anything along those lines. It was out of the box: buy our tool, and this is how you're going to automate various workflows, or scripts, or whatever else it might be. And every effort to try to break that has been met with screaming, because — well, you're now breaking my automation routines.
So the depth-first approach, even without ML, has been the way we've done it historically. But, David, you're talking about something different: it's the availability of the data that starts to change that. >> Yeah. >> So are we going to start seeing new compacts put in place between users and vendors and OEMs and a lot of these other folks? It sounds like it's going to be about access to the data. >> Absolutely. So let's start at the bottom. You've got people who have a particular component — whatever that component is; it might be storage, it might be networking. They have products in that area which will be collecting data, and they will need to provide a degree of automation, a degree of capability, for their particular area. And they need to do two things: that optimization, and also providing data to other people. So they have to have an OEM agreement not just for the equipment they provide, but for the data they're going to give and get back — the automation of that data, for example, flowing upward, and the availability of data to help themselves.
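George's breadth-first versus depth-first contrast can be made concrete with a toy sketch. The metrics, thresholds, and two-tier topology below are invented for illustration — a breadth-first detector applies one generic test everywhere, while a depth-first model knows the service topology and can name the upstream cause.

# Toy contrast of breadth-first vs depth-first ITOM anomaly detection.
from statistics import mean, pstdev

def zscore(series, value):
    mu, sigma = mean(series), pstdev(series) or 1.0
    return (value - mu) / sigma

# Breadth first: one generic detector across every service's latency metric.
def breadth_first(all_metrics, latest):
    return {svc: zscore(hist, latest[svc]) > 3 for svc, hist in all_metrics.items()}

# Depth first: constrained to one known topology (app depends on db), so an
# anomaly can be traced to a specific upstream dependency — higher fidelity.
def depth_first(db_latency_hist, app_latency_hist, db_now, app_now):
    if zscore(app_latency_hist, app_now) > 3 and zscore(db_latency_hist, db_now) > 3:
        return "app slow because its database dependency is slow"
    if zscore(app_latency_hist, app_now) > 3:
        return "app slow; cause is within the app tier"
    return "healthy"

hist = [100, 102, 98, 101, 99]   # baseline latencies in ms
print(breadth_first({"app": hist, "db": hist}, {"app": 180, "db": 100}))
print(depth_first(hist, hist, db_now=175, app_now=180))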
>> But, and I would add to that, in addition to having all this data available: let's go back 10, 20 years and look at Cisco. A lot of their differentiation, and what entrenched them, was sort of universal familiarity with their admin interfaces, and they might not expose APIs in a way that would make them common across their competitors. But if you had data from them and a constrained number of other providers around which you would build, let's say, these modern big data applications... If you constrain the problem, you can get to the depth first. >> Yeah, but Cisco is a great example of, it's an archetype for, what I said earlier, that notion of an iron triangle. You had Cisco admins >> Yeah. that were certified to run Cisco gear and therefore had a strong incentive to ensure that more Cisco gear was purchased, utilizing a Cisco command line interface that did incorporate a fair amount of automation for that Cisco gear, and it was almost impossible for a lot of companies to penetrate that tight arrangement between the Cisco admin that was certified, the Cisco gear, and the CLI. >> And the exact same thing happened with Oracle. The Oracle admin skillset was pervasive within large >> Peter: Happened with everybody. >> Yes, absolutely >> But, >> Peter: The only reason it didn't happen in the IBM mainframe, David, was because of a >> It did happen, yeah, >> Well, but it did happen, but governments stepped in and said, this violates antitrust. And IBM was forced by law, by court decree, to open up those interfaces. >> Yes. That's true. >> But are we going to see the same type of thing >> I think it's very interesting to see the shape of this market when we look a little bit ahead. People like Amazon are going to have IaaS, they're going to be running applications. They are going to go for the depth way of doing things across, or which way around is it? >> Peter: The breadth. They're going to be end to end. >> But they will go depth in individual-- >> Components. >> Right, but they will put together their own type of things for their services. >> Right. >> Equally, other players like Dell, for example, have a lot of different products, a lot of different components in a lot of different areas. They have to go piece by piece and put together a consortium of suppliers to them: storage suppliers, chip suppliers. And put that together on the outside, and it's going to have to be a different type of solution that they put together. HP will have the same issue there. And then people like CA, for example: we'll see an opportunity for them to come in again with great products, looking over the whole of all of this data coming in. >> Peter: Oh, sure. Absolutely. >> So there's a lot of players who could be in this area. Microsoft I missed out; of course, they will have the two ends that they can combine together. >> Well, they may have an advantage that nobody else has-- >> Exactly. Yeah. because they're strong in both places. But I have Jim Kobielus. Let me check, are you there now? Do we got Jim back? >> Can you hear me? >> Peter: I can barely hear you, Jim. Could we bring Jim's volume up a little bit? So, Jim, I asked the question earlier about whether we have the tooling for AI. We know how to get data, how to build models, and how to apply the models in a broad-brush way. And we're certainly starting to see that happen within the IT operations management world.
The ITOM world. But we don't yet know how we're going to write these contracts that are capable of better anticipating, putting in place a regime that really describes: what are the limits of data sharing? What are the limits of derivative use? Et cetera. I argued, and here in the studio we generally agreed, that we still haven't figured that out, and that this is going to be one of the places where the tension between, at least in the B2B world, data availability and derivative use, where you capture value and where those profits go, is going to be significant. But I want to get your take. Has the AI community >> Yeah. started figuring out how we're going to contractually handle obligations around data, data use, data sharing, data derivative use? >> The short answer is, no, they have not. The longer answer is... can you hear me, first of all? >> Peter: Barely. >> Okay. Should I keep talking? >> Yeah. Go ahead. >> Okay. The short answer is, no, the AI community has not addressed those IP protection issues. But there is a growing push in the AI community to leverage blockchain for such requirements, in terms of blockchains to store smart contracts related to downstream utilization of data and derivative models. But that's extraordinarily early on in its development, in terms of insight in the AI community and in the blockchain community as well. In fact, one of the posts that I'm working on right now is looking at a company called 8base that's actually using blockchain to store all of those assets, those artifacts for the development lifecycle, along with the smart contracts to drive those downstream uses. So what I'm saying is that there are lots of smart people like yourselves thinking about these problems, but there's no consensus, definitely, in the AI community for how to manage all those rights downstream. >> All right. So very quickly, Ralph Finos, if you're there. I want to get your perspective >> Yeah. on what this means for markets and market leadership. What do you think? How's this going to impact who the leaders are, who's likely to continue to grow and gain even more strength? What're your thoughts on this? >> Yeah. I think my perspective on this thing in the near term is to focus on simplification, and to focus on depth, because you can get return, you can get payback, for that kind of work, and it simplifies the overall picture, so when you're going broad, you've got less of a problem to deal with in linking all these things together. So I'm going to go with the Shaker kind of perspective on the world, which is to make things simple, and to focus there. And I think the complexity of what we're talking about for breadth is too difficult to handle at this point in time. I don't see it happening any time in the near future. >> Although there are some companies, like Splunk, for example, that are doing a decent job of presenting more of a breadth approach, but they're not going deep into the various elements. So, George, really quick. Let's talk to you. >> I beg to disagree on that one. >> Peter: Oh! >> They actually built a platform, originally, that was breadth first. They built all these, essentially, forwarders which could understand the formats of the output of all sorts of different devices and services. But then they started building what they call curated experiences, which is the equivalent of what we call depth first. They're doing it for IT service management. They're doing it for what's called user behavior analytics.
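That kind of curated, depth-first experience comes down to baselining an entity's behavior and flagging deviations. A toy sketch of the idea follows, with invented data and thresholds; real user behavior analytics products use far more signal than login hours:

```python
# Toy user-behavior-analytics sketch: flag logins far outside a user's
# historical login-hour baseline using a simple z-score test.
from statistics import mean, stdev

login_hours = {"svc_account": [2, 3, 2, 4, 3, 2], "jdoe": [9, 10, 9, 8, 10]}

def is_suspicious(user: str, hour: int, z_cutoff: float = 3.0) -> bool:
    history = login_hours[user]
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return hour != mu  # no variance in history: any change stands out
    return abs(hour - mu) / sigma > z_cutoff

print(is_suspicious("jdoe", 3))   # True: 3 a.m. is far off jdoe's baseline
print(is_suspicious("jdoe", 9))   # False: well within normal behavior
```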
User behavior analytics is a way of tracking bad actors or bad devices on a network. And they're going to be pumping out more of those. What's not clear yet is how they're going to integrate those, so that IT service management understands security and vice versa. >> And I think that's one of the key things, George. When we think about the roadmap, it's probably that security is going to be early on one of the things that gets addressed here. And again, it's not just security from a perimeter standpoint. Some people are calling it a software-based perimeter. Our perspective is the data's going to go everywhere, and ultimately, how do you sustain a zero trust world where you know your data is going to be out in the clear, so what are you going to do about it? All right. So look. Let's wrap this one up. Jim Kobielus, let's give you the first Action Item. Jim, Action Item. >> Action Item. Wow. My action item on automation is just to follow the stack of assets that drive automation and figure out your overall architecture for sharing out these assets. I think the core asset will remain orchestration models. I don't think predictive models in AI are a huge piece of the overall automation pie in terms of the logic. So just focus on building out and protecting and sharing and reusing your orchestration models. Those are critically important, in any domain, end to end or in specific automation domains. >> Peter: David Floyer, Action Item. >> So my Action Item is to acknowledge that the world of building your own automation yourself, around a whole lot of piece parts that you put together, is over. You won't have access to sufficient data. So enterprises must take a broad view of getting data, of getting components that have data to give them data. Make contracts with people to give them data, masking or whatever it is, and become part of a broader scheme that will allow them to meet the automation requirements of the 21st century. >> Ralph Finos, Action Item. >> Yeah. Again, I would reiterate the importance of keeping it simple. Taking care of the depth questions and moving forward from there. The complexity is enormous, and-- >> Peter: George Gilbert, Action Item. >> I say, start with what customers always start with for a new technology, which is a constrained environment like a pilot, and there are two areas that are potentially high return. One is big data, where it's been a multi-vendor component mix, and a mess. And so you take that, and you constrain that, and make that a depth-first approach in the cloud, where there is data to manage that. And the second one is security, where we now have more and more trained applications just for that. I say, don't start with a platform. Start with those solutions and then start adding more solutions around that. >> All right. Great. So here's our overall Action Item. The question of automation, or the roadmap to automation, is crucial for multiple reasons. But one of the most important ones is that it's inconceivable to us to envision how a business can institute even more complex applications if we don't have a way of improving the degree of automation on the underlying infrastructure. How this is going to play out, we're not exactly sure. But we do think that there are a few principles that are going to be important that users have to focus on. Number one is data.
Be very clear that there is value in your data, both to you as well as to your suppliers, and as you think about writing contracts, don't write contracts that are focused on a product now. Focus even on that product as a service over time, where you are sharing data back and forth in addition to getting some return out of whatever assets you've put in place. And make sure that the negotiations specifically acknowledge the value of that data to your suppliers as well. Number two, there is certainly going to be a scale question here. There's certainly going to be a volume question here. And as we think about it, a lot of the new approaches to doing this, this notion of automation, are going to come out of the cloud vendors. Once again, the cloud vendors are articulating what the overall model is going to look like, what that cloud experience is going to look like. And it's going to be a challenge to other suppliers who are providing an on-premises, true private cloud and edge orientation, where the data must sometimes live, not because they just want it that way, but because the data requires it, to be able to reflect that cloud operating model. And expect, ultimately, that your suppliers also are going to have to have very clear contractual relationships with the cloud players and with each other for how that data gets shared. Ultimately, however, we think it's crucially important that any CIO recognize that the existing environment that they have right now is not converged. The existing environment today remains operators, suppliers of technology, and suppliers of automation capabilities, and breaking that up is going to be crucial, not only to achieving automation objectives, but to achieving converged infrastructure, hyperconverged infrastructure, and multi-cloud arrangements, including private cloud, true private cloud, and the cloud itself. And this is going to be a management challenge that goes way beyond just products and technology, to actually incorporating how you think about how your shop is organized, how you institutionalize the work that the business requires, and therefore what you identify as the tasks that will be first to be automated. Our expectation: security's going to be early on. Why? Because your CEO and your board of directors are going to demand it. So think about how automation can be improved and enhanced through a security lens, but do so in a way that ensures that over time you can bring new capabilities on with a depth-first approach, at least to the breadth that you need within your shop and within your business, your digital business, to achieve the success and the results that you want. Okay. Once again, I want to thank David Floyer and George Gilbert here in the studio with us. On the phone, Ralph Finos and Jim Kobielus. Couldn't get Neil Raden in today, sorry Neil. And I am Peter Burris, and this has been an Action Item. Talk to you again soon. (upbeat digital music)

Published Date : Apr 27 2018

SUMMARY :

and welcome to another Wikibon Action Item. And on the phone we've got Jim Kobielus and Ralph Finos. and the ability to reconfigure resources very quickly. that need to be brought together. the more data you have, is the availability of AI and machine learning, And it's the availability of that data What are the kind of different approaches? You get end-to-end visibility but you have to relax So the pro is end-to-end visibility. while recognizing that you can't optimize at a micro level. So that you have multiple focus depth first. that starts to change that. And it sounds like it's going to be about access to the data. and the data they're going to give back. have to negotiate value capture on the data side and the promise about how those behaviors I mean, Especially when you think about realtime. than the actual action that you take. but the access to the data and the understanding I mean, the theorem's what? To me, that's good enough. It's not going to solve the problem of somebody but it's certainly sufficient to solve the problem in addition to having all this data available. Yeah, but Cisco is a great example of and therefore had a strong incentive to ensure And the exact same thing happened with Oracle. to open up those interfaces. They are going to go for the depth way of doing things They're going to be end to end. but they will put together their own type of things that outside and it's going to have to be a different type Peter: Oh, sure. the two ends that they can combine together. Let me check, are you there now? and that this is going to be one of the places to contractually handle obligations around data, The longer answer is, that and in the blockchain community as well. I want to get your perspective How's this going to impact who are the leaders, So I'm going to go with the Shaker kind of perspective Let's talk to you. I beg to disagree And they're going to be pumping out more of those. Our perspective is the data's going to go everywhere Action Item Automation is just to follow that the world of building your own automation yourself Taking care of the depth questions and make that a depth-first approach in the cloud Because that data requires it to be able to reflect

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim | PERSON | 0.99+
David Floyer | PERSON | 0.99+
Jim Kobielus | PERSON | 0.99+
David | PERSON | 0.99+
George Gilbert | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
George | PERSON | 0.99+
Neil | PERSON | 0.99+
Peter | PERSON | 0.99+
April 27, 2018 | DATE | 0.99+
Ralph Finos | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Neil Raiden | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
Cisco | ORGANIZATION | 0.99+
Dell | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
21st century | DATE | 0.99+
two ways | QUANTITY | 0.99+
8base | ORGANIZATION | 0.99+
10 | QUANTITY | 0.99+
hundreds | QUANTITY | 0.99+
one | QUANTITY | 0.99+
Splunk | ORGANIZATION | 0.99+
two areas | QUANTITY | 0.99+
Oracle | ORGANIZATION | 0.99+
One | QUANTITY | 0.99+
HP | ORGANIZATION | 0.99+
each generation | QUANTITY | 0.99+
theCUBE | ORGANIZATION | 0.99+
Intel | ORGANIZATION | 0.99+
both places | QUANTITY | 0.99+
Palo Alto | LOCATION | 0.99+
both | QUANTITY | 0.98+
two things | QUANTITY | 0.98+
Three | QUANTITY | 0.98+
two | QUANTITY | 0.98+
SASS | ORGANIZATION | 0.98+
this week | DATE | 0.97+
each time | QUANTITY | 0.97+
two ends | QUANTITY | 0.97+
today | DATE | 0.97+
Google | ORGANIZATION | 0.96+
first | QUANTITY | 0.96+
second one | QUANTITY | 0.94+
CA | LOCATION | 0.92+

Alan Gates, Hortonworks | Dataworks Summit 2018


 

(techno music) >> (announcer) From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. We're here on day two of DataWorks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm lead analyst for Big Data Analytics in the Wikibon team of SiliconANGLE Media. And who we have here today, we have Alan Gates, who's one of the founders of Hortonworks, and Hortonworks of course is the host of DataWorks Summit and he's going to be, well, hello Alan. Welcome to theCUBE. >> Hello, thank you. >> Yeah, so Alan, so you and I go way back. Essentially, what we'd like you to do first of all is just explain a little bit of the genesis of Hortonworks. Where it came from, your role as a founder from the beginning, how that's evolved over time, but really how the company has evolved specifically with the folks in the community, the Hadoop community, the open source community. You have a deepening open source stack that you build upon, with Atlas and Ranger and so forth. Give us a sense for all of that, Alan. >> Sure. So as I think is well-known, we started as the team at Yahoo that really was driving a lot of the development of Hadoop. We were one of the major players in the Hadoop community. Worked on that for, I was in that team for four years. I think the team itself was going for about five. And it became clear that there was an opportunity to build a business around this. Some others had already started to do so. We wanted to participate in that. We worked with Yahoo to spin out Hortonworks, and actually they were a great partner in that. Helped us get that spun out. And the leadership team of the Hadoop team at Yahoo became the founders of Hortonworks, and brought along a number of the other engineers to help get started. And really at the beginning, it was Hadoop, Pig, Hive, HBase, you know, the beginning projects. So pretty small toolkit. And our early customers were very engineering-heavy people, or companies who knew how to take those tools and build something directly on those tools, right? >> Well, you started off, the Hadoop community as a whole started off, with a focus on the data engineers of the world >> Yes. >> And I think it's shifted, and confirm for me, over time, that you focus increasingly with your solutions on the data scientists who are doing the development of the applications, and the data stewards, from what I can see at this show. >> I think it's really just a part of the adoption curve, right? When you're early on that curve, you have people who are very into the technology, understand how it works, and want to dive in there. So those tend to be, as you said, the data engineering types in this space. As that curve grows out, it comes wider and wider. There's still plenty of data engineers that are our customers, that are working with us, but as you said, the data analysts, the BI people, data scientists, data stewards, all those people are now starting to adopt it as well. And they need different tools than the data engineers do. They don't want to sit down and write Java code, or, you know, some of the data scientists might want to work in Python in a notebook like Zeppelin or Jupyter, but some may want to use SQL or even Tableau or something on top of SQL to do the presentation. Of course, data stewards want tools more like Atlas to help manage all their stuff.
So that does drive us to, one, put more things into the toolkit, so you see the addition of projects like Apache Atlas and Ranger for security and all that. Another area of growth, I would say, is also the kind of data that we're focused on. So early on, we were focused on data at rest. You know, we're going to store all this stuff in HDFS, and as the kind of data scene has evolved, there's a lot more focus now on a couple things. One is what we call data-in-motion, for our HDF product, where you've got a stream manager like Kafka or something like that >> (James) Right >> So there's processing that kind of data. But now we also see a lot of data in various places. It's not just, oh, okay, I have a Hadoop cluster on premise at my company. I might have some here, some on premise somewhere else, and I might have it in several clouds as well. >> OK, your focus has shifted, like the industry in general, towards streaming data in multi-clouds, where it's more stateful interactions and so forth? I think you've made investments in Apache NiFi, so >> (Alan) yes. >> Give us a sense for your NiFi versus Kafka and so forth inside of your product strategy, or your >> Sure. So NiFi is really focused on that data at the edge, right? So you're bringing data in from sensors, connected cars, airplane engines, all those sorts of things that are out there generating data, and you need to figure out what parts of the data to move upstream, what parts not to. What processing can I do here so that I don't have to move it upstream? When I have an error event or a warning event, can I turn up the amount of data I'm sending in, right? Say this airplane engine is suddenly heating up maybe a little more than it's supposed to. Maybe I should ship more of the logs upstream when the plane lands and connects than I would otherwise. That's the kind o' thing that Apache NiFi focuses on. I'm not saying it runs in all those places, but my point is, it's that kind o' edge processing. Kafka is still going to be running in a data center somewhere. It's still a pretty heavyweight technology in terms of memory and disk space and all that, so it's not going to be run on some sensor somewhere. But it is that data-in-motion, right? I've got millions of events streaming through a set of Kafka topics, watching all that sensor data that's coming in from NiFi and reacting to it, maybe putting some of it in the data warehouse for later analysis, all those sorts of things. So that's kind o' the differentiation there between Kafka and NiFi. >> Right, right, right. So, going forward, do you see more of your customers working on internet of things projects? We don't often, at least in the popular mind of the industry, associate Hortonworks with edge computing and so forth. Is that? >> I think that we will have more and more customers in that space. I mean, our goal is to help our customers with their data wherever it is. >> (James) Yeah. >> When it's on the edge, when it's in the data center, when it's moving in between, when it's in the cloud. All those places, that's where we want to help our customers store and process their data. Right? So, I wouldn't want to say that we're going to focus on just the edge or the internet of things, but that certainly has to be part of our strategy 'cause it has to be part of what our customers are doing.
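The edge behavior Gates describes, trickle data upstream normally and open the tap when something looks wrong, boils down to a routing decision. A toy sketch of that logic follows; this is illustrative Python only, since NiFi itself expresses this as processors and flow files rather than code like this, and the threshold and sample rate are invented:

```python
# Toy sketch of NiFi-style edge routing: sample readings under normal
# conditions, ship everything upstream once a warning threshold is crossed.
import random

WARN_TEMP_C = 750.0        # hypothetical engine-temperature threshold
NORMAL_SAMPLE_RATE = 0.01  # ship ~1% of readings when all is well

def should_ship(reading: dict) -> bool:
    if reading["temp_c"] >= WARN_TEMP_C:
        return True  # warning condition: ship every reading upstream
    return random.random() < NORMAL_SAMPLE_RATE

readings = [{"sensor": "engine-1", "temp_c": t} for t in (640.0, 655.0, 790.0)]
upstream = [r for r in readings if should_ship(r)]
print(f"shipping {len(upstream)} of {len(readings)} readings upstream")
```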
>> When I think about the Hortonworks community, now we have to broaden our understanding, because you have a tight partnership with IBM, which obviously is well-established, huge, and global. Give us a sense, as you guys have teamed more closely with IBM, for how your community has changed or broadened or shifted in its focus, or has it? >> I don't know that it's shifted the focus. I mean, IBM was already part of the Hadoop community. They were already contributing. Obviously, they've contributed very heavily on projects like Spark and some of those. They continue some of that contribution. So I wouldn't say that it's shifted it, it's just we are working more closely together as we both contribute to those communities, working more closely together to present solutions to our mutual customer base. But I wouldn't say it's really shifted the focus for us. >> Right, right. Now at this show, we're in Europe right now, but it doesn't matter that we're in Europe. GDPR is coming down fast and furious now. Data Steward Studio, we had the demonstration today; it was announced yesterday. And it looks like a really good tool for the main requirements for compliance, which are to discover and inventory your data, and to really set up what I like to refer to as a consent portal, so the data subject can then go and make a request to have their data forgotten, and so forth. Give us a sense, going forward, for how or if Hortonworks, IBM, and others in your community are going to work towards greater standardization in the functional capabilities of the tools and platforms for enabling GDPR compliance. 'Cause it seems to me that you're going to need, the industry's going to need to have, some reference architecture for these kind o' capabilities, so that going forward, your ecosystem of partners can build add-on tools in some common way. Like, the framework that was laid out today looks like a good basis. Is there anything that you're doing in terms of pushing towards more open source standardization in that area? >> Yes, there is. So actually one of my responsibilities is the technical management of our relationship with ODPI, which >> (James) yes. >> Mandy Chessell referenced yesterday in her keynote, and that is where we're working with IBM, with ING, with other companies to build exactly those standards. Right? Because we do want to build it around Apache Atlas. We feel like that's a good tool for the basis of that, but we know, one, that some people are going to want to bring their own tools to it. They're not necessarily going to want to use that one platform, so we want to do it in an open way that they can still plug in their metadata repositories and communicate with others, and we want to build the standards on top of that for how do you properly implement these features that GDPR requires, like right to be forgotten, like, you know, what are the protocols around PII data? How do you prevent a breach? How do you respond to a breach? >> Will that all be under the umbrella of ODPI, that initiative of the partnership, or will it be a separate group, or? >> Well, so certainly Apache Atlas is part of Apache and remains so. What ODPI is really focused on is that next layer up, of how do we engage not the programmers, 'cause programmers can engage really well at the Apache level, but the next level up. We want to engage the data professionals, the people whose job it is, the compliance officers. The people who don't sit and write code, and frankly, if you connect them to the engineers, there's just going to be an impedance mismatch in that conversation. >> You got policy wonks and you got tech wonks, so they understand each other at the wonk level.
That's a good way to put it. And so that's where ODPI is really coming in: that group of compliance people who speak a completely different language. But we still need to get them all talking to each other, as you said, so that there are specifications around: how do we do this? And what is compliance? >> Well, Alan, thank you very much. We're at the end of our time for this segment. This has been great. It's been great to catch up with you, and Hortonworks has been evolving very rapidly, and it seems to me that, going forward, you're well-positioned now for the new GDPR age to take your overall solution portfolio, your partnerships, and your capabilities to the next level, really in terms of an open source framework. In many ways, though, you're not entirely, 100%, like nobody is, purely open source. You're still very much focused on open frameworks for building fairly scalable, very scalable solutions for enterprise deployment. Well, this has been Jim Kobielus, with Alan Gates of Hortonworks, here on theCUBE at DataWorks Summit 2018 in Berlin. We'll be back fairly quickly with another guest, and thank you very much for watching our segment. (techno music)

Published Date : Apr 19 2018

SUMMARY :

Brought to you by Hortonworks. of Hortonworks and Hortonworks of course is the host a little bit of the genesis of Hortonworks. a bunch of the other engineers to help get started. of the applications, and the data stewards So those tend to be, as you said, the data engineering types But now we also see a lot of data in various places. So NiFi is really focused on that data at the edge, right? So, going forward, do you see more of your customers working I mean, our goal is to help our customers with their data When it's on the edge, when it's in the data center, as you guys have teamed more closely with IBM, I don't know that it's shifted the focus. the industry's going to need to have some So actually one of my responsibilities is the that GDPR requires like right to be forgotten, like and frankly if you connect them to the engineers, You got policy wonks and you got tech wonks so. as you said, so that there's specifications around. It's been great to catch up with you and

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
IBM | ORGANIZATION | 0.99+
James Kobielus | PERSON | 0.99+
Mandy Chessell | PERSON | 0.99+
Alan | PERSON | 0.99+
Yahoo | ORGANIZATION | 0.99+
Jim Kobielus | PERSON | 0.99+
Europe | LOCATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Alan Gates | PERSON | 0.99+
four years | QUANTITY | 0.99+
James | PERSON | 0.99+
ING | ORGANIZATION | 0.99+
Berlin | LOCATION | 0.99+
yesterday | DATE | 0.99+
Apache | ORGANIZATION | 0.99+
SQL | TITLE | 0.99+
Java | TITLE | 0.99+
GDPR | TITLE | 0.99+
Python | TITLE | 0.99+
100% | QUANTITY | 0.99+
Berlin, Germany | LOCATION | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
DataWorks Summit | EVENT | 0.99+
Atlas | ORGANIZATION | 0.99+
DataWorks Summit 2018 | EVENT | 0.98+
Data Steward Studio | ORGANIZATION | 0.98+
today | DATE | 0.98+
one | QUANTITY | 0.98+
NiFi | ORGANIZATION | 0.98+
Dataworks Summit 2018 | EVENT | 0.98+
Hadoop | ORGANIZATION | 0.98+
one platform | QUANTITY | 0.97+
2018 | EVENT | 0.97+
both | QUANTITY | 0.97+
millions of events | QUANTITY | 0.96+
Hbase | ORGANIZATION | 0.95+
Tablo | TITLE | 0.95+
ODPI | ORGANIZATION | 0.94+
Big Data Analytics | ORGANIZATION | 0.94+
One | QUANTITY | 0.93+
theCUBE | ORGANIZATION | 0.93+
NiFi | COMMERCIAL_ITEM | 0.92+
day two | QUANTITY | 0.92+
about five | QUANTITY | 0.91+
Kafka | TITLE | 0.9+
Zeppelin | ORGANIZATION | 0.89+
Atlas | TITLE | 0.85+
Ranger | ORGANIZATION | 0.84+
Jupyter | ORGANIZATION | 0.83+
first | QUANTITY | 0.82+
Apache Atlas | ORGANIZATION | 0.82+
Hadoop | TITLE | 0.79+

Day Two Keynote Analysis | Dataworks Summit 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. (electronic music) >> Hello and welcome to theCUBE on day two of DataWorks Summit 2018 from Berlin. It's been a great show so far. We have just completed the day two keynote, and in just a moment I'll bring ya up to speed on the major points and the presentations from that. It's been a great conference, fairly well attended here. The hallway chatter, the discussion, has been great. The breakouts have been stimulating. For me the takeaway is the fact that Hortonworks, the show host, announced yesterday at the keynote, Scott Gnau, the CTO of Hortonworks, announced, Data Steward Studio, DSS they call it, part of the Hortonworks DataPlane Services portfolio. And it could not be more timely, Data Steward Studio, because we are now five weeks away from GDPR, that's the General Data Protection Regulation, becoming the law of the land. When I say the land, the EU, but really any company that operates in the EU, and that includes many U.S.-based and APAC-based and other companies, will need to comply with the GDPR as of May 25th and ongoing, in terms of protecting the personal data of EU citizens. And that means a lot of different things. Data Steward Studio, announced yesterday, was demo'd today by Hortonworks, and it was a really excellent demo, and showed that it's a powerful solution for a number of things that are at the core of GDPR compliance. The demo covered the capability of the solution to discover and inventory personal data within a distributed data lake or enterprise data environment, number one. Number two, the ability of the solution to centralize consent, to provide a consent portal, essentially, that data subjects can use to review the data that's kept on them, to make fine-grained consents or withdraw consents for use and profiling of the data that they own. And then, number three, they demonstrated the capability of the solution to execute data subjects' requests in terms of the handling of their personal data. Those are the three main points in terms of enabling, adding the teeth to enforce, GDPR in an operational setting in any company that needs to comply with it. So what we're going to see, I believe, going forward, really in the whole global economy and in the big data space, is that Hortonworks and others in the data lake industry, and there are many others, are going to need to roll out similar capabilities in their portfolios, 'cause their customers are absolutely going to demand it. In fact, the deadline is fast approaching; it's only five weeks away. One of the interesting takeaways from the keynote this morning was the fact that John Kreisa, the VP for marketing at Hortonworks, today did a quick survey of those in the audience, a poll, asking how ready they are to comply with GDPR as of May 25th, and it was a bit eye-opening. I wasn't surprised, but I think it was 19 or 20%, I don't have the numbers in front of me, who said that they won't be ready to comply. I believe it was somewhere between 20 and 30% who said they will be able to comply. About 40%, don't quote me on that, but a fair plurality, said that they're preparing. So that indicates that they're not entirely 100% sure that they will be able to comply 100% to the letter of the law as of May 25th. I think that's probably accurate in terms of ballpark figures. I know there's a lot of companies, users, racing for compliance by that date.
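The first demoed capability, discovering and inventorying personal data across a distributed environment, is, at its simplest, pattern scanning plus a catalog. Here is a deliberately stripped-down, hypothetical sketch of the idea; a product like Data Steward Studio does far more than this:

```python
# Stripped-down sketch of personal-data discovery: scan text fields for
# common PII patterns and record where they were found.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}

datasets = {
    "lake/support_tickets": "Contact jane.doe@example.com or +49 30 1234567",
    "lake/sensor_readings": "temp=21.4;unit=C",
}

pii_inventory = {}
for name, sample in datasets.items():
    hits = [kind for kind, pat in PII_PATTERNS.items() if pat.search(sample)]
    if hits:
        pii_inventory[name] = hits

print(pii_inventory)  # {'lake/support_tickets': ['email', 'phone']}
```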
And so really GDPR is definitely the headline banner, the umbrella story, around this event, and really around the big data community worldwide right now, in terms of enterprise investments in the compliance software, services, and capabilities needed to comply with GDPR. That was important, but it wasn't the only thing covered in not only the keynotes but the sessions here so far. Clearly AI and machine learning are hot themes in terms of the innovation side of big data. There's compliance, there's GDPR, but really, innovation in terms of what enterprises are doing with their data, with their analytics: they're building more and more AI and embedding that in conversational UIs and chatbots, and they're embedding AI in all manner of e-commerce applications, internal applications in terms of search, as well as things like face recognition, voice recognition, and so forth and so on. So what we've seen here at the show, and what I've been seeing for quite some time, is that more of the actual developers who are working with big data are the data scientists of the world. And more of the traditional coders are getting up to speed very rapidly on the new state of the art for building machine learning, deep learning, and AI natural language processing into their applications. That said, Hortonworks has become a fairly substantial player in the machine learning space. In fact, you know, really across their portfolio, many of the discussions I've seen here show that everybody's buzzing about getting up to speed on frameworks for building and deploying and iterating and refining machine learning models in operational environments. So that's definitely a hot theme. And so there was an AI presentation this morning, from the first gentleman that came on, that laid out the broad parameters of what developers are doing and looking to do with data that they maintain in their lakes: training data to both build the models and train them and deploy them. So that was also something I expected, and it's good to see at DataWorks Summit that there is a substantial focus on that, in addition, of course, to GDPR and compliance. It's been about seven years now since Hortonworks was essentially spun off of Yahoo. It's been, I think, about three years or so since they went IPO. And what I can see is that they are making great progress in terms of their growth, in terms of not just the finances, but their customer acquisition and their deal size, and also customer satisfaction. I get a sense from talking to many of the attendees at this event that Hortonworks has become a fairly blue chip vendor, and that they're really, in many ways, continuing to grow their footprint of Hortonworks products and services with most of their partners, such as IBM. And from what I can see, everybody was rapt with attention around Data Steward Studio, and I sensed sort of a sigh of relief that it looks like a fairly good solution, and so I have no doubt that a fair number of those in this hall right now are, as we say in the U.S., kicking the tires of DSS and probably going to expedite their adoption of it.
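And the third demoed capability, executing data subjects' requests, is essentially a fan-out over a metadata inventory: find every store holding a subject's data and dispatch the work. A minimal hypothetical sketch follows, with the inventory, store names, and API all invented for illustration:

```python
# Hypothetical sketch: executing a right-to-be-forgotten request against a
# metadata inventory that maps data subjects to the stores holding their data.
inventory = {
    "subject:alice@example.com": ["hdfs:/lake/clickstream", "hive:crm.contacts"],
    "subject:bob@example.com":   ["hive:crm.contacts"],
}

def forget(subject_id: str) -> list:
    """Return the deletion work orders a compliance workflow would execute."""
    locations = inventory.pop(subject_id, [])
    return [{"action": "delete", "subject": subject_id, "store": loc}
            for loc in locations]

for order in forget("subject:alice@example.com"):
    print(order)  # each order would be dispatched to the owning system
```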
So, with that said, we have day two here. What we're going to have is Alan Gates, one of the founders of Hortonworks, coming on in just a few minutes, and I'll be interviewing him, asking about the vibrancy and the health of the community, the Hortonworks ecosystem, developers, partners, and so forth, as well as, of course, the open source communities for Hadoop and Ranger and Atlas and so forth: the growing stack of open source code upon which Hortonworks has built their substantial portfolio of solutions. Following him we'll have John Kreisa, the VP for marketing. I'm going to ask John to give us an update on, really, the health of Hortonworks as a business, in terms of their reach out to the community, in terms of their messaging, obviously, and have him really position Hortonworks in the community in terms of who he sees them competing with. What segments is Hortonworks in now? The whole Hadoop segment, increasingly... Hadoop is there. It's the foundation. But the word is not invoked in the context of discussions of Hortonworks as much now as it was in the past. And the same thing for, say, Cloudera, one of their closest traditional rivals, closest in the sense that people associate them. I was at the Cloudera analyst event the other week in Santa Monica, California. It was the same thing. I think both of these vendors are on a similar path to become fairly substantial data warehousing and data governance suppliers to the enterprises of the world that have traditionally gone with the likes of IBM and Oracle and SAP and so forth. So I think Hortonworks has definitely evolved into a far more diversified solution provider than people realize. And that's really one of the takeaways from DataWorks Summit. With that said, this is Jim Kobielus. I'm the lead analyst, I should've said that at the outset, at SiliconANGLE Media's Wikibon team, focused on big data analytics. I'm your host this week on theCUBE at DataWorks Summit Berlin. I'll close out this segment and we'll get ready to talk to the Hortonworks and IBM personnel. I understand there's a gentleman from Accenture on as well today on theCUBE here at DataWorks Summit Berlin. (electronic music)

Published Date : Apr 19 2018

SUMMARY :

Announcer: From Berlin, Germany, it's the Cube as a business in terms of the reach out to the community

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
John Kreisa | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Scott Gnau | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Cloudera | ORGANIZATION | 0.99+
May 25th | DATE | 0.99+
Berlin | LOCATION | 0.99+
Yahoo | ORGANIZATION | 0.99+
five weeks | QUANTITY | 0.99+
Alan Gates | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
Hotronworks | ORGANIZATION | 0.99+
Data Steward Studio | ORGANIZATION | 0.99+
General Data Protection Regulation | TITLE | 0.99+
Santa Monica, California | LOCATION | 0.99+
GDPR | TITLE | 0.99+
19 | QUANTITY | 0.99+
both | QUANTITY | 0.99+
100% | QUANTITY | 0.99+
today | DATE | 0.99+
20% | QUANTITY | 0.99+
one | QUANTITY | 0.99+
yesterday | DATE | 0.99+
U.S. | LOCATION | 0.99+
DSS | ORGANIZATION | 0.99+
30% | QUANTITY | 0.99+
Berlin, Germany | LOCATION | 0.98+
Dataworks Summit 2018 | EVENT | 0.98+
three main points | QUANTITY | 0.98+
Atlas | ORGANIZATION | 0.98+
20 | QUANTITY | 0.98+
about seven years | QUANTITY | 0.98+
Accenture | ORGANIZATION | 0.97+
SiliconANGLE | ORGANIZATION | 0.97+
One | QUANTITY | 0.97+
about three years | QUANTITY | 0.97+
Day Two | QUANTITY | 0.97+
first gentleman | QUANTITY | 0.96+
day two | QUANTITY | 0.96+
SAP | ORGANIZATION | 0.96+
EU | LOCATION | 0.95+
Datawork Summit Europe 2018 | EVENT | 0.95+
Dataworks Summit | EVENT | 0.94+
this morning | DATE | 0.91+
About 40% | QUANTITY | 0.91+
Wikibon | ORGANIZATION | 0.9+
EU | ORGANIZATION | 0.9+

Action Item, Graph DataBases | April 13, 2018


 

>> Hi, I'm Peter Burris. Welcome to Wikibon's Action Item. (electronic music) Once again, we're broadcasting from our beautiful theCUBE Studios in Palo Alto, California. Here in the studio with me, George Gilbert, and remote, we have Neil Raden, Jim Kobielus, and David Floyer. Welcome, guys! >> Hey. >> Hi, there. >> We've got a really interesting topic today. We're going to be talking about graph databases, which probably just immediately turned off everybody. But we're actually not going to talk so much about it from a technology standpoint. We're really going to spend most of our time talking about it from the standpoint of the business problems that IT and technology are being asked to address, and the degree to which graph databases, in fact, can help us address those problems, and what we need to do to actually address them. Human beings tend to think in terms of relationships of things to each other. So what the graph community talks about is graph-shaped problems. And by graph-shaped problem we might mean that someone owns something and someone owns something else, or someone shares an asset, or it could be any number of different things. But we tend to think in terms of things and the relationship that those things have to other things. Now, the relational model has been an extremely successful way of representing data for a lot of different applications over the course of the last 30 years, and it's not likely to go away. But the question is, do these graph-shaped problems actually lend themselves to a new technology that can work with relational technology: to accelerate the rate at which we can address new problems, accelerate the performance of those new problems, and ensure the flexibility and plasticity that we need within the application set, so that we can consistently use this as a basis for going out and extending the quality of our applications as we take on even more complex problems in the future. So let's start here. Jim Kobielus, when we think about graph databases, give us a little hint on the technology and where we are today. >> Yeah, well, graph databases have been around for quite a while in various forms, addressing various core use cases such as social network analysis, recommendation engines, fraud detection, semantic search, and so on. Graph database technology is essentially very closely related to relational, but it's specialized to, when you think about it, Peter, the very heart of a graph-shaped business problem: the entity relationship diagram. And anybody who's studied databases has mastered, at least at a high level, entity relationship diagrams. The more complex these relationships grow, among a growing range of entities, the more complex the network structure becomes in terms of linking them together at a logical level. So graph database technology was developed a while back to be able to support very complex graphs of entities and relationships, and a lot of what's done with it is analytic. A lot of it's very focused on fast query, what they call graph traversal, among very large graphs, to find quick answers to questions that might involve who owns which products that they bought at which stores in which cities and are serviced by which support contractors and have which connections or interrelationships with other products they may have bought from us and our partners, so forth and so on. When you have very complex questions of this sort, they lend themselves to graph modeling.
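Jim's "who owns which products bought at which stores" query is easy to see in miniature. Below is a hedged sketch using networkx, an in-memory Python graph library standing in for a real graph database; the schema and node names are invented:

```python
# Miniature version of a graph traversal query: which support contractors
# are reachable from a given customer through products and stores?
import networkx as nx

g = nx.DiGraph()
g.add_edge("customer:ann", "product:router", relation="owns")
g.add_edge("product:router", "store:berlin", relation="bought_at")
g.add_edge("store:berlin", "contractor:fixit", relation="serviced_by")

# Traversal: everything reachable from the customer, however many hops out.
reachable = nx.descendants(g, "customer:ann")
contractors = {n for n in reachable if n.startswith("contractor:")}
print(contractors)  # {'contractor:fixit'}
```

The same question asked of a normalized relational schema would require a join per hop, which is exactly the query-traversal workload graph engines optimize for.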
And to some degree, to the extent that you need to perform very complex queries of this sort very rapidly, graph databases, and there's a wide range of those on the market, have been optimized for that. But we also have graph abstraction layers over RDBMSes and multi-model databases. You'll find them running in IBM's databases, or Microsoft Cosmos DB, and so forth. You don't need graph-specialized databases in order to do graph queries, in order to manipulate graphs. That's the issue here. When does a specialized graph database serve your needs better than a non-graph-optimized but nonetheless graph-enabling database? That's the core question. >> So, Neil Raden, let's talk a little bit about the classes of business problems that could in fact be served by representing data utilizing a graph model. So these graph-shaped problems, independent of the underlying technology. Let's start there. What kinds of problems can business people start thinking about solving by thinking in terms of graphs of things and relationships amongst things? >> It all comes down to connectedness. That's the basis of a graph database: how things are connected, either weakly or strongly. And these connected relationships can be very complicated. They can be based on very complex properties. A relational database is not based on connectedness; I'd like to say it's based on un-connectedness. And the whole idea in a relational database is that the intelligence about connectedness is buried in the predicate of a query. It's not in the database itself. So I don't know how overlaying graph abstractions on top of a relational database is a good idea. On the other hand, I don't know how stitching a graph database into your existing operation is going to work, either. We're going to have to see. But I can tell you that a major part of data science, machine learning, and AI is going to need to address the issue of causality, not just what's related to each other. And there's a lot of science behind using graphs to get at the causality problem. >> And we've seen, well, let's come back to that. I want to come back to that. But George Gilbert, we kind of experienced a similar type of thing back in the '90s with the whole concept of object-oriented databases. They were presented as a way of re-conceiving data. The problem was that they had to go from the concept all the way down to the physical thing, and they didn't seem to work. What happened? >> Well, it turns out the big argument was, with object-oriented databases, we can model anything, and it's so much richer, especially since we're programming with objects. But it turns out, though, that theoretically, especially at that time, you could model anything down at the physical level or even the logical level in a relational database, and so those code bases were able to handle both ends of the use cases, both ends of the spectrum. But now that we have such extreme demands on our data management, rather than look at a whole application, or multiple applications even, sharing a single relational database, like some of the big enterprise apps, we have workloads within apps, like recommendation engines, or a knowledge graph, which explains the relationship between people, places, and things. Or digital twins, or mapping your IT infrastructure and applications, and how they all hold together.
You could do that in a relational database, but in a graph database, you can organize it so that you can have really fast analysis of these structures. But the trade-off is, you're going to be much more restricted in how you can update the stuff. >> Alright, so if we think about what happened with some of the object-oriented technology: in the original object database world, the database was bound to the application, and the developer used the database to tell the application where to go find the data. >> George: Right. >> Relational data allowed us not to tell the applications where to find things, but rather how to find things, and that was persisted, and was very successful for a long time. Object-oriented technologies, in many respects, went back to the idea that the developer had to be very concrete about telling the application where the data was, but we didn't want to do things that way. Now, something's happened, David Floyer. One of the reasons why we had this challenge of representing data in a more abstract way across a lot of different forms, without having it also be represented physically, and therefore a lot of different copies and a lot of different representations of the data, which broke systems of record and everything else, was that the underlying technology was focused on just persisting data, and not necessarily delivering it into these new types of databases, data models, et cetera. But Flash changes that, doesn't it? Can't we imagine a world in which we can have our data in Flash, which is a technology that's more focused on delivering data, and then have that data be delivered to a lot of different representations, including things like graph databases, graph models? Is that accurate? >> Absolutely. In a moment I'll take it even further. I think the first point is that when we were designing real-time applications, transactional applications, we were very constrained indeed by the amount of data that we could get to. So, as a database administrator, I used to have a rule that database developers could not issue more than 100 database calls. And the reason was that they could always do more than that, but the applications became very unstable and very difficult to maintain. The cost of maintenance went up a lot. The whole area of Flash allows us to do a number of things, and the area of UniGrid enables us to do a number of things very differently, so that we can, for example, share data and have many different views of it. We can use UniGrid to bring far greater amounts of compute power, GPUs, et cetera, to bear on specific workloads. I think the most useful way to think about this is that this type of architecture can be used to create systems of intelligence, where you have the traditional relational databases dealing with systems of record, and then you can have the AI systems, graph systems, all the other components there, looking at the best way of providing data and making decisions in real time that can be fed back into the systems of record. >> Alright, alright. So let's-- >> George: I want to add something on this. >> So, Neil, let me come back to you very quickly, sorry, George. I want to go back to this question of what does a graph-shaped problem look like? Let's kind of run down it. We talked about AI; what about IoT, guys? Is IoT going to help us, is IoT going to drive this notion of looking at the world in terms of graphs, more or less?
What do you think, Neil? >> I don't know. I hadn't really thought about it, Peter, to tell you the truth. I think that one thing we leave out when we talk about graphs is, we talk about, you know, nodes and edges and relationships and so forth, but you can also build a graph with very rich properties. And one thing you can get from a graph query that you can't get from a relational query, unless you write a careful predicate, is that it can actually do some thinking for you. It can tell you something you don't know. And I think that's important. So, without being too specific about IoT, I have to say that, you know, streaming data and trying to relate it to other data, getting down very quickly to what's going on, root-cause analysis, I think graph would be very helpful. >> Great. And, Jim Kobielus, how about you? >> I think, yeah, I think that IoT is tailor-made for, or I should say, graph modeling and graph databases are tailor-made for, the IoT. Let me explain. The graph is very much a metadata technology; it's expressing context in a connected universe. Where the IoT is concerned, it's all about connectivity, and so graphs, increasingly complex graphs of, say, individuals and the devices and the apps they use and locations and various contexts and so forth, these are increasingly graph-based. They're hierarchical and shifting and changing, and so in order to contextualize and personalize experience in a graph, in an IoT world, I think graph databases will be embedded in the very fabric of these environments. Microsoft has a strategy they announced about a year ago to build more of an intelligent edge around a distributed graph across all their offerings. So I think graphs will become more important in this era, undoubtedly. >> George, what do you think? Business problems? >> Business problems on IoT. The knowledge graph, the digital twin that it holds together: both of these lend themselves to graph modeling. But to use object-oriented databases as an example, where object modeling took off was in the application server, where you had the ability to program in an object-oriented language, and that mapped to a relational database. And that is an option, not the only one, but it's an option, for handling graph-modeled data like a digital twin or IT operations. >> Well, that suggests what we're thinking about here, if we talk about graph as metadata, and I think, Neil, this partly answers the question that you had about why anybody would want to do this: that we're representing the output of a relational query as a node in a network of data types or data forms, so that the data itself may still be relationally structured, but from an application standpoint, the output of that query is, itself, a thing that is then used within the application. >> But to expand on that: if you store it underneath as fully normalized, in relational language, laid out so that there are no duplicates and things like that, it gives you much faster update performance, but the really complex queries, typical of graph data models, would be very, very slow. So, once we have, say, more in-memory technology, or we can manage under the covers the sort of multiple representations of the data--
If we had to copy the data and have physical copies of the data on disk in a lot of different places, then we would run into all kinds of consistency and update problems. It would probably break the model. We'd probably come back to the notion of a single data store. >> George: (mumbles) >> I want to move on here, guys. One really quick thing, David Floyer, I want to ask you. You mentioned that when you were a database administrator, you put restrictions on how many database actions an application or transaction was allowed to generate. When we think about what a business is going to have to do to take advantage of this, is there any one particular thing that we need to think about? What's going to change within an IT organization to take advantage of graph databases? And then we'll do the action items. >> Right. So the key here is that the number of database calls can grow by a factor of probably a thousand times what it is now, with what we can see coming as technologies over the next couple of years. >> So let me put that in context, David. That's a single transaction now generating a hundred thousand, >> Correct. >> a hundred thousand database calls. >> Well, access calls to data. >> Right. >> Whatever type of database. And the important thing here is that a lot of that is going to move out, with the discussion of IoT, to where the data is coming in. Because the quicker you can do that, the earlier you can analyze that data. And you talked about IoT with possibly different sources coming in; a simple one, like traffic lights, for example. The traffic lights are being affected by the traffic lights around them within the city. Those sorts of problems are ideal for this sort of graph database. And having all of that data locally, and being processed locally in memory, very, very close to where those sensors are, is going to be the key to developing solutions in this area. >> So, Neil, I've got one question for you. I'm going to put you on the spot. I just had a thought. And here's the thought. We talk a lot about some of the new technologies that could in fact be employed here, whether it be blockchain or even going back to SOA. But when we talk about what a system is going to have the authority to do, we get to the idea of writing contracts that describe very, very discretely what a system is or is not going to do. I have a feeling those contracts are not going to be written in relational terms. I have a feeling that, like most legal documents, they will be written in what looks more like graph terms. I'm extending that a little bit, but: this has rights to do this at this point in time. Is this notion of incorporating more contracts directly into how systems work, to assure that we have the appropriate authorities laid out, going to be easier or harder as a consequence of thinking in terms of these graph-shaped models? What do you think? >> Boy, I don't know. Again, another thing I hadn't really thought about. But I do see some real gaps in thinking. Let me give you an analogy. OLAP databases came on the scene back in the '90s or whatever. People in finance departments and whatnot, they loved OLAP. What they hated was the lack of scalability. And what we see now is that scalability isn't a problem, and OLAP solutions are suddenly bursting out all over the place. So I think there's a role for a mental model of how you model your data and how you use it that's different from the relational model.
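Returning to David's traffic-light example for a moment: it maps naturally onto a graph, with intersections as nodes and adjacency as edges, and the interesting query is which lights a disruption touches within a few hops. A small hypothetical sketch, again using networkx as a stand-in:

```python
# Hypothetical sketch of the traffic-light case: intersections as nodes,
# adjacency as edges, and a query for lights within two hops of an incident.
import networkx as nx

city = nx.Graph()
city.add_edges_from([
    ("light:A", "light:B"), ("light:B", "light:C"),
    ("light:C", "light:D"), ("light:B", "light:E"),
])

# An incident at light A: which lights should adjust their timing? (<= 2 hops)
affected = nx.single_source_shortest_path_length(city, "light:A", cutoff=2)
print(sorted(n for n, hops in affected.items() if hops > 0))
# ['light:B', 'light:C', 'light:E']
```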
>> So, Neil, I've got one question for you. I'm going to put you on the spot; I just had a thought. And here's the thought. We talk a lot about some of the new technologies that could in fact be employed here, whether it be blockchain or even going back to SOA. But when we talk about what a system is going to have the authority to do, there's the idea of writing contracts that describe very, very discretely what a system is or is not going to do. I have a feeling those contracts are not going to be written in relational terms. I have a feeling that, like most legal documents, they will be written in what look more like graph terms: this has rights to do this at this point in time. I'm extending that a little bit, but this notion of incorporating more contracts directly into how systems work, to assure that we have the appropriate authorities laid out: what do you think? Is that going to be easier or harder as a consequence of thinking in terms of these graph-shaped models?
>> Boy, I don't know. Again, another thing I hadn't really thought about. But I do see some real gaps in thinking. Let me give you an analogy. OLAP databases came on the scene back in the '90s or whatever. People in finance departments and elsewhere loved OLAP. What they hated was the lack of scalability. And now what we see is that scalability isn't a problem, and OLAP solutions are suddenly bursting out all over the place. So I think there's a role for a mental model of how you model your data and how you use it that's different from the relational model. I think the relational model has prominence and has the advantage of, what's it called, incumbency or something. But I think that the graph is going to show some real capabilities that people are lacking right now. I think some of them are at the very high end, things, like I said, such as getting to causality. But I think that graph theory itself is so much richer than the simple concept of graphs that's implemented in graph databases today.
>> Yeah, I agree with that totally. Okay, let's do the action item round. Jim Kobielus, I want to start with you. Jim, action item.
>> Yeah, for data professionals and analytic professionals: focus on what graphs cannot do, because you hear a lot of hyperbolic claims. They're not useful for unstructured data or for machine learning in-database. They're not as useful for schema on read. What they are useful for is the same core thing that relational is useful for, which is schema on write applied to structured data. Number one. Number two, and I'll be quick on this, focus on the core use cases that are already proven out for graph databases. We've already ticked them off here: social network analysis, recommendation engines, influencer analysis, the semantic web. There's a rich range of mature use cases for which semantic techniques are suited. And then finally, and I'll be very quick here, bear in mind that relational databases have been supporting graph modeling, graph traversal, and so forth for quite some time, including pretty much all the core mature enterprise databases. If you're using those databases already, and they can perform graph traversals and so forth reasonably well for your intended application, stick with that. No need to investigate the pure-play, graph-optimized databases on the market. That said, there are plenty of good ones, including AWS's forthcoming Neptune. Please explore the alternatives, but don't feel like you have to go to a graph database first and foremost.
>> Alright. David Floyer, action item.
>> Action item. You are going to need to move your data center and your applications away from the traditional way of handling things, which is sequential copies going around, usually taking two or three weeks, and towards a shared data model, where the same set of data can have multiple views of it and multiple uses for multiple different types of databases.
>> George Gilbert, action item.
>> Okay. So when you have a graph-oriented problem, in other words the data is shaped like a graph, the question is what type of database to use. If you have really complex query and analysis use cases, it's probably best to use a graph database. If you have really complex update requirements, it's best to use a combination, perhaps of relational and graph, or something like a multi-model database. We can learn from Facebook, where, for years, they've built their source of truth for the social graph on a bunch of sharded MySQL databases with some layers on top. That's for updating the graph and maintaining it and its integrity. But for reading the graph, they have an entirely different layer for comprehensive queries and for manipulating and traversing all those relationships. So you don't get a free lunch either way. You have to choose your sweet spots and the trade-offs associated with them.
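As a concrete footnote to Jim's point that mature relational engines already handle graph traversal: a recursive common table expression walks an edge table to a bounded depth. A sketch only, assuming a PostgreSQL-style database, the open-source pg client, and a hypothetical edges(src, dst) table.

```typescript
import { Client } from "pg";

// Everything reachable from a starting node within three hops.
const reachableSql = `
  WITH RECURSIVE reachable(node, depth) AS (
    SELECT dst, 1 FROM edges WHERE src = $1
    UNION
    SELECT e.dst, r.depth + 1
    FROM edges e
    JOIN reachable r ON e.src = r.node
    WHERE r.depth < 3
  )
  SELECT DISTINCT node FROM reachable;
`;

async function reachableFrom(start: string): Promise<string[]> {
  const client = new Client(); // connection settings come from the environment
  await client.connect();
  try {
    const res = await client.query(reachableSql, [start]);
    return res.rows.map((row) => row.node);
  } finally {
    await client.end();
  }
}
```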
>> Alright, Neil Raden, action item.
>> Well, first of all, I don't think graph databases are subject to a lot of hype. I think it's just the opposite: I think they haven't gotten much hype at all, and maybe we're going to see that. But another thing: there is a fundamental difference when you're looking at a graph and a graph query. It uses something called open-world reasoning. A relational database uses closed-world reasoning. I'll give you an example. Country has capital city. Now, you have in your graph that China has capital city Beijing and that China has capital city Peking. That doesn't violate the graph. The graph simply understands and intuits that they're different names for the same thing. Now, if you love to write correlated sub-queries for many, many different relationships, I'd say stick to your relational database. But I see unique capabilities in a graph that would be difficult to implement in a relational database.
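Neil's open-world point can be sketched in a few lines. This is a toy, not a real reasoner: the idea is only that two assertions that would collide as rows in a closed-world table can coexist in a graph once an explicit sameAs fact lets the names unify.

```typescript
type Triple = { s: string; p: string; o: string };

const facts: Triple[] = [
  { s: "China", p: "hasCapital", o: "Beijing" },
  { s: "China", p: "hasCapital", o: "Peking" },   // no violation in a graph
  { s: "Beijing", p: "sameAs", o: "Peking" },
];

// Resolve a name to a canonical entity by following a sameAs link.
function canonical(name: string): string {
  const link = facts.find(
    (f) => f.p === "sameAs" && (f.s === name || f.o === name)
  );
  return link ? link.s : name; // pick one side as the canonical name
}

const capitals = new Set(
  facts.filter((f) => f.p === "hasCapital").map((f) => canonical(f.o))
);
console.log(capitals.size); // 1 -- the graph "intuits" one capital, not two
```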
>> Alright. Thank you very much, guys. Let's talk about what the action item is for all of us. This week we talked about graph databases. We do believe that they have enormous potential, but we first have to draw a distinction between graph theory, which is a way of looking at the world and conceptualizing solutions to problems, and graph database technology, which has the advantage, for certain classes of data models, of being able to very quickly both write and read data that is based on relationships, hierarchies, and network structures that are difficult to represent in a normalized relational database manager. Ultimately, our expectation is that over the next few years we're going to see an explosion in the class of business problems that lend themselves to a graph-modeling orientation. IoT is an example; very complex analytic systems will be an example. But it is not the only approach or the only way of doing things. What is especially interesting is that, over the last few years, a change in the underlying hardware technology has been allowing us to expand the range of tools that we might use to support these new classes of applications. Specifically, the move to Flash allows us to sustain a single physical copy of data and then have that be represented in a lot of different ways, to support a lot of different model forms and a lot of different application types, without undermining the fundamental consistency and integrity of the data itself. That is going to allow us to utilize new types of technologies in ways that we haven't before, because previously, whether it was object-oriented technology or OLAP technology, there was always this problem of having to create new physical copies of data, which led to enormous data administration nightmares. So looking forward, the ability to use Flash as a basis for physically storing the data and delivering it out to a lot of different model and tool forms creates an opportunity for us to use technologies that may more naturally map to the way that human beings think about things. Now, where is this likely to really play? We talked about IoT; we talked about other types of technologies. Where it's really likely to play is where the domain expertise of a business person is really pushing the envelope on the nature of the business problem. Historically, applications like accounting were very focused on highly stylized data models, things that didn't necessarily exist in the real world. You don't have double-entry bookkeeping running in the wild. You do have it in the legal code. But some of the things that we want to build in the future, people, the devices they own, where they are, how they're doing things, lend themselves to real-world experience, and human beings tend to look at those with a graph orientation. And the expectation over the next few years is that, because of the changes in the physical technology for how we can store data, we will be able to utilize a new set of tools that will allow us to more quickly bring up applications, more naturally manage the data associated with those applications, and, very important, utilize targeted technology in a broader set of complex application portfolios, technology that's appropriate to solve each particular part of the problem, whether it's a recommendation engine or something else. Alright. So, once again, I want to thank the remote guys, Jim Kobielus, Neil Raden, and David Floyer. Thank you very much for being here. George Gilbert, you're in the studio with me. And, once again, I'm Peter Burris, and you've been listening to Action Item. Thank you for joining us, and we'll talk to you again soon. (electronic music)

Published Date : Apr 13 2018

Action Item | How to get more value out of your data, April 06, 2018


 

>> Hi, I'm Peter Burris, and welcome to another Wikibon Action Item. (electronic music) One of the most pressing strategic issues that businesses face is how to get more value out of their data. In our opinion, that's the essence of digital business transformation: using data as an asset to improve your operations and take better advantage of market opportunities. The thing about data, though, is that it's shareable, it's copyable, it's reusable. It's easy to create derivative value out of it. One of the biggest misnomers in the digital business world is the notion that data is the new fuel or the new oil. It's not. You can only use oil once; you can apply it to a purpose, but not to multiple purposes. Data you can apply to a lot of purposes, which is why you are able to get such interesting and increasing returns on that asset if you use it appropriately. Now, this becomes especially important for technology companies that are attempting to provide digital business technologies or services or other capabilities to their customers. In the consumer world, it has started to come to a head. Questions about Facebook's reuse of a person's data through an ad-based business model are now starting to lead people to question the degree to which the information asymmetry, between what I'm giving and how they're using it, is really worth the value that I get out of Facebook. That is something that consumers and certainly governments are starting to talk about. It's also one of the bases for GDPR, which is going to start enforcing significant fines in the next month or so. In the B2B world, that question is going to become especially acute. Why? Because as we try to add intelligence to the services and products that we are utilizing within digital business, some of that requires a relationship in which some amount of data is passed along to improve the models and the machine learning and AI associated with that intelligence. Now, some companies have come out and said flat out that they're not going to reuse a customer's data, IBM being a good example: at IBM Think, Ginni Rometty said, we're not going to reuse our customers' data. The question for the panel here is: is that going to be part of a differentiating value proposition in the marketplace? Are we going to see circumstances in which some companies keep the cost of products and services low by reusing a client's data, while others, sustaining their customers' experience and a trust model, say they won't? How is that going to play out in front of customers? So joining me today here in the studio, David Floyer.
>> Hi there.
>> And on the remote lines we have Neil Raden, Jim Kobielus, George Gilbert, and Ralph Finos. Hey, guys.
>> All: Hey.
>> All right, so... Neil, let me start with you. You've been in the BI world as a user and as a consultant for many, many years. Help us understand the relationship between data, assets, ownership, and strategy.
>> Oh, God. Well, I don't know that I've been in the BI world. Anyway, as a consultant, when we would do a project for a company, there were very clear lines of what belonged to us and what belonged to the client. They were paying us generously. They would allow us to come into their company and do things that they needed, and in return we treated them with respect. We wouldn't take their data. We wouldn't take the data models that we built, for example, and sell them to another company. As far as I'm concerned, that's just theft.
So if I'm housing another company's data, because I'm a cloud provider or some sort of application provider, and I say, well, you know, I can use this data too? To me the analogy is: I'm a warehousing company, and independently I go into the warehouse and say, you know, these guys aren't moving their inventory fast enough; I think I'll sell some of it. It just isn't right.
>> I think it's a great point. Jim Kobielus, as we think about the role that data and machine learning play in training models and delivering new classes of services, we don't have a clean answer right now. So what's your thought on how this is likely to play out?
>> I agree totally with Neil, first of all. If it's somebody else's data, you don't own it; therefore you can't sell it and you can't monetize it, clearly. But where you have derivative assets, like machine learning models that are derivative from data, it's the same phenomenon, the same issue, at a higher level. You can build and train, or should build and train, your machine learning models only from data that you have legal access to, that you own or have a license to, and so forth. So as you're building these derivative assets, first and foremost make sure, as you're populating your data lake to do the training, that you have clear ownership of the data. With GDPR and so forth, we have to be doubly, triply vigilant to make sure that we're not using data that we don't have authorized ownership of or access to. That is critically important. And so I get kind of queasy when I hear some people say we'll use blockchain to make the sharing of training data more distributed and federated or whatever. It's like: wait a second. That doesn't solve the issues of ownership. That makes it even more problematic. If you get this massive blockchain of data coming from hither and yon, who owns what? How do you know? Do you dare build any models whatsoever from any of that data? That's a huge gray area that nobody's really addressed yet.
>> Yeah, well, it might mean that the blockchain has been poorly designed. We talked in one of the previous Action Items about the role that blockchain design is going to play. But moving aside from blockchain: it seems as though we generally agree that data is typically owned by somebody, and that the ownership of it, as Neil said, means that you can't intercept it at some point in time, just because it is easily copied, and then generate rents on it yourself. David Floyer, what does that mean from an ongoing systems design and development standpoint? How are we going to assure, as Jim said, not only that we know what data is ours, but that we have the right protection strategies in place to make sure that as the data moves, we have some influence and control over it?
>> Well, my starting point is that AI and AI-infused products are fueled by data. You need that data, and Jim and Neil have already talked about that. In my opinion, the most effective way of improving a company's products, whatever the products are, from manufacturing to agriculture to financial services, is to use AI-infused capabilities. That is likely to give you the best return on your money, and businesses need to focus on their own products. That's the first thing you are trying to protect from anybody coming in. Businesses own that data. They own the data about their products, in use by their customers. Use that data to improve your products with AI-infused function, and use it before your competition eats your lunch.
>> But let's build on that.
So we're not saying, for example, if you're a storage system supplier, since that's a relatively easy one: you've got very, very fast SSDs, very, very fast NVMe over Fabric, great technology. You can collect data about how that system is working, but that doesn't also give you rights to collect data about how the customer's using the system.
>> There is a line which you need to make sure that you are covering. For example, Call Home on a product, any product: whose data is that? You need to make sure that you can use that data, that you have some sort of agreement with the customer, and that's a win-win, because you're using that data to improve the product and prove things about it. But it's very, very clear that you should have a contractual relationship, as Jim and Neil were pointing out. You need the right to use that data; that right can't come after the fact. But you must get it, because if you don't get it, you won't be able to improve your products.
>> Now, we're talking here about technology products, which often have very concrete and obvious ownership, and people who are specifically responsible for administering them. But when we start getting into the IoT domain, or into other places where the device is infused with intelligence and might be collecting data that's not directly associated with its purpose, just by virtue of the nature of the sensors that are out there, the whole concept of the digital twin introduces some tension into all this. George Gilbert, take us through what's been happening with the overall suppliers of technology related to digital twin building and design. How are they making promises, committing to their customers that they will not cross this data boundary as they improve the quality of their twins?
>> Well, as you quoted Ginni Rometty at the start, she's saying IBM, unlike its competitors, will not take advantage of and leverage and monetize your data. But it's a little more subtle than that, and digital twins are just another manifestation of the industry-specific solution development that we've done for decades. The difference, as Jim and David have pointed out, is that with machine learning, it's not so much code that's at the heart of these digital twins; it's the machine learning models, and the data is what informs those models. Now, you don't want all your secret sauce to go from Mercedes-Benz to BMW, but at the same time, the economics of industry solutions mean that you do want some of the repeatability that we've always gotten from industry solutions. You might have parts that are just company-specific. And so in IBM's case, if you really parse what they're saying, they take what they learn in terms of the models from the data when they're working with BMW, and some of that is going to go into the industry-specific models that they're going to use when they're working with Mercedes-Benz. If you really, really peel the onion back and ask them, it's not the models, it's not the features of the models, but it's the coefficients that weight the features or variables in the models that they will keep segregated by customer. So in other words, you get some of the economic benefits of reuse across customers with similar expertise, but you don't actually get all of the secret sauce.
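One hedged way to picture George's features-versus-coefficients distinction in code; the names are invented for illustration and are not IBM's actual practice. The feature definitions are the reusable, industry-level part; the fitted weights are the customer-private part.

```typescript
type FeatureVector = number[];

// Industry-level and reusable: how raw telemetry becomes model features.
function turbineFeatures(raw: { rpm: number; tempC: number; vibration: number }): FeatureVector {
  return [raw.rpm / 3000, raw.tempC / 100, raw.vibration];
}

// Customer-private: coefficients fitted on one customer's own data.
const customerAWeights = [0.8, 1.4, 2.1]; // never shipped to customer B

function failureScore(weights: number[], x: FeatureVector): number {
  return weights.reduce((sum, w, i) => sum + w * x[i], 0);
}
```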
>> Now, Ralph Finos--
>> And I agree with George here. I think that's an interesting topic; that's one of the important points. It's not kosher to monetize data that you don't own, but conceivably, if you can abstract from that data at some higher level, like George is describing, in terms of the weights and coefficients of a neural network that's derivative from the data, then at some point in the abstraction you should be able to monetize. It's like a paraphrase of some copyrighted material. I'm not a lawyer, but you can sell a paraphrase, because it's your own original work, based obviously on your reading of Moby Dick or whatever it is you're paraphrasing.
>> Yeah, I think--
>> Jim, I--
>> Peter: Go ahead, Neil.
>> I agree with that, but there's a line. There was a guy who worked at Capital One, this was about ten years ago, and he was their chief statistician or whatever. This was before we had words like machine learning and data science; it was called statistics and predictive analytics. He left the company, formed his own company, and rewrote and recoded all of the algorithms he had for about 20 different predictive models. He formed a company and then licensed that stuff to Sybase and Teradata and whatnot. Now, the question I have is: did that cross the line or didn't it? These were algorithms actually developed inside Capital One. Did he have the right to use those, even if he wrote new computer code to make them run in databases? So it's more than just data, I think. It's, well, it's a marketplace, and I think that if you own something, someone should not be able to take it and make money on it. But that doesn't mean you can't make an agreement with them to do that, and I think we're going to see a lot of that. IMS Health gets data on prescription drugs, and IRI and Nielsen get scanner data; they pay for it, then they add value to it and resell it. So I think that's really the issue: the use has to be understood by all the parties, and the compensation has to be appropriate to the use.
>> All right, so, Ralph Finos. As a guy who looks at market models and handles a lot of the fundamentals for how we do our forecasting, look at this from the standpoint of how people are going to make money, because clearly what we're talking about sounds like the idea that any derivative use is embedded in algorithms and in how those contracts get set up; I've got a comment on that in a second. But the promise, a number of years ago, was that people were going to start selling data willy-nilly as a way of capturing value out of their economic or business activities; that hasn't matured yet, generally. Do we see this brand new data economy, where everybody's selling data to each other, being the way that this all plays out?
>> Yeah, I'm having a hard time imagining this as a marketplace. I think we pointed at the manufacturing and technology industries, where some of this makes some sense. But from a practitioner perspective, you're looking for variables that are meaningful, that are in a form you can actually use to make predictions, where you understand what the history and the validity of that data are. And in a lot of cases there's a lot of garbage out there that you can't use. And the notion of paying for something that you ultimately look at and say, oh crap, this isn't really helping me, is going to create, maybe not an insurmountable barrier, but some obstacles in the market for adoption of this kind of thought process.
We have to think about the utility of the data that feeds your models.
>> Yeah, and I think there are going to be a lot of legal questions raised. I recommend that people go look at a recent SiliconANGLE article written by Mike Wheatley and edited by our Editor in Chief Robert Hof about Microsoft letting technology partners own the rights to joint innovations. This is quite a change for Microsoft, who, if you sent them an email with an idea, used to send you an email back saying, oh, just to let you know, any correspondence we have here is the property of Microsoft. So there clearly is tension in the model: how we're going to utilize data and enable derivative use, and how we're going to appropriate value and share in the returns from it. I think this is going to be an absolutely central feature of business models, certainly in the digital business world, for quite some time. The last thing I'll mention before we get to the action items is that one of the biggest challenges, whenever we start talking about how we set up businesses and institutionalize the work that's done, is to look at the nature and the scope of the assets. In circumstances where an asset is used by two parties and is generating a high degree of value, as measured by the transactions against that asset, there's always going to be a tendency for one party to try to take ownership of it. The party that's able to generate greater returns than the other almost always makes moves to try to take more control of that asset, and that's the basis of governance. Everybody talks about data governance as though it's something that you worry about along with your backup and restore. That's important, but this notion of data governance is increasingly going to become a feature of strategy and boardroom conversations about what it really means to create data assets, sustain those data assets, get value out of them, and determine whether or not the right balance is being struck between the value that we're getting out of our data and what third parties, including customers, are getting out of it. So with that, let's do a quick action item round. David Floyer, I'm looking at you. Why don't we start here. David Floyer, action item.
>> So my action item is for businesses: you should focus. Focus on data about your products in use by your customers, to help improve the quality of your products and fuse AI into those products, as one of the most efficient ways of adding value. And do that before your competition has a chance to come in and get data that will stop you from doing it.
>> George Gilbert, action item.
>> I guess mine would be that in most cases you want to embrace some amount of reuse, because of the economics involved in your joint development with a solution provider. But if others are going to get some benefit from reusing some of the intellectual property that informs the models you build, make sure you negotiate with your vendor that any upgrades to those models, whether they're digital twins or in other forms, come back in a canonical version that provides an upgrade path for you as well.
>> Jim Kobielus, action item.
>> My action item is for businesses to regard your data as a product that you monetize yourself.
Or, if you are unable to monetize it yourself, and there is a partner, like a supplier or a customer, who can monetize that data, then negotiate the terms of that monetization into your relationship and be vigilant about it, so that you get a piece of that stream, even if the bulk of the work is done by your partner.
>> Neil Raden, action item.
>> It's all based on transparency. Your data is your data. No one else can take it without your consent. That doesn't mean you can't get involved in relationships where there's an agreement to do that. But the problem is that most agreements, especially when you look at business-to-consumer agreements, are so onerous that nobody reads them and nobody understands them. So the person providing the data has to have an unequivocal right to sell it, and the person buying it has to really understand what the limits are on what they can do with it.
>> Ralph Finos, action item. You're muted, Ralph. But it was brilliant, whatever it was.
>> Well, it was, and I really can't say much more than that. (Peter laughs) But from a practitioner perspective, and I understand from a manufacturing perspective how the value could be there: as a practitioner, if you're fishing for data out there that someone has, and it might look like something you can use, chances are it's not. And you need to be real careful about spending money to get data when you're not really clear it's going to help you.
>> Great. All right, thanks very much, team. So here's our action item conclusion for today. The whole concept of digital business is predicated on the idea of using data assets in a differential way to better serve your markets and improve your operations. It's your data. Increasingly, that is going to be the basis for differentiation. And any weak undertaking that allows that data to get out has the potential that someone else can, through their data science and their capabilities, re-engineer much of what you regard as your differentiation. We've had conversations with leading data scientists who say that if someone were to sell customer data into an open marketplace, it would take about four days for a great data scientist to re-engineer almost everything about your customer base. So, as a consequence, we have to tread lightly here as we think about what it means to release data into the wild. Ultimately, the challenge for any business will be: how do I establish the appropriate governance and protections, not just looking at the technology but looking at the overall notion of the data assets? If you don't understand how to monetize your data and nonetheless enter into a partnership with somebody else, by definition that partner is going to generate greater value out of your data than you are. There are significant information asymmetries here. So every company must undertake an understanding of how to generate value out of its data. We don't think that there's going to be a general-purpose marketplace for sharing data, in a lot of ways.
This is going to be a heavily contracted arrangement. But that doesn't mean we should not take important steps right now to start doing a better job of instrumenting our products and services, so that we can start collecting data about those products and services, because the path forward is going to demonstrate that we will be able to dramatically improve the quality of the goods and services we sell, reducing the asset specificities for our customers by making those goods and services more intelligent and more programmable. Finally, is this going to be a feature of a differentiated business relationship through trust? We're open to that. Personally, I'll speak for myself: I think it will. I think there is going to be an important element, ultimately, of being able to demonstrate to a customer base, to a marketplace, that you take privacy, data ownership, and intellectual property control of data assets seriously, and that you are very, very specific, very transparent, about how you're going to use those in derivative business transactions. All right. So, once again: David Floyer, thank you very much, here in the studio. On the phone: Neil Raden, Ralph Finos, Jim Kobielus, and George Gilbert. This has been another Wikibon Action Item. (electronic music)

Published Date : Apr 6 2018

Action Item | March 30, 2018


 

>> Hi, I'm Peter Burris, and welcome to another Wikibon Action Item. (electronic music) Once again, we're broadcasting from theCUBE studios in beautiful Palo Alto. Here in the studio with me are George Gilbert and David Floyer. And remote, we have Neil Raden and Jim Kobielus. Welcome, everybody.
>> David: Thank you.
>> So this is kind of an interesting topic that we're going to talk about this week. And it really is how we're going to find new ways to generate derivative use out of many of the applications, especially web-based applications, that have been built over the last 20 years. A basic premise of digital business is that the difference between business and digital business is the data, and how you craft data as an asset. Well, as we all know, in any universal Turing machine, data is the basis for representing both the things that you're acting upon and also the algorithms, the software itself. Software is data, and the basic principles of how we capture software-oriented data assets, and then turn them into derivative sources of value and reapply them to new types of problems, are going to become an increasingly important issue as we think about how the world of digital business is going to play out over the course of the next few years. Now, there are a lot of different domains where this might work, but one that's especially important is the web application world, where we've had a lot of application developers and a lot of tools focused on how we use web-based services to get software to do the things we want to do, and which is also the source of a lot of the data that's been streaming into big data applications. And so it's a natural place to think about how we're going to create derivative value out of crucial software assets. How are we going to capture those assets, turn them into something that has a different role for the business, performs different types of work, and then reapply them? To start the conversation, Jim Kobielus, why don't you take us through what some of these tools start to look like.
>> Hello, Peter. Yes, so really, what we're looking at here, in order to capture these assets, the web applications: we first have to generate those applications, and the bulk of that work, of course, is and remains manual. And in fact, there is a proliferation of web application development frameworks on the market, and the range of them continues to grow: everything from React to Angular to Ember, and Node.js and so forth. So one of the core issues that we're seeing out there in the development world is: are there too many of these? Is there any prospect for simplification, consolidation, and convergence on web application development frameworks, to make the front-end choices for developers a bit easier and more straightforward, in terms of the front-end development of JavaScript and HTML, as well as the back-end development of the logic to handle the interactions, not only with the front-end on the UI side but also with the infrastructure web services and so forth? Once you've developed the applications, a professional programmer, then and only then can we consider the derivative uses you're describing, such as incorporation or orchestration of web apps through robotic process automation and so forth. So the issue is: how can we simplify, or is there a trend toward simplification of, front-end manual development?
And right now, I'm not seeing a whole lot of action in the direction of simplification of front-end development. It's just a fact.
>> So we're not seeing a lot of simplification and convergence on the actual frameworks for creating these types of applications. But we're starting to see some interesting trends for stuff that's already been created. How can we generate derivative use out of it? And also, per some of our augmented programming research, new ways of envisioning the role that artificial intelligence, machine learning, etc., can play in identifying patterns of utilization, so that we are better able to target those types of things that could be applied to derivative use. Have I got that right, Jim?
>> Yeah, exactly. With AI within robotic process automation, anything that has already been built can be captured through natural language processing, through computer image recognition, OCR, and so forth. And then, in that way, it's an asset that can be repurposed in countless ways, and that's the beauty of RPA, or where it's going. So the issue is then not so much capture of existing assets, but how can we speed up and really automate the original development of all that UI logic? I think RPA is part of the solution but not the entire solution, meaning RPA provides visual front-end tools for the rest of us to orchestrate more of the front-end development of the application UI and interaction logic.
>> And it's also popping up--
>> That's part of broader low-code--
>> Yeah, it's also popping up in a lot of the interviews that we're doing with CIOs about related types of things. But I want to scope this appropriately. So we're not talking about how we're going to take those transaction processing applications, David Floyer, and envelop them and containerize them and segment them and apply new software. That's not what we're talking about, nor are we talking about the machine-to-machine world. Robotic process automation really is a tool for creating robots out of human-time interfaces that can scale the amount of work and recombine it in different ways. But we're not really talking about the two extremes: the hardcore IoT, or the hardcore systems of record. Right?
>> Absolutely. But one question I have for Jim and yourself: the philosophy for most people developing these days is mobile first. The days of having an HTML layout on a screen have gone. If you aren't mobile first, that's going to be pretty well a disaster for any particular development. So Jim, how do RPA and your discussion fit in with mobile, and all of the complexity that mobile brings, all of the alternative ways that you can do things with mobile?
>> Yeah. Well, David, of course, there are many low-code tools, dozens out there, and many of those are geared primarily toward supporting fast, automated development of mobile applications to run on a variety of devices and mobile UIs. That's part of the solution, as it were. But also, in the standard web application development world, the frameworks that I've described, everything from React to Angular to Vue to Ember and everything else, are moving towards a concept, more than a concept, a framework or paradigm, called progressive web apps.
And what progressive web apps are all about, and that's really the mainstream of web application development now, is blurring the distinction between mobile, web, and desktop applications. Because you build applications, JavaScript applications, for browsers, and the apps look and behave as if they were real-time, interactive, in-memory mobile apps. What that means is that they download fresh content throughout a browsing session, "progressively." I'm putting that in air quotes, because that's where the progressive web app name comes in. And they don't require the end-user to visit an app store or download software. They don't require anything special in terms of synchronizing data from servers; they run in memory natively inside of web-accessible containers that are local to the browser. They just feel mobile, even though they may be running on a standard desktop with narrowband connectivity and so forth. So they scream, and they scream in the context of a standard JavaScript, Ajax browser session.
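A minimal sketch of the machinery behind the behavior Jim describes. Progressive web apps typically lean on a service worker that caches the application shell, so the app keeps working with intermittent connectivity and pulls fresh content progressively; the cache name and file list here are illustrative.

```typescript
// sw.ts, compiled to sw.js and registered from the page with
// navigator.serviceWorker.register("/sw.js").
const SHELL = ["/", "/index.html", "/app.js", "/styles.css"];

self.addEventListener("install", (event: any) => {
  // Cache the app shell up front so the app opens without the network.
  event.waitUntil(caches.open("shell-v1").then((c) => c.addAll(SHELL)));
});

self.addEventListener("fetch", (event: any) => {
  // Serve from cache first; fall back to the network for fresh content.
  event.respondWith(
    caches.match(event.request).then((hit) => hit ?? fetch(event.request))
  );
});
```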
>> So when we think about this, jeez, Jim, it almost sounds like client-side Java. But I think we're talking about something, as you said, that evolves as the customer uses it, and there are a lot of techniques and approaches that we've been using to do some of those things. But George Gilbert, the reason I bring up the notion of client-side Java is because we've seen other initiatives over the years try to do this. Now, partly they failed because, David Floyer, they focused on too much and tried to standardize, or presumed that everything required a common approach, and we know that that's always going to fail. But what are some of the other things that we need to think about as we think about ways of creating derivative use out of software or digital assets?
>> Okay, so I come at it from two angles. And as Jim pointed out, there's been a Cambrian explosion of creativity and innovation in, frankly, client-side development and server-side development. But if you look at how we're going to recombine our application assets, we tried 20 years ago with EAI, which was sort of like MuleSoft, only it was for on-prem apps. And it didn't work because every app was bespoke, essentially--
>> Well, it worked for point-to-point classes of applications.
>> Yeah, but it required bespoke development for every--
>> Every instance, because the apps were so customized.
>> Peter: And the interfaces were so customized.
>> Yes. At the same time, we were trying to build higher-level application development capabilities on desktop productivity tools, with macros and then scripting languages, cross-application, and visual development, or using applications as visual development building blocks. Now, you put those two things together: you have the ability to work with applications that have user interfaces, and you have the functionality that's in the richer enterprise applications, and now we have the technology to say, let's program by example on essentially a concrete use case and a concrete workflow. And then you go back in and progressively generalize it so it can handle more exception conditions and edge conditions. In other words, it's like you start with the concrete and get progressively more abstract.
>> Peter: You start with the work that the application performs.
>> Yeah.
>> And not knowledge of the application itself.
>> Yes. But the key thing is, as you said, recombining assets, because we're sort of marrying the best of the EAI world with the best of the visual client-side development world, where, as Jim points out, machine learning is making it easier for the tools to stay up to date as the user interfaces change across releases. This means that, well, I wouldn't say this is as easy as spreadsheet development, it's just not.
>> It's not like building spreadsheet macros, but it's more along those lines.
>> Yeah, but it's not as low-level as just building raw JavaScript, and there's Jim's great example of JavaScript client-side frameworks. Look at our Gmail inbox application that millions of people use. That just downloads a new version whenever they want to drop it, and they're just shipping JavaScript over to us. But the key thing, and this is, Peter, your point about digital business: by combining user interfaces, we can bridge applications that were silos, then we can automate the work the humans were doing to bridge those silos, and then we can reconstitute workflows in a much more efficient--
>> Around the digital assets, which is kind of how business ultimately evolves. And that's a crucial element of this whole thing. So let's change direction a little bit, because we're talking about, as Jim said, the fact that there are all these frameworks out there. There may be some consolidation on the horizon; we're researching that right now, although there's not a lot of evidence that it's happening. But there clearly is an enormous number of digital assets in place inside these web-based applications, whether relative to mobile or something else. And we want to create derivative use out of them, and there are some new tools that allow us to do that in a relatively simple, straightforward way, like RPA, and there are certainly others. But that's not where this ends up. We know that this is increasingly going to be a target for AI, what we've been calling augmented programming: the ability to use machine learning and related technologies to reveal, make transparent, and gain visibility into patterns within applications and within the use of data, and then have that become a crucial feature of the development process. And increasingly, even potentially to start actually creating code automatically, based on very clear guidance about what work needs to be performed. Jim, what's happening in that world right now?
>> Oh, let's see. So basically, I think what's going to happen over time is that more of the development cycle for web applications will incorporate not just the derivative assets, but the AI to be able to decompose existing UI elements and recombine them, to enable flexible and automated recombination in various ways, and also to enable greater tuning of the UI in an automated fashion, through A/B testing that's in line with the development cycle, based on metrics that AI is able to sift through. Different UI designs can be put out into production applications in real time and then really tested with different categories of users, and then the best-suited or best-fit design chosen based on things like reducing user abandonment rates and speeding up access to commonly required capabilities. The metrics can be rolled in line into the automation process to automatically select the best-fit UI design that has been developed through automated means. In other words, this real-world experimentation of the UI has been going on for quite some time in many enterprises, and increasingly it involves data scientists who are managing the predictive models to very much drive the whole process of promoting the best-fit design to production status. I think this will accelerate. We'll take more of these in-line metrics on the UI and bring them, I believe, into more RPA-style environments, so the rest of us building out these front ends and automating more of our transactions and UIs can take advantage of the fact that the infrastructure will choose the best fit of the designs for us, without us having to worry about doing A/B testing and all that stuff. The cloud will handle it.
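The shape of that in-line, metric-driven selection can be sketched in a few lines. This toy tracks abandonment per UI variant and promotes the best performer; a real system would layer proper statistics and user segmentation on top, and all names here are invented.

```typescript
interface Variant {
  name: string;
  sessions: number;
  abandoned: number;
}

const variants: Variant[] = [
  { name: "checkout-a", sessions: 0, abandoned: 0 },
  { name: "checkout-b", sessions: 0, abandoned: 0 },
];

function abandonRate(v: Variant): number {
  return v.sessions === 0 ? 0 : v.abandoned / v.sessions;
}

// Promote whichever design is losing the fewest users.
function bestFit(): Variant {
  return variants.reduce((a, b) => (abandonRate(a) <= abandonRate(b) ? a : b));
}
```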
>> So it's a big vision: this notion of eventually, through more concrete, well-understood processes, applying some of these AI and ML technologies to choosing options for the developer, and even automating some of those options based on policy and rules. Neil Raden, we've been looking at similar types of things for years. How has that worked in the past? And let's talk a bit about what needs to happen now to make sure that if it's going to work, it's going to work this time.
>> Well, it really hasn't worked very well. And the reason it hasn't worked very well is because no one has figured out a representational framework to really capture all the important information about these objects. It's just too hard to find them. Everybody knows that when you develop software, 80% of it is grunt work. It's just junk. You know, it's taking out the trash and it's setting things up and whatever. And the real creative stuff is a very small part of it. So if you could alleviate the developer from having to do all that junk by just picking up pieces of code that have already been written and tested, that would be big. But the idea of this has been overwhelmed by the scale and the complexity. People have tried to create libraries, like JavaBeans and object-oriented programming and that sort of thing. They've tried to create catalogs of these things. They've used relational databases; it doesn't work. My feeling, and I hate to use the word because it always puts people to sleep, is that you need some kind of ontology that's deep enough and rich enough to really do this.
>> Oh, hold on Neil, I'm feeling... (laughs)
>> Yeah. Well, I mean, what good is it otherwise? Go to Git, right? You can find a thousand things, but you don't know which one is really going to work for you, because it's not rich enough; it doesn't have enough information. It needs to have quality metrics. It needs to have reviews by people who have used it, and so on. So that's where I think we run into trouble.
>> Yeah, I know.
>> As far as robots, yeah?
>> Go ahead.
>> As far as robots writing code, you're going to have the same problem.
>> No, well, here's where I think it's different this time, and I want to throw it out to you guys and see if it's accurate, and then we'll get to the action items. Here's where I think it's different. In the past, partly perhaps because it's where developers were most fascinated, we tried to create object-oriented databases and object-oriented representations of data, using object-oriented models as a way of thinking about it, and object-oriented code, and a lot of it was relatively low in the stack.
And we tried to create everything from scratch, and it turned out that whenever we did that, it was almost like CASE from many years ago: you create it in the tool, you maintain it out of the tool, and you lose all organization of how it worked. What we're talking about here, and the reason why I think this is different, I think Neil is absolutely right, is that we're focusing our attention on the assets within an application that create the actual business value: what the application does. We try to encapsulate those actions and render them as things that are reusable, without necessarily doing an enormous amount of work on the back-end. Now, we do have to be worried about the back-end. It's not going to do any good to do a whole bunch of RPA or related stuff on the front-end that kicks off an enormous number of transactions against a little server that's 15 years old and has historically only handled a few transactions a minute. So we have to be very careful about how we do this. But nonetheless, by focusing more attention on what is generating value in the business, namely the actions that the application delivers, as opposed to knowledge of the application itself, namely how it does it, I think that we're constraining the problem pretty dramatically, subject to the realities of what it means to actually be able to maintain and scale applications that may be asked to do more work. What do you guys think about that?
>> Now, Peter, let me say one more thing about this, about robots. I think you're all a lot more sanguine about AI and robots doing these kinds of things. I'm not. Let me read to you three pickup lines that a deep neural network developed after being trained to do pickup lines. "You must be a tringle? 'Cause you're the only thing here." "Hey baby, you're to be a key? Because I can bear your toot?" Now, what kind of code would--
>> Well, look, we can go back 50 years to ELIZA and the whole notion of interactive psychology. Look, let's be honest about this. Neil, you're making a great point. I don't know that any of us are more or less sanguine, and that probably is a good topic for a future Action Item: what are the practical limits of AI, and how is that going to change over time? But let's be relatively simple here. The good news about applying AI to IT problems is that you're starting with engineered systems, with engineered data forms and engineered data types, and you're working with engineers, and a lot of that stuff is relatively well structured, certainly more structured than the outside world, and it starts with digital assets. That's why AI for IT operations management is more likely to work. That's why AI for application programming is more likely to work, as opposed to AI for pickup lines, which, as you said, semantically are all over the place. There are very, very few people who are going to conform to a set of conventions for, well, I want to move away from the concept of pickup lines, for other social interactions that are very, very complex. We don't look at a face and get excited or not in a way that corresponds to an obvious, well-understood semantic problem.
>> Exactly. The value that these applications deliver is in their engagement with the real world of experience, and you can't encode the real world of human lived experience in a crisp, clear way.
It simply has to be proven out in applications or engagement, through people or not through people, with real-world outcomes. And for some outcomes, like the ones that Neil read off there, those ridiculous pickup lines, most of those kinds of automated solutions won't make a freaking bit of sense, because you need humans with their brains.
>> Yeah, you need human engagement. So, coming back to this key point about the constraint that we're putting on this right now, and the reason why perhaps I'm a little bit more ebullient than you might be, Neil. I want to be careful about this, because I also have some pretty strong feelings about the limits of AI, regardless of what Elon Musk says. At the end of the day, we're talking about digital objects, not real objects: objects that are engineered, that haven't evolved over a few billion years, that deliver certain outputs from data that's been tested and relatively well verified, as opposed to an unlimited, at least from a human-experience standpoint, potential set of outcomes. So in that small world, and certainly the infrastructure universe is part of it, and what we're saying is that increasingly the application development universe is going to be part of it as part of the digital business transformation, I think it's fair to say that we're going to start seeing AI, machine learning, and some of these other things being applied to that realm with some degree of success. But, something to watch for. All right, so let's do the action items. David Floyer, why don't we start with you. Action item.
>> In addressing this, I think the key, in terms of business focus, is first of all mobile. You have to design things for mobile, so any use of any particular platform or particular set of tools has to lead with mobile being first. And mobiles are changing rapidly, with the amount of data that's being generated on the mobile itself and around the mobile. So that's the first point I would make from a business perspective. And the second is that, from a business perspective, one of the key things is that you can reduce cost. Automation must be a key element of this, and therefore designing things that will take out tasks and remove tasks, and make things more efficient, is going to be an incredibly important part of this.
>> And reduce errors.
>> And reduce errors, absolutely. Probably most important is to reduce errors: to take those out of the chain, and, where you can, to speed things up by removing human intervention and human tasks, and raise what humans are doing to a higher level.
>> Other things. George Gilbert, action item.
>> Okay, so really quickly on David's point: we have many more application forms and expressions that we have to present, like mobile first. And going back to using RPA as an example: the core capability of the UiPath product that we've been working with is to be able to identify specific UI elements in a very complex presentation, whether it's in a web browser, in a native app on your desktop, or on mobile. I don't know how complete they are on mobile, because I'm not sure if they did that first, but that core capability to identify the elements that matter in a complex collection and hierarchy of UI elements, that's what makes it powerful. Now, on the AI part, I don't think it's as easy as pointing it at one app and then another and saying, go make them talk. It's more like helping you on the parts where they might be a little ambiguous, like if pieces move around from release to release, things like that. So my action item is: start prototyping with the RPA tools, because they're probably robust enough to start integrating your enterprise apps. And the only big new wrinkle that has come out in the last several weeks, and that's now in everyone's consciousness, is the MuleSoft acquisition by Salesforce, because that's going back to the EAI model, and we will see more app-to-app integration at the cloud level that's now possible.
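To make George's action item concrete, here is what UI-level orchestration of two applications looks like, sketched with the open-source Puppeteer library standing in for a commercial RPA tool like the one George mentions; the selectors and URLs are hypothetical.

```typescript
import puppeteer from "puppeteer";

// Bridge two siloed web apps through their UIs, the way a human would:
// read a value out of one app and key it into another.
async function bridgeOrderToBilling(orderId: string) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto("https://orders.example.com/" + orderId);
  const total = await page.$eval("#order-total", (el) => el.textContent ?? "");

  await page.goto("https://billing.example.com/new");
  await page.type("#invoice-amount", total);
  await page.click("#submit");

  await browser.close();
}
```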
It's more like helping you on the parts where they might be a little ambiguous, like if pieces move around from release to release, things like that. So my action item is: start prototyping with the RPA tools, because they're probably robust enough to start integrating your enterprise apps. And the only big new wrinkle that's come out in the last several weeks, and that is now in everyone's consciousness, is the MuleSoft acquisition by Salesforce, because that's going back to the EAI model. And we will see more app-to-app integration at the cloud level that's now possible. >> Neil Raden, action item. >> Well, you know, Mark Twain said there's only two kinds of people in the world. The kind who think there are only two kinds of people in the world, and the ones who know better. I'm going to deviate from that a little and say that there's really two kinds of software developers in the world. There are the true computer scientists who want to write great code. It's elegant, it's maintainable, it adheres to all the rules, it's creative. And then there's an army of people who are just trying to get something done. So the boss comes to you and says we've got to get a new website up apologizing for selling the data of 50 million of our customers, and you need to do it in three days. Now, those are the kind of people who need access to things that can be reused. And I think there's a huge market for that, as well as all these other software development robots, so to speak. >> Jim Kobielus, action item. >> Yeah, for simplifying web application development, I think that developers need to distinguish between back-end and front-end frameworks. There's a lot of convergence around the back-end framework, specifically Node.js. So you can basically decouple the decision in terms of front-end frameworks from that, and you need to, right up front, make sure that you have a back-end that supports many front ends, because there are many front ends in the world. Secondly, the front ends themselves seem to be moving towards React and Angular and Vue as being the predominant ones. You'll find more programmers who are familiar with those. And then thirdly, as you move towards consolidation onto fewer frameworks on the front-end, move towards low-code tools that allow you, just with the push of a button, you know, visual development, to deploy the built-out UI to a full range of mobile devices and web applications. And to close my action item... I'll second what David said. Move toward a mobile-first development approach for web applications, with a focus on progressive web applications that can run on mobiles and other devices, where they give a mobile experience: with intermittent connectivity, with push notifications, with a real-time, in-memory, fast experience. Move towards a mobile-first development paradigm for all of your browser-facing applications. That really is the simplification strategy you can and should pursue right now on the development side, because web apps are so important, you need a strategy. >> Yeah, so mobile first, irrespective of the underlying technology, or what have you, of the user. All right, so here's our action item. Our view on digital business is that a digital business uses data differently than a normal business. And a digital business transformation ultimately is about how do we increase our visibility into our data assets and find new ways of creating new types of value so that we can better compete in markets.
Now, that includes data, but it also includes application elements, which also are data. And we think increasingly enterprises must take a more planful and purposeful approach to identifying new ways of deriving additional streams of value out of application assets, especially web application assets. Now, this is a dream that's been put forward for a number of years, and sometimes it's worked better than others. But in today's world we see a number of technologies emerging that are likely, at least in this more constrained world, to present a significant new set of avenues for creating new types of digital value. Specifically, tools like RPA, robotic process automation, that look at the outcomes of an application and allow programmers to use a by-example approach to start identifying what the UI elements are, what those UI elements do, and how they could be combined, so that they can be composed into new things and thereby provide a new application integration approach, one which is not at the data and not at the code, but more at the work that a human being would naturally do. These allow for greater scale, greater automation, and a number of other benefits. The reality, though, is that you also have to be very cognizant as you do this that, even though you can find these assets, find a new derivative form, and apply them very quickly to new potential business opportunities, you have to know what's happening at the back-end as well. Whether it's how you go about creating the assets with some of the front-end tooling, and being very cognizant of which front ends are going to be better or worse at creating these more reusable assets. Or whether you're talking about relatively mundane things, like how a database serializes access to data and whether it will fall over because you've created an automated front-end that's just throwing a lot of transactions at it. The reality is there's always going to be complexity. We're not going to see all the problems being solved, but some of the new tools allow us to focus more attention on where the real business value is created by apps, find ways to reuse that, and apply it, and bring it into a digital business transformation approach. All right. Once again. George Gilbert, David Floyer, here in the studio. Neil Raden, Jim Kobielus, remote. You've been watching Wikibon Action Item. Until next time, thanks for joining us. (electronic music)
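As a concrete illustration of Jim's one-back-end, many-front-ends guidance above, here is a minimal sketch in TypeScript. It assumes Express as the Node.js back-end framework; the Order shape and the /api/orders endpoints are hypothetical, invented for this example rather than taken from anything discussed in the segment.

```typescript
// Minimal sketch: a single Node.js back-end that any front end
// (React, Angular, Vue, or a mobile progressive web app) can consume
// over the same REST API. The Order type and routes are hypothetical.
import express from "express";

interface Order {
  id: string;
  customer: string;
  total: number;
}

const app = express();
app.use(express.json());

// An in-memory array stands in for a real data store.
const orders: Order[] = [{ id: "1", customer: "Acme", total: 42.5 }];

// Every front end calls the same endpoint; none of them needs to know
// how the back-end is implemented.
app.get("/api/orders", (_req, res) => {
  res.json(orders);
});

app.post("/api/orders", (req, res) => {
  const order: Order = { ...req.body, id: String(orders.length + 1) };
  orders.push(order);
  res.status(201).json(order);
});

app.listen(3000, () => console.log("API listening on port 3000"));
```

A React, Angular, or Vue client, or a progressive web app whose service worker caches /api/orders responses for intermittent connectivity, would all hit these same routes, which is the decoupling of front-end choice from back-end choice that Jim is recommending.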

Published Date : Mar 30 2018


Wikibon Action Item | March 23rd, 2018


 

>> Hi, I'm Peter Burris, and welcome to another Wikibon Action Item. (funky electronic music) This was a very interesting week in the tech industry, specifically because IBM's Think Conference aggregated a large number of people. Now, theCUBE was there. Dave Vellante, John Furrier, and myself all participated in somewhere in the vicinity of 60 or 70 interviews with thought leaders in the industry, including a number of very senior IBM executives. The reason why this becomes so important is because IBM made a proposal to the industry about how some of the digital disruption that the market faces is likely to unfold. The normal approach, or the normal mindset that people have used, is that startups, digital native companies, were going to change the way that everything was going to operate, and the dinosaurs were going to go by the wayside. IBM's interesting proposal is that the dinosaurs actually are going to learn to dance, utilizing or playing on a book title from a number of years ago. And the specific argument was laid out by Ginni Rometty in her keynote, when she said that there are a number of factors that are especially important here. Factor number one is that increasingly, businesses are going to recognize that the role that their data plays in competition is ascending. It's getting more important. Now, this is something that Wikibon's been arguing for quite some time. In fact, we have said that the whole key to digital disruption and digital business, the difference between business and digital business, is the role that data and data assets play in your business. So we have strong agreement there. But on top of that, Ginni Rometty made the observation that 80% of the data that could be accessed and put to work in business has not yet been made available to the new activities, the new processes that are essential to changing the way customers are engaged, businesses operate, and overall change and disruption occurs. So her suggestion is that that 80%, that vast amount of data that could be applied that's not being tapped, is embedded deep within the incumbents. And so the core argument from IBM is that the incumbent companies, not the digital natives, not the startups, but the incumbent companies are poised to have a significant role in disrupting how markets operate, because of the value of their data that hasn't currently been put to work and made available to new types of work. That was the thesis that we heard this week, and that's what we're going to talk about today. Are the incumbents really going to strike back? So Dave Vellante, let me start with you. You were at Think, you heard the same type of argument. What did you walk away with? >> So when I first heard the term incumbent disruptors, I was very skeptical, and I still am. But I like the concept and I like it a lot. So let me explain why I like it and why I think there are some real challenges. If I'm a large incumbent global 2,000, I'm not going to just roll over because the world is changing and software is eating my world. Rather, what I'm going to do is use my considerable assets to compete, and so that includes my customers, my employees, my ecosystem, the partnerships that I have there, et cetera. The reason why I'm skeptical is because incumbents aren't organized around their data assets. Their data assets are stovepiped, they're all over the place.
And the skills to leverage that data value, monetize that data, understand the contribution that data makes toward monetization, those skills are limited. They're bespoke and they're very narrow. They're within lines of business or divisions. So there's a huge AI gap between the true digital business and an incumbent business. Now, I don't think all is lost. I think a lot of strategies can work, from M&A to transformation projects, joint ventures, spin-offs. Yeah, IBM gave some examples. They put up Verizon and American Airlines. I don't see them yet as the incumbent disruptors. But then there was another example of IBM Maersk doing some very interesting and disruptive things, Royal Bank of Canada doing some pretty interesting things. >> But in a joint venture form, Dave, to your point, they specifically set up a joint venture that would be organized around this data, didn't they? >> Yes, and that's really the point I'm trying to make. All is not lost. There are certain things that you can do, many things that you can do as an incumbent. And it's really game on for the next wave of innovation. >> So we agree as a general principle that data is really important, David Floyer. And that's been our thesis for quite some time. But Ginni put something out there, Ginni Rometty put something out there. My good friend, Ginni Rometty, put something out there: that 80% of the data that could be applied to disruption, better customer engagement, better operations, new markets, is not being utilized. What do we think about that? Is that number real? >> If you look at the data inside any organization, there's a lot of structured data. And that has a better ability to move through an organization. Equally, there's a huge amount of unstructured data that goes in emails. It goes in voicemails, it goes in shared documents. It goes in diagrams, PowerPoints, et cetera, that also is data, which is very much locked up in the way that Dave Vellante was talking about, locked up in a particular process or in a particular area. So is there a large amount of data that could be used inside an organization? Is it private, is it theirs? Yes, there is. The question is, how do you tap that data? How do you organize around that data to release it? >> So this is kind of a chicken and egg kind of a problem. Neil Raden, I'm going to turn to you. When we think about this chicken and egg problem, the question is: do we organize in anticipation of creating these assets? Do we establish new processes in anticipation of creating these data assets? Or do we create the data assets first and then re-institutionalize the work? And the reason why it's a chicken and egg kind of problem is because it takes an enormous amount of leadership will to affect the way a business works before the asset's in place. But it's unclear that we're going to get the asset that we want unless we effect the reorganization and institutionalization. Neil, is it going to be the chicken? Is it going to be the egg? Or is this one of the biggest problems that these guys are going to have? >> Well, I'm a little skeptical about this 80% number. I need some convincing before I comment on that. But I would rather see, when David mentioned the PowerPoint slides or email or that sort of thing, I would rather see that information curated by the application itself, rather than dragged out as raw data and reinterpreted in something else. I think that's very dangerous. I think we saw that in data warehousing.
(mumbling) But when you look at building data lakes, you throw all this stuff into a data lake. And then after the fact, somebody has to say, "Well, what does this data mean?" So I find it kind of a problem. >> So Jim Kobielus, a couple weeks ago Microsoft actually introduced a technology, or a toolkit, that could in fact be applied to move this kind of advanced processing, for dragging value out of a PowerPoint or a Word document or something else, close and proximate to the application. I mean, what Neil just suggested is, I think, a very, very good point. Are we going to see these kinds of new technologies directly embedded within applications to help users narrowly, but businesses more broadly, lift that information out of these applications so it can be freed up for other uses? >> I think yeah, on some level, Peter, this is a topic called dark data. It's been discussed in data management circles for a long time. The vast majority, I think 75 to 80% is the number that I see in the research, is locked up in the sense that it's not searchable, it's not easily discoverable. It's not mashupable, I'm making up a word. The term mashup hasn't been used in years, but I think it's a good one. What it's all about is, if we want to make the most out of our incumbents' data, then we need to give the business, the business people, the tools to find the data where it is, to mash it up into new forms and analytics and so forth, in order to monetize it and sell it, make money off of it. So there is a wide range of data discovery and other tools that support a fairly self-service combination and composition of composite data objects. I don't know, however, that the culture of monetizing existing datasets and pulling dark data into productized forms has taken root in any organization anywhere. I think that's just something that consultants talk about as something that, gee, should be done, but I don't think it's happening in the real world. >> And I think you're probably correct about that, but I still think Neil raised a great point. And I would expect, and I think we all believe, that increasingly this is not going to come as a result of massive changes in adoption of new data-science-like practices everywhere, but an embedding of these technologies: machine learning algorithms, approaches to finding patterns within application data, in the applications themselves, which is exactly what Neil was saying. So I think that what we're going to see, and I wanted some validation from you guys about this, is increasingly tools being used by application providers to reveal data that's in applications, and not open-source, independent tool chains that then ex post facto get applied to all kinds of different data sources in an attempt for the organization to pull the stuff out. David Floyer, what do you think? >> I agree with you. I think there's a great opportunity for the IT industry in this area to put together solutions which can go and fit in. On the basis of existing applications, there's a huge amount of potential, for example, for ERP systems to link in with IoT systems and provide data across an organization. Rather than designing your own IoT system, I think people are going to buy pre-made ones. They're going to put the devices in, the data's going to come in, and the AI work will be done as part of that, as part of implementing that.
And right across the board, there is tremendous opportunity to improve the applications that currently exist, or put in new versions of applications, to address this question of data sharing across an organization. >> Yeah, I think that's going to be a big piece of what happens. And it also says, Neil Raden, something about whether or not enormous machine learning deities in the sky, some of which might start with the letter W, are going to be the best and only way to unlock this data. We're suggesting now that it's something that's going to be increasingly distributed closer to applications: less invasive and disruptive to people, more invasive and disruptive to the applications and the systems that are in place. What do you think, Neil? Is that a better way of thinking about this? >> Yeah, let me give you an example. Data science the way it's been practiced is a mess. You have one person who's trying to find the data, trying to understand the data, completing the data selection, designing experiments, doing runs, and so forth, coming up with formulas and then putting them in the cluster with funny names so they can try to remember which one was which. And now what you have are a number of software companies who've come up with brilliant ways of managing that process, of really helping the data scientist to create a work process in curating the data and so forth. So if you want to know something about this particular model, you don't have to go to the person and say, "Why did you do that model? What exactly were you thinking?" That information would be available right there in the workbench. And I think that's a good model for, frankly, everything. >> So let's-- >> Development pipeline toolkits. That's a hot theme. >> Yeah, it's a very hot theme. But Jim, I don't think you think this, but I'm going to test it. I don't think we're going to see AI pipeline toolkits immediately accessed by your average end user who's putting together a contract, so that that data is automatically ingested and munched by some AI pipeline. This is going to happen in an application. So the person's going to continue to do their work, and then the tooling will or will not grab that information and then combine it with other things through the application itself into the pipeline. We got that right? >> Yeah, but I think this is all being... Everything you described is being embedded in applications that are making calls to backend cloud services that have themselves been built by data scientists and exposed through REST APIs. So Peter, everything you're describing is coming to applications fairly rapidly. >> I think that's a good point, but I want to test it. I want to test that. So Ralph Finos, you've been paying a lot of attention during reporting season to what some of the big guys are saying on some of their calls and in some of their public statements. One company in particular, Oracle, has been finessing a transformation, shall we say? What are they saying about how this is going, as we think about their customer base, the transformation of their customer base, and the degree to which applications are or are not playing a role in those transformations? >> Yeah, I think in their last earnings call a couple days ago, the point that they were making around the decline and the-- >> Again, this is Oracle. So in Oracle's last earnings call, yeah. >> Yeah, I'm sorry, yeah.
The decline in the revenue growth rate in the public cloud, the SaaS end of their business, was a function really of a slowdown of the original acquisitions they made to kind of show up as being a transformative cloud vendor, acquisitions that are basically beginning to run out of gas. And I think if you're looking at marketing applications and sales-related applications and content types of applications, those are kind of hitting a natural high in growth. And I think what they were saying is that from a migration perspective on ERP, that's going to take a while to get done. They were saying something like 10 or 15% of their customer base had just begun doing some sort of migration. And that's the data around ERP and those kinds of applications. So it's a long slog ahead of them, but I'd rather be in their shoes, I think, for the long run than trying to kind of jazz up in the near term some kind of pseudo-SaaS cloud growth based on acquisition and low-hanging fruit. >> Yeah, because they have a public cloud, right? I mean, at least they're in the game. >> Yeah, and they have to show they're in the game. >> Yeah, and specifically they're talking about their applications as clouds themselves. So they're not just saying here's a set of resources that you can build to. They're saying here's a set of SaaS-based applications that you can build around. >> Dave: Right. Go ahead, Ralph, sorry. >> Yeah, yeah. And I think the notion there is, on the migration to their ERP and their systems-of-record applications, they're saying this is going to take a long time for people to do, because of complexity in process. >> So the last point, or Dave Vellante, did you have a point you want to make before I jump into a new thought here? >> I just compare and contrast IBM and Oracle. They have public clouds, they have SaaS. Many others don't. I think this is a major point of differentiation. >> Alright, so we've talked about whether or not this notion of data as a source of value is important, and we agree it is. We still don't know whether or not 80% is the right number, but it is some large number that's currently not being utilized and applied to work differently than the data currently is. And that likely creates some significant opportunities for transformation. Do we ultimately think that the incumbents... Again, I mentioned the chicken and the egg problem. Is this going to be a test of whether or not the incumbents are going to be around in 10 years, the degree to which they enact the types of transformation we've talked about? Dave Vellante, you said you were skeptical. You heard the story. We've had the conversation. Will incumbents who do this in fact be in a better position? >> Well, incumbents that do take action absolutely will be in a better position. But I think that's the real question. I personally believe that every industry is going to get disrupted by digital, and I think a lot of companies are not prepared for this and are going to be in deep trouble. >> Alright, so one more thought, because we're talking about industries overall. There are so many elements we haven't gotten to, but there's one thing I absolutely want to talk about: specifically, the difference between B2C and B2B companies. Clearly the B2C industries have been disrupted, many of them pretty significantly, over the last few years.
Not too long ago, I had multiple not-necessarily-good memories of running the aisles of Toys R Us sometime after 10 o'clock at night, right around December 24th. I can't do that anymore, or I won't be able to do that soon, and it's not because my kids are grown. So B2C industries seem to have moved faster, because the digital natives are able to take advantage of the fact that a lot of these B2C industries did not have direct and strong relationships with those customers. I would posit that a lot of the B2B industries are really where the action's going to take place. And the kind of way I would think about it, and David Floyer, I'll turn to you first: in the B2C world, it's new markets and new ways of doing things, which is where the disruption's going to take place. So more of a substitution as opposed to a churn. But in the B2B markets, it's disruption through greater efficiencies, greater automation, greater engagement with existing customers, as well as finding new businesses and opportunities. What do you think about that? >> I think the B2B market is much more stable. Relationships, business relationships, are very, very important. They take a long time to change. >> Peter: But much of that isn't digital. >> A lot of that is not digital. I agree with that. However, I think that the underlying change that's happening is one of automation. B2B companies are struggling to put in place automation, with robots, automation everywhere. What you see, for example, in Amazon is a dedication to automation, to making things more efficient. And that's, to me, the biggest challenge: owning up to the fact that they have to change their automation, get themselves far more efficient. And if they don't succeed in doing that, then their ability to survive goes down, and their likelihood of being taken over with a reverse takeover becomes higher and higher. So how you go about that huge increase in automation that is needed to survive, I think, is the biggest question for B2B players. >> And when we think about automation, David Floyer, we're not talking about only the manufacturing arms. We're talking about a lot of new software automation. Dave Vellante, Jim Kobielus, RPA is kind of a new thing. Dave, we saw some interesting things at Think. Bring us up to speed quickly on what the community at Think was talking about with RPA. >> Well, I tell you. There were a lot of people in financial services, which is IBM's stronghold. And they're using software robots to automate a lot of the backend stuff that humans were doing. That's a major, major use case. I would say 25 to 30% of the financial services organizations that I talked to had active RPA projects ongoing at the moment. I don't know. Jim, what are your thoughts? >> Yeah, I think backend automation is where B2B disruption is happening. The organizations that are able to automate more of their backend, digitize more of their backend functions and accelerate them and improve the throughput of transactions, are the ones that will clean up. I think for the B2C space, it's the frontend automation of the digitalization of the engagement channels.
But RPA is essentially a key that's unlocking backend automation for everybody, because it allows more of the frontend business analysts, and those who are not traditionally BPM, or business process re-engineering, professionals, to take standard administrative processes and begin to automate them from, as it were, the outside in. So I think RPA is a secret key for that. I think we'll see some of the more disruptive organizations, businesses, take RPA and use it to essentially just reverse-engineer, as it were, existing processes, but in an automated fashion, and drive that improvement in the backend with AI. >> I just love the term software robots. I think that so strongly evokes what's going to happen here. >> If I could add, I think there's a huge need to simplify that space. The other thing I witnessed at IBM Think is that it's still pretty complicated. It's still a heavy lift. There's a big services component to this, which is probably why IBM loves it. But there's a massive market, I think, to simplify the adoption of RPA. >> I completely agree. We have to open the aperture as well. Again, the goal is not to teach people new things, new data science, new automation stuff, but to provide tools and increasingly embed those tools into stuff that people are already using, so that the disruption and the changes happen more as a consequence of continuing to do what the people do. Alright, so let's hit the action item round, guys. It's been a great conversation. Again, we haven't talked about GDPR. We haven't talked about a wide array of different factors that are going to be an issue. I think this is something we're going to talk about. But on the narrow issue of can the incumbents strike back? Neil Raden, let's start with you. Neil Raden, action item. >> I've been saying since 1975 that I should be hanging around with a better class of people, but I do spend a lot of time in the insurance industry. And I have been getting a consensus that in the next five to 10 years, there will no longer be underwriters for claims adjustments. That business is ready for massive, massive change. >> And those are disruptors, largely. Jim Kobielus, action item. >> My action item, in terms of business disruption, is just not to imagine that because you were the incumbent in a past era, in some solution category that's declining, that that automatically guarantees you anything, that it makes your data fit for seizing opportunities in the future. As we've learned from Blockbuster Video, the fact that they had all this customer data didn't give them any defenses against Netflix coming along and cleaning their clock, putting them out of business. So the next generation of disruptors will not have any legacy data to work from, and they'll be able to work miracles because they made a strategic bet on some frontend digital channel that made all the difference. >> Ralph Finos, action item. >> Yeah, I think there's a notion here of siege mentality. And I think the incumbents are inside the castle walls, and the disruptors are outside the castle walls. And sometimes the disruptors, you know, scale the walls. Sometimes they don't. But I think being inside the walls is a tougher place to be in the long run. >> Dave Vellante, action item. >> I want to pick up on something Neil said.
I think it's alluring for some of these industries, like insurance and financial services and healthcare, even parts of government, that really haven't been disrupted in a huge way yet, to say, "Well, I'll wait and I'll see what happens." I think that's a huge mistake. I think you have to start immediately thinking about strategies, particularly around your data, as we talked about earlier. Maybe it's M&A, maybe it's joint ventures, maybe it's spinning out new companies. But the time for waiting is past; you should be acting. >> David Floyer, action item. >> I think that it's easier to focus on something that you can actually do. So my action item is that the focus of most B2B companies should be looking at all of their processes and incrementally automating them, taking out the people cost and other costs, automating those processes as much as possible. That, in my opinion, is the most likely path to being in the position that you can continue to be competitive. Without that focus, it's likely that you're going to be disrupted. >> Alright. So the one thing I'll say about that, David, is that when you say people cost, I think you mean the administrative cost associated with people. >> And people doing things, automating jobs. >> Alright, so we have been talking here in today's Wikibon Action Item about the question, will the incumbents be able to strike back? The argument we heard at IBM Think this past week, and this is the third week of March, was that data is an asset that can be applied to significantly disrupt industries, and that incumbents have a lot of data that hasn't been brought into play in the disruptive flow. And IBM's argument is that we're going to see a lot of incumbents start putting their data into play, more of their data assets into play. And that's going to have a significant impact ultimately on industry structure, customer engagement, and the nature of the products and services that are available over the course of the next decade. We agree. We generally agree. We might nitpick about whether it's 80% or whether it's 60%. But in general, the observation is that an enormous amount of data that exists within a large company, data that's related to how it conducts business, is siloed and locked away, used once, left dark, and not made available for derivative uses. That could, in fact, lead to significant, consequential improvements in how a business's transaction costs are ultimately distributed. Automation's going to be a big deal. David Floyer's mentioned this in the past. I'm also of the opinion that there's going to be a lot of new opportunities for revenue enhancement and products. I think that's going to be as big, but it's very clear that to start, it makes an enormous amount of sense to take a look at where your existing transaction costs are, where existing information asymmetries exist, and see what you can do to unlock that data, make it available to other processes, and start to do a better job of automating, locally and specifically, those activities. And we generally ask our clients to take a look at: what is your value proposition? What are the outcomes that are necessary for that value proposition? What activities are most important to creating those outcomes? And then find the activities that, by doing a better job of unlocking new data, you can better automate. In general, our belief is that there's a significant difference between B2C and B2B businesses. Why?
Because a lot of B2C businesses never really had that direct connection, and therefore never really had as much of the market and customer data about what was going on. A lot of point-of-sale data perhaps, but not a lot of other types of data. And then the disruptors stepped in and created direct relationships, gathered that data, and were able to rapidly innovate products and services that served consumers differently. Where a lot of that new opportunity exists is in the B2B world. And here's where the real incumbents are going to start flexing their muscles over the course of the next decade, as they find those opportunities to engage differently, to automate existing practices and activities, change their cost model, and introduce new approaches to operating that are cloud-based, blockchain-based, based on data, and find new ways to utilize their people. If there's one big caution we have about this, it's this: ultimately, the tooling is not broadly mature. The people necessary to build a lot of these tools are increasingly moving into the traditional disruptors, the legacy disruptors if you will: AWS, Netflix, Microsoft, companies more along those lines. That talent is still very dear in the industry, and it's going to require an enormous effort to bring forward the new types of technologies that can in fact liberate some of this data. We looked at things like RPA, robotic process automation. We look to the big application providers to increasingly imbue their products and services with some of these new technologies. And ultimately, paradoxically perhaps, we look for the incumbent disruptors to find ways to disrupt without disrupting their own employees and customers. So, embedding more of these new technologies in an ethical way directly into the systems and applications that serve people, so that people face minimal demands to learn new tricks, because the systems themselves have gotten much more automated and are able to learn and evolve and adjust much more rapidly, in a way that still corresponds to the way people do work. So, our action item: any company in the B2B space that is waiting for data to emerge as an asset in their business, so that they can then do all the re-institutionalizing and reorganizing of work and the new types of investment, is not going to be in business in 10 years. Or it's going to have a very tough time. The big challenge for the board and the CIO, and it has not successfully been done in the past, at least not too often, is to start the process today, without necessarily having access to the data, of starting to think about how the work's going to change, and to think about the way their organization's going to have to be set up. This is not business process re-engineering. This is organizing around the future value of data and the options that data can create, employing that approach to start doing local automation, serving customers, changing the way partnerships work, and ultimately planning out, for an extended period of time, how the digital business is going to evolve. Once again, I want to thank David Floyer here in the studio with me. Neil Raden, Dave Vellante, Ralph Finos, Jim Kobielus remote. Thanks very much guys. For all of our clients, once again, this has been a Wikibon Action Item. We'll talk to you again. Thanks for watching. (funky electronic music)
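Jim's dark data point lends itself to a small illustration. The sketch below walks a file share and flags files untouched for a year so they can be cataloged for discovery and reuse. The mount point, the one-year threshold, and the record shape are assumptions made for the example; this is not a description of any vendor's discovery tool.

```typescript
// Hedged sketch of a "dark data" inventory pass: recursively walk a
// file share, record basic metadata, and flag files that have not
// been accessed within a threshold window.
import { readdirSync, statSync } from "fs";
import { join } from "path";

interface DarkDataRecord {
  path: string;
  bytes: number;
  lastAccessed: Date;
  dark: boolean; // true if untouched within the threshold window
}

const THRESHOLD_MS = 365 * 24 * 60 * 60 * 1000; // one year, illustrative

function inventory(dir: string, out: DarkDataRecord[] = []): DarkDataRecord[] {
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    const stats = statSync(path);
    if (stats.isDirectory()) {
      inventory(path, out); // descend into subdirectories
    } else {
      out.push({
        path,
        bytes: stats.size,
        lastAccessed: stats.atime,
        dark: Date.now() - stats.atime.getTime() > THRESHOLD_MS,
      });
    }
  }
  return out;
}

const records = inventory("/mnt/shared-docs"); // hypothetical mount point
console.log(`${records.filter((r) => r.dark).length} dark files found`);
```

An inventory like this is only the discovery step; making the flagged files searchable and mashupable, in Jim's phrase, is where a catalog and the application-embedded tooling discussed above come in.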

Published Date : Mar 23 2018


Wikibon Action Item Quick Take | Microsoft AI Platform for Windows, March 2018


 

>> Hi, I'm Peter Burris and welcome to another Wikibon Action Item Quick Take. Jim Kobielus, Microsoft seems to be gettin' ready to do a makeover of application development. What's going on? >> Yeah, that's pretty exciting, Peter. So, last week, on the 7th, Microsoft announced, at one of their Developer Days, something called AI Platform for Windows, and let me explain why that's important. Because that is going to bring Machine Learning down to desktop applications, anything that's written to run on Windows 10. And why that's important is that, starting with Visual Studio 15.7, there'll be an ability for developers who don't know anything about Machine Learning to, in a very visual way, create Machine Learning models that they can then have trained in the cloud and deployed to their Windows applications, whatever those might be, and to do real-time, local inferencing in those applications, without the need for round-tripping back to the cloud. So, what we're looking at now is they're going to bring this capability into the core of Visual Studio, and they're going to be backwards compatible with previous versions of Visual Studio. What that means is, I can just imagine, over the next couple of years, most Windows applications will be heavily ML-enabled, so that more and more of the application logic at the desktop, in Windows, will be driven by ML. There'll be less need for apps as we've known them historically, pre-packaged bundles of code. It'll be dynamic logic. It'll be ML. So, I think this is really marking the beginning of the end of the app era at the device level. I'm really excited, and we're looking forward to hearing more from Microsoft about where they're going with AI Platform for Windows, but I think that's a landmark announcement we'll stay tuned for. >> Excellent. Jim Kobielus, thank you very much. This has been another Wikibon Action Item Quick Take. (soft digital music)
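The train-in-the-cloud, infer-on-the-device pattern Jim describes can be sketched in a few lines. Windows ML itself is exposed through C#/C++ APIs, so this TypeScript sketch uses the ONNX Runtime Node binding to show the same idea; the model file name, its input and output tensor names, and the four-feature shape are all hypothetical assumptions for the example.

```typescript
// Hedged sketch: load a model that was trained and exported in the
// cloud, then run inference locally with no round trip to a service.
import * as ort from "onnxruntime-node";

async function classifyLocally(features: number[]): Promise<Float32Array> {
  // The .onnx file was produced ahead of time; its name is made up.
  const session = await ort.InferenceSession.create("churn-model.onnx");

  // Shape [1, 4]: one sample with four features (illustrative).
  const input = new ort.Tensor("float32", Float32Array.from(features), [1, 4]);

  // Runs entirely on the local device, which is the point of the
  // announcement: real-time inferencing without the cloud in the loop.
  // "input" and "output" are assumed tensor names; a real model
  // defines its own.
  const results = await session.run({ input });
  return results.output.data as Float32Array;
}

classifyLocally([0.2, 0.7, 0.1, 0.9]).then((scores) =>
  console.log("local inference scores:", scores)
);
```

The same exported model could back any desktop application, so the application logic becomes, as Jim puts it, dynamic ML-driven logic rather than a pre-packaged bundle of code.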

Published Date : Mar 19 2018


Wikibon Action Item | De-risking Digital Business | March 2018


 

>> Hi, I'm Peter Burris. Welcome to another Wikibon Action Item. (upbeat music) We're once again broadcasting from theCUBE's beautiful Palo Alto, California studio. I'm joined here in the studio by George Gilbert and David Floyer. And then remotely, we have Jim Kobielus, David Vellante, Neil Raden and Ralph Finos. Hi guys. >> Hey. >> Hi. >> How you all doing? >> This is a great, great group of people to talk about the topic we're going to talk about, guys. We're going to talk about the notion of de-risking digital business. Now, the reason why this becomes interesting is, the Wikibon perspective for quite some time has been that the difference between business and digital business is the role that data assets play in a digital business. Now, think about what that means: every business institutionalizes its work around what it regards as its most important assets. A bottling company, for example, organizes around the bottling plant. A financial services company organizes around the regulatory impacts or limitations on how it shares information, and what is regarded as fair use of data and other resources and assets. The same thing exists in a digital business. There's a difference between, say, Sears and Walmart. Walmart makes use of data differently than Sears, and the specific assets that are employed had a significant impact on how the retail business was structured. Along comes Amazon, which is even deeper in the use of data as a basis for how it conducts its business, and Amazon is institutionalizing work in quite different ways and has been incredibly successful. We could go on and on and on with a number of different examples of this, and we'll get into that. But what it means ultimately is that the tie between data and what is regarded as valuable in the business is becoming increasingly clear, even if it's not perfect. And so traditional approaches to de-risking data, through backup and restore, now need to be re-thought, so that it's not just de-risking the data, it's de-risking the data assets. And, since those data assets are so central to the business operations of many of these digital businesses, it means de-risking the whole business. So, David Vellante, give us a starting point. How should folks think about this different approach to envisioning business, digital business, and the notion of risk? >> Okay, thanks Peter. I mean, I agree with a lot of what you just said and I want to pick up on that. I see the future of digital business as really built around data, sort of agreeing with you and building on what you just said. Really, organizations are putting data at the core, and increasingly I believe that organizations that have traditionally relied on human expertise as the primary differentiator will be disrupted by companies where data is the fundamental value driver, and I think there are some examples of that and I'm sure we'll talk about it. And in this new world, humans have expertise that leverages the organization's data model and creates value from that data with augmented machine intelligence. I'm not crazy about the term artificial intelligence. And you hear a lot about data-driven companies, and I think such companies are going to have a technology foundation that is increasingly described as autonomous, aware, anticipatory, and, importantly in the context of today's discussion, self-healing. So, able to withstand failures and recover very quickly.
So de-risking a digital business is going to require new ways of thinking about data protection and security and privacy. Specifically as it relates to data protection, I think it's going to be a fundamental component of the so-called data-driven company's technology fabric. This can be designed into applications, into data stores, into file systems, into middleware, and into infrastructure, as code. And many technology companies are going to try to attack this problem from a lot of different angles, trying to infuse machine intelligence into the hardware, software and automated processes. And the premise is that many companies will architect their technology foundations not as a set of remote cloud services that they're calling, but rather as a ubiquitous set of functional capabilities that largely mimic a range of human activities, including storing, backing up, and virtually instantaneous recovery from failure. >> So let me build on that. So what you're kind of saying, if I can summarize, and we'll get into whether or not it's human expertise or some other approach or notion of business, but you're saying that increasingly, patterns in the data are going to have absolutely consequential impacts on how a business ultimately behaves. We got that right? >> Yeah, absolutely. And how you construct that data model, and provide access to the data model, is going to be a fundamental determinant of success. >> Neil Raden, does that mean that people are no longer important? >> Well no, no, I wouldn't say that at all. I was talking with the head of a medical school a couple of weeks ago, and he said something that really resonated. He said that there are as many doctors who graduated at the bottom of their class as the top of their class. And I think that's true of organizations too. You know what, 20 years ago I had the privilege of interviewing Peter Drucker for an hour, and he foresaw this, 20 years ago. He said that people who run companies have traditionally had IT departments that provided operational data, but they needed to start to figure out how to get value from that data, and not only get value from that data but get value from data outside the company, not just internal data. So he kind of saw this big data thing happening 20 years ago. Unfortunately, he had a prejudice for senior executives. You know, he never really thought about any other people in an organization except the highest people. And I think what we're talking about here is really the whole organization. I think that... I have some concerns about the ability of organizations to really implement this without a lot of fumbles. I mean, it's fine to talk about the five digital giants, but there's a lot of companies out there where, you know, the bar isn't really that high for them to stay in business. And they just seem to get along. And I think if we're going to de-risk, we really need to help companies understand the whole process of transformation, not just the technology. >> Well, take us through it. What is this process of transformation, which includes the role of technology but is bigger than the role of technology? >> Well, it's like anything else, right? There has to be communication, there has to be some element of control, there has to be a lot of flexibility, and most importantly, I think there has to be acceptance by the people who are going to be affected by it that this is the right thing to do.
And I would say you start with assumptions. I call it assumption analysis; in other words, let's all get together and figure out what our assumptions are, and see if we can't line 'em up. Typically IT is not good at this. So I think it's going to require the help of a lot of practitioners who can guide them. >> So Dave Vellante, reconcile one point that you made. I want to come back to this notion of how we're moving from businesses built on expertise and people to businesses built on expertise resident as patterns in the data, or data models. Why is it that the most valuable companies in the world seem to be the ones that have the most real hardcore data scientists? Isn't that expertise and people? >> Yeah, it is, and I think it's worth pointing out. Look, the stock market is volatile, but right now the top five companies, Apple, Amazon, Google, Facebook and Microsoft, in terms of market cap, account for about $3.5 trillion, and there's a big distance between them and the rest, and they've clearly surpassed the big banks and the oil companies. Now again, that could change, but I believe it's because they are data-driven. So-called data-driven. Does that mean they don't need humans? No, but human expertise surrounds the data, as opposed to most companies, where human expertise is at the center and the data lives in silos, and I think it's very hard to protect data, and leverage data, that lives in silos. >> Yes, so here's where I'll take exception to that, Dave. And I want to get everybody to build on top of this just very quickly. I think that human expertise has surrounded, in other businesses, the buildings, or the bottling plant, or the wealth management practice, or the platoon. So I think that the organization of assets has always been the determining factor in how a business behaves, and we institutionalized work, in other words where we put people, based on the business's understanding of assets. Do you disagree with that? Are we wrong in that regard? I think data scientists are an example of re-institutionalizing work around a very core asset, in this case, data. >> Yeah, you're saying that the most valuable asset is shifting from some of those physical assets, the bottling plant et cetera, to data. >> Yeah, we are, we are. Absolutely. Alright, David Floyer. >> Neil: I'd like to come in. >> Panelist: I agree with that too. >> Okay, go ahead Neil. >> I'd like to give an example from the news: Cigna's acquisition of Express Scripts for $67 billion. Who the hell is Cigna, right? Connecticut General is just a sleepy life insurance company, and INA was a second-tier property and casualty company. They merged a long time ago, they got into health insurance, and suddenly... Who's Express Scripts? I mean, that's a company that nobody ever even heard of. They're a pharmacy benefit manager; what is that? They're an information management company, period. That's all they do.
One of the primary reasons why the telecommunications companies, whom so many people believed, analysts believed, had this fundamental advantage, because so much information's flowing through them is when you're writing assets off for 30 years, that kind of locks you into an operational mode, doesn't it? >> Exactly. And the other thing I want to emphasize is that the most important thing is sources of data not the data itself. So for example, real-time data is very very important. So what is your source of your real-time data? If you've given that away to Google or your IOT vendor you have made a fundamental strategic mistake. So understanding the sources of data, making sure that you have access to that data, is going to enable you to be able to build the sort of processes and data digitalization. >> So let's turn that concept into kind of a Geoffrey Moore kind of strategy bromide. At the end of the day you look at your value proposition and then what activities are central to that value proposition and what data is thrown off by those activities and what data's required by those activities. >> Right, both internal-- >> We got that right? >> Yeah. Both internal and external data. What are those sources that you require? Yes, that's exactly right. And then you need to put together a plan which takes you from where you are, as the sources of data and then focuses on how you can use that data to either improve revenue or to reduce costs, or a combination of those two things, as a series of specific exercises. And in particular, using that data to automate in real-time as much as possible. That to me is the fundamental requirement to actually be able to do this and make money from it. If you look at every example, it's all real-time. It's real-time bidding at Google, it's real-time allocation of resources by Uber. That is where people need to focus on. So it's those steps, practical steps, that organizations need to take that I think we should be giving a lot of focus on. >> You mention Uber. David Vellante, we're just not talking about the, once again, talking about the Uberization of things, are we? Or is that what we mean here? So, what we'll do is we'll turn the conversation very quickly over to you George. And there are existing today a number of different domains where we're starting to see a new emphasis on how we start pricing some of this risk. Because when we think about de-risking as it relates to data give us an example of one. >> Well we were talking earlier, in financial services risk itself is priced just the way time is priced in terms of what premium you'll pay in terms of interest rates. But there's also something that's softer that's come into much more widely-held consciousness recently which is reputational risk. Which is different from operational risk. Reputational risk is about, are you a trusted steward for data? Some of that could be personal information and a use case that's very prominent now with the European GDPR regulation is, you know, if I ask you as a consumer or an individual to erase my data, can you say with extreme confidence that you have? That's just one example. >> Well I'll give you a specific number on that. We've mentioned it here on Action Item before. I had a conversation with a Chief Privacy Officer a few months ago who told me that they had priced out what the fines to Equifax would have been had the problem occurred after GDPR fines were enacted. It was $160 billion, was the estimate. 
There are not a lot of companies on the planet that could deal with a $160 billion liability, just like that. >> Okay, so we now have a price on something that might have been kind of mushy before. And the notion of trust hasn't really changed over time; what's changed is the technical implementations that support it. In the old world of systems of record, we basically collected as much data as we could from our operational applications, put it in the data warehouse and its data mart satellites, and tried to govern it within that perimeter. But now we know that data basically originates and goes just about anywhere. There's no well-defined perimeter; it's much more porous, far more distributed. You might think of it as a distributed data fabric, and the only way you can be a trusted steward of that is if, across the silos — without trying to centralize all the data that's in them — you can enforce who's allowed to access it and what they're allowed to do, and audit who's done what to what type of data, when and where. And then there's a variety of approaches. Just to pick two: one is discovery-oriented, using machine learning to figure out what's going on across the data estate — Alation is an example. And then there's another, which is where you try to get everyone to plug into what's essentially a new system catalog — one that doesn't live in any single DBMS but acts as the catalog for your whole distributed data fabric. >> That's an example of one of the properties of coming at this. Dave Vellante, coming back to you for a second: when we think about this conversation, there's been a lot of presumption, a lot of bromides. Analysts like to say, don't get Uberized. We're not just talking about getting Uberized; we're talking about something a little bit different, aren't we? >> Well yeah, absolutely. I think Uber's going to get Uberized, personally. But I think there's a lot of evidence — I mentioned the big five, but look at Spotify, Waze, Airbnb, yes Uber, yes Twitter, Netflix, Bitcoin as an example, 23andMe. These are all examples of companies that, going back to what I said before, are putting data at the core and building human expertise around that core to leverage it. And I think it's easy for some companies to sit back and say, "Well, I'm going to wait and see what happens." But to me anyway, there's a big gap between the haves and the have-nots, and that gap is around applying machine intelligence to data and applying cloud economics: zero-marginal-cost economics, the API economy, an always-on sort of mentality, et cetera. And that's what the economy, in my view anyway, is going to look like in the future. >> So let me put out a challenge. Jim, I'm going to come to you in a second, very quickly, on some of the things that are starting to look like data assets. Today, when we talk about data protection, we're talking simply about a whole bunch of applications and a whole bunch of devices spinning data off so we have a copy at a third site — sometimes in near real time — and then, if there's a catastrophe, large or small, being able to restore it, often in hours or days. So we're talking about an improvement on RPO and RTO. But when we talk about data assets — and I'm going to come to you in a second with that, David Floyer — we're talking about not only the data, the bits.
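To make that cross-silo stewardship idea concrete before moving on: here is a minimal sketch of the pattern George describes — a single policy-and-audit layer that every silo routes through. All of the names in it (the POLICIES table, check_access, AuditEvent) are hypothetical illustrations, not the API of Alation, Apache Atlas, or any other product.

    # Hypothetical sketch of cross-silo access enforcement plus auditing.
    # The pattern, not a product: one layer records who did what to what
    # type of data, when — whether or not the attempt was allowed.
    import datetime
    from dataclasses import dataclass
    from typing import Dict, List, Tuple

    @dataclass
    class AuditEvent:
        user: str
        action: str      # e.g. "read", "erase"
        dataset: str     # e.g. "crm.customers"
        timestamp: str
        allowed: bool

    # Policy table: which roles may perform which actions on which datasets.
    POLICIES: Dict[Tuple[str, str, str], bool] = {
        ("analyst", "read", "crm.customers"): True,
        ("dpo", "erase", "crm.customers"): True,
    }

    AUDIT_LOG: List[AuditEvent] = []

    def check_access(user: str, role: str, action: str, dataset: str) -> bool:
        """Enforce policy and record the attempt, allowed or not."""
        allowed = POLICIES.get((role, action, dataset), False)
        AUDIT_LOG.append(AuditEvent(user, action, dataset,
                                    datetime.datetime.utcnow().isoformat(),
                                    allowed))
        return allowed

    check_access("jsmith", "analyst", "read", "crm.customers")   # allowed
    check_access("jsmith", "analyst", "erase", "crm.customers")  # denied, still logged
    for event in AUDIT_LOG:
        print(event)

The design point is simply that enforcement and auditing live in one layer spanning the stores, which is what makes a later "who touched what, when and where" question answerable.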
We're also talking about the relationships, the organization, and the metadata as key elements of that asset. So David — I'm sorry, Jim Kobielus — just really quickly, thirty seconds: models. What do they look like? What is the nature of some of these new assets? >> Well, the new nature of these assets is the machine learning models that are driving so many business processes right now. So really, the core assets there are, obviously, the data from which the models are developed and on which they are trained, but also very much the knowledge of the data scientists and engineers who build and tune this stuff. And so really, what you need to do is protect that knowledge and grow that knowledge base of data science professionals in your organization in a way that builds on it. Hopefully you keep the smartest people in house, and they can encode more of their knowledge in automated programs that manage the entire pipeline of development. >> We're not talking about files; we're not even talking about databases, are we, David Floyer? We're talking about something different: algorithms and models. Are today's technologies really set up to do a good job of protecting the full organization of those data assets? >> I would say that they're not even being thought about yet. And going back to what Jim was saying, those data scientists are the only people who understand it, in the same way that, in the year 2000, the COBOL programmers were the only people who understood what was going on inside those applications. We as an industry have to allow organizations to protect the assets inside their applications, and to use AI, if you like, to actually understand what is in those applications and how they are working. An incredibly important piece of de-risking is ensuring that you're not dependent on a few experts who could leave at any moment, in the same way as those COBOL programmers could have left. >> But it's not just the data, and it's not just the metadata; it really is the data structure. >> It is the model — the whole way this has been put together, and the reason why, and the ability to continue to upgrade and change that over time. So those assets are incredibly important, but at the moment there isn't technology available for you to actually protect them. >> So if I combine what you just said with what Neil Raden was talking about: David Vellante's put forward a good vision of what's required, and Neil Raden's made the observation that this is going to be much more than technology. There's a lot of change here — not change management at a low level inside IT, but business change — and the technology companies also have to step up and be able to support this. And we're seeing a number of different vendor types start to enter this space: certainly storage players, Dylon Sears, talking about doing a better job of data protection; middleware companies, TIBCO and DISCO, talking about doing this differently; file systems, Scality and WekaIO, talking about doing this differently; backup and restore companies, Veeam and Veritas. Everybody's looking at this, and they're all coming at it. Just really quickly, David, where's the inside track at this point? >> For me, there is so much whitespace here as to be unbelievable. >> So nobody has an inside track yet. >> Nobody has an inside track. Just to start with a few things: it's clear that you should keep data where it is.
The cost of moving data around an organization, from inside to outside, is crazy. >> So companies that keep data in place, or technologies that keep data in place, are going to have an advantage. >> A much, much greater advantage. Sure, there must be backups somewhere, but you need to keep the working copies of data where they are, because it's the real-time access, usually, that's important. So if it originates in the cloud, keep it in the cloud. If it originates with a data provider on another cloud, that's where you should keep it. If it originates on your premises, keep it where it originated. >> Unless you need to combine it. But that's a new origination point. >> Then you're taking subsets of that data and combining them at that new point. So that would be my first point. Second, organizations are going to need to put together what George was talking about — the metadata of all the data, how it interconnects, how it's being used, the flow of data through the organization. It's amazing to me that when you go to an IT shop, they cannot define for you how the data flows through that data center or that organization. That's the requirement you have to have, and AI is going to be part of the solution — looking at all of the applications and the data, and telling you where it's going and how it's working together. >> So the second thing would be that companies able to build, or conceive of, networks as data will also have an advantage. And I think I'd add a third one: companies that demonstrate a real understanding of the unbelievable change that's required. You can't just say, oh, Facebook wants this, therefore everybody's going to want it; there's going to be a lot of push marketing that goes on from the technology side. Alright, so let's get to some action items. David Vellante, I'll start with you. Action item. >> Well, the future's going to be one where systems see, they talk, they sense, they recognize, they control, they optimize. It may be tempting to say, you know what, I'm going to sit back and wait to figure out how I'm going to close that machine intelligence gap. I think that's a mistake. I think you have to start now, and you have to start with your data model. >> George Gilbert, action item. >> I think you have to keep in mind the guardrails related to governance and trust when you're building applications on the new data fabric. You can take a platform-oriented approach, where you're plugging into an API like Apache Atlas, which Hortonworks is driving, or a discovery-oriented one, as David was talking about, which would be something like Alation, using machine learning. But if, let's say, the use case starts out as IoT edge analytics with cloud inferencing, that data science pipeline itself now has to be part of this fabric, including the output of design time — meaning the models themselves, so they can be managed. >> Excellent. Jim Kobielus, you've been pretty quiet, but I know you've got a lot to offer. Action item, Jim. >> I'll be very brief. What you need to do is protect your data science knowledge base. That's the way to de-risk this entire process, and it involves more than just a data catalog: you need a data science expertise registry within your distributed value chain, and you need to manage that as a very human asset that needs to grow. That is your number one asset going forward. >> Ralph Finos, you've also been pretty quiet. Action item, Ralph.
>> Yeah, I think you've got to be careful about what you're trying to get done. It depends on your industry — whether it's finance or the entertainment business, there are different requirements about data in those different environments. You need to be cautious about that, and you need leadership on the executive, business side of things. The last thing in the world you want to do is depend on data scientists to figure this stuff out. >> And I'll give you the second-to-last answer, or action item. Neil Raden, action item. >> I think there's been a lot of progress lately in creating tools for data scientists to be more efficient, and they need to be, because the big digital giants are draining them from other companies. So that's very encouraging. But in general, becoming a data-driven, digitally transformed company is, for most companies, a big job, and I think they need to do it in piece parts, because if they try to do it all at once they're going to be in trouble. >> Alright, that's been a great conversation, guys. Oh — David Floyer, action item. David's looking at me saying, ah, what about me? David Floyer, action item. >> (laughing) So my action item comes from an Irish proverb: if you ask for directions, they will always answer you, "I wouldn't start from here." So my action item is this: if somebody comes in saying you have to redo all of your applications, rewrite them from scratch, and start in a completely different direction, that is going to be a 20-year job and you're never going to get it done. You have to start from what you have — the digital assets that you have — and you have to focus on improving those with additional applications and additional data, using that as the foundation for how you build the business, with a clear long-term view. If you look at some of the examples that were given earlier, particularly in the insurance industry, that's what they did. >> Thank you very much, guys. So let's do an overall action item. We've been talking today about the challenges of de-risking digital business, which ties directly to an overall understanding of the role data assets play in businesses, and to the technology's ability to move from just protecting and restoring data to actually restoring the relationships in the data, the structures of the data and, very importantly, the models that are resident in the data. This is going to be a significant journey, and there's clear evidence that it is driving a new valuation within the business. Folks talk about data as the new oil. We don't necessarily see things that way, because data, quite frankly, is a very, very different kind of asset: it can be shared, because it doesn't suffer the same limits of scarcity. So as a consequence, what has to happen is, you have to start with where you are. What is your current value proposition, and what data do you have in support of that value proposition? Then whiteboard it, clean-slate it, and ask: what data would we like to have in support of the activities we perform? Figure out what those gaps are, and find ways to get access to that data through piecemeal, piece-part investments that provide a roadmap of priorities looking forward. Out of that will come a better understanding of the fundamental data assets being created: new models of how you engage customers, new models of how operations works on the shop floor, new models of how financial services are being employed and utilized.
And use that as a basis for then putting forward plans to bring in technologies that are capable of not just supporting and protecting the data, but protecting the overall organization of the data — in the form of these models, in the form of these relationships — so that as the business creates and throws off these new assets, it can treat them as the special resource the business requires. Once that is in place, we'll start seeing businesses more successfully reorganize and reinstitutionalize work around data, and it won't just be the big technology companies — the ones people call digital natives — that are well down this path. I want to thank George Gilbert and David Floyer, here in the studio with me, and David Vellante, Ralph Finos, Neil Raden and Jim Kobielus on the phone. Thanks very much, guys. Great conversation. And that's been another Wikibon Action Item. (upbeat music)

Published Date : Mar 16 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
David Vellante | PERSON | 0.99+
David | PERSON | 0.99+
Apple | ORGANIZATION | 0.99+
Facebook | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Neil | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Walmart | ORGANIZATION | 0.99+
Dave Vellante | PERSON | 0.99+
David Floyer | PERSON | 0.99+
George Gilbert | PERSON | 0.99+
Jim Kobelius | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
Jim | PERSON | 0.99+
Geoffrey Moore | PERSON | 0.99+
George | PERSON | 0.99+
Ralph Finos | PERSON | 0.99+
Neil Raden | PERSON | 0.99+
INA | ORGANIZATION | 0.99+
Equifax | ORGANIZATION | 0.99+
Sears | ORGANIZATION | 0.99+
Peter | PERSON | 0.99+
March 2018 | DATE | 0.99+
Uber | ORGANIZATION | 0.99+
TIBCO | ORGANIZATION | 0.99+
DISCO | ORGANIZATION | 0.99+
David Vallante | PERSON | 0.99+
$160 billion | QUANTITY | 0.99+
20-year | QUANTITY | 0.99+
30 years | QUANTITY | 0.99+
Ralph | PERSON | 0.99+
Dave | PERSON | 0.99+
Netflix | ORGANIZATION | 0.99+
Peter Drucker | PERSON | 0.99+
Express Scripts | ORGANIZATION | 0.99+
Veritas | ORGANIZATION | 0.99+
David Foyer | PERSON | 0.99+
Veeam | ORGANIZATION | 0.99+
$67 billion | QUANTITY | 0.99+
Palo Alto, California | LOCATION | 0.99+
first point | QUANTITY | 0.99+
thirty seconds | QUANTITY | 0.99+
second | QUANTITY | 0.99+
Spotify | ORGANIZATION | 0.99+
Twitter | ORGANIZATION | 0.99+
Connecticut General | ORGANIZATION | 0.99+
two things | QUANTITY | 0.99+
both | QUANTITY | 0.99+
about $3.5 trillion | QUANTITY | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Cigna | ORGANIZATION | 0.99+
Both | QUANTITY | 0.99+
2000 | DATE | 0.99+
today | DATE | 0.99+
one | QUANTITY | 0.99+
Dylon Sears | ORGANIZATION | 0.98+

Action Item | Big Data SV Preview Show - Feb 2018


 

>> Hi, I'm Peter Burris, and once again, welcome to a Wikibon Action Item. (lively electronic music) We are again broadcasting from the beautiful theCUBE Studios here in Palo Alto, California, and we're joined today by a relatively larger group. So let me take everybody through who's here in the studio with us: David Floyer, George Gilbert, and, once again, we've been joined by John Furrier, who's one of the key CUBE hosts; and on the remote system are Jim Kobielus, Neil Raden, and another CUBE host, Dave Vellante. Hey guys. >> Hi there. >> Good to be here. >> Hey. >> So, one of the reasons why we have a little bit larger group here is because we're going to be talking about a community gathering that's taking place in the big data universe in a couple of weeks. Large numbers of big data professionals are going to be descending upon Strata for the purposes of better understanding what's going on within the big data universe. Now, we run a CUBE show next to that event, in which we get the best thought leaders available at Strata and bring them onto theCUBE, really to help separate the signal from the noise that Strata has historically represented. We want to use this show to preview what we think that signal's going to be, so that we can help the community better understand what to look for, where to go, and what kinds of things to be talking about with each other, so that it can get more out of that important event. Now George, with that in mind, what is kind of the top-level thing? If there was one thing we'd identify as something that was different a year or two ago, and that's going to be different at this show, what would we say it would be? >> Well, I think the big realization here is that we're starting with the end in mind. We know the modern operational analytic applications that we want to build — ones that anticipate or influence a user interaction, or inform or automate a business transaction. For several years we were experimenting with big data infrastructure, but that wasn't solution-centric, it was technology-centric, and we've realized that the do-it-yourself, assemble-your-own-kit, open source big data infrastructure created too big a burden on admins. Now we're at the point where we're beginning to see a more converged set of offerings take place. And by converged, I mean an end-to-end analytic pipeline that is uniform for developers, uniform for admins, and, because it's pre-integrated, lower latency: it helps you put more data through one single analytic latency budget. That's what we think people should look for. Right now, though, the hottest new tech-centric activity is around machine learning, and I think the big thing we have to recognize is that we're at about the same maturity level there as we were with big data several years ago. People should, if they're going to work with it, start with the knowledge, for the most part, that they're going to be experimenting, because the tooling isn't quite mature enough and we don't have enough data scientists to build all these pipelines bespoke. And as for third-party applications, we don't yet have a high volume of them with this embedded.
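As one way to picture what "uniform for developers" and a single latency budget can mean in practice, here is a small sketch of an ingest-transform-serve pipeline expressed in one API. Spark Structured Streaming with a Kafka source is assumed purely as an illustration; the broker address, topic, schema, and paths are all made up.

    # Illustrative end-to-end pipeline in one API surface: ingest from a
    # stream, transform/score, and write continuously to a serving sink.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("converged-pipeline").getOrCreate()

    schema = (StructType()
              .add("user_id", StringType())
              .add("amount", DoubleType()))

    # Ingest: a continuous stream rather than a periodic batch load.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical
              .option("subscribe", "events")                     # hypothetical topic
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Transform plus a simple analytic step in the same job, same API.
    flagged = events.withColumn("high_value", col("amount") > 1000.0)

    # Serve: write continuously to a sink that downstream apps read.
    (flagged.writeStream
            .format("parquet")
            .option("path", "/data/flagged_events")              # hypothetical path
            .option("checkpointLocation", "/data/checkpoints/flagged")
            .start())

The point of convergence is that one team, with one skillset and one latency budget, owns everything between the event arriving and the result being served.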
>> So if I can summarize what you're saying: we're seeing a bifurcation occur within the ecosystem associated with big data. It's driving toward simplification on the infrastructure side — which increasingly is what the term big data is associated with — and toward new technologies that can apply that infrastructure and that data to new applications, including things like AI, ML, and DL, where we think about modeling and services and a new way of building value. Now, that suggests that one or the other is more or less hot, but Neil Raden, I think the practical reality is that here in Silicon Valley we've got to be careful about getting too far out in front of our skis. At the end of the day, there's still a lot of work to be done inside how you simply do things like move data from one place to another in a lot of big enterprises. Would you agree with that? >> Oh, absolutely. I've been talking to a lot of clients this week and, you know, we don't talk about the fact that they're still running their business on what we would call legacy systems, and they don't know how to get out of them or transform away from them. So they're still starting to plan for this. But the problem is, it's like talking about the 27 rocket engines on the Falcon Heavy that launched a Tesla into space: you can talk about the engineering of those engines, and that's great, but what about all the other things you're going to have to do to get that (laughs) car into space? And it's the same thing. A year ago we were talking about Hadoop and big data and, to a certain extent, machine learning, maybe more data science. But now people are really starting to ask: how do we actually do this, how do we secure it, how do we govern it, how do we get some sort of metadata or semantics on the data we're working with so people know what they're using? I think that's where we are in a lot of companies. >> Great, that's great feedback, Neil. So as we look forward, Jim Kobielus: given the challenge of improving the facilities of your infrastructure while also using it as a basis for increasing your capability with some of these new application services, what should folks be looking for as they explore the show over the next couple of weeks on the ML side? What new technologies, what new approaches? Going back to what George said, we're in experimentation mode — what are the experiments that are going to generate the greatest results over the course of the next year? >> Yeah, for the data scientists who flock to Strata and similar conferences, automation of the machine learning pipeline is super hot in terms of investments by the solution providers. Everybody from Google to IBM to AWS and others is investing very heavily in automating not just the data engineering — that problem was attacked a long time ago — but more of the feature engineering and the training. These very manual, often labor-intensive jobs have to be sped up and automated to a great degree to enable the magic of productivity by data scientists and the new generation of app developers. So look for automation of machine learning to be a super hot focus. Related to that, look for a new generation of development suites that focus on DevOps, speeding ML, DL, and AI from modeling through training, evaluation, deployment, and iteration.
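At its simplest, the pipeline automation Jim describes looks something like the sketch below, with scikit-learn's Pipeline and a grid search standing in, very modestly, for the far more ambitious AutoML tooling; the synthetic data, the scaler, and the model choice are all illustrative assumptions.

    # Minimal sketch of ML pipeline automation: feature preparation and
    # model training composed into one object, with hyperparameter search
    # sweeping settings a data scientist would otherwise tune by hand.
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

    pipeline = Pipeline([
        ("scale", StandardScaler()),                  # feature engineering step
        ("model", LogisticRegression(max_iter=1000)),
    ])

    search = GridSearchCV(pipeline,
                          param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
                          cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)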
We've seen a fair upswing in the number of such toolkits on the market from a variety of startup vendors — the DataRobots of the world — but also from, say, AWS with SageMaker, for example; that's hot. Also look for development toolkits that automate more of the code generation — low-code tools. The new generation of low-code tools, as highlighted in a recent Wikibon study, use ML to drive more of the actual production of fairly decent, good-enough code as a first rough prototype for a broad range of applications. And finally, we're seeing a fair amount of ML-driven code generation inside things like robotic process automation, RPA, which I believe will be a super hot theme at Strata and other shows this year and going forward. >> So you mentioned the idea of better tooling for DevOps, and the relationship between big data, ML, and DevOps. One of the key things we've been seeing over the course of the last few years, consistent with the trends we're talking about, is increasing specialization in a lot of the perspectives associated with changes within this marketplace. We've seen other shows emerge that have been very, very important — shows that we, for example, participate in. Splunk's show, for example, is at the vanguard, in many respects, of a lot of these trends in big data and how big data can be applied to business problems. Dave Vellante, I know you've participated in a number of these shows; how does this notion of specialization inform what's going to happen in San Jose, and what kind of advice and counsel should we give people to continue to explore beyond just what's going to happen in San Jose in a couple of weeks? >> Well, you mentioned Splunk as an example — a sort of narrow, specialized company that solves a particular problem and has a very enthusiastic ecosystem and customer base around that problem: analyzing log files to solve security problems, for example. I would say Tableau is another example, heavily focused on viz. So what you're seeing is these specialized skillsets that go deep within a particular domain. The thing to think about, especially when we're in San Jose next week, is, as we talk about digital disruption, what are the skillsets required beyond just the domain expertise? You're seeing these bifurcated skillsets really coming into vogue, where somebody understands, for example, traditional marketing, but also needs to understand digital marketing in great depth and the skills that go around it — sort of a two-tool player. We talk about the five-tool player in baseball; this is at least a multidimensional skillset in digital.
>> Well, if we look historically at the big data area, the solution has been to put in very low cost equipment as nodes, lots of different nodes, and move the data to those nodes so that you get a parallelization of the, of the data handling. That is not the only way of doing it. There are good ways now where you can, in fact, have a single version of that data in one place in very high speed storage, on flash storage, for example, and where you can allow very fast communication from all of the nodes directly to that data. And that makes things a lot simpler from an operational point of view. So using current Batch Automation techniques that are in existence, and looking at those from a new perspective, which is I do IUs apply these to big data, how do I automate these things, can make a huge difference in just the practicality in the elapsed time for some of these large training things, for example. >> Yeah, I was going to say that to many respects, what you're talking about is bringing things like training under a more traditional >> David: Operational, yeah. >> approach and operational set of disciplines. >> David: Yes, that's right. >> Very, very important. So John Furrier, I want to come back to you, or I want to come to you, and say that there are some other technologies that, while they're the bright shiny objects and people think that they're going to be the new kind of Harry Potter technologies of magic everywhere, Blockchain is certainly going to become folded into this big data concept, because Blockchain describes how contracts, ownership, authority ultimately get distributed. What should folks look for as the, as Blockchain starts to become part of these conversations? >> That's a good point, Peter. My summary of the preview for BigData SV Silicon Valley, which includes the Strata show, is two things: Blockchain points to the future and GDPR points to the present. GDPR is probably the most, one of the most fundamental impacts to the big data market in a long time. People have been working on it for a year. It is a nightmare. The technical underpinnings of what companies have to do to comply with GDPR is a moving train, and it's complete BS. There's no real solutions out there, so if I was going to tell everyone to think about that and what to look for: What is happening with GDPR, what's the impact of the databases, what's the impact of the architectures? Everyone is faking it 'til they make it. No one really has anything, in my opinion from what I can see, so it's a technical nightmare. Where was that database? So it's going to impact how you store the data, and the sovereignty issue is another issue. So the Blockchain then points to the sovereignty issue of the data, both in terms of the company, the country, and the user. These things are going to impact software development, application development, and, ultimately, cloud choice and the IoT. So to me, GDPR is not just a one and done thing and Blockchain is kind of a future thing to look at. So I would look out of those two lenses and say, Do you have a direction or a narrative that supports me today with what GDPR will impact throughout the organization. And then, what's going on with this new decentralized infrastructure and the role of data, and the sovereignty of that data, with respect to company, country, and user. So to me, that's the big issue. 
>> So George Gilbert, if we think about this question of fundamental technologies that are going to become increasingly important here: database managers are not dead as a technology. We've seen a relative explosion over the last few years in invention, at least, even if it hasn't been followed, as Neil pointed out, with very practical ways of bringing new types of discipline into a lot of enterprises. What's going to happen in the database world, and what should people be looking for in a couple of weeks to better understand how some of these data management technologies are going to converge and/or evolve? >> It's a topic that will be of intense interest and relevance to IT professionals, because the database has become the common foundation of all modern apps. We can see a leading indicator of what's going to happen with the legacy vendors, where we have in-memory technologies for both transaction processing and analytics, and more advanced analytics embedded in the database engine, including machine learning — model training as well as model serving. But what happened in the big data community is that we disassembled the DBMS: into the data manipulation language, which is an analytic language — could be Spark, could be Flink, even Hive; the catalog, which I think Jim has talked about or will be talking about, which is not just a dictionary of what's in one DBMS but a whole way of tracking and governing data across many stores; and then the storage manager — could be the file system, an object store, or something like Kudu, which is an MPP way of performing, in parallel, a bunch of operations on stored data. The reason I bring all this up is, following on David's comment about the evolution of hardware, databases are fundamentally meant to expose capabilities in the hardware and to mediate access to data using those capabilities. And now we have what's emerging as this UniGrid, with memory-intensive architectures and super low latency to get from any node on the cluster to any other node — with only a five-microsecond lag, relative to previous architectures. We can now build databases that scale out with the same knowledge base we used to build databases that scale up. In other words, it democratizes the ability to build databases of enormous scale, and that means we can have the analytics and the transactions working together at very low latency.
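A small sketch of that disassembly, using PySpark as the assumed engine: the storage manager (Parquet files at some path), the catalog (a table registration over that path), and the data manipulation language (SQL) are separate, swappable layers rather than one sealed DBMS. The path and table names are made up.

    # Illustrative sketch of the disassembled DBMS:
    #   storage manager -> Parquet files on a file system or object store
    #   catalog         -> a table name registered over that location
    #   DML             -> SQL executed by the engine against the catalog
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("disassembled-dbms").getOrCreate()

    # Storage layer: data already sitting at some hypothetical path.
    orders = spark.read.parquet("/datalake/orders")

    # Catalog layer: register the data so any engine sharing the catalog sees it.
    orders.createOrReplaceTempView("orders")

    # DML layer: the analytic language runs against the catalog entry,
    # not against the storage directly.
    spark.sql("""
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
        ORDER BY total DESC
        LIMIT 10
    """).show()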
The converged platform now is shifting, it's center of gravity is shifting to continuous processing, where the data lake is a reference data repository that helps inform the creation of models, but then you run the models against the streaming continuous data for the freshest insights-- >> Okay, Jim Kobielus, action item. >> Yeah, focus on developer productivity in this new era of big data analytics. Specifically focus on the next generation of developers, who are data scientists, and specifically focus on automating most of what they do, so they can focus on solving problems and sifting through data. Put all the grunt work or training, and all that stuff, take and carry it by the infrastructure, the tooling. >> Peter: Neil Raden, action item. >> Well, one thing I learned this week is that everything we're talking about is about the analytical problem, which is how do you make better decisions and take action? But companies still run on transactions, and it seems like we're running on two different tracks and no one's talking about the transactions anymore. We're like the tail wagging the dog. >> Okay, John Furrier, action item. >> Action item is dig into GDPR. It is a really big issue. If you're not proactive, it could be a nightmare. It's going to have implications that are going to be far-reaching in the technical infrastructure, and it's the Sarbanes-Oxley, what they did for public companies, this is going to be a nightmare. And evaluate the impact of Blockchains. Two things. >> David Vellante, action item. >> So we often say that digital is data, and just because your industry hasn't been upended by digital transformations, don't think it's not coming. So it's maybe comfortable to sit back and say, Well, we're going to wait and see. Don't sit back and wait and see. All industries are susceptible to digital transformation. >> Alright, so I'll give the action item for the team. We've talked a lot about what to look for in the community gathering that's taking place next week in Silicon Valley around strata. Our observations as the community, it descends upon us, and what to look for is, number one, we're seeing a bifurcation in the marketplace, in the thought leadership, and in the tooling. One set of group, one group is going more after the infrastructure, where it's focused more on simplification, convergence; another group is going more after the developer, AI, ML, where it's focused more on how to create models, training those models, and building applications with the services associated with those models. Look for that. Don't, you know, be careful about vendors who say that they do it all. Be careful about vendors that say that they don't have to participate in a converged approach to doing this. The second thing I think we need to look for, very importantly, is that the role of data is evolving, and data is becoming an asset. And the tooling for driving velocity of data through systems and applications is going to become increasingly important, and the discipline that is necessary to ensure that the business can successfully do that with a high degree of predictability, bringing new production systems are also very important. A third area that we take a look at is that, ultimately, the impact of this notion of data as an asset is going to really come home to roost in 2018 through things like GDPR. As you scan the show, ask a simple question: Who here is going to help me get up to compliance and sustain compliance, as the understanding of privacy, ownership, etc. 
of data, in a big data context, starts to evolve? Because there's going to be a lot of specialization over the next few years. And there's a final one we might add: when you go to the show, do not just focus on your favorite brands. There's a lot of new technology out there, including things like Blockchain, and it's going to have an enormous impact, ultimately, on how this marketplace unfolds. The kind of miasma that's hung over big data is starting to break down and specialize, and that's creating new niches and new opportunities for new sources of technology, while at the same time reducing the focus we currently have on things like Hadoop as a centerpiece. A lot of convergence is going to create a lot of new niches, and that's going to require new partnerships, new practices, new business models. Once again, guys, I want to thank you very much for joining me on Action Item today. This is Peter Burris from our beautiful Palo Alto theCUBE Studio. This has been Action Item. (lively electronic music)

Published Date : Feb 24 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
David | PERSON | 0.99+
Jim Kobielus | PERSON | 0.99+
George | PERSON | 0.99+
David Floyer | PERSON | 0.99+
George Gilbert | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Neil Raden | PERSON | 0.99+
Neil | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
David Vellante | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
John Furrier | PERSON | 0.99+
Peter | PERSON | 0.99+
Feb 2018 | DATE | 0.99+
Silicon Valley | LOCATION | 0.99+
Jim | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
2018 | DATE | 0.99+
Google | ORGANIZATION | 0.99+
GDPR | TITLE | 0.99+
next week | DATE | 0.99+
two things | QUANTITY | 0.99+
Palo Alto, California | LOCATION | 0.99+
Splunk | ORGANIZATION | 0.99+
both | QUANTITY | 0.99+
A year ago | DATE | 0.99+
two lenses | QUANTITY | 0.99+
a year ago | DATE | 0.99+
two years ago | DATE | 0.99+
this week | DATE | 0.99+
Palo Alto | LOCATION | 0.99+
first | QUANTITY | 0.99+
third area | QUANTITY | 0.98+
CUBE | ORGANIZATION | 0.98+
one group | QUANTITY | 0.98+
second thing | QUANTITY | 0.98+
27 rocket | QUANTITY | 0.98+
today | DATE | 0.98+
next year | DATE | 0.98+
Two things | QUANTITY | 0.97+
theCUBE Studios | ORGANIZATION | 0.97+
two-tool player | QUANTITY | 0.97+
five microsecond | QUANTITY | 0.96+
One set | QUANTITY | 0.96+
Tableau | ORGANIZATION | 0.94+
a year | QUANTITY | 0.94+
single version | QUANTITY | 0.94+
one | QUANTITY | 0.94+
Wikibons | ORGANIZATION | 0.91+
Wikibon | ORGANIZATION | 0.91+
two different tracks | QUANTITY | 0.91+
five-tool player | QUANTITY | 0.9+
several years ago | DATE | 0.9+
this year | DATE | 0.9+
Strata | TITLE | 0.87+
Harry Potter | PERSON | 0.85+
one thing | QUANTITY | 0.84+
years | DATE | 0.83+
one place | QUANTITY | 0.82+