Action Item | The Role of Open Source

>> Hi, I'm Peter Burris. Welcome to Wikibon's Action Item. (slow techno music) Once again Wikibon's research team is assembled, centered here in The Cube Studios in lovely Palo Alto, California. I've got David Floyer and George Gilbert with me here in the studio, and on the line we have Neil Raden and Jim Kobielus. Thank you once again for joining us, guys. This week we are going to talk about an issue that has been a dominant consideration in the industry, though it's unclear exactly what direction it's going to take: the role that open source is going to play in the next generation of solving problems with technology, or, we could say, the role that open source will play in future digital transformations. No one can argue about whether open source has been hugely consequential. It has been one of the major drivers of not only new approaches to creating value, but also new types of solutions that are leading to many of the most successful technology implementations we've ever seen. That is unlikely to change. But the question is what form open source will take as we move into an era where there are new classes of individuals creating value, like data scientists; where there are new problems that we're trying to solve, problems that are mainly driven by the role that data, as opposed to code, plays; and where there are new classes of providers, namely service providers as opposed to product or software providers. These issues are going to come together and drive some pretty important changes in how open source behaves over the next few years, what types of challenges it's going to successfully take on, and ultimately how users are going to be able to get value out of it. So to start the conversation off, George, let's start by making a quick observation: what has the history of open source been? Take us through it kind of quickly.
>> The definition has changed. In its first incarnation it was about fixing UNIX fragmentation and the high price of UNIX system servers, meaning the proprietary UNIXes and the proprietary servers they were built on. That rather quickly morphed into a second incarnation: let's take the LAMP stack (Linux, Apache, MySQL, PHP, Python) and substitute that for the old incumbents, which were UNIX, BEA WebLogic (the J2EE server), and Oracle Database on an EMC storage device. So that was the collapse of the price of infrastructure. Then really quickly it morphed into something very, very different: we had the growth of the giant Internet-scale vendors, and neither on pricing nor on capacity could traditional software serve their needs. Google didn't quite do open source, but they published papers about what they did, and those papers were then implemented.
>> Like MapReduce.
>> Yeah, MapReduce, Bigtable, Google File System; those became the basis of Hadoop, which Yahoo open sourced. There is another incarnation going that's probably getting near its end of life right now, which is sort of a hybrid, where you might take Kafka, which is open source, and put sort of proprietary bits around it for management and things like that. Same with Cloudera. This is called the open core model. It's not clear if you can build a big company around it, but the principle for most of these is that the value of the software is declining, partly because it's open source and partly because it's so easy to build new software systems now. The hard part is helping the customer run the stuff, and that's where some of these vendors are capturing value.
>> So David, let's turn our attention to how that's going to turn into actual money.
So in this first generation of open source, I think up until now, certainly Red Hat and Canonical have made money by packaging and putting forward distributions, and have made a lot of money. IBM has been one of the leaders in contributing to open source and then turning that into a services business. Cloudera, Hortonworks, MapR, and some of these other companies have not generated the same type of market presence that a Red Hat or a Canonical has put forward, but that doesn't mean there aren't companies out there that have been very successful at appropriating significant returns out of open source software. Mainly, however, they're doing it, as George said, as a service. Give us some examples.
>> I think the key part of open source is providing a win-win environment, so that people are paid to do stuff. What is happening a lot now is that people are putting stuff into open source in order that it becomes a standard, and also in order that it is maintained by the community as a whole. So those two functions, those two capabilities, of being paid by a company, often by IBM or whoever it is, to do something on behalf of that company so that it becomes a standard, so that it becomes accepted: that is a good business model, in the sense that it's win-win. The developer gets recognition, and the person paying for it achieves their business objective of, for example, getting a standard recognized--
>> A volume.
>> Volume, yes.
>> So it's a way to get to volume for the technology that you want to build your business around.
>> Yes. What I think is far more difficult in this area is application-type software. Where open source has been successful, as George said, is in the stacks themselves, the lower end of the stacks. There are a few applications, and they usually come from very, very successful applications like Microsoft Word or things like that, where they can be copied and put into open source. But even there they have around them software from a company, Red Hat or whoever it is, that will make it successful.
>> Yes, but OpenOffice wasn't that successful. Get to the kind of, today we have Amazon, we have some of the hyperscalers that are using that open core model and putting forward some pretty powerful services. Is that the new Red Hat? Is that the new Canonical?
>> The one who's made the most money is clearly Amazon. They took open source code and made it robust, and made it in volume. Those are the two key things you have to have for success: it's got to be robust, and it's got to be in volume. It's very difficult for the open source community to achieve that on its own. It needs the support of a large company to do that, and it needs the value that that large company is going to get from it, for them to put those resources in. So that has been a very successful model. A lot of people decry it because they're not giving back, and there's an argument--
>> They, being Amazon, have not given back quite as much.
>> Yes, they have relatively few committers. I think that's more of a problem in the T&Cs of the open source contract, so those should probably be changed to put more onus on people to give back into the pool.
>> So let me stop you. We have identified one thing that is likely going to have to evolve as we move forward to prevent problems: some of the terms and conditions, to try to ensure that there is that quid pro quo, that that win-win exists.
So Jim Kobielus, let me ask you a question. As David mentioned, open source has been more successful where there is a clear model, a clear target of what the community is trying to build. It hasn't been quite as successful where it is in fact expected that the open source community is going to start with some of the original designs. For example, there's an enormous plethora of big data tools, and yet people are starting to ask why big data isn't more successful, and partly it's because putting these tools together is so difficult. So are we going to see the types of artifacts and assets and technologies associated with machine learning, AI, deep learning, et cetera, easily lend themselves to an open source treatment? What do you think?
>> I think we're going to see open source very much take off in the niches of the deep learning, machine learning, and AI space where the target capabilities we've built are fairly well understood by a broad community. In machine learning, clearly, we have a fair number of frameworks that are already well established with respect to the core capabilities that need to be performed for modeling, training, and deployment of statistical models into applications. That's where we see a fair amount of takeoff for TensorFlow, which Google built and open sourced, because the core of deep learning, in terms of the algorithms, in terms of the kinds of functions you perform to be able to take data and do feature engineering and algorithm selection, is fairly well understood. So those are the kinds of very discrete capabilities for which open source code is becoming standard, but there are many different alternative frameworks for doing that, TensorFlow being one of them, that are jostling for presence in the market. The term is commoditized: more of those core capabilities are being commoditized by the fact that they're well understood and agreed to by a broad community.
So those are the discrete areas where we're seeing the open source alternatives become predominant. But when you take a TensorFlow and combine it with a Spark, and with a Hadoop and a Kafka and broader collections of capabilities that are needed for robust infrastructure, those are disparate communities that each have their own participants, committers, and so forth. Nobody owns that overall stack. There's no equivalent of a LAMP stack where all things to do with deep learning, machine learning, and AI on an open source basis come to the fore. If some group of companies is going to own that broadening stack, that would indicate some degree of maturation for this overall ecosystem. That's not happening yet; we don't see that happening right now.
>> So Jim, my bias is that I hate the term commoditization, but I want to unify what you said with something that David said. Essentially what we're talking about is agreement, in a collaborative, open way, around the conventions of how we perform work, that compute model, which then turns into products and technologies that can in fact be distributed and regarded as a standard, and regarded as a commodity around which trading can take place. But what about the data side of things, George? Jim's articulated, I think, a pretty good case that we're going to start seeing some tools in the marketplace. It's going to be interesting to see whether that is just further layering on top of all this craziness that is happening in the big data world, and just adding to it in the ML world. But how does the data fit into this? Are we going to see something that looks like open source data in the marketplace?
>> Yes, yes, and a modified yes. Let me take those in two pieces.
Just to be slightly technical, hopefully without being too pedantic: software used to mean algorithms and data structures, in other words the recipe for what to do and the buckets for where to put the data. That has changed in the data, machine learning, and analytics world, where the algorithms and the data are so tied together (the instances of the data, not the buckets) that the data change the algorithms and the algorithms change the data. The significance of that is that when we build applications now, it's never done. The construct we've been focusing on is the digital twin, more broadly defined than a smart device. When you go to one vendor and you sort of partially build it, it's an evergreen thing, it's never done. Then you go to the next vendor, but you need to be able to backport some core of that to the original vendor. So for all intents and purposes that's open source, but it boils down to actually the original Berkeley license for open source, not the Apache one everyone is using now. And remind me of the other question?
>> The other issue is: are we going to see datasets become open source, like we see code bases and code fragments and algorithms becoming open source?
>> Yes. Just the way Amazon made infrastructure commoditized and rentable, there are going to be many datasets that used to be proprietary, like a Google web crawl, or the Google Knowledge Graph for disambiguating people, places, and things. Some of these things are becoming either open source or openly accessible by API. So when you put those resources together, you're seeing a massive deflation, or a massive shrinkage, in the capital intensity of building these sorts of apps.
>> So Neil, if we take a look at where we are thus far, we can see that even though we're moving to a services-oriented model, Amazon, for example, is a company that is able to generate commercial rents out of open source software. Jim has made a pretty compelling case that open source software can, or will, emerge out of the tooling world for some of these new applications. There are going to be some examples of datasets, or at least APIs to datasets, that will look more open source-like, so it's not inconceivable that we'll see some actual open source data. I think with GDPR and some other regulations, we're still early in the process of figuring out how we're going to turn data into a commodity, to use Jim's words. But what about the personnel, what about the people? There were reasons why developers moved to open source, some of the soft reasons that motivated them to do things: who they work with, getting the recognition, working on relevant projects, working with relevant technologies. Are we going to see a similar set of soft motivators diffuse into the data scientist world, so that these individuals, the ones who are creating the real value, are going to have some degree of motivation to participate with each other and collaborate with each other in an open source way? What do you think?
>> Good question. I think the answer is absolutely yes, but it's not unique to data scientists. Academics, scientists in molecular biology, civil engineers: they all want to be recognized by their peers on some level beyond just what they're doing in their organization. But there is another segment of data scientists that are just guys working for a paycheck, generating predictive analyses and helping the company along and so forth, and that's what they're going to do.
The whole open source thing: you remember object programming, you remember JavaBeans, you remember Web Services. We tried to turn developers into librarians, so that when they wanted to develop something, they'd go to GitHub. I go to GitHub right now and say I'm looking for a utility that can figure out why my face is so pink on this camera, and I get 1000 listings of programs and have no idea which ones work and which ones don't. So I think the whole open source thing is about to explode, and it already has, in terms of piece parts. But I think managing in an organization is different, and when I say an organization, there are the Googles and the Amazons and so forth of the world, and then there's everybody else.
>> All right, so we've identified one area where we can anticipate that some change will be required to modernize the open source model, the licensing model. We see another one where the open source community is going to have to understand how to move from a product and code orientation to a data and service orientation. Can we think of any others?
>> There is one other that I'd like to add to that, and that is compliance. You addressed it to some extent, but compliance brings some real-world requirements onto code and data. You were saying earlier that one of the options is bringing code and data together so that they intermingle and change each other. I wonder whether that, when you look at it from a compliance point of view, will actually pass muster, because from a compliance point of view you need to prove, for example in the health service, that it works, and that it works the same way every time. If you've got a set of code and data that doesn't work the same every time, you are probably going to get pushback from the people who regulate health, saying you can't do it that way, you'll have to find another way to do it.
But that again is, is it the same each time? So the point I'm making--
>> This is a bigger issue than just open source. This is an issue where the idea of continuous refinement of the code and the data--
>> Automatic refinement.
>> Automatic refinement, could in fact, we're going to have to change some compliance laws. Is it possible the open source community might actually help us understand that problem?
>> Absolutely, yes.
>> I think that's a good point, a really interesting point, because you're right, George, the idea of continuous development is not something where, for example, Sarbanes-Oxley says, "Oh yeah, I get this." Sarbanes-Oxley is like: yes, I acknowledge that this data is right, and I acknowledge that the process by which it was created was right. Now this is another subject, let's bring this up later, but I think it's relevant here, because in many respects it's the difference between an income statement and a balance sheet, right? Saying it's good now is kind of like the income statement. But let's come back to this, because I think it's a bigger issue. You're asserting the open source community may in fact help solve this problem by coming up with new ways of conceiving, say, versioning of things, and stamping things, and what is a distribution and what isn't a distribution, with some of these more tightly bound sets of--
>> What we find normally is that--
>> Jim: I think that we are going to--
>> Peter: Go on, Jim.
>> Just to elaborate on what Peter was talking about, that whole theme: I think what we're going to see is more open source governance of models and data within distributed development environments, using technologies like blockchain as a core enabler for these workflows, where, as it were, general distributed hyperledgers indicate the latest and greatest version of a given dataset or a given model being developed somewhere around some common solution domain. I think those kinds of environments for governance will become critically important as this pipeline for development, training, and deployment of these assets gets ever more distributed and virtual.
>> By the way, Jim, I actually had a conversation with a very large open source distribution company a few months ago about this very point, and I agree. I think blockchain in fact could become a mechanism by which we track intellectual property, track intellectual contributions, and find ways to then monetize those contributions, going back to what you were saying, David. Perhaps that becomes something that looks like the basis of a new business model for how we think about how open source goes after these looser, goosier problems.
>> But also to guarantee integrity without necessarily going through a central--
>> Very important, very important, because at the end of the day, George--
>> It's always hard to find somebody to maintain.
>> Right. One of the big challenges that big companies are having today as they do open source is that they want to be able to keep track of their intellectual property, both from a contribution standpoint and also inside their own business, because they're very, very concerned that the stuff they're creating that's proprietary to their business, in a digital sense, might leave the building. That's not something a lot of banks, for example, want to see happen.
>> I want to stick one step into this logic process that I think we haven't yet discussed. We're talking now about how end customers will consume this, but there's still a disconnect in terms of how the open source software vendors, or even hybrid ones, can get to market with this stuff. Between open source pricing models and pricing levels, we've seen a slow-motion price collapse. The problem is that the new go-to-market motion is actually made up of many motions, which are discover, learn, try, buy, recommend, and within each of those the motion is different. You hear it almost like a reflex, like when your doctor hits you on the knee and your leg kind of bounces: everybody says, yeah, we do land and expand. Land was discover, learn, try, augmented with inside sales. The recommend and standardize step is still traditional enterprise software, where someone's got to talk to IT and procurement about fitting into the broader architecture and infrastructure of the firm, and to do that you still need what has always been called the most expensive migratory workforce in the world, which is an enterprise sales force.
>> But I would suggest there's a big move towards standardization of stacks. True private cloud is about having a stack which is well established, along with the relationships between all the different piece parts, and the provider of the stack itself is the one who is responsible for putting that stack together and maintaining it.
>> So for a moment pretend that you are a CIO. Are you going to buy OpenStack, or are you going to buy the VMware stack?
>> I'm going to buy the VMware stack.
>> Because that's about open source?
>> No, the point I'm making is that those open source communities, or pieces, would then be absorbed into the stack as an OEM supplier as opposed to a direct supplier, and I think that's true for all of these stacks. If you look at the stack, for example, and you have code from NetApp or whoever it is that's in that code and they're contributing it, you need an OEM agreement with that provider, and it doesn't necessarily have to be open source.
>> Bottom line is this stuff is still really, really complicated.
>> But this model of being an OEM provider is very different from growing an enterprise sales force. You're selling something that goes into the cost of goods sold of your customer, and that cost of goods sold had better be less than 15 percent, and preferably less than five percent.
>> Your point is that if you can't afford a sales force, an OEM agreement is a much better way of doing it.
>> You have to get somebody else's sales force to do it for you. So look, I'm going to do the Action Item on this. I think that this has been a great conversation again. David, George, Neil, Jim, thanks a lot. So here's the Action Item. Nobody argues that open source hasn't been important, and nobody suggests that open source is not going to remain important. What we think, based on our conversation today, is that open source is going to go through some changes. Those changes will occur as a consequence of new folks that are going to be important to this, like data scientists, tied to some of the new streams of value in the industry, who may not have the same motivations that the old developer world had; and of new types of problems that are inherently more data-oriented as opposed to process-oriented, where it's not as clear that the whole concept of data as an artifact, data as a convention, data as standards and commodities, is going to be as easy to define as it was in the code world.
As well, IT organizations are ultimately and increasingly moving towards an approach that is focused more on the consumption of services as opposed to the consumption of products. So for these and many other reasons, our expectation is that the open source community is going to go through its own transformation as it tries to support current and future digital transformations. Now, as for some of the areas that we think are going to be transformed: we expect that there's going to be some pressure on licensing; we think there's going to be some pressure on how compliance is handled, and we think the open source community may in fact be able to help in that regard; and we think, very importantly, that there will be some pressure on the open source community to rationalize how it conceives of the new compute models, the new design models, because where open source has always been very successful is when we have a target and we can collaborate to replicate and replace that target or provide a substitute. I think we can all agree that in 10 years we will be talking about how open source took some time to in fact put forward that TPC stack, as opposed to defining the true private cloud stack. So our expectation is that open source is going to remain relevant, and we think it's going to go through some consequential changes. We look forward to working with our clients to help them navigate what some of those changes are, both as committers and as consumers. Once again, guys, thank you very much for this week's Action Item. This is Peter Burris, and until next week, thank you very much for participating in Wikibon's Action Item. (slow techno music)

Published Date : Jan 12 2018


