Day Two Kickoff | Big Data NYC

(quiet music) >> I'll open that while he does that. >> Co-Host: Good, perfect. >> Man: All right, rock and roll. >> This is Robin Matlock, the CMO of VMware, and you're watching theCUBE. >> This is John Siegel, VP of Product Marketing at Dell EMC. You're watching theCUBE. >> This is Matthew Morgan, I'm the chief marketing officer at Druva and you are watching theCUBE. >> Announcer: Live from midtown Manhattan, it's theCUBE. Covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (rippling music) >> Hello, everyone, welcome to a special CUBE live presentation here in New York City for theCUBE's coverage of BigData NYC. This is where all the action's happening in the big data world, machine learning, AI, the cloud, all kind of coming together. This is our fifth year doing BigData NYC. We've been covering the Hadoop ecosystem, Hadoop World, since 2010, it's our eighth year really at ground zero for the Hadoop, now the BigData, now the Data Market. We're doing this also in conjunction with Strata Data, which was Strata Hadoop. That's a separate event with O'Reilly Media, we are not part of that, we do our own event, our fifth year doing our own event, we bring in all the thought leaders. We bring all the influencers, meaning the entrepreneurs, the CEOs to get the real story about what's happening in the ecosystem. And of course, we do it with our analysts at Wikibon.com. I'm John Furrier with my cohost, Jim Kobielus, who's the chief analyst for our data piece. Lead analyst Jim, you know the data world's changed. We had commentary yesterday, all up on YouTube.com/SiliconAngle. Day one really set the table. And we kind of get the whiff of what's happening, we can kind of feel the trend, we got a finger on the pulse.
Two things going on, two big notable stories: the world's continuing to expand around community and hybrid data and all these cool new data architectures, and the second kind of substory is the O'Reilly show has become basically marketing. They're making millions of dollars over there. A lot of people were, last night, kind of not happy about that, and what's giving back to the community. So, again, the community theme is still resonating strong. You're starting to see that move into the corporate enterprise, which you're covering. What are you finding out, what did you hear last night, what are you hearing in the hallways? What is kind of the tea leaves that you're reading? What are some of the things you're seeing here? >> Well, all things hybrid. I mean, first of all it's building hybrid applications for hybrid cloud environments and there's various layers to that. So yesterday on theCUBE we had, for example, one layer is hybrid semantic virtualization. Those layers are critically important for bridging workloads and microservices and data across public and private clouds. We had, from AtScale, we had Bruno Aziza and one of his customers discussing what they're doing. I'm hearing a fair amount of this venerable topic of semantic data virtualization become even more important now in the era of hybrid clouds. That's a fair amount of the scuttlebutt in the hallway and atrium talks that I participated in. Also yesterday from BMC we had Basil Faruqi talking about basically automating data pipelines. There are data pipelines in hybrid environments. Very, very important for DevOps, productionizing these hybrid applications for these new multi-cloud environments. That's quite important. Hybrid data platforms of all sorts. Yesterday we had from Actian Jeff Veis discussing their portfolio for on-prem, public cloud, putting the data in various places, and speeding up the queries and so forth. So hybrid data platforms are going increasingly streaming in real time.
What I'm getting, what I'm hearing, is that more and more of a layering of these hybrid environments is a critical concern for enterprises trying to put all this stuff together, and future-proof it so they can add on all the new stuff that's coming along, like serverless clouds, without breaking interoperability, and without having to change code. Just plug and play in a massively multi-cloud environment. >> You know, and also I'm critical of a lot of things that are going on. 'Cause to your point, the reason why I'm kind of critical on the O'Reilly show and particularly the hype factor going on in some areas is two kinds of trends I'm seeing with respect to the owners of some of the companies. You have one camp that are kind of groping for solutions, and you'll see that with them whitewashing new announcements, this is going on here. It's really kind of-- >> Jim: I think it's AI now, by the way. >> And they're AI-washing it, but you can, the telltale sign is they're always kind of doing a magic trick of some type of new announcement, something's happening, you got to look underneath that, and say where is the deal for the customers? And you brought this up yesterday with Peter Burris, which is the business side of it is really the conversation now. It's not about the speeds and feeds and the cluster management, it's certainly important, and those solutions are maturing. That came up yesterday. The other thing that you brought up yesterday I thought was notable was the real emphasis on the data science side of it. And it's that it's still not easy for data scientists to do their job. And this is where you're seeing productivity conversations come up with data science. So, really the emphasis at the end of the day boils down to this. If you don't have any meat on the bone, you don't have a solution where the rubber hits the road, where you can come in and provide a tangible benefit to a company, an enterprise, then it's probably not going to work out.
And we kind of had that tool conversation, you know, as people start to grow. And so as buyers out there, they got to look, and kind of squint through it saying where's the real deal? So that kind of brings up what's next? Who's winning, how do you as an analyst look at the playing field and say, that's good, that's got traction, that's winning, mm not too sure? What's your analysis, how do you tell the winners from the losers, and what's your take on this from the data science lens? >> Well, first of all you can tell the winners when they have an ample number of reference customers who are doing interesting things. Interesting enough to get a jaded analyst to pay attention. Doing something that changes the fabric of work or life, whatever, clearly. Solution providers who can provide that, they have all the hallmarks of a winner, meaning they're making money, and they're likely to grow and so forth. But also the hallmarks of a winner are those, in many ways, who have a vision and catalyze an ecosystem around that vision of something that could possibly have been done before but not quite as efficiently. So you know, for example, what we're seeing now in the whole AI space, deep learning, is, you know, AI means many things. The core right now, in terms of the buzzy stuff, is deep learning for being able to process real time streams of video, images and so forth. And so, what we're seeing now is that the vendors who appear to be on the verge of being winners are those who use deep learning inside some new innovation that appeals to a potential mass market. It's something you put on your, like an app or something you put on your smart phone, or it's something you buy at Walmart, install in your house. You know, the whole notion of clearly Alexa, and all that stuff.
Anything that takes chatbot technology, really deep learning powers chatbots, and is able to drive a conversational UI into things that you wouldn't normally expect to talk to you and does it well in a way that people have to have that. Those are the vendors that I'm looking for, in terms of those are the ones that are going to make a ton of money selling to a mass market, and possibly, and very much once they go there, they're building out a revenue stream and a business model that they can conceivably take into other markets, especially business markets. You know, like Amazon, 20-something years ago when they got started in the consumer space as the exemplar of web retailing, who expected them 20 years later to be a powerhouse provider of business cloud services? You know, so we're looking for the Amazons of the world that can take something as silly as a conversational UI, driven by DL, inside of a consumer appliance and 20 years from now, maybe even sooner, become a business powerhouse. So that's what's new. >> Yeah, the thing that comes up that I want to get your thoughts on is that we've seen data integration become a continuing theme. The other thing about the community play here is you start to see customers align with syndicates or partnerships, and I think it's always been great to have customer traction, as you pointed out, as a benchmark. But now you're starting to see the partner equation, because this isn't open, decentralized, distributed internet these days. And it is looking like it's going to form differently than the way it was in the web days, and with mobile and connected devices, with IoT and AI. A whole new infrastructure's developing, so you're starting to see people align with partnerships. So I think that's something that's signaling to me that the partnership is amping up. I think the people are partnering more.
We've had Hortonworks on with IBM, people are partnering, some people take a Switzerland approach where they partner with everyone. You had WANdisco, which partners with all the cloud guys, I mean, they have unique IP. So you have this model where you got to go out, do something, but you can't do it alone. Open source is a key part of this, so obviously that's part of the collaboration. This is a key thing. And then they're going to check off the boxes. Data integration, deep learning is a new way to kind of dig deeper. So the question I have for you is, the impact on developers, 'cause if you can connect the dots between open source, 90% of the software written will be already open source, 10% differentiated, and then the role of how people are going to market with the enterprise of a partnership, you can almost connect the dots and say it's kind of a community approach. So that leaves the question, what is the impact to developers? >> Well the impact to developers, first of all, is when you go to a community approach, and like some big players are going more community and partnership-oriented in hot new areas like if you look at some of the recent announcements in chatbots and those technologies, we have sort of a rapprochement between Microsoft and Facebook and so forth, or Microsoft and AWS. The impact for developers is that there's convergence among the companies that might have competed to the death in particular hot new areas, like you know, like I said, chatbot-enabled apps for mobile scenarios. And so it cuts short the platform wars fairly quickly, harmonizes around a common set of APIs for accessing a variety of competing offerings that really overlap functionally in many ways.
For developers, it's simplification around a broader ecosystem where it's not so much competition on the underlying open source technologies, it's now competition to see who penetrates the mass market with actually valuable solutions that leverage one or more of those erstwhile competitors into some broader synthesis. You know, for example, the whole ramp up to the future of self-driving vehicles, and it's not clear who's going to dominate there. Will it be the vehicle manufacturers that are equipping their cars with all manner of computerized everything to do whatnot? Or will it be the up-and-comers? Will it be the computer companies like Apple and Microsoft and others who get real deep and invest fairly heavily in self-driving vehicle technology, and become themselves the new generation of automakers in the future? So, what we're getting is that going forward, developers want to see these big industry segments converge fairly rapidly around broader ecosystems, where it's not clear who will be the dominant player in 10 years. The developers don't really care, as long as there is consolidation around a common framework to which they can develop fairly soon. >> And open source is obviously a key role in this, and how is deep learning impacting some of the contributions that are being made, because we're starting to see the competitive advantage in collaboration on the community side is with the contributions from companies. For example, you mentioned TensorFlow multiple times yesterday from Google. I mean, that's a great contribution. If you're a young kid coming into the developer community, I mean, this is not normal. It wasn't like this before. People just weren't donating massive libraries of great stuff already pre-packaged. So all new dynamics are emerging. Is that putting pressure on Amazon, is that putting pressure on AWS and others? >> It is.
First of all, there is a fair amount of, I wouldn't call it first-mover advantage for TensorFlow, there've been a number of DL toolkits on the market, open source, for the last several years. But they achieved the deepest and broadest adoption most rapidly, and now TensorFlow is essentially a de facto standard, in the way that, if we just go back, betraying my age, 30, 40 years ago, you had two companies called SAS and SPSS that quickly established themselves as the go-to statistical modeling tools. And then they got a generation, our generation, of developers, or at least of data scientists, what became known as data scientists, to standardize around you're either going to go with SAS or SPSS if you're going to do data mining. Cut ahead to the 2010s now. The new generation of statistical modelers, it's all things DL and machine learning. And so SAS versus SPSS is ages ago, those companies are, those products still exist. But now, what are you going to get hooked on in school? What are you going to get hooked on in high school, for that matter, when you're just hobby-shopping DL? You'll probably get hooked on TensorFlow, 'cause they have the deepest and the broadest open source community where you learn this stuff. You learn the tools of the trade, you adopt that tool, and everybody else in your environment is using that tool, and you got to get up to speed. So the fact is, that broad adoption early on in a hot new area like DL, means tons. It means that essentially TensorFlow is the new Spark, where Spark, you know, once again, Spark just in the past five years came out real fast. And it's been eclipsed, as it were, on the stack of cool by TensorFlow. But it's a deepening stack of open source offerings. So the new generation of developers with data science workbenches, they just assume that there's Spark, and they're going to increasingly assume that there's TensorFlow in there.
They're going to increasingly assume that there are the libraries and algorithms and models and so forth that are floating around in the open source space that they can use to bootstrap themselves fairly quickly. >> This is a real issue in the open source community, which we talked about when we were in LA for the Open Source Summit, was exactly that. Is that, there are some projects that become fashionable, so for example, a cloud-native foundation, very relevant but also hot, really hot right now. A lot of people are jumping on board the cloud-native bandwagon, and rightfully so. A lot of work to be done there, and a lot of things to harvest from that growth. However, the boring blocking and tackling projects don't get all the fanfare but are still super relevant, so there's a real challenge of how do you nurture these awesome projects that we don't want to become like a nightclub where nobody goes anymore because it's not fashionable. Some of these open source projects are super important and have massive traction, but they're not as sexy, or flair-ish as some of that. >> DL is not as sexy, or machine learning, for that matter, not as sexy as you would think if you're actually doing it, because the grunt work, John, as we know for any statistical modeling exercise, is data ingestion and preparation and so forth. That's 75% of the challenge for deep learning as well. But also for deep learning and machine learning, training the models that you build is where the rubber meets the road. You can't have a really strongly predictive DL model in terms of face recognition unless you train it against a fair amount of actual face data, whatever it is. And it takes a long time to train these models. That's what you hear constantly. I heard this constantly in the atrium talking-- >> Well that's a data challenge, is you need models that are adapting and you need real time, and I think-- >> Oh, here-- >> This points to the real new way of doing things, it's not yesterday's model.
It's constantly evolving. >> Yeah, and that relates to something I read this morning or maybe it was last night, that Microsoft has made a huge investment in AI and deep learning machinery. They're doing amazing things. And one of the strategic advantages they have as a large, established solution provider with a search engine, Bing, is that from what I've been, this is something I read, I haven't talked to Microsoft in the last few hours to confirm this, that Bing is a source of training data that they're using for machine learning and I guess deep learning modeling for their own solutions or within their ecosystem. That actually makes a lot of sense. I mean, Google uses YouTube videos heavily in its deep learning for training data. So there's the whole issue of if you're a pipsqueak developer, some, you know, I'm sorry, this sounds patronizing. Some pimply-faced kid in high school who wants to get real deep on TensorFlow and start building and tuning these awesome kickass models to do face recognition, or whatever it might be. Where are you going to get your training data from? Well, there's plenty of open source databases, or training databases, out there you can use, but it's what everybody's using. So, there's sourcing the training data, there's labeling the training data, that's human-intensive, you need human beings to label it. There was a funny recent episode, or maybe it was a last-season episode of Silicon Valley that was all about machine learning and building and training models. It was the hot dog, not hot dog episode, it was so funny. They bamboozle a class on the show, fictionally. They bamboozle a class of college students to provide training data and to label the training data for this AI algorithm, it was hilarious. But where are you going to get the data? Where are you going to label it? >> Lot more work to do, that's basically what you're getting at. >> Jim: It's DevOps, you know, but it's grunt work.
>> Well, we're going to kick off day two here. This is SiliconANGLE Media's theCUBE, our fifth year doing our own event separate from O'Reilly Media but in conjunction with their event in New York City. It's gotten much bigger here in New York City. We call it BigData NYC, that's the hashtag. Follow us on Twitter, I'm John Furrier, Jim Kobielus, we're here all day, we've got Peter Burris joining us later, head of research for Wikibon, and we've got great guests coming up, stay with us, be back with more after this short break. (rippling music)

Published Date : Sep 27 2017

ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Robin Matlock	PERSON	0.99+
Apple	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Matthew Morgan	PERSON	0.99+
Basil Faruqi	PERSON	0.99+
Jim	PERSON	0.99+
John Siegel	PERSON	0.99+
O'Reilly Media	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
yesterday	DATE	0.99+
90%	QUANTITY	0.99+
Peter Burris	PERSON	0.99+
two companies	QUANTITY	0.99+
New York City	LOCATION	0.99+
SPS	ORGANIZATION	0.99+
SAS	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
John	PERSON	0.99+
75%	QUANTITY	0.99+
LA	LOCATION	0.99+
Silicon Valley	TITLE	0.99+
Facebook	ORGANIZATION	0.99+
10%	QUANTITY	0.99+
Walmart	ORGANIZATION	0.99+
2010s	DATE	0.99+
YouTube	ORGANIZATION	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
AtScale	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
10 years	QUANTITY	0.99+
WANdisco	ORGANIZATION	0.99+
Jeff Veis	PERSON	0.99+
fifth year	QUANTITY	0.99+
one	QUANTITY	0.99+
Yesterday	DATE	0.99+
Dell EMC	ORGANIZATION	0.99+
VMware	ORGANIZATION	0.99+
eighth year	QUANTITY	0.99+
BigData	ORGANIZATION	0.99+
millions of dollars	QUANTITY	0.99+
Bing	ORGANIZATION	0.99+
BMC	ORGANIZATION	0.98+
Amazons	ORGANIZATION	0.98+
last night	DATE	0.98+
two kinds	QUANTITY	0.98+
Spark	TITLE	0.98+
Hortonworks	ORGANIZATION	0.98+
Day one	QUANTITY	0.98+
20 years later	DATE	0.98+
VPA	ORGANIZATION	0.98+
2010	DATE	0.98+
Actian	ORGANIZATION	0.98+
Open Source Summit	EVENT	0.98+
one layer	QUANTITY	0.98+
Druva	ORGANIZATION	0.97+
Alexa	TITLE	0.97+
day two	QUANTITY	0.97+
Bruno Aziza	PERSON	0.97+
SPSS	TITLE	0.97+
Switzerland	LOCATION	0.97+
Two things	QUANTITY	0.96+
NYC	LOCATION	0.96+
Wikibon	ORGANIZATION	0.96+
30	DATE	0.95+
Wikibon.com	ORGANIZATION	0.95+
SiliconeANGLE Media	ORGANIZATION	0.95+
O'Reilly	ORGANIZATION	0.95+
