Arun Murthy, Hortonworks | DataWorks Summit 2018
>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost, Jim Kobielus. We're joined by Aaron Murphy — Arun Murthy, sorry. He is the co-founder and chief product officer of Hortonworks. Thank you so much for returning to theCUBE. It's great to have you on. >> Yeah, likewise. It's been a fun time getting back, yeah. >> So you were on the main stage this morning in the keynote, and you were describing the journey, the data journey, that so many customers are on right now, and you were talking about the cloud, saying that the cloud is part of the strategy but it really needs to fit into the overall business strategy. Can you describe a little bit of your approach to that? >> Absolutely. The way we look at this is, we help customers leverage data to actually deliver better capabilities, better services, better experiences to their customers, and that's the business we are in. Now, with that, obviously we look at cloud as a really key part of the overall strategy in terms of how you want to manage data on-prem and in the cloud. We kind of joke that we ourselves live in a world of real-time data. We just live in it, and data is everywhere. You might have trucks on the road, you might have drones, you might have sensors, and you have it all over the world. At this point, enterprises understand that they'll manage all the infrastructure, but in a lot of cases it will make a lot more sense to actually lease some of it, and that's the cloud. It's the same way, if you're delivering packages: you don't go buy planes and lay out roads, you go to FedEx and actually let them handle that for you. That's kind of what the cloud is.
So that is why we really fundamentally believe that we have to help customers leverage infrastructure wherever it makes sense pragmatically, both from an architectural standpoint and from a financial standpoint, and that's kind of why we talk about how your cloud strategy is part of your data strategy, which is actually fundamentally part of your business strategy. >> So how are you helping customers to leverage this? What is on their minds, and what's your response? >> Yeah, it's really interesting. Like I said, cloud is cloud, and infrastructure management is certainly something that's at the top of the mind for every CIO today. And what we've consistently heard is they need a way to manage all this data and all this infrastructure in a hybrid, multi-tenant, multi-cloud fashion. Because in some geos you might not have your favorite cloud vendor. Go to parts of Asia, as a great example: you might have to use one of the Chinese clouds. You go to parts of Europe, especially with things like GDPR, the data residency laws, and so on, and you have to be very, very cognizant of where your data gets stored and where your infrastructure is present. And that is why we fundamentally believe it's really important to give enterprises a fabric with which they can manage all of this, and hide the details of all of the underlying infrastructure from them as much as possible. >> And that's DataPlane Services. >> And that's DataPlane Services, exactly — the Hortonworks DataPlane Service we launched in October of last year. Actually, I was on theCUBE talking about it back then too. We see a lot of interest, a lot of excitement around it, because now they understand that, again, this doesn't mean that we drive it down to the least common denominator. It is about helping enterprises leverage the key differentiators of each of the cloud vendors' products. For example, Google, with which we announced a partnership, is really strong on AI and ML.
So if you are running TensorFlow and you want to deal with things like Kubernetes, GKE is a great place to do it. And, for example, you can now go to Google Cloud and get TPUs, which work great for TensorFlow. Similarly, a lot of customers run on Amazon for a bunch of the operational stuff — Redshift as an example. So in the world we live in, we want to help the CIO leverage the best pieces of the cloud, but then give them a consistent way to manage and govern that data. We were joking on stage that IT has just about learned how to deal with Kerberos and Hadoop, and now we're telling them, "Oh, go figure out IAM on Google," which is also called IAM on Amazon, but they are completely different. The only thing that's consistent is the name. So I think we have a unique opportunity, especially with the open source technologies like Atlas, Ranger, Knox, and so on, to be able to draw a consistent fabric over this for security and governance, and help the enterprise leverage the best parts of the cloud to put a best-fit architecture together, which also happens to be a best-of-breed architecture. >> So the fabric is everything you're describing: all the Apache open source projects in which Hortonworks is a primary committer and contributor are able to apply schemas, policies, metadata, and so forth across this distributed, heterogeneous fabric of public and private cloud segments within a distributed environment. >> Exactly. >> That's increasingly being containerized, in terms of the applications, for deployment to edge nodes. Containerization is a big theme in HDP 3.0, which you announced at this show. >> Yeah. >> So, if you could give us a quick sense for how that containerization capability plays into more of an edge focus for what your customers are doing. >> Exactly. Great point, and again, the core parts of the fabric are the open source projects, but we've also done a lot of net new innovation with DataPlane, which, by the way, is also open source.
It's a new product and a new platform that you can actually leverage, to lay over the open source ones you're familiar with. And again, like you said, containerization — what is actually driving the fundamentals of this is that the details matter. At the scale at which we operate — we're talking about thousands of nodes, terabytes of data — the details really matter, because a 5% improvement at that scale leads to millions of dollars in optimization for capex and opex. So all of that, the details, are being fueled and driven by the community, which is kind of what we did with HDP 3. And the key ones, like you said, are containerization, because now we can actually get complete agility in terms of how you deploy the applications. You get isolation not only at the resource management level with containers, but you also get it at the software level, which means if two data scientists want to use different versions of Python or Scala or Spark or whatever it is, they get that consistently and holistically, and they can actually go from the test-dev cycle into production in a completely consistent manner. So that's why containers are so big, because now we can actually leverage them across the stack, with things like MiNiFi showing up. We can actually-- >> Define MiNiFi before you go further. What is MiNiFi, for our listeners? >> Great question. Yeah, so we've always had NiFi-- >> Real-time. >> Real-time data flow management, and NiFi was still sort of within the data center. MiNiFi is now a really, really small layer — a small, thin library, if you will — that you can throw on a phone, a doorbell, a sensor, and that gives you all the capabilities of NiFi, but at the edge. >> Mmm. >> Right? And it's actually not just data flow; what is really cool about NiFi is that it's actually command and control. So you can do bidirectional command and control: you can actually change, in real time, the flows you want, the processing you do, and so on.
So what we're trying to do with MiNiFi is actually not just collect data from the edge, but also push the processing as much as possible to the edge, because we really do believe a lot more processing is going to happen at the edge, especially with ASICs and so on coming out. There will be custom hardware that you can deploy and essentially leverage at the edge to actually do this processing. And we believe, you know, we want to do that even at the cost of the data not actually landing at rest, because at the end of the day we're in the insights business, not the data storage business. >> Well, I want to get back to that. You were talking about innovation and how so much of it is driven by the open source community, and you're a veteran of the big data open source community. How do we maintain that? How does that continue to be the fuel? >> Yeah, and a lot of it starts with just being consistent. From day one — James was around back then — when we started in 2011, we've always said, "We're going to be open source," because we fundamentally believed that the community is going to out-innovate any one vendor, regardless of how much money they have in the bank. So we really do believe that's the best way to innovate, mostly because there is a sense of shared ownership of the product. It's not just one vendor throwing some code out there, trying to shove it down the customer's throat. And we've seen this over and over again, right. A lot of the data plane stuff we talk about comes from Atlas and Ranger and so on; three years ago, none of these existed. These actually came from the fruits of collaboration with the community, with actually some very large enterprises being a part of it. So it's a great example of how we continue to drive it, because we fundamentally believe that that's the best way to innovate, and we continue to believe so. >> Right.
And in the community, the Apache community as a whole, there are so many different projects — for example, in streaming, there is Kafka, >> Okay. >> and there are others that address a core set of common requirements but in different ways, >> Exactly. >> supporting different approaches — for example, doing streaming with stateful transactions and so forth, or stateless semantics and so forth. It seems to me that Hortonworks is shifting towards being more of a streaming-oriented vendor, away from data at rest — though I should say HDP 3.0 has got great scalability and storage efficiency capabilities baked in. I wonder if you could just break down a little bit what the innovations or enhancements are in HDP 3.0 for those of your core customers — which is most of them — who are managing massive multi-terabyte, multi-petabyte, distributed, federated big data lakes. What's in HDP 3.0 for them? >> Oh, lots. Again, like I said, we obviously spend a lot of time on the streaming side, because that's where we see things going; we live in a real-time world. But again, we don't do it at the cost of our core business, which continues to be HDP. And as you can see, the community continues to drive it — we talked about containerization, a massive step up for the Hadoop community. We've also added support for GPUs — again, if you think about at-scale machine learning. >> Graphics processing units. >> Graphical-- >> AI, deep learning. >> Yeah, it's huge. Deep learning, TensorFlow, and so on really, really need custom hardware — sort of a GPU, if you will. So that's coming; that's in HDP 3. We've added a whole bunch of scalability improvements with HDFS. We've added federation, because now you can go over a billion files, a billion objects, in HDFS. We also added capabilities for-- >> But you indicated yesterday, when we were talking, that very few of your customers need that capacity yet, but you think they will, so-- >> Oh, for sure.
Again, part of this is, as we enable more sources of data in real time, that's the fuel which drives it, and that was always the strategy behind the HDF product. It was about, can we leverage the synergies between the real-time world and feed that into what you do today in your classic enterprise with data at rest — and that is what is driving the necessity for scale. >> Yes. >> Right. We've done that. We've also spent a lot of work, again, on lowering the total cost of ownership, the TCO, so we added erasure coding. >> What is that exactly? >> Yeah, so erasure coding is a classic storage concept. HDFS has always had three replicas, for redundancy, fault tolerance, and recovery. Now, it sounds okay having three replicas because it's cheap disk, right. But when you start to think about our customers running 70, 80, a hundred terabytes of data, those three replicas add up, because you've now gone from 80 terabytes of effective data to actually a quarter of a petabyte in terms of raw storage. So now, with erasure coding, instead of storing the three replicas we actually store parity. We store the encoding of it, which means we can actually go down from three to, like, two, one and a half, whatever we want to do. So, if we can get from three replicas to one and a half, especially for your core data, >> Yeah >> the data you're not accessing every day, it results in a massive savings in terms of your infrastructure costs. And that's kind of the business we're in: helping customers do better with the data they have, whether it's on-prem or in the cloud. We want to help customers be comfortable getting more data under management, along with security and a lower TCO. The other big piece I'm really excited about in HDP 3 is all the work that's happened in the Hive community on what we call the real-time database. >> Yes. >> As you guys know, you follow the whole SQL wars in the Hadoop space.
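The erasure-coding savings described here can be sketched with some rough arithmetic. This is an illustration with assumed figures: the 80-terabyte number comes from the conversation, and the RS(6,3) Reed-Solomon policy used below is the default erasure-coding policy in HDFS 3.x.

```python
# Rough illustration of why erasure coding lowers raw-storage cost
# versus HDFS's classic 3x replication.

effective_tb = 80.0  # logical data under management (figure from the interview)

# Classic HDFS: three full replicas of every block -> 3x raw storage.
replicated_raw = effective_tb * 3

# Reed-Solomon RS(6,3) erasure coding (the HDFS 3.x default policy):
# 6 data blocks + 3 parity blocks -> 1.5x storage overhead.
ec_raw = effective_tb * (6 + 3) / 6

print(replicated_raw)               # 240.0 TB of raw storage
print(ec_raw)                       # 120.0 TB of raw storage
print(1 - ec_raw / replicated_raw)  # 0.5 -> half the raw footprint
```

At petabyte scale that halving of raw storage is exactly the "millions of dollars in capex" effect described earlier in the conversation.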
>> And Hive has changed a lot in the last several years; this is very different from what it was five years ago. >> The only thing that's the same as five years ago is the name. (laughing) >> So again, the community has done a phenomenal job, really taking what we used to call a SQL engine on HDFS and driving it forward. With Hive 3, which is part of HDP 3, it's now a full-fledged database. It's got full ACID support. In fact, the ACID support is so good that writing ACID tables is at least as fast as writing non-ACID tables now. And you can do that not only on-- >> Transactional database. >> Exactly. Now, not only can you do it on-prem, you can do it on S3. So you can actually drive the transactions through Hive on S3. We've done a lot of work — you were there yesterday when we were talking about some of the performance work we've done with LLAP and so on — to actually give consistent performance both on-prem and in the cloud, and this is a lot of effort, simply because the performance characteristics you get from the storage layer with HDFS versus S3 are significantly different. So now we have been able to bridge those with things like LLAP. We've also done a lot of work to enhance the security model around it — governance and security. So now you get things like column-level masking, row-level filtering, all the standard stuff that you would expect, and more, from an enterprise warehouse. We talk to a lot of our customers; they're maintaining literally tens of thousands of views because they didn't have the capabilities that exist in Hive now. >> Mmm-hmm. >> And I'm sitting here kind of being amazed that, for an open source set of tools, to have the best security and governance at this point is pretty amazing, coming from where we started off.
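The masking and row-level filtering mentioned here — enforced centrally at read time by a security policy, rather than rewritten into every application — can be sketched roughly as follows. The policy structure below is a hypothetical illustration of the concept, not Ranger's actual API or policy format.

```python
# Sketch of policy-driven column masking and row-level filtering,
# applied centrally at read time. The policy shape is hypothetical.

policy = {
    # Mask the SSN column down to its last four digits.
    "mask_columns": {"ssn": lambda v: "XXX-XX-" + v[-4:]},
    # Filter out rows the caller is not allowed to see,
    # e.g. customers who have not consented to analytics.
    "row_filter": lambda row: row["consented"],
}

rows = [
    {"name": "Ada", "ssn": "123-45-6789", "consented": True},
    {"name": "Bob", "ssn": "987-65-4321", "consented": False},
]

def read_with_policy(rows, policy):
    """Apply the row filter, then mask any policy-listed columns."""
    out = []
    for row in rows:
        if not policy["row_filter"](row):
            continue  # row filtered out entirely
        masked = {k: policy["mask_columns"].get(k, lambda v: v)(v)
                  for k, v in row.items()}
        out.append(masked)
    return out

print(read_with_policy(rows, policy))
# [{'name': 'Ada', 'ssn': 'XXX-XX-6789', 'consented': True}]
```

The point of the central-policy approach is visible in the sketch: the data and the applications reading it are untouched; only the policy object changes.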
>> And it's absolutely essential for GDPR compliance, and HIPAA compliance, and every other mandate and sensitivity that requires you to protect personally identifiable information — so, very important. So in many ways Hortonworks has one of the premier big data catalogs for all manner of compliance requirements that your customers are chasing. >> Yeah, and James, you wrote about it in the context of Data Steward Studio, which we introduced. >> Yes. >> You know, things like consent management, having-- >> A consent portal. >> A consent portal. >> In which the customer can indicate the degree to which >> Exactly. >> they require controls over their management of their PII, possibly to be forgotten and so forth. >> Yeah, it's the right to be forgotten, it's consent — even for analytics. Within the context of GDPR, you have to allow the customer to opt out of analytics, of being part of an analytic itself, right. >> Yeah. >> So things like those are now something we enable through the enhanced security models in Ranger. The really cool part of what we've done now with GDPR is that we can get all these capabilities on existing data and existing applications by just adding a security policy, not rewriting the applications. It's a massive, massive, massive deal, which I cannot tell you how much customers are excited about, because they now understand. They were sort of freaking out that "I have to go to 30, 40, 50 thousand enterprise apps and change them to take advantage — to actually provide consent, and the right to be forgotten." The fact that you can do that now by changing a security policy with Ranger is huge for them. >> Arun, thank you so much for coming on theCUBE. It's always so much fun talking to you. >> Likewise. Thank you so much. >> I learn something every time I listen to you. >> Indeed, indeed. I'm Rebecca Knight, for Jim Kobielus; we will have more from theCUBE's live coverage of DataWorks just after this. (Techno music)
Pankaj Sodhi, Accenture | Dataworks Summit EU 2018
>> Narrator: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. I am James Kobielus. I'm the lead analyst within the Wikibon team at SiliconANGLE Media, focused on big data analytics. And big data analytics is what DataWorks Summit is all about. We are at DataWorks Summit 2018 in Berlin, Germany. We are on day two, and I have, as my special guest here, Pankaj Sodhi, who is the big data practice lead at Accenture. He's based in London, and he's here to discuss what he's seeing in terms of what his clients are doing with big data. Hello, welcome Pankaj, how's it going? >> Thank you Jim, very pleased to be here. >> Great, great. So what are you seeing in terms of customer adoption of Hadoop and so forth — big data platforms — and for what kinds of use cases? GDPR is coming down very quickly, and we saw this poll this morning that John Kreisa, of Hortonworks, did from the stage, and it's a little bit worrisome if you're an enterprise data administrator — really, an enterprise, period — because it sounds like a sizeable portion of this audience is not entirely ready to comply with GDPR on day one, which is May 25th. What are you seeing in terms of customer readiness for this new regulation? >> So Jim, I'll answer the question in two ways. One is just in terms of, you know, the adoption of Hadoop, and then, you know, getting into GDPR. In regards to Hadoop adoption, I would place clients in three different categories. The first ones are the ones that have been quite successful in terms of adoption of Hadoop. And what they've done there is taken a very use-case-driven approach to actually build up the capabilities to deploy these use cases. And they've taken an additive approach: deployed hybrid architectures, and then taken the time. >> Jim: Hybrid public, private cloud? >> Cloud as well, but often, sort of, on premise.
Hybrid being, for example, an EDW alongside Hadoop. In that scenario, they've taken the time to actually work out some of the technical complexities and nuances of deploying these pipelines in production. Consequently, what they're in a good position to do now is to leverage the best of cloud computing and open source technology, while also getting the investment protection that they have from the on-premise deployments. So they're in a fairly good position. Another set of customers have done successful pilots, looking at either optimization use cases >> Jim: How so, with Hadoop? >> Yes, leveraging Hadoop — either, again, from a cost optimization play or potentially advanced analytics capabilities. And they're in the process of going to production, and starting to work out, from a footprint perspective, what elements of the future pipelines are going to be on-prem, potentially with Hadoop, or in the cloud with Hadoop. >> When you say "the pipeline" in this context, what are you referring to? When I think of pipeline — in fact, in our coverage of pipelines — it refers to an end-to-end life cycle for development and deployment and management of big data. >> Pankaj: Absolutely. >> And analytics, so that's what you're saying.
>> So all the way from ingestion to curation to consuming the data, through multiple different access paths — so that's the full pipeline. And I think what the organizations that have been successful have done is not just looked at the technology aspect, which is just Hadoop in this case, but looked at a mix of architecture, delivery approaches, governance, and skills. I'd like to bring this to life by looking at advanced analytics as a use case. So rather than taking the approach of "let's ingest all data in a data lake," it's been driven by a use case, mapped to a set of valuable data sets that can be ingested. What's interesting then is that the delivery approach has been to bring together diverse skill sets — for example, data engineers, data scientists, data ops, and visualization folks — and then use them to actually challenge the architecture and delivery approach. I think this is where the key ingredient for success comes in, which is, for me, that the modern Hadoop pipeline needs to be iteratively built and deployed, rather than linear and monolithic. So there's this notion of: I have raw data, let me come up with a minimally curated data set, and then look at how I can do feature engineering and build an analytical model. If that works, and I need to enhance it — get additional data attributes — I then enhance the pipeline. This is already starting to challenge organizations' architecture approaches, and how you deploy into production. And I think that's been one of the key differences between these organizations and those that have embarked on the journey, ingested the data, but not had a path to production. So I think that's one aspect. >> How are the data stewards of the world — or are they — challenging the architecture, now that GDPR is coming down fast and furious? We're seeing, for example, Hortonworks' architecture for Data Steward Studio. Are you seeing the data governors, the data stewards of the world, coming, sitting around the virtual table, challenging this architecture further to evolve? >> I think. >> To enable privacy by default and so forth? >> I think, again, you know, the organizations that have been successful have already been looking at privacy by design, before GDPR came along. Now, one of the reasons a lot of the data lake implementations haven't been as successful is that the business hasn't had the ability to actually curate the data sets, work out what the definitions are, what the curation levels are. Therefore, what we see with business glossaries and sort of data architectures — from a GDPR perspective, we see this as an opportunity rather than a threat. So to actually make the data usable in the data lakes, we often talk to clients about this concept of the data marketplace.
So in the data marketplace, what you need to have is well-curated data sets, with proper definitions surfaced through a business glossary or a data catalog, underpinned by the right user access model, and available, for example, through search or APIs. So GDPR actually is-- >> This is not a public marketplace; this is an architectural concept. >> Yes. >> It could be completely inside the private data center, but it's reusable data — available both through APIs and standard glossaries and metadata and so forth. Is that correct? >> Correct. So the data marketplace is reusable, both internally — for example, to unlock access for data scientists who might want to use a data set and then put that into a data lab — and it can also be extended, from an API perspective, to a third-party data marketplace for exchanging data with consumers or third parties, as organizations look at data monetization as well. And therefore, I think the role of data stewards is changing around a bit. Rather than looking at it from a compliance perspective, it's about how we can make data usable to the analysts and the data scientists — so, actually focusing on getting the right definitions up front, and, as we curate and publish data, and as we enrich it, asking what's the next definition that comes of that, and actually having that available before we publish the data. >> That's a fascinating concept — the notion of a data steward or a data curator; it sort of sounds like you're blending them. Where for the data curator, their job, or very much of it, involves identifying the relevance of data and the potential reusability and attractiveness of that data for various downstream uses, and possibly being a player in the ongoing identification of the monetize-ability of data elements, both internally and externally in the (mumbles). Am I describing it correctly? >> Pankaj: I think you are, yes. >> Jim: Okay.
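What such a marketplace entry might capture can be sketched minimally. The field names and policy references below are illustrative assumptions for the concept described above — curated data sets with glossary definitions, curation levels, and access policies, discoverable via search — not any specific product's schema.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a data-marketplace catalog entry. All field
# names here are assumptions, for illustration of the concept only.
@dataclass
class MarketplaceEntry:
    name: str
    description: str      # business-glossary definition
    curation_level: str   # e.g. "raw", "minimally curated", "published"
    contains_pii: bool    # drives which access policy applies
    access_policy: str    # e.g. a reference to a central security policy
    tags: list = field(default_factory=list)

catalog = [
    MarketplaceEntry("customer_360", "Consented customer profile data",
                     "published", True, "pii-masked-analyst", ["crm", "gdpr"]),
    MarketplaceEntry("sensor_readings", "Raw edge telemetry",
                     "raw", False, "open-internal", ["iot"]),
]

def search(catalog, tag):
    """Simple tag-based discovery, standing in for a real search API."""
    return [e.name for e in catalog if tag in e.tags]

print(search(catalog, "gdpr"))  # ['customer_360']
```

The same entries could be surfaced through an API for the third-party exchange scenario mentioned above, with the access policy deciding what each consumer actually sees.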
>> I think there's an interesting implication for the CDO function here, because rather than the function being looked at as a policy function-- >> Jim: The chief data officer. >> Yes, the chief data officer function. So rather than the imposition of policies and standards, it's about actually trying to unlock business value. Rather than look at it from a compliance perspective — which is very important — flip it around and look at it from a business value perspective. >> Jim: Hmm. >> So, for example, if you're able to tag and classify data, and then apply the right kind of protection to it, it actually helps the data scientists use that data for their models, while actually following GDPR guidelines. So it's a win-win from that perspective. >> So, in many ways, the core requirement for GDPR compliance — which is to discover, inventory, and essentially tag all of your data at a fine-grained level — can be the greatest thing that ever happened to data monetization. In other words, it's the foundation of data reuse and monetization, unlocking the true value of the data to your business. So it needn't be an overhead burden; it can be the foundation for a new business model. >> Absolutely. Because I think if you talk about organizations becoming data-driven, you have to look at what the data asset actually means. >> Jim: Yes. >> So to me, that's a curated data set with the right level of description, again underpinned by the right authority, privacy, and ability to use the data. So I think GDPR is going to be a very good enabler. Again, the small minority of organizations that have been successful have done this — they've had business glossaries, data catalogs — but now, with GDPR, that's almost, I think, going to force the issue. Which I think is a very positive outcome.
>> Now Pankaj, do you see any of your customers taking this concept of curation and so forth to the next step? In other words, there are data assets, but then there are data-derived assets, like machine learning models and so forth. Data scientists build and train and deploy these models and algorithms; that's the core of their job. >> Man: Mhmm. >> And model governance is a hot, hot topic we see all over. You've got to have tight controls, not just on the data, but on the models, 'cause they're core business IP. Do you see this architecture evolving among your customers, so that they'll also increasingly be required, or want, to essentially catalog the models — identify and curate them for reusability, possibly for monetization opportunities? Is that something that any of your customers are doing or exploring? >> Some of our customers are looking at that as well. So again, initially, exactly — it's an extension of the marketplace. While one aspect of the marketplace is data sets, which you can then combine to run the models, the other aspect is models that you can also search for and subscribe to. >> Jim: Yeah, like pre-trained models. >> Correct. >> They can be golden if they're pre-trained, and if the core domain for which they're trained doesn't change all that often; they can conceivably have great aftermarket value if you want to resell them. >> Absolutely. And I think this is also a key enabler for the way data scientists and data engineers expect to operate — this notion of IDEs, of collaborative notebooks and so forth, and being able to sort of share the outputs of models, and to share that with other folks on the team who can then maybe tweak it for a different algorithm — is a huge, I think, productivity enabler. And we've seen >> Jim: Yes. >> quite a few of our technology partners working towards enabling these data scientists to move very quickly from a model they may have initially developed on a laptop to actually then deploying the (mumbles).
How can you do that very quickly, and reduce the time from an idea or hypothesis to production. >> (mumbles) Modularization of machine learning and deep learning, I'm seeing a lot of that among data scientists in the business world. Well thank you, Pankaj, we're out of time right now. This has been a very engaging and fascinating discussion. And we thank you very much for coming on theCUBE. This has been Pankaj Sodhi of Accenture. We're here at DataWorks Summit 2018 in Berlin, Germany. It's been a great show, and we have more expert guests that we'll be interviewing later in the day. Thank you very much, Pankaj. >> Thank you very much, Jim.
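The model marketplace Pankaj describes — cataloging models with enough metadata that others can search for, reuse, and conceivably resell them — could be sketched as a small registry. Everything here (the field names, the example entries) is invented for illustration, not a real product API.

```python
# Toy model registry: catalog models with enough metadata that other teams
# can discover and reuse them, mirroring a data-set marketplace.
registry = []

def register_model(name, domain, training_data, metrics, reusable=True):
    """Add a model's descriptive metadata to the catalog."""
    entry = {"name": name, "domain": domain, "training_data": training_data,
             "metrics": metrics, "reusable": reusable}
    registry.append(entry)
    return entry

def find_models(domain):
    """Discover pre-trained, shareable models for a given domain."""
    return [m for m in registry if m["domain"] == domain and m["reusable"]]

register_model("churn-v3", "telco", "crm_2017_q4", {"auc": 0.81})
register_model("fraud-v1", "payments", "tx_2017", {"auc": 0.88}, reusable=False)
print([m["name"] for m in find_models("telco")])  # ['churn-v3']
```

The design point is that a model entry carries the same kind of curation metadata as a curated data set — provenance (training data), quality (metrics), and whether it may be shared — which is what makes reuse and monetization possible at all.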
Mike Merritt-Holmes, Think Big - DataWorks Summit Europe 2017 - #DW17 - #theCUBE
>> Narrator: Covering DataWorks Summit Europe 2017, brought to you by Hortonworks. (uptempo, energetic music) >> Okay, welcome back everyone. We're here live in Germany at Munich for DataWorks Summit 2017, formerly Hadoop Summit. I'm John Furrier, my co-host Dave Vellante. Our next guest is Mike Merritt-Holmes, senior Vice President of Global Services Strategy at Think Big, a Teradata company, and formerly the co-founder of the Big Data Partnership, which merged in with Think Big and Teradata. Mike, welcome to The Cube. >> Mike: Thanks for having me. >> Great having an entrepreneur on, you're the co-founder, which means you've got that entrepreneurial blood, and I got to ask you, you know, you're in the big data space, you got to be pretty pumped by all the hype right now around AI because that certainly gives a lot of that extra steroid of recognition. People love AI, it gives a face to it, and certainly IOT is booming as well, Internet of Things, but big data's cruising along. >> I mean it's a great place to be. The train is certainly going very, very quickly right now. But the thing for us is, we've been doing data science and AI and trying to build business outcomes and value for businesses for a long time. It's just great now to see the data science and AI both really starting to take effect, and so companies are starting to understand it and really starting to want to embrace it, which is amazing. >> It's inspirational too, I mean I have a bunch of kids in my family, some are in college and some are in high school, even the younger generation are getting jazzed up on just software, right, but the big data stuff's been cruising along now. It's been a good decade now of really solid DevOps culture, cloud now accelerating, but now the customers are forcing the vendors to be very deliberate in delivering great product, because the demand (chuckling) for real time, the demand for more stuff, is at an all time high. 
Can you elaborate your thoughts on, your reaction to what customers are doing, because they're the ones driving everyone, not to create friction, to create simplicity. >> Yeah, and you know, our customers are global organizations, trying to leverage this kind of technology, and they are, you know, doing an awesome amount of stuff right now to try to drive, effectively, a step change in their business, whether it's, kind of, shipping companies doing preventive asset maintenance, or whether it's retailers looking to target customers in a more personalized way, or really understand who their customers are, where they come from. They're leveraging all those technologies, and really what they're doing is pushing the boundaries of all of them, and putting more demands on all of the vendors in the space to say, we want to do this quicker, faster, but more easily as well. >> And then the things that you're talking about, I want to get your thoughts on, because this is the conversation that you're having with customers, what I want to extract is, have those kind of data-driven mindset questions come out of the hype of Hadoop? So, I mean we've been on a hype cycle for awhile, but now it's back to reality. Where are we with the customer conversations, and, from your standpoint, what are they working on? I mean, is it mostly an IT conversation? Is it a front-office conversation? Is it a blend of both? Because, you know, data science kind of threads both sides of the fence there. 
>> Yeah, I mean certainly you can't do big data without IT being involved, but since the start, I mean, we've always been engaged with the business, it's always been about business outcome, because you bring data into a platform, you provide all this data science capability, but unless you actually find ROI from that, then there's no point, because you want to be moving the business forward. So it's always been about business engagement, but part of that has always been also about helping them to change their mindset. I don't want a report, I want to understand why you look at that report and what's the thing you're looking for, so we can start to identify that for you quicker. >> What's the coolest conversation you've been in, over the past year? >> Uh, I mean, I can't go into too much detail, but I've had some amazing conversations with companies like Lego, for instance, they're an awesome company to work with. But when you start to see some of the things we're doing, we're doing some amazing object recognition with deep-learning in Japan. We're doing some fraud analytics in the Nordics with deep-learning, we're doing some amazing stuff that's really pushing the boundaries, and when you start to put those deep-learning aspects into real world applications, and you start to see customers clamoring to want to be part of that, it's a really exciting place to be. >> Let me just double-click on that for a second, because a lot of, the question I get a lot on The Cube, and certainly off-camera is, I want to do deep-learning, I want to do AI, I love machine learning, I hear, oh, it's finally coming to reality so people see it forming. How do they get started, what are some of the best practices of getting involved in deep-learning? Is it using open-source, obviously, is one avenue, but what advice would you give customers? 
From a deep-learning perspective, so I think first of all, I mean, a lot of the greatest deep-learning technologies run open-source, as you rightly said, and I think actually there's a lot of tutorials and stuff out there, but really what you need is someone who has done it before, who knows where the pitfalls are, but also knows when to use the right technology at the right time, and also knows whether a deep-learning methodology is going to be the right approach for your business problem. Because a lot of companies are, like, we want to use this deep-learning thing, it's amazing, but actually it's not appropriate, necessarily, for the use case you're trying to draw from. >> It's the classic holy grail, where is it, if you don't know what you're looking for, it's hard to know when to apply it. >> And also, you've got to have enough data to utilize those methods as well, so. >> You hear a lot about the technical complexity associated with Hadoop specifically, but just ol' big data generally. I wonder if you could address that, in terms of what you're seeing, how people are dealing with that technical complexity but what other headwinds are there, in terms of adopting these new capabilities. >> Yeah, absolutely, so one of the challenges that we still see is that customers are struggling to leverage value from their platform, and normally that's because of the technical complexities. So we really, we introduced to the open-source world last month Kylo, something you can download free of charge. It's completely open-source on the Apache license, and that really was about making it easier for customers to start to leverage the data on the platform, to self-serve ingestion onto that, and for data scientists to wrangle the data better. 
So, I think there's a real push right now about that next level up, if you like, in the technology stack to start to enable non-technical users to start to do interesting things on the platform directly, rather than asking someone to do it for them. And that, you know, we've had technologies in the BI space like Tableau, and, obviously, the (mumbling) did data-warehouse solutions on Teradata that have been giving customers something previously, but actually now they're asking for more, not just that, but more as well. And that's where we are starting to see the increases. >> So that's sort of operationalizing analytics as an example, what are some of the business complexities and challenges of actually doing that? >> That's a very good question, because, I think, when you find out great insight, and you go, wow, you've built this algorithm, I've seen things I've never seen before, then the business wants to have that always on. They want to know that insight all the time: is it changing, is it going up, is it going down, do I need to change my business decisions? And doing that and making that operational means not only just deploying it but also monitoring those models, being able to keep them up to date regularly, understanding whether those things are still accurate or not, because you don't want to be making business decisions on algorithms that are now a bit stale. So, actually operationalizing it is about building out an entire capability that's keeping these things accurate, online, and, therefore, there's still a bit of work to do, I think, actually in the marketplace, around building out an operational capability. >> So you kind of got bottom-up, top-down. Bottom-up is, you know, the Hadoop experiments, and then top-down is the CXO saying we need to do big data. Have those two constituencies come together now, who's driving the bus? Are they aligned or is it still, sort of, a mess organizationally? 
>> Yeah, I mean, generally, in the organization, there's someone playing the Chief Data Officer, whether they have that as a title or a role, ultimately someone is in charge of generating value from the data they have in the organization. But they can't do that without IT, and I think where we've seen companies struggle is where they've driven it from the bottom-up, and where they succeed is where they drive it from the top-down, because by driving it from the top-down, you really align what you're doing with the business and strategy that you have. So, the company strategy, and what you're trying to achieve, but ultimately, they both need to meet in the middle, and you can't do one without the other. >> And one of our practitioner friends was describing this situation in our office in Palo Alto a couple of weeks ago. He said, you know, the challenge we have as an organization is, you've got top people saying alright, we're moving. And they start moving, the train goes, and then you've got kind of middle management, sort of behind them, and then you got the doers that are far behind, and aligning those is a huge challenge for this particular organization. How do you recommend organizations to address that alignment challenge, does Think Big have capabilities to help them through that, or is that, sort of, you got to call Accenture? >> In essence, our reason for being is to help with those kinds of things, and, you know, whether it's right from the start, so, oh, my God, my Chief Data Officer or my CEO is saying we need to be doing this thing right now, come on, let's get on with it, and we help them to understand what does that mean, what are the use cases, where's the value going to come from, what's that architecture going to look like, or whether it's helping them to build out capability, in terms of data science or building out the cluster itself, and then managing that and providing training for staff. 
Our whole reason for being is supporting that transformation as a business, from, oh, my God, what do I do about this thing, to, I'm fully embracing it, I know what's going on, I'm enabling my business, and I'm completely comfortable with that world. >> There was a lot of talk three or four or five years ago about the ROI of so-called big data initiatives not really being there, you know, there were edge cases which were huge ROI, but there was a lot of talk about not a lot of return. My question is, first question, has that changed, are you starting to see much bigger numbers coming back where the executives are saying yeah, let's double down on this? >> Definitely, I'm definitely seeing that. I mean, I think it's fair to say that companies are a bit nervous about reporting their ROI around this stuff, in some cases, so there's more ROI out there than you necessarily see out in the public place, but-- >> Why is that? Because they don't want to expose to the competition, or they don't want to front-run their earnings, or whatever it is? >> They're trying to get a competitive edge. The minute you start saying, we're doing this, their competitors have an opportunity to catch up. >> John: Very secretive. >> Yeah, and I think, it's not necessarily about what they're doing, it's about keeping the edge over their competitors, really. So, but what we're seeing is that many customers are getting a lot of ROI more recently because they're able to execute better, rather than struggling with the IT problems, and even just recently, for instance, we had a customer of ours, the CEO phones us up and says, you know what, we've got this problem with our sales. We don't really know why this is going down, you know, in this country; in this part of the world, it's going up, in this country, it's going down, we don't know why, and that's making us very nervous. 
Could you come in and just get the data together, work out why it's happening, so that we can understand what it is. And we came in, and within weeks, we were able to give them a very good insight into exactly why that is, and they changed their strategy, moving forward, for the next year, to focus on addressing that problem, and that's really amazing ROI for a company, to be able to get that insight. Now, we're working with them to operationalize that, so that particular insight is always available to them, and that's an example of how companies are now starting to see that ROI come through, and a lot of it is about being able to articulate the right business question, rather than trying to worry about reports. What is the business question I'm trying to solve or answer, and that's when you can start to see the ROI come through. >> Can you talk about the customer orientation when they get to that insight, because you mentioned earlier that they got used to the reports, and you mentioned visualization, Tableau, they become table stakes. Once you get addicted to the visualization, you want to extract more insights, so the pressure seems to be getting more insight. So, two questions: process gap, around what they need to do process-wise, and then just organizational behavior. Are they there mentally, what are some of the criteria in your mind, in your experience with customers, around the processes that they go through, and then organizational mindset. 
So, the other thing we often have to teach people, as well, is that this isn't about what you can get from the data warehouse, or replacing your data warehouse or anything like that. It's about answering the right questions, with the right tools, and here is a whole set of tools that allow you to answer different questions that you couldn't before, so leverage them. So, that's very important, and so that mindset requires time actually, to transform business into that mindset, and a lot of commitment from the business to make that happen. >> So, mindset first, and then you look at the process, then you get to the product. >> Yep, so, and basically, once you have that mindset, you need to set up an engine that's going to run, and start to drive the ROI out, and the engine includes, you know, your technical folk, but also your business users, and that engine will then start to build up momentum. The momentum builds more interest, and, overtime, you start to get your entire business into using these tools. >> It kind of makes sense, just kind of riffing in real time here, so the product-gap conversation should probably come after you lay that out first, right? >> Totally, yeah, I mean, you don't choose a product before you know what you need to do with it. So, but actually often companies don't know what they need to do with it, because they've got the wrong mindset in the first place. And so part of the road map stuff that we do, that we have a road map offering, is about changing that mindset, and helping them to get through that first stage, where we start to put, articulate the right use cases, and that really is driving a lot of value for our customers. Because they start from the right place-- >> Sometimes we hear stories, like the product kind of gives them a blind spot, because they tend to go into, with a product mindset first, and that kind of gives them some baggage, if you will. 
>> Well, yeah, because you end up with a situation, where you go, you get a product in, and then you say what can we do with it. Or, in fact, what happens is the vendor will say, these are the things you could do, and they give you use cases. >> It constrains things, forecloses tons of opportunities, because you're stuck within a product mindset. >> Yeah, exactly that, and you're not, you don't want to be constrained. And that's why open-source, and the kind of ecosystem that we have within the big data space is so powerful, because there's so many different tools for different things but don't choose your tool until you know what you're trying to achieve. >> I have a market question, maybe you just give us opinion, caveat, if you like, it's sort of a global, macro view. When we started first looking at the big data market, we noticed right away the dominant portion of revenue was coming from services. Hardware was commodity, so, you know, maybe sort of less than you would, obviously, in a mainframe world, and open-source software has a smaller contribution, so services dominated, and, frankly, has continued to dominate, since the early days. Do you see that changing, or do you think those percentages, if you will, will stay relatively constant? >> Well, I think it will change over time, but not in the near future, for sure, there's too much advancement in the technology landscape for that to stop, so if you had a set of tools that weren't really evolving, becoming very mature, and that's what tools you had, ultimately, the skill sets around them start to grow, and it becomes much easier to develop stuff, and then companies start to build out industry- or solutions-specific stuff on top, and it makes it very easy to build products. 
When you have an ecosystem that's evolving, growing with the speed it is, you're constantly trying to keep up with that technology, and, therefore, services have to play an awfully big part in making sure that you are using the right technology, at the right time, and so, for the near future, for certain, that won't change. >> Complexity is your friend. >> Yeah, absolutely. Well, you know, we live in a complex world, but we live and breathe this stuff, so what's complex to some is not to us, and that's why we add value, I guess. >> Mike Merritt-Holmes here inside The Cube with Teradata Think Big. Thanks for spending the time sharing your insights. >> Thank you for having me. >> Understand the organizational mindset, identify the process, then figure out the products. That's the insight here on The Cube, more coverage of DataWorks Summit 2017, here in Germany after this short break. (upbeat electronic music)
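Earlier in the conversation, operationalizing analytics was described as keeping deployed models accurate and online, and knowing when one has gone stale. That staleness check can be sketched very simply; the tolerance and the example accuracy figures below are invented for illustration, not from any Think Big deployment.

```python
# Minimal sketch of keeping an operational model honest: compare recent
# accuracy against the accuracy measured at deployment time and flag decay.
def is_stale(baseline_accuracy, recent_accuracies, tolerance=0.05):
    """Flag the model when mean recent accuracy drops below baseline - tolerance."""
    if not recent_accuracies:
        return False  # nothing scored yet, nothing to judge
    recent = sum(recent_accuracies) / len(recent_accuracies)
    return recent < baseline_accuracy - tolerance

# Deployed at 0.90 accuracy; last week's scored batches came in lower.
print(is_stale(0.90, [0.86, 0.84, 0.83]))  # True -> retrain before relying on it
```

In a real operational capability this check would run continuously against held-out labeled data, feeding an alerting and retraining loop rather than a print statement.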
Frederick Reiss, IBM STC - Big Data SV 2017 - #BigDataSV - #theCUBE
>> Narrator: Live from San Jose, California it's the Cube, covering Big Data Silicon Valley 2017. (upbeat music) >> Big Data SV 2017, day two of our wall to wall coverage of the Strata Hadoop conference, Big Data SV, really what we call Big Data Week because this is where all the action is going on down in San Jose. We're at the historic Pagoda Lounge in the back of the Fairmont, come on by and say hello, we've got a really cool space and we're excited and never been in this space before, so we're excited to be here. So we got George Gilbert here from Wikibon, we're really excited to have our next guest, he's Fred Reiss, he's the chief architect at IBM Spark Technology Center in San Francisco. Fred, great to see you. >> Thank you, Jeff. >> So I remember when Rob Thomas, we went up and met with him in San Francisco when you guys first opened the Spark Technology Center a couple of years ago now. Give us an update on what's going on there, I know IBM's putting a lot of investment in this Spark Technology Center in the San Francisco office specifically. Give us kind of an update of what's going on. >> That's right, Jeff. Now we're in the new Watson West building in San Francisco on 505 Howard Street, colocated, we have about a 50 person development organization. Right next to us we have about 25 designers and on the same floor a lot of developers from Watson doing a lot of data science, from the weather underground, doing weather and data analysis, so it's a really exciting place to be, lots of interesting work in data science going on there. >> And it's really great to see how IBM is taking the core Watson, obviously enabled by Spark and other core open source technology and now applying it, we're seeing Watson for Health, Watson for autonomous vehicles, Watson for Marketing, Watson for this, and really bringing that type of machine learning power to all the various verticals in which you guys play. 
>> Absolutely, that's been what Watson has been about from the very beginning, bringing the power of machine learning, the power of artificial intelligence to real world applications. >> Jeff: Excellent. >> So let's tie it back to the Spark community. Most folks understand how Databricks builds out the core or does most of the core work for, like, the sequel workload, the streaming and machine learning, and I guess graph is still immature. We were talking earlier about IBM's contributions in helping to build up the machine learning side. Help us understand what the Databricks core technology for machine learning is and how IBM is building beyond that. >> So the core technology for machine learning in Apache Spark comes out, actually, of the machine learning department at UC Berkeley as well as a lot of different members of the community. Some of those community members also work for Databricks. We actually at the IBM Spark Technology Center have made a number of contributions to the core Apache Spark and the libraries, for example recent contributions in neural nets. In addition to that, we also work on a project called Apache System ML, which used to be proprietary IBM technology, but the IBM Spark Technology Center has turned System ML into Apache System ML, it's now an open Apache incubating project that's been moving forward out in the open. You can now download the latest release online and that provides a piece that we saw was missing from Spark and a lot of other similar environments: an optimizer for machine learning algorithms. So in Spark, you have the catalyst optimizer for data analysis, data frames, sequel, you write your queries in terms of those high level APIs and catalyst figures out how to make them go fast. 
In System ML, we have an optimizer for high level languages like R and Python where you can write algorithms in terms of linear algebra, in terms of high level operations on matrices and vectors, and have the optimizer take care of making those algorithms run in parallel, run at scale, taking account of the data characteristics. Does the data fit in memory? If so, keep it in memory. Does the data not fit in memory? Stream it from disk. >> Okay, so there was a ton of stuff in there. >> Fred: Yep. >> And if I were to refer to that as so densely packed as to be a black hole, that might come across wrong, so I won't refer to that as a black hole. But let's unpack that, so the, and I meant that in a good way, like high bandwidth, you know. >> Fred: Thanks, George. >> Um, so the traditional Spark, the machine learning that comes with Spark's MLlib, one of its distinguishing characteristics is that the models, the algorithms that are in there, have been built to run on a cluster. >> Fred: That's right. >> And very few have, very few others have built machine learning algorithms to run on a cluster, but as you were saying, you don't really have an optimizer for finding something where a couple of the algorithms would be fit optimally to solve a problem. Help us understand, then, how System ML solves a more general problem for, say, ensemble models and for scale out, I guess I'm, help us understand how System ML fits relative to Spark's MLlib and the more general problems it can solve. >> So, MLlib and a lot of other packages such as Sparkling Water from H2O, for example, provide you with a toolbox of algorithms and each of those algorithms has been hand tuned for a particular range of problem sizes and problem characteristics. This works great as long as the particular problem you're facing as a data scientist is a good match to that implementation that you have in your toolbox. 
What System ML provides is less like having a toolbox and more like having a machine shop. You can, you have a lot more flexibility, you have a lot more power, you can write down an algorithm as you would write it down if you were implementing it just to run on your laptop and then let the System ML optimizer take care of producing a parallel version of that algorithm that is customized to the characteristics of your cluster, customized to the characteristics of your data. >> So let me stop you right there, because I want to use an analogy that others might find easy to relate to for all the people who understand sequel and scale out sequel. So, the way you were describing it, it sounds like oh, if I were a sequel developer and I wanted to get at some data on my laptop, I would find it pretty easy to write the sequel to do that. Now, let's say I had a bunch of servers, each with it's own database, and I wanted to get data from each database. If I didn't have a scale out database, I would have to figure out physically how to go to each server in the cluster to get it. What I'm hearing for System ML is it will take that query that I might have written on my one server and it will transparently figure out how to scale that out, although in this case not queries, machine learning algorithms. >> The database analogy is very apt. Just like sequel and query optimization by allowing you to separate that logical description of what you're looking for from the physical description of how to get at it. Lets you have a parallel database with the exact same language as a single machine database. In System ML, because we have an optimizer that separates that logical description of the machine learning algorithm from the physical implementation, we can target a lot of parallel systems, we can also target a large server and the code, the code that implements the algorithm stays the same. >> Okay, now let's take that a step further. 
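The separation Fred describes — one logical description of the algorithm, with an optimizer choosing the physical plan from the data's characteristics (in memory versus streamed from disk, one machine versus a cluster) — can be caricatured in a few lines. The size threshold and plan names here are invented for the sketch; this is not how System ML's optimizer actually decides.

```python
# Caricature of the logical/physical split: the algorithm is written once,
# and a planner picks an execution strategy from the data's characteristics.
def choose_plan(data_bytes, memory_bytes):
    """Pick a (toy) physical execution strategy for a fixed logical algorithm."""
    if data_bytes <= memory_bytes:
        return "single-node, in-memory"
    return "distributed, streamed from disk"

print(choose_plan(2 * 10**9, 16 * 10**9))    # 2 GB fits in 16 GB of RAM
print(choose_plan(500 * 10**9, 16 * 10**9))  # 500 GB does not
```

The point of the database analogy above is exactly this: the caller never rewrites the algorithm, only the planner's output changes, just as a SQL query stays the same whether it runs on a laptop or a parallel database.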
You refer to matrix math and I think linear algebra and a whole lot of other things that I never quite made it to, since I was a humanities major, but when we're talking about those things, my understanding is that those are primitives that Spark doesn't really implement, so that if you wanted to do neural nets, which rely on some of those constructs for high performance, >> Fred: Yes. >> Then, um, that's not built into Spark. Can you get to that capability using System ML? >> Yes. System ML, at its core, provides you as a user with a library of linear algebra primitives, just like a language like R or a library like NumPy gives you matrices and vectors and all of the operations you can do on top of those primitives. And just to be clear, linear algebra really is the language of machine learning. If you pick up a paper about an advanced machine learning algorithm, chances are the specification for what that algorithm does and how that algorithm works is going to be written in the paper literally in linear algebra, and the implementation that was used in that paper is probably written in a language where linear algebra is built in, like R, like NumPy. >> So it sounds to me like Spark has done the work of sort of the blocking and tackling of machine learning to run in parallel. And that's, I mean, to be clear, since we haven't really talked about it, that's important when you're handling data at scale and you want to train, you know, models on very, very large data sets. But it sounds like when we want to go to some of the more advanced machine learning capabilities, the ones that today are making all the noise with, you know, speech to text, text to speech, natural language understanding, those neural network based capabilities are not built into the core Spark MLlib. That, would it be fair to say you could start getting at them through System ML?
>> Yes, System ML is a much better way to do scalable linear algebra on top of Spark than the very limited linear algebra that's built into Spark. >> So alright, let's take the next step. Can System ML be grafted onto Spark in some way, or would it have to be an entirely new API that doesn't integrate with all the other Spark APIs? In a way, that has differentiated Spark, where each API is sort of accessible from every other. Can you tie System ML in, or do the Spark guys have to build more primitives into their own sort of engine first? >> A lot of the work that we've done with the Spark Technology Center as part of bringing System ML into the Apache ecosystem has been to build a nice, tight integration with Apache Spark, so you can pass Spark data frames directly into System ML and you can get data frames back. Your System ML algorithm, once you've written it in terms of one of System ML's scripting languages, just plugs into Spark like all the algorithms that are built into Spark. >> Okay, so that's, that would keep Spark competitive with more advanced machine learning frameworks for a longer period of time. In other words, it wouldn't hit the wall the way it would if it encountered TensorFlow from Google, for Google's way of doing deep learning. Spark wouldn't hit the wall once it needed, like, a TensorFlow, as long as it had System ML so deeply integrated the way you're doing it. >> Right, with a system like System ML, you can quickly move into new domains of machine learning.
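To make the earlier point that linear algebra is the language of machine learning concrete, here is a hedged, laptop-sized sketch: one-variable least squares fit by gradient descent, written purely in terms of vector operations. In a system like System ML you would write the same high-level operations in its scripting language and let the optimizer decide how to run them at scale; plain Python stands in here.

```python
# A laptop-sized example of an ML algorithm expressed as linear algebra:
# one-variable least squares via gradient descent. In an optimizing
# system, these same high-level operations would be compiled to run in
# parallel across a cluster.

def dot(a, b):
    # Inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def fit(xs, ys, lr=0.01, steps=2000):
    """Minimize sum((w*x - y)^2) over the scalar weight w."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared loss: (2/n) * x . (w*x - y)
        residuals = [w * x - y for x, y in zip(xs, ys)]
        grad = 2.0 / n * dot(xs, residuals)
        w -= lr * grad
    return w
```

Calling `fit([1, 2, 3, 4], [2, 4, 6, 8])` converges to a weight near 2, since the data is exactly `y = 2x`; the algorithm itself never mentions where or how the vector operations execute, which is precisely the separation the optimizer exploits.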
So for example, this afternoon I'm going to give a talk with one of our machine learning developers, Mike Dusenberry, about our recent efforts to implement deep learning in System ML, like full-scale convolutional neural nets running on a cluster in parallel, processing many gigabytes of images, and we implemented that with very little effort, because we have this optimizer underneath that takes care of a lot of the details of how you get that data into the processing, how you get the data spread across the cluster, how you get the processing moved to the data or vice versa. All those decisions are taken care of in the optimizer; you just write down the linear algebra parts and let the system take care of it. That let us implement deep learning much more quickly than we would have if we had done it from scratch. >> So it's just this ongoing cadence of basically removing the infrastructure management from the data scientists and enabling them to concentrate really where their value is, on the algorithms themselves, so they don't have to worry about how many clusters it's running on, and that configuration, kind of the typical DevOps that we see on the regular development side, but now you're really bringing that into the machine learning space. >> That's right, Jeff. Personally, I find all the minutiae of making a parallel algorithm work really fascinating, but a lot of people working in data science really see parallelism as a tool. They want to solve the data science problem, and System ML lets you focus on solving the data science problem, because the system takes care of the parallelism. >> You guys could go on in the weeds for probably three hours, but we don't have enough coffee, and we're going to set up a follow-up time, because you're both in San Francisco.
But before we let you go, Fred, as you look forward into 2017, kind of the advances that you guys have done there at the IBM Spark Technology Center in the city, what are kind of the next couple great hurdles that you're looking to cross, new challenges that are getting you up every morning, that you're excited to come back a year from now and be able to say, wow, these are the one or two things that we were able to take down in 2017? >> We're moving forward on several different fronts this year. On one front, we're helping to get the notebook experience with Spark notebooks consistent across the entire IBM product portfolio. We helped a lot with the rollout of notebooks on Data Science Experience on z, for example, and we're working actively with the Data Science Experience and with the Watson Data Platform. On the other hand, we're contributing to Spark 2.2. There are some exciting features, particularly in SQL, that we're hoping to get into that release, as well as some new improvements to MLlib. We're moving forward with Apache System ML; we just cut version 0.13 of that. We're talking right now on the mailing list about getting System ML out of incubation, making it a full, top-level project. And we're also continuing to help with the adoption of Apache Spark technology in the enterprise. Our latest focus has been on deep learning on Spark. >> Well, I think we found him! Smartest guy in the room. (laughter) Thanks for stopping by, and good luck on your talk this afternoon. >> Thank you, Jeff. >> Absolutely. Alright, he's Fred Reiss, he's George Gilbert, and I'm Jeff Rick. You're watching theCUBE from Big Data SV, part of Big Data Week in San Jose, California. (upbeat music) (mellow music) >> Hi, I'm John Furrier, the cofounder of SiliconANGLE Media, cohost of theCUBE. I've been in the tech business since I was 19, first programming on minicomputers.
Ben Sharma, Tony Fisher, Zaloni - BigData SV 2017 - #BigDataSV - #theCUBE
>> Announcer: Live from San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017. (rhythmic music) >> Hey, welcome back, everyone. We're live in Silicon Valley for Big Data SV, Big Data Silicon Valley, in conjunction with Strata + Hadoop. This is the week where it all happens in Silicon Valley around the emergence of big data as it goes to the next level. The Cube is actually on the ground covering it like a blanket. I'm John Furrier. My cohost, George Gilbert, with Wikibon. And our next guest, we have two executives from Zaloni: Ben Sharma, who's the founder and CEO, and Tony Fisher, SVP of strategy. Guys, welcome back to The Cube. Good to see you. >> Thank you for having us back. >> You guys are great guests. You're in New York for Big Data NYC, and a lot is going on, certainly, here, and it's just getting kicked off with Strata + Hadoop, they've got the sessions today, but you guys have already got some news out there. Give us the update. What's the big discussion at the show? >> So yeah, 2016 was a great year for us. A lot of growth. We tripled our customer base, and a lot of interest in data lakes, as customers are going from, say, pilots and POCs into production implementations so far, though. And in conjunction with that, this week we launched what we call a solution named Data Lake in a Box, appropriately, right? So what that means is we're bringing the full stack together to customers, so that we can get a data lake up and running in an eight-week time frame, with enterprise-grade data ingestion from their source systems hydrated into the data lake and ready for analytics. >> So is it a pretty big box, and is it waterproof? (all laughing) I mean, this is the big discussion now, pun intended. But the data lake is evolving, so I wanted to get your take on it. This has kind of been a theme that's been leading up and is now front and center here on The Cube.
Already the data lake has changed. Also we've heard, I think Dave Vellante in New York said data swamp. But using the data is critical on a data lake. So as it goes to a more mature model of leveraging the data, what are the key trends right now? What are you guys seeing? Because this is a hot topic that everyone is talking about. >> Well, that's a good distinction that we like to make, the difference between a data swamp and a data lake. >> And a data lake is much more governed. It has the rigor, it has the automation, it has a lot of the concepts that people are used to from traditional architectures, only we apply them in the scale-out architecture. So we put together a maturity model that really maps out a customer's journey throughout the big data and the data lake experience. And at each phase of this, we can see what the customer's doing, what their trends are and where they want to go, and we can advise them the right way to move forward. And so a lot of the customers we see are kind of in what we call the ignore stage. I'd say most of the people we talk to are just ignoring. They don't have things active, but they're doing a lot of research. They're trying to figure out what's next. And we want to move them from there. The next stage up is called store. And store is basically just the sandbox environment: "I'm going to stick stuff in there." "I'm going to hope something comes out of it." No collaboration. But then, moving forward, there's the managed phase, the automated phase, and the optimized phase. And our goal is to move them up into those phases as quickly as possible. And Data Lake in a Box is an effort to do that, to leapfrog them into a managed data lake environment. >> So that's kind of where the swamp analogy comes in, because the data lake, the swamp is kind of dirty, where you can almost think, "Okay, the first step is store it."
And then they get busy or they try to figure out how to operationalize it, and then it's kind of like, "Uh ..." So your point, they're trying to get to that. So you guys get 'em to that set up, and then move them quickly to value? Is that kind of the approach? >> Yeah. So, time to value is critical, right? So how do you reduce the time to insight, from the time the data is produced by the data producer till the time you can make the data available to the data consumer for analytics and downstream use cases? So that's kind of our core focus in bringing these solutions to the market. >> Dave and I often talk, and George always talks about how the value of data at the right time at the right place is the critical linchpin for the value, whether it's app-driven or whatever. So the data lake, you never know what data in the data lake will need to be pulled out and put into either real time or an app. So you have to assume at any given moment there's going to be data value. >> Sure >> So that, conceptually, people can get that. But how do you make that happen? Because that's a really hard problem. How do you guys tackle that when a customer says, "Hey, I want to do the data lake. "I've got to have the coverage. "I got to know who's accessing stuff. "But at the end of the day, "I got to move the data to where it's valuable." >> Sure. So the approach we have taken is with an integrated platform with a common metadata layer. Metadata is the key. So, using this common metadata layer, being able to do managed ingestion from various different sources, being able to do data validation and data quality, being able to manage the life cycle of the data, being able to generate these insights about the data itself, so that you can use it effectively for data science or for downstream applications and use cases, is critical, based on our experience of taking these applications from, say, a POC pilot phase into a production phase.
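A toy sketch of the common-metadata-layer idea Ben describes: every managed ingestion records attributes (source, quality, freshness) that downstream consumers can check before using a data set. The class and field names here are invented for illustration, not Zaloni's actual platform:

```python
# Illustrative sketch (not Zaloni's actual API): a common metadata layer
# that records attributes at ingestion time, so downstream consumers can
# judge freshness and quality before pulling a data set into a model.

import datetime

class MetadataLayer:
    def __init__(self):
        self.entries = {}   # dataset name -> metadata dict

    def ingest(self, dataset, source, quality_score, ingested_on=None):
        # Managed ingestion: every data set that lands in the lake gets
        # a metadata entry, whatever source system it came from.
        self.entries[dataset] = {
            "source": source,
            "quality": quality_score,
            "ingested_on": ingested_on or datetime.date.today().isoformat(),
        }

    def usable(self, dataset, min_quality):
        # A data scientist or downstream app can check quality before
        # committing to the data -- the "is this a swamp?" question.
        entry = self.entries.get(dataset)
        return entry is not None and entry["quality"] >= min_quality

layer = MetadataLayer()
layer.ingest("orders", source="erp", quality_score=0.97,
             ingested_on="2017-03-14")
```

The design point is that the metadata travels with the data from the moment of ingestion, so the lake stays queryable about itself rather than degrading into a swamp.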
>> And what's the next step, once you guys get to that point with the metadata? Because, like, I get that; it's like everyone's got the metadata focus. Now, I'm the data engineer or the geek, the supergeek, and then you've got the data science, then the analysts, then there will probably be a new category, a bot or something AI will do something. But you can have a spectrum of applications on the data side. How do they get access to the metadata? Is it through the machine learning? Do you guys have anything unique there that makes that seamless, or is that the end goal? >> Sure, do you want to take that? >> Yes, sure, it's a multi-pronged answer, but I'll start and you can jump in. One of the things we provide as part of our overall platform is a product called Mica. And Mica is really the kind of on-ramp to the data. And all those people that you just named, we love them all, but their access to the data is through a self-service data preparation product, and key to that is the metadata repository. So, all the metadata is out there; we call it a catalog at that point, and so they can go in, look at the catalog, get a sense for the data, get an understanding of the form and function of the data, see who uses it, see where it's used, and determine if that's the data that they want. And if it is, they have the ability to refine it further, or they can put it in a shopping cart. If they have access to it, they can get it immediately, and they can refine it. If they don't have access to it, there's an automatic request so they can get access to it. And so it's an on-ramp concept, of having a card catalog of all the information that's out there, how it's being used, how it's been refined, to allow the end user to make sure that they've got the right data, so they can be positioned for their ultimate application.
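The catalog-and-shopping-cart flow Tony just described can be sketched roughly as follows (all names here are illustrative, not Zaloni's actual API): metadata is always browsable, and checkout either grants the data immediately or files an automatic access request.

```python
# Illustrative sketch of a self-service catalog: browse metadata, then
# either get the data immediately (if entitled) or file an access request.

class Catalog:
    def __init__(self):
        self.datasets = {}      # name -> metadata dict
        self.entitlements = {}  # user -> set of dataset names
        self.requests = []      # pending access requests

    def register(self, name, **metadata):
        self.datasets[name] = metadata

    def browse(self, name):
        # The "card catalog": metadata is visible even without
        # access to the underlying data.
        return self.datasets[name]

    def checkout(self, user, name):
        if name in self.entitlements.get(user, set()):
            return "granted"
        # No entitlement: file the automatic access request.
        self.requests.append((user, name))
        return "requested"

catalog = Catalog()
catalog.register("clickstream", owner="web team", freshness="daily")
catalog.entitlements["ana"] = {"clickstream"}
```

The separation matters: discovery (browsing the catalog) is open to everyone named above, while delivery (checkout) is governed, which is what keeps self-service from turning into a free-for-all.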
And just to add to what Tony said, because we are using this common metadata layer, and capturing metadata at every instance, if you will, we are serving it up to the data consumers using a rich catalog, so that a lot of our enterprise customers are now starting to create what they consider a data marketplace or a data portal within their organization, so that they're able to catalog not just the data that's in the data lake, but also data that's in other data stores. And we provide one single unified view of these data sets, so that your data scientists can come in and see: is this a data set that I can use for my model building? What are the different attributes of this data set? What is the quality of the data? How fresh is the data? And those kinds of traits, so that they are effective in their analytical journey. >> I think that's the key thing that's interesting to me, is that you've seen the big data explosion over the past ten years, eight years; we've been covering The Cube since the Hadoop world started. But now, it's the data set world, so it's a big data set in this market. The data sets are the key, because that's what data scientists want to wrangle around with, and sling data sets with whatever tooling they want to use. Is that kind of the same trend that you guys see? >> That's correct. And also what we're seeing in the marketplace is that customers are moving from a single architecture to a distributed architecture, where they may have a hybrid environment, with some things being instantiated in the cloud, some things being on-prem. So how do you now provide a unified interface across these multiple environments, and in a governed way, so that the right people have access to the right data, and it's not the data swamp? >> Okay, so let's go back to the maturity model, because I like that framework.
'Cause now you've just complicated the heck out of it. 'Cause now you've got cloud, and then on-prem, and then now, how do you put that prism of the maturity model on now hybrid, so how does that cross-connect there? And a second follow-up to that is, where are the customers on this progress bar? I'm sure they're different by customer, but, so, maturity model to the hybrid, and then trends in the customer base that you're seeing? >> Alright, I'll take the second one, and then you can take the first one, okay? So, the vast majority of the people that we work with, and the people, the prospects, customers, analysts we've talked to, other industry dignitaries, they put the vast majority of the customers in the ignore stage. Really just doing their research. So a good 50% plus of most organizations are still in that stage. And then, the data swamp environment, that "I'm using it to store stuff, hopefully I'll get something good out of it," that's another 25% of the population. And so, most of the customers are there, and we're trying to move them kind of rapidly up and into a managed and automated data lake environment. The other trend along these lines that we're seeing, that's pretty interesting, is the emergence of IT in the big data world. It used to be a business user's world, and business users built these sandboxes, and business users did what they wanted to. But now, we see organizations that are really starting to bring IT into the fold, because they need the governance, they need the automation, they need the type of rigor that they're used to in other data environments, which has been lacking in the big data environment. >> And you've got IoT cracking the code on the IoT side, which has created another dimension of complexity. On the numbers, of the 50% that ignore, is that profile more Fortune 1000? >> It's larger companies; it's Fortune and Global 2000.
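As a rough sketch, the stage progression described above is just an ordered list, which also makes the "leapfrog" idea behind Data Lake in a Box easy to state. The stage names come from the interview; the code itself is purely illustrative:

```python
# The data lake maturity stages described above, as an ordered progression.

STAGES = ["ignore", "store", "managed", "automated", "optimized"]

def next_stage(current):
    """One step up the maturity model."""
    i = STAGES.index(current)
    return STAGES[min(i + 1, len(STAGES) - 1)]

def leapfrog(current, target="managed"):
    """The Data Lake in a Box idea: jump straight to a later stage,
    but never move a customer backwards."""
    if STAGES.index(target) > STAGES.index(current):
        return target
    return current
```

So a customer in "ignore" or "store" (the 50% and 25% populations mentioned above) leapfrogs directly to "managed", while one already at "automated" stays put.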
Got it, okay. And in terms of the hybrid maturity model, how's that? And add a third dimension, IoT; we've got a multi-dimensional chess game going here. >> I think the way we think about it is that there are different patterns of data sets coming in. So they could be batches, they could be files or database extracts, or they could be streams, right? So as long as you think about a converged architecture that can handle these different patterns, then you can map different use cases, whether they are IoT and streaming use cases, versus what we are seeing, which is that a lot of companies are trying to replace their operational analytics platforms with a data lake environment, and they're building their operational analytics on top of the data lake, correct? So you need to think more from an abstraction layer: how do you abstract it out? Because one of the challenges that we see customers facing is that they don't want to get sticky with one cloud service provider, because they may have multiple cloud service providers. >> John: It's a multi-cloud world right now. >> So how do you leverage that, where you have one cloud service provider in one geo, another cloud service provider in another geo, and still be able to have an abstraction layer on top of it, so that you're building applications? >> So do you guys provide that data layer across that abstraction? >> That is correct, yes. So we leverage the ecosystem, but what we do is add the data management and data governance layer. We provide that abstraction, so that you can be on-prem, you can be in cloud service provider one, or cloud service provider two. You still have the same controls and same governance functions as you build your data lake environment. >> And this is consistent with some of the Cube interviews we had all day today, and other Cube interviews, where when you use the cloud, you're renting basically, but you own your data. You get to have a nice ...
And that metadata seems to be the key, that's the key, right? For everything. >> That's right. And now what we're seeing is that a lot of our enterprise customers are looking at bringing in some of the public cloud infrastructure into their on-prem environment, as it's going to be available in appliances and things like that, right? So how do you then make sure that whatever you're doing in a non-enterprise cloud environment, you are also able to extend it to the enterprise-- >> And the consequence to the enterprise is that the enterprise has multiple jobs, if they don't have a consistent data layer ... >> Sure, yeah. >> It's just more redundancy. >> Exactly. >> Not redundancy, duplication actually. >> Yeah, duplication and difficulty of rationalizing it together. >> So let me drill down into a little more detail on the transition between these sorts of maturity phases, and then the movement into production apps. I'm curious to know, we've heard Tableau, Excel, Power BI, Qlik, I guess, being-- sort of adapting to being front ends to big data. But they don't, for their experience to work, they can't really handle big data sets. So you need the MPP SQL database on the data lake. And I guess the question there is, is there value to be gotten, or measurable value to be gotten, just from turning the data lake into, you know, an interactive BI kind of platform? And sort of as the first step along that maturity model. >> One of the patterns we are seeing is that the serving layer is becoming more and more mature in the data lake, so that, where earlier it used to be mainly batch types of workloads, now, with MPP engines running on the data lake itself, you are able to connect your existing BI applications, whether it's Tableau, Qlik, Power BI, or others, to these engines, so that you are able to get low-latency query response times and are able to slice and dice your data sets in the data lake itself. >> But you're essentially still, you have to sample the data.
You can't handle the full data set unless you're working with something like Zoomdata. >> Yeah, so there are physical limitations, obviously. And then there is also this next generation of BI tools which work in a converged manner in the data lake itself. So there's Zoomdata, Arcadia, and others that are able to kind of run inside the data lake itself, instead of you having to have an external environment like the other BI tools, so we see that as a pattern. But if you already are an enterprise and you have an on-board BI platform, how do you leverage that with the data lake as part of the next-generation architecture? That is a key trend that we are seeing. >> So that's your metadata helping make that move from swamp to curated data lake. >> That's right, and not only that. What we have done, as Tony was mentioning, in our Mica product, is we have a self-service catalog, and then we provide a shopping cart experience where you can actually source data sets into the shopping cart, and we let them provision a sandbox. And when they provision the sandbox, they can actually launch Tableau, or whatever the BI tool of choice is, on that sandbox, so that they can actually-- and that sandbox could exist in the data lake, or it could exist on a relational data store or an MPP data store that's outside of the data lake. That's part of your modern data architecture. >> But further to your point, if people have to throw out all of their decision support applications and their BI applications in order to change their data infrastructure, they're not going to do it. >> Understood. >> So you have to make that environment work, and that's what Ben's referring to with a lot of the new accelerator tools and things that will sit on top of the data lake. >> Guys, thanks so much for coming on The Cube. Really appreciate it. I'll give you guys the final word in the segment ... What do you expect this week? I mean, obviously, we've been seeing the consolidation.
You're starting to see the swim lanes with Spark and open source, and you see the cloud and IoT colliding; there's a huge intersection with deep learning. AI is certainly hyped up now beyond all recognition, but it's essentially deep learning: neural networks meets machine learning. That's been around before, but it's now freely available with cloud and compute. And so it's kind of an interesting dynamic that's rockin' the big data world. Your thoughts on what we're going to see this week and how that relates to the industry? >> I'll take a stab at it, and you may feel free to jump in. I think what we'll see is that a lot of customers that have been playing with big data for a couple of years are now getting to a point where what worked for one or two use cases now needs to be scaled out and provided at an enterprise scale. So they're looking at a managed and a governance layer to put on top of the platform, so they can enable machine learning and AI and all those use cases, because the business is asking for them, right? The business is asking how they can bring TensorFlow in and run it on the data lake itself, right? So we see those kinds of requirements coming up more and more frequently. >> Awesome. Tony? >> What he said. >> And enterprise readiness certainly has to be table-- there's a lot of table stakes in the enterprise. It's not, like, easy to get into. You can see Google kind of just putting their toe in the water with the Google cloud, TensorFlow, great highlight, they've got Spanner, so all these other things like latency are rearing their heads again. So these are all kind of table stakes.
>> I need some machine learning and some AI, so does George and we need machine learning to watch the machine learn, and then algorithmists for algorithms. It's a crazy world, exciting time for us. >> Are we going to have a bot next time when we come here? (all laughing) >> We're going to chat off of messenger, we just came from south by southwest. Guys, thanks for coming on The Cube. Great insight and congratulations on the continued momentum. This is The Cube breakin' it down with experts, CEOs, entrepreneurs, all here inside The Cube. Big Data Sv, I'm John for George Gilbert. We'll be back after this short break. Thanks! (upbeat electronic music)
James Hamilton, AWS | AWS Re:Invent 2013
(mellow electronic music) >> Welcome back, we're here live in Las Vegas. This is SiliconANGLE and Wikibon's theCUBE, our flagship program. We go out to the events, extract the signal from the noise. We are live in Las Vegas at the Amazon Web Services re:Invent conference, about developers, large-scale cloud, big data, the future. I'm John Furrier, the founder of SiliconANGLE. I'm joined by co-host, Dave Vellante, co-founder of Wikibon.org, and our guest is James Hamilton, VP and Distinguished Engineer at Amazon Web Services. Welcome to theCUBE. >> Well thank you very much. >> You're a tech athlete, certainly in our book; it's a term we coined because we love to use sports analogies. You're kind of the cutting edge. You've been in the business innovating on technology for many years, going back to the database days at IBM, Microsoft, and now Amazon. You gave a great presentation at the analyst briefing. Very impressive. So I got to ask you the first question, when did you first get addicted to the notion of what Amazon could be? When did you first taste the Kool-Aid? >> Super good question. Couple different instances. One is I was general manager of Exchange Hosted Services and we were doing a decent job, but what I noticed was customers were loving it, we were expanding like mad, and I saw an opportunity to improve by at least a factor of two, I'm sorry, 10. It's just amazing. So that was a first hint that this is really important for customers. The second one was when S3 was announced, and the storage price pretty much froze the whole industry. I've worked in storage all my life, I think I know what's possible in storage, and S3 was not possible. It was just like, what is this? And so, I started writing apps against it, I was just blown away. Super reliable. Unbelievably priced. I wrote a fairly substantial app, I got a bill for $7. Wow. So that's really the beginnings of where I knew this was going to change the world, and I've been, as you said, addicted to it since.
>> So you also mentioned some stats there. We'll break it down, 'cause we love to talk about the software-defined data center, which is basically not even at the hype stage yet. It's just like, it's still undefined, but software virtualization, network virtualization really is pushing that movement of the software focus, and that's essentially what you guys are doing. You're talking about notifications and basically it's a large-scale systems problem, that you guys are building a global operating system, as Andy Jassy would say. Well, he didn't say that directly, he said internet operating system, but if you believe that APIs are critical services. So I got to ask you that question around this notion of a data center, I mean come on, nobody's really going to give up their data center. It might change significantly, but you pointed out that the top three data center costs are, in order: servers, power and cooling circulation systems, and then the actual power itself. Is that right, did I get that right? >> Pretty close, pretty close. Servers dominate, and then after servers if you look at data centers together, that's power, cooling, and the building and facility itself. That is the number two cost, and the actual power itself is number three. >> So that's a huge issue. When we talk to CIOs, it's like can you please take the facility's budget off my back? For many reasons, one, it's going to be written off soon maybe. All kinds of financial issues around-- >> A lot of them don't see it, though, which is a problem. >> That is a problem, that is a problem. Real estate reasons, and then, yes. >> And then they go, "Ah, it's not my problem" so money just flies out the window. >> So it's obviously a cost improvement for you.
So what are you guys doing in that area, and what's your big ah-ha for the customers when you walk in the door and say, look, we have this cloud, we have this system and all those headaches can be, not shifted, but relieved if you will, a big aspirin for them. What's the communication like? What do you talk to them about? >> Really it depends an awful lot on who it is. I mean, different people care about different things. What gets me excited is I know that the dominant cost of offering a service is all of this muck. It's all of this complexity, it's all of this high, high capital cost up front. A facility will run $200 million before there's servers in it. This is big money, and so from my perspective, taking that away from most companies is one contribution. Second contribution is, if you build a lot of data centers you get good at it, and so as a consequence of that I think we're building very good facilities. They're very reliable, and the costs are plummeting fast. That's a second contribution. Third contribution is because... because we're making capacity available to customers it means they don't have to predict two years in advance what they're going to need, and that means there's less wastage, and that's just good for the industry as a whole. >> So we're getting some questions on our crowd chat application. If you want to ask a question, ask him anything. It's kind of like Reddit. Go to crowdchat.net/reinvent. The first question that came in was, "James, when do you think ARM will be in the data center?" >> Ah ha, that's a great question. Well, many people know that I'm super excited about ARM. It's early days, but the reason why I'm excited is partly because I love seeing lots of players. I love seeing lots of innovation. I think that's what's making our industry so exciting right now. So that's one contribution that ARM brings.
Another is if you look at the history of server-side computing, most of the innovation comes from volume-driven markets, usually on clients first. The reason why x86 ended up in such a strong position is that so many desktops were running x86 processors, and as a consequence it became a great server processor. High R&D flowed into it. ARM is in just about every device that everyone's carrying around. It's in almost every disk drive, it's just super broadly deployed. And whenever you see a broadly deployed processor it means there's an opportunity to do something special for customers. I think it's good for the industry. But in a precise answer to your question, I really don't have one right now. It's something that we're deeply interested in and investigating deeply, but at this point it hasn't happened yet, but I'm excited by it. >> Do you think that... Two lines of questioning here. One is things that are applicable to AWS, the other's just your knowledge of the industry and what you think. We talked about that yesterday with OCP, right? >> Yep. >> Not the right fit for us, but you applaud the effort. We should talk about that, too, but does splitting workloads up into little itty-bitty processors change the utilization factor and change the need for things like virtualization, you know? What do you think?
They really love single-threaded performance, and performance really is king, but there are lots of highly parallel workloads where there's an opportunity for a big gain. I still think virtualization is probably something where the industry's going to want to be there, just because it brings so many operational advantages. >> So I got to ask the question. Yesterday we had Jason Stowe on, CEO of Cycle Computing, and he had an amazing thing that he did, sorry, trumpeting it out as the kids say, but it's not new to you, but it's new to us. He basically created a supercomputer and spun up hundreds of thousands of cores in 30 minutes, which is like insane, but he did it for like 30 grand. Which would've cost, if you tried to provision it through the TCO calculator or whatever your model, it'd be months, maybe years. But the thing that he said I want to get your point on, and I'm going to ask you questions specifically on, is that Spot instances were critical for him to do that, and the creativity of his solutions. So I got to ask you, did you see Spot instance pricing being a big deal, and what impact has that had on AWS' vision of large scale? >> I'm super excited by Spot. In fact, it's one of the reasons I joined Amazon. I went through a day of interviews, I met a bunch of really smart people doing interesting work. Someone probably shouldn't have talked to me about Spot because it hadn't been announced yet, and I just went, "This is brilliant! This is absolutely brilliant!" It's taking the ideas from financial markets, where you've got high-value assets, and saying why don't we actually make a market on the basis of that and sell it off? So two things happen that make Spot interesting. The first is an observation up front that poor utilization is basically the elephant in the room. Most folks can't use more than 12% to 15% of their overall server capacity, and so all the rest ends up being wasted. >> You said yesterday 30% is outstanding.
It's like, have a party. >> 30% probably means you're not measuring it well. >> Yeah, you're lying. >> It's real good, yeah, basically. So that means 70% or more is wasted, it's a crime. And so the first thing that says is that one of the most powerful advertisements for cloud computing is if you bring a large number of non-correlated workloads together, what happens is, when you're supporting a workload you've got to have enough capacity to support the peak, but you only get to monetize the average. And so as the peak to average gets further apart, you're wasting more. So when you bring a large number of non-correlated workloads together, what happens is it flattens out just by itself. Without doing anything it flattens out, but there's still some ups and downs. And the Spot market is a way of filling in those ups and downs so we get as close to 100%. >> Are there certain workloads that fit Spot, obviously certain workloads might fit it, but what workloads don't fit the Spot price, because, I mean, it makes total sense and it's an arbitrage opportunity for excess capacity laying around, and it's priced based on usage. So is there a workload, 'cause it'll be torrent up, torrent down, I mean, what's the use cases there? >> Workloads that don't operate well in an interrupted environment, that are very time-critical, those workloads shouldn't be run in Spot. It's just not what the resource is designed for. But workloads like the one that we were talking about with Cycle Computing are awesome, where you need large numbers of resources. If the workload needs to restart, that's absolutely fine, and price is really the focus. >> Okay, and question from crowd chat. "Ask James what are his thoughts on commodity networking and merchant silicon." >> I think an awful lot about that. >> This guy knows you. (both laughing) >> Who's that from? >> It's your family. >> Yeah, exactly! >> They're watching.
>> No, network commoditization is a phenomenal thing that the whole industry's needed for 15 years. We've got a vertically-integrated ecosystem that's kind of frozen in time. Costs everywhere are falling except in networking. We just got to do something, and so it's happening. I'm real excited by that. It's really changing the Amazon business and what we can do for customers. >> Let's talk a little bit about server design, because I was fascinated yesterday listening to you talk about how you've come full circle. Over the last decade, right, you started with what's got to be stripped down, basic commodity, and now you're of a different mindset. So describe that, and then I have some follow-up questions for you. >> Yeah, I know what you're alluding to. Years ago I used to argue you don't want hardware specialization, it's crazy. The magic's in software. You want to specialize software running on general-purpose processors, and that's because there was a very small number of servers out there, and I felt like it was the most nimble way to run. However today, in AWS, when we're running tens of thousands of copies of a single type of server, hardware optimizations are absolutely vital. You end up getting a power-performance advantage of 10X. You can get a price-performance advantage that's substantial, and so I've kind of gone full circle where now we're pulling more and more down into the hardware, and starting to do hardware optimizations for our customers. >> So heat density is a huge problem in data centers and server design. You showed a picture of a Quanta package yesterday. You didn't show us your server, said, "I can't show you ours," but you said, "but we blow this away, and this is really good." But you described how you're able to get around a lot of those problems because of the way you design data centers. >> Yep. >> Could you talk about that a little bit? >> Sure, sure, sure.
One of the problems when you're building a server is it could end up anywhere. It could end up in a beautiful data center that's super well engineered. It could end up at the end of a row in a very badly run data center. >> Or in a closet. >> Or in a closet. The air is recirculating, and so the servers have to be designed with huge headroom on cooling requirements, and they have to be able to operate in any of those environments without driving warranty costs for the vendors. We take a different approach. We say we're not going to build terrible data centers. We're going to build really good data centers, and we're going to build servers that exploit the fact that those data centers are good, and what happens is there's more value. We don't have to waste as much because we know that we don't have to operate in the closet. >> We got some more questions coming here by the way. This is awesome. This ask-me-anything crowd chat thing is going great. We got someone, he's from Nutanix, so he's a geek. He's been following your career for many years. I got to ask you about kind of the future of large-scale. So Spot, in his comment, David's comment, Spot instances prove that solutions like VMware's distributed power management are not valuable. Don't power off the most expensive asset. So, okay, that brings up an interesting point. I don't want to slam on VMware right now, but I just wanted to bring up the next logical question, which is this is a paradigm shift. That's a buzzword, but really a lot's happening that's new and innovative. And you guys are doing it and leading. What's next in the large-scale paradigm of computing and computer science? On the science side you mentioned merchant silicon. Obviously the genie's out of the bottle there, but what's around the corner? Is it notifications, the scheduling? Is it virtualization, is it compiler design? What are some of the things that you see out on the horizon that you got your eyes on? >> That's interesting, I mean.
I've got... if you name your area, I'll tell you some interesting things happening in it, and it's one of the cool things about being in the industry right now. 10 years ago we had a relatively static, kind of slow pace. You really didn't have to look that far ahead, because if anything was coming you'd see it coming for five years. Now if you ask me about power distribution, we've got tons of work going on in power distribution. We're researching different power distribution topologies. We're researching higher voltage distribution, direct current distribution. Haven't taken any of those steps yet, but we're working on that. We've got a ton going on in networking. You'll see an announcement tomorrow of a new instance type that has some interesting characteristics from a networking perspective. There's a lot going on. >> Let's pre-announce, no. >> Gary's over there like-- >> How 'about database, how 'about database? I mean, 10 years ago, John always says database was kind of boring. You'd go to a party and say, oh, welcome to the database business, oh yeah, see ya. 25 years ago it was really interesting. >> Now you go to a party and it's like, hey ah! Have a drink! >> It's a whole new ballgame, you guys are participating. Google Spanner is this crazy thing, right? So what are your thoughts on the state of the database business today, in-memory, I mean. >> No, it's beautiful. I did a keynote at SIGMOD a few years ago, and what I said is that 10 years ago Bruce Lindsay, I used to work with him in the database world, Bruce Lindsay called it polishing the round ball. It's just we're making everything a little, tiny bit better, and now it's fundamentally different. I mean what's happening right now in the database world, every year, if you stepped out for a year, you wouldn't recognize it. It's just, yeah, it's amazing.
We actually built this app, the crowd chat app that people are using, on Hadoop and HBase, and we immediately moved that to DynamoDB, and your stack was just so much faster and more scalable. So I got to ask you the-- >> And less labor. >> Yeah, yeah. So it's just been very reliable, and all the other goodness of the elastic B socket and SQS, all that other good stuff we're working with, Node, et cetera. So I got to ask you, the area around the corner that I want your opinion on is versioning control. So at large-scale, one of the challenges that we have is, as we're pushin' new code, making sure that the integrated stack is completely updated and synchronized with open-source projects. So where does that fit into the scaling up? 'Cause at small scale, versioning control used to be easy to manage, downloading software and putting in patches, but now you guys handle all that at scale. So that, I'm assuming there's some automation involved, some real tech involved, but how are you guys handling the future of making sure the code is all updated in the stack? >> It's a great question. It's super important from a security perspective that the code be up to date and current. It's super important from a customer perspective, and you need to make sure that these upgrades are just non-disruptive. The best answer I heard was yesterday from a customer who was on a panel; they were asked how they deal with Amazon's upgrades, and what she said is, "I didn't even know when they were happening. I can't tell when they're happening." Exactly the right answer. That's exactly our goal. We monitor the heck out of all of our systems, and our goal, and boy we take it seriously, is we need to know any issue before a customer knows it. And if you fail on that promise, you'll meet Andy really quick.
(James laughs) >> From my perspective, the bigger and richer the ecosystem, the happier our customers all are. It's all goodness. >> It's Darwinism, that's the answer. You know, the fittest shall survive. No, but I think that brings up this new marketplace, now that Spot pricing came out of the woodwork. It's a paradigm that exists in other industries, applied to cloud. So brokering of cloud might be something, especially with regional and geographical focuses. You can imagine a world of brokering. I mean, I don't know, I'm not qualified to answer that. >> Our goal, honestly, is to provide enough diversity of services that we completely satisfy customers' requirements, and that's what we intend to do. >> How do you guys think about the make versus buy? Are you at a point now where you say, you know what, we can make this stuff for our specific requirements better than we can get it off the shelf, or is that not the case? >> It changes every few minutes. It really does. >> So what are the parameters? >> Years ago when I joined the company we were buying servers from OEM suppliers, and they were doing some tailoring for our uses. It's gotten to the point now where that's not the right model, and we have our own custom designs that are being built. We've now gotten to the point where some of the components in servers are being customized for us, partly because we're driving sufficient volume that it's justified, and partly because the component suppliers are happy to work with us directly and they want input from us. And so every year it's a little bit more specialized and that line's moving, so it's shifting towards specialization pretty quickly.
So the question's more of a fun one, probably for you to answer, or just kind of lean back and kind of pull your hair out, but how the heck does AWS add so much infrastructure per day? How do you do it? >> It's a really interesting question. I kind of know how much infrastructure, I know abstractly how much infrastructure we put out every day, but when you actually think about this number in context, it's mind-boggling. So here's the number. Here's the number. Every day, we deploy enough servers to support Amazon when it was a seven billion dollar company. You think of how many servers a seven billion dollar e-commerce company would actually require? Every day we deploy that many servers, and it's just shocking to me to think that the servers are in the logistics chain, they're being built, they're delivered to the appropriate data centers, there's rack positions there, there's networking there, there's power there. I'm actually, every day I'm amazed, to be quite honest with you. >> It's mind-boggling. And then for a while I was like, okay, wait a minute. Would that be Moore's Law? Uh no, not even in particular. 'Cause you said every day. Not every year, every day. >> Yeah, it really is. It's a shocking number, and my definition of scale changes almost every day, where if you look at the number of customers that are trusting us with their workloads today, that's what's driving that growth, it's phenomenal! >> We got to get wrapped up, but I got to ask the Hadoop World SQL-on-Hadoop question. Obviously Hadoop is great, great for storing stuff, but now you're seeing hybrids come out. Again this comes back down to, you can't recognize the database world anymore if you were asleep for a year. So what's your take on that ecosystem? You guys have Elastic MapReduce and a bunch of other things. There's some big data stuff going on. How do you, from a database perspective, how do you look at Hadoop and SQL on Hadoop?
>> I personally love 'em both, and I love the diversity that's happening in the database world. There's some people that kind of have a religion and think it's crazy to do anything else. I think it's a good thing. MapReduce particularly, I think, is a good thing, because it takes... The first time I saw MapReduce being used was actually by a Google advertising engineer. And what I loved about it, I was actually talking to him about it, and what I loved is he had no idea how many servers he was using. If you ask me or anyone in the technology business how many servers they're using, they know. And the beautiful thing is he's running multi-thousand node applications and he doesn't know. He doesn't care, he's solving advertising problems. And so I think it's good. I think there's a place for everything. >> Well, my final question is one I'm asking guests this show. Put the bumper sticker on the car leaving re:Invent this year. What's it say? What does the bumper sticker say on the car? Summarize for the folks, what is the tagline this year? The vibe, and the focus? >> Yeah, for me this was the year. I mean, the business has been growing, but this is the year where suddenly I'm seeing huge companies 100% dependent upon AWS or on track to be 100% dependent upon AWS. This is no longer an experiment, something people want to learn about. This is real, and this is happening. This is running real businesses. So it's real, baby! >> It's real baby, I like, that's the best bumper... James, distinguished guest and now CUBE alum for us, thanks for coming on, you're a tech athlete. Great to have you, great success. Sounds like you got a lot of exciting things you're working on and that's always fun. And obviously Amazon is killing it, as we say in Silicon Valley. You guys are doing great, we love the product. We've been using it for crowd chats. Great stuff, thanks for coming on theCUBE. >> Thank you.
This is live, exclusive coverage with siliconANGLE theCUBE. We'll be right back.
ENTITIES
Entity | Category | Confidence |
---|---|---|
David | PERSON | 0.99+ |
Erik Kaulberg | PERSON | 0.99+ |
2017 | DATE | 0.99+ |
Jason Chamiak | PERSON | 0.99+ |
Dave Volonte | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Rebecca | PERSON | 0.99+ |
Marty Martin | PERSON | 0.99+ |
Rebecca Knight | PERSON | 0.99+ |
Jason | PERSON | 0.99+ |
James | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Dave | PERSON | 0.99+ |
Greg Muscurella | PERSON | 0.99+ |
Erik | PERSON | 0.99+ |
Melissa | PERSON | 0.99+ |
Micheal | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Justin Warren | PERSON | 0.99+ |
Michael Nicosia | PERSON | 0.99+ |
Jason Stowe | PERSON | 0.99+ |
Sonia Tagare | PERSON | 0.99+ |
Aysegul | PERSON | 0.99+ |
Michael | PERSON | 0.99+ |
Prakash | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Bruce Linsey | PERSON | 0.99+ |
Denice Denton | PERSON | 0.99+ |
Aysegul Gunduz | PERSON | 0.99+ |
Roy | PERSON | 0.99+ |
April 2018 | DATE | 0.99+ |
August of 2018 | DATE | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Andy Jassy | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Australia | LOCATION | 0.99+ |
Europe | LOCATION | 0.99+ |
April of 2010 | DATE | 0.99+ |
Amazon Web Services | ORGANIZATION | 0.99+ |
Japan | LOCATION | 0.99+ |
Devin Dillon | PERSON | 0.99+ |
National Science Foundation | ORGANIZATION | 0.99+ |
Manhattan | LOCATION | 0.99+ |
Scott | PERSON | 0.99+ |
Greg | PERSON | 0.99+ |
Alan Clark | PERSON | 0.99+ |
Paul Galen | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
Jamcracker | ORGANIZATION | 0.99+ |
Tarek Madkour | PERSON | 0.99+ |
Alan | PERSON | 0.99+ |
Anita | PERSON | 0.99+ |
1974 | DATE | 0.99+ |
John Ferrier | PERSON | 0.99+ |
12 | QUANTITY | 0.99+ |
ViaWest | ORGANIZATION | 0.99+ |
San Francisco | LOCATION | 0.99+ |
2015 | DATE | 0.99+ |
James Hamilton | PERSON | 0.99+ |
John Furrier | PERSON | 0.99+ |
2007 | DATE | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
$10 million | QUANTITY | 0.99+ |
December | DATE | 0.99+ |