Incompressible Encodings
>> Hello, my name is Daniel Wichs, I'm a senior scientist at NTT Research and a professor at Northeastern University. Today I want to tell you about incompressible encodings. This is a recent work from Crypto 2020 and it's joint work with Tal Moran. So let me start with a question: how much space would it take to store all of Wikipedia? It turns out that you can download Wikipedia for offline use, and some reasonable version of it is about 50 gigabytes in size. So as you'd expect, it's a lot of data, it's quite large. But there's another way to store Wikipedia, which is just to store the link www.wikipedia.org, and that only takes 17 bytes. And for all intents and purposes, as long as you have a connection to the internet, storing this link is as good as storing the Wikipedia data. You can access Wikipedia with this link whenever you want. And the point I want to make is that when it comes to public data like Wikipedia, even though the data is huge, it's trivial to compress it down, because it is public, just by storing a small link to it. And the question for this talk is: can we come up with an incompressible representation of public data like Wikipedia? In other words, can we take Wikipedia and represent it in some way such that this representation requires the full 50 gigabytes of storage, even for someone who has the link to the underlying Wikipedia data and can get the underlying data for free? So let me actually tell you what this means in more detail. This is the notion of incompressible encodings that we'll focus on in this work. An incompressible encoding consists of an encoding algorithm and a decoding algorithm; these are public algorithms. There's no secret key. Anybody can run these algorithms. The encoding algorithm takes some data m, let's say the Wikipedia data, and encodes it in some probabilistic, randomized way to derive a codeword c. And the codeword c, you can think of it as just an alternate representation of the Wikipedia data. Anybody can come and decode the codeword to recover the underlying data m. And the correctness property we want here is that no matter what data you start with, if you encode the data m and then decode it, you get back the original data m. This should hold with probability one over the randomness of the encoding procedure. Now for security, we want to consider an adversary that knows the underlying data m, let's say it has a link to Wikipedia and can access the Wikipedia data for free, so it does not pay for storing it. The goal of the adversary is to compress this codeword that we created, this new randomized representation of the Wikipedia data. So the adversary consists of two procedures, a compression procedure and a decompression procedure. The compression procedure takes as input the codeword c and outputs some smaller compressed value w, and the decompression procedure takes w and its goal is to recover the codeword c. And the security property says that no efficient adversary should be able to succeed in this game with better than negligible probability. So there are two parameters of interest in this problem. One is the codeword size, which we'll denote by alpha, and ideally we want the codeword size alpha to be as close as possible to the original data size. In other words, we don't want the encoding to add too much overhead to the data. The second parameter is the incompressibility parameter beta, and that tells us how much space, how much storage, an adversary needs to use in order to store the codeword. 
And ideally, we want this beta to be as close as possible to the codeword size alpha, which should also be as close as possible to the original data size. So I want to mention that there is a trivial construction of incompressible encodings that achieves very poor parameters. The trivial construction is to just take the data m, add some randomness, concatenate some randomness to it, and store the original data m plus the concatenated randomness as the codeword. And now even an adversary that knows the underlying data m cannot compress the randomness. So we ensure that this construction is incompressible with an incompressibility parameter beta that just corresponds to the size of the randomness we added; essentially the adversary cannot compress the random part of the codeword. So this gets us a scheme where alpha, the size of the codeword, is the original data size plus the incompressibility parameter beta. And it turns out that you cannot do better than this information-theoretically. So this is not what we want; instead, we want to focus on what I will call good incompressible encodings. Here, the codeword size should be as close as possible to the data size, just (1 + o(1)) times the data size. And the incompressibility should essentially be as large as the entire codeword, so the adversary cannot compress the codeword almost at all: the incompressibility parameter beta is (1 - o(1)) times the data size, or the codeword size. In essence, what this means is that we somehow want to take the randomness of the encoding procedure and spread it around in some clever way throughout the codeword, in such a way that it's impossible for the adversary to separate out the randomness and the data, store only the randomness, and rely on the fact that it can get the data for free. We want to make sure that the adversary essentially has to store this entire codeword, which contains both the randomness and the data in some carefully intertwined way, and cannot compress it down using the fact that it knows the data part. So this notion of incompressible encodings was actually defined in a prior work of Damgård, Ganesh, and Orlandi from Crypto 2019. They defined a variant of this notion, under a different name, as a tool or a building block for a more complex cryptographic primitive that they called Proofs of Replicated Storage, and I'm not going to talk about what these are. But in this context of constructing these Proofs of Replicated Storage, they also constructed incompressible encodings, albeit with some major caveats. In particular, their construction relied on the random oracle model, so it was a heuristic construction, and it was not known whether you could do this in the standard model. The encoding and decoding time of the construction was quadratic in the data size, and here we want to use these types of incompressible encodings on fairly large data, like the Wikipedia data, 50 gigabytes in size, so quadratic runtime on such huge data is really impractical. And lastly, the proof of security for their construction was flawed, or somewhat incomplete; it didn't consider general adversaries. This flaw was actually also noticed by the concurrent work of Garg, Lu, and Waters, and they managed to give a fixed proof for this construction, but this required quite a lot of effort: it was a highly non-trivial and subtle proof to prove the original construction of Damgård, Ganesh, and Orlandi secure. 
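Before moving on to the new construction, here is a compact restatement of the parameter regimes discussed earlier, in the talk's own notation (nothing here goes beyond what was already said): |m| is the data size, alpha the codeword size, and beta the incompressibility parameter.

```latex
\text{Trivial scheme } c = (m \,\|\, r):\quad \alpha = |m| + |r|,\quad \beta \approx |r|
\;\Longrightarrow\; \alpha \approx |m| + \beta \quad (\text{information-theoretically optimal}).
\qquad
\text{Good incompressible encoding:}\quad \alpha = (1+o(1))\,|m|,\quad \beta = (1-o(1))\,\alpha .
```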
So in our work, we give a new construction of these types of incompressible encodings. Our construction achieves security in the Common Reference String model, in fact in the common random string model, without the use of random oracles. We have a linear encoding time, linear in the data size, so we get rid of the quadratic, and we have a fairly simple proof of security; in fact, I'm hoping to show you a slightly simplified form of it in this talk. We also give some lower bounds and negative results showing that our construction is optimal in some respects, and lastly we give a new application of this notion of incompressible encodings to something called big-key cryptography. And so I want to tell you about this application; hopefully it'll give you some intuition about why incompressible encodings are interesting and useful, and also some intuition about what their real goal is, or what it is that they're trying to achieve. So, the application of big-key cryptography is concerned with the problem of system compromise. A computer system can become compromised either because the user downloads malware or because a remote attacker manages to hack into it. And when this happens, the remote attacker gains control over the system, and any cryptographic keys that are stored on the system can easily be exfiltrated, or just downloaded out of the system, by the attacker, and therefore any security that these cryptographic keys were meant to provide is going to be completely lost. And the idea of big-key cryptography is to mitigate such attacks by making the secret keys intentionally huge, on the order of many gigabytes to even terabytes. The idea is that having a very large secret key would make it harder to exfiltrate such a secret key: either because the adversary's bandwidth to the compromised system is just not large enough to exfiltrate such a large key, or because it might not be cost-effective to have to download so much data off the compromised system and store so much data to be able to use the key in the future, especially if the attacker wants to do this on some mass scale, or because the system might have some other mechanisms, let's say a firewall, that would detect such large amounts of leakage out of the compromised system and block it in some way. So there's been a lot of work on this idea of building big-key cryptosystems, cryptosystems where the secret key can be set arbitrarily huge, and these cryptosystems should satisfy two goals. One is security: security should hold even if a large amount of data about the secret key leaks out, as long as it's not the entire secret key. So even when an attacker downloads, let's say, 90% of the data of the secret key, the security of the system should be preserved. And the second property is that even though the secret key of the system can be huge, many gigabytes or terabytes, we still want the cryptosystem to remain efficient. In particular, this means that the cryptosystem cannot even read the entire secret key during each cryptographic operation, because that would already be too inefficient; it can only read some small number of bits of the secret key during each operation that it performs. And so there's been a lot of work constructing these types of cryptosystems, but one common problem for all these works is that they require the user to waste a lot of the storage on their computer in storing this huge secret key, which is useless for any other purpose other than providing security. 
And users might not want to do this. So that's the problem that we address here. And the new idea in our work is: let's make the secret key useful. Instead of just having a secret key with some useless random data that the cryptographic scheme picks, let's have a secret key that stores, let's say, the Wikipedia data, which a user might want to store on their system anyway, or the user's movie collection or music collection, et cetera; any data that the user would want to store on their system anyway, we want to convert it and use it as the secret key. Now, if we think about this for a few seconds: well, is it a good idea to use Wikipedia as a secret key? No, that sounds like a terrible idea. Wikipedia is not secret, it's public, it's online, anyone can access it whenever they want. So that's not what we're suggesting. We're suggesting to use an incompressible encoding of Wikipedia as the secret key. Now, even though Wikipedia is public, the incompressible encoding is randomized, and therefore the adversary does not know the value of this incompressible encoding. Moreover, because it's incompressible, in order for the adversary to steal, to exfiltrate, the entire secret key, it would have to download a very large amount of data out of the compromised system. So there's some hope that this could provide security, and we show how to build public-key encryption schemes in this setting that make use of a secret key which is an incompressible encoding of some useful data like Wikipedia. So the secret key is an incompressible encoding of useful data, and security ensures that the adversary will need to exfiltrate almost the entire key to break the security of this cryptosystem. So in the last few minutes, let me give you a very brief overview of our construction of incompressible encodings. And for this part, we're going to pretend we have a really beautiful cryptographic object called lossy trapdoor permutations. It turns out we don't quite have an object that's this beautiful, and in the full construction we relax this notion somewhat in order to be able to get our full construction. So a lossy trapdoor permutation is a function f keyed by some public key pk, and it maps n bits to n bits. And we can sample the public key in one of two indistinguishable modes. In injective mode, this function f_pk is a permutation, and there is, in fact, a trapdoor that allows us to invert it efficiently. And in lossy mode, if we sample the public key in lossy mode, then if we take some random value x and give you f_pk of x, this loses a lot of information about x. In particular, the image size of the function is very small, much smaller than two to the n, and so f_pk of x does not contain all the information about x. Okay, so using this type of lossy trapdoor permutation, here's the encoding of a message m using a long random CRS, a common random string. The encoding just consists of sampling the public key of this lossy trapdoor permutation in injective mode, along with the trapdoor. The encoding is just going to take the message m, XOR it with the common random string, and invert the trapdoor permutation on this value. And then the codeword will just be the public key and the inverse x. So this is something anybody can decode by just computing f_pk of x and XORing it with the CRS, and that will recover the original message. Now, to argue security, in the proof we're going to switch to choosing the value x uniformly at random. 
So the x component of the codeword is going to be chosen uniformly at random, and we're going to set the CRS to be f_pk of x XORed with the message. And if you look at it for a second, this distribution is exactly equivalent; it's just a different way of sampling the exact same distribution, and in particular the relation between the CRS and x is preserved. Now in the second step, we're going to switch the public key to lossy mode. And when we do this, then the codeword part, sorry, then the CRS, f_pk of x XOR m, only leaks some small amount of information about the random value x. In other words, even if the adversary sees the CRS, the value x in the codeword has a lot of entropy. And because it has a lot of entropy, it's incompressible. So what we did here is show that the codeword and the CRS are indistinguishable from a different way of sampling them, where we place the information about the message in the CRS, and the codeword is actually truly random, has a lot of real entropy. And therefore, even given the CRS, the codeword is incompressible. That's the main idea behind the proof. I just want to make two remarks. Our full constructions rely on a relaxed notion of lossy trapdoor permutations, which we're able to construct from either the decisional composite residuosity assumption or the learning with errors assumption. In particular, we don't actually know how to construct trapdoor permutations from LWE, or from any post-quantum assumption, but the relaxed notion that we need for our actual construction we can achieve from post-quantum assumptions, so we get post-quantum security. I want to mention two caveats of the construction. One is that in order to make this work, the CRS needs to be long, essentially as long as the message. And also, this construction achieves a weak form of selective security, where the adversary has to choose the message before seeing the CRS. And we show that both of these caveats are inherent; we show this via black-box separations, and one can overcome them only in the random oracle model. Lastly, I want to end with an interesting open question. I think one of the most interesting open questions in this area is the following: all of the constructions of incompressible encodings, from our work and prior work, require the use of some public-key crypto assumptions, some sort of trapdoor permutations or trapdoor functions. And the interesting open question is: can you construct incompressible encodings without relying on public-key crypto, using only one-way functions or just the random oracle model? We conjecture this is not possible, but we don't know. So I want to end with that open question, and thank you very much for listening.
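To make the lossy-trapdoor-permutation construction described in the talk concrete, here is a minimal illustrative sketch in Python. The lossy trapdoor permutation is left as an abstract interface, and every name in it (LossyTDP, sample_injective, forward, invert) is a hypothetical placeholder rather than a real library API; the full construction in the paper relaxes this primitive and instantiates it from DCR or LWE, so this is only a sketch of the simplified scheme from the talk, under those stated assumptions.

```python
# Hypothetical interface for a (lossy) trapdoor permutation on n-bit strings.
# None of these names correspond to a real library; they stand in for the
# abstract primitive used in the talk.
class LossyTDP:
    def sample_injective(self):
        """Sample a public key in injective mode, together with its trapdoor."""
        raise NotImplementedError

    def forward(self, pk, x: bytes) -> bytes:
        """Compute y = f_pk(x); a permutation when pk is in injective mode."""
        raise NotImplementedError

    def invert(self, trapdoor, y: bytes) -> bytes:
        """Compute x = f_pk^{-1}(y) using the trapdoor."""
        raise NotImplementedError


def xor(a: bytes, b: bytes) -> bytes:
    # Assumes |a| == |b|; in the scheme the CRS is as long as the message.
    return bytes(x ^ y for x, y in zip(a, b))


def encode(tdp: LossyTDP, crs: bytes, m: bytes):
    """Encode message m under a long common random string crs (|crs| = |m|)."""
    pk, td = tdp.sample_injective()      # injective mode, with trapdoor
    x = tdp.invert(td, xor(m, crs))      # x = f_pk^{-1}(m XOR crs)
    return (pk, x)                        # codeword: public key plus preimage


def decode(tdp: LossyTDP, crs: bytes, codeword) -> bytes:
    pk, x = codeword
    return xor(tdp.forward(pk, x), crs)   # m = f_pk(x) XOR crs


# Proof idea (hybrids), as described in the talk:
#   1. Resample equivalently: pick x uniformly at random, set crs = f_pk(x) XOR m.
#   2. Switch pk to lossy mode (indistinguishable). Now the CRS reveals only a
#      little about the uniform x, so the codeword keeps high entropy even
#      given the CRS, and hence cannot be compressed much.
```

Correctness is immediate under these assumptions: f_pk(f_pk^{-1}(m XOR crs)) = m XOR crs, so XORing with the CRS again recovers m.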
June Yang, Google and Shailesh Shukla, Google | Google Cloud Next OnAir '20
>> Announcer: From around the globe, it's theCUBE. Covering Google Cloud Next on Air '20. >> Hi, I'm Stu Miniman. And this is theCUBE's coverage of Google Cloud Next On Air. One of the weeks that they had for the show is to dig deep into infrastructure, of course one of the foundational pieces when we talk about cloud, so I'm happy to welcome to the program two of the general managers, for both compute and networking. First of all, welcome back one of our CUBE alumni, June Yang, who's the vice president of compute, and also welcoming Shailesh Shukla, who's the vice president and general manager of networking, both with Google Cloud. Thank you both so much for joining us. >> Great to be here. >> Great to be here, thanks for inviting us Stu. >> So June, if I can start with, you know, one of the themes I heard in the keynote that you gave during the infrastructure week was talking about meeting customers where they are: how do I get, you know, all of my applications that I have, obviously some of them are building new applications, some of them I'm doing SaaS, but many of them, I have to say, how do I get it from where I am to where I want to be and then start taking advantage of cloud and modernization and new capabilities. So if you could, you know, what's new when it comes to migration from a Google Cloud standpoint, and, you know, give us a little bit of insight as to what you're hearing from your customers. >> Yeah, definitely happy to do so. I think for many of our customers, migration is really the first step, right? A lot of the applications are on premise today, so the goal is really how do I move from on prem to the cloud? So to that extent, I think we have announced a number of capabilities. And one of the programs that is very exciting that we have just launched is called the RAMP program, which stands for Google Cloud Rapid Assessment and Migration Program. So it's really kind of bundling a holistic approach of, you know, programs, tooling, and, you know, incentives altogether to really help customers with that kind of a journey, right? And then also on the product side, we have introduced a number of new capabilities to really ease that transition for customers to move from on premise to the cloud as well. One of the things we just announced is Google Cloud VMware Engine. And this is really, you know, built as a native service inside Google as a (indistinct) to allow customers to run their VMware as a service on top of Google infrastructure. So customers can easily take, you know, what's running on premise, that's running VMware today, and move it to the cloud with really no change whatsoever, really lift and shift. And your other point is really about modernization, right? Because for most of our customers coming in today, it's not just about running this the way it is. It's also, how do I extract value out of this kind of capability? So we built this as a service so that customers can easily start using services like BigQuery to be able to extract data and insights out of this, to be able to give them additional advantages and to create new services and things like that. And for other customers who might want to be able to, you know, leverage our AI and ML capability, that's at their fingertips as well. So it's just really trying to make that process super easy. Another kind of class of workloads we see is really around SAP, right? That's our bread and butter for many enterprises. 
So customers are moving those out into the cloud, and we've seen many examples that really allow customers to take the data that's sitting in SAP HANA and be able to extract more value out of it. Home Depot is a great example of those, where they're able to leverage BigQuery to take, you know, their stockouts and some of the inventory management really to the next level, and really give the customer a much better experience at the end of the day. So those are just a few things that we're doing on that side to really make it easy for a customer to lift and shift and then be able to modernize along the way. >> Well yeah, June, I would like to dig in a little bit on the VMware piece that you talked about. I've been talking about VMware a bit lately, talking to some of their customers leveraging the VMware cloud offerings, and that modernization is so important, because the traditional way you think about virtualization was I stick something in a VM and I leave it there, and of course customers want to be able to take advantage of the innovation and changes in the cloud. So it seems like things like your analytics and AI would be a natural fit for VMware customers to then get access to those services that you're offering. >> Yeah, absolutely. I think we have lots of customers, and that's kind of one of the differentiators that customers are looking for, right? I can buy my VMware in a variety of places, but I want to be able to take it to the next level. How do I use data as my differentiator? You know, one of the core missions as part of the Google mission is really how do we help customers to digitally transform and reimagine their business with data-powered innovation, and that's kind of one key piece we want to focus on, and this is part of the reason why we built this as really a native service inside of Google Cloud, so that you're going through the same console, you know, accessing VMware Engine, accessing BigQuery, accessing networking, firewalls, and so forth, all really seamlessly. And so it makes it really easy to be able to extend and modernize. >> All right, well, June, one of the other things, anytime we come to a cloud event, is we know that there are going to be updates in some of the primary offerings. So when it comes to compute and storage, I know there's a number of announcements there, probably more than we'll be able to cover in this, but give us some of the highlights. >> Yeah, let me give some highlights. I mean, at the core of this is really Google Compute Engine, and we're very excited, we've introduced a number of new, what we call VM families, right? Essentially different VM instances that are catered towards different use cases and different kinds of workloads. So for example, we launched the N2D VM, so this is a set of VMs on AMD technology that really provides excellent price-performance benefits for customers who choose to go down that particular path. We also just introduced our A2 VM family. This is based on our GPU accelerator-optimized VM. So we're the first ones in the market to introduce the NVIDIA Ampere A100. So for lots of customers who are really interested in, you know, using GPUs to do their ML and AI type of analysis, this is a big help, because it's got better performance compared to the previous generation, so they can run their models faster and turn around insights. >> Wonderful. 
Shailesh, of course we want to hear about the networking components too. You know, Google is very well known, you know, everybody leverages Google's network and global reach, so how about the update from your network side? >> Absolutely. Stu, let me give you a set of updates that we have announced at the Next conference. So first of all, as you know, many customers choose Google Cloud for the scale, the reach, the performance and the elasticity that we provide, which ultimately results in better user experience or customer experience. And the backbone of all of this capability is our private global backbone network, right? Which all of our cloud customers benefit from. Networking is extremely important to advance our customers' digital journeys, the ones that June talked about, migration and modernization, as well as security, right? So to that end, we made several announcements. Let's talk about some of them. First, we announced a new subsea cable called Grace Hopper, which will actually run between the U.S. on one side and the UK on the other, with Spain on another leg. It's equipped with about 16 fiber pairs and will get completed in 2022. And it will allow for significant new capacity between the U.S. and Europe, right? Second, Google Cloud CDN, it's one of our most popular and fast-growing service offerings. It now offers the capability to serve content from on prem, as well as other clouds, especially for hybrid and multicloud deployments. This provides a tremendous amount of flexibility in where the content can be placed and in overall content and application delivery. Third, we have announced the expansion of our partnership with Cisco, and we have announced this notion of Cisco SD-WAN Cloud Hub with Google Cloud. It's one of the first in the industry to actually create an automated end-to-end solution that intelligently and securely, you know, connects or bridges enterprise networks to any workload across multiple clouds and to other locations. Fourth, we announced new capabilities in the Network Intelligence Center. It's a platform that provides customers with unmatched visibility into their networks, along with proactive kind of network verification, security recommendations, and so on. There were two specific modules there, around firewall insights and the performance dashboard, that we announced in addition to the three that already existed. And finally, we have a range of really powerful announcements on the security front. As you know, security is one of our top priorities, and our infrastructure and products are designed, built and operated with an end-to-end security framework and end-to-end security as a core design principle. Let me give you a few highlights. First, as part of making firewall management easy for our customers, to manage firewalls across multiple organizations, we announced hierarchical firewalls. Second, in order to enable, you know, better security capability, we announced the notion of packet mirroring, right? So this is something that we announced earlier in the year, but it's now GA and allows customers to collect and inspect network traffic across multiple machine types without any overhead, right? Third, actually in our compute and security teams, we announced what we call confidential VMs, which offer the ability to encrypt data while being processed. 
We have always had the capability to encrypt data at rest and while in motion; now we are the first in the industry to announce the ability to encrypt data even while it is being processed. So we are really, you know, pleased to offer that as part of our confidential computing portfolio. We also announced a managed service around our Cloud Armor security portfolio for DDoS, web application and bot detection, called Cloud Armor Managed Protection. And finally, we also announced a capability called Private Service Connect that allows customers to connect effortlessly to other Google Cloud services or to third-party SaaS applications while keeping their traffic secure and private over, in kind of, the broader internet. So we were really pleased to announce a number of, you know, very critical announcements, products and capabilities and partnerships, such as Cisco, in order to further the modernization and migration for our customers. >> Yeah, one note I will make for our audience, you know, check the details on the website. I know some of the security features are now in beta, and many of the other things are now in general availability. Shailesh, the follow-up question I have for you is, when I look at 2020, the patterns of internet traffic have changed drastically. You saw a very rapid shift, everyone needed to work from home, there's been a lot of stresses and strains on the network. When I hear things like your CDN or your SD-WAN partnership with Cisco, I have to think that there's, you know, an impact on that. What are you seeing? What are you hearing from your customers? How are you helping them work through these rapid changes to be able to respond and still give people the, you know, the performance and reliability of traffic where they need it, when they need it? >> Right, absolutely. This is a, you know, very important question and a very important topic, right? And when we saw the impact of COVID, you know, as you know Google's mission is to continue to be helpful to our customers, we actually invested and continue to invest in building out our CDN capability, our interconnect, the capacity in our network infrastructure, and so on, in order to better provide for, for example, distance learning, video conferencing, e-commerce, financial services and so on, and we are proud to say that we were able to support a very significant expansion in the overall traffic, you know, on a global basis, right, in Google Cloud and Google's network, without a hitch. So we are really proud to be able to say that. In addition, there are other areas where we have been looking to help our customers. For example, high performance computing is a very interesting capability that many customers are using for things such as COVID research, right? So a good example is Northeastern University in Boston, which has been using, you know, thousands of preemptible virtual machines on Google Cloud to power very large-scale, data-driven models and simulations to figure out how travel restrictions and social distancing will actually impact the spread of the virus. That's an example of the way that we are trying to be helpful as part of the broader global situation. >> Great. June, I have to imagine generally from infrastructure there've been a number of other impacts where Google Cloud has been helping your customers, any other examples that you'd like to share? >> Yeah, absolutely. 
I mean, if you look at the COVID impact, it impacted different industries quite differently. We've seen certain industries where their demand just really skyrocketed overnight. For example, you know, take one of our internal customers, Google, you know, Google Meet, which is Google's video conferencing service; we just announced that we saw a 30X increase over the last few months since COVID started. And this is all running on Google infrastructure. And we've seen a similar kind of pattern for a number of our customers in the media and entertainment area, and certainly video conferencing and so forth. And we've been able to scale to meet these key customers' demand and to make sure that they have the agility they need to meet the demand from their customers, and so we're definitely very proud to be part of this effort to kind of enable folks to be able to work from home, to be able to study from home and so on and so forth. You know, for some customers, the whole business continuity piece is really a big deal for them, you know, with the whole work-from-home mandate. So for example, one of our customers, Telus International, it's a Canadian telecommunications company, because of COVID they had to, you know, be able to transition tens of thousands of employees to a work-from-home model immediately. And they were able to work with Google Cloud and our partner, itopia, who specializes in virtual desktops and applications. So overnight, literally in 24 hours, they were able to deploy fully configured virtual desktop environments from Google Cloud and allow their employees to come back into service. So that's just one example; there's hundreds and thousands more of those examples, and it's been very heartening to be part of this, you know, for Google to be helpful to our customers. >> Great. Well, I want to let both of you just have the final word. When you're talking to customers here in 2020, how should they be thinking of Google Cloud? How do you make sure that you're helping them in differentiating from some of the other solutions in the environment? Maybe June, if we could start with you. >> Sure, so at Google Cloud, our goal is to make it easy for anyone, you know, whether you're a big enterprise or a small startup, to be able to build your applications, to be able to innovate and harness the power of data to extract additional information and insights, and to be able to scale your business. As an infrastructure provider, we want to deliver the best infrastructure to run all customers' applications on a global basis, reliably and securely. It's definitely getting more and more complicated, and, you know, as we kind of spread our capacity to different locations, it gets more complicated from a logistics perspective as well, so we want to help do the heavy lifting around the infrastructure, so that customers can simply consume our infrastructure as a service and be able to focus on their businesses and not worry about the infrastructure side. So, you know, that's our goal: we'll do the plumbing work and we'll let customers innovate on top of that. >> Right. You know, June, you said that very well, right? Distributed infrastructure is a key part of our strategy to help our customers. In addition, we also provide the platform capability. So essentially a digital transformation platform that manages data at scale to help, you know, develop and modernize the applications, right? 
And finally we layer on top of that, a suite of industry specific solutions that deliver kind of these digital capabilities across each of the key verticals, such as financial services or telecommunications or media and entertainment, retail, healthcare, et cetera. So that's how combining together infrastructure platform and solutions we are able to help customers in their modernization journeys. >> All right, June and Shailesh, thank you so much for sharing the updates, congratulations to your teams on the progress, and absolutely look forward to hearing more in the future. >> Great, thank you Stu. >> Thank you Stu. >> All right, and stay tuned for more coverage of Google Cloud Next On Air '20. I'm Stu Miniman, thank you for watching theCUBE. (Upbeat music)
Aaron T. Myers, Cloudera Software Engineer, Talking Cloudera & Hadoop
>>so erin you're a technique for a Cloudera, you're a whiz kid from Brown, you have, how many Brown people are engineers here at Cloudera >>as of monday, we have five full timers and two interns at the moment and we're trying to hire more all the time. >>Mhm. So how many interns? >>Uh two interns from Brown this this summer? A few more from other schools? Cool, >>I'm john furry with silicon angle dot com. Silicon angle dot tv. We're here in the cloud era office in my little mini studio hasn't been built out yet, It was studio, we had to break it down for a doctor, ralph kimball, not richard Kimble from uh I called him on twitter but coupon um but uh the data warehouse guru was in here um and you guys are attracting a lot of talent erin so tell us a little bit about, you know, how Claudia is making it happen and what's the big deal here, people smart here, it's mature, it's not the first time around this company, this company has some some senior execs and there's been a lot, a lot of people uh in the market who have been talking about uh you know, a lot of first time entrepreneurs doing their startups and I've been hearing for some folks in in the, in the trenches that there's been a frustration and start ups out there, that there's a lot of first time entrepreneurs and everyone wants to be the next twitter and there's some kind of companies that are straddling failure out there? And and I was having that conversation with someone just today and I said, they said, what's it like Cloudera and I said, uh, this is not the first time crew here in Cloudera. So, uh, share with the folks out there, what you're seeing for Cloudera and the management team. >>Sure. Well, one of the most attractive parts about working Cloudera for me, one of the reasons I, I really came here was have been incredibly experienced management team, Mike Charles, they've all there at the top of this Oregon, they have all done this before they founded startups, Growing startups, old startups and uh, especially in contrast with my, the place where I worked previously. Uh, the amount of experience here is just tremendous. You see them not making mistakes where I'm sure others would. >>And I mean, Mike Olson is veteran. I mean he's been, he's an adviser to start ups. I know he's been in some investors. Amer was obviously PhD candidates bolted out the startup, sold it to yahoo, worked at, yahoo, came back finish his PhD at stanford under Mendel over there in the PhD program over this, we banged in a speech. He came back entrepreneur residents, Excel partners. Now it does Cloudera. Um, when did you join the company and just take us through who you are and when you join Cloudera, I want your background. >>Sure. So I, I joined a little over a year ago is about 30 people at the time. Uh, I came from a small start up of the music online music store in new york city um uh, which doesn't really exist all that much anymore. Um but you know, I I sort of followed my other colleagues from Brown who worked here um was really sold by the management team and also by the tremendous market opportunity that that Hadoop has right now. Uh Cloudera was very much the first commercial player there um which is really a unique experience and I think you've covered this pretty well before. I think we all around here believe that uh the markets only growing. Um and we're going to see the market and the big data market in general get bigger and bigger in the next few years. 
>>So, so obviously computer science is all the rage and and I'm particularly proud of hangout, we've had conversations in the hallway while you're tweeting about this and that. Um, but you know, silicon angles home is here, we've had, I've had a chance to watch you and the other guys here grow from, you know, from your other office was a san mateo or san Bruno somewhere in there. Like >>uh it was originally in burlingame, then we relocate the headquarters Palo Alto and now we have a satellite up in san Francisco. >>So you guys bolted out. You know, you have a full on blow in san Francisco office. So um there was a big busting at the seams here in Palo Alto people commuting down uh even building their burning man. Uh >>Oh yeah sure >>skits here and they're constructing their their homes here, but burning man, so we're doing that in san Francisco, what's the vibe like in san Francisco, tell us what's going on >>in san Francisco, san Francisco is great. It's, I'm I live in san Francisco as do a lot of us. About half the engineering team works up there now. Um you know we're running out of space there certainly. Um and you're already, oh yeah, oh yeah, we're hiring as fast as we absolutely can. Um so definitely not space to build the burning man huts there like like there is down, down in Palo Alto but it's great up there. >>What are you working on right now for project insurance? The computer science is one of the hot topics we've been covering on silicon angle, taking more of a social angle, social media has uh you know, moves from this pr kind of, you know, check in facebook fan page to hype to kind of a real deal social marketplace where you know data, social data, gestural data, mobile data geo data data is the center of the value proposition. So you live that every day. So talk about your view on the computer science landscape around data and why it's such a big deal. >>Oh sure. Uh I think data is sort of one of those uh fundamental uh things that can be uh mind for value across every industry, there's there's no industry out there that can't benefit from better understanding what their customers are doing, what their competitors are doing etcetera. And that's sort of the the unique value proposition of, you know, stuff like Hadoop. Um truly we we see interest from every sector that exists, which is great as for what the project that I'm specifically working on right now, I primarily work on H. D. F. S, which is the Hadoop distributed file system underlies pretty much all the other um projects in the Hadoop ecosystem. Uh and I'm particularly working with uh other colleagues at Cloudera and at other companies, yahoo and facebook on high availability for H. D. F. S, which has been um in some deployments is a serious concern. Hadoop is primarily a batch processing system, so it's less of a concern than in others. Um but when you start talking about running H base, which needs to be up all the time serving live traffic than having highly available H DFS is uh necessity and we're looking forward to delivering that >>talk about the criticism that H. D. F. S has been having. Um Well, I wouldn't say criticism. I mean, it's been a great, great product that produced the HDs, a core parts of how do you guys been contributing to the standard of Apache, that's no secret to the folks out there, that cloud area leads that effort. Um but there's new companies out there kind of trying a new approach and they're saying they're doing it better, what are they saying in terms and what's really happening? 
So, you know, there's some argument like, oh, we can do it better. And what's the what, why are they doing it, that was just to make money do a new venture, or is that, what's your opinion on that? Yeah, >>sure. I mean, I think it's natural to to want to go after uh parts of the core Hadoop system and say, you know, Hadoop is a great ecosystem, but what if we just swapped out this part or swapped out that part, couldn't couldn't we get some some really easy gains. Um and you know, sometimes that will be true. I have confidence that that that just will not simply not be true in in the very near future. One of the great benefits about Apache, Hadoop being open source is that we have a huge worldwide network of developers working at some of the best engineering organizations in the world who are all collaborating on this stuff. Um and, you know, I firmly believe that the collaborative open source process produces the best software and that's that's what Hadoop is at its very core. >>What about the arguments are saying that, oh, I need to commercialize it differently for my installed base bolt on a little proprietary extensions? Um That's legitimate argument. TMC might take that approach or um you know, map are I was trying to trying to rewrite uh H. T. F. >>S. To me, is >>it legitimate? I mean is there fighting going on in the standards? Maybe that's a political question you might want to answer. But give me a shot. >>I mean the Hadoop uh isn't there's no open standard for Hadoop. You can't say like this is uh this is like do compatible or anything like that. But you know what you can say is like this is Apache Hadoop. Uh And so in that sense there's no there's no fighting to be had there. Um Yeah, >>so yeah. Who um struggling as a company. But you know, there's a strong head Duke D. N. A. At yahoo, certainly, I talked with the the founder of the startup. Horton works just announced today that they have a new board member. He's the guy who's the Ceo of Horton works and now on bluster, I'm sorry, cluster announced they have um rob from benchmark on the board. Uh He's the Ceo of Horton works and and one of my not criticisms but points about Horton was this guy's an engineer, never run a company before. He's no Mike Olson. Okay, so you know, Michaelson has a long experience. So this guy comes into running and he's obviously in in open source, is that good for Yahoo and open sources. He they say they're going to continue to invest in Hadoop? They clearly are are still using a lot of Hadoop certainly. Um how is that changing Apache, is that causing more um consolidation, is that causing more energy? What's your view on the whole Horton works? Think >>um you know, yahoo is uh has been and will continue to be a huge contributor. Hadoop, they uh I can't say for sure, but I feel pretty confident that they have more data under management under Hadoop than anyone else in the world and there's no question in my mind that they'll continue to invest huge amounts of both key way effort and engineering effort and uh all of the things that Hadoop needs to to advance. Um I'm sure that Horton works will continue to work very closely with with yahoo. Um And you know, we're excited to see um more and more contributors to to Hadoop um both from Horton works and from yahoo proper. 
>>Cool, Well, I just want to clarify for the folks out there who don't understand what this whole yahoo thing is, It was not a spin out, these were key Hadoop core guys who left the company to form a startup of which yahoo financed with benchmark capital. So, yahoo is clearly and told me and reaffirm that with me that they are clearly investing more in Hadoop internally as well. So there's more people inside, yahoo that work on Hadoop than they are in the entire Horton's work company. So that's very clear. So just to clear that up out there. Um erin. so you're you're a young gun, right? You're a young whiz like Todd madam on here, explain to the folks out there um a little bit older maybe guys in their thirties or C IOS a lot of people are doing, you know, they're kicking the tires on big data, they're hearing about real time analytics, they're hearing about benefits have never heard before. Uh Dave a lot and I on the cube talk about, you know, the transformations that are going on, you're seeing AMC getting into big data, everyone's transforming at the enterprise level and service provider. What explains the folks why Hadoop is so important. Why is that? Do if not the fastest or one of the fastest growing projects in Apache ever? Sure. Even faster than the web server project, which is one of the better, >>better bigger ones. >>Why is the dupes and explain to them what it is? Well, you know, >>it's been it's pretty well covered that there's been an explosion of data that more data is produced every every year over and over. We talk about exabytes which is a quantity of data that is so large that pretty much no one can really theoretically comprehend it. Um and more and more uh organizations want to store and process and learn from, you know, get insights from that data um in addition to just the explosion of data um you know that there is simply more data, organizations are less willing to discard data. One of the beauties of Hadoop is truly that it's so very inexpensive per terabyte to store data that you don't have to think up front about what you want to store, what you want to discard, store it all and figure out later what is the most useful bits we call that sort of schema on read. Um as opposed to, you know, figuring out the schema a priority. Um and that is a very powerful shift in dynamics of data storage in general. And I think that's very attractive to all sorts of organizations. >>Your, I'll see a Brown graduate and you have some interns from Brown to Brown um, Premier computer science program almost as good as when I went to school at Northeastern University. >>Um >>you know, the unsung heroes of computer science only kidding Brown's great program, but you know, cutting edge computer science areas known as obviously leading in a lot of the computer science areas do in general is known that you gotta be pretty savvy to be either masters level PhD to kind of play in this area? Not a lot of adoption, what I call the grassroots developers. What's your vision and how do you see the computer science, younger generation, even younger than you kind of growing up into this because those tools aren't yet developed. You still got to be, you're pretty strong from a computer science perspective and also explained to the folks who aren't necessarily at the browns of the world or getting into computer science, what about, what is that this revolution about and where is it going? What are some of the things you see happening around the corner that that might not be obvious. 
>>Sure there's a few questions there. Um part of it is how do people coming out of college get into this thing, It's not uh taught all that much in school, How do how do you sort of make the leap from uh the standard computer science curriculum into this sort of thing? And um you know, part of it is that really we're seeing more and more schools offering distributed computing classes or they have grids available um to to do this stuff there there is some research coming out of Brown actually and lots of other schools about Hadoop proper in the behavior of Hadoop under failure scenarios, that sort of stuff, which is very interesting. Google uh actually has classes that they teach, I believe in conjunction with the University of Washington um where they teach undergraduates and your master's level, graduate students about mass produced and distributed computing and they actually use Hadoop to do it because it is the architecture of Hadoop is modeled after um >>uh >>google's internal infrastructure. Um So you know that that's that's one way we're seeing more and more people who are just coming out of college who have distributed systems uh knowledge like this? Um Another question? the other part of the question you asked is how does um how does the ordinary developer get into this stuff? And the answer is we're working hard, you know, we and others in the hindu community are working hard on making it, making her do just much easier to consume. We released, you cover this fair bit, the ECM Express project that lets you install Hadoop with just minimal effort as close to 11 click as possible. Um and there's lots of um sort of layers built on top of Hadoop to make it more easily consumed by developers Hive uh sort of sequel like interface on top of mass produce. And Pig has its own DSL for programming against mass produce. Um so you don't have to write heart, you don't have to write straight map produced code, anything like that. Uh and it's getting easier for operators every day. >>Well, I mean, evolution was, I mean, you guys actually working on that cloud era. Um what about what about some of the abstractions? You're seeing those big the Rage is, you know, look back a year ago VM World coming up and uh little plugs looking angle dot tv will be broadcasting live and at VM World. Um you know, he has been on the Q XV m where um Spring Source was a big announcement that they made. Um, Haruka brought by Salesforce Cloud Software frameworks are big, what does that look like and how does it relate to do and the ecosystem around Hadoop where, you know, the rage is the software frameworks and networks kind of collide and you got the you got the kind of the intersection of, you know, software frameworks and networks obviously, you know, in the big players, we talk about E M C. And these guys, it's clear that they realize that software is going to be their key differentiator. So it's got to get to a framework stand, what is Hadoop and Apache talking about this kind of uh, evolution for for Hadoop. >>Sure. Well, you know, I think we're seeing very much the commoditization of hardware. Um, you just can't buy bigger and bigger computers anymore. They just don't exist. So you're going to need something that can take a lot of little computers and make it look like one big computer. And that's what Hadoop is especially good at. Um we talk about scaling out instead of scaling up, you can just buy more relatively inexpensive computers. Uh and that's great. 
And sort of the beauty of Hadoop, um, is that it will grow linearly as your data set as your um, your your scale, your traffic, whatever grows. Um and you don't have to have this exponential price increase of buying bigger and bigger computers, You can just buy more. Um and that that's sort of the beauty of it is a software framework that if you write against it. Um you don't have to think about the scaling anymore. It will do that for you. >>Okay. The question for you, it's gonna kind of a weird question but try to tackle it. You're at a party having a few cocktails, having a few beers with your buddies and your buddies who works at a big enterprise says man we've got all this legacy structured data systems, I need to implement some big data strategy, all this stuff. What do I do? >>Sure, sure. Um Not the question I thought you were going to ask me that you >>were a g rated program here. >>Okay. I thought you were gonna ask me, how do I explain what I do to you know people that we'll get to that next. Okay. Um Yeah, I mean I would say that the first thing to do is to implement a start, start small, implement a proof of concept, get a subset of the data that you would like to analyze, put it, put Hadoop on a few machines, four or five, something like that and start writing some hive queries, start writing some some pig scripts and I think you'll you know pretty quickly and easily see the value that you can get out of it and you can do so with the knowledge that when you do want to operate over your entire data set, you will absolutely be able to trivially scale to that size. >>Okay. So now the question that I want to ask is that you're at a party and I want to say, what do you >>do? You usually tell people in my hedge fund manager? No but seriously um I I tell people I work on distributed supercomputers. Software for distributed supercomputers and that people have some idea what distributed means and supercomputers and they figure that out. >>So final question for I know you gotta go get back to programming uh some code here. Um what's the future of Hadoop in the sense of from a developer standpoint? I was having a conversation with a developer who's a big data jockey and talking about Miss kelly gets anything and get his hands on G. O. Data, text data because the data data junkie and he says I just don't know what to build. Um What are some of the enabling apps that you may see out there and or you have just conceiving just brainstorming out there, what's possible with with data, can you envision the next five years, what are you gonna see evolve and what some of the coolest things you've seen that might that are happening right now. >>Sure. Sure. I mean I think you're going to see uh just the front ends to these things getting just easier and easier and easier to interact with and at some point you won't even know that you're interacting with a Hadoop cluster that will be the engine underneath the hood but you know, you'll you'll be uh from your perspective you'll be driving a Ferrari and by that I mean you know, standard B. I tool, standard sequel query language. Um we'll all be implemented on top of this stuff and you know from that perspective you could implement, you know, really anything you want. 
We're seeing a lot of great work coming out of just identifying trends amongst masses of data that, if you tried to analyze with any other tool, you'd either have to distill down so far that you would question your results, or you could only run the very simplest sort of queries over it and not really get those powerful, deep, correlative insights that we're seeing people get. So I think you'll continue to see great recommendation systems coming out of this stuff. You'll see root cause analysis, and you'll see great work coming out of the advertising industry to really say which ad was responsible for this purchase: was it really the last ad they clicked on, or was it the ad they saw five weeks ago that put the thought in their mind? That sort of correlative analysis is being empowered by big data systems like Hadoop.
>>Well, I'm bullish on big data. I think it's going to be even bigger than people think. You're going to have some kids come out of college and say, I could use big data to create a differentiation and build an airline based on that one differentiation. These are cool new ways of using data we've never seen before. So Aaron, thanks for coming on theCUBE. We're inside the Palo Alto studio, and we're going to...