
Randy Meyer, HPE & Paul Shellard, University of Cambridge | HPE Discover 2017 Madrid


 

>> Announcer: Live from Madrid, Spain, it's the Cube, covering HPE Discover Madrid 2017, brought to you by Hewlett Packard Enterprise. >> Welcome back to Madrid, Spain everybody, this is the Cube, the leader in live tech coverage. We're here covering HPE Discover 2017. I'm Dave Vellante with my cohost for the week, Peter Burris. Randy Meyer is back, he's the vice president and general manager of Synergy and Mission Critical Solutions at Hewlett Packard Enterprise, and Paul Shellard is here, the director of the Center for Theoretical Cosmology at Cambridge University. Thank you very much for coming on the Cube. >> It's a pleasure. >> Good to see you again. >> Yeah, good to be back for the second time this week. >> Talking about computing meets the cosmos. >> Well it's exciting. Yesterday we talked about the Superdome Flex that we announced, we talked about it in the commercial space, where it's taking HANA and Oracle databases to the next level, but there's a whole different side to what you can do with in-memory compute. It's all in this high performance computing space. You think about the problems people want to solve in fluid dynamics, in forecasting, in all sorts of analytics problems. High performance compute, one of the things it does is it generates massive amounts of data that people then want to do things with. They want to compare that data to what their model said, okay can I run that against it; they want to take that data and visualize it, okay how do I go do that.
The more you can do that in memory, the faster it is to deal with, because you're not writing this stuff off to disk, you're not moving it to another cluster back and forth. So we're seeing this burgeoning of what the HPC guys would call fat nodes, where you want to put in lots of memory and eliminate the IO to make their jobs easier, and Professor Shellard will talk about a lot of that in terms of what they're doing at the Cosmos Institute. But this is a trend; you don't have to be a university. We're seeing this inside of oil and gas companies, aerospace engineering companies, anybody that's solving these complex computational problems that have an analytical element, whether it's compare to the model, visualize, do something with that data once you've done that. >> Paul, explain more about what it is you do. >> Well in the Cosmos Group, of which I'm the head, we're interested in two things: cosmology, which is trying to understand where the universe comes from, the whole big bang, and then we're interested in black holes, particularly their collisions, which produce gravitational waves. So they're the two main areas, relativity and cosmology. >> That's a big topic. I don't even know where to start. I just want to know, okay, what have you learned, and can you summarize it for a lay person? Where are you today, what can you share with us that we can understand? >> What we do is we take our mathematical models and we make predictions about the real universe, and so we try and compare those to the latest observational data. We're in a particularly exciting period of time at the moment because of a flood of new data about the universe and about black holes, and in the last two years, gravitational waves were discovered, there's a Nobel prize this year, so lots of things are happening.
It's a very data driven science, so we have to try and keep up with this flood of new data, which is getting larger and larger, and also with new types of data, because suddenly gravitational waves are the latest thing to look at. >> What are the sources of data and new sources of data that you're tapping? >> Well, in cosmology we're mainly interested in the cosmic microwave background. >> Peter: Yeah, the sources of data are the cosmos. >> Yeah, right, so this is relic radiation left over from the big bang fireball. It's like a photograph of the universe, a blueprint, and then also the distribution of galaxies, so 3D maps of the universe. We're in a new age of exploration; we've only got a tiny fraction of the universe mapped so far, and we're trying to extract new information about the origin of the universe from that data. In relativity, we've got these gravitational waves, these ripples in space time traversing across the universe. They're essentially earthquakes in the universe, and they're sound waves or seismic waves that propagate to us from these very violent events. >> I want to take you to the gravitational waves, because in many respects it's an example of a lot of what's here in action. Here's what I mean, and correct me if I'm wrong, but it's basically, you have two lasers perpendicular to each other, shooting a signal about two or three miles in each direction, and it is the most precise experiment ever undertaken, because what you're doing is you're measuring the time it takes for one laser versus another laser, and that time is a function of the slight stretching that comes from the gravitational waves. That is an unbelievable example of edge computing, where you have just the tolerances to do that. That's not something you can send back to the cloud, you gotta do a lot of the compute right there, right?
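As an aside for readers, the arithmetic behind that precision can be sketched in a few lines. The numbers below are illustrative assumptions, not figures from the interview: LIGO-class interferometer arms are roughly 4 km (about two and a half miles), and a passing wave produces a dimensionless strain h = ΔL/L on the order of 1e-21.

```python
# Illustrative numbers (assumptions, not from the transcript): a LIGO-class
# interferometer has ~4 km arms; a gravitational wave produces a fractional
# length change (strain) h = dL / L on the order of 1e-21.
arm_length_m = 4_000.0   # interferometer arm length in metres (assumed)
strain = 1e-21           # typical dimensionless strain of a detectable wave

# The arm-length change the laser timing comparison has to resolve:
delta_L_m = strain * arm_length_m
print(f"{delta_L_m:.0e} m")  # ~4e-18 m, thousands of times smaller than a proton
```

That sub-proton length change is why the timing comparison between the two laser paths has to happen locally, at the instrument.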
>> That's right, yes, so a gravitational wave comes by and you shrink one way and you stretch the other. >> Peter: It distorts the space time. >> Yeah, you become thinner, and these tiny, tiny changes are what's measured, and nobody expected gravitational waves to be discovered in 2015. We all thought, oh, another five years, another five years; they've always been saying, we'll discover them, we'll discover them, but it happened. >> And since then, it's been used two or three times to discover new types of things, and there's now a whole, I'm sure this is very central to what you're doing, there's now a whole concept of gravitational information that can in fact become an entirely new branch of cosmology. Have I got that right? >> Yeah, you have. It's called multimessenger astronomy now, because you don't just see the universe in electromagnetic waves, in light, you hear the universe. This is qualitatively different; it's sound waves coming across the universe, and so combining these two, the latest event was where they heard the event first, then they turned their telescopes and they saw it. So much information came out of that, even information about cosmology, because these signals are traveling billions of light years across to us. We're getting a picture of the whole universe as they propagate all that way, so we're able to measure the expansion rate of the universe from that point. >> The techniques for the observational, the technology for observation, what is that, how has that evolved? >> Well, you've got the wrong guy here. I'm from the theory group, we're doing the predictions, and these guys with their incredible technology are getting the data, and as you can imagine, the whole point is you've gotta get the predictions and then you've gotta look in the data for a needle in the haystack, which is this signature of these black holes colliding.
>> You think about that, I have a model, I'm looking for the needle in the haystack, that's a different way to describe an in memory analytic search pattern recognition problem, that's really what it is. This is the world's largest pattern recognition problem. >> Most precise, and literally. >> And that's an observation that confirms your theory right? >> Confirms the theory, maybe it was your theory. >> I'm actually a cosmologist, so in my group we have relativists who are actively working on the black hole collisions and making predictions about this stuff. >> But they're dampening vibration from passing trucks and these things and correcting it, it's unbelievable. But coming back to the technology, the technology is, one of the reasons why this becomes so exciting and becomes practical is because for the first time, the technology has gotten to the point where you can assume that the problem you're trying to solve, that you're focused on and you don't have to translate it in technology terms, so talk a little bit about, because in many respects, that's where business is. Business wants to be able to focus on the problem and how to think the problem differently and have the technology to just respond. They don't want to have to start with the technology and then imagine what they can do with it. >> I think from our point of view, it's a very fast moving field, things are changing, new data's coming in. The data's getting bigger and bigger because instruments are getting packed tighter and tighter, there's more information, so we've got a computational problem as well, so we've got to get more computational power but there's new types of data, like suddenly there's gravitational waves. 
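The "needle in a haystack" search described here is, concretely, matched filtering: slide a predicted waveform template along the noisy data stream, correlate at every offset, and pick the peak. A minimal sketch, with a made-up chirp-like template and a hand-chosen injection point rather than a real merger waveform:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "template": a short chirp-like waveform (illustrative only, not a real
# black-hole merger prediction).
t = np.linspace(0.0, 1.0, 256)
template = np.sin(2 * np.pi * (5 + 15 * t) * t) * np.exp(-3 * (1 - t))

# The "haystack": noise with the template buried at a known offset,
# injected fairly loud so the demo recovers it reliably.
data = rng.normal(0.0, 1.0, 4096)
true_offset = 1800
data[true_offset:true_offset + template.size] += 2.0 * template

# Matched filter: correlate the template against every offset, peak = detection.
scores = np.correlate(data, template, mode="valid")
found = int(np.argmax(scores))
print(found)  # recovered offset, at or very near true_offset
```

Real searches do this against template banks of thousands of predicted waveforms, which is exactly the large in-memory pattern-recognition workload Randy is describing.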
There's new types of analysis that we want to do, so we want to be able to look at this data in a very flexible way and ingest it and explore new ideas more quickly, because things are happening so fast. So that's why we've adopted this in-memory paradigm for a number of years now, and the latest incarnation of this is the HPE Superdome Flex, and that's a shared memory system, so you can just pull in all your data and explore it without carefully programming how the memory is distributed around. We find this is very easy for our users, to develop data analytic pipelines, to develop their new theoretical models, and to compare the two on a single system. It's also very easy for new users to use. You don't have to be an advanced programmer to get going, you can just stay with the science, in a sense. >> You gotta have a PhD in Physics to do great Physics, you don't have to have a PhD in Physics and technology. >> That's right, yeah, it's a very flexible architecture with which to program, so you can more or less develop your pipeline on a laptop, take it to the Superdome and then scale it up to these huge memory problems. >> And get it done fast and you can iterate. >> You know, these are the most brilliant scientists in the world, bar none. I made the analogy the other day. >> Oh, thanks. >> You're supposed to say aw, shucks. >> Peter: Aw, shucks. >> Present company excepted. >> Oh yeah, that's right. >> I made the analogy of, imagine I.M. Pei or Frank Lloyd Wright or someone had to be their own general contractor, right? No, they're brilliant at designing architectures and imagining things that no one else could imagine, and then they had people to go do that. This allows the people to focus on the brilliance of the science without having to go become the expert programmer. We see that in business too.
Parallel programming techniques are difficult, spoken like an old Tandem guy; parallelism is hard, but to the extent that you can free yourself up and focus on the problem and not have to mess around with that, it makes life easier. Some problems parallelize well, but a lot of them don't need to be, and you can allow the data to shine, you can allow the science to shine. >> Is it correct that the barrier in your ability to reach a conclusion or make a discovery is the ability to find that needle in a haystack, or maybe there are many, but... >> Well, if you're talking about obstacles to progress, I would say computational power isn't the obstacle. It's developing the software pipelines, and it's the human personnel, the smart people writing the codes that can look for the needle in the haystack, who have the efficient algorithms to do that, and if they're hobbled by having to think very hard about the hardware and the architecture they're working with and how they've parallelized the problem, that slows them down. Our philosophy is much more that you solve the problem, you validate it; it can be quite inefficient if you like, but as long as it's a working program that gets you where you want, then in your second stage you worry about making it efficient, putting it on accelerators, putting it on GPUs, making it go really fast. That's why for many years now we've bought these very flexible shared memory, or in-memory is the new word for it, in-memory architectures, which allow new users, graduate students, to come straight in without a Master's degree in high performance computing; they can start to tackle problems straight away.
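A minimal illustration of the shared-memory point (sizes here are invented for the sketch): on a fat node the whole dataset sits in one address space, so exploratory analysis is ordinary array code with no explicit domain decomposition or message passing, and the same script scales from laptop-sized data to terabyte-sized data by changing only `n`.

```python
import numpy as np

# On a shared-memory ("fat node") machine, the entire dataset lives in one
# address space, so analysis is plain array code -- no partitioning, no MPI
# send/receive. Only `n` changes between a laptop and a big node.
n = 1_000_000  # kept tiny here so the sketch runs anywhere (assumed size)
data = np.random.default_rng(1).normal(size=n)

# Global reductions and searches see the full array directly:
mean = float(data.mean())
outliers = int((data > 3.0).sum())  # rare events found in one global scan
print(round(mean, 4), outliers)
```

On a distributed-memory cluster, the same two lines would require partitioning the array across nodes and combining partial results, which is exactly the programming burden being avoided.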
>> It's interesting, we hear the same, you talk about it at the outer reaches of the universe, I hear it at the inner reaches of the universe from the life sciences companies, we want to map the genome and we want to understand the interaction of various drug combinations with that genetic structure to say can I tune exactly a vaccine or a drug or something else for that patient's genetic makeup to improve medical outcomes? The same kind of problem, I want to have all this data that I have to run against a complex genome sequence to find the one that gets me to the answer. From the macro to the micro, we hear this problem in all different sorts of languages. >> One of the things we have our clients, mainly in business asking us all the time, is with each, let me step back, as analysts, not the smartest people in the world, as you'll attest I'm sure for real, as analysts, we like to talk about change and we always talked about mainframe being replaced by minicomputer being replaced by this or that. I like to talk in terms of the problems that computing's been able to take on, it's been able to take on increasingly complex, challenging, more difficult problems as a consequence of the advance of technology, very much like you're saying, the advance of technology allows us to focus increasingly on the problem. What kinds of problems do you think physicists are gonna be able to attack in the next five years or so as we think about the combination of increasingly powerful computing and an increasingly simple approach to use it? >> I think the simplification you're indicating here is really going to more memory. 
Holding your whole workload in memory. One of the biggest bottlenecks we find is ingesting the data and then writing it out, but if you can do everything at once, that's the key element. So one of the things we've been working on a great deal is in situ visualization, for example, so that you see the black holes coming together and you see that you've set the right parameters, they haven't missed each other or something's gone wrong with your simulation, so that you do the post-processing at the same time and you never need the intermediate data products. So, larger and larger memory, and the computational power that balances with that large memory. It's all very well to get a fat node, but not if you don't have the computational power to use all those terabytes, so that's why this in-memory architecture of the Superdome Flex is much more balanced between the two. What are the problems that we're looking forward to in terms of physics? Well, in cosmology we're looking for these hints about the origin of the universe, and we've made a lot of progress analyzing the Planck satellite data about the cosmic microwave background. We're homing in on theories of inflation, which is where all the structure in the universe comes from, from Heisenberg's uncertainty principle, a rapid period of expansion in the very early universe, just like inflation in the financial markets. And so we're trying to identify, can we distinguish between different types, and are they gonna tell us whether the universe comes from a higher dimensional theory, ten dimensions, gets reduced to three plus one, or lots of clues like that. We're looking for statistical fingerprints of these different models.
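The in situ idea can be sketched as follows; the "dynamics" and sizes are invented placeholders. The point is only that each simulation step is analysed in memory as it is produced, and only a small derived product is kept, so the large intermediate snapshots never need to be written to disk.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "simulation": evolve a field for a few steps. With in situ analysis we
# compute the summary we care about at each step instead of writing every
# intermediate snapshot out -- the I/O bottleneck described above.
field = rng.normal(size=100_000)
energy_history = []
for step in range(10):
    field = 0.99 * field + 0.01 * rng.normal(size=field.size)  # fake dynamics
    energy_history.append(float(np.mean(field**2)))            # analyse in place

# Only the small derived product survives; the big snapshots never hit disk.
print(len(energy_history), round(energy_history[-1], 3))
```

For a real relativity code, the per-step analysis would be rendering or reducing the fields (the visualization mentioned above) rather than a mean-square energy, but the data-flow pattern is the same.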
In gravitational waves of course, this whole new area; we think of the cosmic microwave background as a photograph of the early universe, but in fact gravitational waves look right back to the earliest moments, fractions of a nanosecond after the big bang, and so it may be that the answers, the clues that we're looking for, come from gravitational waves. And of course there's so much in astrophysics that we'll learn about compact objects, about neutron stars, about the most energetic events there are in the whole universe. >> I never thought about the idea, because the cosmic radiation background goes back what, about 300,000 years, if that's right. >> Yeah, that's right, you're very well informed; 400,000 years, because 300 is... >> Not that well informed. >> 370,000. >> I never thought about the idea of gravitational waves as being noise from the big bang, and you make sense with that. >> Well, with the cosmic microwave background we're actually looking for a primordial signal from the big bang, from inflation, so it's, yeah. Well anyway, what were you gonna say, Randy? >> No, I just, it's amazing the frontiers we're heading down. It's kind of an honor to be able to enable some of these things. I've spent 30 years in the technology business and heard customers tell me, you transformed my business, or you helped me save costs, you helped me enter a new market. Never before in 30 plus years of being in this business have I had somebody tell me the things that you're providing are helping me understand the origins of the universe. It's an honor to be affiliated with you guys. >> Oh no, the honor's mine, Randy; you're producing the hardware, the tools that allow us to do this work. >> Well now the honor's ours for coming onto the Cube. >> That's right. How do we learn more about your work and your discoveries, conclusions? >> In terms of looking at... >> Are there popular authors we could read other than Stephen Hawking?
>> Well, read Stephen's books, they're very good. He's got a new one called A Briefer History of Time, so it's more accessible than A Brief History of Time. >> So your website is... >> Yeah, our website is ctc.cam.ac.uk, the Center for Theoretical Cosmology, and we've got some popular pages there. We've got some news stories about the latest things that have happened, like the HPE partnership that we're developing, and some nice videos about the work that we're doing, actually, very nice videos of that. >> Certainly, there were several videos run here this week that, if people haven't seen them, go out; they're available on YouTube, they're available at your website, they're on Stephen's Facebook page also, I think. >> Can you share that website again? >> Well, actually you can get the beautiful videos of Stephen and the rest of his group on the Discover website, is that right? >> I believe so. >> So that's at the HPE Discover website, but your website is? >> ctc.cam.ac.uk, and we're just about to upload those videos ourselves. >> Can I make a marketing suggestion? >> Yeah. >> Simplify that. >> Ctc.cam.ac.uk. >> Yeah, right, thank you. >> We gotta get the Cube at one of these conferences, one of these physics conferences, and talk about gravitational waves. >> Bone up a little bit; you're kind of embarrassing us here, 100,000 years off. >> He's better informed than you are. >> You didn't need to remind me, sir. Thanks very much for coming on the Cube, great pleasure having you today. >> Thank you. >> Keep it right there everybody, Mr. Universe and I will be back after this short break. (upbeat techno music)

Published Date : Nov 29 2017



KubeCon + CloudNativeCon 2022 Preview w/ @Stu


 

>> KubeCon + CloudNativeCon kicks off in Detroit on October 24th, and we're pleased to have Stu Miniman, who's the director of Market Insights for hybrid platforms at Red Hat, back in the studio to help us understand the key trends to look for at the event. Stu, welcome back, like old, old, old... >> Home. Thank you, David. It's great to, great to see you, and I always love doing these previews. Even though, Dave, come on, how many years have I told you, CloudNativeCon, it's a hoodie crowd. They're gonna totally call you out for wearing a tie and things like that. I, I know you want to be an ESPN sportscaster, but you know, I, I still don't think, even after all the years this show's been around, that there's gonna be too many ties in Detroit. >> I know, I left the hoodie in my office. I'm sorry folks, but hey, we'll just have to go for it. Okay. Containers generally, and Kubernetes specifically, continue to show very strong spending momentum in the ETR survey data. So let's bring up this slide that shows the ETR sectors, all the sectors in the taxonomy, with net score, or spending velocity, on the vertical axis and pervasiveness on the horizontal axis. Now, that red dotted line that you see marks the elevated 40% mark; anything above that is considered highly elevated in terms of momentum. Now, for years the big four areas of momentum that shine above all the rest have been cloud, containers, RPA, and ML/AI. For the first time in 10 quarters, ML/AI and RPA have dropped below the 40% line, leaving only cloud and containers in rarefied air. Now, Stu, I'm sure this data doesn't surprise you, but what do you make of this? >> Yeah, well, well, Dave, I, I did an interview with Deepak, who owns all the container and open source activity at Amazon, earlier this year, and his comment was, the default deployment mechanism in Amazon is containers.
So when I look at your data and I see containers and cloud going in sync, yeah, that, that's, that's how we see things. We're helping lots of customers in their overall adoption. And this cloud native ecosystem is still, you know, we're still in that Cambrian explosion of new projects, new opportunities; AI's a great workload for these types of technologies. So it's really becoming pervasive in the marketplace. >> And, and I feel like the cloud and containers go hand in hand, so it's not surprising to see those two above the 40%. >> You know, there, there's nothing to say that, look, can I run my containers in my data center and not do the public cloud? Sure. But in the public cloud, the default is the container. And one of the hot discussions we've been having in this ecosystem for a number of years is edge computing. And of course, you know, I want something that's small and lightweight and can do things really fast. A lot of times it's an AI workload out there, and containers is a great fit at the edge too. So wherever it goes, containers is a good fit, which has been keeping my group at Red Hat pretty busy. >> So let's talk about some of those high level stats that we put together as a preview for the event. It's really around the adoption of open source software and Kubernetes. Here's, you know, a few fun facts. According to the State of Enterprise Open Source report, which was published by Red Hat, although it was based on a blind survey, nobody knew that Red Hat was, you know, initiating it, 80% of IT execs expect to increase their use of enterprise open source software. Now, the CNCF community currently has more than 120,000 developers. That's insane when you think about that developer resource. 73% of organizations in the most recent CNCF annual survey are using Kubernetes. Now, despite the momentum, according to that same Red Hat survey, adoption barriers remain for some organizations.
Stu, I'd love you to talk about this, specifically around skill sets, and then we've highlighted some of the other trends that we expect to see at the event. Stu, I'd love to get your thoughts on the preview. You've done a number of these events: automation, security, governance at scale, edge deployments, which you just mentioned, among others. Now Kubernetes is eight years old, and I always hear people talking about there's something coming beyond Kubernetes, but it looks like we're just getting started. >> Yeah, Dave, it, it is still relatively early days. The CNCF survey, I think, said, you know, 96% of companies, when CNCF surveyed them last year, were either deploying Kubernetes or had plans to deploy it. But when I talk to enterprises, nobody has said, like, hey, we've got every group on board and all of our applications are on it. It is a multi-year journey for most companies, and plenty of them, if you, you look at the general adoption of technology, we're still working through kind of that early majority. We, you know, passed the, the chasm a couple of years ago. But to a point you and I were talking about, in this ecosystem there are plenty of people that could care less about containers and Kubernetes. Lots of conversations at this show won't even talk about Kubernetes. You've got, you know, a big security group that's in there.
We slowed it down from four times a year as an industry, but there's, there's still lots of innovation happening, lots of adoption, and oh my gosh, Dave, I mean, there's just no shortage of new projects and new people getting involved. And what's phenomenal about it is there's, you know, end user practitioners that aren't just contributing. But many of the projects were spawned out of work by the likes of Intuit and Spotify and, and many others that created some of the projects that sit alongside or above the, the, you know, the container orchestration itself. >>So before we talked about some of that, it's, it's kind of interesting. It's like Kubernetes is the big dog, right? And it's, it's kind of maturing after, you know, eight years, but it's still important. I wanna share another data point that underscores the traction that containers generally are getting in Kubernetes specifically have, So this is data from the latest ETR survey and shows the spending breakdown for Kubernetes in the ETR data set for it's cut for respondents with 50 or more citations in, in by the IT practitioners that lime green is new adoptions, the forest green is spending 6% or more relative to last year. The gray is flat spending year on year, and those little pink bars, that's 6% or down spending, and the bright red is retirements. So they're leaving the platform. And the blue dots are net score, which is derived by subtracting the reds from the greens. And the yellow dots are pervasiveness in the survey relative to the sector. So the big takeaway here is that there is virtually no red, essentially zero churn across all sectors, large companies, public companies, private firms, telcos, finance, insurance, et cetera. So again, sometimes I hear this things beyond Kubernetes, you've mentioned several, but it feels like Kubernetes is still a driving force, but a lot of other projects around Kubernetes, which we're gonna hear about at the show. >>Yeah. So, so, so Dave, right? 
First of all, there was, for a number of years, this, like, oh wait, you know, don't waste your time on, on containers, because serverless is gonna rule the world. Well, serverless is now a little bit of a broader term. Can I give a serverless viewpoint to my developers so that they don't need to think about the infrastructure but still have containers underneath it? Absolutely. So our friends at Amazon have a solution called Fargate, their proprietary offering to kind of hide that piece of it. And in the open source world, there's a project called Knative; I think it's the second or third KnativeCon that's gonna happen at the CNCF event. And even if you use this, I can still call things over on Lambda and use some of those functions. So we know, Dave, it is additive, and nothing ever dominates the entire world and nothing ever dies. So we have, we have a long runway of activities still to go on in containers and Kubernetes. We're always looking for what that next thing is. And what's great about this ecosystem is most of it tends to be additive and plug into the pieces there; there are certain tools that, you know, span beyond what can happen in the container world and aren't limited to it, and there are others that are specific to it. And to talk about the industries, Dave, you know, I love, we, we have a community event that we run that's gonna happen at KubeCon called OpenShift Commons. And when you look at who's speaking there, oh, we've got, you know, Lockheed Martin, University of Michigan and ING Bank all speaking there. So you look and it's like, okay, cool, I've got automotive, I've got, you know, public sector, I've got, you know, university education and I've got finance. So, all of you know, there is not an industry that is not touched by this. And the general wave of software adoption is the reason why, you know, not just adoption, but the creation of new software is one of the differentiators for companies.
And that is the reason why I do containers: not because it's some cool technology and Kubernetes is great to put on my resume, but because it can actually accelerate my developers and help me create technology that makes me responsive to my business and my ultimate end users. >>Well, and as you know, we've been talking about the Supercloud a lot, and Kubernetes is clearly an enabler of Supercloud. But I wanted to go back: you and John Furrier have done so many of the KubeCons, but go back to DockerCon, before Kubernetes was even a thing. And so you sort of saw this grow. I think, what, how many projects are in the CNCF now? I mean, hundreds. Hundreds, okay. And so will we hear things in Detroit, things like new projects, you know, Argo, and capabilities around Sigstore and things like that? Well, you're gonna hear a lot about that. Or is it just too much to cover? >>So, I mean, the good news, Dave, is that the CNCF really is a good steward for this community, and new things do get in. There's so much going on with the existing projects that some of the new ones sometimes have a harder time making a little bit of buzz. One of the more interesting ones is a project that's been around for a while; I think back to the first couple of KubeCons that John and I did: service mesh, and Istio, which was created by Google but lived under what I guess you would call a Google-dominated governance for a number of years, is now finally under the CNCF. I talked to a number of companies over the years, and definitely many of the contributors, who didn't love that it was a Google-run thing, and now it is finally part of the foundation. So just like Kubernetes, we have Istio, and also Knative, which I mentioned before, also came out of Google, and those are all in the CNCF. So will there be new projects? Yes.
The CNCF sometimes does matchmaking. So in some of the observability space, there were a couple of projects where they said, hey, maybe you can merge down the road, and they ended up doing that. Still, you look at all these projects, and if I were an end user I'd be saying, oh my God, there is so much change and so many projects, I can't spend the time and effort to learn about all of these. That's one of the challenges, and something we obviously spend a lot of time on at Red Hat: not to pick winners, but to figure out which are the things that customers need, where we can help make them run in production for our customers, and how to bring some stability and a little bit of security to the overall ecosystem. >>Well, speaking of security: security and skill sets, we've talked about those two things, and they sort of go hand in hand. When I go to security events, I mean, we were at re:Inforce last summer, and we were just recently at the CrowdStrike event, a lot of the discussion is sort of best practice, because it's so complicated. And I presume you're gonna hear a lot of that here, because securing containers now, the whole shift-left thing, and shield right, is a complicated matter, especially when you saw in the earlier data from the Red Hat survey that the gaps are around skill sets. People don't have the skills. So should we expect to hear a lot about that, a lot of sort of how-to, how to take advantage of some of these new capabilities?
I've talked a number of times with you since I've been at Red Hat about the cloud services that we offer. So you want to use our offering in the public cloud. Our first recommendation is, hey, we've got cloud services, how much Kubernetes do you really want to learn versus you want to do what you can build on top of it, modernize the pieces and have less running the plumbing and electric and more, you know, taking advantage of the, the technologies there. So that's a big thing we've seen, you know, we've got a big SRE team that can manage that for use so that you have to spend less time worrying about what really is un differentiated heavy lifting and spend more time on what's important to your business and your >>Customers. So, and that's, and that's through a managed service. >>Yeah, absolutely. >>That whole space is just taken off. All right, Stu I'll give you the final word. You know, what are you excited about for, for, for this upcoming event and Detroit? Interesting choice of venue? Yeah, >>Look, first of off, easy flight. I've, I've never been to Detroit, so I'm, I'm willing to give it a shot and hopefully, you know, that awesome airport. There's some, some, some good things there to learn. The show itself is really a choose your own adventure because there's so much going on. The main show of QAN and cloud Native Con is Wednesday through Friday, but a lot of a really interesting stuff happens on Monday and Tuesday. So we talked about things like OpenShift Commons in the security space. There's cloud Native Security Day, which is actually two days and a SIG store event. There, there's a get up show, there's, you know, k native day. There's so many things that if you want to go deep on a topic, you can go spend like a workshop in some of those you can get hands on to. And then at the show itself, there's so much, and again, you can learn from your peers. 
So it was good to see. During the pandemic, it tilted a little more vendor heavy, because I think most practitioners were pretty busy focused on what they could work on, and less on, okay, hey, I'm gonna put together a presentation, and maybe I'm restricted from going to a show. We definitely saw that last year; when I went to LA, I was disappointed how few customer sessions there were. It's back: when I go look through the schedule now, there are way more end users sharing their stories, and it's phenomenal to see that. And the hallway track, Dave: I didn't go to Valencia, but I hear it was really hopping, felt way more like it was pre-pandemic. And while there are a few people who probably won't come because it's Detroit, what we've heard from the CNCF team is that they are expecting a sizable group up there. I know a lot of the hotels right near where it's being held are all sold out. So it should be a lot of fun. Good thing I'm speaking on an edge panel. First time I get to be a speaker at the show, Dave; it's kind of interesting to be in a little bit of a different role. >>So yeah, Detroit's super convenient, as I said. Awesome airports too. Good luck at the show. So it's a full week. The Cube will be there for three days: Tuesday, Wednesday, Thursday. Thanks for coming. >>Wednesday, Thursday, Friday, sorry. >>Wednesday, Thursday, Friday is the Cube, right? So thank you for that. >>And no ties from the host. >>No ties, only hoodies. All right, Stu, thanks. Appreciate you coming in. And thank you for watching this preview of KubeCon plus CloudNativeCon with Stu, which again starts the 24th of October, three days of broadcasting. Go to thecube.net and you can see all the action. We'll see you there.

Published Date : Oct 4 2022


The University of Edinburgh and Rolls Royce Drive in Exascale Style | Exascale Day


 

>>Welcome. My name is Ben Bennett. I am the director of HPC Strategic Programs here at Hewlett Packard Enterprise. It is my great pleasure and honor to be talking to Professor Mark Parsons from the Edinburgh Parallel Computing Centre. And we're gonna talk a little about exascale: what it means. We're gonna talk less about the technology and more about the science, the requirements, and the need for exascale, rather than a deep dive into the enabling technologies. Mark, welcome. >>Hi Ben, thanks very much for inviting me to talk to you. >>Complete pleasure. So I'd like to kick off with, I suppose, quite an interesting look back. You and I are both of a certain age, 25 plus, and we've seen these milestones, the milestones of high performance computing, come and go: a gigaflop back in 1987, a teraflop in '97, a petaflop in 2008. But we seem to be taking longer in getting to an exaflop. So I'd like your thoughts: why is an exaflop taking so long? >>So I think that's a very interesting question, because I started my career in parallel computing in 1989, around when EPCC was set up, in 1990; you know, we're 30 years old this year. And the fastest computer we had then was 800 megaflops, just under a gigaflop. So in my career, by the time we reached petascale we'd already gone pretty much a million times faster, and the step from a teraflop to a petascale system really didn't feel particularly difficult. And yet the step from a petascale system to an exascale system is a really, really big challenge. I think it's actually related to what's happened with computer processors over the last decade, where individual cores, like the ones on your laptop, haven't got much faster; we've just got more of them. So there's the perception of more speed, but it's actually being delivered by more cores.
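Parsons' point that speed now comes from more cores rather than faster ones is the crux of the exascale software problem, and a quick Amdahl's-law sketch shows why (the serial fraction and core count below are illustrative numbers, not measurements from any real code):

```python
def amdahl_speedup(serial_fraction, cores):
    """Amdahl's law: the best possible speedup on `cores` processors
    when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# Even a tiny 0.001% serial fraction caps the speedup far below
# the ideal on an exascale-sized machine of five million cores:
ideal = 5_000_000
actual = amdahl_speedup(0.00001, ideal)
print(f"{actual:,.0f}x out of an ideal {ideal:,}x")  # caps out near 98,000x
```

This is why Parsons frames exascale as a user challenge as much as a hardware one: squeezing the last serial fractions out of real codes is what makes millions of cores actually usable.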
And as you go down that approach, you know what happens in the supercomputing world as well. In 2010, I think we had systems that were a few thousand cores. Our main national service in the UK for the last eight years has had 118,000 cores. But looking at exascale, we're looking at four or five million cores, and taming that level of parallelism is the real challenge. That's why it's taking an enormous amount of time to deliver these systems. And it's not just on the hardware front, where vendors like HPE have to deliver world-beating technology, and it's hard. There's also the challenge to the users: how do they get their codes to work in the face of that much parallelism? >>If you look at the complexity of delivering an exaflop: you could have bought an exaflop three or four years ago. You couldn't have housed it, you couldn't have powered it, you couldn't have afforded it, and you couldn't have programmed it. But you still could have bought one; we should have been so lucky as to be able to supply it. The software, I think, from our standpoint, is where we're doing more enabling with our customers. You sell them a machine, and then the need for collaboration seems to be more and more around the software. So it's gonna be relatively easy to get one exaflop using Linpack, but that's not exascale. So what do you think an exascale machine, versus an exaflop machine, means to people like yourself, to your users, the scientists, and industry? What is an exaflop versus an exascale? >>So I think, you know, supercomputing moves forward by setting itself challenges, and when you look at all of the exascale programs worldwide that are trying to deliver systems that can do an exaflop or more, it's actually a very arbitrary challenge.
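Ben's "one exaflop using Linpack" and the theoretical-peak numbers Parsons goes on to discuss rest on simple arithmetic (the machine parameters below are invented for illustration, not the specifications of any real system):

```python
# One exaflop = 10^18 double-precision (64-bit) floating-point
# operations per second: "a billion billion" = 1e9 * 1e9.
EXAFLOP = 1e18

# Theoretical peak of a hypothetical machine:
# cores x clock rate x FLOPs retired per core per cycle.
cores = 8_000_000        # illustrative core count
clock_hz = 2.0e9         # 2 GHz
flops_per_cycle = 64     # assumed wide vector units with fused multiply-add

rpeak = cores * clock_hz * flops_per_cycle
print(rpeak / EXAFLOP)   # -> 1.024 theoretical exaflops
```

As the conversation notes, sustained benchmark results land below this theoretical peak, and real scientific workloads typically land well below the tuned benchmark numbers.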
You know, we set ourselves a petascale challenge, delivering a petaflop, and somebody managed that. The world moves forward by setting itself challenges, and I think we use a quite arbitrary definition of what we mean by an exaflop. So in your world and my world, first of all, a flop is a computation, a multiply or an add or whatever, and we tend to look at that using very high precision, 64-bit numbers. We then say, well, you've got to do a billion billion of those calculations every second. Now, that's a somewhat arbitrary target. Today, from HPE, I can buy a system that will do a billion billion calculations per second, and it will either do that as a theoretical peak, which would be almost unattainable, or using benchmarks that stress the system and demonstrate a sustained rate. But again, those benchmarks themselves are tuned to just do those calculations and deliver an exaflop in a steady way, if you like. So we've kind of set ourselves this big challenge, the big fence on the racecourse which we're clambering over, but the challenge itself actually should be much more interesting: what are we going to use these devices for, having built them? So getting into the exascale era is not so much about doing an exaflop. It's a new generation of capability that allows us to do better scientific and industrial research. And that's the interesting bit in this whole story. >>I would tend to agree with you. I think the focus around exascale is to look at new technologies, new ways of doing things, new ways of looking at data, and to get new results. So eventually you will get yourself an exascale machine. One hopes, sooner rather
But there are people outside the door over there who would love to sell you one. Yes. However, if we if you look at your you know your your exa scale machine, Um, how do you believe the workloads are going to be different on an extra scale machine versus your current PETA scale machine? >>So I think there's always a slight conceit when you buy a new national supercomputer. On that conceit is that you're buying a capability that you know on. But many people will run on the whole system. Known truth. We do have people that run on the whole of our archer system. Today's A 118,000 cores, but I would say, and I'm looking at the system. People that run over say, half of that can be counted on Europe on a single hand in a year, and they're doing very specific things. It's very costly simulation they're running on. So, you know, if you look at these systems today, two things show no one is. It's very difficult to get time on them. The Baroque application procedures All of the requirements have to be assessed by your peers and your given quite limited amount of time that you have to eke out to do science. Andi people tend to run their applications in the sweet spot where their application delivers the best performance on You know, we try to push our users over time. Thio use reasonably sized jobs. I think our average job says about 20,000 course, she's not bad, but that does mean that as we move to the exits, kill two things have to happen. One is actually I think we've got to be more relaxed about giving people access to the system, So let's give more people access, let people play, let people try out ideas they've never tried out before. And I think that will lead to a lot more innovation and computational science. But at the same time, I think we also need to be less precious. You know, we to accept these systems will have a variety of sizes of job on them. You know, we're still gonna have people that want to run four million cores or two million cores. 
That's absolutely fine; I absolutely salute those people for trying something really, really difficult. But then we're gonna have a huge spectrum of users, all the way down to people who want to run on 500 cores or whatever. So I think we need to broaden the user base on an exascale system, and I know this is what's happening, for example, in Japan with the new Japanese system. >>So, Mark, if you cast your mind back to almost exactly a year ago, after the HPC User Forum, you were interviewed for Primeur Magazine, and you alluded in that article to the needs of scientific and industrial users requiring an exaflop, or an exascale machine; it's clear in your previous answer regarding the workloads. Some would say that the majority of people would be happier with, say, ten 100-petaflop machines: democratization, more people, more access. But can you provide us examples of the type of science, the needs of industrial users, that actually do require those resources to be put
C in the UK, which is jointly with Rolls Royce, jointly funded by Rolls Royce and also working with the University of Cambridge, Oxford, Bristol, Warrick. We're trying to do the whole engine gas turbine simulation for the first time. So that's looking at the structure of the gas turbine, the airplane engine, the structure of it, how it's all built it together, looking at the fluid dynamics off the air and the hot gasses, the flu threat, looking at the combustion of the engine looking how fuel is spread into the combustion chamber. Looking at the electrics around, looking at the way the engine two forms is, it heats up and cools down all of that. Now Rolls Royce wants to do that for 20 years. Andi, Uh, whenever they certify, a new engine has to go through a number of physical tests, and every time they do on those tests, it could cost them as much as 25 to $30 million. These are very expensive tests, particularly when they do what's called a blade off test, which would be, you know, blade failure. They could prove that the engine contains the fragments of the blade. Sort of think, continue face really important test and all engines and pass it. What we want to do is do is use an exa scale computer to properly model a blade off test for the first time, so that in future, some simulations can become virtual rather than having thio expend all of the money that Rolls Royce would normally spend on. You know, it's a fascinating project is a really hard project to do. One of the things that I do is I am deaf to share this year. Gordon Bell Price on bond I've really enjoyed to do. That's one of the major prizes in our area, you know, gets announced supercomputing every year. So I have the pleasure of reading all the submissions each year. I what's been really interesting thing? 
This is my third year doing being on the committee on what's really interesting is the way that big systems like Summit, for example, in the US have pushed the user communities to try and do simulations Nowhere. Nobody's done before, you know. And we've seen this as well, with papers coming after the first use of the for Goku system in Japan, for example, people you know, these are very, very broad. So, you know, earthquake simulation, a large Eddie simulations of boats. You know, a number of things around Genome Wide Association studies, for example. So the use of these computers spans of last area off computational science. I think the really really important thing about these systems is their challenging people that do calculations they've never done before. That's what's important. >>Okay, Thank you. You talked about challenges when I nearly said when you and I had lots of hair, but that's probably much more true of May. Um, we used to talk about grand challenges we talked about, especially around the teraflop era, the ski red program driving, you know, the grand challenges of science, possibly to hide the fact that it was a bomb designing computer eso they talked about the grand challenges. Um, we don't seem to talk about that much. We talk about excess girl. We talk about data. Um Where are the grand challenges that you see that an exa scale computer can you know it can help us. Okay, >>so I think grand challenges didn't go away. Just the phrase went out of fashion. Um, that's like my hair. I think it's interesting. The I do feel the science moves forward by setting itself grand challenges and always had has done, you know, my original backgrounds in particle physics. I was very lucky to spend four years at CERN working in the early stage of the left accelerator when it first came online on. Do you know the scientists there? I think they worked on left 15 years before I came in and did my little ph d on it. 
Andi, I think that way of organizing science hasn't changed. We just talked less about grand challenges. I think you know what I've seen over the last few years is a renaissance in computational science, looking at things that have previously, you know, people have said have been impossible. So a couple of years ago, for example, one of the key Gordon Bell price papers was on Genome Wide Association studies on some of it. If I may be one of the winner of its, if I remember right on. But that was really, really interesting because first of all, you know, the sort of the Genome Wide Association Studies had gone out of favor in the bioinformatics by a scientist community because people thought they weren't possible to compute. But that particular paper should Yes, you could do these really, really big Continental little problems in a reasonable amount of time if you had a big enough computer. And one thing I felt all the way through my career actually is we've probably discarded Mawr simulations because they were impossible at the time that we've actually decided to do. And I sometimes think we to challenge ourselves by looking at the things we've discovered in the past and say, Oh, look, you know, we could actually do that now, Andi, I think part of the the challenge of bringing an extra service toe life is to get people to think about what they would use it for. That's a key thing. Otherwise, I always say, a computer that is unused to just be turned off. There's no point in having underutilized supercomputer. Everybody loses from that. >>So Let's let's bring ourselves slightly more up to date. We're in the middle of a global pandemic. Uh, on board one of the things in our industry has bean that I've been particularly proud about is I've seen the vendors, all the vendors, you know, offering up machine's onboard, uh, making resources available for people to fight things current disease. Um, how do you see supercomputers now and in the future? 
Speeding up things like vaccine discovery on help when helping doctors generally. >>So I think you're quite right that, you know, the supercomputer community around the world actually did a really good job of responding to over 19. Inasmuch as you know, speaking for the UK, we put in place a rapid access program. So anybody wanted to do covert research on the various national services we have done to the to two services Could get really quick access. Um, on that, that has worked really well in the UK You know, we didn't have an archer is an old system, Aziz. You know, we didn't have the world's largest supercomputer, but it is happily bean running lots off covert 19 simulations largely for the biomedical community. Looking at Druk modeling and molecular modeling. Largely that's just been going the US They've been doing really large uh, combinatorial parameter search problems on on Summit, for example, looking to see whether or not old drugs could be reused to solve a new problem on DSO, I think, I think actually, in some respects Kobe, 19 is being the sounds wrong. But it's actually been good for supercomputing. Inasmuch is pointed out to governments that supercomputers are important parts off any scientific, the active countries research infrastructure. >>So, um, I'll finish up and tap into your inner geek. Um, there's a lot of technologies that are being banded around to currently enable, you know, the first exa scale machine, wherever that's going to be from whomever, what are the current technologies or emerging technologies that you are interested in excited about looking forward to getting your hands on. >>So in the business case I've written for the U. K's exa scale computer, I actually characterized this is a choice between the American model in the Japanese model. Okay, both of frozen, both of condoms. Eso in America, they're very much gone down the chorus plus GPU or GPU fruit. 
Um, so you might have, you know, an Intel Xeon or an M D process er center or unarmed process or, for that matter on you might have, you know, 24 g. P. U s. I think the most interesting thing that I've seen is definitely this move to a single address space. So the data that you have will be accessible, but the G p u on the CPU, I think you know, that's really bean. One of the key things that stopped the uptake of GPS today and that that that one single change is going Thio, I think, uh, make things very, very interesting. But I'm not entirely convinced that the CPU GPU model because I think that it's very difficult to get all the all the performance set of the GPU. You know, it will do well in H p l, for example, high performance impact benchmark we're discussing at the beginning of this interview. But in riel scientific workloads, you know, you still find it difficult to find all the performance that has promised. So, you know, the Japanese approach, which is the core, is only approach. E think it's very attractive, inasmuch as you know They're using very high bandwidth memory, very interesting process of which they are going to have to, you know, which they could develop together over 10 year period. And this is one thing that people don't realize the Japanese program and the American Mexico program has been working for 10 years on these systems. I think the Japanese process really interesting because, um, it when you look at the performance, it really does work for their scientific work clothes, and that's that does interest me a lot. This this combination of a A process are designed to do good science, high bandwidth memory and a real understanding of how data flows around the supercomputer. I think those are the things are exciting me at the moment. Obviously, you know, there's new networking technologies, I think, in the fullness of time, not necessarily for the first systems. 
You know, over the next decade we're going to see much, much more activity on silicon photonics; I think that's really fascinating. In some respects, the last decade has just been quite incremental improvements, but I think supercomputing is at a very, very disruptive moment again. And that goes back to the start of this discussion: why has exascale been difficult to get to? Actually, because it's a disruptive moment in technology. >>Professor Parsons, thank you very much for your time and your insights. >>Thank you. Pleasure. >>And folks, thank you for watching. I hope you've learned something, or at least enjoyed it. With that, I would ask you to stay safe, and goodbye.

Published Date : Oct 16 2020

SUMMARY :

I am the director of HPC Strategic programs I suppose that the S I milestones of high performance computing's come and go, But looking at the X scale we're looking at, you know, four or five million cores on taming But you still you could have You could have bought one. challenges e think you know, we use quite arbitrary focus around exa scale is to look at, you know, new technologies, Well, I'm sure you don't tell me one, Ben. outside the door over there who would love to sell you one. So I think there's always a slight conceit when you buy a you know, the workloads. That's one of the major prizes in our area, you know, gets announced you know, the grand challenges of science, possibly to hide I think you know what I've seen over the last few years is a renaissance about is I've seen the vendors, all the vendors, you know, Inasmuch as you know, speaking for the UK, we put in place a rapid to currently enable, you know, I think you know, that's really bean. Professor Parsons, thank you very much for your time and your insights.


Physics Successfully Implements Lagrange Multiplier Optimization


 

>> Hello everybody. My title is Physics Implements Lagrange Multiplier Optimization. And let me be very specific about what I mean by this: in physics, there are a series of principles that are optimization principles, and we are just beginning to take advantage of them. For example, most famous in physics is the principle of least action. Of equal importance is the principle of least entropy generation. That is to say, a dissipative circuit will try to adjust itself to dissipate as little as possible. There are other concepts: first-to-gain-threshold, the variational principle, the adiabatic method, and not just simulated annealing but actual physical annealing. So let's look at some of these. One that I'm sure you probably know about is the principle of least time. And this is sort of illustrated by a lifeguard who is trying to save a swimmer and runs as fast as possible along the sand and finally jumps in the water. So it's like the refraction of light. The lifeguard is trying to get to the swimmer as quickly as possible and is trying to follow the path that takes the least amount of time. This of course occurs in optics, classical mechanics and so forth; it's the principle of least action. Let me show you another one: the principle of minimum power dissipation. Imagine you had a circuit like this, where the current was dividing unequally. Well, that would make you feel very uncomfortable. The circuit will automatically try to adjust itself so that the two branches, which are equal, actually are drawing equal amounts of current. If they are unequal, it will dissipate excess energy. So we talk about least power dissipation; a more sophisticated way of saying the same thing is least entropy production. This is actually the most common one of all. Here's one that's kind of interesting. People have made a lot of hay about this: you have lasers and you try to reach the threshold. And so you have different modes on the horizontal axis.
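The lifeguard picture above can be put into a few lines of code: brute-forcing the entry point that minimizes total travel time reproduces the refraction behavior. The positions and speeds here are invented for illustration.

```python
import math

def travel_time(x, run_speed=7.0, swim_speed=1.5):
    """Time for a lifeguard at (0, 10) on sand to reach a swimmer at
    (30, -10) in the water, entering the water at the point (x, 0)."""
    sand = math.hypot(x, 10.0) / run_speed
    water = math.hypot(30.0 - x, 10.0) / swim_speed
    return sand + water

# Apply the principle of least time by brute force over candidate entry points.
candidates = [i * 0.01 for i in range(3001)]  # x from 0.0 to 30.0
best_x = min(candidates, key=travel_time)
```

Because running is much faster than swimming, the optimal entry point lands well past the straight-line crossing at x = 15: the lifeguard covers more ground on sand and shortens the slow swim, exactly the Snell's-law trade-off the talk describes.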
And then one mode happens to have the lowest loss, and then all the energy goes into that mode. This is first-to-gain-threshold. This is also a type of minimization principle, because physics finds the mode with the lowest gain threshold. Now, what I'll show about this is that it's not as good as it seems, because even after you reach the gain threshold, there continues to be evolution among the modes. And so it's not quite as clear cut as it might seem. Here's the famous one, the variational principle. It says you have a trial wave function, the red one; it's no good because it has too much energy. The true wave function is illustrated in green. And that one physics finds automatically: it finds the situation where the wave function has the lowest energy. Here's one which is, of course, just physical annealing, which you could also think of as simulated annealing. And in simulated annealing, you add noise, or you raise the temperature, or do something else to jump out of local minima. So you do tend to get stuck in all of these methods. You tend to get stuck in local minima, and you have to find a strategy to jump out of those local minima, but certainly physical annealing actually promises to give you a global optimum. So we've got to keep that one in mind. And then there's the adiabatic method. And in the adiabatic method, you have modes. Now, I am one who believes that we could do this even classically, just with LC circuits. We have avoided crossings. And the avoided crossings are such that you start from a solvable problem, and then you go to a very-difficult-to-solve problem, and yet you stay in the ground state, and I'm sure you all know this. This is the adiabatic method. Some people think of it as quantum mechanical; it could be, but it's also classical. And what you're adjusting is one of the inductances in a complicated LC circuit.
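The simulated-annealing idea just described, accepting occasional uphill moves so the search can hop out of a local minimum, fits in a few lines. The tilted double-well function here is an invented example: it has a shallow local minimum near x = +1 and a deeper global one near x = -1.

```python
import math
import random

def f(x):
    """Tilted double well: shallow local minimum near +1, global one near -1."""
    return (x * x - 1.0) ** 2 + 0.2 * x

def anneal(start, steps=20000, t0=1.0, seed=0):
    rng = random.Random(seed)
    x = best = start
    for k in range(steps):
        t = t0 * (1.0 - k / steps) + 1e-6   # linear cooling schedule
        cand = x + rng.gauss(0.0, 0.5)      # random "thermal" kick
        delta = f(cand) - f(x)
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = cand
        if f(x) < f(best):
            best = x
    return best

best = anneal(start=1.0)   # start trapped in the shallow well
```

Early on, the high temperature lets the walker climb over the barrier between the wells; as the temperature falls, it settles into whichever basin it is in, which is why annealing promises, but cannot strictly guarantee, the global optimum.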
And this is sort of another illustration of the same thing, a little bit more complicated graph. You go from a simple Hamiltonian to a hard Hamiltonian, and you find a solution that way. So these are all minimization principles. Now, one of the preferred attributes is to have a digital answer, which we can get with bistable elements. Physics is loaded with bistable elements, starting with the flip-flop. And you can imagine somehow coupling them together. I show you here just resistors, but it's very important that you don't have a pure analog machine. You want to have a machine that provides digital answers, and the flip-flop is actually an analog machine, but it locks into a digital state. And so we want bistable elements that will give us binary answers. Okay, so having quickly gone through them, which of these is the best? So let's try to answer which of these is the best for doing optimization. Which physics principle might be the best? And so one of our nice problems that we like to solve is the Ising problem. And there's a way to set that up with circuits: you can have LC circuits and try to mimic the ferromagnetic case, where the two circuits are in phase, and so you try to lock them into either positive or negative phase. You can do that with parametric gain. You have classical parametric gain with a two-omega modulation on a capacitor, and it's bistable. And if you have crossed couplings, then the phases tend to be opposite, and so you tend to have anti-ferromagnetic coupling. So you can mimic it with these circuits, but there are so many ways to mimic it. So we'll see some more examples. Now, one of the main points I'm going to make today is that it's very easy to set up a physical system that not only does optimization, but also includes constraints, and the constraints we normally take into account with Lagrange multipliers. And this is sort of an explanation of Lagrange multipliers.
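The ferromagnetic versus anti-ferromagnetic coupling the circuits mimic can be checked directly on the Ising energy E(s) = -Σ J_ij s_i s_j with binary spins s_i = ±1. The brute-force sketch below is illustrative only: it enumerates the problem the circuits are built to solve, not the circuits themselves.

```python
from itertools import product

def ising_energy(spins, couplings):
    """E(s) = -sum over coupled pairs of J[i,j] * s_i * s_j."""
    e = 0.0
    for (i, j), jij in couplings.items():
        e -= jij * spins[i] * spins[j]
    return e

def ground_states(n, couplings):
    """Exhaustively enumerate all 2^n spin configurations."""
    configs = list(product([-1, +1], repeat=n))
    emin = min(ising_energy(s, couplings) for s in configs)
    return [s for s in configs if ising_energy(s, couplings) == emin]

# Ferromagnetic pair (J > 0): aligned spins minimize the energy.
ferro = ground_states(2, {(0, 1): +1.0})
# Anti-ferromagnetic pair (J < 0): opposite spins minimize the energy.
antiferro = ground_states(2, {(0, 1): -1.0})
```

Two in-phase oscillators correspond to the aligned ground states, and crossed couplings to the anti-aligned ones; the exponential cost of this enumeration is exactly why one wants physics to do the minimization instead.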
You're trying to go toward the absolute optimum here, but you run into the red constraint, so you get stopped right there. And the gradient of the constraint is opposite to the gradient of the merit function; they cancel each other. So this is standard stuff in college Lagrange multiplier calculus. So if physics does this, how does it do it? Well, it does it by steepest descent. We just follow it. Physics, for example, will try to go to the state of lowest power dissipation. So it goes, and it minimizes the dissipation, in blue, but also tries to satisfy the constraint. And then, finally, we find the optimum point in some multi-dimensional configuration space. Another way of saying it is that we go from some initial state to some final state, and physics does this for you for free, because it is always trying to reduce the entropy production, the power dissipation. And so I'm going to show you now five different schemes; actually I have about eight different schemes. And they all use the principle of minimum entropy generation, but not all of them recognize it. So here's some work from my colleague Roychowdhury here in my department, and he has these amplitude-stable oscillators, but they tend to lock into a phase, and in this way it's a natural for solving the Ising problem. But if you analyze it in detail, and I'll show you the link to the archive where we've shown this, this one is trying to satisfy the principle of minimum entropy generation, and it includes constraints. And the most important constraint for us is that we want a digital answer. So we want to have either a plus or a minus as the answer, and the parametric oscillator permits that. He's not using a parametric oscillator; he's using something a little different, but it's somewhat similar. He's using a sort of second-harmonic locking; it's similar to the parametric oscillator. And here's another approach from England, Cambridge University.
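The constrained steepest descent just described can be mimicked numerically on the talk's own two-branch circuit: minimize the power dissipation I₁²R₁ + I₂²R₂ while holding I₁ + I₂ at a fixed total, with a quadratic penalty standing in for the Lagrange multiplier term. The circuit values are invented for illustration.

```python
def dissipation_descent(r1, r2, i_total, mu=100.0, lr=0.002, steps=20000):
    """Gradient descent on P = i1^2*r1 + i2^2*r2 + mu*(i1 + i2 - i_total)^2.
    The mu term is a soft stand-in for the Lagrange multiplier constraint."""
    i1, i2 = 1.5, 0.2   # deliberately unbalanced starting currents
    for _ in range(steps):
        violation = i1 + i2 - i_total
        g1 = 2.0 * r1 * i1 + 2.0 * mu * violation
        g2 = 2.0 * r2 * i2 + 2.0 * mu * violation
        i1 -= lr * g1
        i2 -= lr * g2
    return i1, i2

# Equal resistors: minimum dissipation says the current splits equally.
i1, i2 = dissipation_descent(r1=1.0, r2=1.0, i_total=2.0)
```

With equal resistors the descent lands on the equal split, about 0.995 each here; the small shortfall from 1.0 is the softness of the quadratic penalty, which a true Lagrange multiplier formulation would remove.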
I have the symbol of the university here, and they got very excited. They have polaritons, exciton-polaritons, and they were very excited about that. But to us they're really just coupled electromagnetic modes, created by optical excitation. And they lock into definite phases, and, no big surprise, it also follows the same principle: it tends to lock in in such a way that it minimizes the power dissipation, and it is very easy to include the digital constraint in there. And so that's yet another example. Of course, all the examples I'm going to show you from the literature are all following the principle of minimum entropy generation. This is not always acknowledged by the authors. This is the Yamamoto Stanford approach; thank you very much for inviting me. So I've analyzed this one, and we think we know what's going on here. I think the quantum mechanical version could be very interesting, possibly. But the versions that are out there right now are dissipative, and there's dissipation in the optical fiber that's overcome by the parametric gain. And the net conclusion of this is that the different optical parametric oscillator pulses are trying to organize themselves in such a way as to minimize the power dissipation. So it's based upon minimum entropy generation, which for our purposes is synonymous with minimizing the power dissipation. And of course, it is very beautifully done. It is a very beautiful system, because it's time multiplexed and it locks in to digital answers. So that's very nice. Here's something different, not the Ising problem, from MIT. It is an optimizer, an optimizer for artificial intelligence. It uses silicon photonics and does unitary operations. We've gone through this very carefully. I'm sure to the people at MIT, they think they have something very unusual. But to us, this is usual. This is an example of minimizing the power dissipation.
As you go round over and over again through the silicon photonics, you end up minimizing the power dissipation. It's kind of surprising. And it's the principle of minimum entropy generation again. Okay. And this is from my own group, where we try to mimic the coherent Ising machine, except it's just electrical. And this is an anti-ferromagnetic configuration. If the resistors were this way, it would be a ferromagnetic configuration, and we can arrange that. So I've just done five of my examples; I think I could have done a few more, but we're running out of time. But all of these optimization approaches are similar in that they're based upon minimum entropy generation, which is, I don't want to say it's a law of physics, but it's accepted by many physicists, and you have different examples, including particularly MIT's optimizer for artificial intelligence. They all seem to take advantage of this type of physics. So they're all versions of minimum entropy generation. The physics hardware implements steepest descent physically, and because of the constraint, it produces a binary output, which is digital in the same sense that a flip-flop is digital. What's the promise? The promise is that the physics-based hardware will perform the same function at far greater speed and far less power dissipation. Now, the challenge of global optimization remains unsolved. I don't think anybody has a solution to the problem of global optimization. We can try to do better, we can get a little closer. But even setting that aside, there are all these terrific applications in deep learning and in neural network back-propagation, artificial intelligence, control theory. So there are many applications: operations research, biology, et cetera. But there are a couple of action items needed to go further. And that is, I believe that the electronic implementation is perhaps a little easier to scale, and so we need to design some chips. So we need a chip with an array of oscillators.
If you had a thousand LC oscillators on the chip, I think that would already be very interesting. But you need to interconnect them. This would require a resistive network with about a million resistors. I think that can also be done on a chip. So minimizing the power dissipation is the whole point, but you do have to face an accuracy problem. The resistors have to be very precise, but there's good news: resistors can be programmed very accurately, and I'll be happy to take questions on that. A later step, though, once we have the chips, is that we need compiler software to convert the unknown problem into the given resistance values that will fit within these oscillator chips. So let me pause then for questions, and thank you very much for your attention.

Published Date : Sep 24 2020



Leicester Clinical Data Science Initiative


 

>>Hello. I'm Professor Toru Suzuki, Chair of cardiovascular medicine and associate dean of the College of Life Sciences at the University of Leicester in the United Kingdom, where I'm also director of the Leicester Life Sciences Accelerator. I'm also an honorary consultant cardiologist within our university hospitals, part of the National Health Service NHS Trust. Today, I'd like to talk to you about our Leicester Clinical Data Science Initiative. Now, a brief background on Leicester, its university and hospitals. Leicester is in the center of England. The National Health Service is divided depending on the countries of the United Kingdom, which is comprised of, uh, England, Scotland to the north, Wales to the west, and Northern Ireland, which is on a different island. But the National Health Service of England is what will predominantly be discussed today. It has a history of about 70 years now. Owing to the fact that we're basically in the center of England, although this is only about one hour north of London, we have a catchment of about 100 miles, which takes us from the eastern coast of England, bordering with Birmingham to the west, north to just south of Liverpool and Manchester, and just south to the top of London. We have one of the busiest National Health Service trusts in the United Kingdom, with a catchment of about 100 miles and one million patients a year. Our main hospital, the General Hospital, which is actually called the Royal Infirmary, has an accident and emergency, which means Emergency Department, that is one of the busiest emergency departments in the nation. I work at Glenfield Hospital, which is one of the main cardiovascular hospitals of the United Kingdom and Europe. Academically, the Medical School of the University of Leicester is ranked 20th in the world, only behind Cambridge, Oxford, Imperial College and University College London. For the UK, this is a very research-weighted, uh, ranking; therefore we are a very research-focused university as well. As for the cardiovascular research groups, which sit mainly within Glenfield Hospital, we are ranked as the 29th independent research institution in the world, which places us, field-weighted, within our group. As you can see, those that are top ranked, and this is regardless of cardiology, include institutes like the Broad Institute and Whitehead Institute, MIT, the Wellcome Trust Sanger Institute, the Howard Hughes Medical Institute, Kemble, and Cold Spring Harbor, and as a hospital we rank within this field in a relatively competitive manner as well. Therefore, we're a very research-focused hospital as well. Now, to give you the unique selling points of Leicester: we're the largest and busiest National Health Service trust in the United Kingdom, but we also have a very large and stable as well as ethnically diverse population. The population often spans three generations, which allows us to do a lot of cohort-based studies across primary and secondary care cohorts, a lot of which are well characterized and focused on genomics in the past. We also have a biomedical research center focusing on chronic diseases, which is funded by the National Institute for Health Research, which funds clinical research in the hospitals of the United Kingdom, and we also have a very rich regional life science cluster, including medtechs and small and medium-sized enterprises. Now, for this, the bottom line is that I am the director of the Leicester Life Sciences Accelerator, which is tasked with industrial engagement in the local and national sectors, but not excluding the international sectors as well. Broadly, we have academics and clinicians with interest in health care, which includes science and engineering, as well as non-clinical researchers.
And prior to the COVID outbreak, the government announced a £450 million investment into our university hospitals, which I hope will be going forward. Now, to give you a brief background on where the scientific strategy of the United Kingdom lies: the industrial strategy was brought out as a part of the process which involved exiting the European Union, and part of that was the life science sector deal. And among this, as you will see, there were four grand challenges that were put in place: AI and data economy, future of mobility, clean growth, and aging society. And as a medical research institute, a lot of the focus that we have been transitioning to within my group are projects focused on using data and analytics using artificial intelligence, but also understanding how chronic diseases evolve as part of the aging society, and therefore we will be able to address these grand challenges for the country. Additionally, the National Health Service also has its long-term plans, which we align to. One of those is digitally enabled care, and the hope is that this will go mainstream over the next 10 years. And to do this, what is envisioned is that clinicians will be able to access and interact with patient records and care plans wherever they are, with ready access to decision support and artificial intelligence, and that this will enable predictive techniques, which include linking with clinical and genomic as well as other data supports, such as imaging, and new medical breakthroughs. There has been what's called the Topol Review, which discusses the future of health care in the United Kingdom and preparing the health care workforce for the delivery of the digital future, and which clearly discusses in the end that we would be using automated image interpretation using artificial intelligence, and predictive analytics using artificial intelligence, as mentioned in the long-term plans. That is part of that.
We will also be engaging natural language processing and speech recognition, and reading the genome using genomic analysis as well. We are in what is called the Midlands. As I mentioned previously, the Midlands comprise the East Midlands, where we are, as Leicester, along with other places such as Nottingham, and the West Midlands, which involves Birmingham; here, as a collective, we are the Midlands. Here we comprise what is called the Midlands Engine, and the Midlands Engine focuses on transport, accelerating innovation, trading with the world, as well as the ultra-connected region. And therefore our work will also involve connectivity moving forward. And as part of that, as part of our health care plans, we hope to also enable total digital connectivity moving forward, and that will allow us to embrace digital, data, as well as connectivity. These three key words will link our health care systems for the future. Now, to give you a vision for the future of medicine: we envision that there will be a very complex data set that we will need to work on, which will involve genomics, phenomics, imaging, which we call, uh, omics analysis. But this just means complex data sets that we need to work on. This will integrate with our clinical data platforms and bioinformatics, and we'll also get real-time information on physiology through interfaces and wearables. Important for this is that we have computing, uh, processes that will now allow this kind of complex data analysis in real time, using artificial intelligence and machine-learning-based applications to allow visualization and analytics, which can be output through various user interfaces to the clinician and others. One of the characteristics of the United Kingdom and the NHS is that we embrace data and capture data on most citizens from when they are born, from the cradle, to when they die, to the grave.
And it's important that we are able to link this data up to understand the journey of that patient over time. When they come to hospital, which is secondary care data, we will get disease data; when they go to their primary care general practitioner, we will be able to get early check-up data as well as follow-up monitoring, but also social care data. If this could be linked, it would allow us to understand how aging and deterioration as well as frailty encompass these patients. And to do this, we have numerous data sets available, including clinical letters, blood tests, and more advanced tests, which is genetics and imaging, which we can possibly, um, integrate into a patient journey, which will allow us to understand the digital journey of that patient. I have called this the digital twin patient cohort: doing a digital simulation of patient health journeys using data integration and analytics. This is a technique that has often been used in industrial manufacturing to understand the maintenance and service points for hardware and instruments. But we would be using this to stratify and predict diseases. This would also be monitored and refined using wearables and other types of complex data analysis to allow for, in the end, preemptive intervention, a paradigm shift in how we undertake medicine at this time, which is more reactive rather than proactive. As infrastructure, we are presently working on putting together what's called the data safe haven, or trusted research environment, one which sits within the clinical environment of the university hospitals, with curated data, and in a manner which allows us to enable data mining of the databases, or, I should say, of the trusted research environment within the clinical environment.
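As a toy illustration of the record-linkage step described above, the sketch below joins invented primary-care, secondary-care, and social-care events by a patient identifier into a single chronological journey. All identifiers, dates, and event texts are hypothetical, and real NHS linkage involves pseudonymization and governance far beyond this.

```python
from datetime import date

# Hypothetical extracts from three separate care systems.
primary_care = [("p001", date(2015, 3, 2), "GP check-up: raised blood pressure")]
secondary_care = [("p001", date(2018, 7, 19), "Hospital admission: myocardial infarction")]
social_care = [("p001", date(2019, 1, 5), "Home care assessment")]

def link_journey(patient_id, *sources):
    """Merge events for one patient from many systems, sorted by date."""
    events = [(d, note) for src in sources
              for (pid, d, note) in src if pid == patient_id]
    return sorted(events)

journey = link_journey("p001", primary_care, secondary_care, social_care)
```

The sorted, linked timeline is the raw material for the "digital twin" idea: once the journey exists as one ordered sequence, trajectory analysis and prediction can run over it.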
Hopefully, we will then be able to anonymize that to allow use by academics and possibly also partnering industry, to do further data mining and tool development, which we could then further field-test again using our real-world database of patients that will be continually, uh, updating in our system. In the cardiovascular group, we have what's called the BRICCS cohort, which means Biomedical Research Informatics Centre for Cardiovascular Science, which was started a long time ago, even before I joined, in 2010, and which has today captured almost 10,000 patients or more, who come through to Glenfield Hospital for various treatments, and even those who have not. We ask for their consent to take their blood for genetics, but also for blood tests, uh, genomic testing, but also imaging, as well as other consentable medical information. So far there are about 10,000 patients, and we've been trying to extract and curate their data accordingly. Again, as a reminder of what the strengths of Leicester are: we have one of the largest and busiest trusts, with a very large patient cohort, and a focused drive at the university, which allows for work on chronic diseases such as heart disease. I just mentioned our efforts on heart disease, which involve about 10,000 patients ongoing right now, but we would wish to include further chronic diseases, such as diabetes, respiratory diseases, and renal disease, and further to understand the multi-morbidity between these diseases, so that we can understand how they interact as well. Finally, I'd like to talk about the Leicester Life Sciences Accelerator as well. This is a new project that was funded by the EU and started this January for three years. I'm the director for this, and all the groups within the College of Life Sciences that are involved with health care but also clinical work are involved.
And through this we hope to support innovative industrial partnerships and collaborations in the region, as well as nationally, and further on internationally as well. I realize that today I am talking to a more business- and commercial-oriented audience, and we would welcome interest from your companies and partners to come to Leicester to work with us on, uh, clinical health care data, and to drive our agenda forward, so that we can enable innovative research but also product development in partnership with you moving forward. Thank you for your time.

Published Date : Sep 21 2020



Day 2 Livestream | Enabling Real AI with Dell


 

>>From the Cube Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a Cube conversation. >>Hey, welcome back everybody, Jeff Frick here with the Cube. We're doing a special presentation today, really talking about AI, and making AI real, with two companies that are right in the heart of it: Dell EMC as well as Intel. So we're excited to have a couple of Cube alumni back on the program; haven't seen them in a little while. First off, from Intel, Lisa Spelman. She is the corporate VP and GM for the Xeon and Memory Group. Great to see you, Lisa. >>Good to see you again, too. >>And we've got Ravi Pendekanti. He is the SVP of server product management, also from Dell Technologies. Ravi, great to see you as well. >>Good to see you as well. Of course, >>yes. So let's jump into it. So, yesterday, Ravi, you guys announced a bunch of new kind of AI-based solutions; I wonder if you can take us through that. >>Absolutely. So one of the things we did, Jeff, was we said it's not good enough for us to have a point product, but we talked about a whole portfolio of products; more importantly, everything from our workstation side to the server to the storage elements, and things that we're doing with VMware, for example. Beyond that, we're also obviously pleased with everything we're doing on bringing the right set of validated configurations and reference architectures and ready solutions, so that the customer really doesn't have to go ahead and do the due diligence of figuring out how the various integration points come together in making a solution possible. Obviously, all this is based on the great partnership we have with Intel, using not just their, you know, CPUs, but FPGAs as well. >>That's great.
So, Lisa, I wonder, you know, I think a lot of people, you know, obviously everybody knows Intel for your CPUs, but I don't think they recognize kind of all the other stuff that can wrap around the core CPU to add value around a particular solution set or problem. I wonder if you could tell us a little bit more about the Xeon family and what you guys are doing in the data center with this kind of new interesting thing called AI and machine learning. >>Yeah. Um, so thanks, Jeff and Ravi. It's, um, amazing to see the way that artificial intelligence applications are just growing in their pervasiveness. And you see it taking off across all sorts of industries, and it's actually being built into just about every application that is coming down the pipe. And so if you think about needing to have your hardware foundation able to support that, that's where we're seeing a lot of the customer interest come in. And not just on Xeon, but, like Ravi said, on the whole portfolio, and how the system and solution configurations come together. So we're approaching it from a total view of being able to move all that data, store all of that data, and process all of that data, and providing options along that entire pipeline, um, and within that, on Xeon specifically, we've really set that as our cornerstone foundation for AI. If it's the most deployed solution and data center CPU around the world, and every single application is going to have artificial intelligence in it, it makes sense that you would have artificial intelligence acceleration built into the actual hardware, so that customers get a better experience right out of the box, regardless of which industry they're in or which specialized function they might be focusing on. >>It's really, it's really wild, right? Because in process, right, you always move to your next point of failure.
So, you know, having all these kinds of accelerants, and the ways that you can carve off parts of the workload, parts of the intelligence that you can optimize better, is so important, as you said, Lisa. And also, Ravi, on the solution side: nobody wants general AI just for AI's sake. It's a nice word, an interesting science experiment, but it's really in the applied AI world that we're starting to see the value in the application of this stuff. And I wonder, you have a customer you want to highlight, Epsilon; tell us a little bit about their journey and what you guys did with them. >> Great, sure. I mean, if you start looking at Epsilon, they're in the marketing business, and one of the crucial things for them is to ensure that they're able to provide the right data, based on the analysis they run on what it is that the customer is looking for. And they can't wait for a period of time; they need to be doing that on a near-real-time basis, and that's what Epsilon does. And what really blew my mind was the fact that they actually send out close to 100 billion messages. Again, it's 100 billion messages a year. And so you can imagine the amount of data that they're analyzing, which is in petabytes, and they need to do it in real time. And that's all possible because of the kind of analytics we have driven into the PowerEdge servers, you know, using the latest of the Intel Xeon processors, coupled with some of the other technologies, which again allow them to go back in and analyze this data and serve it to the customers very rapidly. >> You know, it's funny.
I think MarTech is kind of an underappreciated, ah, world of AI and, you know, of machine-to-machine execution. Think of the amount of transactions that go through when you load a webpage: a site that actually IDs who you are, you know, puts a marketplace together, sells time on that page or a spot on that ad, and then lets people in. It's really sophisticated, as you said, with massive amounts of data going through it. And if it's done right, it's magic, and if it's done not right, then people get pissed off. You've got to have, you've got to have the right tools. >> You got it. I mean, this is where, as I talked about, you know, it can be garbage in, garbage out if you don't really act on the right data, right? So that is where I think it becomes important. But also, if you don't do it in a timely fashion, if you don't serve up the right content at the right time, you miss the opportunity to go ahead and grab attention, right? >> Right, right. Lisa, kind of back to you. Um, you know, there's all kinds of open source stuff that's happening also in the AI and machine learning world, so we hear things about TensorFlow and all these different libraries. How are you guys, you know, kind of embracing that world as you look at AI and kind of the development? You've been at it for a while; you guys are involved in everything from autonomous vehicles to the MarTech, as we discussed. How are you making sure that these things are using all the available resources to optimize the solutions? >> Yeah, I think you and Ravi were just hitting on some of those examples of how many ways people have figured out how to apply AI now. So maybe at first it was really driven by just image recognition and image tagging.
But now you see so much work being driven in recommendation engines and object detection for much more industrial use cases, not just consumer enjoyment, and also those things you mentioned and hit on, where the personalization is a really fine line you walk. Between "how do you make an experience feel good, personalized" versus "creepy personalized" is a real challenge and opportunity across so many industries. And so open source, like you mentioned, is a great place for that foundation, because it gives people the tools to build upon. And I think our strategy is really a stack strategy that starts first with delivering the best hardware for artificial intelligence, and Xeon is the foundation for that. But we also have, you know, processing for out at the edge, and then we have all the way through to very custom, specific accelerators in the data center, then on top of that the optimized software, which is going into each of those frameworks and doing the work so that the framework recognizes the specific acceleration we built into the CPU, whether that's DL Boost, or recognizes the capabilities that sit in that accelerator silicon. And then once we've done that software layer, and this is where we have the opportunity for a lot of partnership, is the ecosystem and the solutions work that Ravi started off by talking about. So AI isn't, um, it's not easy for everyone. It has a lot of value, but it takes work to extract that value. And so partnerships within the ecosystem, to make sure that ISVs are taking those optimizations, building them in, and fundamentally can deliver to customers a reliable solution, is the last leg of that, of that strategy. But it really is one of the most important, because without it you get a lot of really good benchmark results but not a lot of good, happy customers. >> Right. I'm just curious, Lisa, because you kind of sit in the catbird seat.
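The in-CPU acceleration Lisa mentions, such as DL Boost, speeds up low-precision integer arithmetic for inference. A toy sketch of why that works (illustrative pure Python, not Intel's actual implementation): quantize float weights and activations down to 8-bit integers, accumulate the dot product in integer math, then rescale once at the end.

```python
# Illustrative sketch (not Intel's DL Boost implementation): why 8-bit integer
# arithmetic can stand in for float math in inference. Quantize floats to int8,
# multiply-accumulate in integers, then apply one float rescale at the end --
# the pattern that VNNI-style instructions accelerate in hardware.

def quantize(values, num_bits=8):
    """Map floats to signed ints plus a scale factor."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) for v in values], scale

def int8_dot(a, b):
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    # Integer multiply-accumulate, then a single float rescale
    return sum(x * y for x, y in zip(qa, qb)) * sa * sb

acts = [0.12, -0.5, 0.33, 0.9]
wts = [0.4, 0.1, -0.2, 0.05]
exact = sum(x * y for x, y in zip(acts, wts))
approx = int8_dot(acts, wts)
print(round(exact, 4), round(approx, 4))  # the two values land close together
```

The approximation error comes only from rounding to 255 integer levels, which is usually acceptable for inference; the payoff is that the inner loop needs only integer multiply-adds.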
You guys at the core, you know, kind of under all the layers, running data centers, running these workloads. How do you see kind of the evolution of machine learning and AI from kind of the early days, where it was science projects and really smart people on mahogany row, versus now, where people are talking about trying to get it to, like, a citizen developer, but really a citizen data scientist, and, you know, exposing the power of AI to business leaders or business analysts, if you will, so they can apply it to their day-to-day world, in their day-to-day life? How do you see that kind of evolving? Because you're not only in it early, but you get to see some of the stuff coming down the road in design wins and reference architectures. How should people think about this evolution? >> It really is one of those things where, if you step back, the fundamentals of AI have actually been around for 50 or more years. It's just that the changes in the amount of computing capability that's available, the network capacity that's available, and the fundamental efficiency that IT and infrastructure managers can get out of their cloud architectures has allowed for this pervasiveness to evolve. And I think that's been the big tipping point that pushed people over. AI, of course, went through the same thing that cloud did, where you had maybe every business leader or CEO saying, hey, get me a cloud and I'll figure out what for later; get me some AI and we'll make it work. But we're through those initial use cases and starting to see business value derived from those deployments. And I think some of the most exciting areas are in the medical services field, especially if you think of the environment we're in right now.
The amount of efficiency, and in some cases reduction in human contact, that you can bring to diagnostics, and just patient tracking, and the ability to follow their entire patient history, is really powerful, and represents the next wave in care and how we scale our limited resource of doctors, nurses, and technicians. And the point we're making about what's coming next is where you start to see even more mass personalization and recommendations, in a way that feels not spooky to people but actually comforting, and they take value from it because it allows them to immediately act. Ravi referenced the speed at which you have to utilize the data. When people can immediately act more efficiently, they're generally happier with the service. So we see so much opportunity, and we're continuing to address it across, you know, again, that hardware, software, and solution stack, so we can stay a step ahead of our customers. >> Right. That's great. Ravi, I want to give you the final word, because you guys have to put the solutions together and actually deliver them to the customer. So not only, you know, the hardware and the software, but any other kind of ecosystem components that you have to bring together. So I wonder if you can talk about that approach, and how, you know, it's really the solution at the end of the day; not specs, not speeds and feeds, that's not really what people care about. It's really a good solution. >> Yeah, exactly, Jeff, because at the end of the day, I mean, it's like this: most of us probably use the ATM to retrieve money, but we really don't know what really sits behind the ATM. And my point being that you really care, at that particular point in time, about being able to put your card into the machine and get your dollar bills out, for example. Likewise, when you start looking at what the customer really needs, what Lisa hit upon is actually right. I mean, it's what they're looking for. And you said this on the whole solution side as well.
Our mantra to this is very simple: we want to make sure that we use the right basic building blocks, ensuring that we bring the right solutions using three things. First, the right products, which essentially means that we need to use the right partners to get the right processors and GPUs in. Then we get to the next level by ensuring that we can provide either ready solutions or validated reference architectures, meaning that the sausage-making process is something you now don't need to have the customer go through, right? In a way, we have done the cooking and we provide a recipe book, and you just go through the process of pairing the ingredients, and then off you go to get your solution done. And finally, the final stage is the help that customers still need in terms of services. That's something else Dell Technologies provides, and the whole idea is that if customers want to have help deploying the solutions, we can also do that with our services. So that's broadly the way we approach it: providing the building blocks, using the right technologies from our partners, then making sure that we have the right solutions that our customers can look at, and finally, if they need deployment help, we can do that with our services. >> Well, Ravi, Lisa, thanks for taking a few minutes. That was a great tee-up, Ravi, because I think we're going to go to a couple of customer interviews, enjoying that nice meal that you prepared with that combination of hardware, software, services, and support. So thank you for your time, and it was great to catch up. All right, let's go ahead and run the tape. >> Hi, Jeff. I wanted to talk about two examples of collaboration that we have with partners that have yielded, ah, real examples of breakthrough HPC and AI activities. So the first example that I wanted to cover is with the NeuroMod team up in Canada.
We collaborated with Intel on tuning of algorithms and code in order to accelerate the mapping of the human brain. So we have a cluster down here in Texas called Zenith, based on Xeon and Optane memory, and we were able to help that customer with the three of us as partners: Intel, the team in Canada, and the Dell HPC and AI Innovation Lab engineering team, to go and accelerate the mapping of the human brain. So imagine patients playing video games or doing all sorts of activities that help understand how the brain sends the signals that trigger a response of the nervous system. And it's not only a good way to map the human brain; think about what you can do with that type of information in order to help cure Alzheimer's or dementia down the road. So this is really something I'm passionate about: using technology to help all of us, and all of those that are suffering from those really tough diseases. >> I'm a project manager for the project, and the idea is actually to scan six participants really intensively in both the MRI scanner and the MEG scanner, and see if we can use human brain data to get closer to something called generalized intelligence. What we have in the AI world are systems that are mathematically, computationally built; often they do one task really, really well, but they struggle with other tasks. A really good example of this is video games. Artificial neural nets can often outperform humans in video games, but they don't really play in a natural way. An artificial neural net playing Mario Brothers beats the system by actually kind of gliding its way through as quickly as possible, and it doesn't, like, collect pennies. For example, if you played Mario Brothers as a child, you know that collecting those coins is part of the game. And so the idea is to get artificial neural nets to behave more like humans.
Transfer of knowledge is just something that humans do really, really well and very naturally. It doesn't take 50,000 examples for a child to know the difference between a dog and a hot dog, which one you eat and which one you play with, but an artificial neural net can often take massive computational power and many examples before it understands. >> Video games are awesome, because when you play a video game, you're doing a vision task instantly, you're also doing a lot of planning and strategy thinking, but you're also taking decisions several times a second, and we record that. We try to see: can we, from brain activity, predict what people were doing? We can reach almost 90% accuracy with this type of architecture. >> She was the lead postdoc on this collaboration with Dell and Intel. She's trying to work on a model called graph convolutional neural nets. >> We have been involved with, like, two computing systems, to compare, like, how the performance was. >> The lab relies on both servers that we have internally here, so I have a GPU server, but what we really rely on is Compute Canada, and Compute Canada is just not powerful enough to be able to run the models that she was trying to run, so it would take her days, weeks, it would crash, we'd have to wait in line. Dell was visiting, and I was invited into the meeting very kindly, and they told us that they'd started working with a new type of hardware to train our neural nets. >> Dell's using traditional CPUs, pairing them with a new type of memory developed by Intel, which means they also have new CPU architectures that are really optimized to do deep learning. So all of that sounds great, because we had this problem: we run out of memory. >> The innovation lab, having access to experts to help answer questions immediately, that's not something to discount. >> We were able to train the architecture within 20 minutes.
But before, to do the same thing on the GPU, we needed to wait almost three hours for each one. >> We were able to train the graph convolutional neural net. Dell has been really great, 'cause anytime we need more memory, we send an email; Dell says, yeah, sure, no problem, we'll extend it, how much memory do you need? It's been really simple from our end, and I think it's really great to be at the edge of science and technology. We're not just doing the same old; we're pushing the boundaries. Often we don't know where we're going to be in six months. In the big data world, computing power makes a big difference.
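For readers unfamiliar with the model mentioned above: a graph convolutional network layer aggregates each node's features from its neighbors through a normalized adjacency matrix, then applies a learned linear transform. A minimal pure-Python sketch of one such layer (illustrative only; the team's actual models run on deep learning frameworks, and the toy graph and weights here are made up):

```python
# One graph-convolution layer in the Kipf & Welling style:
# H' = ReLU(D^-1/2 (A+I) D^-1/2 . H . W)

def gcn_layer(adj, feats, weights):
    n = len(adj)
    # Add self-loops so each node keeps its own features
    a = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    # Symmetric normalization of the adjacency matrix
    norm = [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]
    # Aggregate neighbor features: (norm . feats)
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(len(feats[0]))] for i in range(n)]
    # Linear transform plus ReLU: (agg . W)
    out_dim = len(weights[0])
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(len(weights))))
             for o in range(out_dim)] for i in range(n)]

# Toy graph: 3 nodes in a chain, 2 input features, 2 output features
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.5, -0.5], [0.25, 0.75]]
out = gcn_layer(adj, feats, weights)
print(len(out), len(out[0]))  # 3 nodes, each with 2 output features
```

Stacking a few such layers lets information propagate across the graph, which is what makes the model memory-hungry on large graphs and why the extra memory capacity mattered in the story above.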
Yeah, we are a research computing cloud provider, but the emphasis is on the consulting on the processes around how to exploit that technology rather than the better results. Our value is in how we help businesses use advanced computing resources rather than the provision. Those results we see increasingly more and more data being produced across a wide range of verticals, life sciences, astronomy, manufacturing. So the data accelerators that was created as a component within our data center compute environment. Data processing is becoming more and more central element within research computing. We're getting very large data sets, traditional spinning disk file systems can't keep up and we find applications being slowed down due to a lack of data, So the data accelerator was born to take advantage of new solid state storage devices. I tried to work out how we can have a a staging mechanism for keeping your data on spinning disk when it's not required pre staging it on fast envy any stories? Devices so that can feed the applications at the rate quiet for maximum performance. So we have the highest AI capability available anywhere in the UK, where we match II compute performance Very high stories performance Because for AI, high performance storage is a key element to get the performance up. Currently, the data accelerated is the fastest HPC storage system in the world way are able to obtain 500 gigabytes a second read write with AI ops up in the 20 million range. We provide advanced computing technologies allow some of the brightest minds in the world really pushed scientific and medical research. We enable some of the greatest academics in the world to make tomorrow's discoveries. Yeah, yeah, yeah. >>Alright, Welcome back, Jeff Frick here and we're excited for this next segment. We're joined by Jeremy Raider. He is the GM digital transformation and scale solutions for Intel Corporation. Jeremy, great to see you. Hey, thanks for having me. 
I love, I love the flowers in the backyard. I thought maybe you ran over to the Japanese Garden or the Rose Garden, right? Two very beautiful places to visit in Portland. >> Yeah, you know, you only get them for a couple, ah, couple weeks here, so we get the timing just right. >> Excellent. All right, so let's jump into it. And this conversation really is all about making AI real. Um, and you guys are working with Dell, and not only Dell, right? There's the hardware and software, but also a lot of these smaller AI solution providers. So what are some of the key attributes that it takes to make AI real for your customers out there? >> Yeah, so, you know, it's a complex space. So when you can bring the best of the Intel portfolio, which is expanding a lot (you know, it's not just the CPU anymore; you're getting into memory technologies, network technologies), and, kind of a little less known, how many resources we have focused on the software side of things, optimizing frameworks and these key ingredients and libraries that you can stitch into that portfolio to really get more performance and value out of your machine learning and deep learning space. And so, you know, what we've really done here with Dell is started to bring a bunch of that portfolio together with Dell's capabilities, and then bring in that ISV partner, that software vendor, where we can really stitch together and bring the most value out of that broad portfolio, ultimately reducing the complexity of what it takes to deploy an AI capability. So a lot going on there, bringing kind of the three-legged stool of the software vendor, the hardware vendor, Dell and Intel, into the mix, and you get a really strong outcome. >> Right. So before we get to the solutions piece, let's stick a little bit into the Intel world. And I don't know if a lot of people are aware that, obviously, you guys make CPUs, and you've been making great CPUs forever.
But there's a whole lot more stuff that you've added, you know, kind of around the core CPU, if you will, in terms of actual libraries and ways to really optimize the Xeon processors to operate in an AI world. I wonder if you can kind of take us a little bit below the surface on how that works. What are some examples of things you can do to get more from your Intel processors for AI-specific applications or workloads? >> Yeah, well, you know, there's a ton of software optimization that goes into this. You know, having the great CPU is definitely step one, but ultimately you want to get down into the libraries, like TensorFlow; we have data analytics acceleration libraries. You know, that really allows you to get kind of, again, under the covers a little bit and look at how we get the most out of the kinds of capabilities that are ultimately used in machine learning and deep learning, and then bring that forward and enable it with our software vendors, so that they can take advantage of those acceleration components, and ultimately, you know, move to less training time, or it could be the cost factor. But those are the kinds of capabilities we want to expose to software vendors through these kinds of partnerships. >> Okay, ah, that's terrific. And I do think that's a big part of the story that a lot of people are probably not as aware of: that there are a lot of these optimization opportunities that you guys have been leveraging for a while. So shifting gears a little bit, right? AI and machine learning is all about the data. And in doing a little research for this, I found actually you on stage talking about some company that had 315 petabytes of data, 140,000 sources of that data, and, I think, a probably-not-great quote of six months' access time to actually get at it and work with it. And the company you were referencing was Intel.
So you guys know a lot about data: managing data, everything from your manufacturing to obviously supporting a global organization for IT, and, ah, a lot of complexity and secrets and good stuff. So, you know, what have you guys learned, as Intel, in the way you work with data and getting a good data pipeline, that's enabling you to kind of put that into these other solutions that you're providing to the customers? >> Right. Well, it is, you know, it's absolutely a journey, and it doesn't happen overnight. And that's what we've, you know, we've seen it at Intel, and we see it with many of our customers that are on the same journey that we've been on. And so, you know, this idea of building that pipeline, it really starts with what kinds of problems you're trying to solve. What are the big issues that are holding you back as a company? Where is that competitive advantage that you're trying to get to? And then, ultimately, how do you build the structure to enable the right kind of pipeline for that data? Because that's what machine learning and deep learning is: that data journey. So really, a lot of focus around, you know, how we can understand those business challenges, bring forward those kinds of capabilities along the way, through to where we structure our entire company around those assets, and then ultimately some of the partnerships that we're going to be talking about: these companies that are out there to help us really squeeze the most out of that data as quickly as possible, because otherwise it goes stale real fast, sits on the shelf, and you're not getting that value out of it, right? So, yeah, we've been on the journey. It's, ah, it's a long journey, but ultimately we can take a lot of those kinds of learnings and apply them to our silicon technology, the software optimizations that we're doing, and ultimately how we talk to our enterprise customers about how they can overcome some of the same challenges that we did.
Well, let's talk about some of those challenges specifically, because, you know, I think part of the challenge, the thing that kind of knocked big data and Hadoop, if you will, off the rails a little bit, was that there's a whole lot that goes into it. Besides just doing the analysis, there's a lot of data practice: data collection, data organization, a whole bunch of things that have to happen before you can actually start to do the sexy stuff of AI. So, you know, what are some of those challenges? How are you helping people get over kind of these baby steps before they can really get into the deep end of the pool? >> Yeah, well, you know, one is you have to have the resources. So, you know, do you even have the resources? If you can acquire those resources, can you keep them interested in the kind of work that you're doing? So that's a big challenge, and we'll actually talk about how that fits into some of the partnerships that we've been establishing in the ecosystem. It's also that you get stuck in this POC do-loop, right? You finally get those resources, and they start to get access to that data that we talked about, start to play out some scenarios, theorize a little bit. Maybe they show you some really interesting value, but it never seems to make its way into a full production mode. And I think that is a challenge that has faced so many enterprises that are stuck in that loop. And so that's where we look at who's out there in the ecosystem that can help more readily move through that whole process of the evaluation that proves the ROI, the POC, and ultimately move that capability into production mode as quickly as possible. That, you know, that to me is one of those fundamental aspects: if you're stuck in the POC, nothing's happening from it. This is not helping your company. We want to move things more quickly.
Right, right. And let's talk about some of these companies that you guys are working with, that you've got reference architectures with: DataRobot, Grid Dynamics, H2O just down the road. So a lot of the companies we've worked with on the Cube. And I think, you know, another part that's interesting, and again something we can learn from the old days of big data, is generalized AI versus solution-specific AI. And I think, you know, where there's a real opportunity is not AI for AI's sake; really, it's got to be applied to a specific solution, a specific problem, so that you have, you know, better chatbots, a better customer service experience, you know, better something. So when you were working with these folks and trying to design solutions, what were some of the opportunities that you saw to work with some of these folks to now have an applied application or solution, versus just kind of AI for AI's sake? >> Yeah. I mean, that could be anything from fraud detection in financial services, or even taking a step back and looking more horizontally, like back to that data challenge. If you're stuck at "I built a fantastic data lake, but I haven't been able to pull anything back out of it," who are some of the companies out there that can help overcome some of those big data challenges, and ultimately get you to where, you know, you don't have a data scientist spending 60% of their time on data acquisition and pre-processing? That's not where we want them, right? We want them building out that next theory. We want them looking at the next business challenge. We want them selecting the right models. But ultimately, they have to do that as quickly as possible, so that they can move that capability forward into the next phase. So really, it's about that connection of looking at those problems or challenges in the whole pipeline, and these companies, like DataRobot and H2O, are all addressing specific challenges in the end-to-end.
That's why they've kind of bubbled up as ones that we want to continue to collaborate with, because they can help enterprises overcome those issues more quickly, you know, more readily. >> Great. Well, Jeremy, thanks for taking a few minutes and giving us the Intel side of the story. Um, it's a great company, been around forever. I worked there many, many moons ago; that's, ah, that's a story for another time. But really appreciate it, and it was great to catch up. All right, so, super, thanks a lot. >> So that's Jeremy, I'm Jeff Frick. So now it's time to go ahead and jump into the crowd chat. It's crowdchat.net/makeaireal. Um, we'll see you in the chat. And thanks for watching.

Published Date : Jun 3 2020


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jeff Frick | PERSON | 0.99+
Jeff | PERSON | 0.99+
Jeremy | PERSON | 0.99+
Lisa Spelman | PERSON | 0.99+
Canada | LOCATION | 0.99+
Texas | LOCATION | 0.99+
Robbie | PERSON | 0.99+
Lee | PERSON | 0.99+
Portland | LOCATION | 0.99+
Xeon Group | ORGANIZATION | 0.99+
Lisa | PERSON | 0.99+
Dell | ORGANIZATION | 0.99+
Ravi | PERSON | 0.99+
Palo Alto | LOCATION | 0.99+
UK | LOCATION | 0.99+
60% | QUANTITY | 0.99+
Jeremy Raider | PERSON | 0.99+
Ravi Pinter | PERSON | 0.99+
Intel | ORGANIZATION | 0.99+
20 million | QUANTITY | 0.99+
Mar Tech | ORGANIZATION | 0.99+
50,000 examples | QUANTITY | 0.99+
Rob | PERSON | 0.99+
Mario Brothers | TITLE | 0.99+
six months | QUANTITY | 0.99+
Antigua | LOCATION | 0.99+
University of Cambridge | ORGANIZATION | 0.99+
Jersey | LOCATION | 0.99+
140,000 sources | QUANTITY | 0.99+
six participants | QUANTITY | 0.99+
315 petabytes | QUANTITY | 0.99+
three | QUANTITY | 0.99+
Dell Technologies | ORGANIZATION | 0.99+
yesterday | DATE | 0.99+
two companies | QUANTITY | 0.99+
500 gigabytes | QUANTITY | 0.99+
AHMAD | ORGANIZATION | 0.99+
Dell EMC | ORGANIZATION | 0.99+
each | QUANTITY | 0.99+
Cube Studios | ORGANIZATION | 0.99+
first example | QUANTITY | 0.99+
Both | QUANTITY | 0.99+
Memory Group | ORGANIZATION | 0.99+
two examples | QUANTITY | 0.99+
Cambridge University | ORGANIZATION | 0.98+
Rose Garden | LOCATION | 0.98+
today | DATE | 0.98+
both servers | QUANTITY | 0.98+
one | QUANTITY | 0.98+
Boston | LOCATION | 0.98+
Intel Corporation | ORGANIZATION | 0.98+
Khalidiya | PERSON | 0.98+
second example | QUANTITY | 0.98+
one task | QUANTITY | 0.98+
80 | QUANTITY | 0.98+
intel | ORGANIZATION | 0.97+
Epsilon | ORGANIZATION | 0.97+
Rocket | PERSON | 0.97+
both | QUANTITY | 0.97+
Cube | ORGANIZATION | 0.96+

Rich Gaston, Micro Focus | Virtual Vertica BDC 2020


 

(upbeat music) >> Announcer: It's theCUBE covering the virtual Vertica Big Data Conference 2020 brought to you by Vertica. >> Welcome back to the Vertica Virtual Big Data Conference, BDC 2020. You know, it was supposed to be a physical event in Boston at the Encore. Vertica pivoted to a digital event, and we're pleased that The Cube could participate because we've participated in every BDC since the inception. Rich Gaston is the global solutions architect for security, risk and governance at Micro Focus. Rich, thanks for coming on, good to see you. >> Hey, thank you very much for having me. >> So you got a chewy title, man. You got a lot of stuff, a lot of hairy things in there. But maybe you can talk about your role as an architect in those spaces. >> Sure, absolutely. We handle a lot of different requests from the global 2000 type of organization that will try to move various business processes, various application systems, databases, into new realms. Whether they're looking at opening up new business opportunities, whether they're looking at sharing data with partners securely, they might be migrating it to cloud applications, and doing migration into a hybrid IT architecture. So we will take those large organizations and their existing installed base of technical platforms and data, users, and try to chart a course to the future, using Micro Focus technologies, but also partnering with other third parties out there in the ecosystem. So we have large, solid relationships with the big cloud vendors, and also with a lot of the big database vendors. Vertica's our in-house solution for big data and analytics, and we are one of the first integrated data security solutions with Vertica. We've had great success out in the customer base with Vertica as organizations have tried to add another layer of security around their data.
So what we will try to emphasize is an enterprise wide data security approach, where you're taking a look at data as it flows throughout the enterprise from its inception, where it's created, where it's ingested, all the way through the utilization of that data. And then to the other uses where we might be doing shared analytics with third parties. How do we do that in a secure way that maintains regulatory compliance, and that also keeps our company safe against data breach. >> A lot has changed since the early days of big data, certainly since the inception of Vertica. You know, it used to be big data, everyone was rushing to figure it out. You had a lot of skunkworks going on, and it was just like, figure out data. And then as organizations began to figure it out, they realized, wow, who's governing this stuff? A lot of shadow IT was going on, and then the CIO was called to sort of reign that back in. As well, you know, with all kinds of whatever, fake news, the hacking of elections, and so forth, the sense of heightened security has gone up dramatically. So I wonder if you can talk about the changes that have occurred in the last several years, and how you guys are responding. >> You know, it's a great question, and it's been an amazing journey because I was walking down the street here in my hometown of San Francisco at Christmastime years ago and I got a call from my bank, and they said, we want to inform you your card has been breached by Target, a hack at Target Corporation and they got your card, and they also got your pin. And so you're going to need to get a new card, we're going to cancel this. Do you need some cash? I said, yeah, it's Christmastime so I need to do some shopping. And so they worked with me to make sure that I could get that cash, and then get the new card and the new pin. And being a professional in the inside of the industry, I really questioned, how did they get the pin? Tell me more about this. 
And they said, well, we don't know the details, but you know, I'm sure you'll find out. And in fact, we did find out a lot about that breach and what it did to Target. The impact: a $250 million immediate hit, CIO gone, CEO gone. This was a big one in the industry, and it really woke a lot of people up to the different types of threats on the data that we're facing with our largest organizations. Not just financial data; medical data, personal data of all kinds. Flash forward to the Cambridge Analytica scandal that occurred, where Facebook is handing off data under a partnership agreement with someone they think they can trust, and then that data is misused. And who's going to end up paying the cost of that? Well, it's going to be Facebook, to the tune of about five billion on that, plus some other fines that'll come along, and other costs that they're facing. So what we've seen over the course of the past several years has been an evolution from data breach making the headlines to our customers coming to us and saying, help us neutralize the threat of this breach. Help us mitigate this risk, and manage this risk. What do we need to be doing, what are the best practices in the industry? Clearly what we're doing on the perimeter security, the application security and the platform security is not enough. We continue to have breaches, and we are the experts at answering that. The fascinating follow-on piece has been the regulators jumping in now. First in Europe, but now we see California enacting a law just this year. They came in with a law that is very stringent, and has a lot of deep protections that are really far-reaching around personal data of consumers. Look at jurisdictions like Australia, where fiduciary responsibility now goes to the Board of Directors. That's getting attention. For a regulated entity in Australia, if you're on the Board of Directors, you better have a plan for data security.
And if there is a breach, you need to follow protocols, or you personally will be liable. And that is a sea change that we're seeing out in the industry. So we're getting a lot of attention on both how we neutralize the risk of breach, and how we can use software tools to maintain and support our regulatory compliance efforts as we work with, say, the largest money center bank out of New York. I've watched their audit year after year, and it's gotten more and more stringent, more and more specific: tell me more about this aspect of data security, tell me more about encryption, tell me more about key management. The auditors are getting better. And we're supporting our customers in that journey to provide better security for the data, to provide a better operational environment for them to be able to roll new services out with confidence that they're not going to get breached. With that confidence, they're not going to have a regulatory compliance fine or a nightmare in the press. And these are the major drivers that help us and Vertica sell together into large organizations to say, let's add some defense in depth to your data. And that's really a key concept in the security field, this concept of defense in depth. We apply that to the data itself by changing the actual data element of Rich Gaston; I will change that name into ciphertext, and that then yields a whole bunch of benefits throughout the organization as we deal with the lifecycle of that data.
And you're seeing this all over the world. So I think one of the questions I have is, how do you approach all this variability? It seems to me, you can't just take a narrow approach. You have to have an end-to-end perspective on governance and risk and security, and the like. So are you able to do that? And if so, how so? >> Absolutely. I think one of the key areas in big data in particular has been the concern that we have a schema, we have database tables, we have columns, and we have data, but we're not exactly sure what's in there. We have application developers that have been given sandbox space in our clusters, and what are they putting in there? So can we discover that data? We have those tools within Micro Focus to discover sensitive data within your data stores, but we can also protect that data, and then we'll track it. And what we really find is that when you protect, let's say, five billion rows of a customer database, we can now know what is being done with that data on a very fine-grained and granular basis, to say that this business process has a justified need to see the data in the clear; we're going to give them that authorization, they can decrypt the data. Secure Data, my product, knows about that and tracks that, and can report on that and say at this date and time, Rich Gaston did the following thing to be able to pull data in the clear. And that could then be used to support the regulatory compliance responses and then audit to say, who really has access to this, and what really is that data? Then in GDPR, we're getting down into much more fine-grained decisions around who can get access to the data, and who cannot. And organizations are scrambling.
One of the funny conversations that I had a couple years ago as GDPR came into place was with a couple of customers taking a sort of brute force approach of, we're going to move our analytics and all of our data to Europe, to European data centers, because we believe that if we do this in the U.S., we're going to violate their law. But if we do it all in Europe, we'll be okay. And that simply was a short-term way of thinking about it. You really can't be moving your data around the globe to try to satisfy a particular jurisdiction. You have to apply the controls and the policies and put the software layers in place to make sure that anywhere that someone wants to get that data, we have the ability to look at that transaction and say it is or is not authorized, and that we have a rock solid way of approaching that for audit and for compliance and risk management. And once you do that, then you really open up the organization to go back and use those tools the way they were meant to be used. We can use Vertica for AI, we can use Vertica for machine learning, and for all kinds of really cool use cases that are being done with IoT, with other kinds of cases that we're seeing that require data being managed at scale, but with security. And that's the challenge, I think, in the current era: how do we do this in an elegant way? How do we do it in a way that's future proof when CCPA comes in? How can I lay this on as another layer of audit responsibility and control around my data so that I can satisfy those regulators as well as the folks over in Europe and Singapore and China and Turkey and Australia? It goes on and on. Each jurisdiction out there is now requiring audit. And like I mentioned, the audits are getting tougher. And if you read the news, the GDPR example I think is classic. They told us in 2016, it's coming. They told us in 2018, it's here.
They're telling us in 2020, we're serious about this, and here are the fines, and you better be aware that we're coming to audit you. And when we audit you, we're going to be asking some tough questions. If you can't answer those in a timely manner, then you're going to be facing some serious consequences, and I think that's what's getting attention. >> Yeah, so the whole big data thing started with Hadoop, and Hadoop is open, it's distributed, and it just created a real governance challenge. I want to talk about your solutions in this space. Can you tell us more about Micro Focus Voltage? I want to understand what it is, and then get into sort of how it works, and then I really want to understand how it's applied to Vertica. >> Yeah, absolutely, that's a great question. First of all, we were the originators of format-preserving encryption; we developed some of the core basic research out of Stanford University that then became the company Voltage, a brand name that we still apply even though we're part of Micro Focus. So the lineage still goes back to Dr. Benet down at Stanford, one of my buddies there, and he's still at it, doing amazing work in cryptography and keeping the industry moving forward, and the science of cryptography moving forward. It's a very deep science, and we all want to have it peer-reviewed, we all want to be attacked, we all want it to be proved secure, so that we're not selling something to a major money center bank that is potentially risky because it's obscure and we're private. So we have an open standard. For six years, we worked with the Department of Commerce to get our standard approved by NIST, the National Institute of Standards and Technology. They initially said, well, AES-256 is going to be fine. And we said, well, it's fine for certain use cases, but for your database, you don't want to change your schema, you don't want to have this increase in storage costs. What we want is format-preserving encryption.
And what that does is turn my name, Rich, into a four-letter ciphertext. It can be reversed. The mathematics of that are fascinating, and really deep and amazing. But we really make that very simple for the end customer because we produce APIs. So these application programming interfaces can be accessed by applications in C or Java, C sharp, other languages. But they can also be accessed in a microservice manner via REST and web service APIs. And that's the core of our technical platform. We have an appliance-based approach, so we take a secure data appliance, we'll put it on-prem, we'll make 50 of them if you're a big company like Verizon and you need to have these co-located around the globe, no problem; we can scale to the largest enterprise needs. But our typical customer will install several appliances and get going with a couple of environments like QA and prod to be able to start getting encryption going inside their organization. Once the appliances are set up and installed, it takes just a couple of days of work for a typical technical staff to get done. Then you're up and running and able to plug in the clients. Now what are the clients? Vertica's a huge one. Vertica's one of our most powerful client endpoints because you're able to now take that API, put it inside Vertica; it's all open on the internet. We can go and look at Vertica.com/secure data. You get all of our documentation on it. You understand how to use it very quickly. The APIs are super simple; they require three parameter inputs. It's a really basic approach to being able to protect and access data. And then it gets very deep from there because you have data like credit card numbers. Very different from a street address, and we want to take a different approach to that. We have data like birthdate, and we want to be able to do analytics on dates. We have deep approaches to managing analytics on protected data like dates without having to put it in the clear.
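To make the "four-letter ciphertext" idea concrete, here is a toy sketch of the format-preserving property only. This is emphatically not FF1 and not cryptographically secure; the key numbers are made-up stand-ins, and a real deployment would call the NIST-approved FF1 APIs instead. The point is just that the ciphertext keeps the shape of the plaintext and the mapping is reversible.

```python
# Toy illustration of format preservation: a reversible keyed mapping from
# one 4-letter lowercase string to another.  NOT FF1, NOT secure -- the
# "key" is an affine permutation of the domain, chosen only for clarity.

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
DOMAIN = 26 ** 4              # all 4-letter lowercase strings

def _rank(s):                 # "rich" -> integer in [0, DOMAIN)
    n = 0
    for ch in s:
        n = n * 26 + ALPHABET.index(ch)
    return n

def _unrank(n):               # integer -> 4-letter lowercase string
    out = []
    for _ in range(4):
        n, r = divmod(n, 26)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

# Stand-ins for key material; MULT must be coprime with DOMAIN (26^4).
MULT, SHIFT = 104729, 40387

def encrypt(s):
    return _unrank((_rank(s) * MULT + SHIFT) % DOMAIN)

def decrypt(s):
    inv = pow(MULT, -1, DOMAIN)   # modular inverse (Python 3.8+)
    return _unrank(((_rank(s) - SHIFT) * inv) % DOMAIN)

ct = encrypt("rich")
assert len(ct) == 4 and all(c in ALPHABET for c in ct)  # same format as input
assert decrypt(ct) == "rich"                            # fully reversible
```

Because the ciphertext is still a four-letter lowercase string, it drops into the existing schema with no type changes and no storage growth, which is the property the real FF1 standard provides with actual cryptographic strength.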
So we've maintained a lead in the industry in terms of being an innovator of the FF1 standard; what we call FF1 is format-preserving encryption. We license that to others in the industry, per our NIST agreement. So we're the owner, we're the operator of it, and others use our technology. And we're the original founders of that, and so we continue to sort of lead the industry by adding additional capabilities on top of FF1 that really differentiate us from our competitors. Then you look at our API presence. We can definitely run in Hadoop, but we also run in open systems. We run on mainframe, we run on mobile. So anywhere in the enterprise or in the cloud, anywhere you want to be able to put secure data, and be able to access the protected data, we're going to be there and be able to support you there. >> Okay so, I've talked to a lot of customers this week, and let's say I'm running in Eon mode. And I've got some workload running in AWS, I've got some on-prem. I'm going to take an appliance or multiple appliances, I'm going to put it on-prem, but that will also secure my cloud workloads as part of a sort of shared responsibility model, for example? Or how does that work? >> No, that's absolutely correct. We're really flexible in that we can run on-prem or in the cloud as far as our crypto engine; the key management is really hard stuff. Cryptography is really hard stuff, and we take care of all that, so we've baked it all in, and we can run that for you as a service, either in the cloud or on-prem on your small VMs. So it's a really lightweight footprint for running your infrastructure. When I look at an organization like you just described, it's a classic example of where we fit, because we will be able to protect that data. Let's say you're ingesting it from a third party, or from an operational system; you have a website that collects customer data. Someone has now registered as a new customer, and they're going to do e-commerce with you.
We'll take that data, and we'll protect it right at the point of capture. And we can now flow that through the organization and decrypt it at will on any platform that you have that you need us to be able to operate on. So let's say you wanted to pick that customer data from the operational transaction system; let's throw it into Eon, let's throw it into the cloud, let's do analytics there on that data, and we may need some decryption. We can place Secure Data wherever you want to be able to service that use case. In most cases, what you're doing is a simple, tiny little atomic fetch across a protected tunnel, your typical TLS pipe tunnel. And once that key is then cached within our client, we maintain all that technology for you. You don't have to know about key management or caching. We're good at that; that's our job. And then you'll be able to make those API calls to access or protect the data, and apply the authorization and authentication controls that you need to be able to service your security requirements. So you might have third parties having access to your Vertica clusters. That is a special need, and we can have that ability to say employees can get X, and the third party can get Y, and that's a really interesting use case we're seeing for shared analytics on the internet now. >> Yeah, for sure, so you can set the policy how you want. You know, I have to ask you, in a perfect world, I would encrypt everything. But part of the reason why people don't is because of performance concerns. Can you talk about that? You touched upon it I think recently with your sort of atomic access, but, and I know it's Vertica, it's a Ferrari, etc., but anything that slows it down is going to be a concern. Are customers concerned about that? What are the performance implications of running encryption on Vertica? >> Great question there as well, and what we see is that we want to be able to apply scale where it's needed.
And so if you look at ingest platforms that we find, Vertica is commonly connected up to something like Kafka. Maybe StreamSets, maybe NiFi; there are a variety of different technologies that can route that data, pipe that data into Vertica at scale. Secure Data is architected to go along with that architecture at the node or at the executor or at the lowest-level operator level. And what I mean by that is that we don't have a bottleneck where everything has to go through one process or one box or one channel to be able to operate. We don't put an interceptor in between your data coming and going. That's not our approach, because those approaches are fragile and they're slow. So we typically want to focus on integrating our APIs natively within those pipeline processes that come into Vertica. Within the Vertica ingestion process itself, you can simply apply our protection when you do the copy command in Vertica. So it's a really basic, simple use case that everybody is typically familiar with in Vertica land: be able to copy the data and put it into Vertica, and you simply say protect as part of the copy. So my first name is coming in as part of this ingestion. I'll simply put the protect keyword in the syntax right in SQL; it's nothing other than just an extension of SQL. Very, very simple for the developer, easy to read, easy to write. And then you're going to provide the parameters that you need to say, oh, the name is protected with this kind of a format. To differentiate it between a credit card number and an alphanumeric stream, for example. So once you do that, you then have the ability to decrypt. Now, on decrypt, let's look at a couple different use cases. First, within Vertica, we might be doing select statements within Vertica, we might be doing all kinds of jobs within Vertica that just operate at the SQL layer.
Again, just insert the word "access" into the Vertica select string and provide us with the data that you want to access; that's our word for decryption, that's our lingo. And we will then, at the Vertica level, harness the power of its CPU, its RAM, its horsepower at the node to be able to operate on that request, the decryption request, if you will. So that gives us the speed and the ability to scale out. So if you start with two nodes of Vertica, we're going to operate at X number of hundreds of thousands of transactions a second, depending on what you're doing. Long strings are a little bit more intensive in terms of performance, but short strings like social security numbers are our sweet spot. So we operate at very, very high speed on that, and you won't notice the overhead with Vertica, per se, at the node level. When you scale Vertica up and you have 50 nodes, and you have large clusters of Vertica resources, then we scale with you. And we're not a bottleneck at any particular point. Everybody's operating independently, but they're all copies of each other, all doing the same operation. Fetch a key, do the work, go to sleep. >> Yeah, you know, I think, a lot of the customers have said to us this week that one of the reasons why they like Vertica is it's very mature, it's been around, it's got a lot of functionality, and of course, you know, look, security, I understand, is kind of table stakes, but it can also be a differentiator. You know, big enterprises that you sell to, they're asking for security assessments, SOC 2 reports, penetration testing, and I think I'm hearing, with the partnership here, you're sort of passing those with flying colors. Are you able to make security a differentiator, or is it just sort of everybody's got to have good security? What are your thoughts on that? >> Well, there's good security, and then there's great security.
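The "fetch a key, do the work, go to sleep" pattern a few turns above can be sketched in a few lines. Everything here is illustrative: the key-server call is a stub and the transform is a stand-in, not the real Secure Data API. The point is only that each worker fetches key material once, caches it locally, and then operates independently, so no central interceptor becomes a bottleneck.

```python
import functools

@functools.lru_cache(maxsize=None)
def fetch_key(key_id):
    """Stand-in for the one-time round trip to the key-management appliance."""
    return ("key-material-for-" + key_id).encode()

def protect(value, key_id):
    key = fetch_key(key_id)  # served from the local cache after the first call
    # Stand-in transform (XOR); real code would invoke the FPE API here.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(value.encode()))

rows = ["alice", "bob", "carol"]
protected = [protect(r, "ssn-key") for r in rows]

# The appliance was contacted exactly once; the other lookups were cache hits.
info = fetch_key.cache_info()
assert info.misses == 1 and info.hits == len(rows) - 1
```

Scaling out is then just running more copies of this worker: each node pays the key-fetch latency once and does all subsequent encrypt/decrypt work locally at CPU speed.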
And what I found with one of my money center bank customers, based here in San Francisco, was the concern around insider access when they had a large data store. And the concern that a DBA, a database administrator who has privilege to everything, could potentially exfiltrate data out of the organization, and in one fell swoop create havoc for them because of the amount of data that was present in that data store, and the sensitivity of that data in the data store. So when you put Voltage encryption on top of Vertica, what you're doing now is putting a layer in place that would prevent that kind of a breach. So you're looking at insider threats, you're looking at external threats, and you're looking at also being able to pass your audit with flying colors. The audits are getting tougher. And when they say, tell me about your encryption, tell me about your authentication scheme, show me the access control list that says that this person can or cannot get access to something, they're asking tougher questions. That's where Secure Data can come in and give you that quick answer of, it's encrypted at rest. It's encrypted and protected while it's in use, and we can show you exactly who's had access to that data because it's tracked via a different layer, a different appliance. And I would even draw the analogy: many of our customers use a device called a hardware security module, an HSM. Now, these are fairly expensive devices that were invented for military applications and adopted by banks. And now they're really spreading out, and people say, do I need an HSM? Well, with Secure Data, we certainly protect your crypto very, very well. We have very, very solid engineering. I'll stand on that any day of the week, but your auditor is going to want to ask a checkbox question. Do you have HSM? Yes or no. Because the auditor understands it's another layer of protection.
And it provides another tamper-evident layer of protection around your key management and your crypto. And we, as professionals in the industry, nod and say, that is worth it. That's an expensive option that you're going to add on, but your auditor's going to want it. If you're in financial services, you're dealing with PCI data, you're going to enjoy the checkbox that says, yes, I have HSMs, and not get into some arcane conversation around, well no, but it's good enough. That's the kind of argument and conversation we get into when folks want to say, Vertica has great security, Vertica's fantastic on security. Why would I want Secure Data as well? It's another layer of protection, and it's defense in depth for your data. When you believe in that, when you take security really seriously, and you're really paranoid, like a person like myself, then you're going to invest in those kinds of solutions that get you best-in-class results. >> So I'm hearing a data-centric approach to security. Security experts will tell you, you've got to layer it. I often say, we live in a new world. The king used to just build a moat around the queen, but the queen, she's leaving her castle in this world of distributed data. Rich, incredibly knowledgeable guest, we really appreciate you being on the front lines and sharing with us your knowledge about this important topic. So thanks for coming on theCUBE. >> Hey, thank you very much. >> You're welcome, and thanks for watching everybody. This is Dave Vellante for theCUBE; we're covering wall-to-wall coverage of the Virtual Vertica BDC, Big Data Conference. Remotely, digitally, thanks for watching. Keep it right there. We'll be right back right after this short break. (intense music)

Published Date : Mar 31 2020

SUMMARY :

Rich Gaston, global solutions architect for security, risk and governance at Micro Focus, joins Dave Vellante at the virtual Vertica Big Data Conference 2020. They discuss the evolution from headline data breaches to tightening regulation (GDPR, CCPA, Australian board liability), Micro Focus Voltage and the NIST-approved FF1 format-preserving encryption standard, how Secure Data integrates natively with Vertica's copy and select operations to encrypt and decrypt at the node level without becoming a bottleneck, and why a data-centric, defense-in-depth approach is becoming a differentiator in audits and compliance.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Australia | LOCATION | 0.99+
Europe | LOCATION | 0.99+
Target | ORGANIZATION | 0.99+
Verizon | ORGANIZATION | 0.99+
Vertica | ORGANIZATION | 0.99+
Facebook | ORGANIZATION | 0.99+
Dave Vellante | PERSON | 0.99+
May 2018 | DATE | 0.99+
NIST | ORGANIZATION | 0.99+
2016 | DATE | 0.99+
Boston | LOCATION | 0.99+
2018 | DATE | 0.99+
San Francisco | LOCATION | 0.99+
New York | LOCATION | 0.99+
Target Corporation | ORGANIZATION | 0.99+
$250 million | QUANTITY | 0.99+
50 | QUANTITY | 0.99+
Rich Gaston | PERSON | 0.99+
Singapore | LOCATION | 0.99+
Turkey | LOCATION | 0.99+
Ferrari | ORGANIZATION | 0.99+
six years | QUANTITY | 0.99+
2020 | DATE | 0.99+
one box | QUANTITY | 0.99+
China | LOCATION | 0.99+
C | TITLE | 0.99+
Stanford University | ORGANIZATION | 0.99+
Java | TITLE | 0.99+
First | QUANTITY | 0.99+
one | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
U.S. | LOCATION | 0.99+
this week | DATE | 0.99+
National Institute of Science and Technology | ORGANIZATION | 0.99+
Each jurisdiction | QUANTITY | 0.99+
both | QUANTITY | 0.99+
Vertica | TITLE | 0.99+
Rich | PERSON | 0.99+
this year | DATE | 0.98+
Vertica Virtual Big Data Conference | EVENT | 0.98+
one channel | QUANTITY | 0.98+
one process | QUANTITY | 0.98+
GDPR | TITLE | 0.98+
SQL | TITLE | 0.98+
five billion rows | QUANTITY | 0.98+
about five billion | QUANTITY | 0.97+
One | QUANTITY | 0.97+
C sharp | TITLE | 0.97+
Benet | PERSON | 0.97+
first | QUANTITY | 0.96+
four-letter | QUANTITY | 0.96+
Vertica Big Data Conference 2020 | EVENT | 0.95+
Hadoop | TITLE | 0.94+
Kafka | TITLE | 0.94+
Micro Focus | ORGANIZATION | 0.94+

Francesca Lazzeri, Microsoft | Microsoft Ignite 2019


 

>> Commentator: Live from Orlando, Florida, it's theCUBE. Covering Microsoft Ignite. Brought to you by Cohesity. >> Hello everyone and welcome back to theCUBE's live coverage of Microsoft Ignite 2019. We are theCUBE, we are here at the Cohesity booth in the middle of the show floor at the Orange County Convention Center. 26,000 people from around the globe are here. It's a very exciting show. I'm your host, Rebecca Knight, along with my co-host, Stu Miniman. We are joined by Francesca Lazzeri. She is a Ph.D. machine learning scientist and cloud advocate at Microsoft. Thank you so much for coming on the show. >> Thank you for having me. I'm very excited to be here. >> Rebecca: Direct from Cambridge, so we're an all-Boston table here. >> Exactly. >> I love it. I love it. >> We are in the densest technology cluster, I think, in the world probably. >> So two words we're hearing a lot of here at the show: machine learning, deep learning. Can you describe, define them for us here, and tell us the difference between machine learning and deep learning? >> Yeah, this is a great question, and I have to say a lot of my customers ask me this question very, very often. Because I think right now there are many different terms, such as deep learning, as you said, machine learning, AI, that have been used more or less in the same way, but they are not really the same thing. So machine learning is a portfolio, I would say, of algorithms, and when I say algorithms I mean really statistical models that you can use to run some data analysis. So you can use these algorithms on your data, and these are going to produce what we call an output. Outputs are the results. So deep learning is just a type of machine learning that has a different structure. We call it deep learning because there are many different layers in a neural network, which is again a type of machine learning algorithm.
And it's very interesting because it doesn't look at the linear relation between the different variables, but it looks at different ways to train itself and learn something. So you have to think about deep learning just as a type of machine learning, and then we have AI. AI is just on top of everything; AI is a way of building applications on top of machine learning models, and they run on top of machine learning algorithms. So it's a way, AI, of consuming intelligent models. >> Yeah, so Francesca, I know we're going to be talking to Jeffrey Stover tomorrow about a topic, responsible AI. Can you talk a little bit about how Microsoft is making sure that unintentional biases or challenges with data don't lead the machine learning to do things, or have biases, that we wouldn't want otherwise? >> Yes, I think that Microsoft is actually investing a lot in responsible AI. Because I have to say, as a data scientist, as a machine learning scientist, I think that it's very important to understand what the model is doing and why it's giving me a specific result. So, in my team, we have a toolkit, which is called the interpretability toolkit, and it's really a way to unpack machine learning models. So it's a way of opening machine learning models and understanding what the different relations are between the different variables, the different data points, so it's an easy way, through these different types of relations, to understand why your model is giving you specific results. So that you get that visibility, as a data scientist, but also as a final consumer, a final user of these AI applications. And I think that visibility is the most important thing to prevent biased applications, and to make sure that our results are fair, for everybody. So there are some technical tools that we can use for sure. I can tell you, as a data scientist, that bias and unfairness starts with the data.
You have to make sure that the data is representative enough of the population that you are targeting with your AI applications. But this sometimes is not possible. That's why it's important to create some services, some toolkits, that are going to allow you, again, as a data scientist, as a user, to understand what the AI application, or the machine learning model, is doing. >> So what's the solution? If the problem, if the root of the problem is the data in the first place, how do we fix this? Because this is such an important issue in technology today. >> Yes, and so there are a few ways that you can use... So first of all I want to say that it's not an issue that you can really fix. I would say that, again, as a data scientist, there are a few things that you can do in order to check that your AI application is doing a good job, in terms of fairness, again. And these few steps start, as you said, with the data. So most of the time, people, or customers, they just use their own data. Something that is very helpful is also looking at external types of data, and also making sure that, again, as I said, the data is representative enough of the entire population. So for example, if you are collecting data from a specific category of people, of a specific age, from a specific geography, you have to make sure that you understand that their results are not general results; they are results that the machine learning algorithm learned from that target population. And so it's important again to look at different types of data, different types of data sets, and use, if you can, also external data. And then, of course, this is just the first step. There's a second step, that you can always make sure that you check your model with a business expert, with a data expert. So sometimes we have data scientists that work in silos, they do not really communicate what they're doing.
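The representativeness check Lazzeri describes above can be made concrete with a very small sketch. The population shares and sample below are invented numbers, purely for illustration:

```python
# Compare the age mix of a training sample against assumed population shares;
# a large negative gap flags a group the model will likely under-serve.
population_share = {"18-30": 0.30, "31-50": 0.40, "51+": 0.30}  # assumed shares
sample = ["18-30"] * 120 + ["31-50"] * 70 + ["51+"] * 10        # collected records

n = len(sample)
report = {}
for group, expected in population_share.items():
    observed = sample.count(group) / n
    report[group] = round(observed - expected, 2)  # gap vs. the population

# Groups far below their population share get flagged before any training.
underrepresented = [g for g, gap in report.items() if gap < -0.10]
print(report, underrepresented)
```

Here the 51+ group makes up 5% of the sample against an assumed 30% of the population, so it would be flagged before a model is ever trained on this data.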
And I think that this is something that you need to change within your company, within your organization: you always have to make sure that data scientists, machine learning scientists are working closely with data experts, business experts, and everybody's talking. Again, to make sure that we understand what we are doing. >> Okay, there were so many things announced at the show this week. In your space, what are some of the highlights of the things that people should be taking away from Microsoft Ignite? >> So I think that the Azure Machine Learning platform has been announcing a lot of updates. I love the product because I think it's a very dynamic product. There is what we now call the designer, which is a new version of the old Azure Machine Learning Studio. It's a drag-and-drop tool, so it's a tool that is great for people who do not want to code too much, or who are just getting started with machine learning. And you can really create end-to-end machine learning pipelines with these tools, in just a matter of a few minutes. The nice thing is that you can also deploy your machine learning models, and this is going to create an API for you, and this API can be used by you, or by other developers in your company, to just call the model that you deployed. As I mentioned before, this is really the part where AI is arriving, and it's the part where you create applications on top of your models. So this is a great announcement, and we also created an algorithm cheat sheet, which is a really nice map that you can use to understand, based on your question, based on your data, what's the best machine learning algorithm, what's the best designer module that you can use to build your end-to-end machine learning solution. So this, I would say, is my highlight. And then of course, in terms of Azure Machine Learning, there are other updates.
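The deploy-and-consume flow she describes, where a deployed model is exposed as a web API, typically reduces to an authenticated POST of JSON records. The endpoint URL, key, and feature layout below are placeholders for illustration, not a real service:

```python
# Hypothetical sketch of calling a model that was deployed as a web service.
import json

scoring_uri = "https://example.com/score"  # placeholder endpoint, not real
api_key = "<your-key>"                     # placeholder credential

# Two records to score; the feature layout is invented for illustration.
payload = json.dumps({"data": [[34, 52000], [51, 87000]]})
headers = {"Content-Type": "application/json",
           "Authorization": "Bearer " + api_key}

# The actual call would then be something like:
# response = requests.post(scoring_uri, data=payload, headers=headers)
# predictions = response.json()
print(payload)
```

The point of the API step is exactly what she says: any developer in the company can consume the model with a plain HTTP client, with no machine learning stack installed.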
We have the Azure Machine Learning Python SDK, which is more for pro data scientists who want to create customized models, so models that they have to build from scratch. And for them it's very easy, because it's a Python-based environment, where they can just build their models, train them, test them, deploy them. So when I say it's a very dynamic and flexible tool, it's because it's really a tool on the Cloud that is targeting more business people, data analysts, but also pro data scientists and AI developers, so this is great to see and I'm very, very excited for that. >> So in addition to your work as a Cloud advocate at Microsoft, you are also a mentor to research and post-doc students at the Massachusetts Institute of Technology, MIT, so tell us a little more about that work in terms of what kind of mentorship you provide and what your impressions are of this young generation, a young generation of scientists that's now coming up. >> Yes. So that's another wonderful question, because one of the main goals of my team is actually working with an academic type of audience, and we started this about a year ago. So we are, again, a team of Cloud advocates, developers, data scientists, and we do not want to work only with big enterprises, but we want to work with academic types of institutions. So when I say academics, of course I mean some of the best universities; I've been working a lot with MIT in Cambridge, the Massachusetts Institute of Technology, Harvard, and also now I've been working with Columbia University, in New York. And with all of them, I work with both the PhD and post-doc students, and most of the time, what I try to help them with is changing their mindset. Because these are all brilliant students that just need to understand how they can translate what they have learned during their years of study, and also their technical skillset, into the real world. And when I say the real world, I mean more like building applications.
So there is this sort of skill transfer that needs to be done, and again, working with these brilliant people, I have to say, is something that is easy to do, because sometimes they just need to work on a specific project that I create for them, so I give data to them and then we work together in a sort of lab environment, and we build end-to-end solutions. But from a knowledge perspective, from a, I would say, technical perspective, these are all excellent students, so really, I find myself in a position in which I'm mentoring them, I prepare them for the industry, because most of them want to become data scientists, machine learning scientists, but I have to say that I also learn a lot from them, because at the end of the day, when we build these solutions, it's really a way to build something, a project, an app, together, and then the beauty of this is also that we see how other people are using that to build something even better. So it's an amazing experience, and I feel very lucky that I'm in Cambridge, where, as you know, we have the best schools. >> Francesca, you've dug into some really interesting things. I'd love to get just a little bit, if you can share, about how machine learning is helping drive competitiveness and innovation in companies today, and any tips you have for companies, and how they can get involved even more. >> Yeah, absolutely. So I think that everything really starts with the business problem, because I think that, as we started this conversation, we were mentioning words such as deep learning, machine learning, AI, and a lot of companies, they just want to do this because they think that they're missing something. So my first suggestion for them is really trying to understand what's the business question that they have, if there is a business problem that they can solve, if there is an operation that they can improve, so these are all interesting questions that they can ask themselves and their teams.
And then as soon as they have this question in mind, the second step is to understand if they have the data, the right data, that is needed to support this process, that is going to help them with the business question. So after that, once you understand that you have the right data, those are the stepping stones; of course you have to understand if you also have external data, and if you have enough data, as we were saying, because this is very, very important as a first step in your machine learning journey. And you know, it's important also to be able to translate the business question into a machine learning question. Like, for example, in supervised learning, which is an area of machine learning, we have what is called regression. Regression is a type of model that is great to answer questions such as: how many, how much? So if you are a retailer and you want to predict how many sales of a specific product you're going to have in the next two weeks, then for example the regression model is going to be a good first step for you to start your machine learning journey. So the translation of the business problem into a machine learning question, and as a consequence into a machine learning algorithm, is also very important. And then finally, I would say that you always have to make sure that you are able to deploy this machine learning model, so that your environment is ready for the deployment and what we call the operationalization part. Because this is really the moment in which we are going to allow the other people, meaning internal stakeholders, other teams in your company, to consume the machine learning model. That's the moment really in which you are going to add business value to your machine learning solution.
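Her retailer example, answering "how many in the next two weeks?", can be sketched with ordinary least squares. The sales figures here are invented for illustration:

```python
import numpy as np

# Past 12 weeks of unit sales for one product (illustrative numbers).
weeks = np.arange(12)
sales = np.array([120, 125, 123, 130, 134, 140, 138, 145, 150, 149, 155, 160])

# Fit a straight line: sales ~ slope * week + intercept.
slope, intercept = np.polyfit(weeks, sales, deg=1)

# "How many will we sell in the next two weeks?" (weeks 12 and 13)
forecast = [slope * w + intercept for w in (12, 13)]
print([round(f, 1) for f in forecast])
```

A real retail forecast would bring in seasonality, promotions, and external data as she notes, but the shape of the problem, a numeric "how many" target fit from history, is the same.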
So yeah, my suggestion for companies who want to start this journey is really to make sure that they have cleared these steps, because I think that if they have cleared these steps, then their team, their developers, their data scientists, are going to work together to build these end-to-end solutions. >> Francesca Lazzeri, thank you so much for coming on theCUBE, it was a pleasure having you. >> Thank you. Thank you. >> I'm Rebecca Knight, Stu Miniman. Stay tuned for more of theCUBE's live coverage of Microsoft Ignite. (upbeat music)

Published Date : Nov 5 2019



Julie Johnson, Armored Things | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's The Cube covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. (electronic music) >> Welcome back to MIT in Cambridge, Massachusetts everybody. You're watching The Cube, the leader in live tech coverage. My name is Dave Vellante, I'm here with Paul Gillin. Day two of the MIT Chief Data Officer Information Quality Conference. One of the things we like to do at these shows, we love to profile Boston-area start-ups that are focused on data, and in particular we love to focus on start-ups that are founded by women. Julie Johnson is here, she's the Co-founder and CEO of Armored Things. Julie, great to see you again. Thanks for coming on. >> Great to see you. >> So why did you start Armored Things? >> You know, Armored Things was created around a mission to keep people safe. Early in the time when we were looking at starting this company, incidents like Las Vegas happened, Parkland happened, and we realized that the world of security and operations was really stuck in the past, right? It's manual solutions, generally driven by human instinct, anecdotal evidence, and tools like walkie-talkies and video cameras. We knew there had to be a better way, right? In the world of data that we live in today, I would ask if either of you got in your car this morning without turning on Google Maps to see where you were going, and the best route with traffic. We want to help universities, ball parks, corporate campuses do that for people. How do we keep our people safe? By understanding how they live. >> Yeah, and stay away from Lambert Street in Cambridge by the way. >> (laughing) >> Okay so, you know, people, when they think about security they think about cyber, they think about virtual security, et cetera et cetera, but there's also the physical security aspect. Can you talk about the balance of those two? >> Yeah, and I think both are very important.
We actually tend to mimic some of the revolutions that have happened on the cyber security side over the last 10 years with what we're trying to do in the world of physical security. So, folks watching this who are familiar with cyber security might understand concepts like anomaly detection, SIEM and SOAR for orchestrated response. We very much believe that similar concepts can be applied to the physical world, but the unique thing about the physical world is that it has defined boundaries, right? People behave in accordance with their environment. So, how do we take the lessons learned in cyber security over 10 to 15 years, and apply them to that physical world? I also believe that physical and cyber security are converging. So, are there things that we know in the physical world, because of how we approach the problem, that can be a leading indicator of a threat in either the physical world or the digital world? What many people don't understand is that for some of these cyber security hacks, the first weak link is physical access to your network, to your data, to your systems. How do we actually help you get an eye on that, so you already have some context when you notice it in the digital realm. >> So, go back to the two examples you cited earlier, the two shooting examples. Could those have been prevented or mitigated in some way using the type of technology you're building? >> Yeah, I'd hate to say that you could ever prevent an incident like that. Everyone wants us to do better. Our goal is to get a better sense, predictively, of the leading indicators that tell you you have a problem. So, because we're fundamentally looking at patterns of people and flow, I want to know when a normal random environment starts to disperse in a certain way, or if I have a bottleneck in my environment. Because then, if I have that type of incident occur, I already know where my hotspots are, where my pockets of risk are.
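The kind of dispersal anomaly she describes can be caught with even a simple statistical baseline. A minimal sketch, with invented per-minute gate counts, using only the Python standard library:

```python
import statistics

# Illustrative per-minute people counts at one gate; the last reading spikes
# as the crowd suddenly moves toward the exit.
counts = [42, 40, 45, 43, 41, 44, 46, 43, 42, 120]

baseline = counts[:-1]                 # what "normal" looked like so far
mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)

def is_anomalous(value, threshold=3.0):
    """Flag readings more than `threshold` standard deviations from normal."""
    return abs(value - mean) / stdev > threshold

print(is_anomalous(counts[-1]), is_anomalous(44))
```

A production system would fuse many such signals (cameras, Wi-Fi, access control) rather than one counter, but the idea of learning "normal" and surfacing deviations is the same one borrowed from cyber security anomaly detection.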
So, I can address it that much more efficiently from a response perspective. >> So if people are moving quickly away from a venue, it might be an indication that there's something wrong- >> It could be, yeah. That demands attention. >> Yeah, when you go to a baseball game, or when you go to work, I would imagine that you generally have a certain pattern of behavior. People know conceptually what those patterns are. But, we're the first effort to bring them data to prove what those patterns are, so that they can actually use that data to consistently re-examine their operations, re-examine their security from a staffing perspective, from a management perspective, to make sure that they're using all the data that's at their disposal. >> Seems like there would be many other applications beyond security of this type of analysis. Are you committed to the security space, or do you have broader ambitions? >> Are we committed to the security space? A hundred percent. I would say the number one reason why people join our team, and the number one reason why people call us to be customers, is security. There's a better way to do things. We fundamentally believe that every ball park, every university, every corporate campus, needs a better way. I think what we've seen though is exactly what you're saying. As we built our software for security in these venues, and started with an understanding of people and flow, there's a lot that falls out of that, right? How do I open gates that are more effective based on patterns of entry and exit. How do I make sure that my staffing's appropriate for the number of people I have in my environment. There's lots of other contextual information that can ultimately drive a bottom line or top line revenue. So, you take a pro sports venue for example.
If we know that on a 10 degree colder day people tend to egress earlier in the game, how do we adjust our food and beverage strategy to save money on hourly workers, so that we're not overstaffing in a period of time that doesn't need those resources. >> She's talking about the physical and the logical security worlds coming together, and security of course has always been about data, but 10 years ago it was staring at logs; increasingly the machines are helping us do that, and software is helping us do that. So can you add some color to at least the trends in the market generally, and then maybe specifically what you're doing bringing machine intelligence to the data to make us more secure. >> Sure, and I hate to break it to you, but logs are still a pretty big part of what people are watching on a daily basis, as are video cameras. We've seen a lot of great technology evolve in the video management system realm. Very advanced technology, great at object recognition and detecting certain behaviors with a video-only solution, right? How do we help pinpoint certain behaviors on a specific frame or specific camera. The only problem with that is, if you have people watching those cameras, you're still relying on humans in the loop to catch a malicious behavior, to respond in the event that they're notified about something unusual. That still becomes a manual process. What we do is we use data to watch not only cameras; we are watching your cameras, your Wi-Fi, access control. Contextual data from public transit, or weather. How do we get this greater understanding of your environment that helps us watch everything, so that we can surface the things that you want the humans in the loop to pay attention to, right? So, we're not trying to remove the human, we're trying to help them focus their time and make decisions that are backed by data in the most efficient way possible. >> How about the concerns about The Surveillance Society?
In some countries, it's just taken for granted now that you're on camera all the time. In the US that's a little bit more controversial. With what you're doing, do you have to be sensitive to that in designing the tools you're building? >> Yeah, and I think to Dave's question, there are solutions like facial recognition which are very much working on identifying the individual. We have a philosophy as a company that security doesn't necessarily start with the individual, it starts with the aggregate. How do we understand, at an aggregate macro level, the patterns in an environment. Which means I don't have to identify Paul, or I don't have to identify Dave. I want to look for what's usual and unusual, and use that as the basis of my response. There's certain instances where you want to know who people are. Do I want to know who my security personnel are so I can dispatch them more efficiently? Absolutely. Let's opt those people in and allow them to share the information they need to share to be better resources for our environment. But, that's the exception, not the norm. If we make the norm privacy first, I think we'll be really successful in this emerging GDPR, data-centric world. >> But I could see somebody down the road saying, hey, can you help us find this bad guy? And my kid's at camp this week. This is his 7th year of camp, and this year was the first year my wife was able to sign up for a facial recognition thing. So, we used to have to scroll through hundreds and hundreds of pictures to see, oh, there he is! And so Deb signs up for this thing, and then it pings you when your son has a picture taken. >> Yeah. And I was like, that's awesome. Oh. (laughing) >> That's great until you think about it. >> But there aren't really any clear privacy laws today. And so you guys are saying, look it, we're looking at the big picture. >> That's right. >> But that day is coming, isn't it? >> There's certain environments that care more than others.
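The "aggregate first" stance she describes can be illustrated with a tiny sketch: report only zone-level counts, and suppress any zone with fewer than k people so no individual can be singled out. The threshold and ping data below are invented for illustration:

```python
# Minimal sketch of aggregate-first privacy: device ids never leave this step,
# and zones with fewer than K people are dropped from the report entirely.
from collections import Counter

K = 5  # suppression threshold (assumed policy value)

# (device_id, zone) pings; only the zone totals are reported downstream.
pings = [("d1", "gate-A"), ("d2", "gate-A"), ("d3", "gate-A"),
         ("d4", "gate-A"), ("d5", "gate-A"), ("d6", "gate-A"),
         ("d7", "concourse"), ("d8", "concourse")]

counts = Counter(zone for _, zone in pings)
report = {zone: n for zone, n in counts.items() if n >= K}

print(report)  # sparse zones are suppressed, so no one can be singled out
```

Opted-in individuals (like the security staff she mentions) would be handled by a separate, consented path; the default path only ever sees thresholded aggregates.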
If you think about universities, which is where we first started building our technology, they cared greatly about the privacy of their students. Health care is a great example. We want to make sure that we're protecting people's personal data at a different level. Not only because that's the right thing to do, but also from a regulatory perspective. So, how do we give them the same security without compromising the privacy. >> Talk about the bottom line. You mentioned to us earlier that you just signed a contract with a sports franchise, you're actually going to help them, help save them money, by deploying their resources more efficiently. How does your technology help the bottom line? >> Sure, your average sporting venue is getting great information at the point a ticket is scanned or a ticket is purchased; they have very little visibility beyond that into the customer journey during an event at their venue. So, if you think about, again, patterns of people and flow from a security perspective, at our core we're helping them staff the right gates, or figure out where people need to be based on hot spots in their environment. But what that also results in is an ability to drive other operational benefits. Do we have a zone with very low utilization that we could use as maybe even a benefit to our avid fans? Send them to that area, get traffic in that area, and now give them a better concession experience because of it, right? Where they're going to end up spending more money because they're not waiting in line in the different zone. So, how do we give them a dashboard in real time, but also alerts or reports that they can use on an ongoing basis to change their decision making going forward. >> So, give us the company overview. Where are you guys at with funding, head count, all that good stuff.
So, that was Glasswing is a Boston AI focused fund, has been a great partner for us, and Inovia which is Canada's largest VC fund recently opened a Silicon Valley office. We just started raising a series A about a week ago. I'm excited to say those conversation have been going really well so far. We have some potential strategic partners who we're excited about who know data better then anyone else that we think would help us accelerate our business. We also have a few folks who are very familiar with the large venue space. You know, the distributed campuses, the sporting and entertainment venues. So, we're out looking for the right partner to lead our series A round, and take our business to the next level, but where we are today with five really great branded customers, I think we'll have 20 by the end of next year, and we won't stop fighting 'till we're at every ball park, every football stadium, every convention center, school. >> The big question, at some point will you be able to eliminate security lines? (laughing) >> I don't think that's my core mission. (laughing) But, optimistically I'd love to help you. Right, I think there's some very talented people working on that challenge, so I'll defer that one to them. >> And rough head count today? >> We have 23 people. >> You're 23 people so- >> Yeah, I headquartered in Boston Post Office Square. >> Awesome, great location. So, and you say you've got five customers, so you're generating revenue? >> Yes >> Okay, good. Well, thank you for coming in The Cube >> Yeah, thank you. >> And best of luck with the series A- >> I appreciate it and going forward >> Yeah, great. >> All right, and thank you for watching. Paul Gillin and I will be back right after this short break. This is The Cube from MIT Chief Data Officer Information Quality Conference in Cambridge. We'll be right back. (electronic music)

Published Date : Aug 1 2019



Lisa Ehrlinger, Johannes Kepler University | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Hi, everybody, welcome back to Cambridge, Massachusetts. This is theCUBE, the leader in tech coverage. I'm Dave Vellante with my cohost, Paul Gillin, and we're here covering the MIT Chief Data Officer Information Quality Conference, #MITCDOIQ. Lisa Ehrlinger is here, she's the Senior Researcher at the Johannes Kepler University in Linz, Austria, and the Software Competence Center in Hagenberg. Lisa, thanks for coming on theCUBE, great to see you. >> Thanks for having me, it's great to be here. >> You're welcome. So Friday you're going to lay out the results of the study, and it's a study of Data Quality Tools. Kind of the long tail of tools, some of those that may not have made the Gartner Magic Quadrant and maybe other studies, but talk about the study and why it was initiated. >> Okay, so the main motivation for this study was actually a very practical one, because we have many company projects with companies from different domains, like the steel industry, the financial sector, and also a focus on the automotive industry, at our department at Johannes Kepler University in Linz. We have experience with these companies for more than 20 years, actually, in this department, and what reoccurred was the fact that we spent the majority of time in such big data projects on data quality measurement and improvement tasks. So at some point we thought, okay, what possibilities are there to automate these tasks, and what tools are out there on the market to automate these data quality tasks. So this was actually the motivation why we thought, okay, we'll look at those tools. Also, companies ask us, "Do you have any suggestions? "Which tool performs best in this-and-this domain?" And I think this study answers some questions that have not been answered so far in this particular detail.
For example, the Gartner Magic Quadrant for Data Quality Tools is pretty interesting, but it's very high-level and focuses on the big global vendors; it does not look at the specific measurement functionalities. >> Yeah, you have to have some certain number of whatever, customers or revenue, to get into the Magic Quadrant. So there's a long tail that they don't cover. But talk a little bit more about the methodology. Was it sort of you got hands-on, or was it more just kind of investigating what the capabilities of the tools were, talking to customers? How did you come to the conclusions? >> We actually approached this from a very scientific side. We conducted a systematic search of which tools are out there on the market; not only industrial tools, but also open-source tools were included. And I think this gives a really nice digest of the market from different perspectives, because we also include some tools that have not been investigated by Gartner, for example, like MobyDQ, or Apache Griffin, which has really nice monitoring capabilities but lacks some other features of these comprehensive tools, of course. >> So was the goal of the methodology largely to capture a feature-function analysis, being able to compare that in terms of binary, did it have it or not, how robust is it? And try to develop a common taxonomy across all these tools, is that what you did? >> So we came up with a very detailed requirements catalog, which is divided into three fields: the first focuses on data profiling, to get a first insight into data quality. The second is data quality management in terms of dimensions, metrics, and rules. And the third part is dedicated to data quality monitoring over time. For all those three categories, we came up with different case studies on a test database. And so we looked, okay, does this tool support this feature: yes, no, or partially? And if partially, to what extent? 
So I think, especially on the partial assessment, we go into a lot of detail in our survey, which is available on arXiv online already. So the preliminary results are already online. >> How do you find it? Where is it available? >> On arXiv. >> arXiv? >> Yes. >> What's the URL, sorry. arXiv.com, or .org, or-- >> arXiv.org, yeah. >> arXiv.org. >> But actually there is an ID I don't have with me currently, but I can send it to you afterwards, yeah. 
>> So basically the focus was on the feature functions, but of course we had to contact the customer support. Especially with the commercial tools, we had to ask them to provide us with trial licenses, and there we received different feedback from those companies. I think the most comprehensive study here is definitely the Gartner Magic Quadrant for Data Quality Tools, because they give a broad assessment, but what we also highlight in our study are companies that have very open support and are very willing to support you. For example, Informatica Data Quality: we had a really close interaction with them in terms of support, trial licenses, and also specific functionality. Also Experian; our contact at Experian in France was really helpful here. And other companies, like IBM, they focus on big vendors, and here we were not able to assess these tools, for example, yeah. >> Okay, but the other difference from the Magic Quadrant is you guys actually used the tools, played with them, experienced firsthand the customer experience. >> Exactly, yeah. >> Did you talk to customers as well, or, because you were the customer, you had that experience. >> Yes, I was the customer, but I was also happy to attend a data quality event in Vienna, and there I met some other customers who had experience with single tools. Not, of course, this wide range we observed, but it was interesting to get feedback on single tools and verify our results, and it matched pretty well. >> How large was the team that ran the study? >> Five people. >> Five people, and how long did it take you from start to finish? >> Actually, we performed it for one year, roughly. The assessment. And I think it's a pretty long time, especially when you see how quickly the market responds, especially in the open source field. 
But nevertheless, you need to make some cut, and I think it's a very recent study now, and there was also the idea to publish the preliminary results now, and we are happy with that. >> Were there any surprises in the results? >> I think one of the main results, or one of the surprises, was that we think there is definitely more potential for automation, but not only for automation. I really enjoyed the keynote this morning saying that we need more automation, but at the same time, we think there is also demand for more declaration. We observed some tools that say, yeah, we apply machine learning, and then you look into their documentation and find no information: which algorithm, which parameters, which thresholds. So especially if you want to assess the data quality, you really need to know which algorithm is used and how it's tuned, and expose that to the user, who in most cases will be a person with a technical background, like some chief data officer. And he or she really needs to have the possibility to tune these algorithms to get reliable results, and to know what's going on and why, which records are selected, for example. >> So now what? You're presenting the results, right? You're obviously here at this conference and other conferences, and so it's been what, a year, right? >> Yes. >> And so what's the next wave? What's next for you? >> The next wave: we're currently working on a project called a Knowledge Graph for Data Quality Assessment, which should tackle two problems in one. The first is to come up with a semantic representation of your data landscape in your company, but not only the data landscape itself in terms of gathering metadata, but also to automatically improve or annotate this data schema with data profiles. 
And I think what we've seen in the tools is that we have a lot of capabilities for data profiling, but this is usually left to the user ad hoc; here, we store it centrally and allow the user to continuously verify whether newly incoming data adheres to this standard data profile. And I think this is definitely one step on the way to more automation. The best thing about this approach would be to overcome the very arduous way of coming up with all the single rules within a team; instead, you present the data profile to the people involved in the data quality project, and then they can verify it and only update it and refine it, but they have some automated basis presented to them. >> Oh, great, same team or new team? >> Same team, yeah. >> Oh, great. >> We're continuing with it. >> Well, Lisa, thanks so much for coming on theCUBE and sharing the results of your study. Good luck with your talk on Friday. >> Thank you very much, thank you. >> All right, and thank you for watching. Keep it right there, everybody. We'll be back with our next guest right after this short break. From MIT CDOIQ, you're watching theCUBE. (upbeat music)

Published Date : Jul 31 2019


Dr. Stuart Madnick, MIT | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to MIT in Cambridge, Massachusetts, everybody. You're watching theCUBE, the leader in live tech coverage. This is MIT CDOIQ, the Chief Data Officer and Information Quality Conference. I'm Dave Vellante with my cohost, Paul Gillin. Professor Stuart Madnick is here, a longtime Cube alum and a longtime professor at MIT, soon to be retired, but we're really grateful that you're taking your time to come on. It's great to see you again. >> It's great to see you again. It's been a long time since we worked together, and I really appreciate the opportunity to share what we're doing here at MIT with your audience. >> Well, it's really been fun to watch this conference evolve. We're full, and it's really amazing. We have to move to a new venue next year, I understand. And we talk about the data explosion all the time, but one of the areas that you're focused on, and that you're going to talk about today, is ethics and privacy, and data causes so many concerns in those two areas. So give us the highlights of what you're going to discuss with the audience today, and we'll get into it. >> One of the things that makes it so challenging is that data has so many implications to it, and that's why the issue of ethics is so hard to get people to reach agreement on. We're talking, for example, about medicine: with big data and AI, you know, to be able to really identify causes you need massive amounts of data. That means more data has to be made available, as long as it's somebody else's data, not mine; not in my backyard, if you will. So you have this issue where, on the one hand, people are concerned about sharing their data, and on the other hand, there are so many valuable things we would gain by sharing data, and getting people to reach agreement is a challenge. 
>> Well, one of the things I wanted to explore with you is how things have changed. Back in the day, you were very familiar, Paul, you as well, with the Microsoft Department of Justice and FTC issues. And it wasn't so much around data; it was really around browsers and bundling things. But today you see Facebook, Google, and Amazon coming under fire, and it's largely data related. Liz Warren last night, again: break up big tech. Your thoughts on the similarities and differences between sort of the monopolies of yesterday and the data monopolies of today. Should they be broken up? What do you think? >> So let me broaden the issue a little bit. I don't know the demographics of the audience, but I often refer to the characteristics of millennials, the millennials in general. I ask my students this question: how many of you have a Facebook account? In almost every class, Facebook. You realize you've given away a lot of information about yourself; it doesn't really occur to them that that may be an issue. I was told by someone that in some countries where Facebook is very popular, that's how they coordinate the kidnappings of teenagers from rich families. They track them; they know they're going to go to this basketball game or that soccer match, exactly where they're going to be. That's the perfect spot to kidnap them. So I don't know whether students think about the fact that when they're putting things on Facebook, they're putting so much of their life at risk. On the other hand, it makes their life richer, more enjoyable. And so that's why these things are so challenging. Now, getting back to the issue of the breakup of the big tech companies: one of the big challenges there is that in order to do the great things that big data has been doing, and the things that AI promises, you need lots of data. 
Having organizations that can gather it all together in a relatively systematic and consistent manner is so valuable. Breaking up the tech companies, and there are some reasons why people want to do that, also interferes with that benefit. And that's why I think it's going to be looked at really carefully, to see not only what gains there may be from breaking up, but also what losses and disadvantages we're creating for ourselves. >> So an example might be, perhaps it makes the United States less competitive vis-a-vis China in the area of machine intelligence, as one example. The flip side of that is, you know, Facebook has every incentive to appropriate our data to sell ads. So it's not an easy, you know, equation. >> Well, even ads are a funny situation. For some people, having a product called to your attention that you actually really want but never knew about before could be viewed as a feature, right? So, you know, in some cases the ads could be viewed as a feature by some people and, of course, a bit of an intrusion by other people. >> Well, sometimes we used to search Google, right? Looking for the ad on the side. >> No longer. It's all ads. >> You know it. I wonder if you see public sentiment changing in this respect. There's a lot of concern, certainly at the legislative level now, about misuse of data. But Facebook usership is not going down; Instagram membership is not going down. The indication is that ordinary citizens don't really care. >> That's been my sense. I don't have all the data, and maybe you may have seen some, but just anecdotally, talking to people in the work we're doing, I agree with you. It may be a bit dramatic, but at a conference once someone made a comment that there has not been the digital Pearl Harbor yet: there's not been some event that was just so onerous that everybody remembers the day it happened. 
And so these things happen, there's maybe a little bit of press coverage, and you're back on your Facebook or your Instagram account the next day. Nothing is really dramatic. Individuals may change now and then, but I don't see massive changes. >> But you had the Equifax hack two years ago, 145,000,000 records. Capital One, just this week, 100,000,000 records. I mean, that seems pretty Pearl Harbor-ish to me. >> Well, it's funny, we were talking about that earlier today regarding different parts of the world. I think in Europe, in general, they really seem to care about privacy. In the United States they kind of care about privacy. In China, they know they have no privacy. But even in the US, where they care about privacy, exactly how much they care about it is really an issue, and in general it's not enough to move the needle. If it does, it moves it a little bit. Around the time when they showed that smart TVs could be broken into, smart TV sales did not budge an inch. Most people don't even remember that big scandal a year ago. >> Well, now, to your point about Equifax, I mean, just this week, I think Equifax came out with a website where you could check whether or not your credentials were... >> It's a new product. >> ...where we were compromised. And in fact, mine had been; I checked, and my wife says hers too. So you had a choice, you know, free monitoring or $125. So that's the way it went. Okay, now what? You know, life goes on. >> It doesn't seem like anything really changes. And we were talking earlier about your 1972 book about cyber security; many of the principles you outlined in that book are still valid today. Why are we not making more progress against cybercriminals? >> Well, two things. One thing is, you've got to realize, as I said before, the caveman had no privacy problems and no break-in problems. But I'm not sure any of us want to go back to the caveman era, because you've got to realize that for all these bad things. 
There are so many good things that are happening, things you can now do with a smartphone that you couldn't even visualize doing a decade or two ago. So there's so much excitement, so much momentum, autonomous cars and so on and so on, that these minor bumps in the road are easy to ignore in the enthusiasm and excitement. >> Well, and now, as we head into the 2020 election: it was fake news in 2016, and now we've got deep fakes, the ability to really use video in new ways. Do you see a way out of that problem? A lot of people are looking at blockchain. You wrote an article recently: blockchain, you think it's unhackable? Well, think again. What are you seeing? >> I think one of the things we always talk about when we talk about improving privacy and security in organizations, the first thing is awareness. Most people are aware for only a really small moment of time that there's an issue, and it quickly passes from the mind. The analogy I use regards industrial safety. You go into almost any factory and you'll see a sign over the door that says "520 days since last industrial accident," and then a subline, "Please do not be the one to reset it this year." And I often say, when's the last time you went to a data center and saw a sign that says "50 milliseconds since last cyber data breach"? It needs to be something that is really front of mind for people, and we talk about how to make awareness activities, both in companies and at home. That's one of our major movements here, trying to be more aware, because if you're not aware that you're putting things at risk, you're not going to do anything about it. >> Last year at SiliconANGLE we contacted 22 leading security experts and posed one simple question: are we winning or losing the war against cybercriminals? Unanimously, they said we're losing. What is your opinion? >> I have a great quote I like to use: the good news is the good guys are getting better, with firewalls and cryptographic codes. 
But the bad guys are getting better faster, and there are a lot of reasons for that; I won't go through all of them. But we came out with an article talking about the dark web, and the reason it's fascinating is that if you go to most companies that have suffered a data breach or a cyber attack, they'll be very reluctant to say much about it unless they're really compelled to do so. On the dark web, they love to brag and build reputation: I'm the one who broke into Capital One. And so there's much more information sharing; it's much more organized, much more disciplined. I mean, the criminal ecosystem is so much superior to the chaotic mess we have here on the good guys' side of the table. >> Do you see any hope for that? There are services, IBM has one, and there are others, that sort of anonymize security data to enable organizations to share sensitive information without risk to their company. Do you see any hope on the collaboration front? >> As I
Can't we go in the oven? It goes, Yeah, but we also have the most to lose our critical infrastructure, and the value of that to our society is much greater than some of our adversaries. So we have to be very careful. It's kind of mind boggling to think autonomous vehicles is another one. I know that you have some visibility on that. And you were saying that technical challenges of actually achieving quality autonomous vehicles are so daunting that security is getting pushed to the back burner. >> And if the irony is, I had a conversation. I was a visiting professor, sir, at the University of Niece about a 12 14 years ago. And that's before time of vehicles are not what they were doing. Big automotive tele metrics. And I realized at that time that security wasn't really our top priority. I happen to visit organization, doing really Thomas vehicles now, 14 years later, and this conversation is almost identical now. The problems we're trying to solve. A hider problem that 40 years ago, much more challenging problems. And as a result, those problems dominate their mindset and security issues kind of, you know, we'll get around him if we can't get the cot a ride correctly. Why worry about security? >> Well, what about the ethics of autonomous vehicles? Way talking about your programming? You know, if you're gonna hit a baby or a woman or kill your passengers and yourself, what do you tell the machine to Dio, that is, it seems like an unsolvable problem. >> Well, I'm an engineer by training, and possibly many people in the audience are, too. I'm the kind of person likes nice, clear, clean answers. Two plus two is four, not 3.94 point one. That's the school up the street. They deal with that. The trouble with ethic issues is they don't tend to have a nice, clean answer. Almost every study we've done that has these kind of issues on it. And we have people vote almost always have spread across the board because you know any one of these is a bad decision. 
So which the bad decision is least bad. Like, what's an example that you used the example I use in my class, and we've been using that for well over a year now in class, I teach on ethics. Is you out of the design of an autonomous vehicle, so you must program it to do everything and particular case you have is your in the vehicle. It's driving around the mountain and Swiss Alps. You go around a corner and the vehicle, using all of senses, realize that straight ahead on the right? Ian Lane is a woman in a baby carriage pushing on to this onto the left, just entering the garage way a three gentlemen, both sides a road have concrete barriers so you can stay on your path. Hit the woman the baby carriage via to the left. Hit the three men. Take a shop, right or shot left. Hit the concrete wall and kill yourself. And trouble is, every one of those is unappealing. Imagine the headline kills woman and baby. That's not a very good thing. There actually is a theory of ethics called utility theory that says, better to say three people than to one. So definitely doing on Kim on a kill three men, that's the worst. And then the idea of hitting the concrete wall may feel magnanimous. I'm just killing myself. But as a design of the car, shouldn't your number one duty be to protect the owner of the car? And so people basically do. They close their eyes and flip a coin because they don't want anyone. Those hands, >> not an algorithmic >> response, doesn't leave. >> I want to come back for weeks before we close here to the subject of this conference. Exactly. You've been involved with this conference since the very beginning. How have you seen the conversation changed since that time? >> I think I think it's changing to Wei first. As you know, this record breaking a group of people are expecting here. 
Close to 500 I think have registered s o much Clea grown kind of over the years, but also the extent to which, whether it was called big data or call a I now whatever is something that was kind of not quite on the radar when we started, I think it's all 15 years ago. He first started the conference series so clearly has become something that is not just something We talk about it in the academic world but is becoming main stay business for corporations Maur and Maur. And I think it's just gonna keep increasing. I think so much of our society so much of business is so dependent on the data in any way, shape or form that we use it and have >> it well, it's come full circle. It's policy and I were talking at are open. This conference kind of emerged from the ashes of the back office information quality and you say the big date and now a I guess what? It's all coming back to information. >> Lots of data. That's no good. Or that you don't understand what they do with this. Not very healthy. >> Well, doctor Magic. Thank you so much. It's a >> relief for all these years. Really Wanna thank you. Thank you, guys, for joining us and helping to spread the word. Thank you. Pleasure. All right, keep it right, everybody. Paul and >> I will be back at M I t cdo right after this short break. You're watching the cue.

Published Date : Jul 31 2019


Ilana Golbin, PwC | MIT CDOIQ 2018


 

>> Live from the MIT campus in Cambridge, Massachusetts, it's The Cube, covering the 12th annual MIT Chief Data Officer and Information Quality Symposium. Brought to you by Silicon Angle Media. >> Welcome back to The Cube's coverage of MIT CDOIQ, here in Cambridge, Massachusetts. I'm your host, Rebecca Knight, along with my cohost Peter Burris. We're joined by Ilana Golbin. She is the manager of the artificial intelligence accelerator at PwC... >> Hi. >> Based out of Los Angeles. Thanks so much for coming on the show! >> Thank you for having me. >> So I know you were on the main stage, giving a presentation, really talking about fears, unfounded or not, about how artificial intelligence will change the way companies do business. Lay out the problem for us. Tell our viewers a little bit about how you see the landscape right now. >> Yeah, so I think... We've really all experienced this, that we're generating more data than we ever have in the past. So there's all this data coming in. A few years ago that was the hot topic: big data. Big data was coming, and how were we going to harness it? And big data coupled with this increase in computing power has really enabled us to build stronger models that can provide more predictive power for a variety of use cases. So this is a good thing. The problem is that we're seeing these really cool models come out that are black box, very difficult to understand how they're making decisions. And it's not just for us as end users, but also developers. We don't really know 100% why some models are making the decisions that they are. And that can be a problem for auditing. It can be a problem for regulation if that comes into play. And it can be a problem for end users trying to trust the model. It comes down to the use case, to why we're building these models. But ultimately we want to ensure that we're building models responsibly, so the models are in line with our mission as a business and they also don't do any unintended harm.
And so because of that, we need some additional layers to protect ourselves. We need to build explainability into models and really understand what they're doing. >> You said two really interesting things. Let's take one and then the other. >> Of course. >> We need to better understand how we build models and we need to do a better job of articulating what those models are. Let's start with the building of models. What does it mean to do a better job of building models? Where are we in the adoption of better? >> So I think right now we're at the point where we just have a lot of data and we're very excited about it and we just want to throw it into whatever models we can and see what we can get that has the best performance. But we need to take a step back and look at the data that we're using. Is the data biased? Does the data match what we see in the real world? Do we have a variety of opinions in both the data collection process and also the model design process? Diversity is not just important for opinions in a room but it's also important for models. So we need to take a step back and make sure that we have that covered. Once we're sure that we have data that's sufficient for our use case and the bias isn't there, or the bias is there to the extent that we want it to be, then we can go forward and build these better models. So I think we're at the point where we're really excited, and we're seeing what we can do, but businesses are starting to take a step back and see how they can do that better. >> Now the one B, the tooling: where is the tooling? >> The tooling... If you follow any of the literature, you'll see new publications come out, sometimes every minute, on the different applications for these really advanced models. Some of the hottest models on the market today are deep learning models and reinforcement learning models.
They may not have an application for some businesses yet, but they definitely are building those types of applications, so the techniques themselves are continuing to advance, and I expect them to continue to do so. Mostly because the data is there and the processing power is there and there's so much investment coming in from various government institutions and governments in these types of models. >> And the way these things typically work is the techniques and the knowledge of techniques advance, and then we turn them into tools. So the tools are still lagging a little bit behind the techniques, but they're catching up. Would you agree? >> I would agree with that. Just because commercial tools can't keep up with the pace of the academic environment, we wouldn't really expect them to, but once you've invested in a tool you want to try and improve that tool rather than reformat that tool with the best technique that came out yesterday. So there is some kind of iteration that will continue to happen to make sure that our commercially available tools match what we see in the academic space. >> So a second question is, now we've got the model, how do we declare the model? What is the state of the art in articulating metadata, what the model does, what its issues are? How are we doing a better job, and what can we do better, to characterize these models so they can be more applicable while at the same time maintaining the fidelity that was originally intended and embedded? >> I think the first step is identifying your use case. The extent to which we want to explain a model really is dependent on this use case. For instance, if you have a model that is going to be navigating a self-driving car, you probably want to have a lot more rigor around how that model is developed than with a model that targets mailers. There's a lot of middle ground there, and most of the business applications fall into that middle ground, but there are still business risks that need to be considered.
So to the extent that we can clearly articulate and define the use case for an AI application, that will help inform what level of explainability or interpretability we need out of our tool. >> So are you thinking in terms of what it means, how do we successfully define use cases? Do you have templates that you're using at PwC? Or other approaches to ensure that you get the rigor in the definition or the characterization of the model that then can be applied both to a lesser case, you know, who are you mailing, versus a life and death situation like, is the car behaving the way it's expected to? >> And yet the mailing, we have the example, the very famous Target example that outed a young teenage girl who was pregnant. So these can have real life implications. >> And they can, but that's a very rare instance, right? And you could also argue that that's not the same as missing a stop sign and potentially injuring someone in a car. So there are always going to be extremes, but usually when we think about use cases we think about criticality, which is the extent to which someone could be harmed. And vulnerability, which is the willingness for an end user to accept a model and the decision that it makes. A high vulnerability use case could be... Like a few years ago or a year ago I was talking to a professor at UCSD, the University of California San Diego, and he was talking to a medical devices company that manufactures devices for monitoring your blood sugar levels. So this could be a high vulnerability case. If you have an incorrect reading, someone's life could be in danger. This medical device was intended to read the blood sugar levels by noninvasive means, just by scanning your skin. But the metric that was used to calculate this blood sugar was correct, it just wasn't the one that end users were expecting. Because that didn't match, these end users did not accept this device, even though it did operate very well.
It didn't sell. And what this comes down to is this is a high vulnerability case. People want to make sure that their lives, the lives of their kids, whoever's using these devices, are in good hands, and if they feel like they can't trust it, they're not going to use it. So the use case I do believe is very important, and when we think about use cases, we think of them on those two metrics: vulnerability and criticality. >> Vulnerability and criticality. >> And we're always evolving our thinking on this, but this is our current thinking, yeah. >> Where are we, in terms of the way in which... From your perspective, the way in which corporations are viewing this, do you believe that they have the right amount of trepidation? Or are they too trepidatious when it comes to this? What is the mindset? Speaking in general terms. >> I think everybody's still trying to figure it out. What I've been seeing, personally, is businesses taking a step back and saying, "You know, we've been building all these proof of concepts, or deploying these pilots, but we haven't done anything enterprise-wide yet." Generally speaking. So what we're seeing are businesses coming back and saying, "Before we go any further, we need a comprehensive AI strategy. We need something central within our organization that tells us, that defines how we're going to move forward and build these future tools, so that we're not then moving backwards, and making sure everything aligns." So I think this is really the stage that businesses are in. Once they have a central AI strategy, I think it becomes much easier to evaluate regulatory risks or anything like that. Just because it all reports to a central entity. >> But I want to build on that notion, though. 'Cause generally we agree. We're doing a good job in the technology world of talking about how we're distributing processing power. We're doing a good job of describing how we're distributing data.
And we're even doing a good job of just describing how we're distributing known process. We're not doing a particularly good job of what we call systems of agency. How we're distributing agency. In other words, the degree to which a model is made responsible for acting on behalf of the brand. Now in some domains, medical devices, there is a very clear relationship between what the device says it's going to do, and who ultimately is decided to be, who's culpable. But in the software world, we use copyright law. And copyright law is a speech act. How do we ensure that this notion of agency, we're distributing agency appropriately so that when something is being done on behalf of the brand, that there is a lineage of culpability, a lineage of obligations associated with that? Where are we? >> I think right now we're still... And I can't speak for most organizations, just my personal experience. I think that the companies or the instances I've seen, we're still really early on in that. Because AI is different from traditional software, but it still needs to be audited. So we're at the stage where we're taking a step back and we're saying, "We know we need a mechanism "to monitor and audit our AI." We need controls around this. We need to accurately provide auditing and assurance around our AI applications. But we recognize it's different from traditional software. For a variety of reasons. AI is adaptive. It's not static like traditional software. >> It's probabilistic and not categorical. >> Exactly. So there are a lot of other externalities that need to be considered. And so this is something that a lot of businesses are thinking about. One of the reasons why having a central AI strategy is really important, is that you can also define a central controls framework, some type of centralized assurance and auditing process that's mandated from a high level of the organization that everybody will follow. And that's really the best way to get AI widely adopted. 
Because otherwise, I think we'll be seeing a lot of challenges. >> So I've got one more question. And one question I have is, if you look out in the next three years, as someone who is working with customers, working with academics, trying to match the need to the expertise, what is the next conversation that's going to pop to the top of the stack in this world, in, say, within the next two years? >> Yeah what we'll we be talking about next year or five years from now, too, at the next CDOIQ? >> I think this topic of explainability will persist. Because I don't think we will necessarily tick all the boxes in the next year. I think we'll uncover new challenges and we'll have to think about new ways to explain how models are operating. Other than that, I think customers will want to see more transparency in the process itself. So not just the model and how it's making its decisions, but what data is feeding into that. How are you using my data to impact how a model is making decisions on my behalf? What is feeding into my credit score? And what can I do to improve it? Those are the types of conversations I think we'll be having in the next two years, for sure. >> Great, well Ilana, thanks so much for coming on The Cube. It was great having you. >> Thank you for having me. >> I'm Rebecca Knight for Peter Burris. We will have more from MIT Chief Data Officer Symposium 2018 just after this. (upbeat electronic music)
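The black-box explainability problem Golbin describes can be made concrete with a small sketch. Permutation importance is one standard model-agnostic technique (an assumption for illustration here; she does not name a specific method): shuffle one input feature at a time and measure how much the model's accuracy drops, exposing which inputs the model actually relies on. The toy "black box," its weights, and the data below are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "black box": a fixed linear scorer standing in for an opaque trained
# model. Weights are invented: feature 0 matters most, feature 1 not at all.
weights = np.array([2.0, 0.0, -1.0])

def black_box_predict(X):
    return (X @ weights > 0).astype(int)

X = rng.normal(size=(500, 3))
y = black_box_predict(X)  # labels the model gets right by construction

def permutation_importance(predict, X, y, n_repeats=10):
    """Mean drop in accuracy when each feature is shuffled: a
    model-agnostic view of which inputs the model actually relies on."""
    base_acc = (predict(X) == y).mean()
    importances = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j only
            drops.append(base_acc - (predict(Xp) == y).mean())
        importances.append(float(np.mean(drops)))
    return importances

imp = permutation_importance(black_box_predict, X, y)
print([round(v, 3) for v in imp])  # feature 0 dominates, feature 1 is ~0
```

Shuffling the irrelevant feature costs nothing, while shuffling the heavily weighted one destroys accuracy: exactly the kind of evidence that auditing and regulation of black-box models would need.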
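Her two use-case metrics can likewise be sketched as a tiny triage function. The 0-to-1 scores, thresholds, and tier names below are illustrative assumptions, not PwC's actual framework; they only show how criticality and vulnerability might jointly set the level of explainability a use case demands.

```python
def explainability_tier(criticality, vulnerability):
    """Map 0-1 scores on the two interview axes, criticality (extent of
    possible harm) and vulnerability (trust end users must place in the
    model), to a required level of scrutiny. Thresholds are illustrative."""
    risk = max(criticality, vulnerability)
    if risk >= 0.8:
        return "full interpretability required"
    if risk >= 0.4:
        return "post-hoc explanations required"
    return "standard validation sufficient"

# Hypothetical scores for the use cases mentioned in the conversation.
use_cases = {
    "self-driving navigation": (0.95, 0.90),  # life-and-death criticality
    "glucose monitor":         (0.90, 0.95),  # users must trust the reading
    "targeted mailer":         (0.20, 0.10),  # low stakes
}
for name, (c, v) in use_cases.items():
    print(f"{name}: {explainability_tier(c, v)}")
```

Taking the maximum of the two axes mirrors her point that either a harmful failure mode or a skeptical user base alone is enough to demand more rigor.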

Published Date : Jul 19 2018

SUMMARY :

Ilana Golbin of PwC's artificial intelligence accelerator explains why increasingly powerful black-box models demand explainability: even developers do not fully know why some models decide as they do, which creates problems for auditing, regulation, and end-user trust. She describes evaluating AI use cases on two metrics, criticality (the extent to which someone could be harmed) and vulnerability (the willingness of end users to accept a model's decisions), and argues that businesses need a central AI strategy, with controls and auditing frameworks, before deploying enterprise-wide. Looking ahead, she expects explainability, and transparency about how personal data feeds model decisions, to dominate the conversation for the next two years.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Ilana | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Ilana Golbin | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
PWC | ORGANIZATION | 0.99+
Silicon Angle Media | ORGANIZATION | 0.99+
100% | QUANTITY | 0.99+
two | QUANTITY | 0.99+
UCSC | ORGANIZATION | 0.99+
Los Angeles | LOCATION | 0.99+
one question | QUANTITY | 0.99+
first step | QUANTITY | 0.99+
next year | DATE | 0.99+
Cambridge, Massachusetts | LOCATION | 0.99+
second question | QUANTITY | 0.99+
yesterday | DATE | 0.99+
one more question | QUANTITY | 0.99+
one | QUANTITY | 0.99+
two metrics | QUANTITY | 0.99+
a year ago | DATE | 0.99+
One | QUANTITY | 0.98+
both | QUANTITY | 0.98+
The Cube | ORGANIZATION | 0.97+
MIT | ORGANIZATION | 0.93+
MIT Chief Data Officer and Information Quality Symposium | EVENT | 0.93+
few years ago | DATE | 0.93+
next two years | DATE | 0.92+
today | DATE | 0.92+
Target | ORGANIZATION | 0.91+
MIT CDOIQ | ORGANIZATION | 0.91+
interesting things | QUANTITY | 0.88+
PwC | ORGANIZATION | 0.87+
University of California San Diego | ORGANIZATION | 0.85+
next three years | DATE | 0.81+
MIT Chief Data Officer Symposium 2018 | EVENT | 0.79+
12th annual | QUANTITY | 0.75+
MIT CDOIQ 2018 | EVENT | 0.74+
five | DATE | 0.69+
years | QUANTITY | 0.63+
Cube | ORGANIZATION | 0.59+
CDOIQ | ORGANIZATION | 0.45+

Bhavani Thurasingham, UT Dallas | WiDS 2018


 

>> Announcer: Live, from Stanford University in Palo Alto, California, it's theCUBE covering Women in Data Science Conference 2018, brought to you by Stanford. (light techno music) >> Welcome back to theCUBE's continuing coverage of the Women in Data Science event, WiDS 2018. We are live at Stanford University. You can hear some great buzz around us. A lot of these exciting ladies in data science are here around us. I'm pleased to be joined by my next guest, Bhavani Thuraisingham, who is one of the speakers this afternoon, as well as a distinguished professor of computer science and the executive director of the Cyber Security Institute at the University of Texas at Dallas. Bhavani, thank you so much for joining us. >> Thank you very much for having me in your program. >> You have an incredible career, but before we get into that I'd love to understand your thoughts on WiDS. In its third year alone, they're expecting to reach over 100,000 people today, both here at Stanford, as well as more than 150 regional events in over 50 countries. When you were early in your career you didn't have a mentor. What does an event like WiDS mean to you? What are some of the things that excite you about giving your time to this exciting event? >> This is such an amazing event and just in three years it has just grown, and I'm just so motivated myself and it's just, words cannot express it, to see so many women working in data science or wanting to work in data science, and not just in the U.S. and at Stanford, it's around the world. I was reading some information about WiDS and I'm finding that there are WiDS ambassadors in Africa, South America, Asia, Australia, Europe, of course the U.S., Central America, all over the world. And data science is exploding so rapidly because data is everywhere, right? And so you really need to collect the data, store the data, analyze the data, disseminate the data, and for that you need data scientists.
And what I'm so encouraged by is that when I started getting into this field back in 1985, and that was 32 plus years ago in the fall, I worked 50% in cyber security, what used to be called computer security, and 50% in data science, what used to be called data management at the time. And there were so few women and we did not have, as I said, women role models, and so I had to sort of work really hard, in the commercial industry and then at the MITRE Corporation and the U.S. Government, but slowly I started building a network and my strongest supporters have been women. And so that was sort of in the early 90's when I really got started building this network, and today I have a strong support group of women and we support each other and we also mentor so many of the junior women so that, you know, they don't have to learn the hard way like I did, and so I'm very encouraged to see the enthusiasm, the motivation, both on the part of the mentors as well as the mentees, so that's very encouraging but we really have to do so much more. >> We do, you're right. It's really kind of the tip of the iceberg, but I think the scale at which WiDS has grown so quickly shines a massive spotlight on the fact that there's clearly such a demand for it. I'd love to get a feel now for the female undergrads in the courses that you teach at UT Dallas. What are some of the things that you are seeing in terms of their beliefs in themselves, their interests in data science, computer science, cyber security? Tell me about that dynamic. >> Right, so I have been teaching for 13 plus years full-time now, after a career in industry, a federal research lab, and government, and I find that we have women, but still not enough. But just over the last 13 years I'm seeing so many more women getting so involved and wanting to further their careers, coming and talking to me.
When I first joined in the fall of 2004, there weren't many women, but now we have programs like WiDS, and I also belong to another conference, which I actually chaired in 2016, called WiCyS, Women in Cyber Security. So, through these programs, we've been able to recruit more women, but I would still have to say that most of the women, especially in our graduate programs, are from South Asia and East Asia. We hardly find women from the U.S., right, U.S. born women, pursuing careers in areas like cyber security and to some extent I would also say data science. And so we really need to do a lot more with events like WiDS and WiCyS, and we've also started a Grace Lecture Series. >> Grace Hopper. >> We call it the Grace Lecture at our university. Of course there's Grace Hopper, we go to Grace Hopper as well. So through these events I think that, you know, women are getting more encouraged and taking leadership roles, so that's very encouraging. But I still think that we are really behind, right, when you compare men and women.
Sometimes it's more in cyber, sometimes more in data. So my work has been integrating the two areas, okay? So my talk, first I'm going to wear my data science hat, and as a data scientist I'm developing data science techniques, which is integration of statistical reasoning, machine learning, and data management. So applying data science techniques for cyber security applications. What are these applications? Intrusion detection, insider threat detection, email spam filtering, website fingerprinting, malware analysis, so that's going to be my first part of the talk, a couple of charts. But then I'm going to wear my cyber security hat. What does that mean? These data science techniques could be hacked. That's happening now, there are some attacks that have been published where the data science, the models are being thwarted by the attackers. So you can do all the wonderful data science in the world but if your models are thwarted and they go and do something completely different, it's going to be of no use. So I'm going to wear my cyber security hat and I'm going to talk about how we are taking the attackers into consideration in designing our data science models. It's not easy, it's extremely challenging. We are getting some encouraging results but it doesn't mean that we have solved the problem. Maybe we will never solve the problem but we want to get close to it. So this area called Adversarial Machine Learning, it started probably around five years ago, in fact our team has been doing some really good work for the Army, Army research office, on Adversarial Machine Learning. And when we started, I believe it was in 2012, almost six years ago, there weren't many people doing this work, but now, there are more and more. So practically every cyber security conference has got tracks in data science machine learning. And so their point of view, I mean, their focus is not, sort of, designing machine learning techniques. That's the area of data scientists. 
Their focus is going to be coming up with appropriate models that are going to take the attackers into consideration. Because remember, attackers are always trying to thwart your learning process. >> Right, we were just at Fortinet Accelerate last week, theCUBE was, and cyber security and data science are such interesting and pervasive topics, right, cyber security things when Equifax happened, right, it suddenly translates to everyone, male, female, et cetera. And the same thing with data science in terms of the social impact. I'd love your thoughts on how cyber security and data science, how you can educate the next generation and maybe even reinvigorate the women that are currently in STEM fields to go look at how much more open and many more opportunities there are for women to make massive impact socially. >> There are, I would say at this time, unlimited opportunities in both areas. Now, in data science it's really exploding because every company wants to do data science because data gives them the edge. But what's the point in having raw data when you cannot analyze? That's why data science is just exploding. And in fact, most of our graduate students, especially international students, want to focus in data science. So that's one thing. Cyber security is also exploding because every technology that is being developed, anything that has a microprocessor could be hacked. So, we can do all the great data science in the world but an attacker can thwart everything, right? And so cyber security is really crucial because you have to try and stop the attacker, or at least detect what the attacker is doing. So every step that you move forward you're going to be attacked. That doesn't mean you want to give up technology. One could say, okay, let's just forget about Facebook, and Google, and Amazon, and the whole lot and let's just focus on cyber security but we cannot. I mean we have to make progress in technology. 
Whenever we make progress in technology, driver-less cars or pacemakers, these technologies could be attacked. And with cyber security there is such a shortage within the U.S. Government. And so we have substantial funding from the National Science Foundation to educate U.S. citizen students in cyber security. And especially to recruit more women into cyber security. So that's why we're also focusing there, we are a permanent co-chair for the women in cyber security event. >> What are some of the things along that front, and I love that, that you think are key to successfully recruiting U.S. females into cyber security? What do you think speaks to them? >> So, I think what speaks to them, and we have been successful in recent years, this program started in 2010 for us, so it's about eight years. The first phase we did not have women, so 2000 to 2014, because we were trying to get this education program going, giving out the scholarships. Then we got our second round of funding, but our program director said, look, you guys have done a phenomenal job in having students, educating them, and placing them with the U.S. Government, but you have not recruited female students. So what we did then was to get some of our senior lecturers, a superb lady called Dr. Janelle Stratch, she can really speak to these women, and we started the Grace Lecture. And along with those events, we started the women in cyber security center as part of my cyber security institute. Through these events we were able to recruit more women. Women are still under-represented in our cyber security program, but still, instead of zero women, I believe now we have about five women, and by the time we will have finished the second phase we will have graduated about 50 plus students in total, 52 to 55 students, out of which I would say about eight would be female. So to go from zero to eight is a good thing, but it's not great. >> We want to keep going, keep growing that.
>> We want, out of 50, to get at least 25. But at least it's a start for us. But in data science we don't have as much of a problem because we have lots of international students, remember you don't need U.S. citizenship to get jobs at Facebook, but you need U.S. citizenship to get jobs at the NSA or CIA. So we get many international students and we have more women, and I would say we have, I don't have the exact numbers, but in my classes I would say about 30%, maybe just under 30%, female, which is encouraging but still it's not good. >> 30% now, right, you're right, it's encouraging. What was that 13 years ago when you started? >> When I started, before data science and everything, it was more men, very few women. I would say maybe about 10%. >> So even getting to 30% now is a pretty big accomplishment. >> Exactly, in data science, but we need to get our cyber security numbers up. >> So last question for you as we have about a minute left, what are some of the things that excite you about having the opportunity to not just mentor your students, but to reach such a massive audience as you're going to be able to reach through WiDS? >> I, it's as I said, words cannot express my honor and how pleased and touched, these are the words, touched I am to be able to talk to so many women, and I want to say why. Because I'm a Tamil of Sri Lankan origin, and so I had to make a journey. I got married, and I'm going to talk about this, at 20, in 1975. I was just finishing my undergraduate in mathematics and physics, and my husband was finishing his Ph.D. at the University of Cambridge, England, and so soon after marriage, at 20, I moved to England, did my master's and Ph.D., so I joined the University of Bristol, and then we came here in 1980, and my husband got a position at the New Mexico Petroleum Recovery Center, and so New Mexico Tech offered me a tenure-track position, but my son was a baby and so I turned it down.
Once you do that, it's sort of hard to get back in, so I took visiting faculty positions for three years in New Mexico, then in Minneapolis, and then I was a senior software developer at Control Data Corporation, which was one of the big companies at the time. Then I had a lucky break in 1985. I wanted to get back into research, because I liked development but I wanted to get back into research. In '85 I was becoming, in the fall, a U.S. citizen. Honeywell got a research contract from the United States Air Force to design and develop one of the early secure database systems, and Honeywell had to interview me and they had to like me, hire me. All three things came together. That was a lucky break, and since then, with my career, I've just been so thankful, so grateful. >> And you've turned that lucky break, with a lot of hard work, into what you're doing now. We thank you so much for stopping by. >> Thank you so much for having me, yes. >> And sharing your story, and we're excited to hear some of the things you're going to speak about later on. So have a wonderful rest of the conference. >> Thank you very much. >> We want to thank you for watching theCUBE. Again, we are live at Stanford University at the third annual Women in Data Science Conference, #WiDS2018, I am Lisa Martin. After this short break I'll be back with my next guest. Stick around. (light techno music)
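The adversarial machine learning Thuraisingham describes, attackers trying to thwart the learning process, can be sketched in a few lines. Below is an FGSM-style evasion attack (one standard technique, used here as an illustration rather than her group's actual method) against a toy linear classifier; the weights and inputs are invented for the example.

```python
import numpy as np

# Toy linear classifier standing in for a trained detector, e.g. a spam
# or intrusion model. The weights are invented for illustration.
w = np.array([1.5, -2.0])
b = 0.1

def predict(x):
    """Classify x as class 1 when the linear score is positive."""
    return int(w @ x + b > 0)

x = np.array([1.0, -0.5])  # a point the model classifies correctly as 1

# FGSM-style evasion: perturb the input against the sign of the score's
# gradient with respect to x (for a linear model, that gradient is just w).
eps = 1.5
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))  # the perturbed point flips to class 0
```

Good data science alone does not stop this; as she notes, the defense is to take the attacker into account in the model design itself, for example by training on such perturbed inputs.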

Published Date : Mar 5 2018

SUMMARY :

Bhavani Thuraisingham of UT Dallas reflects on a 32-year career split between cyber security and data science, from Honeywell, MITRE, and the U.S. Government to directing the Cyber Security Institute, and on how events like WiDS, WiCyS, and her university's Grace Lecture series help recruit and mentor women in both fields. Her talk applies data science techniques, statistical reasoning, machine learning, and data management, to cyber security problems such as intrusion detection, insider threat detection, and malware analysis, then turns to adversarial machine learning: designing models that account for attackers who try to thwart the learning process. She notes progress, about 30% of her data science classes are now female, but argues far more must be done, especially in recruiting U.S. women into cyber security.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
HoneywellORGANIZATION

0.99+

National Science FoundationORGANIZATION

0.99+

1980DATE

0.99+

BhavaniPERSON

0.99+

2010DATE

0.99+

New MexicoLOCATION

0.99+

1975DATE

0.99+

Lisa MartinPERSON

0.99+

MinneapolisLOCATION

0.99+

Control Data CorporationORGANIZATION

0.99+

NSAORGANIZATION

0.99+

AmazonORGANIZATION

0.99+

2012DATE

0.99+

Janelle StratchPERSON

0.99+

1985DATE

0.99+

EnglandLOCATION

0.99+

AustraliaLOCATION

0.99+

MITRE CorporationORGANIZATION

0.99+

New ZealandLOCATION

0.99+

AfricaLOCATION

0.99+

FacebookORGANIZATION

0.99+

United States Air ForceORGANIZATION

0.99+

2016DATE

0.99+

GoogleORGANIZATION

0.99+

EuropeLOCATION

0.99+

AsiaLOCATION

0.99+

52QUANTITY

0.99+

fiveQUANTITY

0.99+

three yearsQUANTITY

0.99+

NigeriaLOCATION

0.99+

2014DATE

0.99+

CIAORGANIZATION

0.99+

U.S.LOCATION

0.99+

13 plus yearsQUANTITY

0.99+

IndiaLOCATION

0.99+

second roundQUANTITY

0.99+

Grace HopperPERSON

0.99+

Central AmericaLOCATION

0.99+

South AsiaLOCATION

0.99+

30%QUANTITY

0.99+

50%QUANTITY

0.99+

Cyber Security InstituteORGANIZATION

0.99+

U.S. GovernmentORGANIZATION

0.99+

eightQUANTITY

0.99+

East AsiaLOCATION

0.99+

first phaseQUANTITY

0.99+

Bhavani ThuraisinghamPERSON

0.99+

South AmericaLOCATION

0.99+

DallasLOCATION

0.99+

last weekDATE

0.99+

University of BristolORGANIZATION

0.99+

third yearQUANTITY

0.99+

Palo Alto, CaliforniaLOCATION

0.99+

zeroQUANTITY

0.99+

first partQUANTITY

0.99+

2004 fallDATE

0.99+

StanfordLOCATION

0.99+

New Mexico TechORGANIZATION

0.98+

WiDSEVENT

0.98+

over 100,000 peopleQUANTITY

0.98+

EquifaxORGANIZATION

0.98+

oneQUANTITY

0.98+

more than 150 regional eventsQUANTITY

0.98+

second phaseQUANTITY

0.98+

over 50 countriesQUANTITY

0.98+

UT DallasORGANIZATION

0.98+

two areasQUANTITY

0.98+

2000DATE

0.98+

one thingQUANTITY

0.98+

early 90'sDATE

0.98+

both areasQUANTITY

0.98+

bothQUANTITY

0.98+

Stanford UniversityORGANIZATION

0.98+

Women in Data ScienceEVENT

0.98+

55 studentsQUANTITY

0.98+

todayDATE

0.98+

firstQUANTITY

0.98+

WiDS 2018EVENT

0.98+

'85DATE

0.98+

theCUBEORGANIZATION

0.98+

Sharad Singhal, The Machine & Michael Woodacre, HPE | HPE Discover Madrid 2017


 

>> Man: Live from Madrid, Spain, it's the Cube! Covering HPE Discover Madrid, 2017. Brought to you by Hewlett Packard Enterprise. >> Welcome back to Madrid, everybody, this is The Cube, the leader in live tech coverage. My name is Dave Vellante, I'm here with my co-host, Peter Burris, and this is our second day of coverage of HPE's Madrid conference, HPE Discover. Sharad Singhal is back, Director of Machine Software and Applications, HPE Corp. and Labs. >> Good to be back. >> And Mike Woodacre is here, a distinguished engineer from Mission Critical Solutions at Hewlett Packard Enterprise. Gentlemen, welcome to the Cube, welcome back. Good to see you, Mike. >> Good to be here. >> Superdome Flex is all the rage here! (laughs) At this show. You guys are happy about that? You were explaining off-camera that this is the first jointly-engineered product from SGI and HPE, so you hit a milestone. >> Yeah, and I came into Hewlett Packard Enterprise just over a year ago with the SGI acquisition. We were already working on our next-generation in-memory computing platform. We basically hit the ground running, integrated the engineering teams immediately when we closed the acquisition so we could drive through the finish line, and with the product announcement just recently, we're really excited to get that out into the market. It really represents the leading in-memory computing system in the industry. >> Sharad, high performance computing has always been big data, needing big memories, lots of performance... How has, or has, the acquisition of SGI shaped your agenda in any way, or your thinking, or advanced some of the innovations that you guys are coming up with? >> Actually, it was truly like a meeting of the minds when these guys came into HPE. We had been talking about memory-driven computing, the machine prototype, for the last two years. Some of us were aware of it, but a lot of us were not aware of it. These guys had been working essentially in parallel on similar concepts.
Some of the work we had done, we were thinking in terms of our road maps, and they were looking at the same things. Their road maps were looking incredibly similar to what we were talking about. As the engineering teams came together, we brought both the Superdome X technology and the UV300 technology together into this new product that Mike can talk a lot more about. From my side, I was talking about the machine and the machine research project. When I first met Mike and started talking to him about what they were doing, my immediate reaction was, "Oh wow, wait a minute, this is exactly what I need!" I was talking about something where I could take the machine concepts and deliver products to customers in the 2020 time frame. With the help of Mike and his team, we are now able to do something where we can take the benefits we are describing in the machine program and make those ideas available to customers right now. I think to me that was the fun part of this journey here. >> So what are the key problems that your team is attacking with this new offering? >> The primary use case for the Superdome Flex is really high-performance in-memory database applications; typically SAP HANA is sort of the industry-leading solution in that space right now. One of the key things with the Superdome Flex, you know, Flex is the operative word, it's the flexibility. You can start with a small building block, a four-socket, three-terabyte building block, and then you just connect these boxes together. The memory footprint just grows linearly. The latency across our fabric stays constant as you add these modules together. We can deliver up to 32 processors and 48 terabytes of in-memory data in a single rack. So it's really the flexibility, sort of a pay-as-you-grow model. As their needs grow, they don't have to throw out the infrastructure. They can add to it.
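The pay-as-you-grow arithmetic Mike describes can be put in a few lines. This is only an illustration of the scaling described above, assuming the four-socket chassis as the building block and treating per-chassis memory as a parameter (the three-terabyte figure is the base configuration mentioned; actual capacity depends on how the chassis is populated):

```python
# Illustration of the Superdome Flex "pay as you grow" model described
# above: 4-socket building blocks connected together, with the memory
# footprint growing linearly. Per-chassis memory is a parameter because
# actual capacity depends on the chassis configuration.

def flex_config(chassis: int, tb_per_chassis: float = 3.0) -> dict:
    """Sockets and memory footprint for a number of 4-socket blocks."""
    if not 1 <= chassis <= 8:
        raise ValueError("assume a single rack holds 1 to 8 chassis")
    return {"sockets": 4 * chassis, "memory_tb": tb_per_chassis * chassis}

# Start small and add chassis as the database grows; a fully
# populated rack reaches 32 sockets and 48 TB (6 TB per chassis).
for n in (1, 2, 4, 8):
    print(flex_config(n, tb_per_chassis=6.0))
```

Nothing here is HPE tooling; it is only the arithmetic behind "start with a four-socket block, grow to 32 sockets and 48 terabytes in a rack."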
>> So when you take a look ultimately at the combination, we talked a little bit about some of the new types of problems that can be addressed, but let's bring it down to something practical for the average enterprise. What can the enterprise do today, as a consequence of this machine, that they couldn't do just a few weeks ago? >> So it sort of builds on the modularity, as Mike explained. If you ask a CEO today, "what's my database requirement going to be in two or three years?" they're like, "I hope my business is successful, I hope I'm gonna grow my needs," but I really don't know where that's going to grow. So there's the flexibility to just add modules and scale up the capacity of memory. The whole concept of in-memory databases is basically bringing your online transaction processing and your data-analytics processing together, so you can do this in real time, and instead of your data going to a data warehouse and looking at how the business was operating days or weeks or months ago, I can see how it's acting right now, with the latest updates of transactions. >> So this is important. You mentioned two different things. Number one is you mentioned you can envision- or three things. You can start using modern technology immediately on an extremely modern platform. Number two, you can grow this and scale this as needs follow, because HANA in memory is not gonna have the same scaling limitations that, you know, Oracle on a bunch of spinning disks had. >> Mike: Exactly. >> So, you still have the flexibility to learn, and then very importantly, you can start adding new functions, including automation, because now you can put the analytics and the transaction processing together, close that loop, so you can bring transactions, analytics, boom, into a piece of automation, and scale that in unprecedented ways. That's kind of three things that the business can now think about. Have I got that right? >> Yeah, that's exactly right.
It lets people really understand how their business is operating in real time, look for trends, look for new signatures in how the business is operating. They can build on their success, and having this sort of technology gives them a competitive advantage over their competitors, so they can out-compute or out-compete and get ahead of the competition. >> But it also presumably leads to new kinds of efficiencies, because you can converge, that converge word that we've heard so much. You can not just converge the hardware and converge the system software management, but you can now increasingly converge tasks, bring those tasks into the system, but also, at a business level, down onto the same platform. >> Exactly, and moving in memory is really about bringing real time to the problem; instead of batch-mode processing, you bring in the real-time aspect. Humans, we're interactive, we like to ask a question, get an answer, get on to the next question in real time. When processes move from batch mode to real time, you just get a step change in the innovation that can occur. We think with this foundation, we're really enabling the industry to step forward. >> So let's create a practical example here. Let's apply this platform to a sizeable system that's looking at customer behavior patterns. Then let's imagine how we can take the e-commerce system that's actually handling order, bill, fulfillment and all those other things, and bring those two things together, not just in a way that might work if we have someone online for five minutes, but right now. Is that kind of one of those examples that we're looking at? >> Absolutely. You basically have a history of the customers you're working with. In retail, when you go in a store, the store will know your history of transactions with them. They can decide if they want to offer you real-time discounts on particular items.
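The convergence being described, transactions and analytics against the same live in-memory data, can be shown with a toy example. The sketch below uses Python's built-in sqlite3 in `:memory:` mode purely as a stand-in for an in-memory database; it is not SAP HANA, just the shape of the idea: writes land, and the aggregate query immediately sees them, with no warehouse hop in between.

```python
import sqlite3

# Toy OLTP + analytics on one in-memory store: the analytic query
# always sees the latest transactions, with no batch ETL in between.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (store TEXT, item TEXT, qty INTEGER)")

# Transaction side: orders arrive one by one.
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("madrid", "ice cream", 3),
    ("madrid", "mittens", 1),
    ("boston", "ice cream", 5),
])
db.commit()

# Analytics side: aggregate over the live data to drive a decision
# (say, what to restock) right now, not days later.
hot_items = db.execute(
    "SELECT item, SUM(qty) FROM sales GROUP BY item ORDER BY 2 DESC"
).fetchall()
print(hot_items)  # [('ice cream', 8), ('mittens', 1)]
```

The point is only that both workloads hit the same store, so the analysis is always current; scale and durability are exactly what the real platforms add.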
They'll also be taking in other data, weather conditions, to drive their business. Suddenly there's going to be a heat wave, I want more ice cream in the store, or it's gonna be freezing next week, I'm gonna order in more coats and mittens for everyone to buy. So taking in lots of transactional data, not just the actual business transaction but environmental data, you can accelerate your ability to provide consumers with the things they will need. >> Okay, so I remember when you guys launched Apollo. Antonio Neri was running the server division, he might have had networking too. He did a little reveal on the floor. Antonio's actually in the house, over there. >> Mike: (laughs) Next door. >> There was an astronaut at the reveal. We covered it on the Cube. He's always been very focused on this part of the business, on high-performance computing, and obviously the machine has been a huge project. How has the leadership been? We had a lot of skeptics early on that said you were crazy. What was the conversation like with Meg and Antonio? Were they continuously supportive, were they sometimes skeptical too? What was that like? >> So if you think about the total amount of effort we've put into the machine program, truly speaking, that kind of effort would not be possible if the senior leadership was not behind us inside this company. Right? A lot of us in HP Labs were working on it. It was not just a labs project, it was a project where our business partners were working on it. We brought together engineering teams from the business groups who understood how products were put together. We had software people working with us who were working inside the business, we had researchers from labs working, we had supply-chain partners working with us inside this project. A project of this scale and scope does not succeed if it's a handful of researchers doing this work. We had enormous support from the business side and from our leadership team.
I give enormous thanks to our leadership team for allowing us to do this, because it's an industry thing, not just an HP Enterprise thing. At the same time, with this kind of investment, there's clearly an expectation that we will make it real. It's taken us three years to go from "here is a vague idea from a group of crazy people in labs" to something which actually works and is real. Frankly, the conversation in the last six months has been, "okay, so how do we actually take it to customers?" That's where the partnership with Mike and his team has become so valuable. At this point in time, we have a shared vision of where we need to take the thing. We have something where we can on-board customers right now. We have something where, frankly, even I'm working on the examples we were talking about earlier today. Not everybody can afford a giant 16-socket machine. The Superdome Flex allows my customer, or anybody who is playing with an application, to start small, with something that is reasonably affordable, and try that application out. If that application is working, they have the ability to scale up. This is what makes the Superdome Flex such a nice environment to work in for the types of applications I'm worrying about. When we had started this program, people would ask us, "when will the machine product be available?" From day one, we said, "the machine product will be something that might become available to you in some form or another by the end of the decade." Well, suddenly, with Mike, I think I can make it happen right now. It's not quite the end of the decade yet, right? So I think that's what excited me about this partnership we have with the Superdome Flex team. The fact is that they had the same vision and the same aspirations that we do. It's a platform that allows my current customers, with their current applications, like Mike described within the context of, say, SAP HANA, a scalable platform they can operate now.
It's also something that allows them to evolve towards the future and start putting in new applications that they haven't even thought about today. Those were the kinds of applications we were talking about. It makes it possible for them to move into this journey today. >> So what is the availability of Superdome Flex? Can I buy it today? >> Mike: You can buy it today. Actually, I had the pleasure of installing the first early-access system in the UK last week. We've been delivering large-memory platforms to Stephen Hawking's team at Cambridge University for the last twenty years, because they really like the in-memory capability to allow them, as they say, to be scientists, not computer scientists, in working through their algorithms and data. Yeah, it's ready for sale today. >> What's going on with Hawking's team? I don't know if this is fake news or not, but I saw something come across that said he says the world's gonna blow up in 600 years. (laughter) I was like, uh-oh, what's Hawking got going now? (laughs) That's gotta be fun, working with those guys. >> Yeah, I know, it's been fun working with that team. Actually, what I would say, following up on Sharad's comment, it's been really fun this last year, because I'd sort of been following the machine from the outside when the announcements were made a couple of years ago. Immediately when the acquisition closed, I was like, "tell me about the software you've been developing, tell me about the photonics and all these technologies," because boy, I can now accelerate where I want to go with the technology we've been developing. Superdome Flex is really the first step on the path. It's a better product than either company could have delivered on its own. Now over time, we can integrate other learnings and technologies from the machine research program. It's a really exciting time. >> Excellent. Gentlemen, I always loved the SGI acquisition. Thought it made a lot of sense.
Great brand, kind of put SGI back on the map in a lot of ways. Gentlemen, thanks very much for coming on the Cube. >> Thank you again. >> We appreciate you. >> Mike: Thank you. >> Thanks for coming on. Alright everybody, we'll be back with our next guest right after this short break. This is the Cube, live from HPE Discover Madrid. Be right back. (energetic synth)

Published Date : Nov 29 2017


Orran Krieger - OpenStack Summit 2017 - #OpenStackSummit #theCUBE


 

>> Announcer: Live, from Boston, Massachusetts. It's theCUBE. Covering OpenStack Summit 2017. Brought to you by the OpenStack Foundation, Red Hat, and additional ecosystem support. >> Welcome back. I'm Stu Miniman, joined by my cohost this week, John Troyer. Hi and welcome to the program, a first-time guest, professor at Boston University and lead of the Massachusetts Open Cloud, Orran Krieger. Thanks so much for joining us. >> Ah, my pleasure, thank you. >> Alright, so, we're here in Boston, the center of culture, the revolution, a lot of universities. Tell us a little about you, just quickly about yourself, your role at BU, and then we'll get into the MOC stuff in a little bit too. >> Sure, I mean, I came back from industry, after 15 years in industry, to this incredible opportunity we had to create this entity. I mean, there's no other place like this; if you take the universities in this city, it's equivalent to all the universities on the Pacific West Coast. Right, the concentration of high tech is unbelievable here. >> I want to remind you, my wife was actually involved when Partners Healthcare first got launched here in Boston; it was an early technology collaboration here in Boston. Sounds similar to what you're doing with some of the universities in cloud. Maybe we can talk about that; you came from the vendor side. Just real quick, your background, you worked at a company that John and I know quite well. Maybe just give a quick background? >> Sure. I left academia, I don't know how many years ago. Ended up going to IBM Research, and was there for about 10 years. And then I joined this little start-up called VMware, and worked as sort of one of the lead architects for vCloud Director and the whole vCloud initiative. >> Alright, great. Let's speak today, you also have, you're the lead of the Massachusetts Open Cloud. We actually had a couple of guests on from Red Hat that talked a little bit about it.
But tell us about the project, the scope of it, how many people are involved, how many users you reach with this. >> Sure. The future is in the cloud. I mean, you look at sort of the fact that users can use what they need, when they need it. Producers can get massive economies of scale. You know, the future of computing is in the cloud. And when I was on the industry side, what really concerned me about what was going on is that these clouds were really closed. You couldn't see what was going on inside them. Innovation was sort of gated by the single provider that operated and controlled each of these clouds. So, the question that I was struggling with back then is, how can we create a cloud that's open? That multiple technology companies can participate in. And certainly when I came back to academia, a cloud where I could do innovation in. Where not just me, but many, many different researchers could. You look at how much research has fundamentally impacted our field. It's dramatic. Even in just sort of the very area we're talking about. From what Mendel and team did with VMware, and then Xen coming out of Cambridge. I mean, Ceph too, it's just technology after technology that has come out of academia. But now clouds are these closed boxes you can't get into. So we had this incredible opportunity. There'd be this data center, the Massachusetts Green High Performance Computing Data Center, MGHPCC. 15 megawatts. That's more than half the size of one of Google's 16 data centers. That had been built, right next to a hydro dam, at one third the power cost of what it is in Boston, by five big institutions: MIT, Harvard, BU, Northeastern, UMass. And we thought, wow, couldn't we create a cloud there? Couldn't we create a cloud with some 157,000 potential students as well as the broader ecosystem? So we started discussing that idea. All the universities kind of signed up behind it. The model of the cloud is not to create another single-provider cloud. It's not going to be my cloud.
The idea is to have many vendors participate. Stand up different services, and create an open cloud where there's not just multiple tenants but there's also multiple landlords in the cloud. >> Great. Could you talk to us a little about how some of those pieces get chosen? How does OpenStack fit into it? And if you can talk about some of the underlying pieces, it'd be good to understand how you sort that out too. >> Sure. So in doing that, it's actually been sort of this cool thing, you know, you have to kind of build different levels simultaneously. When we started the project, you know, our first thought was, oh, you know, we'll be able to just stand up a cloud. It wasn't that easy. OpenStack actually has a complicated learning curve to get up. Now it's matured tremendously. We've been in production for about ten months, with no significant failures. I'm almost thinking that we need to kind of bring it down for a couple hours, just so the people start realizing this is not intended to be a place where you run it like you would a production data center facility. We don't guarantee it as such, 'cause people are starting to assume we do. (laughing) But, we started off and we sort of solved OpenStack, got it up and running. Took us a while to get it to the production layer. Started hosting courses, and users, and stuff like that. And we've paired that with sort of two other tracks. One is, I'm developing some of the base technologies to enable a cloud to be multi-vendor. So mix-and-match federation is sort of the core of that. Which is this new capability that we've built, after like five iterations on the right way to do this, to allow multiple different clouds with their own Keystone, to mix different administrators, say from MIT or Harvard, or from companies that might want to participate and set up a service. So, to have a capability of federating between those things. Allowing you, for example, to use storage from one and compute from another.
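The "storage from one, compute from another" idea can be sketched as catalog composition. The toy below is not the MOC's actual mix-and-match federation code; the provider names and endpoint URLs are made up, and the real mechanism (per-landlord Keystone instances exchanging scoped tokens) is elided. It only shows the end result the federation is after: one virtual cloud assembled from services run by different landlords.

```python
# Hypothetical service catalogs for two independently administered
# clouds (the "landlords"). URLs and names are illustrative only.
CATALOGS = {
    "mit":     {"compute": "https://mit.example/nova",
                "storage": "https://mit.example/swift"},
    "harvard": {"compute": "https://harvard.example/nova",
                "storage": "https://harvard.example/swift"},
}

def mix_and_match(choices: dict) -> dict:
    """Compose one catalog, taking each service type from the
    provider the user picked for it."""
    return {svc: CATALOGS[provider][svc]
            for svc, provider in choices.items()}

# Compute from one landlord, storage from another.
catalog = mix_and_match({"compute": "mit", "storage": "harvard"})
print(catalog)
```

The hard part the MOC work solves, authenticating across administrative domains so those endpoints actually trust each other, is exactly what this sketch leaves out.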
We started off with OpenStack because OpenStack already had the right architecture. It was designed as a series of different services, each one of which could be scaled independently, each one of which had its own well-defined API. And it seemed natural, jeez, we should be able to compose them together. Have, you know, one party stand up Nova compute. Another one stand up Swift storage. Another one stand up Cinder storage. Turned out not to be that easy. There were assumptions that all these services were stood up by the same administrative entity. After three iterations of trying to figure out with the community how to make it work, we finally have a capability of doing that now, which we're putting into production in the MOC itself. >> You talked about the different projects inside OpenStack; that's been one of the discussions here this week at the Summit. Different projects, the core, which are important, and also the whole ecosystem of other cloud-native and open source projects that have grown up around OpenStack over the last six or seven years. Any commentary on which kind of projects you're finding are the most useful, and what you see as the core of OpenStack going forward? And also, which projects from other ecosystems do you think are natural fits for working on an OpenStack-based platform? >> Sure. So in our environment, we have all the core services you'd think of, obviously Nova and Cinder and Swift. We're using Ceph in most of our environments. Sahara, Heat. We've actually expanded beyond in a couple of different dimensions. I guess one thing is we've been using Ceph extensively; that's been very valuable for us. And we've also been modifying it, actually, substantially. It's actually kind of exciting 'cause we have graduate students that are making changes that are now going upstream in the Ceph community as a result of their experiences in doing things within our environment. But there's other projects that sort of tie in, at sort of two different levels.
One is, we're working very closely with Red Hat today around OpenShift, and we're making the first deployment of that available in the very near future. And the other thing that is very important for our environment, we have I think three different talks related to this, is to have data sets in the cloud. To have data sets shared between communities of people. Data sets that are discoverable. Data sets that are citable. So we've been working very closely with Harvard and the open source Dataverse community, and together we've created the Cloud Dataverse, which is now actually in the MOC. So researchers from all these institutions can actually publish their data sets, as well as researchers from around the world. So there's over 15,000 data sets today in the Harvard Dataverse, for example. >> Curious if you can give us any commentary on how open source fits into education these days? Talk about the pipeline and the next generation of workers. Do your students get, you talked about upstream contributions, how do they get involved? How early are they getting involved? >> Well, actually, that's sort of a bit of a passion of mine. So, multiple different levels, I guess. One of them, I think this is a great way for a student to get exposed to a broad community of people to interact with. Rather than going in to serve one company and getting locked down doing one thing, I think it's just enormously valuable. There's sort of two different dimensions, I guess, educationally and from a research perspective. And both of them are very tied to open source.
So from an education perspective, we have a course, for example. One of my frustrations, having come back from industry, was that students had learned how to program, often as individuals, but they really didn't learn how to do agile, they didn't learn how to work with teams of people. So we have a large course that's served by multiple institutions today, that's sort of tied to the MOC, where we actually have industry mentors. We teach them agile methods, we teach them a lot of the fundamentals of cloud, but we also have industry mentors come in and mentor teams of five students to create a product. There's actually three different lightning talks by students that have taken this course, that are here in the OpenStack forum today. So it's kind of exciting to see. We've had several hundred students that have learned that, and at least in my experience, for learning how to deal with open source communities, mentorship is a great way of doing that. The first year we started teaching this course we sort of struggled finding mentors; now we have about twice as many mentors applying to mentor teams as we can accommodate. So that's been kind of exciting. >> That's great. That's super important, learning not just how to program, but how to operate as an engineer and in a team. >> So in the MOC itself, a lot of it's stood up by students. We have like 20 to 30 students. We have a very small core development and operations team, and most of it is actually students doing all the real work. It's been amazing how much they can accomplish in that environment. >> You mentioned OpenShift. So another conversation that's been somewhat confusing in the broader industry is the talk about containers versus VMs, and virtualization and OpenStack. Here this week, I thought it's been a fairly clear message that you can be containerizing the stack itself, and then there's also a role for containers on top.
Obviously you've been involved in virtualization for a long time; how are you seeing the evolution of both containerization as a technology, but also container-based platforms versus kind of the infrastructure and provisioning of the cloud part? >> I mean, there's three levels that all have their role. There's actually people that want to control all the way down to the operating system and want to customize things, who want to use SR-IOV and want to use accelerators. So there's people that actually want hardware as a service, and we provide a capability for doing that, which has got its limitations today. There's people that want to use virtual machines, and there's people that actually want to use containers. And the ability to orchestrate setting up a complex multi-tiered environment, and doing fine-grained sharing in a containerized environment, is huge. I think that actually all three are going to have a continued role going forward. And certainly a containerized approach is an awesome way to deploy a cloud environment and scale the cloud environment, even the IaaS environment. So we're certainly doing that. >> Love the idea of the collaboration you have, both internally and with all the universities. Are you getting reached out to from outside of Massachusetts? How do you interact with the broader community and share ideas back and forth? >> So of course there are multiple streams of that. One of them is our industry partners, who are very broad. Second, we've participated in sort of the OpenStack Summits and all those kinds of things. The other thing is that the model that we are doing, I think, has a lot of excitement and interest from very many different segments. I don't think people want to see the public cloud be dominated, or always be dominated, by a very small number of vendors. So the idea of actually creating an open mall of clouds.
So of course there are multiple streams of that. One of them is that our industry partners are very broad. Second, we've participated in the OpenStack Summits and all those kinds of things. The other thing is that the model we are building has generated a lot of excitement and interest from many different segments; I don't think people want to see the public cloud dominated, or forever dominated, by a very small number of vendors. Hence the idea of actually creating an open model of cloud. Lots of other academic institutions have talked with us, both about setting up sister organizations, about federating between clouds, and about replicating the model. We're still at an early stage, and this model still has to be proven out. We're excited that we now have users who are using us to get real work done, rather than just courses and things like that, but it's still a very early stage, so I think as we scale up we'll start looking at replicating the model more broadly. >> Is there any public information about what you're doing? And I'm curious, will this tie into things like MOOC delivery? >> Oh, absolutely, yeah. It's all on our webpage, info.massopencloud.org. Everything is done in the open: all the projects, everything, is on the website, and you can discover all about it. We welcome participation from a broad community and are excited about that. >> Orran Krieger, really appreciate you sharing everything there with our community. Congratulations. You're local, so we'd love to stop by some time to check out even more. John and I will be back with lots more coverage here from OpenStack Summit 2017, Boston, Massachusetts. You're watching theCUBE. (upbeat music)

Published Date : May 10 2017



Analytics and the Future: Big Data Deep Dive Episode 6


 

>> Hi, everyone, and welcome to the Big Data Deep Dive with the Cube on EMC TV. I'm Richard Schlessinger, and I'm here with tech industry entrepreneur and Wikibon analyst Dave Vellante and SiliconANGLE CEO and editor in chief John Furrier. For this last segment in our show, we're talking about the future of big data, and there aren't two better guys to talk about that, so I'm glad you guys are here. Let me tee up this conversation a little bit with a video that we did, because the results of leveraging big data are only as good as the data itself; there has to be trust that the data is true and accurate and as unbiased as possible. So EMC TV addressed that issue, and we're just trying to keep the dialogue going with this spot. >> We live in a world that is in a constant state of transformation, political, natural; transformation that has many faces, many consequences. A world overflowing with information, with the potential to improve the lives of millions, with the prospects of nations, with generations in the balance. We are awakening to the power of big data. We trust, and together transform our future. >> So, gentlemen: trust. Without that, where are we, and how big an issue is that in the world of big data? >> Well, you know, the old saying: garbage in, garbage out. In the old days, the single version of the truth was what you were after with data warehousing, and people say that we're further away from a single version of the truth now, with all this data. But the reality is that with big data and these new algorithms you can algorithmically weed out the false positives, get rid of the bad data, and mathematically get to the good data a lot faster than you could before, without a lot of processes around it. The machines can do it for you. >> So, John, while we were watching that video, you murmured something about how this is the biggest issue, that this is cutting edge stuff.
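The point above about algorithmically weeding out false positives and bad data can be illustrated with a minimal sketch. This is a generic z-score outlier filter of my own choosing, not a technique any of the guests specifically describes.

```python
def filter_outliers(values, threshold=3.0):
    """Keep only values within `threshold` standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = variance ** 0.5
    if std == 0:  # all values identical; nothing to weed out
        return list(values)
    return [v for v in values if abs(v - mean) / std <= threshold]

# One corrupt record hiding in otherwise consistent readings.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 98.7, 10.2]
clean = filter_outliers(readings, threshold=2.0)
print(clean)  # the 98.7 record is dropped, the rest survive
```

The idea scales the same way the conversation suggests: the machine applies the rule to every record, so bad data is removed mathematically rather than by hand.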
>> Trust issues, the trust equation: right now it is still unknown, and it's evolving fast. You see it with social networks, where things go viral on the internet, and we live in a system now with mobility and cloud where things are scaling infinitely. So good data scales big and bad data scales big. So whether it's a rumor that goes viral or the data itself, trust is the most important issue, and sometimes big data can be creepy. So this is a really, really important area. People are watching it, and trust is the most important thing. >> But, you know, you have to earn trust, and we're still sort of at the beginning of this thing. So what has to happen to make sure that you don't get the garbage in, so you don't get the garbage out? >> It's iterative, and we're seeing a lot of pilot projects. Then those pilot projects get reworked, and then they spawn new projects. So it's an evolution, and as I've said many, many times, it's very early; as we've talked about, we're just barely scratching the surface here. >> It's evolving, too, and the nature of the data needs to be questioned as well. So what kind of data is it? For instance, if you don't authorize your data to be viewed, there are all kinds of technical issues around that. >> That's one side of it. But the other side of it is, I mean, there are bad people out there who would try to influence whatever conclusions are being drawn by big data programs. >> Especially when you think about big data sources. Companies start with their internal data, and they know that pretty well; they know where the warts are, they know how to manipulate it. It's when they start bringing in outside data that this gets a lot fuzzier. >> Yeah, it's a problem.
And security: I talked to a guy not long ago who thought that big data could be used to protect big data, that you could use big data techniques to detect anomalies in data that's coming into the system, which is poetic if nothing else. >> The guys in big data have told me that that's totally happening. It's a good solution. >> I want to move on, because we really want to talk about how this stuff is going to be used, assuming that these trust issues can be solved. And you know, the best minds in the world are working on this issue, trying to figure out how to best leverage the data we all produce, which has been measured at five exabytes every two days. You know, somebody made an analogy that if a byte was a paper clip and you stretched five exabytes' worth of paper clips out, they would go to the moon, or whatever. Anyway, it's a lot of bytes. >> It's a lot of paper clips. >> I think it's to the moon and back way too many times, a hundred thousand times; I lost track of my paper clips. But anyway, the best minds are trying to figure out how to maximize the value of that data, and they're doing that not far from here, where we sit, at MIT, in a place called CSAIL. CSAIL stands for the Computer Science and Artificial Intelligence Laboratory, and a big data program was just recently set up there. So we went there not long ago; it's just down the Mass Pike, an easy trip, and this is what we found. It's fascinating.

>> Everybody's obviously talking about big data all the time, and you hear it used to mean all different types of things. So one thing we're trying to do in the bigdata@CSAIL program is to understand what the different types of big data that exist in the world are, and how we help people to understand what different problems fall under the overall umbrella of big data. CSAIL is the largest interdepartmental laboratory at MIT, so there are about one hundred principal investigators, that's faculty and sort of senior research scientists, and about nine hundred students who are involved. >> Basically, with big data, almost anything you do has to be at a much larger scale than we're used to, and the way it changes the equation is that you have to have the hardware and software to do the things you're used to doing, but you have to make them accommodate a larger size, a much larger size. >> A lot of times, when people talk about big data, they mean not so much the volume of the data, but that the data, for example, is too complex for their existing data processing system to be able to deal with. So I've got information from a social network like Twitter, I've got information from a person's mobile phone, maybe I've got information about retail transaction records, a whole, very diverse set of things that need to be combined together. What this says here is: if you added this predicate to your query, you would remove the dots that you selected. That's part of what we're trying to do here. And the goal of bigdata@CSAIL, and our big data effort in general at MIT, is to build a set of software tools that allow people to take all these different data sets, combine them together, ask questions, and run algorithms on top of them that allow them to extract insight. >> I'm working with data from NASA, and the purpose of my work right now is to take data sets within databases and, instead of querying them for table results, query them and get visualizations. So instead of looking at large sets of numbers and text, you get a picture. And the motivation behind that is that humans are really good at interpreting pretty pictures; they're not so good at interpreting huge tables, and with big data that's a really big issue. So this will help scientists to visualize their data sets more quickly so they can start exploring, just looking at it faster, because with big data it's a challenge to be able to visualize and explore your data. >> I'm here just to proclaim what you already know, which is that the hour of big data has arrived in Massachusetts, and it's a very, very exciting time. >> So Governor Patrick was here just a few weeks ago to announce the Mass Big Data Initiative. And really, I think what he recognizes, and partly what we recognize here, is that there's expertise in the state of Massachusetts in areas that are related to big data, partly because of companies like EMC, as well as a number of other companies in this sort of database analytics space. EMC is a partner in our big data initiative, and bigdata@CSAIL is an industry-focused initiative that brings companies together to work with MIT, to think about big data problems, to help understand what big data means for the companies, and also to allow the companies to give feedback to us about the most important problems to be working on, and potentially to expose our students to these companies and give these companies access to our students. >> I think the future will tell us, and that's hard to say right now, because we haven't done a lot of the thinking and interpreting in big data; we haven't reached our potential yet, and there are just so many things that we can't see right now.

>> So one of the things that people involved in big data tell us is that they have trouble finding the skill sets, the data science capability and capacity. And in seeing videos like this, it's clear there is a new breed of students coming out. They're growing up in this big data world, and that's critical to keep the big data pipeline flowing. And John, you and I have spent a lot of time on the East Coast looking at some of the big data companies; it's almost a renaissance for Massachusetts and Cambridge, and very exciting to see. Obviously, there's a lot going on on the West Coast as well. >> Yeah, I mean, I'll say I'm impressed with MIT, and around MIT, Cambridge is exploding with young new guns coming out of there, the new rock stars, if you will. But in California, we're headquartered in Palo Alto, up close to Google and Facebook, and there's Jeff Hammerbacher, who we'll show in a second in a video of an interview I did with him. He was the first data guy at Facebook to build the data platform, which has completely changed Facebook and made it what it is. He's also the cofounder of Cloudera, the leader in Hadoop, which we've talked about, and he's the poster child, in my opinion, of a data scientist. He's a math geek, but he understands real-world problems. It's not just a tech thing; it's a bigger picture, and I think that's key. He knows that you have to apply this stuff, and you can see the passion that he has. This video is from Jeff Hammerbacher, cofounder of Cloudera. Watch the video, and the thing to walk away with is that big data is for everyone, and it's about having the passion.

>> Jeff Hammerbacher, data scientist and cofounder of Cloudera, Twitter handle @hackingdata. Welcome to the Cube. >> Thank you. >> So you're known in the industry; everyone knows you, on Twitter, on Quora, heavily followed. At Facebook you built the data platform, one of the main guys hacking the data there, and look what happened: the tsunami that Facebook has become. And as cofounder of Cloudera, you saw the vision. As we always quote on the Cube: you've seen the future, no one knows it yet. That was a year and a half ago; now everyone knows it. So how do you feel about that? As the cofounder of Cloudera, with forty million in funding, validation, again more validation, how do you feel? >> Yeah, sure, it's exciting. I think, as data volumes have grown and as the complexity of data that is collected and analyzed has increased, novel software architectures have emerged, and I think what I'm most excited about is the fact that that software is open source and we're playing a key role in driving where that software is going. And, you know, what I'm most excited about on top of that is the commoditization of that software. I'm tired of talking about the container in which you put your data. I think a lot of the creativity is happening in the data collection, integration, and preparation stage. So, you know, there was a tremendous focus over the past several decades on the modeling aspect of data; we really increased the sophistication of our understanding of, you know, classification and regression and optimization, and all of the hard-core modeling that gets done. And now we're seeing, okay, we've got these great tools to use at the end of the pipe; so now, how do we get more data pushed through those modeling algorithms? So there's a lot of innovative work. >> Were you thinking at the time about how you'd make money at this, or did you just say, well, let's just go solve the problem and good things will happen? >> It was a lot more the latter. You know, I didn't leave Facebook to start a company; I just left Facebook because I was ready to do something new. And I knew this was a huge movement, and I felt that, you know, it was a very nascent and unfinished software infrastructure. So when the opportunity at Cloudera came along, I really jumped on it, and I've been absolutely blown away by the commercial success we've had. So I certainly didn't set out with a master plan about how to extract value from this. My master plan has always been to really drive Hadoop into the background of enterprise infrastructure. I really wanted it to be as obvious a choice as Linux. >> You know, we've talked a lot at this conference and others about Hadoop moving from the fringe to the mainstream, and commercial enterprises, companies like J.P. Morgan Chase, today are building competitive advantage and saving money. Those guys do have a master plan to make money. Does that change the dynamic of what you do on a day-to-day basis, or is that exciting to you as an entrepreneur? >> Oh yeah, for sure, it's exciting. And what we're trying to do is facilitate their master plan, right? We want to identify the commonalities in everyone's master plan and then commoditize it, so they can avoid the undifferentiated heavy lifting that Jeff Bezos points out. You know, no one should be required to invest tremendous amounts of money in their container anymore. They should really be identifying novel data sources, new algorithms to manipulate that data, and the smartest people for using that data, and that's where they should be building their competitive advantage. And we really feel that we know where the market's going; we're very confident in our product strategy. And I think over the next few years, you know, you guys are going to be pretty excited about the stuff we're building, because I know that I'm personally very excited. And we're very excited about the competition too, because more people building open source software has never made me angry. >> Yeah, so, on that marketplace: we're talking about data science and building data science teams. So first, tell us what you're doing there around data science, your team and your goals, and what is a data scientist? I mean, this is not, you know, a DBA for Hadoop, right? So what's going on? >> Yeah, so, to kind of reflect on the genesis of the term: when we were building out the data team at Facebook, we had two classes of analysts. We had data analysts, who were more traditional business intelligence, you know, building reports, performing data retrieval queries, doing lightweight analytics. And then we had research scientists, who were often PhDs in things like sociology or economics or psychology, and they were doing much more of the deep-dive, longitudinal, complex modeling exercises. And I really wanted to combine those two things; I didn't want to have those two folks be separate, in the same way that we combined engineering and operations in our data infrastructure group. So I literally just took data analyst and research scientist and put them together and called it data scientist. So that's kind of the origin of the title, and that translates to what we do at Cloudera: I've recently hired two folks into a burgeoning data science group there. The way we see the market evolving is that, you know, the infrastructure is going to be commoditized. >> What mindset does it take to really be a data scientist, and how should we be thinking about it? There's no real manual; most people bring math skills, the economics kinds of disciplines you mentioned. How should someone prepare themselves, and how does someone hire a data scientist? >> Well, I played a lot of sports growing up, and there's this phrase of being a gym rat, which is someone who's always in the gym just practicing whatever sport it is that they love. And I find that most data scientists are sort of data rats: they're always in there, always going after new data. So there's a genuine curiosity about seeing what's happening in data that you really can't teach. But in terms of the skills that are required, I didn't really find any one background to be perfect. So I actually put together a course at the University of California, Berkeley, and taught it this spring, called Introduction to Data Science; I'm teaching it again this coming spring, and they're actually going to put it into the core curriculum for computer science in the fall of next year. >> Jeff Hammerbacher, thanks so much for that insight; a great, epic talk here on the Cube, another epic conversation shared with the world live. Congratulations on the funding, another forty million, it's great validation, and congratulations for essentially being part of data science and founding that whole movement at Facebook. And now, with Amr Awadallah and the team at Cloudera, you continue to do a great job. So congratulations, and here's to all the competition keeping you fast. Capitalism, right? >> Right. Thank you.

>> It's great, isn't it, that with all these great minds working in this industry, they still can't really define what a data scientist is? >> Well, that's what you get with an industry in its infancy, and that's what's so exciting. Everyone has a different definition of what it is, and what that means is that data science represents the new everybody. It could be a homemaker, it could be an eighth grader; it doesn't matter, if you see an insight and you see something that could be solved. The data is out there, and I think that's the future. And Jeff Hammerbacher talked about spending all this time and technology on undifferentiated heavy lifting, and I'm excited that we are moving beyond that into, essentially, the human part of big data. And it's going to have a huge impact, as we talked about before, on the productivity of organizations and potentially the productivity of lives. I mean, look at what we've talked about this afternoon: we've talked about predicting volcanoes, we've talked about the medical issues, we've talked about pretty much every aspect of life. And I guess that's really the message of this industry now: the folks who are managing big data are looking to change pretty much every aspect of life. >> This is the biggest inflection point in the history of technology that I've ever seen, in the sense that it truly affects everything: the data that machines generate, the data that humans generate, the data that things generate; everything is generating data. So this is a time when we can actually instrument it, and that's why there's massive disruption in this area. And disruption, we should say for the uninitiated, is a good thing in this business: wealth creation, entrepreneurship, companies being founded. It's a great opportunity. >> Well, I appreciate your time. Unfortunately, I think that's going to wrap it up for our Big Data Deep Dive. John and Dave, the Cube guys, it's been great; I really appreciate you showing up here and lending your insights and expertise. And I want to thank you, the audience, for joining us. So stay tuned for the ongoing conversation on the Cube and on EMC TV, to be informed, inspired, and hopefully engaged. I'm Richard Schlessinger. Thank you very much for joining us.
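The conversation's five-exabytes-every-two-days paper-clip analogy invites a quick back-of-the-envelope check. The roughly 3 cm clip length and the average Earth-Moon distance below are my assumptions, not figures from the broadcast.

```python
# Rough check of the paper-clip analogy: one paper clip per byte,
# five exabytes of bytes, laid end to end toward the moon.
BYTES = 5 * 10**18            # five exabytes (decimal), one clip per byte
CLIP_LENGTH_M = 0.03          # assumed: a standard paper clip is about 3 cm
EARTH_MOON_M = 3.84 * 10**8   # assumed: average Earth-Moon distance in meters

chain_length_m = BYTES * CLIP_LENGTH_M
one_way_trips = chain_length_m / EARTH_MOON_M
print(f"chain reaches the moon about {one_way_trips:,.0f} times")
```

On these assumptions the chain covers the Earth-Moon distance hundreds of millions of times, so losing count of the paper clips, as the host jokes, is fair.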

Published Date : Feb 19 2013

