Frank Arrigo, AWS & Emma Arrigo, AWS | Women in Tech: International Women's Day

(upbeat music) >> Hey everyone. Welcome to theCUBE's coverage of the International Women's Showcase for 2022. I'm your host, Lisa Martin. I'm really excited because for the first time in my CUBE career of six years, I have a father daughter duo maybe the first time in CUBE's history. Frank and Emma Arrigo from AWS join me guys it's great to have you on the program. >> Great to be here thank you. >> So, Emma, let's go ahead and start with you. Talk to us about how you got to AWS and a little bit about your background. >> Yes, thanks Lisa. So I've joined AWS as a recent graduate from university. So I did my masters of data science and I was going through the grad, the grad job hunt applying for all these different places. And AWS appeared on my radar for an intern program. And Frank was there at the time and so I was like, "Should I do it?" But I still applied cause it was a great program. And so I went did that internship for three months over the summer of 2019-2020, and then I went back and finished my degree. And another grad role came up for in AWS in Tech U to be an associate solutions architect. And so I was approached to apply for that. And I got through to that program and joined the team almost a year ago in March, 2021 through Tech U and yeah, that's how I ended up at AWS. >> Excellent and so Frank, this is a pretty unique situation, father daughter duo at AWS let alone Amazon, let alone probably a lot of companies. Talk to me about it the parental lens. >> Yeah, look it is unique, there's a few family connections within AWS, but you know, definitely here in Australia, it's really rare, but I think the family connection is, you know Emma and we've got four kids, I've got four kids in total. So Emma has three brothers, you know, I've lived in tech, my entire career and so they've been part of it. You know, we've lived in the States, lived in Seattle for a couple of years. And so they'd come to the office and see what dad did. 
And so it wasn't a big surprise for them to understand what the role was and what we did, so, you know, they kind of grew up with it. And you know, when the opportunity came up for Emma, did the internship, I was excited for it because it was in a different area. It was working in a startup team doing some interesting work that really lined up with some of the interest Emma had. And so she kind of learned what it was like to be Amazonian through that internship and that was... I call that a long audition for a job. And she was then able to join Tech U program, which is a early career bootcamp, I like to think of it, which is the six month program to help our grads learn some of the fundamental skills because the value of a solutions architect or some of these other tech roles is you need experience. You need to have been in the game a while to be a trusted advisor to a customer. And it's hard to do that when you're a grad. So the bootcamp gives them the practical experience and then they get another six months on the job experience where they develop those skills and hone it and get ready to, you know, be a trusted advisor to the customers. >> Right, and that's such a great... I'm sure that's a tremendous opportunity to learn how to become that trusted advisor, especially from peers, such as yourself and I want to go back to you. Talk to me about your interest in IT, in data science. Was this something that you were always interested in primary school or in high school? Or was this something that kind of came on later on? >> Yeah, so my interest in tech kind of emerged as I went along in my education. So when I was younger, I really wanted to be an orthodontist for some reason. I don't know why. And then you just sort of in year eight and like early school sort of didn't really know what I wanted to do. 
Just sort of going through just trying to survive as a teenage girl at high school at an all girls school, didn't really have many, didn't really have career aspirations, I guess, and then one year I attended a information day at a university about engineering and that just really sparked my interest, I don't know why, but I was like, I've always been obsessed with like factories and those types of things and how things are made. And so that really just sparked my interest and I never really thought of it before. And so then that put STEM engineering on my radar and then I guess spoke with it about with the parents. And then they mentioned that tech would be a like IT, Information Technology would be really useful. And so then we approached the school to ask if I could do IT in year 11. So that's sort of our second last year of high school. And they said, "No, we couldn't do IT." I couldn't go to the boys' school to do IT. That girls don't do it or that not good at it. And I wasn't allowed, and they wouldn't let me do physics either. So I moved school in for the final two of high school to be able to do IT and physics to help, you know, get to the course I wanted to do. And so that was my journey into STEM. So it wasn't really on my radar, but then events like this and at university isn't it? Organizations sparked my interest. And then still when I entered university, I didn't know exactly what I wanted to major in nor where I wanted to work would never have thought it would be where, with my father, like I was aware of the world of IT and everything, but I wouldn't, if you'd asked me in first year, it wouldn't have been that I would probably, we would've said, I don't know an academic or something. I don't know. And then, but again, as the university went on and you attend networking events or club things, you sort of learn a bit more about the ecosystem. And then that's where yeah. 
Tech companies sort of became where I was looking for jobs and roles for when I finished up. So that was kind of my journey to... >> So what I love though, and Frank, this is going to be a question for you, is how Emma was told, "No, you can't study IT. No, you can't study physics. You can't go to the boys' school and do that either." Talk to me about that, Frank, from your perspective as a parent of a daughter, and you said, I think she's got three brothers, lucky Emma, but talk to me about that from your perspective, in terms of going, my daughter really has a strong interest in this and they're telling her no, we're going to pivot and actually change schools to be able to give her the opportunities that she wants to pursue. >> Yeah. Look, as a parent, we were shocked. You know, it was just an unexpected response, you know, in a lot of ways, the school that she was at was more of a finishing school than anything else, you know, preparing young ladies for marriage and, you know, career as a, I don't know, I will leave it at that. So we were really disappointed. And so very quickly we looked at other alternatives and other options and we pulled Emma out of school and we knew it was like the last two years are critical in Australia. We don't have a middle school and a senior school, it's all one, you know, combined thing. But those last two years are all about getting ready for university. And so we made a really tough call and we picked her up, dropped her into a totally new school. It was a co-ed school. And then when we told her previous, her girls' school. I actually spoke to the vice principal and he said, "Oh, I can't believe you're sending her to a co-ed school. She's going to struggle 'cause boys are so much better in tech." And I was totally, I was lost for words, right? 
Because I thought back in my career and I had some amazing female managers, leaders, role models in my time that I worked for and I followed and they were always struggling because, you know, they were in the minority, but they were incredible, you know, technologists and leaders. And I just couldn't believe it. So as parents we made the tough call. We picked Emma up. We put her into another high school and she flourished, you know, Emma started a club, she got involved with a whole bunch of other things. When she graduated, the teachers felt that she'd been there six years, right? The whole time of it. So she really made a mark, made an impact at this school and so much so that her younger brother then followed and went to that school and completed his high school there as well. But it, we just can't believe it. And we tell everyone this story, you know, we name the school, we won't name. We choose not to name them here, but we name the school because we just think it's really terrible guidance and terrible advice. Like we want people to follow their passion. I tell my kids and I tell the folks when I speak to, you know, early career folks, follow your passion first, guess what, the job will appear. Right? You know, there'll be the... The work will come if you do something that you love. And then the second piece that I always say is, "Every future job is going to be a tech job." Technology is embedded in everything that we do. So the fact that you say, "A girl can't do technology," you're limiting yourselves, right? You don't want to think that, you want to think about the possibilities rather than the things you can't do. It's the things you can do. And the things that you haven't even thought about doing. So that's why, you know, it was so exciting to see that experience with Emma, and just seeing her grow through that and she became a bit of a STEM advocate at a high school as well. So, she saw the value of her role model that helped her. 
And she wants to be a.... Continue being a role model for others as well, which again, I think is admirable, right? It's about- >> Absolutely. >> Shining a light and leading and as a parent, irrespective that we work at the same company as a parent, that's what you want to see. You want to see your kids aim high and inspire others. That's what she does. >> Well, she's already been a role model too, I mean, to your younger brother, but one of the things that we say often, and theCUBE does a lot of women in technology events. And I'm fortunate to get to host a lot of those, we say, "You can't be what you can't see." So needing to have those role models who are visible. Now, it doesn't have to be female necessarily, and Frank, you mentioned that you had female mentors and role models in your illustrious career. But the important point is being able to elevate women into positions where others can see and can identify, "Oh, there's a role model. There's somebody that might be a mentor for me, or a sponsor down the road." It's critically important. And as of course, we look at the numbers in tech, women in technical roles are still quite low, but Emma, tell me a little bit about, you've been through the program. You talked about that. What are some of the things that you feel in like the last six months that you've been able to learn that had you not had this opportunity, maybe you wouldn't have. >> You know, I think that's a great point. So as a solutions architect, I get to be both technical. So hands on building on AWS, helping customers solve their problems, whether it be a data lake or I don't know, an image recognition tool to look for garbage dumped on the street or, and also thinking from the business perspective for the customers, so that's a fun part as the, of the role, but things I get to do. So currently I'm working on a demo for the conference in Sydney. 
So I'm building a traffic detection model using some computer vision and IoT so I get to bring my data science background to this build and also learn about new areas like IoT, Internet of Things technologies. So that's been a really fun project and yeah, just having the ability to play around on AWS, we have... >> Right. Well, the exposure and the experiences are priceless. You can't put a price on that, but being able to get into the environment, learn it from a technical perspective, learn it from a practical perspective. And then of course get all the great things about getting to interact with customers and learning how different industries work, you mentioned you were in public sector. That just must be a field of dreams, I would imagine. >> I know. >> In some senses for you, right? >> Really have lucked out. I know it's, I'm like, "Wow, my job is to play around with some new service, just because I need to know about that for the customer meeting." Like I'm building a chatbot or helping build a chatbot for a customer, at the university. So yeah, things like that make it very, yeah. It's a pretty amazing role. >> It sounds, it sure sounds like it. And sounds like you're excelling at it tremendously. Let me ask you Emma. For young girls who might be in a similar situation to where you were not that long ago with the school telling you, "No, you can't do IT." "No, you can't do physics." So you actually switched schools. What would you tell those young girls who might be in that situation about hearing the word, "No." And would you advise them to embrace a career in technology? >> Yeah, I would say that it really..... What makes me so sad is if my family didn't know about tech and had my... Supported me through that like if I would've just gone, "Oh, okay. I won't do it." You know what I mean? Like that just makes me really sad. How many people have missed out on studying what they wanted to study. 
So by having those types of experiences, so what I would say as advice is, "Back yourself, find supporters, whether it be your family or a teacher that you really sort of connect with, to be able to support you and through these decisions." And yeah, I think having those sponsors in a way, your advocates to help you make those choices and help support you through those choices. >> Yeah. I agree. And I have a feeling you're going to be one of those sponsors and mentors, if you aren't already Emma, I have a feeling that's just around the corner from you. So Frank, last question to you. What's the overall lesson here, if we look at statistics, I mentioned some of the stats about, you know, women in technical roles is usually less than 25% globally. But also we see data that shows that companies are more profitable and more performant when at least 30% of the executive suite is women. So from your parental perspective, and from an Amazonian perspective, Frank, what's the lesson here? >> Well, look from an Amazonian perspective, we need to make sure that we have a team that represents our customers, right? And our customers aren't all boys. You know, they're not all blokes, as we say down here. So you've got to have a team that is made up of what represents your customers. So I think that's the Amazonian view. And so diverse perspectives, diverse experience, diverse backgrounds is what does that. The other from a parent, you know, I said it earlier. I think every future job is a tech job. And I think it's really important that as kids come through, you know, primary school, high school, whatever, they're prepared for that, they're already consumers of technology. You know, they need to be creators or, or participate in that environment. And I can give you an example, a few years ago, I worked for, at a large telco here. And we actually invested in a thing called Code Club, which was aimed at primary school kids, kids in grade four, five and six. 
So elementary school for my friends in America, it's kids in grade four, five and six. And they were learning how to use Scratch. Scratch is this interactive tool like building Lego to write programming and believe it or not, there were more girls interested and were part of Code Club. It was probably 60-40 was the ratio of young girls doing it compared to boys because it was creative, it was a creative outlet, they were building stuff and assembling and making these things that they loved to make. Right. But then what we saw was there'd be a drop off at high school, whether it's curriculum related or interests or distractions, I don't know what it is, but things get lost along the way along high school. But I see it at the primary school stage at elementary school that the interest is there. So I think part of it is, there needs to be a bit of a switch up in education or other opportunities outside of school to really foster and nurture and develop this interest because it really does take all kinds to be successful in the role. And Emma talked about a chatbot that she's building and that's a conversational thing. I can't see geek boys being able to impact and create an interesting conversation, right. Then there's other areas that seems to be skewed and biased based on a predominantly male view of the world. So we need the tech, the industry needs these diverse perspectives and these diverse views, because, you know, to your point, it's going to impact the bottom line. It's going to also deliver a better product and it's going to reflect society. It's going to reflect the customers that are using it because we're made up of every, every race and color, creed, gender. And we need a team that represents that. >> Exactly. I couldn't agree more. Well, it sounds like the Arrigo family are quite the supporters of this, but also we need more of both of you. 
We need more of the sponsors and the parents who are encouraging the kids and making the right decisions to help them get along that path. And we need more folks like Emma and more women that we can see, "Wow, look what she's doing in such a short time period. We want to be just like that." So you guys are, have both been fantastic. I thank you so much for joining me at the International Women's Showcase, more power to your family. We need more folks like you guys, so great work. Keep it up. >> Thank you. >> Thanks Lisa. >> For Frank and Emma Arrigo, I'm Lisa Martin. You're watching theCUBE's coverage of International Women's Showcase 2022. (soothing music)

Published Date : Mar 9 2022

ENTITIES

Entity	Category	Confidence
Frank	PERSON	0.99+
Lisa Martin	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Lisa	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Seattle	LOCATION	0.99+
Frank Arrigo	PERSON	0.99+
Australia	LOCATION	0.99+
America	LOCATION	0.99+
Emma	PERSON	0.99+
Emma Arrigo	PERSON	0.99+
six years	QUANTITY	0.99+
March, 2021	DATE	0.99+
Sydney	LOCATION	0.99+
four kids	QUANTITY	0.99+
Tech U	ORGANIZATION	0.99+
second piece	QUANTITY	0.99+
three months	QUANTITY	0.99+
six months	QUANTITY	0.99+
less than 25%	QUANTITY	0.99+
first time	QUANTITY	0.99+
three brothers	QUANTITY	0.99+
both	QUANTITY	0.99+
International Women's Day	EVENT	0.98+
six month	QUANTITY	0.98+
Amazonian	ORGANIZATION	0.98+
six	QUANTITY	0.97+
a year ago	DATE	0.97+
five	QUANTITY	0.95+
one	QUANTITY	0.95+
first	QUANTITY	0.94+
two	QUANTITY	0.93+
scratch	TITLE	0.93+
second last	QUANTITY	0.92+
International Women's Showcase 2022	EVENT	0.92+
first year	QUANTITY	0.92+
60-40	QUANTITY	0.91+
International Women's Showcase for 2022	EVENT	0.9+
one year	QUANTITY	0.9+
theCUBE	ORGANIZATION	0.89+
code club	ORGANIZATION	0.89+
telco	ORGANIZATION	0.89+
CUBE	ORGANIZATION	0.88+
few years ago	DATE	0.87+
Scratch	TITLE	0.86+
Women in Tech	EVENT	0.85+
last six months	DATE	0.84+
Women's Showcase	EVENT	0.8+
at least 30%	QUANTITY	0.79+

Ken Eisner, Director, AWS | AWS Public Sector Summit 2019

>> Live from Washington, D.C., it's theCUBE, covering AWS Public Sector Summit, by Amazon Web Services. >> Welcome back, everyone, to our nation's capital. We are theCUBE. We are live at AWS Public Sector Summit. I'm your host, Rebecca Knight, along with my co-host, John Furrier. We're joined by Ken Eisner, Director, Worldwide Educational Programs at AWS. Thanks so much for coming on the show. >> Thank you for having me. >> So tell our viewers a little bit about what you do as the director of educational programs. >> Sure, I head up a program called AWS Educate. AWS Educate is Amazon's global initiative to provide students and teachers around the world with the resources that they need really to propel students into this awesome field of cloud computing. We launched it back in May of 2015 and we did it to fill this demand. If we look at it today, we're kind of right in the midst of this fourth industrial revolution that is changing the means of production, obviously in the digital and cloud space, but it's also creating this new worker class all around, yeah, the cloud, advanced services like machine learning, AI, robotics, IoT and so on. And if you looked at the employer demand, cloud computing has been the number one LinkedIn skill for the past four years in a row. We look at cloud computing, we kind of divide it into four families: software development, cloud architecture, the data world, you know, like machine learning, AI, data science, business intelligence and analytics, and then the middle-skill opportunities like technical customer support agent and cybersecurity, which can range all the way from middle skill to Ph.D. But yet the time to hire these people has gone up dramatically. Glassdoor did a study of companies over their platform between two thousand 92 1,050 18 and showed that the time to hire had increased by 80%. Just think about that. We talk about... I mean, this conference is all about innovation. 
If you don't have builders, if you don't have innovators, how the heck can you innovate? >> Ken, I've got to ask you. Andy, I've known him for over eight years, reporting on him and covering Amazon when everyone didn't understand yet what it was. Now everyone kind of does, so congratulations on the success. But to see him on stage talk passionately about education, I mean, knowing Andy, it means it's kind of boiled up, because he's a very reserved, very conservative guy, pragmatic. But for him to be overtly projecting his opinion around education, which was really, yeah, pretty critical, means something's going on. This is a huge issue, not just in politics, real, state, local areas where education, where... >> The root of income inequality, it's a lot of... >> There's a lot of challenges. People just aren't ready for these new types of jobs that are coming out that... >> Pay well, by the way. And there's a lot of them out there that are unfilled. For the first time, there are more jobs unfilled than there are candidates for them. You're solving this problem. Tell us what's going on in Amazon. Why the furor? What's going on with all this? Why is everyone so jacked up? >> A great point. Andy, I think, said that education is at a crisis point today and really talked about that racial inequality piece. The time to hire people in the software development space, cloud architecture, technical cloud support agent, it's incredibly long, so it's just creating excess costs in the system. But we're so passionate. If you look at going to the cloud, Amazon wants to disrupt areas where we do not see that progress happening. Education is an area that's in vast need of disruption. There are people who are doing amazing stuff. We've heard from Cal Poly. We've heard from, yeah, Arizona State, Carnegie Mellon. There's Joseph Aoun at Northeastern. >> People are >> doing great stuff. 
We're looking at some places that are doing dual enrollment programs between high school and community college and higher ed. But we're not moving fast enough. >> But you guys are providing it with Educate, your program. People can walk in the front door without any kind of going through gatekeepers or any kind of getting into college. This is straight up from the front door. They could be dropouts, it could be post-college reskilling. Whatever it is, they could walk in the front door and get skilled up through Educate. Is that correct? >> We send people to awseducate.com. All you need is some element of being in school, actively, or you want to be going back from a reskilling perspective, and you get free access into resources. Whether you're a student or a teacher, you get free access into content that's mapped to jobs, because again, what do people want from education? We all want enlightenment, contributors to society, all important. But really they want careers. And all the stats, Gallup ran some good stats about both what students and what industry want. They want them to be aligned to jobs. And we're seeing that there's a demand... >> Let me ask you specifically. If I'm unemployed and I want to work, what can I do? I walk in... >> You can go right on and you can sign up. We'll give you access to these online cloud career pathways. We'll give you micro-credentials, so we can badge you, credential you against... We launched something on Sumerian, RoboMaker. So individual services and full pathways. >> So this is a direct door for someone unemployed who wants to get some work and a high-paying job, right? >> Right. Absolutely. And we also give you free access into AWS because we know that hands-on practice doing real-world applications is just vital. So we will do that. And by the way, at the end of this, we have a job board: Amazon, customer and partner jobs. We're all saying these jobs are super high in demand. 
You can apply to get a job as an intern or as a full-time hire through our job board. >> This is what people don't know about, Rebecca. The word is not out there, and this is, people, some of the problem. This is a solution. >> Exactly, but I actually want to drill down a little bit. This initiative is not just for grown-ups. It's for kids. This is for, you know, kids; it starts in kindergarten. So I'm really interested to hear what you're doing and how you're thinking about really starting with the little kids, and particularly underrepresented minorities and women, who are also underrepresented in the cloud industry, how you're thinking expansively about getting more of those people into these jobs. >> And actually, it's still day one with it all. We started with 18 and older because we saw that as the key lever into that audience, and started with computer science, but we've expanded greatly. Last year at re:Invent, we introduced pathways for students 14 and over, and cloud literacy materials such as Cloud Inventor, Cloud Explorer and Cloud Builder badges to really get at those young audiences. We've introduced dual enrollment stuff that happens between high school and community college or high school and higher ed, and we're working on partnerships with Scratch, FIRST Robotics, Project Lead The Way that introduce, whether it's block-based coding, robotics... We're finding robotics is such a huge door opener, again, not just for technically... >> Kids get into it. >> Absolutely, because it's hands-on stuff. It's relevant. They want relevant stuff that they can touch, that they can feel, that they can open their browser, make something happen, build a mobile application. But they also want to have pathways into the future. They want to see something that they can eventually wind up in. And AWS, the cloud just makes it real, because you can do real-world stuff from a browser.
By working with FIRST Robotics or using Scratch to develop AI extensions in Rekognition and Lex and Polly and so on. So we've entered into partnerships with them, right, to open up those doors and create that long-term engagement and pipeline into the high-demand jobs of tomorrow. >> What do you do in terms of the colleges that you mentioned? You mentioned Northeastern and Cal Poly, Arizona State. What are you seeing as the most exciting innovations there? >> Yes. So, first of all, we're in over 2,400 institutions around the world. We actually, by the way, began in the U.S., and it was 65% U.S. Now it's actually 35% U.S., 65% outside. We're in 200 countries and territories around the world. But institutions such as those are doing amazing stuff. Polo Chau at Georgia Tech, the things that he's doing with visualization on top of AWS are absolutely amazing. We launched a Cloud Ambassador program to reward and recognize the top faculty from around the world. They're truly doing amazing stuff, but even more, we're seeing the output from students. There was a student, Alfredo Colón. He lived in Puerto Rico, devastated by Hurricane Maria. So he lost his, you know, economic mobility, came to Florida and started taking classes at local schools. He found AWS Educate and just dove headlong into it, did eight pathways and then applied for a job in DevOps at Universal Studios and received a job. He is one of my favorite evangelists. And it's not just at higher ed. We found community college students. We launched a dual enrollment program between Santa Monica College and Roosevelt High School in Los Angeles, focusing again on majority-minority students, largely Hispanic, in that community. And Michael Brown finished the cloud computing certificate, applied for an internship at Mission Cloud, again a partner of ours, and got the internship. And they started a whole program around it. 
So not only are we seeing excitement out of the institutions, which we are, but we're also seeing it from our students, and businesses all want to get involved in this hiring brigade. >> Ken, I've got to ask. We're learning so much about Amazon, we've covered them for a long time. You know all the key buzzwords, yeah, raise the bar, all these terms, working backwards. So tell us, what's your working backwards plan? Because you have a great mission and we applaud it. I think it's super critical. I think it's so under-promoted. I think we'll do our best to kind of promote it. It's really valuable to society and getting people their jobs. Yeah, it's a great opportunity, you know, in itself. But what's your goal? What's your objective? How are you going to get there? What are your priorities? What do you need? >> We're pure education to workforce. And today our job is to work backwards from employers and this cloud opportunity. The thing that we care about, our customer, still remains our student. So we want to give access and mobility to students into these fields in cloud computing, not just today but tomorrow. That requires a lot. That requires machine learning in the algorithm that changes the learning objectives based on career, so content maps to these careers, and we're going to be working with educational institutions on that. Recruiting doesn't do an effective job at matching students into jobs. Are we looking at just the elite institutions as signals for that? That's a big... >> Students are your customer, but also the support systems that support you, right? Like Cal Poly and others, to me. >> Truly. We've also got governments. 
So we were down in Louisiana just last month, and Governor Bel Edwards said we're going statewide with AWS Educate's cloud degree program across all of their community college system, across the University of Louisiana state system and into K-12, because we believe in those long-term pathways. Never before have governors, have ministers of countries... We've been with the Ministry of Education for Singapore, in Indonesia, and we're working deep into India. Never have they been more aligned to workforce development. It creates huge unrest. We've seen this in Spain and Greece, we see it in the U.S. But it's also this economic imperative, and Andy is right. Education is at a crisis. Education is not solving the needs of all their constituents, but also industry is to blame. We haven't been deeply partnered with education. That partnership is such a huge part of this. >> There are structural things involved in the educational system. It's linear; the internet's nonlinear, progressions are different. This is an opportunity because I think it's just like competition. Hey, if the U.S. Department of Education does not get their act together, people aren't going to go to school. I mean, Peter Thiel, on another part of the political spectrum, was paying people not to go to college, which was a little different, radical view. Andy over here is saying, look at it. That's why you see the data points starting to boil up. I see some of my younger son's friends all questioning, right, what they could get on YouTube, what's accessible now, thinking, look, you can learn about anything digitally now. People are starting to realize that I might not need to be in college or I might not need to be learning this. I can go direct. >> And we pay lip service to lifelong education. If you terminally end education at X year, well, you know, what's happening with the rest of your life? We need to be lifelong learners. 
And, yes, we need to have off-ramps and on-ramps throughout our education. The other thing is, it's not just skills. The skills are important, and we need to have people who are certified in various AWS skills, but we also need to focus on those competencies. Education does a good job around critical decision-making skills and stuff like, um, collaboration. But do they really do a good job at Invent and Simplify? Do they teach kids to fail? Are we walking kids to >> Social-emotional, you know? >> Absolutely. Are we teaching kids to think big, to move fast, and have that bias for action? >> I think that, and to have fun doing it. All right, well, so fun having you on the show. A great conversation. >> Thank you. I appreciate it. >> I'm Rebecca Knight, for John Furrier. You are watching theCUBE. Stay tuned.

Published Date : Jun 12 2019

ENTITIES

Entity	Category	Confidence
Michael Brown	PERSON	0.99+
John Farrier	PERSON	0.99+
Ken Eisner	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Andy	PERSON	0.99+
Joseph Alan	PERSON	0.99+
Puerto Rico	LOCATION	0.99+
Rebecca Night	PERSON	0.99+
Peter Thiel	PERSON	0.99+
Cal Poly	ORGANIZATION	0.99+
Spain	LOCATION	0.99+
Florida	LOCATION	0.99+
Rebecca	PERSON	0.99+
Louisiana	LOCATION	0.99+
Santa Monica College	ORGANIZATION	0.99+
U. S Department of Education	ORGANIZATION	0.99+
Elliott	PERSON	0.99+
Washington, D. C.	LOCATION	0.99+
Indonesia	LOCATION	0.99+
AWS	ORGANIZATION	0.99+
Roosevelt High School	ORGANIZATION	0.99+
two thousand	QUANTITY	0.99+
U. S.	LOCATION	0.99+
Rebecca Knight	PERSON	0.99+
Alfredo Cologne	PERSON	0.99+
Simon	PERSON	0.99+
Greece	LOCATION	0.99+
35%	QUANTITY	0.99+
Governor	PERSON	0.99+
Los Angeles	LOCATION	0.99+
80%	QUANTITY	0.99+
18	QUANTITY	0.99+
65%	QUANTITY	0.99+
tomorrow	DATE	0.99+
Hurricane Maria	EVENT	0.99+
one	QUANTITY	0.99+
WS	ORGANIZATION	0.99+
last month	DATE	0.99+
Georgia Tech	ORGANIZATION	0.99+
Biotics	ORGANIZATION	0.99+
200 countries	QUANTITY	0.99+
two jobs	QUANTITY	0.99+
today	DATE	0.98+
Universal Studios	ORGANIZATION	0.98+
over eight years	QUANTITY	0.98+
first time	QUANTITY	0.98+
last year	DATE	0.98+
both	QUANTITY	0.98+
AWS	EVENT	0.98+
first robot	QUANTITY	0.97+
YouTube	ORGANIZATION	0.97+
Ministry of Education for Singapore	ORGANIZATION	0.97+
Carnegie Mellon	ORGANIZATION	0.96+
Amazon Web	ORGANIZATION	0.96+
K 12	OTHER	0.96+
Lor	PERSON	0.96+
four families	QUANTITY	0.95+
India	LOCATION	0.95+
Polly	PERSON	0.95+
14	QUANTITY	0.93+
over 24 100 institutions	QUANTITY	0.93+
Kimmie	PERSON	0.93+
W s Public Sector summit	EVENT	0.92+
Bel Edwards	PERSON	0.91+
Khun	PERSON	0.91+
May of 2,015	DATE	0.9+
fourth industrial revolution	EVENT	0.89+
US	LOCATION	0.88+
University of Louisiana State	ORGANIZATION	0.87+
Cal Poly Arizona State	ORGANIZATION	0.84+
Hispanic	OTHER	0.83+
Day one	QUANTITY	0.83+
Samarian	PERSON	0.82+
Alex	TITLE	0.82+
ws public sector summit	EVENT	0.81+
Public Sector Summit 2019	EVENT	0.8+
John	PERSON	0.8+
North Northeastern	LOCATION	0.79+
first	QUANTITY	0.76+
Cube	ORGANIZATION	0.75+
past four years	DATE	0.74+
92 1,050	QUANTITY	0.72+

Panel Discussion | IBM Fast Track Your Data 2017

>> Narrator: Live, from Munich, Germany, it's the CUBE. Covering IBM, Fast Track Your Data. Brought to you by IBM. >> Welcome to Munich everybody. This is a special presentation of the CUBE, Fast Track Your Data, brought to you by IBM. My name is Dave Vellante. And I'm here with my cohost, Jim Kobielus. Jim, good to see you. Really good to see you in Munich. >> Jim: I'm glad I made it. >> Thanks for being here. So last year Jim and I hosted a panel at New York City on the CUBE. And it was quite an experience. We had, I think it was nine or 10 data scientists and we felt like that was a lot of people to organize and talk about data science. Well today, we're going to do a repeat of that. With a little bit of twist on topics. And we've got five data scientists. We're here live, in Munich. And we're going to kick off the Fast Track Your Data event with this data science panel. So I'm going to now introduce some of the panelists, or all of the panelists. Then we'll get into the discussions. I'm going to start with Lillian Pierson. Lillian thanks very much for being on the panel. You are in data science. You focus on training executives, students, and you're really a coach but with a lot of data science expertise based in Thailand, so welcome. >> Thank you, thank you so much for having me. >> Dave: You're very welcome. And so, I want to start with sort of when you focus on training people, data science, where do you start? >> Well it depends on the course that I'm teaching. But I try and start at the beginning so for my Big Data course, I actually start back at the fundamental concepts and definitions they would even need to understand in order to understand the basics of what Big Data is, data engineering. So, terms like data governance. Going into the vocabulary that makes up the very introduction of the course, so that later on the students can really grasp the concepts I present to them. 
You know I'm teaching a deep learning course as well, so in that case I start at a lot more advanced concepts. So it just really depends on the level of the course. >> Great, and we're going to come back to this topic of women in tech. But you know, we looked at some CUBE data the other day. About 17% of the technology industry comprises women. And so we're a little bit over that on our data science panel, we're about 20% today. So we'll come back to that topic. But I don't know if there's anything you would add? >> I'm really passionate about women in tech and women who code, in particular. And I'm connected with a lot of female programmers through Instagram. And we're supporting each other. So I'd love to take any questions you have on what we're doing in that space. At least as far as what's happening across the Instagram platform. >> Great, we'll circle back to that. All right, let me introduce Chris Penn. Chris, Boston based, all right, SMI. Chris is a marketing expert. Really trying to help people understand how to get, turn data into value from a marketing perspective. It's a very important topic. Not only because we get people to buy stuff but also understanding some of the risks associated with things like GDPR, which is coming up. So Chris, tell us a little bit about your background and your practice. >> So I actually started in IT and worked at a start up. And that's where I made the transition to marketing. Because marketing has much better parties. But what's really interesting about the way data science is infiltrating marketing is the technology came in first. You know, everything went digital. And now we're at a point where there's so much data. And most marketers, they kind of got into marketing as sort of the arts and crafts field. And are realizing now, they need a very strong, mathematical, statistical background. 
So one of the things, and the reason why we're here and IBM is helping out tremendously, is making a lot of the data more accessible to people who do not have a data science background and probably never will. >> Great, okay thank you. I'm going to introduce Ronald Van Loon. Ronald, your practice is really all about helping people extract value out of data, driving competitive advantage, business advantage, or organizational excellence. Tell us a little bit about yourself, your background, and your practice. >> Basically, I've three different backgrounds. On one hand, I'm a director at a data consultancy firm called Adversitement. Where we help companies to become data driven. Mainly large companies. I'm an advisory board member at Simplilearn, which is an e-learning platform, especially also for big data analytics. And on the other hand I'm a blogger and I host a series of webinars. >> Okay, great, now Dez, Dez Blanchfield, I met you on Twitter, you know, probably a couple of years ago. We first really started to collaborate last year. We've spent a fair amount of time together. You are a data scientist, but you're also a jack of all trades. You've got a technology background. You sit on a number of boards. You work very actively with public policy. So tell us a little bit more about what you're doing these days, a little bit more about your background. >> Sure, I think my primary challenge these days is communication. Trying to join the dots between my technical background and deeply technical pedigree, to just plain English, everyday language, and business speak. So bridging that technical world with what's happening in the boardroom. Toe to toe with the geeks to plain English to execs in boards. And just hand hold them and steward them through the journey of the challenges they're facing. Whether it's the enormous rate of change and the pace of change, that's just almost exhausting and causing them to sprint.
But not just sprint in one race but in multiple lanes at the same time. As well as some of the really big things that are coming up, that we've seen like GDPR. So it's that communication challenge and just hand holding people through that journey and that mix of technical and commercial experience. >> Great, thank you, and finally Joe Caserta. Founder and president of Caserta Concepts. Joe you're a practitioner. You're in the front lines, helping organizations, similar to Ronald. Extracting value from data. Translate that into competitive advantage. Tell us a little bit about what you're doing these days in Caserta Concepts. >> Thanks Dave, thanks for having me. Yeah, so Caserta's been around. I've been doing this for 30 years now. And natural progressions have been just getting more from application development, to data warehousing, to big data analytics, to data science. Very, very organically, that's just because it's where businesses need the help the most, over the years. And right now, the big focus is governance. At least in my world. Trying to govern when you have a bunch of disparate data coming from a bunch of systems that you have no control over, right? Like social media, and third party data systems. Bringing it in and how do you organize it? How do you ingest it? How do you govern it? How do you keep it safe? And also help to define ownership of the data within an organization within an enterprise? That's also a very hot topic. Which ties back into GDPR. >> Great, okay, so we're going to be unpacking a lot of topics associated with the expertise that these individuals have. I'm going to bring in Jim Kobielus, to the conversation. Jim, the newest Wikibon analyst. And newest member of the SiliconANGLE Media Team. Jim, get us started off. >> Yeah, so we're at an event, at an IBM event where machine learning and data science are at the heart of it. There are really three core themes here. Machine learning and data science, on the one hand.
Unified governance on the other. And hybrid data management. I want to circle back or focus on machine learning. Machine learning is the coin of the realm, right now in all things data. Machine learning is the heart of AI. Machine learning, everybody is going, hiring, data scientists to do machine learning. I want to get a sense from our panel, who are experts in this area, what are the chief innovations and trends right now on machine learning. Not deep learning, the core of machine learning. What's super hot? What's in terms of new techniques, new technologies, new ways of organizing teams to build and to train machine learning models? I'd like to open it up. Let's just start with Lillian. What are your thoughts about trends in machine learning? What's really hot? >> It's funny that you excluded deep learning from the response for this, because I think the hottest space in machine learning is deep learning. And deep learning is machine learning. I see a lot of collaborative platforms coming out, where people, data scientists are able to work together with other sorts of data professionals to reduce redundancies in workflows. And create more efficient data science systems. >> Is there much uptake of these crowd sourcing environments for training machine learning models? Like CrowdFlower, or Amazon Mechanical Turk, or Mighty AI? Is that a huge trend in terms of the workflow of data science or machine learning, a lot of that? >> I don't see that crowdsourcing is like, okay maybe I've been out of the crowdsourcing space for a while. But I was working with Standby Task Force back in 2013. And we were doing a lot of crowdsourcing. And I haven't seen the industry has been increasing, but I could be wrong. I mean, because there's no, if you're building automation models, most of the, a lot of the work that's being crowdsourced could actually be automated if someone took the time to just build the scripts and build the models.
And so I don't imagine that, that's going to be a trend that's increasing. >> Well, automation machine learning pipeline is fairly hot, in terms of I'm seeing more and more research. Google's doing a fair amount of automated machine learning. The panel, what do you think about automation, in terms of the core modeling tasks involved in machine learning. Is that coming along? Are data scientists in danger of automating themselves out of a job? >> I don't think there's a risk of data scientists being put out of a job. Let's just put that on the thing. I do think we need to get a bit clearer about this meme of the mythical unicorn. But to your core point about machine learning, I think what you'll see, we saw the cloud become baked into products, just as a given. I think machine learning has already crossed this threshold. We just haven't necessarily noticed or caught up. And if we look at, we're at an IBM event, so let's just do a call out for them. The data science experience platform, for example. Machine learning's built into a whole range of things around algorithm and data classification. And there's an assisted, guided model for how you get to certain steps, where you don't actually have to understand how machine learning works. You don't have to understand how the algorithms work. It shows you the different options you've got and you can choose them. So you might choose regression. And it'll give you different options on how to do that. So I think we've already crossed this threshold of baking in machine learning and baking in the data science tools. And we've seen that with Cloud and other technologies where, you know, the Office 365 is not, you can't get a non Cloud Office 365 account, right? I think that's already happened in machine learning. What we're seeing though, is organizations even as large as the Googles still in catch up mode, in my view, on some of the shift that's taken place.
So we've seen them write little games and apps where people do doodles and then it runs through the ML library and says, "Well that's a cow, or a unicorn, or a duck." And you get awards, and gold coins, and whatnot. But you know, as far back as 12 years ago I was working on a project, where we had full size airplanes acting as drones. And we mapped with two and 3-D imagery. With 2-D high res imagery and LiDAR for 3-D point clouds. We were finding poles and wires for utility companies, using ML before it even became a trend. And baking it right into the tools. And used to store on our web page and clicked and pointed on. >> To counter Lillian's point, it's not crowdsourcing but crowd sharing that's really powering a lot of the rapid leaps forward. If you look at, you know, DSX from IBM. Or you look at Node-RED, huge number of free workflows that someone has probably already done the thing that you are trying to do. Go out and find in the libraries, through Jupyter and R Notebooks, there's an ability-- >> Chris can you define before you go-- >> Chris: Sure. >> This is great, crowdsourcing versus crowd sharing. What's the distinction? >> Well, so crowdsourcing, kind of, where in the context of the question you ask is like I'm looking for stuff that other people, getting people to do stuff that, for me. It's like asking people to mine classifieds. Whereas crowd sharing, someone has done the thing already, it already exists. You're not purpose built, saying, "Jim, help me build this thing." It's like, "Oh Jim, you already built this thing, cool. So can I fork it and make my own from it?" >> Okay, I see what you mean, keep going. >> And then, again, going back to earlier. In terms of the advancements. Really deep learning, it probably is a good idea to just sort of define these things. Machine learning is how machines do things without being explicitly programmed to do them. Deep learning's like if you can imagine a stack of pancakes, right?
Each pancake is a type of machine learning algorithm. And your data is the syrup. You pour the data on it. It goes from layer, to layer, to layer, to layer, and what you end up with at the end is breakfast. That's the easiest analogy for what deep learning is. Now imagine a stack of pancakes, 500 or 1,000 high, that's where deep learning's going now. >> Sure, multi layered machine learning models, essentially, that have the ability to do higher levels of abstraction. Like image analysis, Lillian? >> I had a comment to add about automation and data science. Because there are a lot of tools that are able to, or applications that are able to use data science algorithms and output results. But the reason that data scientists aren't in risk of losing their jobs, is because just because you can get the result, you also have to be able to interpret it. Which means you have to understand it. And that involves deep math and statistical understanding. Plus domain expertise. So, okay, great, you took out the coding element but that doesn't mean you can codify a person's ability to understand and apply that insight. >> Dave: Joe, you have something to add? >> I could just add that I see the trend. Really, the reason we're talking about it today is machine learning is not necessarily, it's not new, like Dez was saying. But what's different is the accessibility of it now. It's just so easily accessible. All of the tools that are coming out, for data, have machine learning built into it. So the machine learning algorithms, which used to be a black art, you know, years ago, now is just very easily accessible. That you can get, it's part of everyone's toolbox. And the other reason that we're talking about it more, is that data science is starting to become a core curriculum in higher education. Which is something that's new, right? That didn't exist 10 years ago? But over the past five years, I'd say, you know, it's becoming more and more easily accessible for education. 
So now, people understand it. And now we have it accessible in our tool sets. So now we can apply it. And I think that's, those two things coming together is really making it becoming part of the standard of doing analytics. And I guess the last part is, once we can train the machines to start doing the analytics, right? And get smarter as it ingests more data. And then we can actually take that and embed it in our applications. That's the part that you still need data scientists to create that. But once we can have standalone appliances that are intelligent, that's when we're going to start seeing, really, machine learning and artificial intelligence really start to take off even more. >> Dave: So I'd like to switch gears a little bit and bring Ronald on. >> Okay, yes. >> Here you go, there. >> Ronald, the bromide in this sort of big data world we live in is, the data is the new oil. You got to be a data driven company and many other cliches. But when you talk to organizations and you start to peel the onion. You find that most companies really don't have a good way to connect data with business impact and business value. What are you seeing with your clients and just generally in the community, with how companies are doing that? How should they do that? I mean, is that something that is a viable approach? You don't see accountants, for example, quantifying the value of data on a balance sheet. There's no standards for doing that. And so it's sort of this fuzzy concept. How are and how should organizations take advantage of data and turn it into value. >> So, I think in general, if you look how companies look at data. They have departments and within the departments they have tools specific for this department. And what you see is that there's no central, let's say, data collection. There's no central management of governance. There's no central management of quality. There's no central management of security. Each department is manages their data on their own. 
So if you ask, on one hand, "Okay, how should they do it?" It's basically go back to the drawing table and say, "Okay, how should we do it?" We should collect centrally, the data. And we should take care for central governance. We should take care for central data quality. We should take care for centrally managing this data. And look from a company perspective and not from a department perspective what the value of data is. So, look at the perspective from your whole company. And this means that it has to be brought on one end to, whether it's from C level, where most of them still fail to understand what it really means. And what the impact can be for that company. >> It's a hard problem. Because data by its very nature is now so decentralized. But Chris you have a-- >> The thing I want to add to that is, think about in terms of valuing data. Look at what it would cost you for a data breach. Like what is the expense of having your data compromised. If you don't have governance. If you don't have policy in place. Look at the major breaches of the last couple years. And how many billions of dollars those companies lost in market value, and trust, and all that stuff. That's one way you can value data very easily. "What will it cost us if we mess this up?" >> So a lot of CEOs will hear that and say, "Okay, I get it. I have to spend to protect myself, but I'd like to make a little money off of this data thing. How do I do that?" >> Well, I like to think of it, you know, I think data's definitely an asset within an organization. And is becoming more and more of an asset as the years go by. But data is still a raw material. And that's the way I think about it. In order to actually get the value, just like if you're creating any product, you start with raw materials and then you refine it. And then it becomes a product. For data, data is a raw material. You need to refine it. And then the insight is the product. And that's really where the value is.
And the insight is absolutely, you can monetize your insight. >> So data is abundant, insights are scarce. >> Well, you know, actually you could say that intermediate between insights and the data are the models themselves. The statistical, predictive, machine learning models. That are a crystallization of insights that have been gained by people called data scientists. What are your thoughts on that? Are statistical, predictive, machine learning models something, an asset, that companies, organizations, should manage governance of on a centralized basis or not? >> Well the models are essentially the refinery system, right? So as you're refining your data, you need to have process around how you exactly do that. Just like refining anything else. It needs to be controlled and it needs to be governed. And I think that data is no different from that. And I think that it's very undisciplined right now, in the market or in the industry. And I think maturing that discipline around data science, I think is something that's going to be a very high focus in this year and next. >> You were mentioning, "How do you make money from data?" Because there's all this risk associated with security breaches. But at the risk of sounding simplistic, you can generate revenue from system optimization, or from developing products and services. Using data to develop products and services that better meet the demands and requirements of your markets. So that you can sell more. So either you are using data to earn more money. Or you're using data to optimize your system so you have less cost. And that's a simple answer for how you're going to be making money from the data. But yes, there is always the counter to that, which is the security risks. >> Well, and my question really relates to, you know, when you think of talking to C level executives, they kind of think about running the business, growing the business, and transforming the business.
And a lot of times they can't fund these transformations. And so I would agree, there's many, many opportunities to monetize data, cut costs, increase revenue. But organizations seem to struggle to either make a business case or actually implement that transformation. >> Dave, I'd love to have a crack at that. I think this conversation epitomizes the type of things that are happening in board rooms and C suites already. So we've really quickly dived into the detail of data. And the detail of machine learning. And the detail of data science, without actually stopping and taking a breath and saying, "Well, we've got lots of it, but what have we got? Where is it? What's the value of it? Is there any value in it at all?" And, "How much time and money should we invest in it?" For example, we talk about data being a resource. I look at data as a utility. When I turn the tap on to get a drink of water, it's there as a utility. I count on it being there but I don't always sample the quality of the water and I probably should. It could have Giardia in it, right? But what's interesting is I trust the water at home, in Sydney. Because we have a fairly good experience with good quality water. If I were to go to some other nation. I probably wouldn't trust that water. And I think, when you think about it, what's happening in organizations. It's almost the same as what we're seeing here today. We're having a lot of fun, diving into the detail. But what we've forgotten to do is ask the question, "Well why is data even important? What's the reasoning to the business? Why are we in business? What are we doing as an organization? And where does data fit into that?" As opposed to becoming so fixated on data because it's a media hyped topic. I think once you can wind that back a bit and say, "Well, we have lots of data, but is it good data? Is it quality data? Where's it coming from? Is it ours? Are we allowed to have it? What treatment are we allowed to give that data?"
As you said, "Are we controlling it? And where are we controlling it? Who owns it?" There's so many questions to be asked. But the first question I like to ask people in plain English is, "Well is there any value in data in the first place? What decisions are you making that data can help drive? What things are in your organizations, KPIs and milestones you're trying to meet that data might be a support?" So then instead of becoming fixated with data as a thing in itself, it becomes part of your DNA. Does that make sense? >> Think about what money means. The economists' rhyme, "Money is a matter of functions four: a medium, a measure, a standard, a store." So it's a medium of exchange. A measure of value, a way to exchange something. And a way to store value. Data, good clean data, well governed, fits all four of those. So if you're trying to figure out, "How do we make money out of stuff." Figure out how money works. And then figure out how you map data to it. >> So if we approach and we start with a company, we always start with business case, which is quite clear. And defined use case, basically, start with a team on one hand, marketing people, sales people, operational people, and also the whole data science team. So start with this case. It's like, defining, basically a movie. If you want to create the movie, You know where you're going to. You know what you want to achieve to create the customer experience. And this is basically the same with a business case. Where you define, "This is the case. And this is how we're going to derive value, start with it and deliver value within a month." And after the month, you check, "Okay, where are we and how can we move forward? And what's the value that we've brought?" >> Now I as well, start with business case. I've done thousands of business cases in my life, with organizations. And unless that organization was kind of a data broker, the business case rarely has a discrete component around data.
Is that changing, in your experience? >> Yes, so we guide companies into being data driven. So initially, indeed, they don't like to use the data. They don't like to use the analysis. So that's why, how we help. And is it changing? Yes, they understand that they need to change. But changing people is not always easy. So, you see, it's hard if you're not involved and you're not guiding it, they fall back in doing the daily tasks. So it's changing, but it's a hard change. >> Well and that's where this common parlance comes in. And Lillian, you, sort of, this is what you do for a living, is helping people understand these things, as you've been sort of evangelizing that common parlance. But do you have anything to add? >> I wanted to add that for organizational implementations, another key component to success is to start small. Start in one small line of business. And then when you've mastered that area and made it successful, then try and deploy it in more areas of the business. And as far as initializing big data implementation, that's generally how to do it successfully. >> There's the whole issue of putting a value on data as a discrete asset. Then there's the issue, how do you put a value on a data lake? Because a data lake, is essentially an asset you build on spec. It's an exploratory archive, essentially, of all kinds of data that might yield some insights, but you have to have a team of data scientists doing exploration and modeling. But it's all on spec. How do you put a value on a data lake? And at what point does the data lake itself become a burden? Because you got to store that data and manage it. At what point do you drain that lake? At what point, do the costs of maintaining that lake outweigh the opportunity costs of not holding onto it? >> So each Hadoop node is approximately $20,000 per year cost for storage. So I think that there needs to be a test and a diagnostic, before even inputting, ingesting the data and storing it.
"Is this actually going to be useful? What value do we plan to create from this?" Because really, you can't store all the data. And it's a lot cheaper to store data in Hadoop than it was in traditional systems but it's definitely not free. So people need to be applying this test before even ingesting the data. Why do we need this? What business value? >> I think the question we need to also ask around this is, "Why are we building data lakes in the first place? So what's the function it's going to perform for you?" There's been a huge drive to this idea. "We need a data lake. We need to put it all somewhere." But invariably they become data swamps. And we only half jokingly say that because I've seen 90 day projects turn from a great idea, to a really bad nightmare. And as Lillian said, it is cheaper in some ways to put it into a HDFS platform, in a technical sense. But when we look at all the fully burdened components, it's actually more expensive to find Hadoop specialists and Spark specialists to maintain that cluster. And invariably I'm finding that big data, quote unquote, is not actually so much lots of data, it's complex data. And as Lillian said, "You don't always need to store it all." So I think if we go back to the question of, "What's the function of a data lake in the first place? Why are we building one?" And then start to build some fully burdened cost components around that. We'll quickly find that we don't actually need a data lake, per se. We just need an interim data store. So we might take last year's data and tokenize it, and analyze it, and do some analytics on it, and just keep the meta data. So I think there is this rush, for a whole range of reasons, particularly vendor driven. To build data lakes because we think they're a necessity, when in reality they may just be an interim requirement and we don't need to keep them for a long term. >> I'm going to attempt to, the last few questions, put them all together.
And I think, they all belong together because one of the reasons why there's such hesitation about progress within the data world is because there's just so much accumulated tech debt already. Where there's a new idea. We go out and we build it. And six months, three years, it really depends on how big the idea is, millions of dollars is spent. And then by the time things are built the idea is pretty much obsolete, no one really cares anymore. And I think what's exciting now is that the speed to value is just so much faster than it's ever been before. And I think that, you know, what makes that possible is this concept of, I don't think of a data lake as a thing. I think of a data lake as an ecosystem. And that ecosystem has evolved so much more, probably in the last three years than it has in the past 30 years. And it's exciting times, because now once we have this ecosystem in place, if we have a new idea, we can actually do it in minutes not years. And that's really the exciting part. And I think, you know, data lake versus a data swamp, comes back to just traditional data architecture. And if you architect your data lake right, you're going to have something that's substantial, that's you're going to be able to harness and grow. If you don't do it right. If you just throw data. If you buy Hadoop cluster or a Cloud platform and just throw your data out there and say, "We have a lake now." yeah, you're going to create a mess. And I think taking the time to really understand, you know, the new paradigm of data architecture and modern data engineering, and actually doing it in a very disciplined way. If you think about it, what we're doing is we're building laboratories. And if you have a shabby, poorly built laboratory, the best scientist in the world isn't going to be able to prove his theories. So if you have a well built laboratory and a clean room, then, you know a scientist can get what he needs done very, very, very efficiently. 
And that's the goal, I think, of data management today. >> I'd like to just quickly add that I totally agree with the challenge between on premise and Cloud mode. And I think one of the strong themes of today is going to be the hybrid data management challenge. And I think organizations, some organizations, have rushed to adopt Cloud, thinking it's a really good place to dump the data and someone else has to manage the problem. And then they've ended up with a very expensive death by 1,000 cuts in some senses. And then others have been very reluctant and as a result have not gotten access to rapidly moving and disruptive technology. So I think there's a really big challenge to get a basic conversation going around what's the value of using Cloud technology, as in adopting it, versus what are the risks? And when's the right time to move? For example, should we cloud burst for workloads? Do we move whole data sets in there? You know, moving half a petabyte of data into a Cloud platform and back is a non-trivial exercise. But moving a terabyte isn't actually that big a deal anymore. So, you know, should we keep stuff behind the firewalls? I'd be interested in seeing this week where 80% of the data, supposedly is. And just push out for Cloud tools, machine learning, data science tools, whatever they might be, cognitive analytics, et cetera. And keep the bulk of the data on premise. Or should we just move whole spools into the Cloud? There is no one size fits all. There's no silver bullet. Every organization has its own quirks and own nuances they need to think through and make a decision themselves. >> Very often, Dez, organizations have zonal architectures so you'll have a data lake that consists of a NoSQL platform that might be used for, say, mobile applications. A Hadoop platform that might be used for unstructured data refinement, so forth. A streaming platform, so forth and so on.
And then you'll have machine learning models that are built and optimized for those different platforms. So, you know, think of it in terms of then, your data lake, is a set of zones that-- >> It gets even more complex just playing on that theme, when you think about what Cisco started, called Fog Computing. I don't really like that term. But edge analytics, or computing at the edge. We've seen with the internet coming along where we couldn't deliver everything with a central data center. So we started creating this concept of content delivery networks, right? I think the same thing, I know the same thing has happened in data analysis and data processing. Where we've been pulling social media out of the Cloud, per se, and bringing it back to a central source. And doing analytics on it. But when you think of something like, say for example, when the Dreamliner 787 from Boeing came out, this airplane created half a terabyte of data per flight. Now let's just do some quick, back of the envelope math. There's 87,400 flights a day, just in the domestic airspace in the USA alone, per day. Now 87,400 by half a terabyte, that's 43.5 petabytes a day. You physically can't copy that from quote unquote in the Cloud, if you'll pardon the pun, back to the data center. So now we've got the challenge, a lot of our Enterprise data's behind a firewall, supposedly 80% of it. But what's out at the edge of the network? Where's the value in that data? So there are zonal challenges. Now what do I do with my Enterprise versus the open data, the mobile data, the machine data? >> Yeah, we've seen some recent data from IDC that says, "About 43% of the data is going to stay at the edge." We think that that's way understated, just given the examples. We think it's closer to 90% is going to stay at the edge. >> Just on the airplane topic, right? So Airbus wasn't going to be outdone. Boeing put 4,000 sensors or something in their 787 Dreamliner six years ago.
Airbus just announced an A380-1000 with 10,000 sensors in it. Do the same math. Now the FAA in the US said that all aircraft and all carriers have to be, by early next year, I think it's like March or April next year, have to be at the same level of BIOS. Or the same capability of data collection and so forth. It's kind of like a mini GDPR for airlines. So with the A380-1000 with 10,000 sensors, that becomes 2.5 terabytes per flight. If you do the math, it's 220 petabytes of data just in one day's traffic, domestically in the US. Now, it's just so mind boggling that we're going to have to completely turn our thinking on its head, on what do we do behind the firewall? What do we do in the Cloud versus what we might have to do in the airplane? I mean, think about edge analytics in the airplane processing data, as you said, Jim, streaming analytics in flight. >> Yeah that's a big topic within Wikibon, so, within the team. Me and David Floyer, and my other colleagues. They're talking about the whole notion of edge architecture. Not only will most of the data be persisted at the edge, most of the deep learning models like TensorFlow will be executed at the edge. To some degree, the training of those models will happen in the Cloud. But much of that will be pushed in a federated fashion to the edge, or at least I'm predicting. We're already seeing some industry moves in that direction, in terms of architectures. Google has a federated training project or initiative. >> Chris: Look at TensorFlow Lite. >> Which is really fascinating for it's geared to IOT, I'm sorry, go ahead. >> Look at TensorFlow Lite. I mean, the announcement of every Android device having ML capabilities is essentially Google's acknowledgment, "We can't do it all." So we need, essentially, sort of like a SETI@home. Everyone's smartphone, set-top TV box, just to help with the processing.
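The back-of-the-envelope airline figures quoted in this exchange can be checked in a few lines. What follows is only a sketch: the flight count and the per-flight volumes are the numbers given on the panel, while the decimal conversion (1 petabyte = 1,000 terabytes) and the names `FLIGHTS_PER_DAY` and `daily_petabytes` are assumptions made purely for illustration.

```python
# Rough check of the panel's daily data-volume estimates.
# Assumption: decimal units, i.e. 1 petabyte = 1,000 terabytes.
FLIGHTS_PER_DAY = 87_400  # US domestic flights per day, as quoted on the panel
TB_PER_PB = 1_000


def daily_petabytes(tb_per_flight, flights_per_day=FLIGHTS_PER_DAY):
    """Return the total data generated per day, in petabytes."""
    return tb_per_flight * flights_per_day / TB_PER_PB


# Half a terabyte per flight (the 787 figure quoted on the panel).
dreamliner_pb = daily_petabytes(0.5)

# 2.5 terabytes per flight (the figure quoted for the larger Airbus aircraft).
airbus_pb = daily_petabytes(2.5)

print(f"0.5 TB per flight: {dreamliner_pb:.1f} PB per day")
print(f"2.5 TB per flight: {airbus_pb:.1f} PB per day")
```

Under these assumptions the totals come to roughly 43.7 and 218.5 petabytes a day, close to the rounded figures spoken on the panel.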
>> Now we're talking about this, this sort of leads to this IOT discussion but I want to underscore the operating model. As you were saying, "You can't just lift and shift to the Cloud." You're not going to, CEOs aren't going to get the billion dollar hit by just doing that. So you got to change the operating model. And that leads to this discussion of IOT. And an entirely new operating model. >> Well, there are companies that are like Sisense who have worked with Intel. And they've taken this concept. They've taken the business logic and not just putting it in the chip, but actually putting it in memory, in the chip. So as data's going through the chip it's not just actually being processed but it's actually being baked in memory. So level one, two, and three cache. Now this is a game changer. Because as Chris was saying, even if we were to get the data back to a central location, the compute load, I saw a real interesting thing from I think it was Google the other day, one of the guys was doing a talk. And he spoke about what it meant to add cognitive and voice processing into just the Android platform. And they used some number, like they had to double the amount of compute they had, just to add voice for free, to the Android platform. Now even for Google, that's a nontrivial exercise. So as Chris was saying, I think we have to again, flip it on its head and say, "How much can we put at the edge of the network?" Because think about these phones. I mean, even your fridge and microwave, right? We put a man on the moon with something that these days, we make for $89 at home, on the Raspberry Pi computer, right? And even that was 1,000 times more powerful. When we start looking at what's going into the chips, we've seen people build new, not even GPUs, but deep learning and stream analytics capable chips. Like Google, for example. That's going to make its way into consumer products.
So that, now the compute capacity in phones, is going to, I think transmogrify in some ways because there is some magic in there. To the point where, as Chris was saying, "We're going to have the smarts in our phone." And a lot of that workload is going to move closer to us. And only the metadata that we need to move is going to go centrally. >> Well here's the thing. The edge isn't the technology. The edge is actually the people. When you look at, for example, the MIT language Scratch. This is kids programming language. It's drag and drop. You know, kids can assemble really fun animations and make little movies. We're training them to build for IOT. Because if you look at a system like Node-RED, it's an IBM interface that is drag and drop. Your workflow is for IOT. And you can push that to a device. Scratch has a converter for doing those. So the edge is what those thousands and millions of kids who are learning how to code, learning how to think architecturally and algorithmically. What they're going to create that is beyond what any of us can possibly imagine. >> I'd like to add one other thing, as well. I think there's a topic we've got to start tabling. And that is what I refer to as the gravity of data. So when you think about how planets are formed, right? Particles of dust accrete. They form into planets. Planets develop gravity. And the reason we're not flying into space right now is that there's gravitational force. Even though it's one of the weakest forces, it keeps us on our feet. Oftentimes in organizations, I ask them to start thinking about, "Where is the center "of your universe with regard to the gravity of data." Because if you can follow the center of your universe and the gravity of your data, you can often, as Chris is saying, find where the business logic needs to be. And it could be that you got to think about a storage problem. You can think about a compute problem. You can think about a streaming analytics problem. 
But if you can find where the center of your universe and the center of your gravity for your data is, often you can get a really good insight into where you can start focusing on where the workloads are going to be where the smarts are going to be. Whether it's small, medium, or large. >> So this brings up the topic of data governance. One of the themes here at Fast Track Your Data is GDPR. What it means. It's one of the reasons, I think IBM selected Europe, generally, Munich specifically. So let's talk about GDPR. We had a really interesting discussion last night. So let's kind of recreate some of that. I'd like somebody in the panel to start with, what is GDPR? And why does it matter, Ronald? >> Yeah, maybe I can start. Maybe a little bit more in general unified governance. So if i talk to companies and I need to explain to them what's governance, I basically compare it with a crime scene. So in a crime scene if something happens, they start with securing all the evidence. So they start sealing the environment. And take care that all the evidence is collected. And on the other hand, you see that they need to protect this evidence. There are all kinds of policies. There are all kinds of procedures. There are all kinds of rules, that need to be followed. To take care that the whole evidence is secured well. And once you start, basically, investigating. So you have the crime scene investigators. You have the research lab. You have all different kind of people. They need to have consent before they can use all this evidence. And the whole reason why they're doing this is in order to collect the villain, the crook. To catch him and on the other hand, once he's there, to convict him. And we do this to have trust in the materials. Or trust in basically, the analytics. And on the other hand to, the public have trust in everything what's happened with the data. So if you look to a company, where data is basically the evidence, this is the value of your data. 
It's similar to like the evidence within a crime scene. But most companies don't treat it like this. So if we then look to GDPR, GDPR basically shifts the power and the ownership of the data from the company to the person that created it. Which is often, let's say the consumer. And there's a lot of paradox in this. Because all the companies say, "We need to have this customer data. "Because we need to improve the customer experience." So if you make it concrete and let's say it's 1st of June, so GDPR is active. And it's first of June 2018. And I go to iTunes, so I use iTunes. Let's go to iTunes said, "Okay, Apple please "give me access to my data." I want to see which kind of personal information you have stored for me. On the other end, I want to have the right to rectify all this data. I want to be able to change it and give them a different level of how they can use my data. So I ask this to iTunes. And then I say to them, okay, "I basically don't like you anymore. "I want to go to Spotify. "So please transfer all my personal data to Spotify." So that's possible once it's June 18. Then I go back to iTunes and say, "Okay, I don't like it anymore. "Please reduce my consent. "I withdraw my consent. "And I want you to remove all my "personal data for everything that you use." And I go to Spotify and I give them, let's say, consent for using my data. So this is a shift where you can, as a person be the owner of the data. And this has a lot of consequences, of course, for organizations, how to manage this. So it's quite simple for the consumer. They get the power, it's maturing the whole law system. But it's a big consequence of course for organizations. >> This is going to be a nightmare for marketers. But fill in some of the gaps there. >> Let's go back, so GDPR, the General Data Protection Regulation, was passed by the EU in 2016, in May of 2016. It is, as Ronald was saying, it's four basic things. The right to privacy. The right to be forgotten. 
Privacy built into systems by default. And the right to data transfer. >> Joe: It takes effect next year. >> It is already in effect. GDPR took effect in May of 2016. The enforcement penalties take place the 25th of May 2018. Now here's where, there's two things on the penalty side that are important for everyone to know. Now number one, GDPR is extraterritorial. Which means that an EU citizen, anywhere on the planet, has GDPR go with them. So say you're a pizza shop in Nebraska. And an EU citizen walks in, orders a pizza. Gives her credit card and stuff like that. If you for some reason store that data, GDPR now applies to you, Mr. Pizza shop, whether or not you do business in the EU. Because an EU citizen's data is with you. Two, the penalties are much stiffer than they ever have been. In the old days companies could simply write off penalties, saying, "That's the cost of doing business." With GDPR the penalties are up to 4% of your annual revenue or 20 million Euros, whichever is greater. And there may be criminal sanctions, charges, against key company executives. So there's a lot of questions about how this is going to be implemented. But one of the first impacts you'll see from a marketing perspective is all the advertising we do, targeting people by their age, by their personally identifiable information, by their demographics. Between now and May 25th 2018, a good chunk of that may have to go away because there's no way for you to say, "Well this person's an EU citizen, this person's not." People give false information all the time online. So how do you differentiate it? Every company, regardless of whether they're in the EU or not, will have to adapt to it, or deal with the penalties. >> So Lillian, as a consumer this is designed to protect you. But you had a very negative perception of this regulation. >> I've looked over the GDPR and to me it actually looks like a socialist agenda.
It looks like (panel laughs) no, it looks like a full assault on free enterprise and capitalism. And on its' face from a legal perspective, its' completely and wholly unenforceable. Because they're assigning jurisdictional rights to the citizen. But what are they going to do? They're going to go to Nebraska and they're going to call in the guy from the pizza shop? And call him into what court? The EU court? It's unenforceable from a legal perspective. And if you write a law that's unenforceable, you know, it's got to be enforceable in every element. It can't be just, "Oh, we're only "going to enforce it for Facebook and for Google. "But it's not enforceable for," it needs to be written so that it's a complete and actionable law. And it's not written in that way. And from a technological perspective it's not implementable. I think you said something like 652 EU regulators or political people voted for this and 10 voted against it. But what do they know about actually implementing it? Is it possible? There's all sorts of regulations out there that aren't possible to implement. I come from an environmental engineering background. And it's absolutely ridiculous because these agencies will pass laws that actually, it's not possible to implement those in practice. The cost would be too great. And it's not even needed. So I don't know, I just saw this and I thought, "You know, if the EU wants to," what they're essentially trying to do is regulate what the rest of the world does on the internet. And if they want to build their own internet like China has and police it the way that they want to. But Ronald here, made an analogy between data, and free enterprise, and a crime scene. Now to me, that's absolutely ridiculous. What does data and someone signing up for an email list have to do with a crime scene? And if EU wants to make it that way they can police their own internet. But they can't go across the world. 
They can't go to Singapore and tell Singapore, or go to the pizza shop in Nebraska and tell them how to run their business. >> You know, EU overreach in the post Brexit era, what you're saying has a lot of validity. How far can the tentacles of the EU reach into other sovereign nations? >> What court are they going to call them into? >> Yeah. >> I'd like to weigh in on this. There are lots of unknowns, right? So I'd like us to focus on the things we do know. We've already dealt with similar situations before. In Australia, we introduced a goods and services tax. Completely foreign concept. Everything you bought had 10% on it. No one knew how to deal with this. It was a completely new practice in accounting. There's a whole bunch of new software that had to be written. MYOB had to have new capability, but we coped. No one has actually gone to jail yet, decades later, for not complying with GST. So what it was, was a framework on how to shift from non sales tax related revenue collection to sales tax related revenue collection. I agree that there are some egregious things built into this. I don't disagree with that at all. But I think if I put my slightly broader view of the world hat on, we have well and truly gone past the point in my mind, where data was respected, data was treated in a sensible way. I mean I get emails from companies I've never done business with. And when I follow it up, it's because I did business with a credit card company, that gave it to a service provider, that thought that I was going to, when I bought a holiday to come to Europe, that I might want travel insurance. Now some might say there's value in that. And others say there's not, there's the debate. But let's just focus on what we're talking about. We're talking about a framework for governance of the treatment of data.
If we remove all the emotive component, what we are talking about is a series of guidelines, backed by laws, that say, "We would like you to do this," in an ideal world. But I don't think anyone's going to go to jail, on day one. They may go to jail on day 180. If they continue to do nothing about it. So they're asking you to sort of sit up and pay attention. Do something about it. There's a whole bunch of relief around how you approach it. The big thing for me, is there's no get out of jail card, right? There is no get out of jail card for not complying. But there's plenty of support. I mean, we're going to have ambulance chasers everywhere. We're going to have class actions. We're going to have individual suits. The greatest thing to do right now is get into GDPR law. Because you seem to think data scientists are unicorn? >> What kind of life is that if there's ambulance chasers everywhere? You want to live like that? >> Well I think we've seen ad blocking. I use ad blocking as an example, right? A lot of organizations with advertising broke the internet by just throwing too much content on pages, to the point where they're just unusable. And so we had this response with ad blocking. I think in many ways, GDPR is a regional response to a situation where I don't think it's the exact right answer. But it's the next evolutional step. We'll see things evolve over time. >> It's funny you mentioned it because in the United States one of the things that has happened, is that with the change in political administrations, the regulations on what companies can do with your data have actually been laxened, to the point where, for example, your internet service provider can resell your browsing history, with or without your consent. Or your consent's probably buried in there, on page 47. And so, GDPR is kind of a response to saying, "You know what? 
"You guys over there across the Atlantic "are kind of doing some fairly "irresponsible things with what you allow companies to do." Now, to Lillian's point, no one's probably going to go after the pizza shop in Nebraska because they don't do business in the EU. They don't have an EU presence. And it's unlikely that an EU regulator's going to get on a plane from Brussels and fly to Topeka and say, or Omaha, sorry, "Come on Joe, let's get the pizza shop in order here." But for companies, particularly Cloud companies, that have offices and operations within the EU, they have to sit up and pay attention. So if you have any kind of EU operations, or any kind of fiscal presence in the EU, you need to get on board. >> But to Lillian's point it becomes a boondoggle for lawyers in the EU who want to go after deep pocketed companies like Facebook and Google. >> What's the value in that? It seems like regulators are just trying to create work for themselves. >> What about the things that say advertisers can do, not so much with the data that they have? With the data that they don't have. In other words, they have people called data scientists who build models that can do inferences on sparse data. And do amazing things in terms of personalization. What do you do about all those gray areas? Where you got machine learning models and so forth? >> But it applies-- >> It applies to personally identifiable information. But if you have a talented enough data scientist, you don't need the PII or even the inferred characteristics. If a certain type of behavior happens on your website, for example. And this path of 17 pages almost always leads to a conversion, it doesn't matter who you are or where you're coming from. If you're a good enough data scientist, you can build a model that will track that. >> Like you know, target, infer some young woman was pregnant. And they inferred correctly even though that was never divulged. 
I mean, there's all those gray areas that, how can you stop that slippery slope? >> Well I'm going to weigh in really quickly. A really interesting experiment for people to do. When people get very emotional about it I say to them, "Go to Google.com, "view source, put it in seven point Courier "font in Word and count how many pages it is." I guess you can't guess how many pages? It's 52 pages of seven point Courier font, HTML to render one logo, and a search field, and a click button. Now why do we need 52 pages of HTML source code and Java script just to take a search query. Think about what's being done in that. It's effectively a mini operating system, to figure out who you are, and what you're doing, and where you been. Now is that a good or bad thing? I don't know, I'm not going to make a judgment call. But what I'm saying is we need to stop and take a deep breath and say, "Does anybody need a 52 page, "home page to take a search query?" Because that's just the tip of the iceberg. >> To that point, I like the results that Google gives me. That's why I use Google and not Bing. Because I get better search results. So, yeah, I don't mind if you mine my personal data and give me, our Facebook ads, those are the only ads, I saw in your article that GDPR is going to take out targeted advertising. The only ads in the entire world, that I like are Facebook ads. Because I actually see products I'm interested in. And I'm happy to learn about that. I think, "Oh I want to research that. "I want to see this new line of products "and what are their competitors?" And I like the targeted advertising. I like the targeted search results because it's giving me more of the information that I'm actually interested in. >> And that's exactly what it's about. You can still decide, yourself, if you want to have this targeted advertising. If not, then you don't give consent. If you like it, you give consent. So if a company gives you value, you give consent back. 
So it's not that it's restricting everything. It's giving consent. And I think it's similar to what happened and the same type of response, what happened, we had the Mad Cow Disease here in Europe, where you had the whole food chain that needed to be tracked. And everybody said, "No, it's not required." But now it's implemented. Everybody in Europe does it. So it's the same, what probably going to happen over here as well. >> So what does GDPR mean for data scientists? >> I think GDPR is, I think it is needed. I think one of the things that may be slowing data science down is fear. People are afraid to share their data. Because they don't know what's going to be done with it. If there are some guidelines around it that should be enforced and I think, you know, I think it's been said but as long as a company could prove that it's doing due diligence to protect your data, I think no one is going to go to jail. I think when there's, you know, we reference a crime scene, if there's a heinous crime being committed, all right, then it's going to become obvious. And then you do go directly to jail. But I think having guidelines and even laws around privacy and protection of data is not necessarily a bad thing. You can do a lot of data, really meaningful data science, without understanding that it's Joe Caserta. All of the demographics about me. All of the characteristics about me as a human being, I think are still on the table. All that they're saying is that you can't go after Joe, himself, directly. And I think that's okay. You know, there's still a lot of things. We could still cure diseases without knowing that I'm Joe Caserta, right? As long as you know everything else about me. And I think that's really at the core, that's what we're trying to do. We're trying to protect the individual and the individual's data about themselves. 
But I think as far as how it affects data science, you know, a lot of our clients, they're afraid to implement things because they don't exactly understand what the guideline is. And they don't want to go to jail. So they wind up doing nothing. So now that we have something in writing that, at least, it's something that we can work towards, I think is a good thing. >> In many ways, organizations are suffering from the deer in the headlights problem. They don't understand it. And so they just end up frozen in the headlights. But I just want to go back one step if I could. We could get really excited about what it is and is not. But for me, the most critical thing there is to remember though, data breaches are happening. There are over 1,400 data breaches, on average, per day. And most of them are not trivial. And when we saw half a billion from Yahoo. And then 1.1 billion and then 1.5 billion. I mean, think about what that actually means. There were 47,500 MongoDB databases breached in an 18 hour window, after an automated upgrade. And they were airlines, they were banks, they were police stations. They were hospitals. So when I think about frameworks like GDPR, I'm less worried about whether I'm going to see ads and be sold stuff. I'm more worried about, and I'll give you one example. My 12 year old son has an account at a platform called Edmodo. Now I'm not going to pick on that brand for any reason but it's a current issue. Something like, I think it was like 19 million children in the world had their username, password, email address, home address, and all this social interaction on this Facebook for kids platform called Edmodo, breached in one night. Now I got my hands on a copy. And everything about my son is there. Now I have a major issue with that. Because I can't do anything to undo that, nothing. The fact that I was able to get a copy, within hours on a dark website, for free.
The fact that his first name, last name, email, mobile phone number, all these personal messages from friends. Nobody has the right to allow that to breach on my son. Or your children, or our children. For me, GDPR, is a framework for us to try and behave better about really big issues. Whether it's a socialist issue. Whether someone's got an issue with advertising. I'm actually not interested in that at all. What I'm interested in is companies need to behave much better about the treatment of data when it's the type of data that's being breached. And I get really emotional when it's my son, or someone else's child. Because I don't care if my bank account gets hacked. Because they hedge that. They underwrite and insure themselves and the money arrives back to my bank. But when it's my wife who donated blood and a blood donor website got breached and her details got lost. Even things like sexual preferences. That they ask questions on, is out there. My 12-year-old son is out there. Nobody has the right to allow that to happen. For me, GDPR is the framework for us to focus on that. >> Dave: Lillian, is there a comment you have? >> Yeah, I think that, I think that security concerns are 100% and definitely a serious issue. Security needs to be addressed. And I think a lot of the stuff that's happening is due to, I think we need better security personnel. I think we need better people working in the security area where they're actually looking and securing. Because I don't think you can regulate-- I was just, I wanted to take the microphone back when you were talking about taking someone to jail. Okay, I have a background in law. And if you look at this, you guys are calling it a framework. But it's not a framework. What they're trying to do is take 4% of your business revenues per infraction. They want to say, "If a person signs up on your email list and you didn't, like, necessarily give whatever disclaimer that the EU said you need to give, per infraction, we're going to take 4% of your business revenue." 
That's a law, that they're trying to put into place. And you guys are talking about taking people to jail. What jail are you? EU is not a country. What jurisdiction do they have? Like, you're going to take pizza man Joe and put him in the EU jail? Is there an EU jail? Are you going to take them to a UN jail? I mean, it's just on its face it doesn't hold up to legal tests. I don't understand how they could enforce this. >> I'd like to just answer the question on-- >> Security is a serious issue. I would be extremely upset if I were you. >> I personally know people who work for companies who've had data breaches. And I respect them all. They're really smart people. They've got 25 plus years in security. And they are shocked that they've allowed a breach to take place. What they've invariably all agreed on is that a whole range of drivers have caused them to get to a bad practice. So then, for example, the donate blood website. The young person who was a sysadmin with all the right skills and all the right experience just made a basic mistake. They took a DB dump of a MySQL database before they upgraded their WordPress website for the business. And they happened to leave it in a folder that was indexable by Google. And so somebody wrote a regular expression to search in Google to find SQL backups. Now this person, I personally respect them. I think they're an amazing practitioner. They just made a mistake. So what does that bring us back to? It brings us back to the point that we need a safety net or a framework or whatever you want to call it. Where organizations have checks and balances no matter what they do. Whether it's an upgrade, a backup, a modification, you know. And they all think they do, but invariably we've seen from the hundreds of thousands of breaches, they don't. Now on the point of law, we could debate that all day. I mean the EU does have a remit. 
If I was caught speeding in Germany, as an Australian, I would be thrown into a German jail. If I got caught as an organization in France, breaching GDPR, I would be held accountable to the law in that region, by the organization pursuing me. So I think it's a bit of a misnomer saying I can't go to an EU jail. I don't disagree with you, totally, but I think it's regional. If I get a speeding fine and break the law of driving fast in the EU, it's in the country, in the region, that I'm caught. And I think GDPR's going to be enforced in that same approach. >> All right folks, unfortunately the 60 minutes flew right by. And it does when you have great guests like yourselves. So thank you very much for joining this panel today. And we have an action packed day here. So we're going to cut over. The CUBE is going to have its interview format starting in about half an hour. And then we cut over to the main tent. Who's on the main tent? Dez, you're doing a main stage presentation today. Data Science is a Team Sport. Hilary Mason has a breakout session. We also have a breakout session on GDPR and what it means for you. Are you ready for GDPR? Check out ibmgo.com. It's all free content, it's all open. You do have to sign in to see the Hilary Mason and the GDPR sessions. And we'll be back in about half an hour with the CUBE. We'll be running replays all day on SiliconAngle.tv and also ibmgo.com. So thanks for watching everybody. Keep it right there, we'll be back in about half an hour with the CUBE interviews. We're live from Munich, Germany, at Fast Track Your Data. This is Dave Vellante with Jim Kobielus, we'll see you shortly. (electronic music)

Published Date : Jun 24 2017

ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Chris	PERSON	0.99+
David Floyer	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Ronald	PERSON	0.99+
Lillian Pierson	PERSON	0.99+
Dave	PERSON	0.99+
Lillian	PERSON	0.99+
Jim	PERSON	0.99+
Joe Caserta	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Dez	PERSON	0.99+
Nebraska	LOCATION	0.99+
Adam	PERSON	0.99+
Europe	LOCATION	0.99+
Hilary Mason	PERSON	0.99+
87,400	QUANTITY	0.99+
Topeka	LOCATION	0.99+
Airbus	ORGANIZATION	0.99+
Thailand	LOCATION	0.99+
Brussels	LOCATION	0.99+
Australia	LOCATION	0.99+
EU	ORGANIZATION	0.99+
10%	QUANTITY	0.99+
Dez Blanchfield	PERSON	0.99+
Chris Penn	PERSON	0.99+
Omaha	LOCATION	0.99+
Munich	LOCATION	0.99+
May of 2016	DATE	0.99+
May 25th 2018	DATE	0.99+
Sydney	LOCATION	0.99+
nine	QUANTITY	0.99+
Germany	LOCATION	0.99+
17 pages	QUANTITY	0.99+
Joe	PERSON	0.99+
80%	QUANTITY	0.99+
$89	QUANTITY	0.99+
Yahoo	ORGANIZATION	0.99+
France	LOCATION	0.99+
June 18	DATE	0.99+
83, 81,000	QUANTITY	0.99+
30 years	QUANTITY	0.99+
Ronald Van Loon	PERSON	0.99+
Google	ORGANIZATION	0.99+
USA	LOCATION	0.99+
thousands	QUANTITY	0.99+
2013	DATE	0.99+
one point	QUANTITY	0.99+
100%	QUANTITY	0.99+

Bryan Duxbury, StreamSets | Spark Summit East 2017

>> Announcer: Live from Boston, Massachusetts. This is "The Cube" covering Spark Summit East 2017. Brought to you by Databricks. Now here are your hosts Dave Vellante and George Gilbert. >> Welcome back to snowy Boston everybody. This is "The Cube." The leader in live tech coverage. This is Spark Summit. Spark Summit East #SparkSummit. Bryan Duxbury's here. He's the vice president of engineering at StreamSets. Cleveland boy! Welcome to "The Cube." >> Thanks for having me. >> You're very welcome. Tell us, let's start with StreamSets. We're going to talk about Spark and some of the use cases that it's enabling and some of the integrations you're doing. But what does StreamSets do? >> Sure, StreamSets is data movement software. So I like to think of it as either the first mile or the last mile of a lot of different analytical or data movement workflows. Basically we build a product that allows you to build a workflow, or build a data pipeline that doesn't require you to code. It's a graphical user interface for dropping an origin, several destinations, and then lightweight transformations onto a canvas. You click play and it runs. So this is kind of different than, a lot of the market today is a programming tool or a command line tool. That still requires your systems engineers or your unfortunate data scientists pretending to be systems engineers to do systems engineering. To do a science project to figure out how to move data. The challenge of data movement I think is often underplayed how challenging it is. But it's extremely tedious work. You know, you have to connect to dozens or hundreds of different data sources. Totally different schemas. Different database drivers, or systems altogether. And it breaks all the time. So the home-built stuff is really challenging to keep online. When it goes down, your business is not, you're not moving data. You can't actually get the insights you built in the first place. 
>> I remember when I broke into this industry you know, in the days of mainframe. You used to read about them and they had this high-speed data mover. And it was this key component. And it had to be integrated. It had to be able to move, back then, it was large amounts of data fast. Today especially with the advent of Hadoop, people say okay don't move the data, keep it in place. Now that's not always practical. So talk about the sort of business case for starting a company that basically moves data. >> We handle basically the one step before. I agree with you completely. Many data analytical situations today where you're doing like the true, like business-oriented detail, where you're actually analyzing data and producing value, you can do it in place. Which is to say in your cluster, in your Spark cluster, all the different environments you can imagine. The problem is that if it's not there already, then it's a pretty monumental effort to get it there. I think we see. You know a lot of people think oh I can just write a SQL script, right? And that works for the first two to 20 tables you want to deploy. But for instance, in my background, I used to work at Square. I ran a data platform there. We had 500 tables we had to move on a regular basis. Coupled with a whole variety of other data sources. So at some point it becomes really impractical to hand-code these solutions. And even when you build your own framework, and you start to build tools internally, you know, it's not your job really, these companies, to build a world class data movement tool. It's their job to make the data valuable, right? And actually data movement is like a utility, right. Providing the utility, really the thing to do is be productive and cost effective, right? So the reason why we built StreamSets, the reason why this thing is a thing in the first place, is because we think people shouldn't be in the business of building data movement tools. 
They should be in the business of moving their data and then getting on with it. Does that make sense? >> Yeah absolutely. So talk about how it all fits in with Spark generally and specifically Spark coming to the enterprise. >> Well in terms of how StreamSets connects to stuff, we deploy in every way you can imagine, whether you want to run on premises, on your own machines, or in the Cloud. It's up to you to deploy however you like. We're not prescriptive about that. We often get deployed on the edge of clusters, whether it's your Hadoop cluster or your Spark cluster. And basically we try not to get in the way of these analysis tools. There are many great analytical tools out there like Spark is a great example. We focus really on the moving of data. So what you'll see is someone will build a Spark streaming application or some big Spark SQL thing that actually produces the reports. And we plug in ahead of that. So if your data is being collected from, you know, Edge web logs or some thing or some Kafka thing or a third party API or scraping a website. We do the first collection. And then it's usually picked up from there with the next tool. Whether it's Spark or other things. I'm trying to think about the right way to put this. I think that people who write Spark they should focus on the part that's like the business value for them. They should be doing the thing that actually is applying the machine learning model, or is producing the report that the CEO or CTO wants to see. And move away from the ingest part of the business. Does that make sense? >> Yeah. >> Yeah. When the Spark guys sort of aspire to that by saying you don't have to worry about exactly-once delivery. And you know you can make sure this sort of guarantee, you've got guarantees that will get from point A to point B. >> Bryan: Yeah. >> Things like that. 
But all those sources of data and all those targets, writing all those adapters is, I mean, that's been a La Brea tar pit for many companies over time. >> In essence that is our business. I think that you touch on a good point. Spark can actually do some of these things right. There's not complete, but significant overlap in some cases. But the important difference is that Spark is a cluster tool for working with cluster data. And we're not going to beat you running a Spark application for consuming from Kafka to do your analysis. But you want to use Spark for reading local files? Do you want to use Spark for reading from a mainframe? Like these are things that StreamSets is built for. And that library of connectors you're talking about, it's our bread and butter. It's not your job as a data scientist, you know, applying Spark, to build a library of connectors. So actually the challenge is not the difficulty of building any one connector, because we have that down to an art now. But we can afford to invest, we can build a portfolio of connectors. But you as a user of Spark, can only afford to do it on demand. Reactive. And so that turnaround time, or the cost it might take you to build that connector is pretty significant. And actually I often see the flip side. This is a problem I faced at Square, which was that people asked me to integrate new data sources, I had to say no. Because it was too rare, it was too unusual for what we had to do. We had other things to support. So the problem with that is that I have no idea what kind of opportunity cost I left behind. Like what kind of data we didn't get, kind of analysis we couldn't do. And with an approach like StreamSets, you can solve that problem sort of up front even. >> So sort of two follow ups. One is it would seem to be an evergreen effort to maintain the existing connectors. >> Bryan: Certainly. >> And two, is there a way to leverage connectors that others have built, like the Kafka connect type stuff. 
>> Truthfully we are a heavy-duty user of open source software so our actual product, if you dig in to what you see, it's a framework for executing pipelines. And it's for connecting other software into our product. So it's not like when we integrate Kafka we build a brand new blue-sky Kafka connector. We actually integrate the stuff that is out there. So our idea is to bring as much of that stuff in there as we can. And really be part of the community. You know, our product is also open source. So we play well with the community. We have had people contribute connectors. People who say we love the product, we need it to connect to this other database. And then they do it for us. So it's been a pretty exciting situation. >> We were talking earlier off-camera, George and I have been talking all week about the batch workloads, interactive workloads, now you've got these sort of new emerging workloads, continuous streaming workloads, which is in the name. What are you seeing there? And what kind of use cases is that enabling? >> So we're focused on mostly the continuous delivery workload. We also deliver the batch stuff. What we're finding is people are moving farther and farther away from batch in general. Because batch was not the goal it was a means to the end. People wanted to get their data into their environment, so they could do their analysis. They want to run their daily reports, things like that. But ask any data scientist, they would rather the data show up immediately. So we're definitely seeing a lot of customers who want to do things like moving data live from a log file into Hadoop they can read immediately, on the order of minutes. We're trying to do our best to enable those kind of use cases. In particular we're seeing a lot of interest in the Spark arena, obviously that's kind of why we're here today. You know people want to add their event processing, or their aggregation, and analysis, like Spark, especially like Spark SQL. 
And they want that to be almost happening at the time of ingest. Not once it landed, but like when it's happening. So we're starting to build integration. We have kind of our foot in the door there, with our Spark processor. Which allows you to put a Spark workflow right in the middle of your data pipeline. Or as many of them as you want in fact. And we'll sort of manage the lifecycle of that. And do all those connections as required to make your pipeline pretend to have a Spark processor in the middle. We really think that with that kind of workload, you can do your ingest, but you can also capture your real-time analytics along the way. And that doesn't replace batch reporting per se that'll happen after the fact. Or your daily reports or what have you. But it makes it that much easier for your data scientists to have, you know, a piece of intelligence that they had in flight. You know? >> I love talking to someone who's a practitioner now sort of working for a company that's selling technology. What do you see, from both perspectives, as Spark being good at? You know, what's the best fit? And what's it not good at? >> Well I think that Spark is following the arc of like Hadoop basically. It started out as infrastructure for engineers, for building really big scary things. But it's becoming more and more a productivity tool for analysts, data scientists, machine-learning experts. And we see that popping up all the time. And it's really exciting frankly, to think about these streaming analytics that can happen. These scoring machine-learning models. Really bringing a lot more power into the hands of these people who are not engineers. People who are much more focused on the semantic value of the data. And not the garbage in garbage out value of the data. >> You were talking before about it's really hard, data movement and the data's not always right. Data quality continues to be a challenge. >> Bryan: Yeah. >> Maybe comment on that. 
The state of data quality and how the industry is dealing with that problem. >> It is hard, it is hard. I think that the traditional approach to data quality is to try and specify a quality up front. We take the opposite approach. We basically say that it's impossible to know that your data will be correct at all times. So we have what we call schema drift tools. So we try to go, we say like an intent-driven approach. We're interacting with your data. Rather than a schema-driven approach. So of course your data has an implicit schema as it's passing through the pipeline. Rather than saying, let's transform column three, we want you to use the name. We want you to be aware of what it is you're trying to actually change and affect. And the rest just kind of flows along with it. There's no magic bullet for every kind of data-quality issue or schema change that could possibly come into your pipeline. We try to do the best to make it easy for you to do effectively the best practice. The easiest thing that will survive the future, build robust data pipelines. This is one of the biggest challenges I think with like home-grown solutions. Is that it's really easy to build something that works. It's not easy to build something that works all the time. It's very easy to not imagine the edge cases. 'Cause it might take you a year until you've actually encountered you know, the first big problem. The real, the gotcha that you didn't consider when you were building your own thing. And those of us at StreamSets who have been in the industry and on the user side, we've had some of these experiences. So we're trying to export that knowledge in the product. >> Dave: Who do you guys sell to? >> Everybody. (laughing) We see a lot of success today with, we call it Hadoop replatforming. Which is people who are moving from their huge variety of data sources environment into like a Hadoop data-lake kind of environment. Also Cloud, people are moving into the Cloud. 
They need a way for their data to get from wherever it is to where they want it to be. And certainly people could script these things manually. They could build their own tools for this. But it's just so much more productive to do it quickly in a UI. >> Is it an architect who's buying your product? Is it a developer? >> It's a variety. So I think our product resonates greatly with a developer. But also people who are higher up in the chain. People who are trying to design their whole topology. I think the thing I love to talk about is everyone, when they start on a data project, they sit down and they draw this beautiful diagram with boxes and arrows that says here's where the data's going to go. But a month later, it works, kind of, but it's never that thing. >> Dave: Yeah because the data is just everywhere. >> Exactly. And the reality is that what you have to do to make it work correctly within SLA guidelines and things like that is so not what you imagined. But then you can almost never go backwards. You can never say based on what I have, give me the box scenarios, because it's a systems analysis effort that no one has the time to engage in. But since StreamSets actually instruments every step of the pipeline, and we have a view into how all your pipelines actually fit together. We can give you that. We can just generate it. So we actually have a product. We've been talking about the StreamSets Data Collector, which is the core, like, data movement product. We have like our enterprise edition, which is called the Dataflow Performance Manager, or DPM. It basically gives you a lot of collaboration and enterprise grade authentication. And access control, and the command and control features. So it aggregates your metrics across all your data collectors. It helps you visualize your topology. So people like your director of analytics, or your CIO, who want to know is everything okay? We have a dashboard for them now. And that's really powerful. It's a beautiful UI. 
And it's really a platform for us to build visualizations with more intelligence. That looks across your whole infrastructure. >> Dave: That's good. >> Yeah. And then the thing is this is strangely kind of unprecedented. Because, you know, again, the engineer who wants to build this himself would say, I could just deploy Graphite. And all of a sudden I've got graphs it's fine right. But they're missing the details. What about the systems that aren't under your control? What about the failure cases? All these things, these are the things we tackle. 'Cause it's our business we can afford to invest massively and make this a really first-class data engineering environment. >> Would it be fair to say that Kafka sort of as it exists today is just data movement built on a log, but that it doesn't do the analytics. And it doesn't really yet, maybe it's just beginning to do some of the monitoring you know, with a dashboard, or that's a statement of direction. Would it be fair to say that you can layer on top of that? Or you can substitute on top of it with all the analytics? And then when you want the really fancy analytic soup, you know, call out to Spark. >> Sure, I would say that for one thing we definitely want to stay out of the analytics space. We think there's many great analytics tools out there like Spark. We also are not a storage tool. In fact, we're kind of like, we're queue-like but we view ourselves more like, if there's a pipe and a pump, we're the pump. And Kafka is the pipe. I think that from like a monitoring perspective, we monitor Kafka indirectly. 'Cause if we know what's coming out, and we know what's going in later, we can give you the stats. And that's actually what's important. This is actually one of the challenges of having sort of a home-grown or disconnected solution, is that stitching together so you understand the end to end is extremely difficult. 
'Cause if you have a relational database, and a Kafka, and a Hadoop, and a Spark job, sure you can monitor all those things. They all have their own UIs. But if you can't understand what the is on the whole system you're left like with four windows open trying to figure out where things connect. And it's just too difficult. >> So just on a sort of a positioning point of view for someone who's trying to make sense out of all the choices they have, to what extent would you call yourself a management framework for someone who's building these pipelines, whether from scratch, or buying components. And to what extent is it, I guess, when you talk about a pump, that would be almost like the run time part of it. >> Bryan: Yeah, yeah. >> So you know there's a control plane and then there's a data plane. >> Bryan: Sure. >> What's the mix? >> Yeah well we do both for sure. I mean I would say that the data plane for us is StreamSets Data Collector. We move data, we physically move the data. We have our own internal pipeline execution engine. So it doesn't presuppose any other existing technologies, not dependent on Hadoop or Spark or Kafka or anything. You know to some degree Data Collector is also the control plane for small deployments. Because it does give you start and stop command and control. Some metrics monitoring, things like that. Now, when people need to expand beyond the realm of a single data collector, when they have enterprises with more than one business unit, or data center, or security zone, things like that. You don't just deploy one data collector, you deploy a bunch, dozens or hundreds. And in that case, that's where Dataflow Performance Manager again comes in, as that control plane. Now Dataflow Performance Manager has no data in it. It does not pass your actual business data. But it does again aggregate all of your metrics from all your data collectors and gives you a unified view across your whole enterprise. >> And one more follow-up along those lines. 
When you have a multi-vendor stack, or a multi-vendor pipeline. >> Bryan: Yeah. >> What gives you the meta view? >> Well we're at the ins and outs. We see the interfaces. So in theory if someone were to consume data out of Kafka and do something, right. Then there's another job later, like a Spark job. >> George: Yeah. >> So we don't have automatic visibility for that. But our plan in the future is to expand Dataflow Performance Manager to take third party metric sources effectively. To broaden the view of your entire enterprise. >> You've got a bunch of stuff on your website here which is kind of interesting. Talking about some of the things we talked about. You know taming data drift is one of your papers. The silent killer of data integrity. And some other good resources. So just in sort of closing, how do we learn more? What would you suggest? >> Sure, yeah please visit the website. The product is open source and free to download. Data Collector is free to download. I would encourage people to try it out. It's really easy to take for a spin. And if you love it you should check out our community. We have a very active Slack channel and Google group, which you can find from the website as well. And there's also a blog full of tutorials. >> Yeah well you're solving gnarly problems that a lot of companies just don't want to deal with. That's good thanks for doing the dirty work, we appreciate it. >> Yeah my pleasure. >> Alright Bryan thanks for coming on "The Cube." >> Thanks for having me. >> Good to see you. You're welcome. Keep right there buddy we'll be back with our next guest. This is "The Cube" we're live from Boston Spark Summit. Spark Summit East #SparkSummit right back.

Published Date : Feb 9 2017

ENTITIES

Entity	Category	Confidence
Bryan	PERSON	0.99+
Dave	PERSON	0.99+
Dave Vellante	PERSON	0.99+
George Gilbert	PERSON	0.99+
George	PERSON	0.99+
Bryan Duxbury	PERSON	0.99+
StreamSets	ORGANIZATION	0.99+
first mile	QUANTITY	0.99+
Boston, Massachusetts	LOCATION	0.99+
dozens	QUANTITY	0.99+
Spark	TITLE	0.99+
500 tables	QUANTITY	0.99+
first	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
20 tables	QUANTITY	0.99+
Kafka	TITLE	0.99+
hundreds	QUANTITY	0.99+
One	QUANTITY	0.99+
more than one business unit	QUANTITY	0.98+
Boston	LOCATION	0.98+
a year	QUANTITY	0.98+
Spark SQL	TITLE	0.98+
today	DATE	0.98+
first collection	QUANTITY	0.98+
one	QUANTITY	0.98+
a month later	DATE	0.98+
both	QUANTITY	0.98+
two	QUANTITY	0.98+
SQL	TITLE	0.98+
StreamSets	TITLE	0.98+
Today	DATE	0.97+
Databricks	ORGANIZATION	0.97+
Spark Summit East	LOCATION	0.97+
one data collector	QUANTITY	0.97+
Boston Spark Summit	LOCATION	0.97+
Spark Summit East 2017	EVENT	0.97+
Spark Summit East	EVENT	0.96+
one step	QUANTITY	0.96+
Cleveland	LOCATION	0.95+
both perspectives	QUANTITY	0.95+
StreamSet	ORGANIZATION	0.95+
Slack	ORGANIZATION	0.95+
Square	ORGANIZATION	0.95+
Hadoop	TITLE	0.94+
four windows	QUANTITY	0.93+
first two	QUANTITY	0.93+
Spark Summit	EVENT	0.93+
single data collector	QUANTITY	0.92+
