
Dr. Amanda Broderick, University of East London | AWS Imagine 2019



(upbeat music) >> Narrator: From Seattle, Washington, it's theCUBE. Covering AWS Imagine. Brought to you by Amazon Web Services. >> Hey, welcome back everybody, Jeff Rick here with theCUBE. We're at AWS Imagine, it's a show all about education. That's whether it's university, K to 12, community college, post-military service. Amazon is very, very committed to the education market. It's part of the public sector group underneath Teresa Carlson. This is the second year of the conference. We're excited to be back, and really some interesting conversations about how education moves forward. 'Cause it doesn't necessarily have the best reputation for being the most progressive industry out there. So we're excited to have our next guest, all the way from London, she's Dr. Amanda Broderick, the Vice-Chancellor and President of the University of East London. Welcome. >> Thank you very much. Thank you, very nice to meet you. >> Absolutely, so first off before we get into it, just kind of your impressions of this event, and kind of what Amazon is doing. Teresa did the keynote today, which is not insignificant. She's a super busy lady, and kind of what does this ecosystem, these resources, this kind of focus, do for you as an educator? >> The main reason that we're working with AWS in such a significant way is actually because of our genuine values alignment. Institutionally, those core priorities are really where we want to go as an organization. And for me this conference, this summit, has been an opportunity to share best practice, to innovate, to truly explore the opportunity to disrupt for, ultimately, the end goal. Which is about the education, the development of our next generation, and the support of talent development for the future. >> But unfortunately, a lot of times it feels like institutions put the institution first, and we're seeing a lot of conversations here in the US about these ridiculously crazy, large endowments that just sit there as piles of money. And is the investment getting back to the students? Are we keeping our eye on the ball? That it's the students that need the investment, not all the other stuff, all the other distractions, that get involved in higher education. >> I suppose that is where the University of East London is fundamentally different. Core to our mission is driving social mobility, and as such we have to be absolutely clear what those learner outcomes are, and they are about being able to access and accelerate in their careers, and indeed in their lifelong learning, to enable them to progress in portfolio careers. >> Right, so it's interesting, the three topics for this show are tomorrow's workforce, which we've talked a lot about in education. The role of ML, which I think is interesting that it got its own bullet, just because machine learning is so pervasive in software and doing lots of things. And the one that struck me is the effort to have higher predictability on the success of the student, and to really make sure that you're catching problems early, if there is a problem. You're actually using a lot of science to better improve the odds of that student's success. A lot of conversation here about that topic. >> Absolutely, absolutely, and that machine learning approach is one of the key dimensions in our relationship with AWS. And this is not just about the student outcomes around continuation, engagement, progression, student success, but actually for the University of East London, it's also been about the identification of students at risk.
So we fundamentally believe that health gain is a precondition of learning gain. That's particularly important for an institution like ours that is so socially inclusive, and therefore what we're doing (we're actually one of ten institutions that have been funded by the government, working in partnership with AWS, as a pilot to share best practice across the UK as a whole) is identifying the proxies. For example, mental health issues, to be able to signpost, and traffic-light that signposting, to areas of support, and to be able to direct prevention, intervention and postvention strategies to those students at risk. And that project is actually a key area of our partnership development with AWS. >> And how long has that been going on? We talked a little bit about it before we turned the cameras on, and it just seems so foundational to me, that without putting in that infrastructure for these kids, regardless of their age, their probability of success without that good foundation is so much less. So when did this become a priority? How are you prioritizing it? What are some of the really key measures that you're using to make sure that you're making progress against this goal? >> Absolutely, so the university has made good progress in terms of the fundamental issues of identifying where the correlations and the causations are between both physical and mental health and well-being, and outcomes. What we haven't been able to do at this point is address the scalability of this issue, and that's really where this pilot project, which has literally been announced in the last couple of weeks, comes in: we're working very closely with AWS in order to convert that core foundational research and development into scalable solutions. Not just for my own university, but actually for the sector as a whole. >> Right, so we talked about academic institutions maybe not necessarily having the best reputation for innovation, especially kind of old storied ones with old ivy plants growing up old brick walls. Is this a new kind of realization of the importance of this? Is this coming from maybe some of the more vocational kinds of schools, or is it coming from the top? Do they realize that there's more to this than just making sure people study, and they know what they're doing when they turn in their test and get their paper in on time? >> It's both a top-down and bottom-up approach. It's fundamental to the University of East London's new ten-year strategy, Vision 2028. Health gain is that precondition of learning gain. It's fundamental to the realization of our learners' success. But also it's come from a groundswell of the research and development outcomes over a number of years. So it's absolutely been the priority for the institution from September 2018, and we've been able to accelerate this over the last few months. >> So important. Such important work. Flipping the point a little bit onto something a little lighter, a little bit more fun, it's really innovation on the engagement with the students around things like mobile. We've had a lot of conversations here about integrating Alexa, and voice, and competing with online, and competing with other institutions, and being a little bit more proactive in engaging with your students as the customer. I wonder if you can share some thoughts as to how that has evolved over time.
Again, you've been in the business for a while, and really starting to cater and be innovative on that front end, versus the back end, to be more engaging and help students learn in different ways. Where they are, in little micro segments. It's a very different kind of approach. >> It absolutely is, and one of our four major facilitating transformation projects, it's called our digital verse project, and that is across all of our activities as an institution. In terms of business transformation, our particular priority is prospect engagement, and how we actually convert our potential learners in more effective ways. Secondly, enhancing deeper learning, and how we then produce better learner outcomes. Thirdly, how we develop access to new ways of educational provision, 24/7 global access. And fourthly, how do we connect with employers in partnership to make sure that we meet those challenges around pre-selection recruitment strategies, and we're able to get the students, our learners, into careers post graduation. >> Right, and then what's the kind of feedback from the teachers and the professors? They have so much on their plate. Right, they've got their core academic research that they're doing, they're teaching their students, they've got a passion around that area. I always tell people it's like driving in the car in the snow at night with your headlights on, right. Just like all types of new regs that are coming in, and requirements, and law, and this, that, and the other. Now we're coming in with this whole four-point digital transformation. Are they excited, are they overwhelmed, are they like, finally, we're getting to do something different? I mean what's the take within the academics, specifically in your school? >> I think the answer to that is all of the above. >> All of the above. >> It really reflects the classic adoption curve. So you do have the innovators, you have the early adopters, and then you also have the laggards at the other end. And often, actually, the most traditional academics that have been doing things for many, many years, who are very set in their ways, if you expose them to new opportunities, new experiences, and actually provide them with the tools to innovate, they can be some of the best advocates for the transformation, and we've certainly found that to be the case. >> Good, well Amanda, thanks for taking a few minutes of your time, it sounds like they're going to start the dancing here behind us soon. So I think we'll have to leave it there, but I look forward to seeing you sometime in London. >> Thank you very much. >> Alright, she's Dr. Amanda Broderick, I'm Jeff Rick, you're watching theCUBE. We're at AWS Imagine in Seattle. Thanks for watching, we'll see you next time. (upbeat music)
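The at-risk identification work Dr. Broderick describes, using proxies such as engagement and well-being signals to traffic-light students toward support, maps naturally onto a supervised classifier. Below is a minimal sketch of that idea; every feature name, threshold, and the synthetic data are hypothetical illustrations, not UEL's or AWS's actual model.

```python
# A minimal sketch of at-risk classification from engagement/well-being
# proxies, with a red/amber/green "traffic light" over predicted risk.
# All features, thresholds, and data here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.poisson(10, n),        # weekly logins to the virtual learning env
    rng.uniform(0.3, 1.0, n),  # attendance rate
    rng.integers(1, 6, n),     # self-reported well-being (1-5)
])
# Synthetic label: low engagement plus low well-being implies higher risk
y = ((X[:, 0] < 6) & (X[:, 1] < 0.6) | (X[:, 2] <= 2)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

# Traffic-light the signposting by predicted risk probability
p = model.predict_proba(X_te)[:, 1]
bands = np.select([p > 0.7, p > 0.4], ["red", "amber"], default="green")
print(dict(zip(*np.unique(bands, return_counts=True))))
```

In practice the interesting work is in the proxies themselves (which signals actually correlate with outcomes) and in routing the red/amber cases to prevention, intervention, and postvention pathways, which is the scalability problem the pilot described above is meant to address.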

Published Date: Jul 10 2019


Marvin Martinez, East Los Angeles College | AWS Imagine 2018



>> From the Amazon Meeting Center in Downtown Seattle, it's theCUBE. Covering Imagine: A Better World, a global education conference, sponsored by Amazon Web Services. >> Hey welcome back everybody. Jeff Rick here with theCUBE. We're in Downtown Seattle, Washington, at the AWS Imagine Education Conference. First one they've ever done, about 900 registrants. People from over 20 countries are here. Teresa Carlson gave the kickoff and it's a pretty exciting event. We've seen this movie before with Amazon. They get involved in a project, and it grows and grows and grows. So this is all about education. It's about education institutions. It's about students obviously, who are the core of education, and we're really excited to have our next guest. It was a big announcement that happened today. He's Marvin Martinez, the President of East Los Angeles College. Marvin, great to see you. >> Thank you, pleasure to be here. >> So you're getting ready to go up on stage. It's a big announcement, so tell us about what it is. It's called the California Cloud... >> Computing. >> Computing Initiative. >> So this is what we've done. We've been developing for the last year a certificate where students can take a number of classes, which is basically a total of 15 units, and they're able to earn, at the end of 15 units, a certificate in cloud computing. And the goal is to get them trained quickly, to get them out to work quickly. Eventually we hope that the certificate evolves into a degree program, so then we're hoping that the students come back, and they get their associate's on top of our certificate, and they're able to get even a better job. That's really the goal of this program: we want to get them started, want to get them excited, get them into an entry-level type of job, then they will know they like it. They're going to come back. They'll get that degree, you know, do even better, right. >> So let me, I just want to make sure I get this. This is the California Cloud Workforce Project. So it's really about the workforce and giving these kids the skills. So it's funny though, Marvin, where everybody says technology is taking away jobs. They forget, yeah, it takes away some jobs, but there's new jobs created. >> All the time. >> All the time, there's a ton of openings, especially in the engineering field and in the cloud. But so what are some of the cloud skills, specifically, that kids are learning to get the certificate? >> Well you know, the skills they're learning are specifically so they can eventually work with some of the major industries in our area. Obviously Amazon and other similar industries and similar businesses, and there's many of them. Los Angeles, you know, is quickly becoming the new Silicon Valley. So a lot of industries are moving in. They call us all the time, they call me all the time, and say, you have trained students. We will hire them right now and we'll pay them a good salary. So no doubt it's a motivation for us, because that's who we are as community colleges. We are here to serve students. We are here to get them trained, get them out there quickly, and respond to the needs of industry in that area. >> So it's really interesting planning, in that it's the community colleges; you guys have all come together. I think the number's 19 as part of this. So A, you know, you're doing it as a unified effort, so kids across a broad area can take advantage, and also you're partnering with individual high schools. Each community college is partnered with an individual high school.
So how does that work? How does that kind of come into fruition? >> Well you know, one thing that we want to do is, as we work with high schools... high schools today are also under pressure to ensure that their students are being trained well, and that if they just get a high school diploma they can go and work somewhere. But also, today high schools are getting smart. They're saying, hey, how do we work with a local college so that when students graduate, they graduate with a high school diploma and a degree from a college. And why are they doing that? Because they know, in order to be competitive, a young person needs to have these degrees. Today if you want to be competitive, a high school diploma may not be enough. So we notice that motivation there. Secondly, we're able to get students on a college campus, get them developed, get them mature, get them to take a college-level course, and then they're able to go out and obviously work once they complete this program. So the relationship is a natural one. It's one that high schools are seeking from us, which is great. That has not been the case all the time. Usually we've gone to them, but now they're coming to us and saying, we need you, help us out. >> The part I like about it too is the kids are smart. And they're like, why am I taking philosophy? How am I going to use philosophy in my job? Or why am I taking this, or why am I taking that? These are really concrete skills. They can go look in the newspaper today, or I guess, I don't know if they look in a newspaper for jobs, because they couldn't find a newspaper if you threw it at them, but they could go seek the job listings at the Amazon sites. And also they are working with this technology, they live in this technology, so it's not something foreign or something new. It's something they experience every day. So it's got to be a pretty easy sell, I would imagine. >> It's an easy sell. Young people today are different than the way that we grew up. I grew up at a time where there were no cell phones, there was no bottled water. It was a whole different time. Young people today, as you're seeing, grow up with these technologies. It's part of who they are. They more than just embrace it. So they welcome using it in any way they can. So when we propose programs like these, guess what happens? They enroll en masse, and that's because they understand it. They identify with it. Will they be willing to enroll in a Shakespeare class? They might, but not as much as a class like this one. So no doubt the population today has changed, so part of my job is to introduce programs on the campus that I know will generate that kind of enrollment and interest. So we know that a program like this will do that, and we just need to recognize the fact that the world has changed. Let me just add that we don't do that well as education institutions. As institutions we're some of the most conservative institutions in the history of this country. So for us to change, it takes quite a lot. So what's forcing us to change, what is forcing us to change, is that enrollment is down, and not just in many of our colleges in LA but throughout the country. Enrollment is-- >> In community colleges generally, or colleges in general? >> Community colleges. Community colleges throughout the nation, enrollment is down. And enrollment is down for a number of reasons.
There's more jobs out there, so students are looking to go out and work, but also enrollment is down because the curriculum and the courses that we have are just not interesting to them. So I think a program like this will help the campus. A program like this will get more students to come and take advantage of an incredible education that they can get at our campuses. >> I was just curious, kind of what were the drivers of enrollment before that have kind of fallen away? Was it a particular type of skill set? Was it just that they don't want it generic anymore? They got to go get a job? I'm just curious if there was something that you had before that was appealing, that you have now, that's just not appealing anymore. >> Good questions. So the last time our economy was in bad shape, when employment was down, that was back around 2008-2009. Well guess what happened on our campuses? Enrollment was up. So when the economy is in bad shape, people come back to school. When the economy is in great shape like it is today, where there are a lot of jobs, enrollment is down. So we don't see the economy going down at all in a number of years. >> Anytime soon. >> So we have to develop programs that we think will be of interest to students, first. Secondly, we have to respond to the needs of the new economy. The new economy is now being dominated by these new technologies. We know about it, young people know about it. So when we develop a program like this, we know that it will generate interest. It will generate enrollment. And in many ways that's what drives the funding for a college. We're funded on the basis of how many people we enroll. So if we don't enroll a lot of people, we have less money, so no doubt there's a motivation for us, a motivation for the entire system, to really partner with Amazon and figure out a way for us to really get students trained and, hopefully, get them a good job. >> So you segued perfectly. My last question was going to be kind of the role of Amazon and AWS in terms of being a partner. I mean they obviously, you know, are thinking about things. Teresa's fantastic. She just talked about being from an education family, but at the same time, you know, they have their own reasons to do it. They need workers, right? They need people to fill these jobs to fuel Amazon's own growth, beyond their ecosystem, their partners and customers, etcetera. So what does it mean for you as an educator, and part of this consortium of community colleges, to have somebody like AWS come in and really help you co-develop and drive these types of new programs? >> Well it means everything. Number one, we know that Amazon is a major employer. We know that the jobs that they have available are good-paying jobs. They have a career path, and so we know it's a good direction for young people to take. So part of my job as an educator is, in many ways, like a parent. You want to take care of your family, you want to take care of the kids and put them on the right path so they have the most success possible. Amazon offers that kind of path. So for us to partner with someone like Amazon is great. Secondly, students know who Amazon is. I don't have to sell them. They know who they are, and they know what Amazon can do, and they know that it's a great career path for them. So I think it could be a great partnership for us, but also it's an opportunity for Amazon to even continue further developing that workforce in Los Angeles and California.
>> Alright Marvin, well thank you so much for spending a few minutes, and I wish you nothing but the best with this California Cloud Workforce Project. Did I get it right? >> It's right. Thank you so much, I appreciate it. >> Thank you, alright, he's Marvin, I'm Jeff. You're watching theCUBE. We're in Seattle at the Amazon Imagine Education event. First time ever, keep watching. It's going to grow and grow and grow. Thanks for watching. (electronic music)

Published Date: Aug 10 2018


Wikibon Big Data Market Update pt. 2 - Spark Summit East 2017 - #SparkSummit - #theCUBE



(lively music) >> [Announcer] Live from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Spark Summit in Boston, everybody. This is theCUBE, the worldwide leader in live tech coverage. We've been here two days, wall-to-wall coverage of Spark Summit. George Gilbert, my cohost this week, and I are going to review part two of the Wikibon Big Data Forecast. Now, it's very preliminary. We're only going to show you a small subset of what we're doing here. And so, well, let me just set it up. So, these are preliminary estimates, and we're going to look at different ways to triangulate the market. At Wikibon, what we try to do is focus on disruptive markets, and try to forecast those over the long term. What we try to do is identify where the traditional market research estimates really, we feel, might be missing some of the big trends. So, we're trying to figure out, what's the impact, for example, of real time. And, what's the impact of this new workload that we've been talking about around continuous streaming. So, we're beginning to put together ways to triangulate that, and we're going to give you a glimpse today of what we're doing. So, if you bring up the first slide, we showed this yesterday in part one. This is last year's big data forecast. And, what we're going to do today is focus in on that line, that S-curve. That really represents the real-time component of the market. Spark would be in there. Streaming analytics would be in there. Add some color to that, George, if you would. >> [George] Okay, for 60 years, since the dawn of computing, we've had two ways of interacting with computers. You put your punch cards in, or whatever else, and you come back and you get your answer later. That's batch. Then, starting in the early '60s, we had interactive, where you're at a terminal. And then, the big revolution in the '80s was you had a PC, but you were still either interactive, with a terminal, or batch, typically for reporting and things like that. What's happening is the rise of a new interaction mode, which is continuous processing. Streaming is one way of looking at it, but it might be more effective to call it continuous processing, because you're not going to get rid of batch or interactive, but your apps are going to have a little of each. So, what we're trying to do, since this is early, early in its life cycle, is try and look at that streaming component from a couple of different angles. >> Okay, as I say, that's represented by this ogive curve, or the S-curve. On the next slide, we're at the beginning when you think about these continuous workloads. We're at the early part of that S-curve, and of course, most of you, or many of you, know how the S-curve works. It's slow, slow, slow. For a lot of effort, you don't get much in return. Then you hit the steep part of that S-curve. And that's really when things start to take off. So, the challenge is, things are complex right now. That's really what this slide shows. And Spark is designed, really, to reduce some of that complexity. We've heard a lot about that, but take us through this. Look at this data flow from ingest, to explore, to process, to serve. We talked a lot about that yesterday, but this underscores the complexity in the marketplace.
[George] Right, and while we're just looking mostly at numbers today, the point of the forecast is to estimate when the barriers, representing complexities, start to fall. And then, when we can put all these pieces together, in ingest, explore, process, serve. When that becomes an end-to-end pipeline. When you can start taking the data in on one end, get a scientist to turn it into a model, inject it into an application, and that process becomes automated. That's when it's mature enough for the knee in the curve to start. >> And that's when we think the market's going to explode. But now, how do you bound this? Okay, when we do forecasts, we always try to bound things. Because if they're not bounded, then you get no foundation. So, if you look at the next slide, we're trying to get a sense of real-time analytics. How big can it actually get? That's what this slide is really trying to-- >> [George] So this one was one firm's take on real-time analytics, where by 2027, they see it peaking just under-- >> [Dave] When you say one firm, you mean somebody from the technology industry? >> [George] Publicly available data. And since they didn't have a lot of assumptions published, we took it as, okay, one data point. And then, we're going to come at it with some bottoms-up and top-down data points, and compare. >> [Dave] Okay, so in the next slide we want to drill into the DBMS market, and when you think about DBMS, you think about the traditional RDBMS, what we know, the Oracle, SQL Server, IBM DB2's, etc. And then, you have these emergent NewSQL and NoSQL entrants, which are, obviously, we talked today to a number of folks. The number of suppliers is exploding. The revenue's still relatively small. Certainly small relative to the RDBMS marketplace. But, take us through what your expectation is here, and what some of the assumptions are behind this. >> [George] Okay, so the first thing to understand is the DBMS market, overall, is about $40 billion, of which 30 billion goes to online transaction processing supporting real operational apps. 10 billion goes to OLAP, or business intelligence type stuff. The OLAP one is shrinking materially. The online transaction processing one, new sales is shrinking materially, but there's a huge maintenance stream. >> [Dave] Yeah, which companies like Oracle and IBM and Microsoft are living off of, trying to fund new development. >> We modeled that declining gently and beginning to accelerate more going out into the latter years of the ten-year period. >> What's driving that decline? Obviously, you've got the big sucking sound of Hadoop, in part, driving that. But really, increasingly it's people shifting their resources to some of these new emergent applications and workloads, and new types of databases to support them, right? But these are still, those new databases, you can see here, the NewSQL and NoSQL, still relatively small. A lot of it's open source. But then it starts to take off. What's your assumption there? >> So here, what's going on is, if you look at dollars today, it's actually interesting. If you take the NoSQL databases, you take DynamoDB, you take Cassandra, Hadoop, HBase, Couchbase, Mongo, Kudu, and you add all those up, it's probably about $1.55 billion out of a $40 billion market today. >> [Dave] Okay, but it's starting to get meaningful. We're approaching two billion. >> But where it's meaningful is the unit share. If that were translated into Oracle pricing, the market would be much, much bigger. That's the point.
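A quick reconstruction of the sizing George just walked through. The only inputs are figures from the conversation ($1.55 billion in NoSQL revenue, a $40 billion DBMS market, and an "at least 10x" Oracle-pricing multiplier), so treat the output as back-of-envelope arithmetic, not Wikibon's published model.

```python
# Back-of-envelope on the NoSQL sizing: ~$1.55B of today's $40B DBMS
# market in dollars, but "at least 10x" that if the same unit volume
# were priced like Oracle. The 10x multiplier is a stated floor, not
# a precise figure.
nosql_rev_b = 1.55      # $B: DynamoDB + Cassandra + HBase + Couchbase + Mongo + Kudu
dbms_market_b = 40.0    # $B: total DBMS market
oracle_multiplier = 10  # "at least" 10x under Oracle-style pricing

print(f"dollar share: {100 * nosql_rev_b / dbms_market_b:.1f}%")              # ~3.9%
print(f"Oracle-priced equivalent: ${nosql_rev_b * oracle_multiplier:.1f}B+")  # ~$15.5B+
```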
>> Ten X? >> At least, at least. >> Okay, so in terms of work being done. If there's a measure of work being done. >> [George] We're looking at dollars here. >> Operations per second, etcetera, it would be enormous. >> Yes, but that's reflective of the fact that the data volumes are exploding but the prices are dropping precipitously. >> So do you have a metric to demonstrate that? We're, obviously, not going to show it today, but. >> [George] Yes. >> Okay great, so-- >> On the business intelligence side, without naming names, the data warehouse appliance vendors are charging anywhere from 25,000 per terabyte up to, when you include running costs, as high as 100,000 a terabyte. That's what their customers are estimating. That's not the selling cost, but that's the cost of ownership per terabyte. Whereas, if you look at, let's say, Hadoop, which is comparable for offloading some of the data warehouse workloads, that's down to the 5K per terabyte range. >> Okay great, so you expect that these platforms will have a bigger and bigger impact? What's your pricing assumption? Are prices going to go up, or is it just volume's going to go through the roof? >> I'm actually expecting pricing... It's difficult, because we're going to add more and more functionality. Volumes go up, and if you add sufficient functionality, you can maintain pricing. But as volumes go up, typically, prices go down. So it's a matter of how much these NoSQL and NewSQL databases add in terms of functionality. And I distinguish between them because NewSQL databases are scaled-out versions of Oracle or Teradata, but they're based on the more open source pricing model. >> Okay, and NoSQL, don't forget, stands for not only SQL, not no SQL. >> If you look at the slides, big existing markets never fall off a cliff when they're in decline. They just slowly fade. And, eventually, that accelerates. But what's interesting here is, the data volumes could explode, but the revenue associated with the NoSQL, which is the dark gray, and the NewSQL, which is the blue, those don't explode. You could take, what's the DBMS cost of supporting YouTube? It would be in the many, many, many billions of dollars. It would support half of an Oracle itself, probably. But it's all open source there, so. >> Right, so that's minimizing the opportunity, is what you're saying? >> You can see the database market is flat, certainly flattish and even declining, but you do expect some growth in the out years as part of that evasion, that volume, presumably-- >> And that's the next slide, which is where we've seen that growth come from. >> Okay, so let's talk about that. So the next slide, again, I should have set this up better. The vertical axis is worldwide dollars and the horizontal axis is time. And we're talking here about these continuous application workloads. This new workload that you talked about earlier. So take us through the three. >> [George] There's three types of workloads that, in large part, are going to be driving most of this revenue. Now, these aren't completely comparable to the DBMS market, because some of these don't use traditional databases. Or if they do, they're Torry databases, and I'll explain that.
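An aside on the per-terabyte spread George cites above: appliance cost of ownership of $25K to $100K per terabyte versus roughly $5K for Hadoop. A quick sketch of what that implies; the 500 TB warehouse is a hypothetical workload, not a figure from the conversation.

```python
# Cost-of-ownership comparison using the per-terabyte figures above.
# The 500 TB warehouse size is illustrative only.
tb = 500
appliance_low, appliance_high, hadoop = 25_000, 100_000, 5_000  # $/TB

print(f"appliance: ${tb * appliance_low / 1e6:.1f}M to ${tb * appliance_high / 1e6:.1f}M")
print(f"hadoop:    ${tb * hadoop / 1e6:.1f}M "
      f"({appliance_low // hadoop}x to {appliance_high // hadoop}x less)")
```

For 500 TB that works out to $12.5M-$50M on an appliance versus $2.5M on Hadoop, which is the 5x-to-20x spread that makes data warehouse offload an easy first workload.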
>> [Dave] Sure, but if I look at the IoT Edge, the Cloud, and the microservices and streaming, that's a tailwind to the database forecast in the previous slide, is that right? >> [George] It's actually interesting, but the application and infrastructure telemetry, this is what Splunk pioneered, which is all the torrents of data coming out of your data center and your applications, where you're trying to manage what's going on. That is a database application. And we know Splunk, for 2016, was 400 million in software revenue. Hadoop was 750 million. And the various other management vendors, New Relic, AppDynamics, start-ups, and 5% of Azure and AWS revenue. If you add all that up, it comes out to $1.7 billion for 2016. And so, we can put a growth rate on that. And we talked to several vendors to say, okay, how much will that workload be compared to IoT Edge Cloud. And the IoT Edge Cloud is the smart devices at the Edge, with the analytics in the fog, but not counting the database revenue up in the Cloud. So it's everything surrounding the Cloud. And that, actually, if you look out five years, that's maybe 20% larger than the app and infrastructure telemetry, but growing much, much faster. Then the third one, where you were asking, was this a tailwind to the database: microservices and streaming are very different ways of building applications from what we do now. Now, people build their logic for the application, and everyone then stores their data in this centralized external database. In microservices, you build a little piece of the app, and whatever data you need, you store within that little piece of the app. And so the database requirements are, rather, primitive. And so that piece will not drive a lot of database revenue. >> So if you could go back to the previous slide, Patrick. What's driving database growth in the out years? Why wouldn't database continue to get eaten away and decline? >> [George] In broad terms, the overall database market is staying flat, because prices collapse but the data volumes go up. >> [Dave] But there's an assumption in here that the NoSQL space actually grows in the out years. What's driving that growth? >> [George] Both the NoSQL and the NewSQL. The NoSQL, probably, is best serving capturing the IoT data, because you don't need lots of fancy query capabilities for concurrency. >> [Dave] So it is a tailwind, in a sense, in that-- >> [George] IoT, but that's different. >> [Dave] Yeah, sure, but you've got the overall market growing. And that's because the new stuff, NewSQL and NoSQL, is growing faster than the decline of the old stuff. And in the 2020 to 2022 time frame, it's not enough to offset that decline. And then you have it start growing again. You're saying that's going to be driven by IoT and other Edge use cases? >> Yes, IoT Edge, and the NewSQL, actually, is where, when they mature, you start to substitute them for the traditional operational apps. For people who want to write database apps, not who want to write microservice-based apps.
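For the record, George's 2016 telemetry sizing above can be reconstructed from the two components he names; the remainder (New Relic, AppDynamics, start-ups, and 5% of Azure and AWS) is back-solved from his $1.7 billion total rather than independently sourced.

```python
# Reconstructing the 2016 app/infrastructure telemetry estimate.
# Splunk ($0.40B) and Hadoop ($0.75B) come from the conversation; the
# "others" bucket is implied by the $1.7B total, not a sourced figure.
splunk_b, hadoop_b, total_b = 0.40, 0.75, 1.70
others_b = total_b - splunk_b - hadoop_b
print(f"implied New Relic/AppDynamics/startups/5% of Azure+AWS: ${others_b:.2f}B")  # $0.55B
```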
>> Okay, alright, good. Thank you, George, for setting it up for us. Now, we're going to be at Big Data SV in mid-March? Is that right? Middle of March. And George is going to be releasing the actual final forecast there. We do it every year. We use Spark Summit to look at our preliminary numbers, some of the Spark-related forecasts like continuous workloads. And then we harden those forecasts going into Big Data SV. We publish our big data report like we've done for the past five, six, seven years. So check us out at Big Data SV. We do that in conjunction with the Strata events. So we'll be there again this year at the Fairmont Hotel. We've got a bunch of stuff going on all week there. Some really good programs going on. So check out siliconangle.tv for all that action. Check out Wikibon.com. Look for new research coming out. You're going to be publishing this quarter, correct? And of course, check out siliconangle.com for all the news. And, really, we appreciate everybody watching. George, been a pleasure co-hosting with you. As always, really enjoyable. >> Alright, thanks Dave. >> Alright, so that's a wrap from Spark Summit. We're going to try to get out of here, hit the snowstorm, and work our way home. Thanks everybody for watching. A great job by everyone here: Seth, Ava, Patrick and Alex. And thanks to our audience. This is theCUBE. We're out, see you next time. (lively music)
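The ogive, or S-curve, that frames this whole segment is just a logistic function: nearly flat through the complexity-barrier years, then steep once the end-to-end pipeline matures. A minimal sketch follows; the saturation level, knee year, and steepness are illustrative parameters, not Wikibon's fitted model.

```python
# Logistic (ogive) adoption curve: slow, slow, slow, then the knee.
# L = saturation revenue ($B), t0 = knee/inflection year, k = steepness.
# All three parameter values are illustrative, not from the forecast.
import math

def scurve(year, L=5.0, t0=2022, k=0.9):
    """Modeled real-time segment revenue ($B) for a given year."""
    return L / (1 + math.exp(-k * (year - t0)))

for year in range(2017, 2028):
    revenue = scurve(year)
    print(f"{year}: {revenue:4.2f}B " + "#" * int(4 * revenue))
```

Printed out, the curve shows exactly the shape Dave describes: years of little return for a lot of effort, then the steep part once the barriers fall.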

Published Date: Feb 9 2017


Bill Peterson, MapR - Spark Summit East 2017 - #SparkSummit - #theCUBE



>> Narrator: Live from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody, this is theCUBE, the leader in live tech coverage. We're here in Boston, in snowy Boston. This is Spark Summit. Spark Summit does an East Coast version, they do a West Coast version, they've got one in Europe this year. theCUBE has been a partner with Databricks as the live broadcast partner. Our friend Bill Peterson is here. He's the head of partner marketing at MapR. Bill, good to see you again. >> Thank you, thanks for having me. >> So how's the show going for you? >> It's great. >> Give us the vibe. We're kind of windin' down day two. >> It is. The show's been great, we've got a lot of traffic coming by, a lot of deep technical questions, which is-- >> Dave: Hardcore at the show-- >> It is, it is. I spend a lot of time there smiling and going, "Yeah, talk to him." (laughs) But it's great. We're getting those deep technical questions, and it's great. We actually just got one on Lustre, which I had to think about for a minute: oh, HPC. It was way back in there. >> Dave: You know, Cray's on the floor. >> Oh, yeah, that's true. But a lot of our customers as well. UnitedHealth Group, Wells Fargo, AMEX coming by. Which is great, to see them and talk to them, but also they've got some deep technical questions for us. So it's moving the needle with existing customers, but also new business, which is great. >> So I got to ask a basic question. What is MapR? MapR started in the early days of Hadoop as a distro vendor, one of the big three. When somebody says to you, what is MapR, what do you say? >> My answer today is MapR is an enterprise software company that delivers a converged data platform. That converged data platform consists of a file system, a NoSQL database, a Hadoop distribution, a Spark distribution, and a set of data management tools. And as a customer of MapR, you get all of those. You can turn 'em all on if you'd like. You can just turn on the file system, for example, if you wanted to just use the file system for storage. But the enterprise software piece of that is all the hardening we do behind the scenes on things like snapshots, mirroring, data governance, multi-tenancy, ease of use, performance, all of that baked in to the solution, or the platform, as we're calling it now. So as you're kind of alluding to, a year ago now we kind of got out of that business of saying, okay, lead 100% with Hadoop and then, while we have your attention, or if we don't, hey wait, we got all this other stuff in the basket we want to show you. We went the platform play and said we're going to include everything, and it's all there, and then the baseline underneath is the hardening of it: the file system, the database, and the streaming product, actually, which I didn't mention, which is kind of the core, and everything plays off of there. And that honestly has been really well-received. And it just, I feel, makes it so much easier because-- It happened here, we get the question, okay, how are you different from Cloudera or Hortonworks? And some of it here, given the nature of the attendees, is very technical, but there's been a couple of business users that I've talked to. And when I talk about us as an enterprise software company delivering a plethora of solutions versus just Hadoop, you can see the light go on sometimes in people's eyes.
And I got it today, earlier, "I had no idea you had a file system," which, to me, just drives me insane, because the file system is pretty cool, right? >> Well, you guys were early on in investing in that file system and recovery capabilities and all the-- >> Two years in stealth writing it. >> Nasty, gnarly, hard stuff that was kind of poo-pooed early on. >> Yeah, yeah. MapR was never patient about waiting for the open source community to just figure it out and catch up. You always just said, all right, we're going to solve this problem and go sell. >> And I'm glad you said that. I want to be clear. We're not giving up on open source or anything, right? Open source is still a big piece. 50% of our engineers' time is working on open source projects. That's still super important to us. And then back in November-ish last year we announced the MapR Ecosystem Packs, which is our effort to help our customers that are using open source components to stay current. 'Cause that's a pain in the butt. So this is a set of packages that have a whole bunch of components. We lead with Spark and Drill, and that was by customer request, that they were having a hard time keeping current with Spark and Drill. So the packs allow them to come up to the current level within the converged data platform for all of their open source components. And that's something we're going to do at the dot level, so I think we're at 2.1 or 2 now. The dot levels will bring you up on everything, and then the big ones, like the 3.0s, the 4.0s, will bring Spark and Drill current. And so we're going to kind of leapfrog those. So that's still a really important part of our business, and we don't want to forget that part, but what we're trying to do here, via the platform, is deliver all of that in one entity, right? >> So the converged data platform is relevant presumably because you've got the history of Hadoop, 'cause you got all these different components and you got to cobble 'em together, and there are different interfaces and different environments. You're trying to unify that, and you have unified that, right? >> Yeah, yeah. >> So what is your customer feedback with regard to the converged data platform? >> Yeah, so it's a great question, because for existing customers, it was one of those "ah, thank you" moments, right, because we're listening. Actually, again, glad you said that. This week, in addition to Spark Summit, we're doing our yearly customer advisory board, so we've got, like a lot of vendors, a 30-plus-company customer advisory board that we bring in, and we sit down with them for a couple of days, and they give us feedback on what we should and shouldn't be doing, and where, directional and all that, which is super important. And that's where a lot of this converged data platform came out of, the need for... There was just too much, it's kind of confusing. I'll give the example of streams, right? We came out with our streaming product last year, and okay, I'm using Hadoop, I'm using your file system, I'm using NoSQL, now you're adding streams, this is great, but now, like MEP, the Ecosystem Packs, I have to keep everything current. You got to make it easier for me, you got to make my life easier. So for existing customers it's: stay current, I like this model, I can turn on and off what I want when I want. Great model for them, existing business. For new business it gets us out of that Hadoop-only mode, right? I kind of jokingly call us Hadoop plus plus plus plus. We keep adding solutions onto a single, cohesive data platform that we keep updated.
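To make the "just turn on the file system" idea concrete: MapR's file system exposes the standard Hadoop file system API, so a Spark job can read and write it like any Hadoop-compatible store. A minimal sketch follows; the maprfs:// paths, dataset names, and cleaning step are hypothetical illustrations, not MapR documentation.

```python
# A sketch of Spark using the converged platform's file system as
# plain storage, addressed through the Hadoop-compatible maprfs://
# scheme. All paths and column names here are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("converged-platform-sketch").getOrCreate()

# Read raw JSON events landed on the platform's file system
events = spark.read.json("maprfs:///data/raw/events/")

# Write back a cleaned, columnar copy for downstream analytics
(events.dropna(subset=["user_id"])
       .write.mode("overwrite")
       .parquet("maprfs:///data/curated/events/"))
```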
And as I mentioned here, talking to new customers or new prospects, our potential new business, when I describe the model, you can just see the light going on, and they realize, wow, there's a lot more to this than I had imagined. I got it earlier today: I thought you guys only did Hadoop. Which is a little infuriating as a marketer, but I think from a mechanism and a delivery and a message and a story point of view, it's really helped. >> More Cube time will help get this out there. (laughs) >> Well played, well played. >> It's good to have you back on. Okay, so Spark comes along a couple years ago and it was like, ah, what's going to happen to Hadoop? So you guys embraced Spark. Talk more specifically about Spark, where it fits in your platform and the ecosystem generally. >> Spark, Hadoop, others, as an entity to bring data into the converged data platform, that's one way to think about it. Way oversimplified, obviously, but that's a really great way, I think, to think about it: we're going to provide this platform that anybody can query on, that you can run analytics against. We talk a lot now about converged applications. So taking historical data, taking operational data, so streaming data, great example. Putting those together, and you could use the data lake example if you want, that's fine, but putting them into a converged application in the middle where they overlap, kind of a typical Venn diagram where they overlap, and that middle part is the converged application. What's feeding that? Well, Spark could be feeding that, Hadoop could be feeding that. Just yesterday we announced Docker for containers, and that could be feeding into the converged data platform as well. So we look at all of these things as an opportunity for us to manage data and to make data accessible at the enterprise level. And then that enterprise level goes back to what I was talking about before: it's got to have all of those things, like multi-tenancy and snapshots and mirroring and data governance, security, et cetera. But Spark is a big component of that. All of the customers who came by here that I mentioned earlier, which are some really good names for us, are all using Spark to drive data into the converged data platform. So we look at it as, we can help them build new applications within the converged data platform with that data. So whether it's Spark data, Hadoop data, container data, we don't really care. >> So along those lines, if the focus of intense interest right now is on Spark, and Spark says, oh, we work with all these databases, data stores, file systems, if you approach a customer who's Spark first, what's the message relative to all the other data stores that they can get to through, without getting too techy, their API? >> Sure, sure. I think as you know, George, we support a whole bunch of APIs. So I guess for us it's the breadth. >> But I'm thinking of Spark in particular. If someone says specifically, I want to run Databricks, but I need something underneath it to capture the data and to manage it. >> Well, I think that's the beauty of our file system there. As I mentioned, if you think about it from an architectural point of view, our file system is along the bottom, or it could be our database or our streaming product, but in this instance-- >> George: That's what I'm getting at too, all three. >> Picture that as the bottom layer, as your storage-- I shouldn't say storage layer, but as the bottom layer.
'Cause it's not just storage, it's more than storage. The middle layer is maybe some of your open source tools and the like, and then above that is what I call your data delivery mechanisms. Which would be Spark, for example, one bucket. Another bucket could be Hadoop, and another bucket could be these microservices we're talking about. Let me draw the picture another way, using a partner, SAP. One of the things we've had some success with SAP on is SAP HANA sitting up here. SAP would love to have you put all your data in HANA. It's probably not going to happen. >> George: Yeah, good luck. >> Yeah, good luck, right? But what if, hey customer, what if you put zero to two years worth of data, historical data, in HANA. Okay, maybe the customer starts nodding their head, like you just did. Hey customer, what if you put two to five years worth of data in Business Warehouse. Guess what, you already own that. You've been an SAP customer for a while, you already have it. Okay, the customer's now really nodding their head. You've got their attention. To your original question, whether it's Spark or whatever, five plus years, put it in MapR. >> Oh, and then like HANA Vora could do the query. >> Drill can query across all of them. >> Oh, right, including the Business Warehouse, okay. >> So we're running in the file system. That, to me, and we do this obviously with our joint SAP MapR customers, that to me is kind of a really cool vision. And to your original question, if that was Spark at the top feeding it rather than SAP, sure, right? Why not? >> What can you share with us, Bill, about business metrics around MapR? However you choose to share it, head count... want to give us gross margins by product, that's great, but-- (laughs) >> Would you like revenues too, Dave? >> We know they're very high, because you're a software company, so that's actually a bad question. I've already profit-- (laughs) >> You don't have to give us top-line revenues-- >> So what are you guys saying publicly about the company, its growth? >> That's fair. >> Give us the latest. >> Fantastic, number one. Hiring like crazy, we're well north of 500 people now. Actually, you want to hear a funny story? Yesterday I was texting in the booth, with a candidate for my team, back and forth on salary. Did the salary negotiation on text right there in the booth and closed her; she starts on the 27th, so. >> Dave: Congratulations. >> I'm very excited about that. So moving along on that. Seven, eight hundred plus customers, as we talk about... We just finished our fiscal year on January 31st, so we're on a Feb one fiscal year. And we always do a momentum press release, which will be coming out soon. Hiring, again, like crazy, as I mentioned, the executive staff is all filled in and built to scale, which we're really excited about. We talk a lot about the kind of uptake of-- it used to be of the file system, Hadoop, et cetera on its own, but now, in this momentum release we'll be doing, we'll talk about the converged data platform and the uplift we've seen from that. So we obviously can't talk revenue numbers and the like, but everything... David, I got to tell you, we've been doin' this a long time, all of that is just moving in the right direction. And then the other example I'll give you from my world, the partner world: last year I rebranded our partner program to the Converged Partner Program. We're going with this whole converged thing, right? And we established three levels, elite, preferred, and affiliate, with different levels there.
But also, there are revenue requirements at each level, so elite, preferred, and affiliate, and there's resell and influence revenue. We have MDF funds, not only from the big guys coming to us, but we're paying out MDF funds now to select partners as well. So all of this stuff, I always talk about it as the maturity of the company, right? We're maturing in our messaging, we're maturing in the level of people who are joining, and we're maturing in the customers and the deals, the deal sizes and volumes that we're seeing. It's all movin' in the right direction. >> Dave: Great, awesome, congratulations. >> Bill: Thank you, yeah, I'm excited. >> Can you talk about the number of customers or number of employees relative to last year? >> Oh boy. Honestly, George, I don't know off the top of my head. I apologize, I don't know the metric, but I know it's north of 500 today for employees, and it's like seven, 800 customers. >> Okay, okay. >> Yeah, yeah. >> And a little bit more on this partner program, elite, preferred, and affiliate. >> Affiliate, yeah. >> What are some of the details of that? >> Sure. So the elites are invite only, and those are some of the bigger ones. So for us, we're-- >> Dave: Like, some examples? >> Cisco, SAP, AWS, others, but those are some of the big ones. And there we're looking at things like resell and influence revenue. That's what I track in my... I always jokingly say at MapR, even though we're kind of a big startup now, I always jokingly say at MapR you have three jobs. You have the job you were hired for, you have your Thursday night job, and you have your Sunday night job. (Dave and George laugh) In the job that I was hired for, partner marketing, I track influence and resell revenue. So at the elite level, we're doing both. Like, Cisco resells us, so the S-Series, we're in their SKU, their sales reps can go sell an S-Series for big data workloads or analytical workloads, MapR on it, off you go. Our job then is cashing checks, which I like. That's a good job to have in this business. At the preferred level it's kind of that next tier of big players, but the revenue thresholds haven't moved into the elite yet. Partners in there like the MicroStrategies of the world, we're doing a lot with them, Tableau, Talend, a lot of the BI vendors in there. And then the affiliates are the smaller guys, where maybe we'll do one piece of a campaign during the year with them. So I'll give you an example, Attunity, you guys know those guys, right? >> Sure. >> Yeah, yeah. >> Last year we were doing a campaign on DWO, data warehouse offload. We wanted to bring them in, but this was a MapR campaign running for a quarter, and we're typical, like a lot of companies: we run four campaigns a year, and then my partner and field stuff kind of opts into that, and we run stuff to support it. And then corporate marketing does something. Pretty traditional. But what I try and do is pull these partners into those campaigns. So we did a webinar with Attunity as part of that campaign. So at the affiliate level, the lower level, we're not doing a full go-to-market like we would with the elites at the top, but they're being brought into our campaigns, and then obviously, hopefully, we hope on the other side they're going to pull us in as well. >> Great, last question. What should we pay attention to, what's comin' up? >> Yeah, so-- >> Let's see, we got some events, we got Strata coming up, you'll be out your way, or out MapR way.
>> As my Twitter handle says, seat 11A. That's where I am. (laughs) Yeah, I mean the Docker announcement we're really excited about, and microservices. You'll see more from us on the whole microservices thing. Streaming is still a big one, we think, for this year. You guys probably agree. That's why we announced the MapR streaming product last year. So again, from a go-to-market point of view and kind of putting some meat behind streaming not only MapR but with partners, so streaming as a component and a delivery model for managing data in CDP. I think that's a big one. Machine learning is something that we're seeing more and more touching us from a number of customers but also from the partner perspective. I see all the partner requests that come in to join the partner program, and there's been an uptick in the machine learning customers that want to come in and-- Excuse me, partners, that want to be talking to us. Which I think is really interesting. >> Where you would be the sort of prediction serving layer? >> Exactly, exactly. Or a data store. A lot of them are looking for just an easy data store that the MapR file system can do. >> Infrastructure to support that, yeah. >> Commodity, right? The whole old promise of Hadoop or just a generic file system is give me easy access to storage on commodity hardware. The machine learning-- >> That works. >> Right. The existing machine learning vendors need an answer for that. When the customer asks them, they want just an easy answer, say oh, we just use MapR FS for that and we're done. Okay, that's fine with me, I'll take that one. >> So that's the operational end of that machine learning pipeline that we call DevOps for data scientists? >> Correct, right. I guess the nice synergy there is the whole, going back to the Docker microservices one, there's a DevOps component there as well. So, might be interesting marrying those together. >> All right, we got to go, Bill, thanks very much, good to see you again. >> All right, thank you. >> All right, George and I will be back to wrap. We're going to part two of our big data forecast right now, so stay with us, right back. (digital music) (synth music)
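One footnote on Bill's point that machine-learning vendors mostly want "just an easy data store": because MapR-FS can be mounted over NFS and treated as an ordinary POSIX file system, a model pipeline can read features from it and write trained models back to it with no special connector. A hedged sketch, with the mount point, file names, and the scikit-learn model choice all assumed for illustration:

```python
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

# With the NFS mount, MapR-FS paths read like local directories.
# These paths are illustrative, not a real cluster layout.
DATA_PATH = "/mapr/cluster1/ml/features/churn.csv"
MODEL_PATH = "/mapr/cluster1/ml/models/churn.joblib"

df = pd.read_csv(DATA_PATH)
X, y = df.drop(columns=["churned"]), df["churned"]

# Train any model; the point is only that the storage layer is generic.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Persist to the shared file system so a separate serving process, the
# "prediction serving layer" George mentions, can load and score.
joblib.dump(model, MODEL_PATH)

served = joblib.load(MODEL_PATH)
print(served.predict_proba(X.head()))
```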

Published Date : Feb 9 2017


Joel Cumming, Kik - Spark Summit East 2017 - #SparkSummit - #theCUBE


 

>> Narrator: Live from Boston, Massachusetts this is the Cube, covering Spark Summit East 2017 brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody, where it's a blizzard outside and a blizzard of content coming to you from Spark Summit East, #SparkSummit. This is the Cube, the worldwide leader in live tech coverage. Joel Cumming is here. He's the head of data at Kik. Kicking butt at Kik. Welcome to the Cube. >> Thank you, thanks for having me. >> So tell us about Kik, this cool mobile chat app. Checked it out a little bit. >> Yeah, so Kik has been around since about 2010. We're, as you mentioned, a mobile chat app, start-up based in Waterloo, Ontario. Kik really took off, really 2010 when it got 2 million users in the first 22 days of its existence. So was insanely popular, specifically with U.S. youth, and the reason for that really is Kik started off in a time where chatting through text cost money. Text messages cost money back in 2010, and really not every kid has a phone like they do today. So if you had an iPod or an iPad all you needed to do was sign up, and you had a user name and now you could text with your friends, so kids could do that just like their parents could with Kik, and that's really where we got our entrenchment with U.S. youth. >> And you're the head of data. So talk a little bit about your background. What does that mean to be a head of data? >> Yes, so prior to working at Kik I worked at Blackberry, and I like to say I worked at Blackberry probably around the time just before you bought your first Blackberry and I left just after you bought your first iPhone. So kind of in that range, but was there for nine years. >> Vellante: Can you do that with real estate? >> Yeah, I'd love to be able to do that with real estate. But it was a great time at Blackberry. It was very exciting to be part of that growth. When I was there, we grew from three million to 80 million customers, from three thousand employees to 17 thousand employees, and of course, things went sideways for Blackberry, but conveniently at the end Blackberry was working in BBM, and leading a team of data scientists and data engineers there. And BBM if you're not familiar with it is a chat app as well, and across town is where Kik is headquartered. The appeal to me of moving to Kik was a company that was very small and fast moving, but they actually weren't leveraging data at all. So when I got there, they had a pile of logs sitting in S3, waiting for someone to take advantage of them. They were good at measuring events, and looking at those events and how they tracked over time, but not really combining them to understand or personalize any experience for their end customers. >> So they knew enough to keep the data. >> They knew enough to keep the data. >> They just weren't sure what to do with it. Okay so, you come in, and where did you start? >> So the first day that I started that was the first day I used any AWS product, so I had worked on the big data tools at the old place, with Hadoop and Pig and Hive and Oracle and those kinds of things, but had never used an AWS product until I got there and it was very much sink or swim and on my first day our CEO in the meeting said, "Okay, you're data guy here now. "I want you to tell me in a week why people leave Kik." And I'm like, man we don't even have a database yet. The first thing I did was I fired up a Redshift cluster. 
First time I had done that, looked at the tools that were available in AWS to transform the data using EMR and Pig and those kinds of things, and was lucky enough, fortunate enough that I could figure that out in a week and I didn't give him the full answer of why people left, but I was able to give him some ideas of places we could go based on some preliminary exploration. So I went from leading this team of about 40 people to being a team of one and writing all the code myself. Super exciting, not the experience that everybody wants, but for me it was a lot of fun. Over the last three years I've built up the team. Now we have three data engineers and three data scientists and indeed it's a lot more important to people every day at Kik. >> What sort of impact has your team had on the product itself and the customer experience? >> So in the beginning it was really just trying to understand the behaviors of people across Kik, and that took a while to really wrap our heads around, and any good data analysis combines behaviors that you have to ask people their opinion on and also behaviors that we see them do. So I had an old boss that used to work at Rogers, which is a telecom provider in Canada, and he said if you ask people the things that they watch they tell you documentaries and the news and very important stuff, but if you see what they actually watch it's reality TV and trashy shows, and so the truth is really somewhere in the middle. There's an aspirational element. So for us really understanding the data we already had, instrumenting new events, and then in the last year and a half, building out an A/B testing framework is something that's been instrumental in how we leverage data at Kik. So we were making decisions by gut feel in the very beginning, then we moved into this era where we were doing A/B testing and very focused on statistical significance, and rigor around all of our experiments, but then stepping back and realizing maybe the bets that we have aren't big enough. So we need to maybe bet a little bit more on some bigger features that have the opportunity to move the needle. So we've been doing that recently with a few features that we've released, but data is super important now, both to stimulate creativity of our product managers as well as to measure the success of those features. >> And how do you map to the product managers who are defining the new features? Are you a central group? Are you sort of point guards within the different product groups? How does that work? You make evidence-based decisions or recommendations, but they make, ultimately, presumably, the decisions. What's the dynamic? >> So it's a great question. In my experience, it's very difficult to build a structure that's perfect. So in the purely centralized model you've got this problem of people are coming to you to ask for something, and they may get turned away because you're too busy, and then in the decentralized model you tend to have lots of duplication and overlap and maybe not sharing all the things that you need to share. So we tried to build a hybrid of both. And so we had our data engineers centralized and we tried doing what we called tours of duty, so our data scientists would be embedded with various teams within the company so it could be, it could be the core messenger team. It could be our bot platform team. It could be our anti-spam team.
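A quick aside on the "statistical significance and rigor" Joel mentions: for a conversion-style A/B test, that usually comes down to a two-proportion z-test on the control and variant counts. A minimal sketch using statsmodels, with the numbers invented for illustration:

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented data: conversions and sample sizes for control vs. variant.
conversions = [410, 480]
samples = [10000, 10000]

z_stat, p_value = proportions_ztest(conversions, samples)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

# A pre-registered threshold keeps the "gut feel" out of the readout.
if p_value < 0.05:
    print("Statistically significant at the 5% level; ship the variant.")
else:
    print("Not significant; keep collecting data or call it a wash.")
```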
And they would sit with them and it's very easy for product managers and developers to ask them questions and for them to give out answers, and then we would rotate those folks through a different tour of duty after a few months and they would sit with another team. So we did that for a while, and it worked pretty well, but one of the major things we found to be a problem was that there's no good checkpoint to confirm that what they're doing is right. So in software development you're releasing a version of software. There's QA, there's code review and there's structure in place to ensure that yes, this number I'm providing is right. It's difficult when you've got a data scientist who's out with a team for him to come back to the team and get that peer review. So now we're kind of reevaluating that. We use an agile approach, but we have primes for each of these groups but now we all sit together. >> So the accountability is after the data scientist made a recommendation that the product manager agrees with, how do you ensure that it measured up to the expectation? Like sort of after the fact. >> Yeah, so in those cases, with our A/B tests, it's nice to have that unbiased data resource on the team that's embedded with them that can step back and say yes, this idea worked, or it didn't work. So that's the approach that we're taking. It's not a dedicated resource, but a prime resource for each of these teams that's a subject matter expert and then is evaluating the results in an unbiased kind of way. >> So you've got this relatively small, even though it's quadruple the size when you started, data team and then application development team as sort of colleagues or how do you interact with them? >> Yeah, we're actually part of the engineering organization at Kik, part of R and D, and in different times in my life I've been part of different organizations whether it's marketing or whether it's I.T. or whether it's R and D, and R and D really fits nicely. And the reason why I think it's the best is because if there's data that you need to understand users more there's much more direct control over getting that element instrumented within a product that you have when you're part of R and D. If you're in marketing, you're like hey, I'd love to know how many times people tap on that red button, but no event fires when that red button is tapped. Good luck trying to get the software developers to put that in. But when there's an inherent component of R and D that's dependent on data, and data has that direct path to those developers, getting that kind of thing done is much easier. >> So from a tooling standpoint, thinking about data scientists and data engineers, a lot of the tools that we've seen in this so-called big data world have been quite bespoke. Different interfaces, different experience. How are you addressing that? Does Spark help with that? Maybe talk about that a bit more. >> Yeah, so I was fortunate enough to do a session today that sort of talked about data V1 at Kik versus data V2 at Kik, and we drew this kind of a line in the sand. So when I started it was just me. I'm trying to answer these questions very quickly on these three or five day timelines that we get from our CEO. >> Vellante: You've been here a week, come on! >> Yeah exactly, so you sacrifice data engineering and architecture when you're living like that. So you can answer questions very quickly. It worked well for a while, but then all of a sudden we come up and we have 300 data pipelines. They're a mess.
They're hard to manage and control. We've got code sometimes in SQL or sometimes in Python scripts, or sometimes on people's laptops. We have no real plan for GitHub integration. And then, you know, real scalability problems out of Redshift. We were doing a lot of our workloads in Redshift to do transformations just because, get the data into Redshift, write some SQL and then have your results. We're running into contention problems with that. So what we decided to do is sort of stop, step back and say, okay so how are we going to house all of this atomic data that we have in a way that's efficient. So we started with Redshift, our database was 10 terabytes. Now it's 100, except we get five terabytes of new data coming in per day, so putting that all in Redshift, it doesn't make sense. It's not all that useful. So if we cull that data under supervision, we don't want to get rid of the atomic data, how do we control that data under supervision. So we decided to go the data lake route, even though we hate the term data lake, but basically a folder structure within S3 that's stored in a query optimized format like Parquet, and now we can access that data very quickly at an atomic level, at a cleansed level and also at an aggregate level. So for us, this data V2 was the evolution of stopping doing a lot of things the way we used to do, which was lots of data pipelines, kind of code that was all over the place, and then aggregations in Redshift, and starting to use Spark, specifically Databricks. Databricks we think of in two ways. One is kind of managed Spark, so that we don't have to do all the configuration that we used to have to do with EMR, and then the second is notebooks that we can align with all the work that we're doing and have revision control and GitHub integration as well. >> A question to clarify, when you've put the data lake, which is the file system and then the data in Parquet format, or Parquet files, so this is where you want to have some sort of interactive experience for business intelligence. Do you need some sort of MPP server on top of that to provide interactive performance, or, because I know a lot of customers are struggling at that point where they got all the data there, and it's kind of organized, but then if they really want to munge through that huge volume they find it slows to slower than a crawl. >> Yeah, it's a great point. And we're at the stage right now where our data lake at the top layer of our data lake where we aggregate and normalize, we also push that data into Redshift. So Redshift what we're trying to do with that is make it a read-only environment, so that our analysts and developers, so they know they have consistent read performance on Redshift, where before when it's a mix of batch jobs as well as read workload, they didn't have that guarantee. So you're right, and we think what will probably happen over the next year or so is the advancements in Spark will make it much more capable as a data warehousing product, and then you'd have to start to question do I need both Redshift and Spark for that kind of thing? But today I think some of the cost-based optimizations that are coming, at least the promise of them coming I would hope that those would help Spark becoming more of a data warehouse, but we'll have to see. >> So carry that thread a little further through. I mean in terms of things that you'd like to see in the Spark roadmap, things that could be improved. What's your feedback to Databricks?
>> We're fortunate, we work with them pretty closely. We've been a customer for about half a year, and they've been outstanding working with us. So structured streaming is a great example of something we worked pretty closely with them on. We're really excited about it. We don't have, you know we have certain pockets within our company that require very real-time data, so obviously your operational components. Are your servers up or down, as well as our anti-spam team. They require very low latency access to data. We haven't typically, if we batch every hour that's fine in most cases, but structured streaming when our data streams are coming in now through Kinesis Firehose, and we can process those without having to worry about checking to see if it's time we should start this or is all the data there so we can run this batch. Structured streaming solves a lot of those, it simplifies a lot of that workload for us. So that's something we've been working with them on. The other things that we're really interested in. We've got a bit of a list, but the other major ones are how do you start to leverage this data to use it for personalization back in the app? So today we think of data in two ways at Kik. It's data as KPIs, so it's like the things you need to run your business, maybe it's A/B testing results, maybe it's how many active users you had yesterday, that kind of thing. And then the second is data as a product, and how do you provide personalization at an individual level based on your data science models back out to the app. So we do that, I should point out at Kik we don't see anybody's messages. We don't read your messages. We don't have access to those. But we have the metadata around the transactions that you have, like most companies do. So that helps us improve our products and services under our privacy policy to say okay, who's building good relationships and who's leaving the platform and why are they doing it. But we can also service components that are useful for personalization, so if you've chatted with three different bots on our platform that's important for us to know if we want to recommend another bot to you. Or you know the classic 'people you may know' recommendations. We don't do that right now, but behind the scenes we have the kind of information that we could help personalize that experience for you. So those two things are very different. In a lot of companies there's an R and D element, like at Blackberry, the app world recommendation engine was something that there was a team that ran in production but our team was helping those guys tweak and tune their models. So it's the same kind of thing at Kik where we can build, our data scientists are building models for personalization, and then we need to service them back up to the rest of the company. And the process right now of taking the results of our models and then putting them into a real time serving system isn't that clean, and so we do batches every day on things that don't need to be near real-time, so things like predicted gender. If we know your first name, we've downloaded the list of baby names from the U.S. Social Security website and we can say the frequency of the name Pat 80 percent of the time it's a male, and 20 percent it's a female, but Joel is 99 percent of the time it's male and one percent of the time it's a female, so based on your tolerance for whatever you want to use this personalization for we can give you our degrees of confidence on that.
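That predicted-gender feature is a clean example of a simple frequency model: each first name maps to a label plus a confidence derived from public name counts. A toy sketch of the idea, with counts made up to match the Pat and Joel numbers above (the real input Joel cites is the U.S. Social Security baby-name data):

```python
# Made-up counts per name; the real input would be the SSA name files.
name_counts = {
    "pat":  {"M": 40_000, "F": 10_000},
    "joel": {"M": 99_000, "F": 1_000},
}

def predicted_gender(first_name):
    """Return (label, confidence) for a first name, or (None, 0.0)."""
    counts = name_counts.get(first_name.lower())
    if counts is None:
        return None, 0.0            # unseen name: make no prediction
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total

print(predicted_gender("Pat"))      # ('M', 0.8)
print(predicted_gender("Joel"))     # ('M', 0.99)
```

The caller can then apply whatever confidence threshold the downstream use tolerates, which is exactly the "degrees of confidence" point Joel makes.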
That's one example of what we surface right now in our API back to our own first-party components of our app. But in the future with more real-time data coming in from Spark streaming with more real-time model scoring, and then the ability to push that over into some sort of capability that can be surfaced up through an API, it gives our data team the capability of being much more flexible and fast at surfacing things that can provide personalization to the end user, as opposed to what we have now which is all this batch processing and then loading once a day and then knowing that we can't react on the fly. >> So if I were to try and turn that into a sort of a roadmap, a Spark roadmap, it sounds like the process of taking the analysis and doing perhaps even online training to update the models, or just rescoring if you're doing a little slightly less fresh, but then serving it up from a high speed serving layer, that's when you can take data that's coming in from the game and send it back to improve the game in real time. >> Exactly. Yep. >> That's what you're looking for. >> Yeah. >> You and a lot of other people. >> Yeah I think so. >> So how's the event been for you? >> It's been great. There's some really smart people here. It's humbling when you go to some of these sessions and you know, we're fortunate where we try and not have to think about a lot of the details that people are explaining here, but it's really good to understand them and know that there are some smart people that are fixing these problems. As with all events, there have been some really good sessions, but the networking is amazing, so meeting lots of great people here, and hearing their stories too. >> And you're hoping to go to the hockey game tonight. >> Yeah, I'd love to go to the hockey game. See if we can get through the snow. >> Who are the Bruins playing tonight? >> San Jose. >> Oh, good. >> It could be a good game. >> Yeah, the rivalry. You guys into the hockey game? Alright, good. Alright, Joel, listen, thanks very much for coming on the Cube. Great segment. I really appreciate your insights and sharing. >> Okay, thanks for having me. >> You're welcome. Alright, keep it right there, everybody. George and I will be back right after this short break. This is the Cube. We're live from Spark Summit in Boston.
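One more aside, on the data-lake layering Joel described earlier, raw atomic data refined into a cleansed level and then aggregates, all as "a folder structure within S3 stored in a query-optimized format like Parquet." A hedged PySpark sketch of that shape, with bucket names, fields, and partitioning choices invented for illustration:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lake-layers").getOrCreate()

# Atomic layer: raw event logs as they landed in S3 (illustrative bucket).
raw = spark.read.json("s3a://example-logs/events/")

# Cleansed layer: de-duplicated, typed, partitioned by day for cheap scans.
clean = (raw.dropDuplicates(["event_id"])
            .withColumn("day", F.to_date("event_ts")))
(clean.write.mode("overwrite")
      .partitionBy("day")
      .parquet("s3a://example-lake/cleansed/events/"))

# Aggregate layer: the small, query-ready slice; this is the level that
# also gets pushed into Redshift as a read-only warehouse.
daily = clean.groupBy("day", "event_type").count()
daily.write.mode("overwrite").parquet("s3a://example-lake/aggregates/daily/")
```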

Published Date : Feb 9 2017


John Landry, HP - Spark Summit East 2017 - #SparkSummit - #theCUBE


 

>> Live from Boston, Massachusetts, this is the CUBE, covering Spark Summit East 2017 brought to you by Databricks. Now, here are your hosts Dave Vellante and George Gilbert. >> Welcome back to Boston everyone. It's snowing like crazy outside, it's a cold mid-winter day here in Boston but we're here with the CUBE, the world-wide leader in tech coverage. We are live covering Spark Summit. This is wall to wall coverage, this is our second day here. John Landry with us, he's the distinguished technologist for HP's personal systems data science group within Hewlett Packard. John, welcome. >> Thank you very much for having me here. >> So I was saying, I was joking, we do a lot of shows with HPE, it's nice to have HP back on the CUBE, it's been a while. But I want to start there. The company split up just over a year ago and it's seemingly been successful for both sides but you were describing to us that you've gone through an IT transformation of sorts within HP. Can you describe that? >> In the past, we were basically a data warehousing type of approach with reporting and what have you coming out of data warehouses, using Vertica, but recently, we made an investment into more of a programming platform for analytics and so our transformation to the cloud is about that, where we're basically, instead of investing into our own data centers because really, with the split, our data centers went with Hewlett Packard Enterprise, is that we're building our software platform in the cloud and that software platform includes analytics and in this case, we're building big data on top of Spark and so that transformation is huge for us, but it's also enabled us to move a lot faster, the velocity of our business and to be able to match up to that better. Like I said, it's mainly around the software development really more than anything else. >> Describe your role in a little bit more detail inside of HP. >> My role is I'm the leader in our big data investments and so I've been leading teams internally and also collaborating across HP with our print group and what we've done is we've managed to put together a strategy around our cloud-based solution to that. One of the things that was important was we had a common platform because when you put a programming platform in place, if it's not common, then we can't collaborate. Our investment could be fractured, we could have a lot of side little efforts going on and what have you so my role is to provide the leadership and direction for that and also one of the reasons I'm here today is to get involved in the Spark community because our investment is in Spark so that's another part of my role is to get involved with the industry and to be able to connect with the experts in the industry so we can leverage off of that because we don't have that expertise internally. >> What are the strategic and tactical objectives of your analytics initiatives? Is it to get better predictive maintenance on your devices? Is it to create new services for customers? Can you describe that? >> It's two-fold, internal and external so internally, we got millions of dollars of opportunity to better our products with cost, also to optimize our business models and the way we can do that is by using the data that comes back from our products, our services, our customers, combining that together and creating models around that that are then automated and can be turned into apps that can be used internally by our organizations.
The second part is to take the same approach, same data, but apply that back towards our customers and so with the split, our enterprise services group also went with Hewlett Packard Enterprise and so now, we have a dedicated effort towards creating managed services for the commercial environment. And that's both on the print side and on the personal system side so to basically fuel that, analytics is a big part of the story. So we've had different things that you'll see out there like touch point manager is one of our services we're delivering in personal systems. >> Dave: What is that? >> Touch point manager is aimed at providing management services for SMB and for commercial environments. So for instance, in touch point manager, we can provide predictive type of capabilities for support. A number of different services that companies are looking for when they buy our products. Another thing we're going after too is device as a service. So there's another thing that we've announced recently that basically we're invested into there and so this is obviously if you're delivering devices as a service, you want to do that as optimally as possible. Well, being able to understand the devices, what's happening with them, being able to do predictive support on them, being able to optimize the usage of those devices, that's all important. >> Dave: A lot of data. >> The data really helps us out, right? So the data that we can collect back from our devices and to be able to take that and turn that around into applications that are delivering information inside or outside is huge for us, a huge opportunity. >> It's interesting where you talk about internal initiatives and managed services, which sound like they're most external, but on the internal ones, you were talking about taking customer data and internal data and turning those into live models. Can you elaborate on that? >> Sure, I can give you a great example: on our mobile products, they all have batteries. All of our batteries are instrumented as smart batteries and that's an industry standard but HP actually goes a step further on that, it's the information that we put into our batteries. So by monitoring those batteries and the usage in the field is we can tell how optimally they're performing, but also how they're being used and how we can better design batteries going forward. So in addition, we can actually provide information back into our supply chain. For instance, there's a cell supplier for the battery, there's a pack supplier, there's our unit manufacturer for the product, and so a lot of things that we've been able to uncover is that we can go and improve process. And so improving process alone helps to improve the quality of what we deliver and the quality of the experience to our customers. So that's one example of just using the data, turning that around into a model. >> Is there an advantage to having such high volume, such market share in getting not just more data, but sort of more of the bell curve, so you get the edge conditions? >> Absolutely, it's really interesting because when we started out on this, everybody's used to doing reporting which is absolute numbers and how much did you ship and all that kind of stuff. But, we're doing big data, right? So in big data, you just need a good sample population. Turn the data scientists loose on that and they've got their statistical algorithms against that.
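The smart-battery telemetry John describes lends itself to simple fleet-level roll-ups once the reports land in one place. A small pandas sketch, with the schema and every number invented for illustration (HP's actual battery fields are not spelled out here):

```python
import pandas as pd

# Invented schema: one row per battery health report from the field.
reports = pd.DataFrame({
    "cell_supplier":    ["A", "A", "B", "B", "B"],
    "cycle_count":      [120, 300, 150, 500, 480],
    "full_charge_mwh":  [48000, 44000, 47500, 39000, 40500],
    "design_mwh":       [50000, 50000, 50000, 50000, 50000],
})

# Capacity retention is a standard battery-health measure.
reports["retention"] = reports["full_charge_mwh"] / reports["design_mwh"]

# Rolling the fleet up by cell supplier is what lets findings flow back
# into the supply chain, as described above.
print(reports.groupby("cell_supplier")["retention"].agg(["mean", "min"]))
```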
They give you the confidence factor based upon the data that you have so it's absolutely a good factor for us because we don't have to see all the platforms out there. Then, the other thing is, when you look at populations, we see variances in different customers so we're looking at, like one of our populations that's very valuable to us is our own, so we take the 60 thousand units that we have internally at HP and that's one of our sample populations. What better way to get information on your own products? But, you take that and you take it to one of our other customers and their population's going to look slightly different. Why? Because they use the products differently. So one of the things is just usage of the products, the environment they're used in, how they use them. Our sample populations are great in that respect. Of course, the other thing is, very important to point out, we only collect data under the rules and regulations that are out there, so we absolutely follow that and we absolutely keep our data secure and we absolutely keep everything and that's important. Sometimes today they get a little bit spooked around that, but the case is that our services are provided based on customers signing up for them. >> I'm guessing you don't collect more data than Google. >> No, we're nowhere near Google. >> So, if you're not spooked at Google - >> That's what I tell people. I say if you got a smartphone, you're giving up a lot more data than we're collecting. >> Buy something from Amazon. Spark, where does Spark fit into all of this? >> Spark is great because we needed a programming platform that could scale in our data centers and in our previous approaches, we didn't have a programming platform. We started with Hadoop; Hadoop was very complex though. It really gets down to the hardware and you're programming and trying to distribute that load and getting clusters and you pick up Spark and immediately you get abstraction. The other thing is it allows me to hire people that can actually program on top of it. I don't have to get someone that knows MapReduce. I can sit there and it's like what do you know? You know R, Scala, you know Python, it doesn't matter. I can run all of that on top of it. So that's huge for us. The other thing is flat out the speed because as you start getting going with this, we get this pull all of a sudden. It's like well I only need the data like once a month, it's like I need it once a week, I need it once a day, I need the output of this by the hour now. So, the scale and the speed of that is huge and then when you put that on the cloud platform, you know, Spark on a cloud platform like Amazon, now I've got access to all the compute instances. I can scale that, I can optimize it because I don't always need all the power. The flexibility of Spark and being able to deliver that is huge for our success. >> So, I've got to ask some Columbo questions and George, maybe you can help me sort of frame it. So you mentioned you were using Hadoop. Like a lot of Hadoop practitioners, you found it very complex. Now, Hewlett Packard has resources. Many companies don't but so you mentioned people out doing Python and R and Scala and MapReduce, are you basically saying okay, we're going to unify portions of our Hadoop complexity with Spark and that's going to simplify our efforts? >> No, what we actually did was we started on the Hadoop side of it.
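On the "confidence factor" from a sample population: that is a standard interval estimate on a proportion. A small sketch using the 60-thousand-unit internal fleet John mentions as the sample size, with the failure count invented:

```python
from statsmodels.stats.proportion import proportion_confint

failures, units = 180, 60000        # failure count is invented
rate = failures / units
low, high = proportion_confint(failures, units, alpha=0.05, method="wilson")
print(f"estimated failure rate: {rate:.3%} (95% CI {low:.3%} to {high:.3%})")
```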
The first thing we did was try to move from a data warehouse to more of a data lake approach or repository and that was internal, right? >> Dave: And that was a cost reduction? >> That was a cost reduction but also, data accessibility. >> Dave: Yeah, okay. >> The other thing we did was ingesting the data. When you're starting to bring data in from millions of devices, we had a problem coming through the firewall type approach and you got to have something in front of that like a Kafka or something in front of it that can handle it. So when we moved to the cloud, we didn't even try to put up our own, we just used Kinesis so that we didn't have to spend any resources to go solve that problem. Well, the next thing was, when we got the data, you need to ingest the data in and our data's coming in, we want to split it out, we needed to clean it and what have you, we actually started out running Java and then we ran Java on top of Hadoop, but then we came across Spark and we said that's it. For us to go to the next step of actually really getting into Hadoop, we were going to have to get some more skills and to find the skills to actually program in Hadoop was going to be complex. And to train them organically was going to be complex. We got a lot of smart people, but- >> Dave: You got a lot of stuff to do, too. >> That's the thing, we wanted to spend more time getting information out of the data as opposed to the framework of getting it to run and everything. >> Dave: Okay, so there's a lot of questions coming out. You mentioned Kinesis, so you've replaced that? >> Yeah, when we went to the cloud, we used as many Amazon services as we can as opposed to growing something for ourselves so when we get onto Amazon, you know, getting data into an S3 bucket through Kinesis was a no-brainer. When we transferred over to the cloud, it took us less than 30 days to point our devices at Kinesis and we had all our data flowing into S3. So that was like wow, let's go do something else. >> So I got to ask you something else. Again, I love when practitioners come on. So, one of the complaints that I hear sometimes from AWS users and I wonder if you see this is the data pipeline is getting more and more complex. I got an API for Kinesis, one for S3, one for DynamoDB, one for Elastic Plus. There must be 15 proprietary APIs that are primitive, and again, it gets complicated and sometimes it's hard to even figure out what's the right cost model to use. Is that increasingly becoming more complex or is it just so much simpler than what you had before and you're in nirvana right now? >> When you mentioned costs, just the cost of moving to the cloud was a major cost reduction for us. >> Reduction? >> So now it's - >> You had that HP corporate tax on you before - >> Yeah, now we're going from data centers and software licenses. >> So that was a big win for you? >> Yeah, huge, and that freed us up to go spend dollars on resources to focus on the data science aspect. So when we start looking at it, we continually optimized, don't get me wrong. But, the point is, if we can bring it up real quickly, that's going to save us a lot of money even if you don't have to maintain it. So we want to focus on creating the code inside of Spark that's actually doing the real work as opposed to the infrastructure. So that cost savings was huge. Now, when you look at it over time, we could've over analyzed that and everything else, but what we did was we used a rapid prototyping approach and then from there, we continued to optimize.
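The ingestion path John describes, devices pointed at Kinesis with data landing in S3, needs very little device-side plumbing. A hedged boto3 sketch of a telemetry put through Kinesis Firehose, with the region, stream name, and payload fields invented for illustration:

```python
import json
import boto3

# In production the region and credentials would come from provisioning.
firehose = boto3.client("firehose", region_name="us-west-2")

record = {
    "device_id": "example-1234",            # invented identifier
    "metric": "battery_full_charge_mwh",
    "value": 44000,
}

# Firehose buffers and delivers straight into an S3 bucket, which is why
# no receiving service had to be built or managed on the other side.
firehose.put_record(
    DeliveryStreamName="example-telemetry",  # invented stream name
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```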
So what's really good about the cloud is you can predict the cost and with internal data centers and software licenses and everything else, you can't predict the cost because everybody's trying to figure out who's paying for what. But in the case of the cloud, it's all pretty much you get your bill and you understand what you're paying. So anyway - >> And then you can adjust accordingly? >> We continue to optimize so we use the services but if for some reason something is going to deliver us an advantage, we'll go develop it. But right now, our advantage is we got umpteen opportunities to create AI-type code and applications to basically automate these services, we don't even have enough resources to do it right now. But, the common programming platform's going to help us. >> Can you drill into those umpteen examples? Just some of them because - >> I mentioned the battery one for instance. So take that across the whole system so now you've got your storage devices, you've got your software that's running on there, we've got security monitoring built into our system at the firmware level, so basically connecting into that and adding AI around that is huge because now we can see attacks that may be happening upon your fleet and we can create services out of that. Anything that you can automate around that is money in our pocket or money in our customers' pocket so if we can save them money with these new services, they're going to be more willing to come to HP for products. >> It's actually more than just automation because it's the stuff you couldn't do with 1,000 monkeys trying to write Shakespeare. You have data that you could not get before. >> You're right, what we're doing, the automation is helping us uncover things that we would've never seen and you're right, the whole gorilla walking through the room, I could sit there and I could show you tons of examples of where we're missing the boat. Even when we brought up our first data sets, we started looking at them and some of the stuff we looked at, we thought this is just bad data and actually it wasn't, it was bad product. >> People talk about dark data - >> We had no data models, we had no data model to say is it good or bad? And now we have data models and we're continuing to create those data models around, you create the data model and then you can continue to teach it and that's where we create the apps around it. Our primitives are the data models that we're creating from the device data that we have. >> Are there some of these apps where some of the intelligence lives on the device and it can, like in a security attack, it's a big surface area, you want to lock it down right away. >> We do. The good example on the security is we built something into our products called Sure Start. What essentially it is is we have the ability to monitor the firmware layer and so there's a local process that's running independent of everything else that's running that's monitoring what's happening at that firmware level. Well, if there's an attack, it's going to immediately prevent the attack or recover from the attack. Well, that's built into the product. >> But it has to have a model of what this anomalous behavior is. >> Well in our case, we're monitoring what the firmware should look like and if we see that the firmware, you know you take checksums from the firmware or the pattern - >> So the firmware does not change? >> Well basically we can take the characteristics of the firmware and monitor it.
If we see that changing, then we know something's wrong. Now it can get corrupt through hardware failure maybe because glitches can happen maybe. I mean solar flares can cause problems sometimes. So, the point is we've found that customers had problems sometimes where basically their firmware would get corrupted and they couldn't start their system. So we're like are we getting attacked? Is this a hardware issue? Could it be bad Flash devices? There's always all kinds of things that could cause that. Well now we monitor it and we know what's going on. Now, the other cool thing is we create logs from that so when those events occur, we can collect those logs and we're monitoring those events so now we can have something monitor the logs that are monitoring all the units. So, if you've got millions of units out there, how are you going to do that manually? You can't and that's where the automation comes in. >> So the logs give you the ability up in the cloud or at HP to look at the ecosystem of devices, but there is intelligence down on the - >> There's intelligence to protect the device and auto-recover, which is really cool. So in the past, you had to get a repair. Imagine if someone attacked your fleet of notebooks. Say you got 10 thousand of them and basically it brought every single one of them down one day. What would you do? >> Dave: Freak. >> And everything you got to replace. It was just an attack and it could happen so we basically protect against that with our products and at the same time, we can see that it may be occurring and then from the footprints of it, we can then do analysis on it and determine was that malicious, is this happening because of a hardware issue, is this happening because maybe we tried to update the firmware and something happened there? What caused that to happen? And so that's where collecting the data from the population then helps us do that and then mix that with other things like service events. Are we seeing service events being driven by this? Thermal, we can look at the thermal data. Maybe there's some kind of heat issue that's causing this to happen. So we started mixing that. >> Did Samsung come calling to buy this? >> Well, actually what's funny is Samsung is actually a supplier of ours, is a battery supplier of ours. So, by monitoring the batteries, what's interesting is we're helping them out because we go back to them. One of the things I'm working on is we want to create apps that can go back to them and they can see the performance of their product that they're delivering to us. So instead of us having to call a meeting and saying hey guys let's talk about this, we've got some problems here. Imagine how much time that takes. But if they can self-monitor, then they're going to want to keep supplying to us, then they're going to better their product. >> That's huge. What a productivity boost because you're like hey, we got a problem, let's meet and talk about it and then you take an action to go and figure out what it is. Now if you need a meeting, it's like let's look at the data. >> Yeah, you don't have enough people. >> But there's also potentially a shift in pricing power. I would imagine it shifts a little more in your favor if you have all the data that indicates the quality of their product. >> That's an interesting thing. I don't know that we've reached that point. I think that in the future, it would be something that could be included in the contracts.
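Returning to the firmware monitoring John outlined: at its core the check compares a measured digest of the firmware against a known-good one and logs the result so fleet-wide automation can watch the event stream. A simplified sketch of that generic technique (this is not HP's Sure Start implementation; the paths and digest are placeholders):

```python
import hashlib
import json
import time

KNOWN_GOOD_SHA256 = "placeholder-digest-of-shipped-firmware"

def check_firmware(image_path):
    """Hash the firmware image and log whether it matches the golden digest."""
    with open(image_path, "rb") as f:
        measured = hashlib.sha256(f.read()).hexdigest()
    event = {
        "ts": time.time(),
        "measured": measured,
        "ok": measured == KNOWN_GOOD_SHA256,
    }
    # Events like this, collected across millions of units, are what let
    # automated monitoring separate attacks from, say, flash corruption.
    with open("/var/log/fw_integrity.log", "a") as log:
        log.write(json.dumps(event) + "\n")
    return event

result = check_firmware("/boot/firmware.bin")
if not result["ok"]:
    print("Firmware mismatch detected; trigger recovery from the golden copy.")
```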
The fact that the world is the way it is today and data is a big part of that to where going forward, absolutely, the fact that you have that data helps you to better have a relationship with your suppliers. >> And your customers, I mean it used to be that the brand used to have all the information. The internet obviously changed all that, but this whole digital transformation and IoT and all that log data, that sort of levels the playing field back to the brand. >> John: It actually changes it. >> You can now add value for the consumer that you couldn't before. >> And that's what HP's trying to do. We're invested to do exactly that, to really improve or increase the value of our brand. We have a strong brand today but - >> What do you guys do with - we got to wrap - but what do you do with Databricks? What's the relationship there? >> Databricks, again we decided that we didn't want to be the experts on managing the whole Spark thing. The other part was that we're going to be involved with Spark and help them drive the direction as far as our use cases and what have you. Databricks and Spark go hand in hand. They got the experts there and it's been huge, our relationship, being able to work with these guys. But I recognize the fact that, and going back to software development and everything else, we don't want to spare resources on that. We got too many other things to do and the less that I have to worry about my Spark code running and scaling and the cost of it and being able to put code in production, the better and so, having that layer there is saving us a ton of money and resources and a ton of time. Just imagine time to market, it's just huge. >> Alright, John, sorry we got to wrap. Awesome having you on, thanks for sharing your story. >> It's great to talk to you guys. >> Alright, keep it right there everybody. We'll be back with our next guest. This is the CUBE live from Spark Summit East, we'll be right back.

Published Date : Feb 9 2017


Manish Gupta, Redis Labs | Spark Summit East 2017


 

>> Announcer: Live from Boston, Massachusetts, it's theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts Dave Vellante and George Gilbert. >> Welcome back to snowy Boston, everybody. This is theCUBE, the leader in live tech coverage. We're here at Spark Summit East, hashtag SparkSummit. Manish Gupta is here, he's the CMO at Redis Labs. Manish, welcome to theCUBE. >> Thank you, good to be here. >> So, you know, 10 years ago you say you're in the database business and everybody would yawn. Now you're the life of the party. >> Yeah, the world has changed. I think the party has lots and lots of players. We are happy to be on the top of that heap. >> It is a crowded space, so how does Redis Labs differentiate? >> Redis Labs is the company behind the massively popular open source Redis, and Redis became popular because of its performance primarily, and then simplicity. Developers could very easily run up an instance of Redis, solve some very hairy problems, and time to market was a big issue for them. Redis Enterprise took that forward and enabled it to be mission critical, ready for the largest workloads, ready for things that the enterprises need in a highly distributed clustered environment. So they have resilience and they benefit from the performance of Redis. >> And your claim to fame, as you say, is that top-gun performance, you guys will talk about some of the benchmarks later. We're talking about use cases like fraud detection, as an example. Obviously ad serving would be another one. But add some color to that if you would. >> Wherever you need to make real time real, Redis plays a very important role. It is able to deliver millions of operations per second with sub-millisecond latency, and that's the hallmark. With the data structures that comprise Redis, you can solve the problems in a way, and the reason you can get that performance is because the data structures take some very complex issues and simplify the operation. Depending on the use case, you could use one of the data structures, you can mix and match the data structures, so that's the power of Redis. We're used for IoT, for machine learning, for metering of billing in a telecommunications environment, for personalization, for ad serving with companies like Groupon and others, and the list goes on and on. >> Yeah, you've got a big list on your website of all your customers, so you can check that out. Let's get the business model piece out of the way. Everybody's always fascinated. Okay, you got open source, how do you make money? How does Redis make money? >> Yeah, you know, we believe strategically fostering the growth of open source is foundational in our business model, and we invest heavily in both R&D and marketing to do that. On top of that, to enable enterprise success and deployment of Redis, we have the mission critical, highly available Redis Enterprise offerings. Our monetization is entirely based on the Redis Enterprise platform, which takes advantage of the data structures and performance of core Redis, but layers on top management and the capabilities that make things like auto-recovery, auto-sharding, management much, much easier for the enterprise. We make that available in four deployment models. The enterprise can select us as Redis cloud, which runs on a public infrastructure on any of the four major platforms. We also allow for the enterprise to select a VPC environment in their own private clouds.
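The metering use case Manish mentions shows why the data structures matter: an atomic counter with a TTL meters a billable event in a single sub-millisecond round trip. A minimal redis-py sketch, with the endpoint and key scheme invented for illustration:

```python
import time
import redis

r = redis.Redis(host="localhost", port=6379)  # illustrative endpoint

def record_api_call(customer_id, window_seconds=3600):
    """Meter one billable call in the current time window, atomically."""
    window = int(time.time()) // window_seconds
    key = f"meter:{customer_id}:{window}"
    count = r.incr(key)                    # single atomic operation in Redis
    if count == 1:
        r.expire(key, window_seconds * 2)  # let old windows age out
    return count

print("calls this window:", record_api_call("acct-42"))
```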
They can also get software and self-manage that, or get our software and we can manage it for them. Those four deployment options are the modalities through which the enterprise customers help us monetize. >> When you said four major platforms, you meant cloud platforms? >> That's right. AWS, >> So, AWS, Azure >> Azure, Google, and IBM. >> So IBM got in there as the fourth, alright. >> That's right, all four. >> Go to the whip, IBM. Go ahead, George. >> Along the lines of the business model, and we were sort of starting to talk about this earlier offline, you're just one component in building an application, and there's always this challenge of, well, I can manage my component better than anyone else, but it's got to fit with a bunch of other vendors' components. How do you make that seamless to the customer so that it's not defaulting over to a cloud vendor who has to build all the components themselves to make it work together? >> Certainly, you know, the database is an integral part of your stack, of your application stack, but it is a stack, so there are other components. Redis and Redis Labs have a very, very large ecosystem within which we operate. We work closely with others for interfaces, for connectors, for interoperability, and that's a sustained environment that we invest in on a continuous basis. >> How do you handle application consistency? A lot of times in the NoSQL world, even in the AWS world, you hear about eventual consistency, but in the real-time world, there's a need for more rigorous, what's your philosophy there, how do you approach that? >> I think that's an issue that many NoSQL vendors have not been able to crack. Redis Labs has been at the forefront of that. We are taking an approach, and we are offering, what we call tuneable consistency. Depending on the economics and the business model and the use case, the needs for consistency vary. In some cases, you do need immediate consistency. In other cases, you don't ever need consistency. And to give that flexibility to the customer is very important, so we've taken the approach where you can go from loose consistency to what we call strong eventual consistency. That approach is based on a fairly well trusted architecture and approach called CRDT, the Conflict-free Replicated Data Type. That approach allows us to, regardless of what the cluster magnitude or the distribution looks like geographically, deliver strong eventual consistency, which meets the needs of the majority of customers. >> What are you seeing in terms of, you know, also in that, a discussion about ACID properties, and how many workloads really need ACID properties. What are you seeing now as you get more cloud native workloads and more NoSQL oriented workloads in terms of the requirement for those ACID properties? >> First of all, we truly believe and agree that not all environments require ACID support. Having said that, to be a truly credible database, you must support ACID, and we do. Redis is ACID-compliant, supports ACID, and Redis Labs certainly supports that. >> I remember being on a stage once with Curt Monash, I'm sure you know Curt, right? Very famous database person. And he basically had a similar answer. But you would say that increasingly there are workloads that, the growth workloads, don't necessarily require that, is that a fair statement? >> That's a fair statement I would say. >> Dave: Great, good.
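The CRDT approach Gupta mentions is worth unpacking for readers who haven't seen one. Below is a minimal sketch of a grow-only counter, the simplest CRDT, in Python. This is an illustration of the general idea only, not Redis Labs' implementation: each replica increments only its own slot, and merging takes the element-wise max, so replicas converge to the same value no matter how updates are ordered or how often state is exchanged.

class GCounter:
    """Grow-only counter CRDT -- a toy illustration, not Redis Labs' code."""

    def __init__(self, replica_id, replica_ids):
        self.replica_id = replica_id
        # One slot per replica; a replica only ever increments its own.
        self.counts = {r: 0 for r in replica_ids}

    def increment(self, amount=1):
        self.counts[self.replica_id] += amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so any exchange order yields the same converged state.
        for r, c in other.counts.items():
            self.counts[r] = max(self.counts.get(r, 0), c)

# Two geo-distributed replicas accept writes independently...
us = GCounter("us", ["us", "eu"])
eu = GCounter("eu", ["us", "eu"])
us.increment(3)
eu.increment(2)
# ...and agree after exchanging state in either direction.
us.merge(eu)
eu.merge(us)
assert us.value() == eu.value() == 5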
>> There's a trade-off, though, when you talked about strong eventual consistency, potentially you have to wait for, presumably, a quorum of the partitions, I'm getting really technical here, but in other words, you've got a copy of the data here-- >> Dave: Good CMO question. (laughing) >> But your value proposition to the customers is, we get this stuff done fast, but if you have to wait for a couple other servers to make sure that they've got the update, that can slow things way down. How does that trade-off work? >> I think that's part of the power of our architecture. We have a shared-nothing, single-proxy architecture where all of the replication, the disaster recovery, and the consistency management of the back end is handled by the proxy, and we ensure that the performance is not degraded when you are working through the consistency challenges, and that's where a significant amount of IP is, in the development of that proxy. >> I'll take that as a, let's go into it even more offline. >> Manish: Sounds good. >> And I have some other CMO questions, if I may. A lot of young companies like yours, especially in the open source world, when they go to get the word out, they rely on their community, their open source community, and that's the core, and that makes a lot of sense, it's their peeps. As you become, grow more into enterprise grade apps and workloads, how do you extend beyond that? What is Redis Labs doing to sort of reach that C-Suite, are you even trying to up-level your messaging to reach that C-Suite? How do you as a CMO deal with those challenges? >> Maybe I'll begin by talking about the personas that matter to us in the ecosystem. At the enterprise level, the architects, the developers, are the primary target, which we try to influence in the early part of the decision cycle, at the architectural level. The ultimate teams that manage, run, and operate the infrastructure are certainly the DevOps, or the operations teams, and we spend time there. All along, for some of the enterprise engagements, CIOs, chief data officers, and CTOs tend to play a very important role in the decisions and the selection process, and so, we do influence and interact with the C-Suite quite heavily. What the power of the open source gives us is that groundswell of love for Redis. Literally you can walk around a developer environment, such as the Spark Summit here, and you'll find people wearing Redis Geek shirts. And we get emails from Kazakhstan and strange places from all over the world where we don't necessarily have a salesforce, requesting t-shirts, "send us stickers." Because people love Redis, and the word of mouth, that ground level love for the technology, enables the decisions to be so much easier and smoother. We're not convincing, it's not a philosophical battle anymore. It's simply about the use case and the solution where Redis Enterprise fits or doesn't fit. >> Okay, so it really is that core developer community that are your advocates, and they're able to internally sell to the C-Suite. A lot of times the C-Suite, not the CTO so much, but certainly the CIO, CDO are like, "Yeah, yeah, they're geekin' out on some new hot thing. What's the business impact?" Do you get that question a lot, and how do you address it? >> I think then you get to some of the very basic tools, ROI calculators and the value proposition. For the C-level, the message is very simple. We are the least risky bet. We are the best long-term proposition, and we are the best cost answer for their implementation.
Particularly as the needs are increasingly becoming more real-time in nature, they are not batch-processed. Yes, there will always be some of that, but as the workloads are changing, there is a need for faster processing, there is a need for quick insights, and real-time is not a moniker anymore, right. Real-time truly needs to be delivered today. And so, I think those three propositions for the C-Suite are resonating very well. >> Let's talk about ROI calculators for a second. I love talking about it because it underscores what a company feels its core value proposition is. I would think with Redis Labs part of the value proposition is you are enabling new types of workloads and new types of, whether it's sources of revenue or productivity. And these are generally telephone numbers as compared to some of the cost savings head to head against your competition, which of course you want to stress as well because the CFO cares about the CapEx. What do you emphasize in that, and we don't have to get into the calculator itself, but in the conceptual model, what's the emphasis? Is it on those sort of business value attributes, is it on the sort of cost savings? How do you translate performance into that business value? A lot of questions there, but if you could summarize, that'd be great. >> Well, I think you can think of it in three dimensions. The very first one is, does the performance support the use case or the solution that is required? That's the very first one, and in our books, that's operations per second and the latency. The second piece is the cost side, and that has two components to it. The first component is, what are the compute requirements? So, what is the infrastructure underneath that has to support it? And the efficiency that Redis and Redis Enterprise have is dramatically superior to the alternatives. And so, the economics show up. To run a million operations per second, we can do that on two nodes, as opposed to the alternatives, which might need 50 nodes or 300 nodes. >> You can utilize your assets on the floor much better than maybe the competition can. >> This is where the data structures come into play quite a bit. That's one part of-- >> Dave: That's one part of the cost. >> Yeah. The other part of the cost is the human cost. >> Dave: People, yeah. >> And because, and this goes back to the open source, because of the people available with the talent and the competency and appreciation for Redis, it's easy to procure those people, and your cost of acquisition and deployment goes down quite a bit. So, there's a human cost to it. The third dimension to this whole equation is time to market. And time to market is measured in many ways. Is it lost revenue if it takes you longer to get there? And Redis consistently, from multiple analysts' reports, gets top ranking for the fastest way to get to market because of how simple it is. Beyond performance, simplicity is a second hallmark. >> That's a benefit acceleration, and you can quantify that. >> Absolutely, absolutely. And that's a revenue parameter, right. >> For years, people have been saying this Cambrian explosion of databases is unsustainable, and sort of in response we've gotten a squaring of the Cambrian explosion.
The question is, with your sort of very flexible, I don't want to get too geeky, 'cause Dave'll cut me off, but the idea that you can accommodate time series and all these different ways of, all these different types of data, are we approaching a situation where customers can start consolidating their database choices and have fewer vendors, fewer products in their landscape? >> I think not only are we getting there, but we must get there. You've got over 300 databases in the marketplace, and imagine a CIO or an architect trying to sort through that to make a decision, it's difficult, and you certainly cannot support it from a training standpoint or from an investment, CapEx, and all that standpoint. What we have done with Redis is introduce something called Redis Modules. We released that at the last RedisConf in May in San Francisco. And the Redis Module is a very simple concept but a very powerful one. It's an API which can be utilized to take an existing development effort, written in C/C++, and port it onto the Redis data structures. This gives you the flexibility, without having to reinvent the wheel every single time, to take that investment, port it on top of Redis, and you get the performance, and now Redis becomes a multi-model database. And I'm going to get to your question of how you address the multiple needs so you don't need multiple databases. To give you some examples, since the introduction of Redis Modules, we now have over 50 modules that have been published by a variety of places, not just Redis Labs. That indicates how simple and how powerful this model is. We took Lucene and developed the world's fastest full-text search engine as a module. We have very recently introduced Redis machine learning as a module that works with Spark ML and serves as a great serving layer in the machine learning domain. Just two very simple examples, but work that's being done ported over onto Redis data structures, and now you have the ability to do some very powerful things because of what Redis is. And this is the way the future's going to be. I think every database is trying to offer multi-functionality, to be multi-model in nature, but instead of doing it one step at a time, this approach gives us the ability to leverage the entire ecosystem. >> Your point being consolidation's inevitable in this business as well. >> Manish: Architectural consolidation. >> Yes, but also you would think, company consolidation, isn't that going to follow? What do you make of the market, and tell me, if you look back on the database market and what Oracle was able to achieve in the face of, maybe not as many players, but you had Sybase and Informix, and certainly DB2's still around, and SQL Server's still around, but Oracle won, and maybe it was SQL standards that. It's great to be lucky and good. Can we learn from that, or is this a whole different world? Are there similarities, and how do you see that consolidation potentially shaking out, if you agree that there will be consolidation? >> Yeah, there has to be, first and foremost, an architectural approach that solves the OPEX, CAPEX challenge for the enterprise. But beyond that, no industry can sustain the diversity and the fragmentation that exists in the database world. I think there will always be new things coming out, of universities particularly. There's great innovation and research happening, and that is required to augment.
But at the end of the day, the commercial enterprises cannot sustain the fragmented volume that we have today in the database world, so there is going to be some consolidation, and it's not unnatural. I think it's natural, it's expected, time will tell what that looks like. We've seen some of our competitors acquire smaller companies to add graph functionality, to add search functionality. We just don't think that's the level of consolidation that really moves the needle for the industry. It's got to be at a higher level of consolidation. >> I don't want to, don't take this the wrong way, don't hate me for saying it, but is Oracle sort of the enemy, if I can say that. I mean, it's like, no, okay. >> Depends how you define enemy. >> I'm not going to go do many of the workloads that you're talking about on Oracle, despite what Larry tells me at Oracle OpenWorld. And I'm not going to make Oracle my choice for any of the workloads that you guys are working on. I guess in terms, I mean, everybody who's in the database business looks at that and says, "Hey, we can do it cheaper, better, more productively," but, could you respond to that, and what do you make of Amazon's moves in the database world? Does that concern you? >> We think of Amazon and Oracle as two very different philosophies, if you can use that word. The approach we have taken is really a forward-looking approach and philosophy. We believe that the needs of the market need to be solved in new ways, and new ways should not be encumbered by old approaches. We're not trying to go and replicate what was done in the SQL world or in a relational database world. Our approach is, how do you deliver a multi-model database that has the real-time attribute attached to it, in a way that requires very limited compute horsepower and very few resources to manage? You take all of those things as kind of the core philosophy, which is a forward-looking philosophy. We are definitely not trying to replicate what an Oracle used to be. AWS I think is a very different animal. >> Dave: Interesting, though. >> They have defined the cloud, and I think play a very important role. We are a strong partner of theirs, much of our traffic runs on AWS infrastructure, certainly also on other clouds. I think AWS is one to watch in how they evolve. They have database offerings, including Redis offerings. However, we fully recognize, and the industry recognizes, that that's not to the same capability as Redis Enterprise. It's open source Redis managed by AWS, and that's fine as a cache, but you cannot persist, and you really cannot have a multi-model capability that's a full database in that approach. >> And you're in the marketplace. >> Manish: We are in the marketplace. >> Obviously. >> And actually, we announced earlier, a few weeks ago, that you can buy and get Redis Cloud access, which is Redis Enterprise in the cloud, on AWS through the integrated billing approach on their marketplace. You can have an AWS account and get our service, the true Redis Enterprise service. >> And as a software company, you'd figure, okay, the cloud infrastructure is a service, we don't care what infrastructure it runs on. Whatever the customer wants, but you see AWS making these moves up-market, you've got to obviously be paying attention to that. >> Manish: Certainly, certainly. >> Go ahead, last question. >> Interesting that you were saying that to solve this problem of proliferation of choice it has to be multi-model with speed and low resource requirements.
If I were to interpret that from an old-style database perspective, the multi-model is something you are addressing now, with the extensibility, but the speed would mean taking out that abstraction layer that was the query optimizer, sort of, and working almost at the storage layer, or having an option to do that. Would that be a fair way to say it? >> No, I don't think that necessarily needs to be the case. For us, speed translates from the simplicity and the power of the data structures. Instead of having to serialize and deserialize before you process data in a Spark context, or instead of having to look for data that is perhaps not put in sorted sets for a use case that you might be running a query on, if the data is already handled through one of the data structures, you now have a much faster query time, you now have the ability to reach the data in the right approach. And again, this is NoSQL, right, so it's schema-less on write and it sets your schema as you want it to be on read. We marry that with the data structures, and that gives you the ultimate speed. >> We have to leave it there, but Manish, I'll give you the last word. Things we should be paying attention to for Redis Labs this year, events, announcements? >> I think the big thing I would leave the audience with is RedisConf 2017. It's May 31 to June 2 in San Francisco. We are expecting over 1,000 people. The brightest minds around Redis and the database world will be there, and anybody who is considering deploying a next generation database should attend. >> Dave: Where are you doing that? >> It's at the Marriott Marquis in San Francisco. >> Great, is that on Howard Street, across from the--? >> It is right across from Moscone. >> Great, awesome location. People know it, easy to get to. Well, congratulations on the success. We'll be lookin' for outputs from that event, and hope to see you again on theCUBE. >> Thank you, enjoyed the conversation. >> Alright, good. Keep it right there, everybody, we'll be back with our next guest. This is theCUBE, we're live from Spark Summit East. Be right back. (upbeat electronic rock music)
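To make the closing point about data structures concrete, here is a hedged sketch using the open source redis-py client against a local Redis. A sorted set keeps members ordered by score on every write, so reading the top N is a single cheap read rather than a scan plus a sort; the key and member names here are made up, and a recent redis-py (the dict-style zadd signature) is assumed.

import redis

# Assumes a local Redis and a recent redis-py client; names are illustrative.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Writes: members are kept in score order as they arrive.
r.zadd("leaderboard", {"alice": 120, "bob": 95, "carol": 140})
r.zincrby("leaderboard", 10, "bob")  # bump bob to 105

# Read: the top three, highest first, in one round trip -- no scan, no sort.
print(r.zrevrange("leaderboard", 0, 2, withscores=True))
# [('carol', 140.0), ('alice', 120.0), ('bob', 105.0)]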

Published Date : Feb 9 2017


Bryan Duxbury, StreamSets | Spark Summit East 2017


 

>> Announcer: Live from Boston, Massachusetts. This is "The Cube" covering Spark Summit East 2017. Brought to you by Databricks. Now here are your hosts Dave Volante and George Gilbert. >> Welcome back to snowy Boston everybody. This is "The Cube." The leader in live tech coverage. This is Spark Summit. Spark Summit East #SparkSummit. Bryan Duxbury's here. He's the vice president of engineering at StreamSets. Cleveland boy! Welcome to "The Cube." >> Thanks for having me. >> You're very welcome. Tell us, let's start with StreamSets. We're going to talk about Spark and some of the use cases that it's enabling and some of the integrations you're doing. But what does StreamSets do? >> Sure, StreamSets is data movement software. So I like to think of it as either the first mile or the last mile of a lot of different analytical or data movement workflows. Basically we build a product that allows you to build a workflow, or build a data pipeline, that doesn't require you to code. It's a graphical user interface for dropping an origin, several destinations, and then lightweight transformations onto a canvas. You click play and it runs. So this is kind of different than, a lot of the market today is a programming tool or a command line tool. That still requires your systems engineers, or your unfortunate data scientists pretending to be systems engineers, to do systems engineering. To do a science project to figure out how to move data. The challenge of data movement, I think, is often underplayed: how challenging it is. But it's extremely tedious work. You know, you have to connect to dozens or hundreds of different data sources. Totally different schemas. Different database drivers, or systems altogether. And it breaks all the time. So the home-built stuff is really challenging to keep online. When it goes down, your business is not, you're not moving data. You can't actually get the insights you built in the first place. >> I remember I broke into this industry you know, in the days of mainframe. You used to read about them and they had this high-speed data mover. And it was this key component. And it had to be integrated. It had to be able to move, back then, it was large amounts of data fast. Today especially with the advent of Hadoop, people say okay don't move the data, keep it in place. Now that's not always practical. So talk about the sort of business case for starting a company that basically moves data. >> We handle basically the one step before. I agree with you completely. Many data analytical situations today where you're doing like the true, like business-oriented detail, where you're actually analyzing data and producing value, you can do it in place. Which is to say in your cluster, in your Spark cluster, all the different environments you can imagine. The problem is that if it's not there already, then it's a pretty monumental effort to get it there. I think we see. You know a lot of people think oh I can just write a SQL script, right? And that works for the first two to 20 tables you want to deploy. But for instance, in my background, I used to work at Square. I ran a data platform there. We had 500 tables we had to move on a regular basis. Coupled with a whole variety of other data sources. So at some point it becomes really impractical to hand-code these solutions. And even when you build your own framework, and you start to build tools internally, you know, it's not your job really, these companies, to build a world class data movement tool.
It's their job to make the data valuable, right? And actually data movement is like a utility, right. Providing the utility, really the thing to do is be productive and cost effective, right? So the reason why we built StreamSets, the reason why this thing is a thing in the first place, is because we think people shouldn't be in the business of building data movement tools. They should be in the business of moving their data and then getting on with it. Does that make sense? >> Yeah absolutely. So talk about how it all fits in with Spark generally and specifically Spark coming to the enterprise. >> Well in terms of how StreamSets connects to stuff, we deploy in every way you can imagine, whether you want to run on premise, on your own machines, or in the Cloud. It's up to you to deploy however you like. We're not prescriptive about that. We often get deployed on the edge of clusters, whether it's your Hadoop cluster or your Spark cluster. And basically we try not to get in the way of these analysis tools. There are many great analytical tools out there, like Spark is a great example. We focus really on the moving of data. So what you'll see is someone will build a Spark streaming application or some big Spark SQL thing that actually produces the reports. And we plug in ahead of that. So if your data is being collected from, you know, edge web logs, or some Kafka thing, or a third-party API, or a scraped website, we do the first collection. And then it's usually picked up from there with the next tool. Whether it's Spark or other things. I'm trying to think about the right way to put this. I think that people who write Spark, they should focus on the part that's like the business value for them. They should be doing the thing that actually is applying the machine learning model, or is producing the report that the CEO or CTO wants to see. And move away from the ingest part of the business. Does that make sense? >> [] Yeah. >> Yeah. When the Spark guys sort of aspire to that by saying you don't have to worry about exactly-once delivery. And you know you can make sure this sort of guarantee, you've got guarantees that it will get from point A to point B. >> Bryan: Yeah. >> Things like that. But all those sources of data and all those targets, writing all those adapters is, I mean, that's been a La Brea tar pit for many companies over time. >> In essence that is our business. I think that you touch on a good point. Spark can actually do some of these things, right. There's not complete, but significant overlap in some cases. But the important difference is that Spark is a cluster tool for working with cluster data. And we're not going to beat you running a Spark application for consuming from Kafka to do your analysis. But you want to use Spark for reading local files? Do you want to use Spark for reading from a mainframe? Like these are things that StreamSets is built for. And that library of connectors you're talking about, it's our bread and butter. It's not your job as a data scientist, you know, applying Spark, to build a library of connectors. So actually the challenge is not the difficulty of building any one connector, because we have that down to an art now. But we can afford to invest, we can build a portfolio of connectors. But you as a user of Spark can only afford to do it on demand. Reactive. And so that turnaround time, the cost it might take you to build that connector, is pretty significant. And actually I often see the flip side.
This is a problem I faced at Square, which was that when people asked me to integrate new data sources, I had to say no. Because it was too rare, it was too unusual for what we had to do. We had other things to support. So the problem with that is that I have no idea what kind of opportunity cost I left behind. Like what kind of data we didn't get, what kind of analysis we couldn't do. And with an approach like StreamSets, you can solve that problem sort of up front even. >> So sort of two follow ups. One is it would seem to be an evergreen effort to maintain the existing connectors. >> Bryan: Certainly. >> And two, is there a way to leverage connectors that others have built, like the Kafka Connect type stuff. >> Truthfully we are a heavy-duty user of open source software, so our actual product, if you dig in to what you see, it's a framework for executing pipelines. And it's for connecting other software into our product. So it's not like when we integrate Kafka we build a brand new blue-sky Kafka connector. We actually integrate what stuff is out there. So our idea is to bring as much of that stuff in there as we can. And really be part of the community. You know, our product is also open source. So we play well with the community. We have had people contribute connectors. People who say we love the product, we need it to connect to this other database. And then they do it for us. So it's been a pretty exciting situation. >> We were talking earlier off-camera, George and I have been talking all week about the batch workloads, interactive workloads, now you've got this sort of new emerging workload, continuous streaming workloads, which is in the name. What are you seeing there? And what kind of use cases is that enabling? >> So we're focused on mostly the continuous delivery workload. We also deliver the batch stuff. What we're finding is people are moving farther and farther away from batch in general. Because batch was not the goal, it was a means to the end. People wanted to get their data into their environment, so they could do their analysis. They want to run their daily reports, things like that. But ask any data scientist, they would rather the data show up immediately. So we're definitely seeing a lot of customers who want to do things like moving data live from a log file into Hadoop that they can read immediately, on the order of minutes. We're trying to do our best to enable those kinds of use cases. In particular we're seeing a lot of interest in the Spark arena, obviously that's kind of why we're here today. You know people want to add their event processing, or their aggregation, and analysis, like Spark, especially like Spark SQL. And they want that to be almost happening at the time of ingest. Not once it's landed, but like when it's happening. So we're starting to build integration. We have kind of our foot in the door there, with our Spark processor. Which allows you to put a Spark workflow right in the middle of your data pipeline. Or as many of them as you want, in fact. And we also manage the lifecycle of that. And do all those connections as required to make your pipeline pretend to have a Spark processor in the middle. We really think that with that kind of workload, you can do your ingest, but you can also capture your real-time analytics along the way. And that doesn't replace batch reporting, per se, that'll happen after the fact. Or your daily reports or what have you.
But it makes it that much easier for your data scientists to have, you know, a piece of intelligence that they had in flight. You know? >> I love talking to someone who's a practitioner now sort of working for a company that's selling technology. What do you see, from both perspectives, as Spark being good at? You know, what's the best fit? And what's it not good at? >> Well I think that Spark is following the arc of like Hadoop basically. It started out as infrastructure for engineers, for building really big scary things. But it's becoming more and more a productivity tool for analysts, data scientists, machine-learning experts. And we see that popping up all the time. And it's really exciting frankly, to think about these streaming analytics that can happen. These scoring machine-learning models. Really bringing a lot more power into the hands of these people who are not engineers. People who are much more focused on the semantic value of the data. And not the garbage in, garbage out value of the data. >> You were talking before about how it's really hard, data movement, and the data's not always right. Data quality continues to be a challenge. >> Bryan: Yeah. >> Maybe comment on that. The state of data quality and how the industry is dealing with that problem. >> It is hard, it is hard. I think that the traditional approach to data quality is to try and specify the quality up front. We take the opposite approach. We basically say that it's impossible to know that your data will be correct at all times. So we have what we call schema drift tools. So we take, we say, an intent-driven approach. We're interacting with your data. Rather than a schema-driven approach. So of course your data has an implicit schema as it's passing through the pipeline. Rather than saying, let's transform column three, we want you to use the name. We want you to be aware of what it is you're trying to actually change and affect. And the rest just kind of flows along with it. There's no magic bullet for every kind of data-quality issue or schema change that could possibly come into your pipeline. We try to do our best to make it easy for you to do, effectively, the best practice. The easiest thing that will survive the future, build robust data pipelines. This is one of the biggest challenges I think with like home-grown solutions. Is that it's really easy to build something that works. It's not easy to build something that works all the time. It's very easy to not imagine the edge cases. 'Cause it might take you a year until you've actually encountered, you know, the first big problem. The real, the gotcha that you didn't consider when you were building your own thing. And those of us at StreamSets who have been in the industry and on the user side, we've had some of these experiences. So we're trying to export that knowledge in the product. >> Dave: Who do you guys sell to? >> Everybody. (laughing) We see a lot of success today with, we call it Hadoop replatforming. Which is people who are moving from their huge variety of data source environments into like a Hadoop data lake kind of environment. Also Cloud, people are moving into the Cloud. They need a way for their data to get from wherever it is to where they want it to be. And certainly people could script these things manually. They could build their own tools for this. But it's just so much more productive to do it quickly in a UI. >> Is it an architect who's buying your product? Is it a developer? >> It's a variety.
So I think our product resonates greatly with a developer. But also people who are higher up in the chain. People who are trying to design their whole topology. I think the thing I love to talk about is everyone, when they start on a data project, they sit down and they draw this beautiful diagram with boxes and arrows that says here's where the data's going to go. But a month later, it works, kind of, but it's never that thing. >> Dave: Yeah because the data is just everywhere. >> Exactly. And the reality is that what you have to do to make it work correctly within SLA guidelines and things like that is so not what you imagined. But then you can almost never go backwards. You can never say, based on what I have, give me the boxes and arrows, because it's a systems analysis effort that no one has the time to engage in. But since StreamSets actually instruments every step of the pipeline, and we have a view into how all your pipelines actually fit together, we can give you that. We can just generate it. So we actually have a product. We've been talking about the StreamSets Data Collector, which is the core, like, data movement product. We have our enterprise edition, which is called the Dataflow Performance Manager, or DPM. It basically gives you a lot of collaboration and enterprise-grade authentication. And access control, and the command and control features. So it aggregates your metrics across all your data collectors. It helps you visualize your topology. So people like your director of analytics, or your CIO, who want to know is everything okay? We have a dashboard for them now. And that's really powerful. It's a beautiful UI. And it's really a platform for us to build visualizations with more intelligence. That looks across your whole infrastructure. >> Dave: That's good. >> Yeah. And then the thing is this is strangely kind of unprecedented. Because, you know, again, the engineer who wants to build this himself would say, I could just deploy Graphite, and all of a sudden I've got graphs, it's fine, right. But they're missing the details. What about the systems that aren't under your control? What about the failure cases? All these things, these are the things we tackle. 'Cause it's our business, we can afford to invest massively and make this a really first-class data engineering environment. >> Would it be fair to say that Kafka, sort of as it exists today, is just data movement built on a log, but that it doesn't do the analytics. And it doesn't really yet, maybe it's just beginning to do some of the monitoring you know, with a dashboard, or that's a statement of direction. Would it be fair to say that you can layer on top of that? Or you can substitute on top of it with all the analytics? And then when you want the really fancy analytic soup, you know, call out to Spark. >> Sure, I would say that for one thing we definitely want to stay out of the analytics space. We think there are many great analytics tools out there, like Spark. We also are not a storage tool. In fact, we're kind of like, we're queue-like, but we view ourselves more like, if there's a pipe and a pump, we're the pump. And Kafka is the pipe. I think that from like a monitoring perspective, we monitor Kafka indirectly. 'Cause if we know what's coming out, and we know what's going in later, we can give you the stats. And that's actually what's important.
This is actually one of the challenges of having sort of a home-grown or disconnected solution, is that stitching it together so you understand the end to end is extremely difficult. 'Cause if you have a relational database, and a Kafka, and a Hadoop, and a Spark job, sure you can monitor all those things. They all have their own UIs. But if you can't understand what the SLA is on the whole system, you're left with like four windows open trying to figure out where things connect. And it's just too difficult. >> So just on a sort of a positioning point of view for someone who's trying to make sense out of all the choices they have, to what extent would you call yourself a management framework for someone who's building these pipelines, whether from scratch, or buying components. And to what extent is it, I guess, when you talk about a pump, that would be almost like the run time part of it. >> Bryan: Yeah, yeah. >> So you know there's a control plane and then there's a data plane. >> Bryan: Sure. >> What's the mix? >> Yeah well we do both for sure. I mean I would say that the data plane for us is StreamSets Data Collector. We move data, we physically move the data. We have our own internal pipeline execution engine. So it doesn't presuppose any other existing technologies, it's not dependent on Hadoop or Spark or Kafka or anything. You know, to some degree data collector is also the control plane for small deployments. Because it does give you start-to-stop command and control. Some metrics monitoring, things like that. Now, where people need to expand beyond the realm of a single data collector is when they have enterprises with more than one business unit, or data center, or security zone, things like that. You don't just deploy one data collector, you deploy a bunch, dozens or hundreds. And in that case, that's where Dataflow Performance Manager again comes in, as that control plane. Now Dataflow Performance Manager has no data in it. It does not pass your actual business data. But it does again aggregate all of your metrics from all your data collectors and gives you a unified view across your whole enterprise. >> And one more follow-up along those lines. When you have a multi-vendor stack, or a multi-vendor pipeline. >> Bryan: Yeah. >> What gives you the meta view? >> Well we're at the ins and outs. We see the interfaces. So in theory, if someone were to consume data out of Kafka and do something, right, then there's another job later, like a Spark job. >> George: Yeah. >> So we don't have automatic visibility for that. But our plan in the future is to expand Dataflow Performance Manager to take third-party metric sources, effectively to broaden the view of your entire enterprise. >> You've got a bunch of stuff on your website here which is kind of interesting. Talking about some of the things we talked about. You know, taming data drift is one of your papers. The silent killer of data integrity. And some other good resources. So just in sort of closing, how do we learn more? What would you suggest? >> Sure, yeah please visit the website. The product is open source and free to download. Data Collector is free to download. I would encourage people to try it out. It's really easy to take for a spin. And if you love it you should check out our community. We have a very active Slack channel and Google group, which you can find from the website as well. And there's also a blog full of tutorials. >> Yeah well you're solving gnarly problems that a lot of companies just don't want to deal with.
That's good, thanks for doing the dirty work, we appreciate it. >> Yeah my pleasure. >> Alright Bryan, thanks for coming on "The Cube." >> Thanks for having me. >> Good to see you. You're welcome. Keep right there buddy, we'll be back with our next guest. This is "The Cube," we're live from Boston, Spark Summit East. Spark Summit East #SparkSummit, right back. >> Narrator: Since the dawn.
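For readers who want to picture the hand-off Duxbury describes, ingest landing data in Kafka and a Spark job downstream doing the analysis, here is a minimal sketch in PySpark Structured Streaming. The broker and topic names are hypothetical, and it assumes the spark-sql-kafka package is available to the Spark session.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("ingest-handoff").getOrCreate()

# Pick up events that an ingest tool (StreamSets or otherwise) lands in
# Kafka. Broker and topic names are made up for the sketch.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "web-logs")
          .load())

# The downstream "business value" step: count events per minute using the
# timestamp column the Kafka source attaches to each record.
counts = (events
          .groupBy(window(col("timestamp"), "1 minute"))
          .count())

# Stream the running aggregate to the console for demonstration purposes.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()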

Published Date : Feb 9 2017


Robbie Strickland, IBM - Spark Summit East 2017 - #SparkSummit - #theCUBE


 

>> Announcer: Live from Boston Massachusetts this is theCube. Covering Spark Summit East 2017, brought to you by Databricks. Now here are your hosts Dave Vellante and George Gilbert. >> Welcome back to theCube, everybody, we're here in Boston. The Cube is the worldwide leader in live tech coverage. This is Spark Summit, hashtag #SparkSummit. And Robbie Strickland is here. He's the Vice President of Engines & Pipelines, I love that title, for the Watson Data Platform at IBM Analytics, formerly with The Weather Company that was acquired by IBM. Welcome to theCube, good to see you. >> Thank you, good to be here. >> So, my standing tongue-in-cheek line is the industry's changing, Dell buys EMC, IBM buys The Weather Company. [Robbie] That's right. >> Wow! That sort of says it all, right? But it was kind of a really interesting blockbuster acquisition. Great for the folks at The Weather Company, great for IBM, so give us the update. Where are we at today? >> So, it's been an interesting first year. Actually, we just hit our first anniversary of the acquisition and a lot has changed. Part of my role, new role at IBM, having come from The Weather Company, is a byproduct of the two companies bringing our best analytics work and kind of pulling those together. I don't know if we have some water but that would be great. So, (coughs) excuse me. >> Dave: So, let me chat for a bit. >> Thanks. >> Feel free to clear your throat. So, you were at IBM, the conference at the time was called IBM Insight. It was the day before the acquisition was announced and we had David Kenny on. David Kenny was the CEO of The Weather Company. And I remember we were talking, and I was like, wow, you have such an interesting business model. Off camera, I was like, what do you want to do with this company, you guys are like prime. Are you going public, you going to sell this thing, I know you have an MBA background. And he goes, "Oh, yeah, we're having fun." Next day was the announcement that IBM bought The Weather Company. I saw him later and I was like, "Aha!" >> And now he's the leader of the Watson Group. >> That's right. >> Which is part of our, The Weather Company joined The Watson Group. >> And The Cloud and analytics groups have come together in recognition that analytics and The Cloud are peanut butter and jelly. >> Robbie: That's absolutely right. >> And David's running that organization, right? >> That is absolutely right. So, it's been an exciting year, it's been an interesting year, a lot of challenges. But I think where we are now with the Watson Data Platform is a real recognition that the use case where we want to try to make data and analytics and machine learning, and operationalizing all of those, that that's not easy for people. And we need to make that easy. And our experience doing that at The Weather Company and all the challenges we ran into have informed the organization, have informed the road map and the technologies that we're using to kind of move forward on that path. >> And The Watson Data Platform was announced in, I believe, October. >> Robbie: That's right. >> You guys had a big announcement in New York City. And you took many sort of components that were viewed as individual discrete functions-- >> Robbie: That's right. >> And brought them together in a single data pipeline. Is that right? >> Robbie: That's right. >> So, maybe describe that a little bit for our audience.
>> So, the vision is, you know, one of the things that's missing in the market today is the ability to easily grab data from some source, whether it's a database or a Kafka stream, or some sort of streaming data feed, which is actually something that's often overlooked. Usually you have platforms that are oriented around streaming data, data feeds, or oriented around data at rest, batch data. One of the things that we really wanted to do was sort of combine those two together because we think that's really important. So, to be able to easily acquire data at scale, bring it into a platform, orchestrate complex workflows around that, with the objective, of course, of data enrichment. Ultimately, what you want to be able to do is take those raw signals, whatever they are, and turn that into some sort of enriched data for your organization. And so, for example, we may take signals in from a mobile app, things like beacons, usage beacons on a mobile app, and turn that into a recommendation engine so we can feed real time content decisions back into a mobile platform. Well, that's really hard right now. It requires lots of custom development. It requires you to essentially stitch together your pipeline end to end. It might involve a machine learning pipeline that runs a training pipeline. It might involve, it's all batch oriented, so you land your data somewhere, you run this machine learning pipeline maybe in Spark or Hadoop or whatever you've got. And then the results of that get fed back into some data store that gets merged with your online application. And then you need to have a RESTful API or something for your application to consume that and make decisions. So, our objective was to take all of the manual work of standing up those individual pieces and build a platform where that is just, that's what it's designed to do. It's designed to orchestrate those multiple combinations of real time and batch flows. And then with a click of a button and a few configuration options, stand up a RESTful service on top of whatever the results are. You know, either at an interim stage or at the end of the line. >> And you guys gave an example. You actually showed a demo at the announcement. And I think it was a retail example, and you showed a lot of what would traditionally be batch processes, and then real time, a recommendation came up and completed the purchase. The inference was this is an out of the box software solution. >> Robbie: That's right. >> And that's really what you're saying you've developed. A lot of people would say, oh, it's IBM, they've cobbled together a bunch of their old products, stuck them together, put an abstraction layer on, and wrapped a bunch of services around it. I'm hearing from you-- >> That's exactly, that's just WebSphere. It's WebSphere repackaged. >> (laughing) Yeah, yeah, yeah. >> No, it's not that. So, one of the things that we're trying to do is, if you look at our cloud strategy, I mean, this is really part and parcel, I mean, the nexus of the cloud strategy is the Watson Data Platform. What we could have done is we could have said let's build a fantastic cloud and compete with Amazon or Google or Microsoft. But what we realized is that there is a certain niche there of people who want to take individual services and compose them together and build an application. Mostly on top of just raw VMs with some additional, you know, let's stitch together something with Lambda or stitch together something with SQS, or whatever it may be.
Our objective was to sort of elevate that a bit, not try to compete on that level. And say, how do we bring Enterprise grade capabilities to that space. Enterprise grade data management capabilities, end-to-end application development, machine learning as a first class citizen, in a cohesive experience. So that, you know, the collaboration is key. We want to be able to collaborate with business users, data scientists, data engineers, developers, API developers, the consumers of the end results of that, whether they be mobile developers or whatever. One of the things that is sort of key, I think, to the vision is that these roles that we've traditionally looked at. If you look at the way that tool sets are built, they're very targeted to specific roles. The data engineer has a tool, the data scientist has a tool. And what's been the difficult part is the boundaries between those have been very firm and the collaboration has been difficult. And so, we draw the personas as a Venn diagram. Because it's very difficult, especially if you look at a smaller company, and even sometimes larger companies, the data engineer is the data scientist. The developer who builds the mobile application is the data scientist. And then in some larger organizations, you have very large teams of data scientists that have these artificial barriers between the data scientist and the data engineer. So, how do we solve both cases? And I think the answer was for us a platform that allows for seamless collaboration, where there aren't these clean lines between the personas, where the tool sets easily move from one to the other. And if you're one of those hybrid people that works across lines, the tool feels like it's one tool for you. But if you're two different teams working together, you can easily hand off. So, that was one of the key objectives we're trying to answer. >> Definitely an innovative component of the announcement, for sure. Go ahead, George. >> So, help us sort of bracket how mature this end-to-end tool suite is in terms of how much of the pipeline it addresses. You know, from the data origin all the way to a trained model and deploying that model. Sort of what's there now, what's left to do. >> So, there are a few things we've brought to market. Probably the most significant is the Data Science Experience. The Data Science Experience is oriented around data science and has, as its sort of central interface, Jupyter Notebooks. Now, as well as, we brought in RStudio, and those sorts of things. The idea there being that we'll start with the collaboration around data scientists. So, data scientists can use their language of choice, collaborate around data sets, save out the results of their work and have it consumed either publicly or by some other group of data scientists. But the collaboration among data scientists, that was sort of step one. There's a lot of work going on that's sort of ongoing, not ready to bring to market, around how do we simplify machine learning pipelines specifically, how do we bring governance and lineage, and catalog services and those sorts of things. And then the ingest, one of the things we're working on that we have brought to market is our product called Lift, which connects, as well. And that's bringing large amounts of data easily into the platform. There are a few components that have sort of been brought to market. dashDB, of course, is a key source of data in the cloud.
So, one of the things that we're working on is taking some of these existing technologies that actually really play well into the ecosystem, trying to tie them well together, and then adding the additional glue pieces. >> And some of your information management and governance components, as well. Now, maybe that is a little bit more legacy, but they're proven. And I don't know if the exits and entries into those systems are as open, I don't know, but there's some capabilities there. >> Speaking of openness, that's actually a great point. If you look at the IIG suite, it's a great On-Premise suite. And one of the challenges that we've had in sort of past IBM cloud offerings is a lot of what has been the M.O. in the past is to take a great On-Prem solution and just try to stand it up as a service in the cloud. Which in some cases has been successful, in other cases, less so. One of the things we're trying to look at with this platform is how do we leverage (a) open source. So that whatever open source you may already be running, On-Prem or in some other provider, it's very easy to move your workloads. So, we want to be able to say, if you've got 10,000 lines of fraud detection code in MapReduce, you don't need to rewrite that in anything. You can just move it. And the other thing is, where our existing legacy tech doesn't necessarily translate well to the cloud, our first strategy is to see if there's any traction around an existing open source project that satisfies that need, and try to see if we can build on that. Where there's not, we go cloud-first and we build something that's tailor-made for the cloud. >> So, who's the first one or two customers for this platform? Is it like IBM Global Business Services where they're building the semi-custom industry apps? Or is it the very, very big and sophisticated, like banks and Telcos who are doing the same? Or have you gotten to the point where you can push it out to a much wider audience? >> That's a great question, and it's actually one that is a source of lots of conversation internally for us. If you look at where the Data Science Experience is right now, it's a lot of individual data scientists, you know, small companies, those sorts of things coming together. And a lot of that is because some of the sophistication that we expect for Enterprise customers is not quite there yet. So, we wouldn't expect Enterprise customers to necessarily be onboarded as quickly at the moment. But if we look at sort of the, so I guess there's maybe a medium-term answer and a long-term answer. I think the long-term answer is definitely the Enterprise customers, you know, leveraging IBM's huge entry point into all of those customers today, there's definitely a play to be made there. And one of the things that we're differentiating, we think, over an AWS or Google, is that we're trying to answer that use case in a way that they really aren't even trying to answer it right now. And so, that's one thing. The other is, you know, going beta with a launch customer that's a healthcare provider or a bank where they have all sorts of regulatory requirements, that's more complicated. And so, we are looking at, in some cases, we're looking at those banks or healthcare providers and trying to carve off a small niche use case that doesn't actually fall into the category of all those regulatory requirements. So that we can get our feet wet, get the tires kicked, those sorts of things. And in some cases we're looking for less traditional Enterprise customers to try to launch with.
So, that's an active area of discussion. And one of the other key ones is The Weather Company, trying to take The Weather Company workloads and move them onto the platform. >> I want to come back to The Weather Company. When you did that deal, I was talking to one of your executives and he said, "Why do you think we did the deal?" I said, "Well, you've got 1500 data scientists, "you've got all this data, you know, it's the future." He goes, "Yeah, it's also going to be a platform "for IOT for IBM." >> Robbie: That's right. >> And I was like, "Hmmm." I get the IOT piece, how does it become a platform for IBM's IOT strategy? Is that really the case? Is that transpiring and how so? >> It's interesting because that was definitely one of the key tenets behind the acquisition. And what we've been working on so hard over the last year, as I'm sure you know, sometimes boxes and arrows on an architecture diagram and reality are more challenging. >> Dave: (laughing) Don't do that. >> And so, what we've had to do is reconcile a lot of what we built at The Weather Company, existing IBM tech, and the new things that were in flight, and try to figure out how we can fit all those pieces together. And so, it's been complicated but also good. In some cases, it's just people and expertise, and bringing those people and expertise and leaving some of the software behind. And in other cases, it's actually bringing software. So, the story is, obviously, where the rubber meets the road, more complicated than what it sounds like in the press release. But the reality is we've combined those teams and they are all moving in the same direction together with various bits and pieces from the different teams. >> Okay, so, there's vision and then the road map to execute on that, and it's going to unfold over several years. >> Robbie: That's right. >> Okay, good. What about the stuff at the event here, I mean, what are you seeing, what's hot, what's going on with Spark? >> I think one of the interesting things with what's going on with Spark right now is a lot of the optimizations, especially things around GPUs and the like. And we're pretty excited about that; being a hardware manufacturer, that's something that is interesting to us. We run our own cloud. Where some people may not be able to immediately leverage those capabilities, we're pretty excited about that. And also, we're looking at some of those, you know, taking Spark and running it on Power and those sorts of things to try to leverage the hardware improvements. So, that's one of the things we're doing. >> Alright, we have to leave it there, Robbie. Thanks very much for coming on theCUBE, really appreciate it. >> Thank you. >> You're welcome. Alright, keep it right there, everybody. We'll be right back with our next guest. This is theCUBE. We're live from Spark Summit East, hashtag #SparkSummit. Be right back.

Published Date : Feb 9 2017

Arun Murthy, Hortonworks - Spark Summit East 2017 - #SparkSummit - #theCUBE


 

>> [Announcer] Live from Boston, Massachusetts, it's theCUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, your hosts, Dave Vellante and George Gilbert. >> Welcome back to snowy Boston everybody, this is theCUBE, the leader in live tech coverage. Arun Murthy is here, he's the founder and vice president of engineering at Hortonworks, father of YARN, can I call you that, godfather of YARN, is that fair, or? (laughs) Anyway. He's so, so modest. Welcome back to theCUBE, it's great to see you. >> Pleasure to have you. >> Coming off the big keynote, (laughs) you ended the session this morning, so that was great. Glad you made it in to Boston, and uh, lot of talk about security and governance, you know we've been talking about that for years, it feels like it's truly starting to come into the mainstream, Arun, so. >> Well I think it's just a reflection of what customers are doing with the tech now. Now, three, four years ago, a lot of it was pilots, a lot of it was, you know, people playing with the tech. But increasingly, it's about, you know, people actually applying stuff in production, having data as a system of record, running workloads both on prem and on the cloud; cloud is sort of becoming more and more real at mainstream enterprises. So a lot of it means, as you take any of the examples today, any interesting app will have some sort of real time data feed, it's probably coming out from a cell phone or sensor, which means that data is actually not, in most cases not coming on prem, it's actually getting collected in a local cloud somewhere, it's just more cost effective, why would we put up 25 data centers if you don't have to, right? So then you got to connect that data, production data you have or customer data you have or data you might have purchased, and then join them up, run some interesting analytics, do geo-based real time threat detection, cyber security. A lot of it means that you need a common way to secure data, govern it, and that's where we see the action. I think it's a really good sign for the market and for the community that people are pushing on these dimensions, because it means that people are actually using it for real production workloads. >> Well in the early days of Hadoop you really didn't talk that much about cloud. >> Yeah. >> You know, and now, >> Absolutely. >> It's like, you know, duh, cloud. >> Yeah. >> It's everywhere, and of course the whole hybrid cloud thing comes into play, what are you seeing there, what are things you can do in a hybrid, you know, or on prem that you can't do in a public cloud and what's the dynamic look like? >> Well, it's definitely not an either or, right? So what we're seeing is increasingly interesting apps need data which are born in the cloud and they'll stay in the cloud, but they also need transactional data which stays on prem, you might have an EDW for example, right? >> Right. >> There's not a lot of, you know, people want to solve business problems and not just move data from one place to another, right? Or back from one place to another, so it's not interesting to move an EDW to the cloud, and similarly it's not interesting to bring your IOT data or sensor data back into on-prem, right? Just makes sense. So naturally what happens is, you know, at Hortonworks we talk of a kind of modern app, or a modern data app, which means a modern data app has to span, has to sort of, you know, process both on-prem data and cloud data.
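That spanning point is easy to see in code: to Spark, on-prem HDFS and cloud object stores are just different filesystem URIs, so the same DataFrame logic can join data born in the cloud against an on-prem system of record. A hedged sketch; the paths, hostnames, and column names below are invented, and the s3a connector assumes the standard Hadoop cloud-storage client jars are on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hybrid-data-app").getOrCreate()

# On-prem transactional extract (invented namenode and path).
customers = spark.read.parquet("hdfs://nn1:8020/warehouse/customers")

# Cloud-born sensor feed that stays in the cloud (invented bucket).
telemetry = spark.read.json("s3a://iot-landing/sensors/2017/02/")

# Join the cloud feed against the on-prem system of record.
enriched = telemetry.join(customers, "customer_id")

# Only a compact summary moves back on-prem, not the raw feed,
# matching the "key summaries move back and forth" pattern.
summary = enriched.groupBy("customer_id").count()
summary.write.mode("overwrite").parquet(
    "hdfs://nn1:8020/warehouse/iot_summary")
```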
>> Yeah, you talked about that in your keynote years ago. Furrier said that the data is the new development kit. And now you're seeing the apps are just so dang rich, >> Exactly, exactly. >> And they have to span >> Absolutely. >> physical locations, >> Yeah. >> But then this whole thing of IOT comes up, we've been having a conversation on theCUBE, the last several Cubes, of, okay, how much stays out, how much stays in, there's a lot of debates about that, there's reasons not to bring it in, but you talked today about how some of the important stuff will come back. >> Yeah. >> So the way this is all going to be, you know, there's a lot of data that should be born in the cloud and stay there, the IOT data, but then what will happen increasingly is, key summaries of the data will move back and forth. So key summaries of your EDW will move to the cloud, sometimes key summaries of your IOT data, you know, you want to do some sort of historical training in analytics, that will come back on-prem, so I think there's a bi-directional data movement, but it just won't be all the data, right? It'll be key interesting summaries of the data but not all of it. >> And a lot of times, people say well it doesn't matter where it lives, cloud should be an operating model, not a place where you put data or applications, and while that's true and we would agree with that, from a customer standpoint it matters in terms of performance and latency issues and cost and regulation, >> And security and governance. >> Yeah. >> Absolutely. >> You need to think those things through. >> Exactly, so I mean, so that's what we're focused on, to make sure that you have a common security and governance model regardless of where data is, so you can think of it as infrastructure you own and infrastructure you lease. >> Right. >> Right? Now, the details matter of course; when you go to the cloud you use S3 for example, or ADLS from Microsoft, but you got to make sure that there's a common sort of security governance on top of it, in front of it. As an example, one of the things that, you know, in the open source community, Ranger's a really sort of key project right now from a security authorization and authentication standpoint. We've done a lot of work with our friends at Microsoft to make sure you can actually now manage data in WASB, which is their object store, natively with Ranger, so you can set a policy that says only Dave can access these files, you know, George can access these columns, that sort of stuff is natively done on the Microsoft platform thanks to the relationship we have with them. >> Right. >> So that's actually really interesting for the open source communities. So you've talked about sort of commodity storage at the bottom layer and even if they're different sort of interfaces and implementations, it's still commodity storage, and now what's really helpful to customers is that they have a common security model, >> Exactly. >> Authorization, authentication, >> Authentication, lineage, provenance, >> Oh okay. >> You want to make sure all of these are common across sources. >> But you've mentioned a few of the different data patterns, like the stuff that might be streaming in on the cloud, what, assuming you're not putting it into just a file system or an object store, and you want to sort of merge it with >> Yeah. >> Historical data, so what are some of the data stores other than the file system, in other words, newfangled databases to manage this sort of interaction?
>> So I think what you're saying is, we certainly have the raw data, the raw data is going to land in whatever cloud-native storage, >> Yeah. >> It's going to be Amazon S3, WASB, ADLS, Google Cloud Storage. But then increasingly you want, so now the patterns change, so you have raw data, you have some sort of an ETL process, and what's interesting in the cloud is that even the processed data, if you take the unstructured raw data and structure it, that structured data also needs to live on the cloud platform, right? The reason that's important is because (a) it's cheaper to use the native platform rather than set up your own database on top of it. The other one is you also want to take advantage of all the native services that the cloud storage provides, so for example, replication. So automatically, for data in WASB, you know, you can set up a policy and easily say this structured data table that I have, which is a summary of all the IOT activity in the last 24 hours, you can, using the cloud provider's technologies, actually make it show up easily in Europe, like you don't have to do any work, right? So increasingly what we at Hortonworks have focused a lot on is to make sure that all of the compute engines, whether it's Spark or Hive or, you know, MapReduce, it doesn't really matter, are all natively working on the cloud provider's storage platform. >> [George] Okay. >> Right, so, >> Okay. >> That's a really key consideration for us. >> And the follow up to that, you know, there's a bit of a misconception that Spark replaces Hadoop, but it actually can be a processing, a compute engine for, >> Yeah. >> That can complement or replace some of the compute engines in Hadoop, help us frame how you talk about it with your customers. >> For us it's really simple. Like in the past, the only option you had on Hadoop to do any computation was MapReduce; I started working on MapReduce 11 years ago, so as you can imagine, that's a pretty good run for any technology, right? Spark is definitely the interesting sort of engine for, anything from machine learning to ETL for data on top of Hadoop. But again, what we focus a lot on is to make sure that every time we bring something in, so right now, when we started on HDP, the first HDP had about nine open source projects, literally just nine. Today, the last one we shipped was 2.5, and HDP 2.5 had about 27, I think, like it's a huge sort of explosion, right? But the problem with that is not just that we have 27 projects, the problem is that you've got to make sure each of the 27 works with all the 26 others. >> It's a QA nightmare. >> Exactly. So that integration is really key. So same thing with Spark, we want to make sure you have security and YARN (mumbles), like you saw in the demo today, you can now run Spark SQL but also make sure you get low level (mumbles) masking, all of the enterprise capabilities that you need. And I was at a financial services company three or four weeks ago in Chicago. Today, to do the equivalent of what I showed in the demo, they have a classic EDW, and they have to maintain anywhere between 1500 to 2500 views of the same database; that's a nightmare, as you can imagine. Now the fact that you can do this on the raw data, whether it's with Hive or Spark or Pig or MapReduce, it doesn't really matter, is really key, and that's the thing we push to make sure things like YARN security work across all the stacks, all the open source tech.
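For a sense of what the Ranger policies Arun mentions look like in practice ("only Dave can access these files"), a file-level policy can be created through Ranger's public REST API, and the engines then all see the same authorization decision. The endpoint path, service name, credentials, and JSON shape below are assumptions based on the commonly documented Ranger v2 public API; details vary by version and deployment.

```python
import requests

# Assumed Ranger admin endpoint; deployment-specific, shown only
# for illustration.
RANGER_URL = "http://ranger-admin:6080/service/public/v2/api/policy"

policy = {
    "service": "cluster1_hadoop",       # assumed HDFS repo name in Ranger
    "name": "dave-only-sensor-files",
    "isEnabled": True,
    "resources": {
        "path": {"values": ["/data/sensors"], "isRecursive": True}
    },
    "policyItems": [{
        "users": ["dave"],
        "accesses": [{"type": "read", "isAllowed": True},
                     {"type": "write", "isAllowed": True}]
    }]
}

# Because Ranger enforces this centrally, Hive, Spark, Pig, and
# MapReduce jobs are all subject to the same policy.
resp = requests.post(RANGER_URL, json=policy, auth=("admin", "admin"))
resp.raise_for_status()
```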
>> So that makes life better, a simplification use case if you will, >> Yeah. >> What are some of the other use cases that you're seeing things like Spark enable? >> Machine learning is a really big one. Increasingly, every product is going to have some, people call it, machine learning and AI and deep learning; there's a lot of techniques out there, but the key part is you want to build a predictive model. In the past (mumbles) everybody wanted to build a model and score what's happening in the real world against the model, but it's equally important to make sure the model gets updated as more data comes in, and actually as the model's scores drift over time. So that's something we see all over. So for example, even within our own product, it's not just us enabling this for the customer; for example, at Hortonworks we have a product called SmartSense which allows you to optimize how people use Hadoop. What are the opportunities for you to explore deficiencies within your own Hadoop system, whether it's Spark or Hive, right? So we now put machine learning into SmartSense, and show you that customers who are running queries like you are, Mr. Customer X, other customers like you are tuning Hadoop this way, they're running this sort of config, they're using these sorts of features in Hadoop. That allows us to actually make the product itself better all the way down the pipe. >> So you're improving the scoring algorithm or you're sort of replacing it with something better? >> What we're doing there is just helping them optimize their Hadoop deployments. >> Yep. >> Right? You know, configuration and tuning and kernel settings and network settings, we do that automatically with SmartSense. >> But the customer, you talked about scoring and trying to, >> Yeah. >> They're tuning that, improving that and increasing the probability of its accuracy, or is it? >> It's both. >> Okay. >> So the thing is what they do is, you initially come in with a hypothesis, you have some amount of data, right? I'm a big believer that over time, with more data, you're better off spending more effort getting more data into the system than fine-tuning the algorithm, right? >> Interesting, okay. >> Right, so you know, for example, you know, talk to any of the big guys at Facebook, because they'll do the same; what they'll say is it's much better to spend your time getting 10x the data into the system and improving the model rather than spending 10x the time improving the model itself on day one. >> Yeah, but that's a key choice, because you got to >> Exactly. >> Spend money on doing either, >> One of them. >> And you're saying go for the data. >> Go for the data. >> At least now. >> Yeah, go for data. What happens is, the good part of that is it's not just the model; what you've got to really get through is the entire end-to-end flow. >> Yeah. >> All the way from data aggregation to ingestion to collection to scoring, all that aspect; you're better off sort of walking through the paces, like building the entire end-to-end product, rather than spending time in a silo trying to make a lot of change.
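Arun's "more data beats fine-tuning" point maps to a simple retraining loop: hold the model family fixed and refit it as new data lands, checking the score on a holdout. A minimal sketch, assuming the data is already featurized into label/features columns; the paths are invented and logistic regression stands in for whatever model is actually in play.

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("retrain-loop").getOrCreate()

def retrain(history_path, incoming_path):
    # Fold the newly arrived batch into the training set instead of
    # squeezing more accuracy out of a fixed sample.
    data = (spark.read.parquet(history_path)
                 .union(spark.read.parquet(incoming_path)))
    train, holdout = data.randomSplit([0.8, 0.2], seed=42)

    model = LogisticRegression().fit(train)  # stand-in for any model
    auc = BinaryClassificationEvaluator().evaluate(model.transform(holdout))
    return model, auc

# Each time a batch lands: retrain, and redeploy if the score holds up.
model, auc = retrain("/data/train/history", "/data/train/new_batch")
print("holdout AUC:", auc)
```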
>> We've talked to a lot of machine learning tool vendors, application vendors, and it seems like we got to the point with Big Data where we put it in a repository, then we started doing better at curating it and understanding it, then starting to do a little bit of exploration with business intelligence, but with machine learning, we don't have something that does this end to end, you know, from acquiring the data, to building the model, to operationalizing it. Where are we on that, who should we look to for that? >> It's definitely very early. I mean, if you look at even the EDW space, for example, what is EDW? EDW is ingestion, ETL, and then sort of a fast query layer, OLAP, BI, on and on and on, right? So that's the full EDW flow. I don't think, as a market, I mean, it's really early in this space; not only us, the overall industry doesn't yet have that end-to-end sort of industrialized design concept. It's going to take time, but a lot of people are ahead, you know, the Googles of the world are ahead, and over time a lot of people will catch up. >> We got to go, I wish we had more time, I had so many other questions for you but I know time is tight in our schedule, so thanks so much, Arun, >> Appreciate it. >> For coming on, appreciate it. Alright, keep it right there everybody, we'll be back with our next guest. It's theCUBE, we're live from Spark Summit East in Boston, right back. (upbeat music)

Published Date : Feb 9 2017

Day Two Kickoff - Spark Summit East 2017 - #SparkSummit - #theCUBE


 

>> Narrator: Live from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to day two in Boston where it is snowing sideways here. But we're all here at Spark Summit, #SparkSummit, Spark Summit East, this is theCUBE, SiliconANGLE's flagship product. We go out to the events, we program for our audience, we extract the signal from the noise. I'm here with George Gilbert, day two, at Spark Summit, George. We're seeing the evolution of so-called big data. Spark was a key part of that, designed to really both simplify and speed up big data oriented transactions and really help fulfill the dream of big data, which is to be able to affect outcomes in near real time. A lot of those outcomes, of course, are related to ad tech and selling and retail oriented use cases, but we're hearing more and more around education and deep learning and affecting consumers and human life in different ways. We're now 10 years into the whole big data trend, what's your take, George, on what's going on here? >> Even if we started off with ad tech, which is what most of the big internet companies did, we always start off in any new paradigm with one application that kind of defines that era. And then we copy and extend that pattern. For me, on the rethinking-your-business front, the McGraw-Hill interview we did yesterday was the most amazing thing, because what they had was a textbook business for their education unit and they're re-thinking the business, as in what does it mean to be an education company? And they take cognitive science about how people learn, and then they take essentially digital assets and help people through a curriculum, not the centuries-old sort of teacher, lecture, homework kind of thing, but individualized education where the patterns of reinforcement are consistent with how each student learns. And it's not just breaking up the lecture into little bits, it's more of a how do you learn most effectively? How do you internalize information? >> I think that is a great example, George, and there are many, many examples of companies that are transforming digitally. Years and years ago people started to think about, okay, how can I instrument or digitize certain physical assets that I have? I remember a story when we did the MIT event in London with Andy McAfee and Erik Brynjolfsson, they were giving the example of McCormick Spice, the spice company, who digitized by turning what they were doing into recipes and driving demand for their product and actually building new communities. That was kind of an interesting example, but sort of mundane. The McGraw-Hill education one is massive. Their chief data scientist, chief data scientist? I don't know, the head of engineering, I guess, is who he was. >> VP of Analytics and Data Science. >> VP of Analytics and Data Science, yeah. He spoke today and got a big round of applause when he sort of led off about the importance of education at the keynote. He's right on, and I think that's a classic example of a company that was built around printing presses and distributing dead trees that has completely transformed, and it's quite successful. In just the last two years they brought in a new CEO. So that's good, but let's bring it back to Spark specifically. When Spark first came out, George, you were very enthusiastic. You're technical, you love the deep tech.
And you saw the potential for Spark to really address some of the problems that we faced with Hadoop, particularly the complexity, the batch orientation. Even some of the costs -- >> The hidden costs. >> Associated with that, those hidden costs. So you were very enthusiastic; in your mind, has Spark lived up to your initial expectations? >> That's a really good question, and I guess techies like me are often a little more enthusiastic than the current maturity of the technology. Spark doesn't replace Hadoop, but it carves out a big chunk of what Hadoop would do. Spark doesn't address storage, and it doesn't really have any sort of management bits. So you could sort of hollow out Hadoop and put Spark in. But it's still got a little ways to go in terms of becoming really, really fast to respond in near real time. Not just human real time, but like machine real time. It doesn't work sort of deeply with databases yet. It's still teething, and sort of every release, which is approximately every 12 to 18 months, it gets broader in its applicability. So there's no question sort of everyone is piling on, which means that'll help it mature faster. >> When Hadoop was first sort of introduced to the early masses, not the mainstream masses, but the early masses, the profundity of Hadoop was that you could leave data in place and bring compute to the data. And people got very excited about that because they knew there was so much data and you just couldn't keep moving it around. But the early insiders of Hadoop, I remember, they would come on theCUBE and everybody was, of course, enthusiastic, and a lot of cheerleading was going on. But in the hallway conversations with Hadoop, with the real insiders, you would have conversations about, people are going to realize how much this sucks some day and how hard this is, and it's going to hit a wall. Some of the cheerleaders would say, no way, Hadoop forever. Now you've started to see that in practice. And the number of real hardcore transformations as a result of Hadoop in and of itself has been quite limited. The same is true of virtually, well, most any technology. I'd say the smartphone was pretty transformative in and of itself, but nonetheless, we are seeing that sort of progression and we're starting to see a lot of the same use cases that you hear about, like fraud detection and retargeting, coming up again. I think what we're seeing is those are improving. Like fraud detection, I talked yesterday about how it used to be six months before you'd even detect fraud, if you ever did. Now it's minutes or seconds. But you still get a lot of false positives. So we're going to just keep turning that crank. Mike Gualtieri today talked about the efficacy of today's AI and he gave some examples from Google: he showed a plane crash, and the API said plane and accurately identified that, but it also said it could be wind sports or something like that. So you can see it's still not there yet. At the same time, you see things like Siri and Amazon Alexa getting better and better and better. So my question to you, kind of long-winded here, is, is that what Spark is all about? Just making better the initial initiatives around big data, or is it more transformative than that? >> Interesting question, and I would come at it with a couple different answers.
Spark was a reaction to, you can't have multiple different engines to attack all the different data problems, because you would do a part of the analysis here, push it onto disk, pull it off disk into another engine, and all of that would take too long or be too complex a pipeline to go from one end to the other. Spark was like, we'll do it all in our unified engine, and you can come at it from SQL, you can come at it from streaming, so it's all in one place. That changes the sophistication of what you can do, the simplicity, and therefore how many people can access it and apply it to these problems. And the fact that it's so much faster means you can attack a qualitatively different set of problems. >> I think as well it really underscores the importance of Open Source and the ability of the Open Source community to launch projects that both stick and can attract serious investment. Not only with IBM, but that's a good example. But entire ecosystems that collectively can really move the needle. Big day today, George, we've got a number of guests. We'll give you the last word at the open. >> Okay, what I thought, this is going to sound a little bit sort of abstract, but a couple of takeaways from some of our most technical speakers yesterday. One was with Ion Stoica, who sort of co-headed the lab that was the genesis of Spark at Berkeley. >> AMPLabs. >> The AMPLab at Berkeley. >> And now RISE Lab. >> And then also with the IBM Chief Data Officer for the Analytics Unit. >> Seth Filbrun. >> Filbrun, yes. When we look at what's the core value add ultimately, it's not these infrastructure analytic frameworks and that sort of thing, it's the machine learning model in its flywheel feedback loop, where it's getting trained and re-trained on the data that comes in from the app, and then you continually improve it. That was the whole rationale for data lakes, but now with models. There, it was put all the data in because you're going to ask questions you couldn't anticipate. Here, it's collect all the data from the app because you're going to improve the model in ways you didn't expect. And that beating heart, that living model that's always getting better, that's the core value add. And that's going to belong to end customers and to application companies. >> One of the speakers today noted AI was kind of invented in the 50s, there was a lot of excitement in the 70s, it kind of died in the 80s, and now it's coming back. It's almost like it's being reborn. And it's still in its infant stages, but the potential is enormous. All right, George, that's a wrap for the open. Big day today, keep it right there, everybody. We got a number of guests today, and as well, don't forget, at the end of the day today George and I will be introducing part two of our Wikibon Big Data forecast. This is where we'll release a lot of our numbers and George will give a first look at that. So keep it right there everybody, this is theCUBE. We're live from Spark Summit East, #SparkSummit. We'll be right back. (techno music)

Published Date : Feb 9 2017

Wikibon Big Data Market Update Pt. 1 - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> [Announcer] Live from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> We're back, welcome to Boston, everybody, this is a special presentation that George Gilbert and I are going to provide to you now. SiliconANGLE Media is the umbrella brand of our company, and we've got three sub-brands. One of them is Wikibon, it's the research organization that George works in, and then of course, we have theCUBE and then SiliconANGLE, which is the tech publication, and then we extensively, as you may know, use CrowdChat and other social data, but we want to drill down now on the Wikibon research side of things. Wikibon was the first research company ever to do a big data forecast. Many, many years ago, our friend Jeff Kelly produced that for several years, we open-sourced it, and it really, I think, helped the industry a lot, sort of framing the big data opportunity, and then George last year did the first Spark forecast, really Spark adoption. So what we want to do now is talk about some of the trends in the marketplace. This is going to be done in two parts, today's part one, and we're really going to talk about the overall market trends and the market conditions, and then we're going to go to part two tomorrow, where you're going to release some of the numbers, right? And we'll share some of the numbers today. So, we're going to start on the first slide here, we're going to share with you some slides. The Wikibon forecast review, and George, I'm going to ask you to talk about where we are at with big data apps, everybody's saying it's peaked, big data's now going mainstream, where are we at with big data apps? >> [George] Okay, so, I want to quote, just to provide context, the former CTO of VMware, Steve Herrod. He said, "In the end, it wasn't big data, "it was big analytics." And what's interesting is that there have traditionally been two classes of workloads: one batch, and in the context of analytics, that means running reports in the background, doing offline business intelligence; but then there was also the interactive-type work. What's emerging is something that's continuously happening, and it doesn't mean that all apps are going to be always on, it just means that all apps will have a batch component, an interactive component, like with the user, and then a streaming, or continuous, component. >> [Dave] So it's a new type of workload. >> Yes. >> Okay. Anything else you want to point out here? >> Yeah, what's worth mentioning is, it's not like it's going to burst fully-formed out of the clouds and become sort of a new standard; there are two things that have to happen. The technology has to mature, so right now you have some pretty tough trade-offs between integration, which provides simplicity, and choice and optimization, which gives you fragmentation; and then skillset, and both of those need to develop. >> [Dave] Alright, we're going to talk about both of those a little bit later in this segment. Let's go to the next slide, which really talks to some of the high-level forecast that we released last year, so these are last year's numbers, correct? >> Yes, yes. >> [Dave] Okay, so, what's changed?
You've got the ogive curve, which is sort of the streaming penetration, Spark/streaming, that's what was there last year; this is now reflective of continuous, you'll be updating that, how is this changing, what do you want us to know here? >> [George] Okay, so the key takeaways here are, first, we took three application patterns, the first being the data lake, which is sort of the original canonical repository of all your data. That never goes away, but on top of it, you layer what we were calling last year systems of engagement, which is where you've got the interactive machine learning component helping to anticipate and influence a user's decision, and then on top of that, which was the aqua color, was the self-tuning systems, which is probably more IIoT stuff, where you've got a whole ecosystem of devices and intelligence in the cloud and at the edge, and you don't necessarily need a human in the loop. But these now, when you look at them, you can break them down as having three types of workloads: the batch, the interactive, and the continuous. >> Okay, and that is sort of a new workload here, and this is a real big theme of your research now is, we all remember, no, we don't all remember, I remember punch cards, that's the ultimate batch, and then of course, the terminals were interactive, and you think of that as closer to real time, but now, this notion of continuous, if you go to the next slide, Patrick, we can take a look at how workloads are changing, so George, take us through that dynamic. >> [George] Okay so, to understand where we're going, sometimes it helps to look at where we've come from. The traditional workloads, if we talk about applications, were divided into, now, we talked about sort of batch versus interactive, but they were also divided into online transaction processing, operational applications, systems of record, and then there was the analytic side, which was reporting on it, but this was sort of backward-looking reporting, and we began to see some convergence between the two with web and mobile apps, where a user was interacting with the analytics that informed an interaction that they might have. That's looking backwards, and we're going to take a quick look at some of the new technologies that augmented those older application patterns. Then we're going to go look at the emergent workloads and what they look like. >> Okay so, let's have a quick conversation about this before we go on to the next segment. Hadoop obviously was batch. It really was a way, as we've talked about today and on many other days on theCUBE, a way to reduce the expense of doing data warehousing and business intelligence. I remember we were interviewing Jeff Hammerbacher, and he said, "When I was at Facebook, "my mission was to break the dependency "on the container, the storage container." So he really wanted to, needed to reduce costs, he saw that infrastructure needed to change, so if you look at the next slide, which is really sort of talking to Hadoop doing batch in traditional BI, take us through that, and then we'll sort of evolve to the future.
But here, you've basically put in a repository more data than you could possibly ever fit in a data warehouse, and the key is, this environment was very fragmented, there were many different engines involved, and so there was a high developer complexity, and a high operational complexity, and we're getting to the point where we can do somewhat better on the integration, and we're getting to the point where we might be able to do interactive business intelligence and start doing a little bit of advanced analytics like machine learning. >> Okay. Let's talk a little bit about why we're here, we're here 'cause it's Spark Summit, Spark was designed to simplify big data, simplify a lot of the complexity in Hadoop, so on the next slide, you've got this red line of Spark, so what is Spark's role, what does that red line represent? >> Okay, so the key takeaway from this slide is, couple things. One, it's interesting, but when you listen to Matei Zaharia, who is the creator of Spark, he said, "I built this to be a better MapReduce than MapReduce," which was the old crufty heart of Hadoop. And of course, they've stretched it far beyond their original intentions, but it's not the panacea yet, and if you put it in the context of a data lake, it can help you with what a data engineer does with exploring and munging the data, and what a data scientist might do in terms of processing the data and getting it ready for more advanced analytics, but it doesn't give you an end-to-end solution, not even within the data lake. The point of explaining this is important, because we want to explain how, even in the newer workloads, Spark isn't yet mature to handle the end-to-end integration, and by making that point, we'll show where it needs still more work, and where you have to substitute other products. >> Okay, so let's have a quick discussion about those workloads. Workloads really kind of drive everything, a lot of decisions for organizations, where to put things, and how to protect data, where the value is, so in this next slide you've got, you're juxtaposing traditional workloads with emerging workloads, so let's talk about these new continuous apps. >> Okay, so, this tees it up well, 'cause we focused on the traditional workloads. The emerging ones are where data is always coming in. You could take a big flow of data and sort of end it and bucket it, and turn it into a batch process, but now that we have the capability to keep processing it, and you want answers from it very near real time, you don't want to stop it from flowing, so the first one that took off like this was collecting telemetry about the operation and performance of your apps and your infrastructure, and Splunk sort of conquered that workload first. And then the second one, the one that everyone's talking about now is sort of Internet of Things, but more accurately, the Industrial Internet of Things, and that stream of data is, again, something you'll want to analyze and act on with as little delay as possible. The third one is interesting, asynchronous microservices. This is difficult, because this doesn't necessarily require a lot of new technology, so much as a new skillset for developers, and that's going to mean it takes off fairly slowly. Maybe new developers coming out of school will adopt it whole cloth, but this is where you don't rely on a big central database, this is where you break things into little pieces, and each piece manages itself. 
>> So you say the components of these arrows that you're showing in just explore processor, these are all sort of discrete elements of the data flow that you have to then integrate as a customer? >> [George] Yes, frankly, these are all steps that could be an end-to-end integrative process, but it's not yet mature enough really to do it end-to-end. For example, we don't even have a data store that can go all the way from ingest to serve, and by ingest, I mean taking the millions, potentially millions or more, events per second coming in from your Internet of Things devices, the explorer would be in that same data store, letting you visualize what's there, and process doing the analysis, and serving then is, from that same data store, letting your industrial devices, or your business intelligence workloads get real-time updates. For this to work as one whole, we need a data store, for example, that can go from end-to-end, in addition to the compute and analytic capabilities that go end-to-end. The point of this is, for continuous workloads, we do want to get to this integrated point somehow, sometime, but we're not there yet. >> Okay, let's go deeper, and take a look at the next slide, you've got this data feedback loop, and you've got this prediction on top of this, what does all that mean, let's double-click on that. >> Okay, so now we're unpacking the slide we just looked at, in that we're unpacking it into two different elements, one is what you're doing when you're running the system, and the next one will be what you're doing when you're designing it. And so for this one, what you're doing when you're running the system, I've grayed out the where's the data coming from and where's it going to, just to focus on how we're operating on the data, and again, to repeat the green part, which is storage, we don't have an end-to-end integrated store that could cost-effectively, scalably handle this whole chain of steps, but what we do have is that in the runtime, you're going to ingest the data, you're going to process it and make it ready for prediction, then there's a step that's called devops for data science, we know devops for developers, but devops for data science, as we're going to see, actually unpacks a whole 'nother level of complexity, but this devops for data science, this is where you get the prediction, of, okay, so, if this turbine is vibrating and has a heat spike, it means shut it down because something's going to fail. That's the prediction component, and the serve part then takes that prediction, and makes sure that that device gets it fast. >> So you're putting that capability in the hands of the data science component so they can effect that outcome virtually instantaneously? >> Yes, but in this case, the data scientist will have done that at design time. We're still at run time, so this is, once the data scientist has built that model, here, it's the engineer who's keeping it running. >> Yeah, but it's designed into the process, that's the devops analogy. Okay great, well let's go to that sort of next piece, which is design, so how does this all affect design, what are the implications there? >> So now, before we had ingest process, then prediction with devops for data science, and then serving, now when you're at design time, you ingest the data, and there's a whole unpacking of steps, which requires a handful, or two fistfuls of tools right now to make operate. 
This is to acquire the data, explore it, prepare it, model it, assess it, distribute it, all those things are today handled by a collection of tools that you have to stitch together, and then you have process at which could be typically done in Spark, where you do the analysis, and then serving it, Spark isn't ready to serve, that's typically a high-speed database, one that either has tons of data for history, or gets very, very fast updates, like a Redis that's almost like a cache. So the point of this is, we can't yet take Spark as gospel from end to end. >> Okay so, there's a lot of complexity here. >> [George] Right, that's the trade-off. >> So let's take a look at the next slide, which talks to where that complexity comes from, let's look at it first from the developer side, and then we'll look at the admin, so, so on the next slide, we're looking at the complexity from the dev perspective, explain the axes here. >> Okay, okay. So, there's two axes. If you look at the x-axis at the bottom, there's ingest, explore, process, serve. Those were the steps at a high level that we said a developer has to master, and it's going to be in separate products, because we don't have the maturity today. Then on the y-axis, we have some, but not all, this is not an exhaustive list of all the different things a developer has to deal with, with each product, so the complexity is multiplying all the steps on the y-axis, data model, addressing, programming model, persistence, all the stuff's on the y-axis, by all the products he needs on the x-axis, it's a mess, which is why it's very, very hard to build these types of systems today. >> Well, and why everybody's pushing on this whole unified integration, that was a major thing that we heard throughout the day today. What about from the admin's side, let's take a look at the next slide, which is our last slide, in terms of the operational complexity, take us through that. >> [George] Okay, so, the admin is when the system's running, and reading out the complexity, or inferring the complexity, follows the same process. On the y-axis, there's a separate set of tasks. These are admin-related. Governance, scheduling and orchestration, a high availability, all the different types of security, resource isolation, each of these is done differently for each product, and the products are on the x-axis, ingest, explore, process, serve, so that when you multiply those out, and again, this isn't exhaustive, you get, again, essentially a mess of complexity. >> Okay, so we got the message, if you're a practitioner of these so-called big data technologies, you're going to be dealing with more complexity, despite the industry's pace of trying to address that, but you're seeing new projects pop up, but nonetheless, it feels like the complexity curve is growing faster than customer's ability to absorb that complexity. Okay, well, is there hope? >> Yes. But here's where we've had this conundrum. The Apache opensource community has been the most amazing source of innovation I think we've ever seen in the industry, but the problem is, going back to the amazing book, The Cathedral and the Bazaar, about opensource innovation versus top-down, the cathedral has this central architecture that makes everything fit together harmoniously, and beautifully, with simplicity. But the bazaar is so much faster, 'cause it's sort of this free market of innovation. 
The Apache ecosystem is the bazaar, and the burden is on the developer and the administrator to make it work together, and it was most appropriate for the big internet companies that had the skills to do that. Now, the companies that are distributing these Apache opensource components are doing a Herculean job of putting them together, but they weren't designed to fit together. On the other hand, you've got the cloud service providers, who are building, to some extent, services that have standard APIs that might've been supported by some of the Apache products, but they have proprietary implementations, so you have lock-in, but they have more of the cathedral-type architecture that-- >> And they're delivering 'em their services, even though actually, many of those data services are discrete APIs, as you point out, are proprietary. Okay, so, very useful, George, thank you, if you have questions on this presentation, you can hit Wikibon.com and fire off a question to us, we'll make sure it gets to George and gets answered. This is part one, part two tomorrow is we're going to dig into some of the numbers, right? So if you care about where the trends are, what the numbers look like, what the market size looks like, we'll be sharing that with you tomorrow, all this stuff, of course, will be available on-demand, we'll be doing CrowdChats on this, George, excellent job, thank you very much for taking us through this. Thanks for watching today, it is a wrap of day one, Spark Summit East, we'll be back live tomorrow from Boston, this is theCUBE, so check out siliconangle.com for a review of all the action today, all the news, check out Wikibon.com for all the research, siliconangle.tv is where we house all these videos, check that out, we start again tomorrow at 11 o'clock east coast time, right after the keynotes, this is theCUBE, we're at Spark Summit, #SparkSummit, we're out, see you tomorrow. (electronic music jingle)

Published Date : Feb 8 2017

Ion Stoica, Databricks - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> [Announcer] Live from Boston Massachusetts. This is theCUBE. Covering Spark Summit East 2017. Brought to you by Databricks. Now here are your hosts, Dave Vellante and George Gilbert. >> [Dave] Welcome back to Boston everybody, this is Spark Summit East, #SparkSummit, and this is theCUBE. Ion Stoica is here. He's Executive Chairman of Databricks and Professor of Computer Science at UC Berkeley. The smarts is rubbing off on me. I always feel smart when I co-host with George. And now having you on is just a pleasure, so thanks very much for taking the time. >> [Ion] Thank you for having me. >> So loved the talk this morning, we learned about RISELab, we're going to talk about that. Which is the son of AMP. You may be the father of those two, so. Again welcome. Give us the update, great keynote this morning. How's the vibe, how are you feeling? >> [Ion] I think it's great, you know, thank you and thank everyone for attending the summit. It's a lot of energy, a lot of interesting discussions, and a lot of ideas around. So I'm very happy about how things are going. >> [Dave] So let's start with RISELab. Maybe take us back, for those who don't understand, to the birth of AMP and what you were trying to achieve there, and what's next. >> Yeah, so AMP was a six-year project at Berkeley, and it involved around eight faculty members and, over the duration of the lab, around 60 students and postdocs. And the mission of the AMPLab was to make sense of big data. AMPLab started in 2009, at the end of 2009, and the premise is that in order to make sense of this big data, we need a holistic approach, which involves algorithms, in particular machine-learning algorithms; machines, meaning systems, large-scale systems; and people, crowdsourcing. And more precisely, the goal was to build a stack, a data analytics stack for interactive analytics, to be used across industry and academia. And, of course, being at Berkeley, it has to be open source. (laughs) So that's basically what AMPLab was, and it was the birthplace of Apache Spark; that's why you are all here today. And a few other open-source systems like Mesos, Apache Mesos, and Alluxio, which was previously called Tachyon. And so AMPLab ended in December last year, and in January, this January, we started a new lab which is called RISE. RISE stands for Real-time Intelligent Secure Execution. And the premise of the new lab is that actually the real value in the data is the decision you can make on the data. And you can see this more and more at almost every organization. They want to use their data to make some decision to improve their business processes, applications, services, or come up with new applications and services. But then if you think about that, what does it mean that the emphasis is on the decision? It means that you want the decision to be fast, because fast decisions are better than slower decisions. You want decisions to be on fresh data, on live data, because decisions on the data I have right now are better than decisions on the data from yesterday, or last week. And then you also want to make targeted, personalized decisions, because decisions on personal information are better than decisions on aggregate information. So that's the fundamental premise. So therefore you want to build platforms, tools and algorithms to enable intelligent real-time decisions on live data with strong security.
And security is a big emphasis of the lab, because it means providing privacy, confidentiality and integrity, and you hear about data breaches or things like that every day. So for an organization, it is extremely important to provide privacy and confidentiality to their users, and it's not only because the users want that, but it also indirectly can help them to improve their service. Because if I guarantee your data is confidential with me, you are probably much more willing to share some of your data with me. And if you share some of the data with me, I can build and provide better services. So that's basically, in a nutshell, what the lab is and what the focus is. >> [Dave] Okay, so you said three things: fast, live and targeted. So fast means you can affect the outcome. >> Yes. >> Live data means it's better quality. And then targeted means it's relevant. >> Yes. >> Okay, and then my question on security. I felt like when cloud and Big Data came to the fore, security became a do-over. (laughter) Is that a fair assessment? Are you doing it over? >> [George] Or as Bill Clinton would call it, a Mulligan. >> Yeah, if you get a Mulligan on security. >> I think security is, it's always a difficult topic, because it means so many things for so many people. >> Hmm-mmm. >> So there are instances where actually the cloud is quite secure. Actually, the cloud can be more secure than some on-prem deployments. In fact, if you hear about these data leaks or security breaches, you don't hear about them happening in the cloud. And there is some reason for that, right? It is because they have trained people, you know, they are paranoid about this, they do updates maybe much more often, and things like that. But still, you know, the state of security is not that great. Right? For instance, if I compromise your operating system, whether it's in the cloud or not in the cloud, I can do anything. Right? Or your VM, right? In all these clouds you run on a VM, and now you are going to also run on some containers. Right? And there are attacks, sophisticated attacks, where your data is encrypted, but if I can look at the access patterns, how much data you transferred, or how much data you accessed from memory, then I can infer something about what you are doing with your queries, right? If it's more data, maybe it's a query on New York. If it's less data, it's probably something smaller, like maybe something at Berkeley. So you can infer from multiple queries just by looking at the access. So it's a difficult problem. But fortunately, again, there are some new technologies which are being developed, and some new algorithms, which give us some hope. One of the most interesting technologies which is happening today is hardware enclaves. So with hardware enclaves you can execute the code within this enclave, which is hardware protected. And even if your operating system or VM is compromised, an attacker cannot access the code which runs in this enclave. And Intel has Intel SGX, and we are working and collaborating with them actively. ARM has TrustZone, and AMD also announced they are going to have a similar technology in their chips. So that's kind of a very interesting and very promising development. I think the other aspect, and it's a focus of the lab, is that even if you have the enclaves, it doesn't automatically solve the problem, because the code itself can have a vulnerability. Yes, I can run the code in a hardware enclave, but the code can send >> Right. >> data outside.
>> Right, the enclave is a more granular perimeter. Right? >> Yeah. So yeah, the security experts in our lab are looking at this: maybe how to split the application so you run only a small part in the enclave, which is the critical part, and you can make sure that that code is also secure, and the rest of the code you run outside. But the rest of the code, it's only going to work on data which is encrypted. Right? So there is a lot of interesting research, but that's good. >> And does Blockchain fit in there as well? >> Yeah, I think Blockchain is a very interesting technology. And again it's real-time, and that area also has some very interesting directions. >> Yeah, right. >> Absolutely. >> So you guys, I want-- George, you've shared with me sort of what you were calling a new workload. So you had batch and you have interactive, and now you've got continuous- >> Continuous, yes. >> And I know that's a topic that you want to discuss, and I'd love to hear more about that. But George, tee it up. >> Well, okay. So we were talking earlier, and the objective of RISE is fast and continuous-type decisions. And this is different from the traditional, you either do it batch or you do it interactive. So maybe tell us about some applications where that is one workload among the other traditional workloads. And then let's unpack that a little more. >> Yeah, so I'll give you a few applications. So it's more than continuously interacting with the environment; you also learn continuously. I'll give you some examples. So for instance, in one example, think about you want to detect a network security attack, and respond and diagnose and defend in real time. So what this means is that you need to continuously get logs from the network, and from the endpoints; the more you can get, the better. Right? Because more data will help you to detect things faster. But then you need to detect the new patterns, and you need to learn the new patterns. Because new security attacks, which are the ones that are effective, are slightly different from the past ones, because you hope that you already have the defense in place for the past ones. So now you are going to learn that, and then you are going to react. You may push patches in real time. You may push filters, installing new filters to firewalls. So that's kind of one application that's going in real time. Another application can be about self-driving. Now self-driving has made tremendous strides. And a lot of algorithms, you know, very smart algorithms, are now implemented on the cars. Right? The whole system is on the cars. But imagine now that you want to continuously get the information from these cars, aggregate and learn, and then send back the information you learned to the cars. Like, for instance, if it's an accident or a roadblock, an object which is dropped on the highway, you can learn from the other cars what they've done in that situation. It may mean in some cases the driver took an evasive action, right? Maybe you can also monitor the cars which are not self-driving, but driven by humans. And then you learn that in real time, and then the other cars which follow, confronted with the same situation, they now know what to do. Right? So this is, again, I want to emphasize this: not only continuously sensing the environment and making the decisions, but a very important component is learning. >> Let me take you back to the security example as I sort of process the auto one. >> Yeah, yeah.
>> So in the security example, it doesn't sound like, I mean if you have a vast network, you know, endpoints, software, infrastructure, you're not going to have one God model looking out at everything. >> Yes. >> So I assume that means there are models distributed everywhere, and they don't know, necessarily, what an entirely new attack pattern looks like. So in other words, for that isolated model, it doesn't know what it doesn't know. I don't know if that's what Rumsfeld called it. >> Yes (laughs). >> How does it know what to pass back for retraining? >> Yes. Yes. Yes. So there are many aspects, and there are many things you can look at. And again, it's a research problem, so I cannot give you the solution now; I can hypothesize and give you some examples. But for instance, you can look at, and correlate by observing, the effects. Some of the effects of the attack are visible. In some cases, a denial of service attack, that's pretty clear. And so forth; they may cause computers to crash, right? So once you see some of these kinds of anomalies, right, anomalies on the end devices, end hosts and things like that, maybe reported by humans, right? Then you can try to correlate with what kind of traffic you've got. Right? And from there, from that correlation, probably you can, and hopefully you can, develop some models to identify what kind of traffic, where it comes from, what is the content, and so forth, which causes anomalous behavior. >> And where is that correlation happening? >> I think it will happen everywhere, right? Because- >> At the edge and at the center. >> Absolutely. >> And then I assume that it sounds like the models, both at the edge and at the center, are ensemble models. >> Yes. >> Because you're tracking different behavior. >> Yes. You are going to track different behavior, and, I think that's a good hypothesis. And then you are going to ensemble them, to come up with the best decision. >> Okay, so now let's wind forward to the car example. >> Yeah. >> So it sounds like there's a mesh network; at least, Peter Levine's sort of talk was that there are near-local compute resources, and you can use bitcoin to pay for it, or Blockchain, or however it works. But that sort of topology, we haven't really encountered before in computing, have we? And how imminent is that sort of ... >> I think that some of the stuff you can do today in the cloud. I think if you want super-low latency, probably you need to have more computation towards the edges, but if I'm thinking that I want kind of reactions on tens, hundreds of milliseconds, in theory you can do it today with the cloud infrastructure we have. And if you think about it, in many cases, if you can do it within a few hundred milliseconds, it's still super useful. Right? To avoid this object which has dropped on the highway. You know, if I have a few hundred milliseconds, many cars will effectively avoid it, having that information. >> Let's have that conversation about the edge a little further. The one we were having off camera. So there's a debate in our community about how much data will stay at the edge, how much will go into the cloud. David Floyer said 90% of it will stay at the edge. Your comment was, it depends on the value. What do you mean by that? >> I think that depends on who I am and how I perceive the value of the data. And, you know, what can be the value of the data? This is what I was saying.
I think that the value of the data is fundamentally about what kind of decisions, what kind of actions, it will enable me to take. Right? So here I'm not just talking about, you know, credit card information or things like that, where exactly there is an action somebody's going to take on that. So if I do believe that the data can provide me with the ability to take better actions or make better decisions, I think that I want to keep it. And why do I want to keep it? Because it's not only the decisions it enables me to make now; everyone is going to continuously improve their algorithms, develop new algorithms. And when you do that, how do you test them? You test them on the old data. Right? So I think that for all these reasons, a lot of data, valuable data in this sense, is going to go to the cloud. Now, is there a lot of data that should remain on the edges? And I think that's fair. But again, if a cloud provider, or someone who provides a service in the cloud, believes that the data is valuable, I do believe that eventually it is going to get to the cloud. >> So if it's valuable, it will be persisted and will eventually get to the cloud? And we talked about latency, but latency, the example of evasive action. You can't send that back to the cloud and make the decision; you have to make it in real time. But eventually that data, if it's important, will go back to the cloud. The other question is, of all this data that we are now processing on a continuous basis, how much actually will get persisted? Most of it, much of it, probably does not get persisted. Right? Is that a fair assumption? >> Yeah, I think so. And probably all the data is not equal. All right? It's like, even if you take a continuous video, all right? On the cars, they continuously have video from multiple cameras and radar and lidar, all of this stuff. This is continuous. And if you think about this one, I would assume that you don't want to send all the data to the cloud. But the data around the interesting events, you may want to, right? So before and after the car has a near-accident, or took an evasive action, or the human had to intervene. So in all these cases, probably I want to send the data to the cloud. But for most cases, probably not. >> That's good. We have to leave it there, but I'll give you the last word on things that are exciting you, things you're working on, interesting projects. >> Yeah, so what really excites me is how we are going to have these continuous applications; you are going to continuously interact with the environment. You are going to continuously learn and improve. And here there are many challenges. And I just want to mention a few more, which we haven't discussed. One, in general, is about explainability. Right? If these systems augment the human decision process, if these systems are going to make decisions which impact you as a human, you want to know why. Right? Like I gave this example: assuming you have machine-learning algorithms making a diagnosis on your MRI, or x-ray, you want to know why. What is in this x-ray that causes that decision? If you go to the doctor, they are going to point and show you: okay, this is why you have this condition. So I think this is very important, because as a human you want to understand. And you want to understand not only why the decision happened, but you also want to understand what you have to do, what you need to do, to do better in the future, right?
Like if your mortgage application is turned down, I want to know why. Because next time, when I apply for a mortgage, I want to have a higher chance of getting it through. So I think that's a very important aspect. And the last thing I will say, and this is super important, is about having algorithms which can say, I don't know. Right? It's like, okay, I have never seen this situation in the past, so I don't know what to do. This is much better than giving you just the wrong decision. Right? >> Right, or a low probability that you don't know what to do with. (laughs) >> Yeah. >> Excellent. Ion, thanks again for coming on theCUBE. It was really a pleasure having you. >> Thanks for having me. >> You're welcome. All right, keep it right there everybody. George and I will be back to do our wrap right after this short break. This is theCUBE. We're live from Spark Summit East. Right back. (techno music)
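As a concrete illustration of the continuous applications discussed in this segment, the sketch below uses Spark Structured Streaming to ingest live events, score them against a simple stand-in for a trained model, and emit alerts with low latency. The Kafka topic, schema, and threshold are invented for illustration; a real system would also periodically retrain on the accumulated history and push the updated model back into the loop, which is the learning component Stoica emphasizes.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("continuous-decisions").getOrCreate()

# Assumed record layout for network log events arriving on Kafka.
schema = StructType([
    StructField("host", StringType()),
    StructField("bytes", DoubleType()),
])

# Ingest: a continuous stream of log records.
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "netlogs")
       .load())
logs = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
        .select("r.*"))

# Decide: a stand-in for a learned model; here just a threshold that offline
# training on historical data would have produced.
THRESHOLD = 1e6
alerts = logs.where(F.col("bytes") > THRESHOLD)

# Act: push alerts downstream in near real time (console sink for the sketch).
query = alerts.writeStream.format("console").start()
query.awaitTermination()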

Published Date : Feb 8 2017

Ziya Ma, Intel - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> [Narrator] Live from Boston Massachusetts. This is the Cube, covering Spark Summit East 2017. Brought to you by Databricks. Now here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston everybody. This is the Cube and we're here live at Spark Summit East, #SparkSummit. Ziya Ma is here. She's the Vice President of Big Data at Intel. Ziya, thanks for coming to the Cube. >> Thanks for having me. >> You're welcome. So software is our topic. Software at Intel. You know, people don't necessarily always associate Intel with software, but what's the story there? >> So actually there are many things that we do for software. Since I manage the Big Data engineering organization, I'll just say a little bit more about what we do for Big Data. >> [Dave] Great. >> So you know Intel does all the processors, all the hardware. But when our customers are using the hardware, they like to get the best performance out of Intel hardware. So this is for the Big Data space: we optimize the Big Data solution stack, including Spark and Hadoop, on top of Intel hardware, and make sure that we leverage the latest instruction set so that the customers get the most performance out of the newest released Intel hardware. And also we collaborate very extensively with the open source community for Big Data ecosystem advancement. For example, we're a leading contributor to the Apache Spark ecosystem. We're also a top contributor to the Apache Hadoop ecosystem. And lately we're getting into the machine learning and deep learning and the AI space, especially integrating those capabilities into the Big Data ecosystem. >> So I have to ask you a question, just sort of strategically. If we go back several years, you look at, during the Unix days, you had a number of players developing hardware, microprocessors; there were RISC-based systems, remember MIPS, and of course IBM had one, and Sun, et cetera, et cetera. Some of those live on, but as a very, very small portion of the market. So Intel has dominated the general purpose market. So as Big Data became more mainstream, was there a discussion: okay, we have to develop specialized processors, which I know Intel can do as well, or did you say, okay, we can actually optimize through software? Was that how you got here? Or am I understanding that right? >> We believe definitely software optimization, optimizing through software, is one thing that we do. That's why Intel actually has, you may not know this, one of the largest software divisions, which focuses on enabling and optimizing the solutions on Intel hardware. And of course we also have a very aggressive product roadmap for continuously advancing our hardware products. And actually, you mentioned general purpose computing. The CPU today, in the Big Data market, still has more than 95% of the market. So that's still the biggest portion of the Big Data market, and we'll continue our advancement in that area. And obviously, as the AI and machine learning, deep learning use cases are getting added into the Big Data domain, we are expanding our product portfolio into some other silicon products. >> And of course that was kind of the big bet of, we want to bet on Intel. And I guess, I guess-- >> You should still do. >> And still do. And I guess, at the time, Seagate or other disk makers. Now flash comes in. And of course now Spark with memory, it's really changing the game, isn't it? What does that mean for you and the software group? >> Right, so what do we...
Actually, still we focus on the optimi-- Obviously at the hardware level, Intel now is not just offering computing capability. We also offer very powerful network capability. We offer very good memory solutions, memory hardware, like these non-volatile memory technologies we keep talking about. So for Big Data, we're trying to leverage all of that newest hardware. And we're already working with many of our customers to help them improve their Big Data memory solutions, the in-memory analytics type of capability on Intel hardware, to give them the most optimum performance and most secure result using Intel hardware. So that's definitely one thing that we continue to do. That's still going to be our top priority. But we don't just limit our work to optimization, because giving users the best experience, giving users the complete experience on the Intel platform, is our ultimate goal. So we work with our customers from financial services companies. We work with folks from manufacturing, from transportation, and from other IoT, internet of things, segments, to make sure that we give them the easiest Big Data analytics experience on Intel hardware. So when they are running those solutions, they don't have to worry too much about how to make their application work with Intel hardware, and how to make it more performant with Intel hardware, because it's the Intel software solution that's going to bridge the gap. We do that part of the job, so that it will make our customers' experience easier and more complete. >> You serve as the accelerant to the marketplace. Go ahead George. >> So Intel's BigDL is the new product, as of the last month or so, an open source solution. Tell us how there are other deep learning frameworks that aren't as fully integrated with Spark yet, and where BigDL fits in, since we're at a Spark conference. How it backfills some functionality and how it really takes advantage of Intel hardware. >> George, just like you said, BigDL, we just open sourced a month ago. It's a deep learning framework that we organically built on top of Apache Spark. And it has quite some differences from the other mainstream deep learning frameworks, like Caffe, TensorFlow, Torch and Theano, you name it. The reason that we decided to work on this project was, again, through our experience working with our analytics, especially Big Data analytics, customers: as they build their AI solutions or AI modules within their analytics applications, they found it's getting more and more difficult to build and integrate AI capability into their existing Big Data analytics ecosystem. They had to set up a different cluster and build a different set of AI capabilities using, let's say, one of the deep learning frameworks. And later they have to overcome a lot of challenges, for example, moving the model and data between the two different clusters, and then making sure that the AI result is getting integrated into the existing analytics platform or analytics application. So that was the primary driver. How do we make our customers' experience easier? Do they have to leave their existing infrastructure and build a separate AI module? And can we do something organic on top of the existing Big Data platform, let's say Apache Spark? Can we just do something like that? So that the user can just leverage the existing infrastructure and make it a naturally integral part of the overall analytics ecosystem that they already have. So this was the primary driver.
And the other benefit that we see by integrating this BigDL framework naturally with the Big Data platform is that it enables efficient scale-out and fault tolerance and elasticity and dynamic resource management. And those are the benefits that are naturally brought by the Big Data platform. And today, actually, just within this short period of time, we have already tested that BigDL can scale easily to tens or hundreds of nodes. So the scalability is also quite good. And another benefit with a solution like BigDL, especially because it eliminates the need of setting up a separate cluster and moving the model between different hardware clusters, is that you save on your total cost of ownership. You can just leverage your existing infrastructure. There is no need to buy an additional set of hardware and build another environment just for training the model. So that's another benefit that we see. And performance-wise, again, we also tested BigDL against Caffe, Torch and TensorFlow. The performance of BigDL on a single-node Xeon is orders of magnitude faster than out-of-box open source Caffe, TensorFlow or Torch. So it's definitely going to be very promising. >> Without the heavy lifting. >> And a useful solution, yeah. >> Okay, can you talk about some of the use cases that you expect to see from your partners and your customers. >> Actually, very good question. You know, we already started a few engagements with some of the interested customers. The first customer is from the steel industry, where improving the accuracy of steel-surface defect recognition is very important to its quality control. So we worked with this customer in the last few months and built an end-to-end image recognition pipeline using BigDL and Spark. And the customer, just through phase one work, already improved its defect recognition accuracy to 90%. And they're seeing a very good yield improvement in steel production. >> And it used to be done by humans? >> It used to be done by humans, yes. >> And you said, what was the degree of improvement? >> 90, nine, zero. So now the accuracy is up to 90%. And another use case is in financial services, actually, especially for fraud detection. So this customer, and at the customer's request I'm not naming them, because in the financial industry they're very sensitive about releasing their name. The customer was seeing its fraud risks increasing tremendously, with its wide range of products, services and customer interaction channels. So they implemented an end-to-end deep learning solution using BigDL and Spark. And again, through phase one work, they are seeing the fraud detection rate improve 40 times, four, zero, times, through phase one work. We think there's more improvement that we can do, because this is just a collaboration over the last few months, and we'll continue this collaboration with this customer. And we expect more use cases from other business segments, but those are the two that already have BigDL running in production today. >> Well, so the first one, that's amazing, essentially replacing the human inspection and being much more accurate. The fraud detection is interesting, because fraud detection has come a long way in the last 10 years, as you know. It used to take six months, if they found fraud. And now it's minutes, seconds, but there are still a lot of false positives. So do you see this technology helping address that problem? >> Yeah, actually, continuously improving the prediction accuracy is one of the goals.
This is another reason why we need to bring AI and Big Data together. Because you need to train your model. You need to train your AI capabilities with more and more training data, so that you get much more improved training accuracy. Actually, this is the biggest way of improving your training accuracy. So you need a huge infrastructure, a big data platform, so that you can host and manage your training data sets well, and so that they can feed into your deep learning solution or module for continuously improving your training accuracy. So yes. >> This is a really key point, it seems like. I would like to unpack that a little bit. So when we talk to customers and application vendors, it's that training feedback loop that gets the models smarter and smarter. So if you had one cluster for training that was with another framework, and then Spark was your... rest of your analytics, how would training with feedback data work when you had two separate environments? >> You know, that's one of the drivers why we created BigDL. Because we tried to port, we did not come to BigDL at the very beginning. We tried to port the existing deep learning frameworks like Caffe and TensorFlow onto Spark. And you probably also saw some research papers, folks; there are other teams out there that are also trying to port Caffe, TensorFlow and the other deep learning frameworks that are out there onto Spark. Because you have that need. You need to bring the two capabilities together. But the problem is that those systems were developed in a very traditional way, with Big Data not yet in consideration when those frameworks were created, were innovated. But now the need for converging the two becomes more and more clear, and more necessary. And that's why, when we ported it over, we said, gosh, this is so difficult. First, it's very challenging to integrate the two. And secondly, the experience after you've moved it over is awkward. You're literally using Spark as a dispatcher. The integration is not coherent. It's like they're superficially integrated. So this is where we said, we've got to do something different. We cannot just superficially integrate two systems together. Can we do something organic on top of the Big Data platform, on top of Apache Spark, so that the integration between the training system, between the feature engineering, between data management, can be more consistent, can be more integrated? So that's exactly the driver for this work. >> That's huge. Seamless integration is one of the most overused phrases in the technology business. Superficial integration is maybe a better description for a lot of those so-called seamless integrations. You're claiming here that it's seamless integration. We're out of time, but last word, Intel and Spark Summit. What do you guys got going here? What's the vibe like? >> So actually tomorrow I have a keynote. I'm going to talk a little bit more about what we're doing with BigDL. Actually this is one of the big things that we're doing. And of course, in order for BigDL, a system like BigDL or even other deep learning frameworks, to get optimum performance on Intel hardware, there's another item that we're highlighting, MKL, the Intel-optimized Math Kernel Library. It has a lot of common math routines that are optimized for Intel processors using the latest instruction set. And that's already, today, integrated into the BigDL ecosystem. So that's another thing that we're highlighting. And another thing is that those are just software.
And at the hardware level, during November, at Intel's AI day, our executives, BK, Diane Bryant and Doug Fisher, also highlighted the Nervana product portfolio that's coming out. That will give you different hardware choices for AI. You can look at FPGAs, Xeon Phi, Xeon, and our new Nervana-based silicon like Lake Crest. And those are some good silicon products that you can expect in the future. >> Intel, taking us to Nirvana, touching every part of the ecosystem. Like you said, 95% share and in all parts of the business. Yeah, thanks very much for coming on the Cube. >> Thank you, thank you for having me. >> You're welcome. Alright, keep it right there. George and I will be back with our next guest. This is Spark Summit, #SparkSummit. We're the Cube. We'll be right back.
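For a sense of what "organic on top of Spark" means in code, here is a sketch in the style of BigDL's early Python API: a small network is defined and trained on an RDD of samples inside the same Spark cluster that prepared the data, with no second cluster and no data movement. The module paths and signatures are quoted from memory of the 0.x releases and should be checked against the BigDL documentation; training_rdd is assumed to come from ordinary Spark data prep.

from bigdl.util.common import init_engine, Sample
from bigdl.nn.layer import Sequential, Linear, ReLU, LogSoftMax
from bigdl.nn.criterion import ClassNLLCriterion
from bigdl.optim.optimizer import Optimizer, SGD, MaxEpoch

init_engine()  # wires BigDL into the existing SparkContext

# A tiny two-layer classifier: 10 input features, 2 classes.
model = Sequential()
model.add(Linear(10, 32)).add(ReLU()).add(Linear(32, 2)).add(LogSoftMax())

# training_rdd is an RDD of Sample objects produced by normal Spark prep,
# e.g. rows.map(lambda x: Sample.from_ndarray(x.features, x.label)).
optimizer = Optimizer(
    model=model,
    training_rdd=training_rdd,
    criterion=ClassNLLCriterion(),
    optim_method=SGD(learningrate=0.01),
    end_trigger=MaxEpoch(5),
    batch_size=256)
trained_model = optimizer.optimize()  # training runs on the Spark cluster itself

The point of the sketch is the absence of a hand-off: the same cluster, the same RDDs, and the same resource manager carry the job from feature engineering through model training.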

Published Date : Feb 8 2017

Mike Gualtieri, Forrester Research - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> Narrator: Live from Boston, Massachusetts, this is the Cube, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody, where the town is still euphoric. Mike Gualtieri is here, he's the principal analyst at Forrester Research, attended the parade yesterday. How great was that, Mike? >> Yes. Yes. It was awesome. >> Nothing like we've ever seen before. All right, the first question is, what was the bigger shocking surprise, upset, greatest win: was it the Red Sox over the Yankees, or was it the Super Bowl this weekend? >> That's the question. I think it's the Super Bowl. >> Yeah, who knows, right? Who knows. It was a lot of fun. So how was the parade yesterday? >> It was magnificent. I mean, it was freezing. No one cared. I mean--but it was, yeah, it was great. Great to see that team in person. >> That's good. Wish we could talk-- we can, but we'll get into it. So, we're here at Spark Summit, and, you know, the show's getting bigger, you're seeing more sponsors, still heavily a technical audience, but what's your take these days? We were talking off-camera about the whole big data thing. It used to be the hottest thing in the world, and now nobody wants to have big data in their title. What's Forrester's take on that? >> I mean, I think big data-- I think it's just become mainstream, so we're just back to data. You know, because all data is potentially big. So, I don't think it's-- it's not the thing anymore. I mean, what do you do with big data? You analyze it, right? And part of what this whole Spark Summit is about-- look at all the sessions. Data science, machine learning, streaming analytics, so it's all about sort of using that data now. So big data is still important, but the value of big data comes from all this advanced analytics. >> Yeah, and we talked earlier, I mean, a lot of the value of, you know, Hadoop was cutting costs. You know, you've mentioned commodity components and a reduction in the denominator, and breaking the need for some kind of big storage container. OK, so that-- we got there. Now, shifting to new sources of value, what are you spending your time on these days in terms of research? >> Artificial intelligence, machine learning, so those are really forms of advanced analytics, so that's been-- that's been very hot. We did a survey last year, an AI survey, and we asked a large group of people, we said, oh, you know, what are you doing with AI? 58% said they're researching it. 19% said they're training a model. Right, so that's interesting. 58% are researching it, and far fewer are actually, you know, actually doing something with it. Now, the reality is, if you phrased that a little bit differently, and you said, oh, what are you doing with machine learning? Many more would say yes, we're doing machine learning. So it begs the question, what do enterprises think of AI? And what do they think it is? So, a lot of my inquiries are spent helping enterprises understand what AI is, what they should focus on, and the other part of it is what are the technologies used for AI, and deep learning is the hottest. >> So, you wrote a piece late last year, what's possible today in AI. What's possible today in AI? >> Well, you know, before understanding what's possible, it's important to understand what's not possible, right? And so we sort of characterize it as there's pure AI, and there's pragmatic AI. So it's real simple.
Pure AI is the sci-fi stuff, we've all seen it, Ex Machina, Star Wars, whatever, right? That's not what we're talking about. That's not what enterprises can do today. We're talking about pragmatic AI, and pragmatic AI is about building predictive models. It's about conversational APIs, to interact in a natural way with humans. It's about image analysis, which is something very hot because of deep learning. So, AI is really about the building blocks that companies have been using, but then using them in combination to create even more intelligent solutions. And they have more options on the market, both from open source and from cloud services, from Google, Microsoft, IBM, and now Amazon, at their re-- were you guys at their re:Invent conference? >> I wasn't, personally, but we were certainly there. >> Yeah, they announced Amazon AI, which is a set of three services that developers can use without knowing anything about AI or being a data scientist. But, I mean, I think the way to think about AI is that it is data science. It requires the expertise of a data scientist to do AI. >> Following up on that comment, which was really interesting: vendors try and democratize access to machine learning and AI, and I say that with two terms because usually the machine learning is the stuff that's sort of widely accessible and AI is a little further out, but there's a spectrum. At one end you can just access an API, which is like a pre-trained model-- >> Pre-trained model, yep. >> It's developer-accessible, you don't need to be a data scientist. And then at the other end, you know, you need to pick your algorithms, you need to pick your features, you need to find the right data. So how do you see that horizon moving over time? >> Yeah, no, I-- So, these machine learning services, as you say, they're pre-trained models, totally accessible by anyone; anyone who can call an API or a RESTful service can access these. But their scope is limited, right? So, if, for example, you take the image API, you know, the imaging API that you can get from Google or now Amazon, you can drop an image in there and it will say, oh, there's a wine bottle on a picnic table on the beach. Right? It can identify that. So that's pretty cool, there might be a lot of use cases for that, but think of an enterprise use case. No. You can't do it, and let me give you this example. Say you're an insurance company, and you have a picture of a steel roof that's caved in. If you give that to one of these APIs, it might say steel roof, it may say damage, but what it's not going to do is it's not going to be able to estimate the damage, it's not going to be able to create a bill of materials on how to repair it, because Google hasn't trained it at that level. OK, so, enterprises are going to have to do this themselves, or an ISV is going to have to do it, because think about it: you've got 10 years' worth of all these pictures taken of damage. And with all of those pictures, you've got tons of write-ups from an adjuster. Whoa, if you could shove that into a deep learning algorithm, you could potentially have consumers, or someone untrained, take pictures, and have this thing say here's what the estimated damage is, this is the situation.
>> And I've read about insurance use cases like that, where the customer could, after they sort of have a crack-up, take pictures all around the car, and then the insurance company could provide an estimate, tell them where the nearest repair shops are-- >> Yeah, but right now it's like the early days of e-commerce, where you could send an order in and then it would fax it and they'd type it in. So, I think, yes, insurance companies are taking those pictures, and the question is can we automate it, and-- >> Well, let me actually iterate on that question, which is: so who can build a more end-to-end solution, assuming, you know, there's a lot of heavy lifting that's got to go on for each enterprise trying to build a use case like that. Is it internal development, and only at big companies that have a few of these data science gurus? Would it be like an IBM Global Services or an Accenture, or would it be like a vertical ISV where it's semi-custom, semi-packaged? >> I think it's both, but I also think it's two or three people walking around this conference, right, understanding Spark, maybe understanding how to use TensorFlow in conjunction with Spark, that will start to come up with these ideas as well. So I think-- I think we'll see all of those solutions. Certainly, like IBM with their cognitive computing-- oh, and by the way, we think that cognitive computing equals pragmatic AI, right, because it has similar characteristics. So, we're already seeing the big ISVs and the big application developers, SAP, Oracle, creating AI-infused applications or modules, but yeah, we're going to see small ISVs do it. There's one in Austin, Texas, called InteractiveTel. It's like 10 people. What they do is they use the Google-- so they sell to large car dealerships, like Ernie Boch. And they record every conversation, phone conversation, with customers. They use the Google pre-trained model to convert the speech to text, and then they use their own machine learning to analyze that text to find out if there's a customer service problem or if there's a selling opportunity, and then they alert managers or other people in the organization. So, small company, very narrowly focused on something like car buying. >> So, I wonder if we could come back to something you said about pragmatic AI. We love to have someone like you on the Cube, because we like to talk about the horses on the track. So, if Watson is pragmatic AI, and we all-- well, I think you saw the 60 Minutes show, I don't know, whenever it was, three or four months ago, and IBM Watson got all the love. They barely mentioned Amazon and Google and Facebook, and Microsoft didn't get any mention. And there seems to be a sentiment that, OK, all the real action is in Silicon Valley. But you've got IBM doing pragmatic AI. Do those two worlds come together in your view? How does that whole market shake up? >> I don't think they come together in the way I think you're suggesting. I think what Google, Microsoft, Facebook, what they're doing is they're churning out fundamental technology, like one of the most popular deep learning frameworks, TensorFlow, is a Google thing that they open sourced. And as I pointed out, those image APIs that Amazon has, that's not going to work for insurance, that's not going to work for radiology.
So, I don't think they're in-- >> George Gilbert: Facebook's going to apply it differently-- >> Yeah, I think what they're trying to do is they're trying to apply it to the millions of consumers that use their platforms, and then I think they throw off some of the technology for the rest of the world to use, fundamentally. >> And then the rest of the world has to apply those. >> Yeah, but I don't think they're in the business of building insurance solutions or building logistical solutions. >> Right. >> But you said something that was really, really potentially intriguing, which was you could take the horizontal Google speech-to-text API, and then-- >> Mike Gualtieri: And recombine it. >> --put your own model on top of that. And that's, techies call that like ensemble modeling, but essentially you're taking almost like an OS-level service, and you're putting a more vertical application on top of it, to relate it to our old ways of looking at software, and that's interesting. >> Yeah, because what we're talking about right now, this conversation, is about applications. Right, we're talking about applications, which need lots of different services recombined, whereas mostly the data science conversation has been narrowly about building one customer lifetime value model or one churn model. Now the conversation, when we talk about AI, is becoming about combining many different services and many different models. >> Dave Vellante: And the platform for building applications is really-- >> Yeah, yeah. >> And that platform, the richest platform, or the platform that is most attractive, has the most building blocks to work with, or the broadest ones? >> The best ones, I would say, right now. The reason why I say it that way is because this technology is still moving very rapidly. So for image analysis, deep learning: nothing's better than deep learning for image analysis. But if you're doing business process models, or like churn models, well, deep learning hasn't played out there yet. So, right now I think there's some fragmentation. There's so much innovation. Ultimately it may come together. What we're seeing is, many of these companies are saying, OK, look, we're going to bring in the open source. It's pretty difficult to create a deep learning library. And so, you know, a lot of the vendors in the machine learning space, instead of creating their own, they're just bringing in MXNet or TensorFlow. >> I might be thinking of something from a different angle, which is not what underlying implementation they're using, whether it's deep learning or whether it's just random forests, or whatever the terminology is, you know, the traditional statistical stuff. The idea, though, is you want a platform-- like way, way back, Windows, with the Win32 API, had essentially more widgets for helping you build graphical applications than any other platform. >> Mike Gualtieri: Yeah, I see where you're going. >> And I guess I'm thinking it doesn't matter what the underlying implementation is, but how many widgets can you string together? >> I'm totally with you there, yeah. And so I think what you're saying is, look, a platform that has the most capabilities, but abstracts the implementations, and can, you know, can be somewhat pluggable-- right, good, to keep up with the innovation, yeah. And there's a lot of new companies out there, too, that are tackling this.
One of them's called Bonsai AI, you know, a small startup; they're trying to abstract deep learning, because deep learning right now, like TensorFlow and MXNet, is a little bit of a challenge to learn, so they're abstracting it. But so are a lot of the-- so are SAS, IBM, et cetera. >> So, Mike, we're out of time, but I want to talk about your talk tomorrow. So, AI meets Spark, give us a little preview. >> AI meets Spark. Basically, the prerequisite to AI is a very sophisticated and fast data pipeline, because just because we're talking about AI doesn't mean we don't need data to build these models. So, I think Spark gives you the best of both worlds, right? It's designed for the sort of complex data pipelines that you need to prep data, but now, with MLlib for more traditional machine learning, and now with the announcement of TensorFrames, which is going to be an interface for TensorFlow, you've got deep learning, too. And you've got it in a cluster architecture, so it can scale. So, pretty cool. >> All right, Mike, thanks very much for coming on the Cube. You know, way to go Pats, awesome. Really a pleasure having you back. >> Thanks. >> All right, keep it right there, buddy. We'll be back with our next guest right after this short break. This is the Cube. (peppy music)
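Mike's recombination point, a generic pre-trained service underneath and a narrow, domain-trained model on top, can be sketched schematically. Both calls below are hypothetical stand-ins rather than any vendor's real API; the shape of the composition is what matters.

# Schematic sketch of pragmatic-AI composition. generic_vision_labels and
# claims_model are hypothetical stand-ins, not real vendor APIs.

def generic_vision_labels(image_bytes):
    """Stand-in for a pre-trained cloud image API that returns coarse labels
    such as ['steel roof', 'damage']. A real system would call a vendor SDK."""
    raise NotImplementedError("replace with an actual image-analysis service call")

def estimate_claim(image_bytes, claims_model):
    # Step 1: coarse, horizontal understanding from the pre-trained service.
    labels = generic_vision_labels(image_bytes)
    # Step 2: narrow, vertical judgment from a model trained on years of
    # adjuster write-ups; claims_model is assumed to be trained offline.
    if "damage" not in labels:
        return {"estimate": 0.0, "labels": labels}
    return {"estimate": claims_model.predict({"labels": labels}),
            "labels": labels}

This mirrors the insurance example from earlier in the conversation: the horizontal API gets you to "steel roof, damage", and only the enterprise's own model, trained on its own history, can get from there to a dollar estimate.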

Published Date : Feb 8 2017

Alfred Essa, McGraw Hill Education - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> Announcer: Live from Boston, Massachusetts, this is the CUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody, this is the CUBE. We're live here at Spark Summit East in the Hynes Convention Center. This is the CUBE, check out SiliconANGLE.com for all the news of the day. Check out Wikibon.com for all the research. I'm really excited about this session here. Al Essa is here, he's the vice president of analytics and R&D at McGraw-Hill Education. And I'm so excited because we always talk about digital transformations. We have an example of a 150-year-old company that has been, I'm sure, through many transformations. We're going to talk about a recent one. Al Essa, welcome to the CUBE, thanks for coming on. >> Thank you, pleasure to be here. >> So you heard my little narrative up front. You, obviously, have not been with the company for 150 years (laughs), you can't talk about all the transformations, but there's certainly one that's recent, in the last couple of years anyway, which is digital. We know McGraw Hill is a print publisher, describe your business. >> Yeah, so McGraw Hill Education has been traditionally a print publisher, but beginning with our new CEO, David Levin, he joined the company about two years ago, and now we call ourselves a learning science company. So it's no longer print publishing, it's smart digital, and by smart digital we mean we're trying to transform education by applying principles of learning science. Basically what that means is we try to understand, how do people learn? And how can they learn better? So there are a number of domains, cognitive science, brain sciences, data science, and we begin to try to understand what are the known knowns in these areas and then apply it to education. >> I think Marc Benioff said it first, at least the first I heard, he said there were going to be way more SaaS companies that come out of non-tech companies than tech companies. We were talking off camera, you're a software company. Describe that in some detail. >> Yeah, so being a software company is new for us, but we've moved pretty quickly. Our core competency has been really expert knowledge about education. We work with educators, subject matter experts, so for over a hundred years, we've created vetted content, assessments, and so on. So we have a great deal of domain expertise in education, and now we're taking, sort of, the new frontiers of knowledge in cognitive science, brain sciences, how can learners learn better, and applying that to software and models and algorithms. >> Okay, and there's a data component to this as well, right? >> So yeah, the way I think about it is, we're a smart digital company, but smart digital is fueled by smart data. Data underlies everything that we do. Why? Because in order to strengthen learners, provide them with the optimal pathway, as well as instructors. We believe instructors are at the center of this new transformation. We need to provide immediate, real-time data to students and instructors on, how am I doing? How can I do better? This is the predictive component, and then you're telling me, maybe I'm not on the best path. So what's my, "How can I do better?", the optimal path. So all of that is based on data. >> Okay, so that's, I mean, the major reason. Do you do any print anymore? >> Yes, we still do print, because there's still a huge need for print. So print's not going to go away. >> Right.
Okay, I just wanted to clarify that. But what you described is largely a business model change, not largely, it is a business model change. But also the value proposition is changing. You're providing a new service, related, but new incremental value, right? >> Yeah, yeah. So the value proposition has changed, and here again, data is critical. Inquiring minds want to know. Our customers want to know, "All right, we're going to use your technology and your products and solutions, show us rigorously, empirically, that it works." That's the bottom line question. Is it effective? Are the tools, products, solutions effective? Not just ours; our products and solutions have a context. Is the instruction effective? Is it effective for everyone? So all that is reliant on data. >> So how much of a course, how much of the content in a course would you prepare? Is it now the entire courseware, and you instrument the students' interaction with it? And then, essentially, you're selling the outcomes, the improved outcomes. >> Yeah, I think that's one way to think about it. Here's another model change, so this is not so much digital versus non-digital, but we've been a closed environment. You buy a textbook from us, all the material, the assessments, is McGraw Hill Education. But now a fundamental part of our thinking as a software company is that we have to be an open company. Doesn't mean open as in free, but it's an open ecosystem, so one of the things that we believe in very much is standards. So there's a standards body in education called IMS Global. My boss, Stephen Laster, is on the board of IMS Global. So think of that as, this encompasses everything from different tools working together, interoperability tools, or interoperability standards, data standards for data exchange. So, we will always produce great content, great assessments, we have amazing platform and analytics capability; however, we don't believe all of our customers are going to want to use everything from McGraw Hill. So interoperability standards, data standards, are vital to what we're doing. >> Can you explain in some detail this learning science company? Explain how we learn. We were talking off camera about sort of the three-- >> Yeah, so this is just one example. It's well known that memory decays exponentially, meaning when you see some item of knowledge for the first time, unless something happens, it goes into short-term memory and then it evaporates. One of the challenges in education is, how can I acquire knowledge and retain knowledge? Now most of the techniques that we all use are not optimal. We cram right before an exam. We highlight things, and that creates the illusion that we'll be able to recall it. But it's an illusion. Now, cognitive science and research in cognitive science tells us that there are optimal strategies for acquiring knowledge and recalling it. So three examples of that are: effortful recall, if you have to actively recall some item of knowledge, that helps with the stickiness. Another is spaced practice, spacing out your recall over multiple sessions. Another one is interleaving. So what we do is, we just recently came out with a product last week called StudyWise. What we've done is taken those principles, written some algorithms, and applied those algorithms in a mobile product. That's going to allow learners to optimize their acquisition and recall of knowledge. >> And you're using Spark to-- >> Yeah, we're using Spark and we're using Databricks.
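Al's description maps onto a simple quantitative model. The sketch below is a generic forgetting-curve scheduler, assumed for illustration and not the actual StudyWise algorithm: recall probability decays as exp(-t/s), and each successful effortful recall multiplies the stability s, spacing reviews further apart.

```python
import math

# Generic forgetting-curve sketch (an assumption for illustration, not the
# StudyWise algorithm): recall probability decays exponentially with time,
# and each successful effortful recall boosts the memory's stability.
def recall_probability(hours_elapsed: float, stability: float) -> float:
    return math.exp(-hours_elapsed / stability)

def next_review_hours(stability: float, threshold: float = 0.7) -> float:
    # Schedule the review just before predicted recall drops below threshold.
    return -stability * math.log(threshold)

stability = 4.0  # assumed initial stability, in hours
for review in range(1, 5):
    wait = next_review_hours(stability)
    print(f"review {review}: wait {wait:.1f}h, "
          f"recall then ~{recall_probability(wait, stability):.0%}")
    stability *= 2.5  # assumed multiplier for a successful recall

# The intervals stretch out on each pass: the spaced-practice effect.
```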
So I think what's important there is not just Spark as a technology, but it's an ecosystem, it's a set of technologies. And it has to be woven together into a workflow. Everything from building the model and algorithm, and those are always first approximations. We do the best we can, in terms of how we think the algorithm should work, and then deploy that. So our data science team and learning science team builds the models, designs the models, but our IT team wants to make sure that it's part of a workflow. They don't want to have to deal with a new set of technologies, so essentially pressing the button goes into production. And then it doesn't stop there, because as StudyWise went on the market last week, now we're collecting data real-time as learners are interacting with our products. The results of their interactions are coming into our research environment, and we're analyzing that data as a way of updating our models and tuning the models. >> So would it be fair to say, it was interesting when you talked about these new ways of learning. If I were to create an analogy to legacy enterprise apps, they standardized business transactions and the workflows that went with them. It's like you're picking out the best practices in learning, codifying them into an application. And you've opened it up so other platforms can take some or all, and then you're taking live feedback from the models, but not just tuning the existing model, but actually adding learning to the model over time as you get a better sense for how effortful recall works or interleaving works. >> Yeah, I think that's exactly right. I do want to emphasize something. An aspect of what you just said is, we believe, and it's not just we believe, the research in learning science shows, that we can get the best, most significant learning gains when we place the instructor, the master teacher, at the center of learning. So, doing that, not just in isolation, but what we want to do is create a community of practitioners, master teachers. So think of the healthcare analogy. We have expert physicians, so when we have a new technique or even an old technique: what's working? What's not working? Let's look at the data. What we're also doing is instrumenting our tools so that we can surface these insights to the master practitioners or master teachers. George is trying this technique, that's working or not working, what adjustments do we need to make? So it's not just something that has to happen with the learner. Maybe we need to adjust our curriculum. I have to change my teaching practices, my assessments. >> And the incentive for the master practitioners to collaborate is because that's just their nature? >> I think it is. So let's kind of stand back. I think the current paradigm of instruction is lecture mode. I want to impart knowledge, so I'm going to give a lecture. And then assessment is timed tests. In education, the jargon for that is summative assessment, so lecture and tests. That's the dominant paradigm in education. All the research evidence says that doesn't work. (laughs) It doesn't work, but we still do it. >> For how many hundreds of years? >> Yeah. Well, it was okay if we needed to train and educate a handful of people. But now, everyone needs to be educated, and it's lifelong learning, right, so that paradigm doesn't work. And the research evidence is overwhelming that it doesn't work. We have to change our paradigm, where the new paradigm, and this is again based on research, is differentiated instruction.
Different learners are at different stages in their learning, and depending on what you need to know, I'm at a different stage. So, we need assessments. Assessments are not punitive, they're not tests. They help us determine what kind of knowledge, what kind of information, each learner needs to know. And the instructor helps with the differentiated instruction. >> It's an alignment. >> It's an alignment, yeah. Really, to take it to the next stage, the master practitioners, if they are armed with the right data, they can begin to compare. All right, this practice, this way of teaching, works well for these types of students; these are the adjustments that we need to make. >> So, bringing it down to earth with Spark, these models of how to teach, or perhaps how to differentiate the instruction, how to do differentiated assessments, these are the Spark models. >> Yeah, these are the Spark models. So let's kind of stand back and see what's different about traditional analytics or business intelligence and the new analytics enabled by Spark, and so on. First, traditional analytics: the questions that you need to be able to answer are defined beforehand. And then they're implemented in schemas in a data warehouse. In the new order of things, I have questions that I need to ask, and they just arise right now. I'm not going to anticipate all the questions that I might want to be able to ask. So, we have to enable the ability to ask new questions and be able to receive answers immediately. Second, the feedback loop: traditional analytics is a batch mode. Overnight, the data warehouse gets updated. Imagine you're flying an airplane, you're the pilot, a new weather system emerges. You can't wait a week or six months to get a report. I have to correct course. I have to re-navigate and find a new course. So, the same way, a student encounters difficulty: tell me what I need to do, what course correction do I need to apply? The data has to come in real-time. The models have to run real-time. And if it's at scale, then we have to have parallel processing, and then the updates, the round trip, data back to the instructor or the student, have to be essentially real-time or near real-time. Spark is one of the technologies that's enabling that. >> The way you got here is kind of interesting. You used to be CIO, got that big Yale brain (laughs) working for you. You're not a developer, I presume, is that right? >> No. >> How did you end up in this role? >> I think it's really a passion for education, and I think this is it at McGraw Hill. So I'm a first generation college student, I went to public school in Los Angeles. I had a lot of great breaks, I had great teachers who inspired me. So I think first, it's education, but I think we have a major, major problem that we need to solve. So if we look at... So I spent five years with the Minnesota state colleges and university system; most of the colleges, community colleges, are open access institutions. So let me just give you a quick statistic. 70% of students who enter community colleges are not prepared in math and English. So seven out of 10 students need remediation. Of the seven out of 10 students who need remediation, only 15%, not 5-0, one-five, succeed to the next level. This is a national tragedy. >> And that's at the community college level? >> That's at the community college level. We're talking about millions of students who are not making it past the first gate. And they go away thinking they've failed, they incurred debt, their life is now stuck.
So this is playing itself out, not to tens of thousands of students, but hundreds of thousands of students annually. So, we've got to solve this problem. I think it's not technology, but reshaping the paradigm of how we think about education. >> It is a national disaster, because oftentimes that's the only affordable route for folks, and they are taking on debt, thinking okay, this is a gateway. Al, we have to leave it there. Awesome segment, thanks very much for coming to the CUBE, really appreciate it. >> Thank you very much. >> All right, you're welcome. Keep it right there, my buddy, George and I will be back with our next guest. This is the CUBE, we're live from Boston. Be right back. (techno music)

Published Date : Feb 8 2017


Aaron Colcord & David Favela, FIS Global - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> Narrator: Live, from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, David Vellante and George Gilbert. >> Back to Boston, everybody, where the city is bracing for a big snowstorm. Still euphoric over the Patriots' big win. Aaron Colcord is here, he's the director of engineering at FIS Global, and he's joined by Dave Favela, who's the director of BI at FIS Global. Gentlemen, welcome to theCUBE. It's good to see you. >> Yeah, thank you. >> Thank you very much. >> Thanks so much for coming on. So Dave, set it up. FIS Global, the company that does a ton of work in financial services that nobody's ever heard of. >> Yeah, absolutely, absolutely. Yeah, we serve and touch virtually every credit union or bank in the United States, and have services that extend globally, and that ranges anywhere from back office services to technology services that we provide by way of mobile banking or online banking. And so, we're a Fortune 500 company with a reach, like I said, throughout the nation and globally. >> So, you're a services company that provides, sort of, end-to-end capabilities for somebody who wants to start a bank, or upgrade their infrastructure? >> Absolutely, yeah. So, whether you're starting a bank or whether you're an existing bank looking to offer some type of technology, whether it's back-end processing services, mobile banking, bill pay, peer-to-peer payments, so, we are considered a FinTech company, and one of the largest FinTech companies there is. >> And Aaron, your role as the director of engineering, maybe talk about that a little bit. >> My role is primarily about the mobile data analytics, about creating a product that's able to not only be able to give the basic behavior of our mobile application, but be able to actually dig deeper and create interesting analytics, insights into the data, to give our customers understanding about not only the mobile application, but be able to even, as we're building right now, a use case for being able to take action on that data. >> So, I mean, mobile obviously is sweeping the banking industry by storm, I mean, banks have always been, basically, IT companies, when you think about it, a huge component of IT, but now mobile comes in and, maybe talk a little bit about, sort of the big drivers in the business, and how, you know, mobile is fitting in. >> Absolutely. So, first of all, you see a shift that's happening with the end user: you, David, as a user of mobile banking, right? You probably have gone to the branch maybe once in the last 90 days, but have logged into mobile banking 10 times. So, we've seen anywhere from an eight to nine time shift in usage and engagement on the digital channel, and what that means is, more interactions and more touch points that the bank is getting off of the consumer behavior. And so, what we're trying to do here is turn that into getting to know the customer profile better, so that they could better serve in this digital channel, where there's a lot more interactions occurring. >> Yeah, I mean, you look at the demographic, too. I mean, my kids don't even use cheques. Right, I mean, it's all, everything's done on mobile, Venmo, or whatever, the capabilities they have. So, what's the infrastructure behind that that enables it? I mean, it can't be what it used to be. I mean, probably back-end still is, but what else do you have to create to enable that? 
Well, there's been a tremendous amount of transformation on the back-ends over the last ten years, and particularly when we talk about how that interaction has changed, from becoming a more formal experience to becoming a more intimate experience through the mobile client. But, more specifically to the back-end, we have actually implemented Apache Spark as one of our platforms, to actually help transform and move the data faster. Mobile actually creates a tremendous amount of back-end activity, sometimes even more than what we were able to see in other channels. >> Yeah, and if you think about it, if you just kind of step back a little bit, this is about core banking, right, and as you speak to IT systems, and so, if you think about all the transactions that happen daily, whether you're in branch, at an ATM, on a mobile device, it's processed through a core banking system, and so one of the challenges that, I think, this industry and FinTech is up against is that you've got all these old legacy systems that have been built that can't compute all this data at a fast enough rate, and so for us, bringing in Aaron, this is about, how do you actually leverage new technology, and take the technical data of the old systems, the data schemas and models, and marry the two to provide the key data that's being generated. >> Dave: Without shutting down the business. >> Without shutting down the business. >> Because that's the hard part. >> Can you elaborate on that, because that's non-trivial. It used to be when banks merged, it could take years for the back-office systems to come together. So now, let's say a bank comes to you, they have their, I don't want to say legacy systems, it's the systems they've built up over time, but they want the more modern capabilities. How do you marry the two? >> Would you take a first stab? >> Well, it is actually a very complicated process, because you always have to try to understand data itself, and how to put those two things together. More specifically on the mobile client, because of the way that we are able to think about how data can be transformed and transported, we came up with a very flexible mechanism to allow data to actually be interpreted on the fly, and processed, so that when you talk about two different banks, by transforming it into this type of format, we're able to kind of reinterpret it and process it. >> Would this be, could you think of this as a very, very smart stream processor, where ETL would be at the most basic layer, and then you're adding meaning to the data so that it shows up to the mobile client in a way that coheres to the user model that the user is experiencing on their device? >> I think that's a really good way of putting it, yeah. I mean, we like to think of it, I call it a semantic layer: how you, one, treat ETL as one process, and then you have a semantic layer where you basically transform the bottom bits, so to speak, into components that you can then assemble semantically, so that it starts making sense to the end user. >> And to that point, you know, to your integration question, it is very challenging, because you're trying to marry the old with the new, and we'll tease the section for tomorrow in which Aaron will talk about that, but for us, at enterprise grade, it has to be done very cautiously, right? And we're under heavy regulation and compliance and security, and so, it's not about abandoning the old, right?
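The "interpret on the fly" mechanism Aaron describes can be pictured as a per-source mapping into one canonical schema. This is a hedged sketch, with all feed, file, and column names invented for illustration, not FIS's actual format.

```python
# Sketch of a semantic layer over two banks' differing feeds: a per-source
# mapping normalizes each feed into one canonical schema before analytics.
# All file, feed, and column names here are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("semantic-layer").getOrCreate()

bank_a = spark.read.json("bank_a_events.json")  # columns: acct, amt, ts
bank_b = spark.read.json("bank_b_events.json")  # columns: account_no, amount, event_time

MAPPINGS = {
    "bank_a": {"account": "acct", "amount": "amt", "event_time": "ts"},
    "bank_b": {"account": "account_no", "amount": "amount",
               "event_time": "event_time"},
}

def normalize(df, source):
    # Rename each source's columns into the shared, canonical vocabulary.
    cols = [F.col(src).alias(dst) for dst, src in MAPPINGS[source].items()]
    return df.select(*cols).withColumn("source", F.lit(source))

canonical = normalize(bank_a, "bank_a").union(normalize(bank_b, "bank_b"))
canonical.show()
```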
It's trying to figure out, how do we take that, what's been in place and been stable, and then couple it with the new technology that we're introducing. >> Which is an interesting conversation, the old versus new, and I look at your title, Dave, and it's got 'BI' in it. I remember I interviewed Christian Chabot, who was then CEO of Tableau, and he's like, "Old, slow, BI", okay, now you guys are here talking about Spark. Spark's all about real-time and speed and memory, and everything else. Talk about the transformation in your role as this industry has transformed. >> Yeah, absolutely, so, when we think about business intelligence and creating that intelligence layer, we elected the mobile channel, right? Because we're seeing that most of the interactions happen there. So for us, an intelligent BI solution is not just, you know, a data management and analytics platform. There has to be the fulfillment. You talk a lot about actioning on your data. So for us, it's, if we can actually create, you know, an intelligence layer at the analytics level, how can we feed marketing solutions with this intelligence to have the full circle and insights back? I believe the gentlemen were talking about the RISE Lab in this morning's session. >> Dave: The follow-on to AMP, basically. >> Yeah, exactly. So, there it was all about that feedback loop, right? And so, for us, when we think about BI, the whole loop is from data management to end-to-end marketing solutions, and then back, so that we can serve the mobile customer. >> Well, so, you know, the original promise of the data warehouse was this 365, what you just described, right? And being able to effect business outcomes, and that is now the promise of so-called big data, even though people don't really like that term anymore, so, my question is, is it same wine, new bottle, or is it really transformational? Are we going to live up to that challenge this time around? As practitioners, I'd really love your input on that. >> I think I'd love to expand on that. >> Absolutely. >> Yeah, I mean, I don't think it's, I think it's a whole new bottle and a whole new wine. David here is from wine country, and the data warehouse definitely introduced important concepts, which are a tremendous foundation for us to stand on. You know, you always like to stand on the shoulders of giants. It introduced a concept, but in the case of marrying the new with the old, there's a tremendous extra third dimension, okay? So, we have a velocity dimension when we start talking about Apache Spark. We can accelerate it, make it go quick, and we can get that data. There's another aspect there when we start talking about, for example, hey, different banks have different ways that they like to talk to it, so now we're kind of talking about, there's variation in people's data, and Apache Spark, actually, is able to give that capability to process data that is different from each other, and then being able to marry it, down the pipe, together. And then the additional piece, what I think is actually making it into a new wine, is, when we start talking about data, the traditional mechanism, data warehousing, that 360 view of the customer, they were thinking more of data as in, I like to think of it as, let's count beans, right? Let's just come up with how many people were doing X, how many were doing this? >> Dave: Accurate reporting, yeah.
>> Exactly, and if you think about it, it was driving the business through the rear-view mirror, because all you had to do was base it off of the historical information, and that's how we're going to drive the business. We're going to look in the rear-view mirror, we're going to look at what's been going on, and then we're going to see what's going on. And I think the transformation here is taking technologies and being able to say, how do we put not only predictive analytics in play, but how do we actually allow the customer to take control and actually move forward? And then, as well, expand those use cases for variation, use that same technology to look for, between the data points, are there more data points that can be actually derived and moved forward on? >> George, I loved that description. You have, in one of your reports, I remember, George had this picture of this boat, and he said, "Oh, imagine trying to drive the boat", and it was looking at the wake (laughs), you know, right? Rather than looking in the rear-view mirror. >> But in addition to that, yeah, it's like driving through the rear-view mirror, but you also said something interesting about, sort of, I guess the words I used to use were anticipating and influencing the customer. >> Aaron: Exactly. >> Can you talk about how much of that is done offline, like scoring profiles, and how much of that is done in real-time with the customer? >> Go ahead. >> Well, a lot of it is still being done offline, mostly because, you know, in trying to serve a bank, you have to also be able to serve their immediate needs. So, really, we're evolving to actually build that use case around the real-time. We actually do have the technology already in place. We built the POCs, we built the technology inside, we're able to move real-time, and we're ready to go there. >> So, what will be the difference? Me as a consumer, how will that change my experience? >> I think that would probably be best for you. >> Yeah, well, just got to step back a little bit, too, because, you know, what we're representing here is the digital channel, mobile analytics, right? But there's other areas within FIS Global that handle real-time payments with real-time analytics, such as a credit card division, right? So, both are happening sort of in parallel right now. For us, from our perspective on the mobile and digital front, the experience and how that's going to change is that, if you were a bank, and as a bank or a credit union you're receiving this behavioral data from our product, you want to be able to offer up better services that meet your consumer profile, right? And so, from our standpoint, we're working with other teams within FIS Global via Spark and Cloud, to essentially get that holistic profile to offer up those services that are more targeted, that are, I think, more meaningful to the consumer when they're in the mobile banking application. >> So, does FIS provide that sort of data service, that behavioral service, sort of as a turnkey service, or as a service, or is that something that you sort of teach the bank or the credit union how to fish? >> That's a really good question. We stated our mission statement as helping these institutions, creating a culture of being data-driven, right? So, give them a taste of data in a way that, you know, democratizes data, if you will, as we talked about this morning. >> Dave: Yeah, that's right. >> That concept's really important to us, because with that comes, give FIS more data, right?
Send them more data, or have them teach us how to manage all this data, to have a data science experience, where we can go in and play with the data to create our own sub-targeting, because our belief is that, you know, our clients know their customers the best, so we're here to serve them with tools to do that. >> So, I want to come back to the role of Spark. I mean, Hadoop was profound, right? I mean, ship five megabytes of code to a petabyte of data, no doubt about it. But at the same time, it was a heavy lift. It still is a heavy lift. So talk about the role of Spark in terms of catalyzing that vision that we've been talking about. >> Oh, definitely. So, Apache Spark, when we talk in terms of big data, big data got started with Hadoop, and MapReduce was definitely an interesting concept, but Apache Spark really lifted and accelerated the entire vision of big data. When you look at, for example, MapReduce, you need to go get a team of trained engineers, who are typically going to work in a lower-level language like Java, and they no longer focus in on what the business objectives are. They're focusing on the programming objectives, the requirements. With Spark, because it takes a more high-level abstraction of how we process data, it means that you're focusing more on, what's the actual business case? How are we actually abstracting the data? How are we moving data? But then it also gives you that same capability to go inside the actual APIs, get a little bit lower, to modify it for your specific needs. So, I think the true transformation with Apache Spark is basically allowing us, now, like for example, in the presentation this morning, there's a lot of people who are using Scala. We use Scala ourselves. There's now a lot of people who are using Python, and everybody's using SQL. How does SQL, something that has survived so robustly for almost 30, 40 years, still keep on coming back like a boomerang on us? And it's because a language composed of four simple keywords is just so easy to use, and so descriptive and declarative, that it allows us to actually just concentrate on the business. And I think that's actually the acceleration that Apache Spark brings to the business, being able to just focus in on what you're actually trying to do, and focus in on your objectives. That same team of engineers that you're using for MapReduce now becomes extremely more productive. I mean, when I look at the number of lines of code that we had to write to figure out machine learning in Hadoop, compared to the amount of lines that you have to write in Apache Spark, it's tremendous; it's like, five lines in Apache Spark, 30 in MapReduce, and the system just responds and gives it to you a hundred times faster. >> Why Spark, too? I mean, Spark, when we saw it two years ago, to your point of this tidal wave of data, we saw more mobile phone adoption, we saw those people that were on mobile banking using it more, logging in more, and then we're seeing the proliferation of devices, right, in IoT, so for us, these are all these interaction and data points, a tsunami that's coming our way, so that's when we strategically elected to go Spark, so we could handle the volume and compute storage- >> And Aaron, what you just described is, all the attention used to be on just making it work, and now it's putting it to work, is really- >> Aaron: Right, exactly. >> You're seeing that in your businesses.
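Aaron's five-lines claim holds up for something like clustering: with MLlib the whole job is essentially a read, a vectorize, and a fit. A minimal sketch, with the file and column names assumed for illustration:

```python
# The brevity Aaron is pointing at: clustering in roughly five working lines.
# The file name and columns (lat, lon) are assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("transactions.csv", header=True, inferSchema=True)
vec = VectorAssembler(inputCols=["lat", "lon"], outputCol="features")
model = KMeans(k=5).fit(vec.transform(df))
print(model.clusterCenters())
```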
>> Quick question. Do you see, now that you have this, sort of, lower and lower latency analytics and the ability to access more of what previously were data silos, do you see services that are possible that banks couldn't have thought of before? Beyond just making different products recommended at the appropriate moment, are there new things that banks can offer? >> It's interesting. On one hand, you free up their time from an analysis standpoint, to where they could actually start to get out of the weeds to think about new products and services, so, from that component, yes. From the standpoint of seeing pattern recognition in the data, and seeing what it can do aside from target marketing, our products are actually often used by our product owners internally to understand what the consumers are doing on the device, so that they could actually come up with better services to ultimately serve them, aside from marketing solutions. >> Notwithstanding your political affiliations, we won't go there, but there's certainly a mood of, and a trend toward, deregulation; that's presumably good news for the financial services industry. Can you comment on that, or, what's the narrative going on in your customer base? Are they excited about fewer regulations, or is that just all political nonsense? Any thoughts? >> Yeah (laughs), you know, on one hand, why people come to FIS is because we do adhere to a compliance and regulation standpoint, right? >> Dave: Complexity is your friend, then (laughs). >> Absolutely, right, so they can trust us in that regard, right? And so, from our vantage point, will it go away entirely? No, absolutely not, right. I think Cloud introduces a whole new layer of complexity, because how do you handle Cloud computing and NPI, and PII data in the Cloud? And our customers look to us to make sure that, first and foremost, security for the end consumer is in place. And so I think it's an interesting question, and one where you are seeing end users click through without even viewing agreements or whatnot, they just want to get to product, right? So, you know, will it go away, or do we see it going away? No, but ...
I mean, they still, our customers still go to banks and credit unions because they trust them with their data, if you will, or their online currency, in some regards. So, you know, that's not going to change. >> Right, yeah. Well, Aaron, Dave, thanks very much for coming to theCUBE, it was great to have you. >> Thanks so much for talking with us. >> Absolutely, good luck with everything. >> Alright, keep it right there, buddy. We'll be back with our next guest. This is theCUBE. We're live from Boston, Spark Summit East, #SparkSummit. Be right back.

Published Date : Feb 8 2017


Seth Dobrin, IBM Analytics - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> Narrator: Live from Boston, Massachusetts, this is theCUBE! Covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody, Seth Dobrin is here, he's the vice president and chief data officer of the IBM Analytics Organization. Great to see you, Seth, thanks for coming on. >> Great to be back, thanks for having me again. >> You're welcome, so chief data officer is the hot title. It was predicted to be the hot title, and now it really is. Many more of you around the world, and IBM's got an interesting sort of structure of chief data officers, can you explain that? >> Yeah, so there's a global chief data officer, that's Inderpal Bhandari, and he's been on this podcast or videocast a few times. Then he's set up structures within each of the business units in IBM, where each of the major business units has a chief data officer, also. And so I'm the chief data officer for the analytics business unit. >> So one of Inderpal's things when I've interviewed him is culture. The data culture, you've got to drive that in. And he talks about the five things that chief data officers really need to do to be successful. Maybe you could give us your perspective on how that flows down through the organization, and what are the key critical success factors for you, and how are you implementing them? >> I agree, there's five key things, and maybe I frame them a little differently than Inderpal does. There's this whole cloud migration, so every chief data officer needs to understand what their cloud migration strategy is. Every chief data officer needs to have a good understanding of what their data science strategy is. So how are they going to build deployable data science assets? So not data science assets that are delivered through spreadsheets. Every chief data officer needs to understand what their approach to unified governance is. So how do I govern all of my platforms in a way that enables that last point about data science? And then there's a piece around people. How do I build a pipeline for today and the future? >> So the people piece is both the skills, and it's presumably a relationship with the line of business, as well. There's sort of two vectors there, right? >> Yeah, the people piece, when I think of it, is really about skills. There's a whole cultural component that goes across all of those five pieces that I laid out. Finding the right people, with the right skillset, where you need them, is hard. >> Can you talk about cloud migration, why that's so critical and so hard? >> If you look at kind of where the industry's been, the IT industry, it's been this race to the public cloud. I think it's a little misguided, all along. If you look at how business is run, right? Today, enterprises that are not internet-born make their money from what's running their businesses today. So these business-critical assets. And just thinking that you can pick those up and move them to the cloud and take advantage of cloud is not realistic. So the race, really, is to a hybrid cloud. Our futures really lie in, how do I connect these business-critical assets to the cloud? And how do I migrate those things to the cloud? >> So Seth, the CIO might say to you, "Okay, let's go there for a minute, I kind of agree with what you're saying, I can't just shift everything into the cloud. But what can I do in a hybrid cloud that I can't do in a public cloud?" >> Well, there's some drivers for that.
I think one driver for hybrid cloud is what I just said. You can't just pick everything up and move it overnight, it's a journey. And it's not a six-month journey, it's probably not a year journey, it's probably a multi-year journey. >> Dave: So you can actually keep running your business? >> So you can actually keep running your business. And then the other piece is, there's new regulations that are coming up. And these regulations, the EU GDPR is the biggest example of them right now. There are very stiff fines for violations of those policies. And the party that's responsible for paying those fines is the party who the consumer engaged with. It's you, it's whoever owns the business. And as a business leader, I don't know that I would very willingly give up, trust a third party to manage that, just any third party to manage that for me. And so there's certain types of data that some enterprises may never want to move to the cloud, because they're not going to trust a third party to manage that risk for them. >> So it's more transparent from a government standpoint. It's not opaque. >> Seth: Yup. >> You feel like you're in control? >> Yeah, you feel like you're in control, and if something goes wrong, it's my fault. It's not something that I got penalized for because someone else did something wrong. >> So at the data layer, help us sort of abstract one layer up to the applications. How would you partition the applications? The ones that are managing that critical data that has to stay on premises. What would you build up, potentially, to complement it in the public cloud? >> I don't think you need to partition applications. The way you build modern applications today, it's all API-driven. You can reduce some of the costs of latency through design. So you don't really need to partition the applications, per se. >> I'm thinking more along the lines that the systems of record are not going to be torn out, and those are probably the last ones, if ever, to go to the public cloud. But other applications leverage them. If that's not the right way of looking at it, where do you add value in the public cloud versus what stays on premise? >> So some of the system of record data, there's no reason you can't replicate some of it to the cloud. So if it's not this personal information, or highly regulated information, there's no reason that you can't replicate some of that to the cloud. And I think we get caught up in, we can't replicate data, we can't replicate data. I don't think that's the right answer. I think the right answer is to replicate the data if you need to, or if the data in the system of record is not in the right structure for what I need to do, then let's put the data in the right structure. Let's not have the conversation about how I can't replicate data. Let's have the conversation about where's the right place for the data, where does it make most sense, and what's the right structure for it? And if that means you've got 10 copies of a certain type of data, then you've got 10 copies of a certain type of data. >> Would you be, on that data, would it typically be other parts of the systems of record that you might have in the public cloud, or would they be new apps, sort of greenfield apps? >> Seth: Yes. >> George: Okay. >> Seth: I think both. And that's part of, I think, in my mind, that's kind of how you build. That question you just asked right there is one of the things that guides how you build your cloud migration strategy.
So we said you can't just pick everything up and move it. So how do you prioritize? You look at what you need to build to run your business differently. And you start there, and you start thinking about, how do I migrate information to support those to the cloud? And maybe you start by building a local private cloud, so that everything's close together until you kind of master it. And then once you get enough critical mass of data and applications around it, then you start moving stuff to the cloud. >> We talked earlier off camera about reframing governance, Seth. I used to head a CIO consultancy, and we worked with a number of CIOs that were within legal IT, for example, and were worried about compliance and governance and things of that nature. And their ROI was always scare the board. But the holy grail was, can we turn governance into something of value? For the organization? Can we? >> I think in the world we live in today, with ever-increasing regulations, and with a need to be agile, and with everyone needing to and wanting to apply data science at scale, you need to reframe governance, right? Governance needs to be reframed from something that is seen as a roadblock to something that is truly an enabler. And not just giving it lip service. And what do I mean by that? For governance to be an enabler, you really got to think about, how do I, up front, classify my data, so that all data in my organization is bucketed into some version of public, proprietary, and confidential. Different enterprises may have 30 scales, and some may only have two. Or some may have one. And so you do that up front, and so you know what can be done with data, when it can be done, and who it can be done with. You need to capture intent. So what are allowed intended uses of data? And as a data scientist, what am I intending to do with this data? So that you can then mesh those two things together. 'Cause that's important in these new regulations I talked about: people give you access to data, their personal data, for an intended purpose. And then you need to be able to apply these governance policies actively. So it's not passive, after the fact, or you've got to stop and you've got to wait; it's leveraging services. Leveraging APIs. And building a composable system of policies that are delivered through APIs. So if I want to create a sandbox to run some analytics on, I'm going to call an API to get that data. That API is going to call a policy API that's going to say, "Okay, does Seth have permission to see this data? Can Seth use this data for this intended purpose?" If yes, the sandbox is created. If not, there's a conversation about really why does Seth need access to this data. It's really moving governance to actively enable me to do things. And it changes the conversation from, hey, it's your data, can I have it? To, there's really solid reasons as to why I can and can't have data. >> And then some potential automation around a sandbox that creates value. >> Seth: Absolutely. >> But it's still, the example you gave, public, proprietary, or confidential, is still very governance-like, where I was hoping you were going with the data classification, and I think you referenced this. Can I extend that, that schema, that nomenclature, to include other attributes of value? And can I do it, automate it, at the point of creation or use, and scale it? >> Absolutely, that is exactly what I mean. I just used those three 'cause they were the three that are easy to understand.
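Seth's active-governance flow, classification captured up front plus a policy API gating sandbox creation, might look like the hypothetical sketch below. Every name in it (the catalog entries, the intents, the functions) is invented for illustration and is not IBM's implementation.

```python
# Hypothetical sketch of active governance: data is classified up front,
# intended uses are registered, and sandbox creation calls a policy check.
# All names here are invented for illustration, not IBM's implementation.
PUBLIC, PROPRIETARY, CONFIDENTIAL = "public", "proprietary", "confidential"

CATALOG = {  # classification and allowed intents, captured at ingestion
    "weather_obs":  {"class": PUBLIC,       "intents": {"any"}},
    "churn_scores": {"class": PROPRIETARY,  "intents": {"marketing", "retention"}},
    "customer_pii": {"class": CONFIDENTIAL, "intents": {"fraud_review"}},
}

def policy_check(user: str, dataset: str, intent: str) -> bool:
    # A real system would also verify the user's entitlements here.
    entry = CATALOG[dataset]
    return "any" in entry["intents"] or intent in entry["intents"]

def create_sandbox(user: str, dataset: str, intent: str) -> str:
    # Active governance: the check runs at request time, not after the fact.
    if not policy_check(user, dataset, intent):
        raise PermissionError(f"{user} may not use {dataset} for '{intent}'")
    return f"sandbox://{user}/{dataset}"  # provisioning itself is stubbed

print(create_sandbox("seth", "churn_scores", "retention"))   # allowed
# create_sandbox("seth", "customer_pii", "marketing")  # -> PermissionError
```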
>> So I can give you as a business owner some areas that I would like to see, a classification schema, and then you could automate that for me at scale? In theory? >> In theory, that's where we're hoping to go. To be able to automate. And it's going to be different based on what industry vertical you're in, what risk profile your business is willing to take. So that classification scheme is going to look very different for a bank than it will for a pharmaceutical company. Or for a research organization. >> Dave: Well, if I can then defensibly delete data, that's of real value to an organization. >> With new regulations, you need to be able to delete data. And you need to be able to know where all of your data is, so that you can delete it. Today, most organizations don't know where all their data is. >> And that problem is solved with math and data science, or? >> I think that problem is solved with a combination of governance. >> Dave: Sure. >> And technology. Right? >> Yeah, technology kind of got us into this problem. We'll say technology can get us out. >> On the technology subject, it seems like, with the explosion of data, not just volume, but also many copies of the truth, you would need some sort of curation and catalog system that goes beyond what you had in a data warehouse. How do you address that challenge? >> Seth: Yeah, and that gets into what I said when you guys asked me about CDOs, what do they care about? One of the things is unified governance. And so part of unified governance, the first piece of unified governance, is having a catalog of your data. That is, all of your data. And it's a single catalog for your data, whether it's one of your business-critical systems that's running your business today, whether it's a public cloud, or it's a private cloud, or some combination of both. You need to know where all your data is. You also need to have a policy catalog that's single for both of those. Catalogs like this fall apart by entropy, and the more you have, the more likely they are to fall apart. And so if you have one, and you have a lot of automation around it to do a lot of these things, so you have automation that allows you to go through your data and discover what data is where, and keep track of lineage in an automated fashion, keep track of provenance in an automated fashion, then we start getting into a system of truly unified governance that's active, like I said before. >> There's a lot of talk about digital transformations. Of course, digital equals data. If it ain't data, it ain't digital. So one of the things, in the early days of the whole big data theme, you'd hear people say, "You have to figure out how to monetize the data." And that seems to have changed and morphed into, you have to understand how your organization gets value from data. If you're a for-profit company, it's monetizing something, and feeding how data contributes to that monetization; if you're a health care organization, maybe it's different. I wonder if you could talk about that, in terms of the importance of understanding how an organization makes money, to the CDO specifically. >> I think you bring up a good point. Monetization of data and analytics is often interpreted differently. If you're a CFO you're going to say, "You're going to create new value for me, I'm going to start getting new revenue streams." And that may or may not be what you mean. >> Dave: Sell the data, it's not always so easy. >> It's not always so easy, and it's hard to demonstrate value for data.
>> There's a lot of talk about digital transformations. Of course, digital equals data; if it ain't data, it ain't digital. So in the early days of the whole big data theme, you'd hear people say, "You have to figure out how to monetize the data." And that seems to have changed and morphed into, you have to understand how your organization gets value from data. If you're a for-profit company, it's monetizing something, and seeing how data contributes to that monetization; if you're a health care organization, maybe it's different. I wonder if you could talk about that in terms of the importance, to the CDO specifically, of understanding how an organization makes money. >> I think you bring up a good point. Monetization of data and analytics is often interpreted differently. If you're a CFO, you're going to say, "You're going to create new value for me, I'm going to start getting new revenue streams." And that may or may not be what you mean. >> Dave: Sell the data; it's not always so easy. >> It's not always so easy, and it's hard to demonstrate value for data, to sell it. There are certain types; IBM owns The Weather Company, and clearly people want to buy weather data, it's important. But if you're talking about how you transform a business unit, it's not necessarily about creating new revenue streams; it's how do I leverage data and analytics to run my business differently, and maybe even, what are new business models that I could never do before I had data and data science? >> Would it be fair to say that, as Dave was saying, there's the data side and people were talking about monetizing that, but when you talk about analytics increasingly, machine learning specifically, it's a fusion of the data and the model, and a feedback loop. Is that something that becomes a critical asset? >> I would actually say that you really can't generate a tremendous amount of value from just data. You need to apply something like machine learning to it, and machine learning has no value without good data. You need to be able to apply machine learning at scale. You need to build the deployable data science assets that run your business differently. So for example, I could run a report that shows me how my business did last quarter, how my sales team did last quarter, or how my marketing team did last quarter. That's not really creating value; that's giving me a retrospective look at how I did. Where you can create value is, how do I run my marketing team differently? So what data do I have, and what types of learning can I get from that data that will tell my marketing team what they should be doing? >> George: And the ongoing process. >> And the ongoing process. And in actually doing this discovery, cataloging your data and understanding it, you find data quality issues. And data quality issues are not necessarily an issue with the data itself or the people; they're usually process issues. And by discovering those data quality issues you may discover processes that need to be changed, and in changing those processes you can create efficiencies. >> So it sounds like you guys have got a pretty good framework. Having talked to Inderpal a couple of times, what you're saying makes sense. Do you have nightmares about IoT? (laughing) >> Do I have nightmares about IoT? I don't think I have nightmares about IoT. IoT is really just a series of connected devices, is really what it is. In my talk tomorrow, I'm going to talk about hybrid cloud, and a connected car is actually one of the things I'm going to talk about. And really, a connected car is just a bunch of connected devices on a private cloud that's on wheels. I'm less concerned about IoT than I am about people manually changing data. With IoT you get data, you can track it, and if something goes wrong, you know what happened. So I would say no, I don't have nightmares about IoT. If you do security wrong, that's a whole other conversation. >> But it sounds like you're doing security right, and it sounds like you've got a good handle on governance. Obviously scale is a key part of that; it could break the whole thing if you can't scale. And you're comfortable with the state of technology being able to support that? At least with IBM. >> I think, at least with IBM, I think I am. Like I said, a connected car is basically a bunch of IoT devices and a private cloud. How do we connect that private cloud to other private clouds or to a public cloud? There's tons of technologies out there to do that. Spark, Kafka. Those two things together allow you to do things that we could never do before.
>> Can you elaborate? Like in a connected car environment, or some other scenario where, other people have called it a data center on wheels; think of it as a private cloud, that's a wonderful analogy. How do Spark and Kafka on that very, very smart device cooperate with something on the edge, like the cities and buildings, versus in the cloud? >> If you're a connected car and you're this private cloud on wheels, you can't drive the car just on that information. You can't drive it just on the LIDAR and knowing how well the wheels are in contact; you need weather information, you need information about other cars around you, you need information about pedestrians, you need information about traffic. All of this information you get from that connection, and the way you do that is leveraging Spark and Kafka. Kafka's a messaging system; you could leverage Kafka to send the car messages, or send pedestrian messages: "This car is coming, you shouldn't cross." Or vice versa, get a car to stop because there's a pedestrian in the way before even the systems on the car can see it. So you can get that kind of messaging system in near real time. If I'm the pedestrian and I'm 300 feet away, the half a second that it would take for that to go through isn't that big of a deal, because you'll be stopped before you get there. >> What about, again, the intelligence between not just the data, but the advanced analytics, where some of that would live in the car and some in the cloud? Is it just that you're making real-time decisions in the car and you're retraining the models in the cloud, or how does that work? >> No, I think some of those decisions would be done through Spark, in transit. And one of the nice things about Spark is, we can do machine learning transformations on data. Think ETL, but think ETL where you can apply machine learning as part of that ETL. So I'm transferring all this weather data and positioning data, and I'm applying a machine learning algorithm for a given purpose in that car. So the purpose is navigation, or making sure I'm not running into a building. So that's happening in real time as it's streaming to the car. >> That's the prediction aspect that's happening in real time. >> Seth: Yes. >> But at the same time, you want to be learning from all the cars in your fleet. >> That would happen up in the cloud. I don't think that needs to happen on the edge. Maybe it does, but I don't think it needs to. And today, while I said a car is a data center, a private cloud on wheels, there's a cost to the computation you can have on that car. And I don't think the cost is quite low enough yet where it makes sense to do all that computation on the edge. So some of it you would want to do in the cloud. Plus you would want to have all the information from as many cars in the area as possible.
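[Editor's note: a rough sketch of the pattern Seth describes, machine learning applied as part of streaming ETL on data in transit, as it might look in PySpark with Structured Streaming. The broker address, topic name, schema, model path, and the prediction column are all assumptions invented for illustration; the model is assumed to be a previously trained Spark ML PipelineModel. This illustrates the technique, not IBM's or any vendor's actual pipeline.]

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, DoubleType
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("connected-car-sketch").getOrCreate()

# Telemetry arriving from the car and its surroundings (schema is invented).
schema = (StructType()
          .add("speed", DoubleType())
          .add("lat", DoubleType())
          .add("lon", DoubleType())
          .add("rain_mm", DoubleType()))

telemetry = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
             .option("subscribe", "car-telemetry")              # assumed topic
             .load()
             .select(from_json(col("value").cast("string"), schema).alias("t"))
             .select("t.*"))

# "ETL with machine learning in it": score each micro-batch with a model
# trained offline in the cloud (the path and model are hypothetical).
model = PipelineModel.load("/models/navigation")
scored = model.transform(telemetry)

# For the sketch, just surface the decisions; a real system would publish
# them back onto a message bus for the car (or pedestrian app) to act on.
query = (scored.select("speed", "prediction")
         .writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```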
>> Dave: We're out of time, but some closing thoughts. They say, may you live in interesting times. Well, you can sum up some of the changes that are going on in the business: Dell buys EMC, IBM buys The Weather Company, and that gave you a huge injection of data scientists, which, talk about data culture. Just last thoughts on that, in terms of the acquisition and how that's affected your role. >> I've only been at IBM since November, so all that happened before my role. >> Dave: So you inherited it? >> So from my perspective it's a great thing. Before I got there, the culture was starting to change. Like we talked about before we went on air, the hardest part about any kind of data science transformation is the cultural aspect. >> Seth, thanks very much for coming back on theCUBE. Good to have you. >> Yeah, thanks for having me again. >> You're welcome. All right, keep it right there everybody, we'll be back with our next guest. This is theCUBE, we're live from Spark Summit in Boston. Right back. (soft rock music)

Published Date : Feb 8 2017


Nick Pentreath, IBM STC - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> Narrator: Live from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody. Nick Pentreath is here; he's a principal engineer at the IBM Spark Technology Center, based in South Africa. Welcome to theCUBE. >> Thank you. >> Great to see you. >> Great to see you. >> So let's see, it's a different time of year here than you're used to. >> I've flown from, I don't know the Fahrenheit equivalent, but 30 degrees Celsius heat and sunshine, to snow and sleet, so. >> Yeah, yeah. So it's a lot chillier here. Wait until tomorrow. But, so we were joking, you probably get the T-shirt for the longest flight here, so welcome. >> Yeah, I actually need the parka, or like a beanie. (all laugh) >> A little better. Long sleeve. So Nick, tell us about the Spark Technology Center, STC is its acronym, and your role there. >> Sure, yeah, thank you. So the Spark Technology Center was formed by IBM a little over a year ago, and its mission is to focus on the Open Source world, particularly Apache Spark and the ecosystem around that, and to really drive forward the community and to make contributions to both the core project and the ecosystem. The overarching goal is to help drive adoption, particularly with enterprise customers, the kind of customers that IBM typically serves, and to harden Spark and make it really enterprise ready. >> So why Spark? I mean, we've watched IBM do this now for several years. The famous example that I like to use is Linux. When IBM put $1 billion into Linux, it really went all in on Open Source, and it drove a lot of IBM value, both internally and externally for customers. So what was it about Spark? I mean, you could have made a similar bet on Hadoop. You decided not to; you sort of waited to see that market evolve. What was the catalyst for having you guys go all in on Spark? >> Yeah, good question. I don't know all the details, certainly, of what the internal drivers were, because I joined STC a little under a year ago, so I'm fairly new. >> Translate the hallway talk, maybe. (Nick laughs) >> Essentially, I think you raise very good parallels to Linux and also Java. So IBM made these investments in Open Source technologies that proved to be transformational and kind of game-changing. And I think, you know, most people will probably admit within IBM that they maybe missed the boat, actually, on Hadoop, and saw Spark as the successor, and actually saw a chance to really dive into that and kind of almost leapfrog and say, "We're going to back this as the next generation analytics platform and operating system for analytics and big data in the enterprise." >> Well, I don't know if you happened to watch the Super Bowl, but there's a saying that it's sometimes better to be lucky than good. (Nick laughs) And that sort of applies, and so, in some respects, maybe missing the window on Hadoop was not a bad thing for IBM >> Yeah, exactly >> because not a lot of people made a ton of dough on Hadoop, and they're still sort of struggling to figure it out. And now along comes Spark, and you've got this more real-time nature. IBM talks a lot about bringing analytics and transactions together. They've made some announcements about that and about affecting business outcomes in near real time. I mean, that's really what it's all about, and one of your areas of expertise is machine learning.
And so, talk about that relationship and what it means for organizations, your mission. >> Yeah, machine learning is a key part of the mission. And you've seen the kind of big data in the enterprise story, starting with Hadoop and data lakes. And that's evolved: before, we just dumped all of this data into these data lakes and these silos, and maybe we had some Hadoop jobs and so on. But now we've got all this data we can store; what are we actually going to do with it? So part of that is the traditional data warehousing and business intelligence and analytics, but more and more, we're seeing there's rich value in this data, and to unlock it, you really need intelligent systems. You need machine learning, you need AI, you need real-time decision making that starts transcending the boundaries of all the rule-based systems and human-based systems. So we see machine learning as one of the key tools and one of the key unlockers of value in these enterprise data stores. >> So Nick, perhaps paint us a picture of someone who's advanced enough to be working with machine learning with IBM, and we know that the tool chain's kind of immature. Although IBM, with Data Works or Data First, has a fairly broad end-to-end sort of suite of tools, what are the early use cases? And what needs to mature to go into higher-volume production apps or higher-value production apps? >> I think the early use cases for machine learning in general, and certainly at scale, are numerous and they're growing, but classic examples are, let's say, recommendation engines. That's an area that's close to my heart. In my previous life before IBM, I built a startup that had a recommendation engine service targeting online stores and commerce players and social networks and so on. So this is a great kind of example use case. We've got all this data about, let's say, customer behavior in your retail store or your video-sharing site, and in order to serve those customers better and make more money, if you can make good recommendations about what they should buy, what they should watch, or what they should listen to, that's a classic use case for machine learning and for unlocking the data that is there. So that is one of the drivers of some of these systems; players like Amazon are sort of good examples of the recommendation use case. Another is fraud detection, and that is a classic example in financial services, in the enterprise, which is a kind of staple of IBM's customer base. So these are a couple of examples of the use cases, but the tool sets, traditionally, have been kind of cumbersome. So Amazon built everything from scratch themselves using customized systems, and they've got teams and teams of people. Nowadays, you've got this built into Apache Spark; you've got Spark ML, a machine learning library, and you've got good models to do that kind of thing. So I think from an algorithmic perspective, there's been a lot of advancement, and there's a lot of standardization and almost commoditization of the model side.
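[Editor's note: since Nick points to recommendation engines as the classic Spark use case, here is a minimal sketch of one using Spark ML's ALS (alternating least squares), assuming a recent PySpark. The toy interaction data and column names are invented for illustration.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("recs-sketch").getOrCreate()

# Invented implicit-feedback data: (user, item, interaction strength),
# e.g. views or purchases on a retail or video-sharing site.
events = spark.createDataFrame(
    [(0, 10, 3.0), (0, 11, 1.0), (1, 10, 5.0),
     (1, 12, 2.0), (2, 12, 4.0), (2, 11, 1.0)],
    ["userId", "itemId", "strength"])

als = ALS(userCol="userId", itemCol="itemId", ratingCol="strength",
          rank=8, maxIter=5, implicitPrefs=True, seed=42)
model = als.fit(events)

# The classic serving query: top-3 items per user.
model.recommendForAllUsers(3).show(truncate=False)
```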
So what is missing? >> George: Yeah, what else? >> And what are the shortfalls currently? So there's a big difference between the current view, I guess the hype, of machine learning, which is: you've got data, you apply some machine learning, and then you get profit, right? But really, there's a hugely complex workflow that involves this end-to-end story. You've got data coming from various data sources; you have to feed it into one centralized system, transform and process it, extract your features, and do your sort of hardcore data science, which is the core piece that everyone sort of thinks about as the only piece, but that's kind of in the middle, and it makes up a relatively small proportion of the overall chain. And once you've got that, you do model training and selection and testing, and you now have to take that model, that machine-learning algorithm, and deploy it into a real system to make real decisions. And that's not even the end of it, because once you've got that, you need to close the loop, what we call the feedback loop, and you need to monitor the performance of that model in the real world. You need to make sure that it's not deteriorating, that it's adding business value, all of these kinds of things. So I think the real piece of the puzzle that's missing at the moment is delivering this end-to-end story, and doing it at scale, securely, enterprise-grade. >> And the business impact of that presumably will be a better-quality experience. I mean, recommendation engines and fraud detection have been around for a while; they're just not that good. Retargeting systems are too little, too late, and fraud detection is kind of cumbersome. Still a lot of false positives. It's getting much better, certainly compressing the time. It used to be six months, >> Yes, yes. >> Now it's minutes or seconds, but there are a lot of false positives still. So are you suggesting that by closing that gap, we'll start to see, from a consumer standpoint, much better experiences? >> Well, I think that's imperative, because if you don't see that from a consumer standpoint, then the mission is failing. Because ultimately, it's not magic that you simply throw machine learning at something and you unlock business value and everyone's happy. There's a human in the loop there. You have to fulfill the customer's need, you have to fulfill consumer needs, and the better you do that, the more successful your business is. You mentioned the time scale, and I think that's a key piece here. >> Yeah. >> What makes better decisions? What makes a machine-learning system better? Well, it's better data and more data, and faster decisions. So I think all of those three are coming into play with Apache Spark, with end-to-end streaming systems, and the models are getting better and better because they're getting more data and better data. >> So I think the industry has pretty much attacked the time problem, certainly for fraud detection and recommendation systems. The quality issue, are we close? I mean, are we talking about 6-12 months before we really sort of start to see a major impact to the consumer and, ultimately, to the company who's providing those services? >> Nick: Well, >> Or is it further away than that, you think? >> You know, it's always difficult to make predictions about timeframes, but I think there's a long way to go from, yeah, as you mentioned, where we are; the algorithms and the models are quite commoditized, and the time gap to make predictions is kind of down to this real-time nature. >> Yeah. >> So what is missing? I think it's actually less about the traditional machine-learning algorithms and more about making the systems better and getting better feedback, better monitoring, so improving the end user's experience of these systems. >> Yeah.
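[Editor's note: the end-to-end workflow Nick describes, ingest, feature extraction, training, selection, and a deployable asset, maps fairly directly onto Spark ML's Pipeline API. A minimal, hedged sketch follows; the toy data, column names, and save path are invented, and a real workflow would add the monitoring and feedback loop he emphasizes.]

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("e2e-sketch").getOrCreate()

# Invented training data: plan type, spend, support calls -> churn label.
df = spark.createDataFrame(
    [("basic", 12.0, 3.0, 0.0), ("premium", 80.0, 1.0, 0.0),
     ("basic", 5.0, 9.0, 1.0), ("premium", 60.0, 7.0, 1.0)],
    ["plan", "spend", "support_calls", "label"])

# Feature extraction and model training chained into one deployable unit.
pipeline = Pipeline(stages=[
    StringIndexer(inputCol="plan", outputCol="plan_idx"),
    VectorAssembler(inputCols=["plan_idx", "spend", "support_calls"],
                    outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])

model = pipeline.fit(df)
# Evaluating on the training data only because this toy set is tiny; a real
# workflow would hold out a test split and monitor the model over time.
auc = BinaryClassificationEvaluator().evaluate(model.transform(df))
print(f"AUC: {auc:.3f}")

# The fitted pipeline is the "deployable data science asset".
model.write().overwrite().save("/tmp/models/churn-sketch")
```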
>> And that's actually, I think, where there's a lot of work to be done. I don't think it's a 6-12 month thing, necessarily. I don't think that in 12 months, certainly, everything's going to be perfectly recommended. There are areas of active research in the academic fields on how to improve these things, but I think there's a big engineering challenge to bring in more disparate data sources, to improve data quality, to improve these feedback loops, to try and get systems that are serving customer needs better. So improving recommendations, improving the quality of fraud detection systems, everything from that to medical imaging and cancer detection. I think we've got a long way to go. >> Would it be fair to say that we've done a pretty good job with the traditional application lifecycle in terms of DevOps, but we now need the DevOps for the data scientists and their collaborators? >> Nick: Yeah, I think that's >> And where is IBM along that? >> Yeah, that's a good question, and I think you kind of hit the nail on the head: the enterprise applied machine learning problem has moved from the kind of academic to the software engineering and, actually, DevOps. Internally, someone mentioned the word "TrainOps," so it's almost like, you know, the machine learning workflow, and actually professionalizing and operationalizing that. So recently IBM, for one, has announced Watson Data Platform and now Watson Machine Learning, and that really tries to address that problem. So really, the aim is to simplify and productionize these end-to-end machine-learning workflows. So that is the product push that IBM has at the moment. >> George: Okay, that's helpful. >> Yeah, and right, I was at the Watson Data Platform announcement, what you called Data Works; I think they changed the branding. >> Nick: Yeah. >> It looked like there were numerous components that IBM had in its portfolio that are now strung together to create that end-to-end system that you're describing. Is that a fair characterization, or is it underplaying, I'm sure, the work that went into it? Help us maybe understand that better. >> Yeah, I should caveat it by saying we're fairly focused, very focused, at STC on the Open Source side of things. So my work is predominantly within the Apache Spark project, and I'm less involved on the product side. >> Dave: So you didn't contribute specifically to Watson Data Platform? >> Not to the product line, so, you know, >> Yeah, so it's really not an appropriate question for you? >> I wouldn't want to kind of, >> Yeah. >> To talk too deeply about it >> Yeah, yeah, so that, >> Simply because I haven't been involved. >> Yeah, and I don't want to push you on that, because it's not your wheelhouse, but then help me understand how you will commercialize the activities that you do, or is that not necessarily the intent? >> So the intent with STC particularly is that we focus on Open Source, and a core part of that is that, being within IBM, we have the opportunity to interface with other product groups and customer groups. >> George: Right. >> So while we're not directly focused on, let's say, the commercial aspect, we want to effectively leverage the ability to talk to real-world customers and find the use cases, and talk to other product groups that are building this Watson Data Platform and all the product lines and the features, the Data Science Experience; it's all built on top of Apache Spark and the platform.
>> Dave: So your role is really to innovate? >> Exactly, yeah. >> Leverage Open Source and innovate. >> Both innovate and kind of improve, so improve performance, improve efficiency. When you are operating at the scale of a company such as IBM and other large players, your customers, and you as product teams and builders of products, will come into contact with all the kind of little issues and bugs >> Right. >> And performance >> Make it better. >> Problems, yeah. And that is the feedback that we take on board, and we try and make it better, not just for IBM and their customers, because it's an Apache project and everyone benefits. So that's really the idea: take all the feedback and learnings from enterprise customers and product groups, and centralize that in the Open Source contributions that we make. >> Great. So would it be fair to say you're focusing on making the core Spark, Spark ML, and Spark MLlib capabilities, the machine learning libraries and the pipelines, more robust? >> Yes. >> And if that's the case, we know there need to be improvements in its ability to serve predictions in real time, like high speed. We know there's a need to take the pipeline and sort of share it with other tools, perhaps, or collaborate with other tool chains. >> Nick: Yeah. >> What are some of the things that the enterprise customers are looking for along those lines? >> Yeah, that's a great question, and very topical at the moment. So from both an Open Source community perspective and an enterprise customer perspective, this is one of, if not the key, kind of missing pieces within the Spark machine-learning community at the moment, and it's one of the things that comes up most often. So it is a missing piece, and we as a community need to work together and decide: is this something that we build within Spark and provide that functionality? Is it something where we try and adopt open standards that will benefit everybody and that provide one standardized format, or way of serving models? Or is it something where there are a few Open Source projects out there that might serve this purpose, and do we get behind those? So I don't have the answer, because this is ongoing work, but it's definitely one of the most critical kind of blockers, or, let's say, areas that need work at the moment. >> One quick question, then, along those lines. The first thing IBM contributed to the Spark community was, as I understand it, an ability to, I think, create an ensemble sort of set of models to do a better job, or create a more, >> So are you referring to System ML, I think it is? >> System ML. >> System ML, yeah, yeah. >> What are they, I forgot. >> Yeah, so, so. >> Yeah, where does that fit? >> System ML started out as an IBM research project, and perhaps the simplest way to describe it is this: just as a SQL optimizer takes SQL queries and decides how to execute them in the most efficient way, System ML takes a kind of high-level mathematical language and compiles it down to an execution plan that runs on a distributed system. So in much the same way as SQL operators allow this very flexible, high-level language, where you don't have to worry about how things are done, you just tell the system what you want done, System ML aims to do that for mathematical and machine learning problems. It's now an Apache project; it's been donated to Open Source, and it's an incubating project under very active development.
And there are really a couple of different aspects to it, but that's the high-level goal. The underlying execution engine is Spark. It can run on Hadoop and it can run locally, but really, the main focus is to execute on Spark and then expose these kinds of higher-level APIs that are familiar to users of languages like R and Python, for example, so they can write their algorithms and not necessarily worry about, how do I do large-scale matrix operations on a cluster? System ML will compile that down and execute it for them. >> So really quickly, to follow up, what that means is it's a higher-level way for people who aren't cluster aware to write machine-learning algorithms that are cluster aware? >> Nick: Precisely, yeah. >> That's very, very valuable. When it works. >> When it works, yeah. So it does, again, with the caveat that I'm mostly focused on Spark and not so much the System ML side of things, so I'm definitely not an expert; I don't claim to be an expert in it. But it does work at the moment. It works for a large class of machine-learning problems. It's very powerful, but again, it's a young project and there's always work to be done. So exactly the areas that I know they're focusing on are these areas of usability, hardening up the APIs and making them easier to use and easier to access for users coming from the R and Python communities, who, again, as you said, are not necessarily experts on distributed systems and cluster awareness, but know how to write a very complex machine-learning model in R, for example. And it's really trying to enable them with a set of API tools. So in terms of the underlying engine, there are, I don't know how many, hundreds of thousands, millions of lines of code and years and years of research that have gone into that, so it's an extremely powerful set of tools. But yes, there's a lot of work still to be done there, ongoing, to make it user ready and enterprise ready, in the sense of making it easier for people to use it and adopt it and put it into their systems and production.
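[Editor's note: a hedged sketch of what Nick describes, writing high-level linear algebra and letting SystemML plan the distributed execution, using Apache SystemML's Python MLContext API as this editor understands it. The script is a toy, and the exact API calls should be treated as an assumption to verify against the SystemML documentation.]

```python
# Assumes Apache SystemML is installed (pip install systemml) and runs on Spark.
from pyspark.sql import SparkSession
from systemml import MLContext, dml

spark = SparkSession.builder.appName("systemml-sketch").getOrCreate()
ml = MLContext(spark)

# DML is the high-level math language; SystemML compiles it into a
# distributed execution plan, much as a SQL optimizer plans a query.
script = dml("""
    X = rand(rows=100000, cols=100)
    w = rand(rows=100, cols=1)
    yhat = X %*% w          # matrix multiply, no cluster code in sight
    s = sum(yhat)
""").output("s")

print(ml.execute(script).get("s"))
```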
>> So I wonder if we can close, Nick, with just a few questions on STC. So the Spark Technology Center, you're in Cape Town; is that a global expertise center? Is STC a virtual sort of IBM community, or? >> I'm the only member based in Cape Town, >> David: Okay. >> So I'm kind of fairly lucky from that perspective, to be able to kind of live at home. The rest of the team is mostly in San Francisco, so there's an office there that's co-located with the Watson West office >> Yeah. >> And Watson teams >> Sure. >> That are based there on Howard Street, I think it is. >> Dave: How often do you get there? >> I'll be there next week. >> Okay. >> So I typically, sort of two or three times a year, try and get across there >> Right. >> And interface with the team. >> So, >> But we are fairly, I mean, IBM is obviously a global company, and I've been surprised, actually, pleasantly surprised: there are team members pretty much everywhere. Our team has a few scattered around, including me, but in general, when we interface with various teams, they pop up in all kinds of geographical locations, and I think it's great, you know, a huge diversity of people and locations. >> Anything, I mean, it's early days here, early day one, but anything you saw in the morning keynotes, or things you hope to learn here? Anything that's excited you so far? >> I caught a couple of the morning keynotes, but I had to dash out to kind of prepare; I'm doing a talk later, actually, on feature hashing for scalable machine learning, so that's at 12:20, please come and see it. >> Dave: A breakout session, it's at what, 12:20? >> 20 past 12:00, yeah. >> Okay. >> In room 302, I think, >> Okay. >> I'll be talking about that, so I needed to prepare. But I think some of the key exciting things that I have seen, that I would like to go and take a look at, are related to deep learning on Spark. I think that's been a hot topic recently, and it's one of the areas where Spark, perhaps, hasn't been the strongest contender, let's say, but there's some really interesting work coming out of Intel, it looks like. >> They're talking here on theCUBE in a couple of hours. >> Yeah. >> Yeah. >> I'd really like to see their work. >> Yeah. >> And that sounds very exciting, so yeah. I think every time I come to a Spark Summit, there are always new projects from the community, various companies, some of them big, some of them startups, that are pushing the envelope, whether it's research projects in machine learning, whether it's adding deep learning libraries, whether it's improving performance for kind of commodity clusters or for single, very powerful nodes. There are always people pushing the envelope, and that's what's great about being involved in an Open Source community project and being part of those communities, so yeah. That's one of the talks that I would like to go and see. And I think I, unfortunately, had to miss some of the Netflix talks on their recommendation pipeline. That's always interesting to see. >> Dave: Right. >> But I'll have to catch them on the video. (laughs) >> Well, there's always another project in Open Source land. Nick, thanks very much for coming on theCUBE, and good luck. >> Cool, thanks very much. Thanks for having me. >> Have a good trip, stay warm, hang in there. (Nick laughs) Alright, keep it right there. My buddy George and I will be back with our next guest. We're live. This is theCUBE from Spark Summit East, #sparksummit. We'll be right back. (upbeat music) (gentle music)
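[Editor's note: Nick's talk topic, feature hashing, is easy to illustrate with Spark ML's built-in HashingTF transformer: tokens are hashed directly to vector indices, so there is no vocabulary dictionary to build or ship, which is what makes it scale. A minimal sketch with invented toy documents follows; it shows the general technique, not the specific content of his talk.]

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, HashingTF

spark = SparkSession.builder.appName("hashing-sketch").getOrCreate()

docs = spark.createDataFrame(
    [(0, "spark makes feature hashing easy"),
     (1, "hashing scales to huge sparse vocabularies")],
    ["id", "text"])

tokens = Tokenizer(inputCol="text", outputCol="words").transform(docs)

# Each token is hashed straight to a column index in a fixed-size sparse
# vector; collisions are the price paid for never materializing a dictionary.
tf = HashingTF(inputCol="words", outputCol="features", numFeatures=1 << 18)
tf.transform(tokens).select("id", "features").show(truncate=False)
```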

Published Date : Feb 8 2017


Kickoff - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> Narrator: Live from Boston, Massachusetts, this is theCUBE covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Hey everybody, the euphoria is still palpable here; we're in downtown Boston at the Hynes Convention Center for Spark Summit East, #SparkSummit. My co-host, George Gilbert, and I will be unpacking what's going on for the next two days. George, it's good to be working with you again. >> Likewise. >> I always like working with my man, George Gilbert. We go deep; George goes deeper. Fantastic action going on here in Boston, actually quite a good crowd here; it was packed this morning in the keynotes. The rave is streaming: everybody's talking about streaming. Let's sort of go back a little bit though, George. When Spark first came onto the scene, you saw these projects coming out of Berkeley; it was the hope of bringing real-timeness to big data, dealing with some of the memory constraints that we found going from batch to real-time interactive, and now streaming, and you're going to talk about that a lot. Then you had IBM come in and put a lot of dough behind Spark, basically giving it a stamp, IBM's imprimatur-- >> George: Yeah. >> Much in the same way it did with Linux-- >> George: Yeah. >> Kind of elbowing its way into-- >> George: Yeah. >> The marketplace and sort of gaining a foothold. Many people at the time thought that Hadoop needed Spark more than Spark needed Hadoop. A lot of people thought that Spark was going to replace Hadoop. Where are we today? What's the state of big data? >> Okay, so to set some context: when Hadoop V1, classic Hadoop, came out, it was a file system, a commodity file system, keep everything really cheap, don't have to worry about shared storage, which is very expensive, and the processing model, the execution engine for munging through data, was MapReduce. We're all familiar with those-- >> Dave: Complicated but dirt cheap. >> Yes. >> Dave: Relative to a traditional data warehouse. >> Yes. >> Don't buy a big Oracle Unix box or Linux box; buy this new file system, figure out how to make it work, and you'll save a ton of money. >> Yeah, but unlike the traditional RDBMSs, it wasn't really that great for doing interactive business intelligence and things like that. It was really good for big batch jobs that would run overnight, or over periods of hours, things like that. The irony is, when Matei Zaharia, the creator of Spark and co-founder of Databricks, which is the steward of Spark, created the language and the execution environment, his objective was to do a better MapReduce than Hadoop's MapReduce: make it faster, take advantage of memory. But he did such a good job of it that he was able to extend it to be a uniform engine, not just for MapReduce-type batch stuff, but for streaming stuff. >> Dave: So originally they started out thinking, if I get this right-- >> Yeah. >> It was sort of a microbatch, leveraging memory more effectively, and then it extended beyond-- >> The microbatch is their current way to address the streaming stuff. >> Dave: Okay. >> It takes MapReduce, which would be big, long-running jobs, and slices them up, and so each little slice turns into an element in the stream. >> Dave: Okay, so the point is it was an improvement upon these big, long batch jobs-- >> George: Yeah. >> They're taking it from batch to interactive to real-time. So let's go back to big data for a moment here. >> George: Yeah.
>> Big data was the hottest topic in the world three or four years ago, and now it's sort of waned as a buzzword, but big data is now becoming more mainstream. We've talked about that a lot. A lot of people think it's done. Is big data done? >> George: No, it's more that it's sort of boring for us, kind of pundits, to talk about, because it's becoming part of the fabric. The use cases are what's interesting. It started out as a way to collect all data into this really cheap storage repository, and then, once you did that, this was the data you couldn't afford to put into your Teradata data warehouse at $25,000 per terabyte, or with running costs a multiple of that. Here, you put all your data in, your data scientists and data engineers started munging with the data, and you started taking workloads off your data warehouse, like ETL things that didn't belong there. Now people are beginning to experiment with business intelligence sort of exploration and reporting on Hadoop, so taking more workloads off the data warehouse. There are limitations there that will get solved by putting MPP SQL back-ends on it, but then there's the next step after that. So we're working on that step, but the one that comes after that is to make it easier for data scientists to use this data, to create predictive models-- >> Dave: Okay, so I often joke that the ROI on big data was "reduction of investment," lowering the denominator-- >> George: Yeah. >> In the expense equation, and I think it's fair to say that big data and Hadoop succeeded in achieving that. But then the question becomes, what's the real business impact? Clearly big data has not, except in some edge cases, and there are a number of edge cases and examples, lived up to the promise of real-time: affecting outcomes before the fact, you know, taking the human out of the decision, bringing transactions and analytics together. Now we're hearing a lot of that talk around AI and machine learning, and of course IoT is the next big thing; that's where streaming fits in. Is it the same wine in a new bottle? Or is it sort of the evolution of the data meme? >> George: It's an evolution, but it's not just a technology evolution to make it work. We've been talking about big data as efficiency, like low cost, cost reduction for the existing type of infrastructure, but when it starts going into machine learning, you're doing applications that are more strategic and more top-line focused. That means your C-level execs actually have to get involved, because they have to talk about the strategic objectives, like growth versus profitability, or which markets you want to target first. >> So has Spark been a headwind or a tailwind to Hadoop? >> I think it's very much been a tailwind, because it simplified a lot of things that took many, many engines in Hadoop. That's something that Matei, the creator of Spark, has been talking about for a while. >> Dave: Okay, something I learned today, and actually I had heard this before, but the way I phrased it in my tweet: genomics is kicking Moore's Law's ass. >> George: Yeah. >> The price performance of sequencing a gene improves 3x every year, compared to what is essentially a doubling every 18 months for Moore's Law. The amount of data that's being created is just enormous; I think we heard from the Broad Institute that they create 17 terabytes a day-- >> George: Yeah. >> As compared to YouTube, which is 24 terabytes a day. >> And in a few years it will be-- >> It will be dwarfing YouTube >> Yeah.
>> Of course Twitter you couldn't even see-- >> Yeah. >> So what do you make of that? Is that just a fun fact, is that a new use case, or is that really where this whole market is headed? >> It's not just a fun fact, because we've been hearing for years and years this story about data doubling every 18 to 24 months. That's coming from the legacy storage guys, who can only double their capacity every 18 to 24 months. The reality is that when we take what was analog data and we make it digitally accessible, the only thing preventing us from capturing all this data is the cost to acquire and manage it. The available data is growing much, much faster than 40% every 18 months. >> Dave: So what you're saying is that-- I mean, this industry has marched to the cadence of Moore's Law for decades, and what you're saying is that linear curve is actually reshaping; it's becoming exponential. >> George: For data-- >> Yes. >> George: So the pressure is on for compute, which is now the bottleneck, to get cleverer and cleverer about how to process it-- >> So that says innovation has to come from elsewhere, not just Moore's Law. It's got to come from a combination of-- Thomas Friedman talks a lot about Moore's Law being one of the fundamentals, but there are others. >> George: Right. >> So from a data perspective, what are those combinatorial effects that are going to drive innovation forward? >> George: There was a big meetup for Spark last night, and the focus was this new database called SnappyData that spun out of Pivotal, and it's being mentored by Paul Maritz, ex-head of development at Microsoft in the 90s and former head of VMware. The interesting thing about this database, and we'll start seeing it in others, is that you don't necessarily want to query and analyze petabytes at once; it will take too long, sort of like munging through data of that size on Hadoop took too long. You can do things that approximate the answer and get it much faster. We're going to see more tricks like that. >> Dave: It's interesting you mention Maritz. I heard a lot of messaging this morning that talked about essentially real-time analysis, and being able to make decisions on data that you've never seen before and actually affect outcomes. This narrative I first heard from Maritz many, many years ago when they launched Pivotal. He launched Pivotal to be this platform for building big data apps, and now you're seeing Databricks and others sort of usurp that messaging and actually seem to be at the center of that trend. What's going on there? >> I think there are two, what would you call it, two centers of gravity, and our CTO David Floyer talks about this. The edge is becoming more intelligent, because there's a huge bandwidth and latency gap between these smart devices at the edge, whether the smart device is a car or a drone or just a bunch of sensors on a turbine. Those things need to analyze and respond in near real-time or hard real-time, like how to tune themselves, things like that, but they also have to send a lot of data back to the cloud to learn about how these things evolve. In other words, it would be like sending the data to the cloud to figure out how the weather patterns are changing. >> Dave: Mm-hmm. >> That's the analogy. You need them both. >> Dave: Okay. >> So Spark right now is really good in the cloud, but they're doing work so that they can take a lighter-weight version and put it at the edge. We've also seen Amazon put some stuff at the edge, and Azure as well.
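[Editor's note: the approximate-answer trick George mentions is visible even in stock Spark. Functions like approx_count_distinct trade a small, bounded error for a large speedup via sketches. A minimal example on synthetic data (the column name is invented):]

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import approx_count_distinct

spark = SparkSession.builder.appName("approx-sketch").getOrCreate()

# Ten million synthetic events with 997 distinct users.
events = spark.range(0, 10_000_000).selectExpr("id % 997 AS user_id")

# HyperLogLog-based sketch: the rsd parameter bounds the relative error,
# at a fraction of the cost of an exact COUNT(DISTINCT ...) at scale.
events.agg(approx_count_distinct("user_id", rsd=0.01)).show()
```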
>> Dave: I want you to comment. We're going to talk about this later; we have a-- George and I are going to do a two-part series at this event. We're going to talk about the state of the market, and then we're going to give a glimpse of our big data numbers, our Spark forecast, our streaming forecast-- I mention streaming because that is-- we talk about batch, we talk about interactive/real-time, you know, you're at a terminal; anybody who's as old as I am remembers that. But now you're talking about streaming. Streaming is a new workload type; you call these things continuous apps, like streams of events coming into a call center, for example, >> George: Yeah. >> As one example that you used. Add some color to that. Talk about that new workload type and the role of streaming, and really, potentially, how it fits into IoT. >> Okay, so for the last 60 years, since the birth of digital computing, we've had one of two workloads. They were either batch, which is jobs that ran offline, where you put your punch cards in and some time later the answer comes out, or interactive, which originally was green screens and now is PCs and mobile devices. The third one coming up now is continuous, or streaming, data that you act on in near real-time. It's not that those apps will replace the previous ones; it's that you'll have apps that have continuous processing, batch processing, and interactive as a mix. An example would be, today, all the information about how your applications and data center infrastructure are operating: that's a lot of streams of data that Splunk took on first and did very well with, so that you're looking in real-time and able to figure out if something goes wrong. That type of stuff, all the telemetry from your data center, that is a training wheel for the Internet of Things, where you've got lots of stuff out at the edge. >> Dave: It's interesting you mention Splunk. Splunk doesn't actually use the big data term in its marketing, but they actually are big data and they are streaming. They're actually not talking about it, they're just doing it, but anyway-- Alright George, great, thanks for that overview. We're going to break now, then bring on our first guest, Arun Murthy, coming in from Hortonworks, co-founder at Hortonworks, so keep it right there everybody. This is theCUBE, we're live from Spark Summit East, #SparkSummit, we'll be right back. (upbeat music)
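[Editor's note: a hedged sketch of George's "continuous app" idea using Spark Structured Streaming: the same DataFrame code could run as a batch job over history or continuously over arriving events. The socket source, host/port, and the call-center event format here are invented for illustration.]

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("continuous-app-sketch").getOrCreate()

# Assumed: call-center events arriving on a socket as "timestamp,queue" lines.
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

events = lines.selectExpr(
    "to_timestamp(split(value, ',')[0]) AS ts",
    "split(value, ',')[1] AS queue")

# Continuous processing: a running per-minute event count for each queue.
counts = events.groupBy(window(col("ts"), "1 minute"), "queue").count()

(counts.writeStream
 .outputMode("complete")
 .format("console")
 .option("truncate", False)
 .start()
 .awaitTermination())
```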

Published Date : Feb 8 2017


Jack Norris, MapR - Spark Summit East 2016 #SparkSummit #theCUBE


 

>> Narrator: From New York, extracting the signal from the noise, it's theCUBE, covering Spark Summit East. Brought to you by Spark Summit. Now your hosts, Dave Vellante and George Gilbert. >> Right here in Midtown at the Hilton hotel, this is Spark Summit, and this is theCUBE. theCUBE goes out to the events; we extract the signal from the noise. Jack Norris is here. He's the CMO of MapR and a longtime Cube alum. Jack, it's great to see you again. Hey, you've been here since the beginning of this whole big data meme, and it might've started here, I don't know. >> I think you're right. >> I mean, it really did start here, I think, in this building; it was our first big data show, the original Hadoop World. And you guys, like I say, have been there from the start. You were kind of impatient early on. You said, you know, we're just going to go build solutions and ignore the noise, and you built a really nice business. You guys have been growing, you're growing your sales force, and things are good, and all of a sudden, boom, the Spark thing comes in. So we're seeing the evolution. I remember saying to George in the early days of Hadoop, we were geeking out talking about all the bits and bytes, and then it turned into a business discussion. It's like we're back to the hardcore bits and bytes. So give us the update from MapR's point of view: where are we in the whole big data space? >> Well, I think it has transitioned. I mean, if you look at the typical large Fortune company, the Web 2.0s, it's really, how do we best leverage our data, and how do we leverage our data so that we can make decisions much faster, right? That high-frequency decision-making process. And typically that involves taking production data and analytics and joining them together so that you're actually impacting business as it happens, and to do that effectively requires innovations. So the exciting thing about Spark is having a distributed compute engine that's much easier to develop with and much faster. >> So I remember in the early days, we'd be at these shows and the big question was, you know, can you take the humans out of the equation? It's like, no, humans are the last mile. Is that changing, or do we still need that human interaction? >> Humans are an important part of the process, but increasingly, if you can adjust and make, you know, small algorithmic decisions, and make those decisions at that kind of moment of truth, you get a big impact, and I'll give you a few examples. So, ad platforms: you know, Rubicon Project does over a hundred billion ad auctions a day. Humans are part of that process in terms of setting it up and reviewing the process, but for each supply and demand decision, there's an automated decision optimizing, and that has a huge impact on the bottom line. Fraud: you know, swiping that credit card and deciding, is this transaction fraudulent or not, avoiding false positives, et cetera; a big leveraged item. So we're seeing things like that across manufacturing, across retail, healthcare. And it isn't about asking bigger questions, or doing reports and looking back at, you know, what happened last week. It's more, how can I have an infrastructure in place that allows this organization to be agile? Because it's not the companies with the most data that are going to win,
>>So there's so much data that humans can't ingest it fast enough. We just can't keep up. So the world needs data scientists, it needs trained developers. You've got some news I want to talk about on the training side, but even there, we can only throw so many bodies at the problem. So it's really software that's going to allow us to scale. Software's hard, software takes time. We've seen a lot of the spend in the analytics and big data world go to services, and obviously you guys and others have been working hard to shift it towards software. I want to come back to that training issue. We heard this morning that Databricks has trained 20,000 people. That's a lot, but there's still a long way to go. You guys are putting some investment into training. Talk about that news. >>Well, it starts with the underlying software: if you can do things in the platform to make it much easier, and do things that are hard to surround with services, like data protection, right? If you've lost data, it doesn't matter how many people you throw at it, you can't recover it. So that's the starting point. The approach we've taken is a software-product approach to the training as well. We rolled out on-demand training: it's free, you work at your own pace, and it's got different modules with hands-on labs. We launched that last January, so it's coming up on its year anniversary, and we recently celebrated having trained 50,000 people on Hadoop and big data. Today we're announcing an expansion of the Spark classes. We've got a full curriculum around Spark, including a certification, so you can get Spark certification through this MapR on-demand training. >>You said something really intriguing that I want to dive into a little bit, when we were talking about the small decisions that can be made really fast. A human in the loop might have to train the models, but they run automatically at runtime. You said it's not about asking bigger questions, it's finding faster answers. What had to change in your platform, or in the underlying technology, to make that possible? >>There's a lot that goes into it. It's typically a series of functions, a breadth that needs to be brought to the problem, as well as squeezing out latencies. In the traditional approach, different applications and different analytic techniques each dictate a separate silo, a separate schema of data. You've got those all around the organization, the data travels between them, and you get an answer at the end of some period of time. Instead, it's converging all of that into a single platform and squeezing out those latencies, so that you can take an informed action at the speed of business, if you will. >>Let's say Spark never came along. Would that be possible? >>Yes. If you look at the different architectures that are out there, there's typically deep analytics (let's go look at the trends over the last seven years, what happened), then actions on a streaming set, say with Storm, and then real-time database operations. You could do that with HBase or MapR-DB, all of that together. What Spark has really done is made that whole development process much easier and much more streamlined, and that's where a lot of the excitement is happening.
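What that simplification looks like in practice: with Spark, the deep batch view over history and the action on arriving events are the same transformations on one engine, rather than separate systems glued together. Below is a rough PySpark sketch in the DStream style of that era; the file path, socket source, and field layout are placeholders I've made up, not anything from the interview:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "unified-demo")

# Deep analytics on history: an ordinary batch job over files.
history = sc.textFile("hdfs:///data/transactions/*.csv")  # hypothetical path
daily_totals = (history.map(lambda line: line.split(","))
                       .map(lambda f: (f[0], float(f[2])))  # (card, amount)
                       .reduceByKey(lambda a, b: a + b))
# daily_totals.take(3) would materialize the batch view.

# Actions on arriving data: the same transformations, expressed on a stream.
ssc = StreamingContext(sc, 1)                    # 1-second micro-batches
live = ssc.socketTextStream("localhost", 9999)   # demo source
live_totals = (live.map(lambda line: line.split(","))
                   .map(lambda f: (f[0], float(f[2])))
                   .reduceByKey(lambda a, b: a + b))
live_totals.pprint()

ssc.start()
ssc.awaitTermination()
```

The design point is the reuse: one set of skills, one set of transformations, applied to both the seven-year trend analysis and the per-second stream, instead of one codebase for the batch store, another for Storm, and a third for the real-time database.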
>>So you mentioned two use cases earlier, ad tech and fraud detection, and I want to ask you about the state of those. Ad tech has obviously come a long way, but it's still got a ways to go. I mean, who's making money on ads? Obviously Google makes tons of money; everybody else is sort of chasing them. Facebook's making money, probably because they didn't let Google in. So how will Spark affect that business, and what's MapR's role in evolving it to the next level? >>So there are different kinds of compute and different types of things you can do on the data. Increasingly we're seeing streaming analytics, making those decisions as the data arrives. And then there's the whole ecosystem question of how you coordinate those flows of data. It's not a simple "here's the origin, here's the destination"; there's typically a complex data flow. That's where we've focused with MapR Streams, this huge publish-and-subscribe infrastructure, so that you can get real-time data to the appropriate location and then do the right operations. A lot of that involves Spark, but not exclusively.
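MapR Streams was Kafka-API compatible, so the publish-and-subscribe flow Jack describes can be sketched in familiar Kafka terms. A minimal sketch, assuming a reachable broker; the topic name and payload are invented for the example, and this illustrates the pattern rather than MapR's actual code:

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python

BROKERS = "localhost:9092"  # assumption: a local broker for the demo
TOPIC = "transactions"      # hypothetical topic name

# Producer side: the app at the data's origin publishes each event as it happens.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"card": "4111-xxxx", "amount": 42.50, "city": "Boston"})
producer.flush()

# Consumer side: a scoring service (or a Spark streaming job) subscribes and
# reacts to each event with low latency, instead of waiting on a nightly batch.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    print("scoring transaction:", message.value)  # hand off to decision logic
    break  # demo: handle one message and stop
```

The point is the decoupling: producers publish once at the origin, any number of downstream consumers subscribe, and the streaming layer carries the complex data flow between them. In a MapR deployment the topic string would reference a stream path rather than a bare name, but the programming model is the same.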
>>Okay. And then fraud detection has obviously come a long way; sampling could have died. But now we're getting too many false positives. You get the call, you know; I get a lot of calls because we buy so much equipment. So what about the next level? What are you guys doing to take fraud detection to the next level, so that when I get on a plane in Boston and land in London, it knows? Is that a database problem, an integration problem, a systems problem, and what role do you guys play in solving it? >>Well, there are a lot of details and techniques that probably go beyond what we'll share publicly, or what our customers talk about publicly. In general, the more data you can apply to a problem, the more context, the better off you are; that's how I'd summarize it. So instead of sampling, instead of "boy, that's a strange purchase over there," it's understanding: well, this is Dave Vellante, this is the full body of expenditures he's made, the types of things he buys and who he frequently purchases from, and here's a transaction trend that started in San Francisco and went to New York, et cetera. In context, it makes more sense. >>Part of that is more data, and the other part is better algorithms, better learnings, applied on a continuous basis. How are your customers dealing with that constraint? If they've got a hundred dollars to spend, they can only spend so much on each piece: gathering more data, cleaning the data. They spend so much time getting it ready versus working on their machine learning algorithms or whatever other techniques. What are you seeing as best practice? It probably varies, I'm sure, but give us some color on it. >>I'll actually go back to Google; some excellent insights have come from Google. They wrote a paper called "The Unreasonable Effectiveness of Data," and in it they squarely addressed that problem. Given the choice to invest in either a more complex model and algorithm or more data, putting more data at the problem had the bigger impact. My simple explanation is that if you're sampling the data, you have to have a model that tries to recreate reality. If you're looking at all of the data, the anomalies can pop up and be more apparent. And the more context you can bring, the more data from other sources, the better picture of what's happening and the better off you are. That requires scale, it requires speed, and it requires different techniques that can be brought to bear: here a database operation, here a streaming operation, here a deep machine learning algorithm over files.
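Jack's sampling point, that a sample forces you to model reality while the full dataset lets anomalies surface on their own, can be illustrated with a toy comparison. This sketch is mine, not MapR's, and the data is synthetic:

```python
# Toy illustration of sampling vs. full data for anomaly detection.
# Synthetic data: 100,000 routine transactions plus a handful of outliers.
import random

random.seed(7)
amounts = [random.gauss(50, 10) for _ in range(100000)]
amounts += [950, 1020, 980]  # three fraudulent-looking outliers

def count_outliers(data, threshold=500):
    return sum(1 for a in data if a > threshold)

# Full data: every anomaly is present and pops out directly.
print("full data outliers:", count_outliers(amounts))   # -> 3

# A 1% sample: the rare events usually vanish entirely, so you would need
# a model of "normal" behavior just to infer that they exist at all.
sample = random.sample(amounts, k=len(amounts) // 100)
print("1% sample outliers:", count_outliers(sample))    # usually 0
```

At three anomalies in a hundred thousand records, a 1% sample misses all of them almost every time; scanning the full data finds them trivially, which is the trade Jack describes between modeling sophistication and raw coverage.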
>>So a lot of vendors in the big data ecosystem are coming at Spark from different angles, trying to add value to it and bathe themselves in the halo. Now, you guys took some time upfront to build a converged platform, so that you weren't trying to wrap your arms around 37 different projects. Can you tell us how having that converged platform, even though you hadn't anticipated Spark, allows you to add more value than other approaches? >>So we simplify. If you look at the Hadoop ecosystem, it's basically separated into the components for compute and management on top of the data layer, the Hadoop Distributed File System: how do you scale data, how do you protect it? That's, very simply, what's going on. Spark does a great job at that top layer, but it doesn't do anything about the underlying storage layer, and in the Hadoop community that underlying storage layer is a batch system. So you're trying to do micro-batch, streaming-style operations on top of batch-oriented data. What we addressed was to take that whole data layer, make it real time, make it random read-write, and converge enterprise storage together with Hadoop support and Spark support on a single platform. That's basically the difference. >>And to make it enterprise grade. You guys were really the first to lead that charge; everybody started talking about enterprise grade right after you were already delivering it. So you've had a lead there. Do you feel like you still have that lead, or is it the kind of thing where you hit the top of the S-curve and start innovating elsewhere? >>NC State did a study just this past year which identified that only 25% of data corruption issues are identified and properly handled by the Hadoop Distributed File System, and 42% of those are silent. So there's a huge gap between quote-unquote enterprise-grade features and what we think that should mean. >>Silent data corruption has been a problem for decades now, and you're saying it's no different in the Hadoop ecosystem, especially as mainstream businesses start to adopt this. What's happening in the Valley? In the Wall Street Journal every day you read about down rounds, flat rounds, people who can't get B rounds. You guys are funded, you're growing, you're talking about investments. Do you feel like you're achieving escape velocity? Maybe give us an update on the state of the business. >>I think the state of the business is best represented by the customers, right? And the customers vote, in terms of how well this technology is driving their business. We've got a recent study that shows the returns customers are getting. We've got a 99% retention rate with our customers, an expansion rate that's unbelievable, and multi-million-dollar customers in seven of the top verticals; nine of our top 10 customers are multi-million-dollar accounts. So we're seeing significant investments and, more importantly, significant returns on the part of customers, where they're not just running a single application on the platform, but multiple applications. >>Jack Norris of MapR, always focused, always a pleasure having you on theCUBE. Thanks very much for coming on. Keep right there, buddy, we'll be back with our next guest. This is theCUBE, live from Spark Summit East. We'll be right back.

Published Date : Feb 17 2016


Teresa Carlson, Flexport | International Women's Day


 

(upbeat intro music) >> Hello everyone. Welcome to theCUBE's coverage of International Women's Day. I'm your host, John Furrier, here in Palo Alto, California. Got a special remote guest coming in: Teresa Carlson, President and Chief Commercial Officer at Flexport, theCUBE alumni, one of the first. Let me go back to 2013, Teresa, former AWS. Great to see you. Thanks for coming on. >> Oh my gosh, almost 10 years. That is unbelievable. It's hard to believe, so many years of theCUBE. I love it. >> It's been such a great honor to interview you and follow your career. You've had quite the impressive run, an executive-level woman in tech. You've done such an amazing job, not only in your career, but also helping other women. So I want to give you props for that before we get started. Thank you. >> Thank you, John. It's been my honor and privilege. >> Let's talk about Flexport. Tell us about your new role there and what it's all about. >> Well, I love it. I'm back working with another Amazonian, Dave Clark, who is our CEO of Flexport, and we are about 3,000 people strong globally, in over 90 countries. We're represented in over 160 cities and with local governments and places around the world, which I think is super exciting. We have over 100 network partners and growing, and we are about empowering the global supply chain and trade, and doing it in a very disruptive way, with the use of platform technology that allows our customers to really have visibility and insight into what's going on. And it's a lot of fun. I'm learning new things, but there's a lot of technology in this as well, so I feel right at home. >> You have quite a knack for mastering growth, technology, and building out companies, and scaling them up with the systems and processes. So congratulations, and I want to get into that. Let's get into your personal background first; then I want to get into the work you've done, and are doing, for empowering women in tech. What was your journey about? How did it all start? I know you kind of bumped into it, then you went Microsoft, AWS. Take us through your career: how you got into tech, how it all happened. >> Well, I do like to give a shout out, John, to my roots and heritage, which was as a speech and language pathologist. So I did start out in healthcare, right out of university. I had an undergraduate and a master's degree, and I tell everyone now, looking back at my career, I think it was super helpful for me, because I learned a lot about human communication, and it has served me very well over the years in really trying to understand what environments I'm in and what kinds of individuals are around the world culturally. So I'm really blessed that I had that opportunity to work in healthcare. And by the way, a shout out to all of our healthcare workers who have helped us get through almost three years of COVID and flu and norovirus and everything else. So I started out there, and then almost accidentally got into technology. The first small company I worked for was a company called Keyfile Corporation, which did workflow and document management out of Nashua, New Hampshire, and they were a Microsoft Gold partner. That is actually how I got into the big tech world. We ran on Exchange, for everybody who knows that term, and we were a large small partner, but large in the world of Exchange. And those were the days, the late nineties, when you would go and be in the same room with Bill Gates and Steve Ballmer.
And I really fell in love with Microsoft back then. I thought to myself, wow, what if I could work for a big tech company? I got to hear Bill on stage talking about saving the world. And guess what my next step was? I actually got a job at Microsoft, took a pay cut and a job downgrade. I tell this story all the time: I took like three downgrades in my role. I had been an SVP and went to a manager, and it's one of the best moves I ever made. I share that because I really didn't know the world of big tech, and I had to start from the ground up and relearn it. I did that, and I just really loved that job. I was at Microsoft from 2000 to 2010, where I eventually ran all of the U.S. federal government business, which was a multi-billion dollar business. And then I had the great privilege of meeting an amazing man, Andy Jassy, who I thought was just unbelievable in his insights and knowledge and openness to understanding new markets. We talked about government, and how government needed the same great technology as every startup, and that led to me going to work for Andy in 2010 and starting up our worldwide public sector business. I pinch myself some days, because we went from two people and no offices to, by the time I left, over 10,000 people, billions in revenue, and 172 countries, and we had done really amazing work: I think changing the way public sector and government globally really thought about their use of technology and Cloud computing in general. And that has kind of been my career. I was there till 2021, then did a small stint at Splunk, and a small stint back at Microsoft doing a couple of projects for Microsoft with CEO Satya Nadella, who is another amazing CEO and leader. And then Dave called me, and I'm at Flexport. So I couldn't be more honored, John. I've just had such an amazing career working with amazing individuals. >> Yeah, I've got to say, the Amazon one is well-documented, certainly by theCUBE and our coverage. We watched you rise and scale that thing. And like I said at the time, when we look back, this will be seen as a historic run, because of the build-out. I mean, it went from zero to massive billions at a historic time when government was transforming. I would say Microsoft had a good run there with Fed, but that was already established; the federal business was, you know, blocking and tackling. Amazon was pure build-out. So I have to ask you: what were your big learnings? Because one, you're from a Seattle big tech company, kind of entrepreneurial in the sense of: here's some working capital, seed finance, go build that thing. And you're in DC, and you're a woman. What did you learn? >> I learned that you really have to have a lot of grit. My mom and dad, these are kind of more southern-roots words, but stick-with-it-ness, you know. You can't give up, and "no" is not in your vocabulary. I found "no" is just another way to get to yes. You have to figure out all the questions people are going to ask you. I learned to be very patient, and I think, John, one of the things that was our secret sauce was we said to ourselves: if we're going to do something super transformative and truly disruptive, like Cloud computing, which the government really had not utilized, we had to be patient. We had to answer all their questions, and we could not judge in any way what they were thinking, because if we couldn't answer all those questions and prove out the capabilities of Cloud computing, we were not going to accomplish our goals.
And I do give so much credit to all my colleagues there, everybody from Steve Schmidt, who was there and is still there as the CISO, to Charlie Bell and Peter DeSantis and the entire team that really helped build that business out. Without them, it was, you know, it was a team effort. And I think that's the thing I loved about it: it was not just sales, it was product, it was development, it was data center operations, it was legal, finance. Everybody really worked as a team. And we were all on board that we had to make a lot of changes in the government relations team. We had to go to Capitol Hill. We had to talk to them about the changes that were required, and really get them to understand why Cloud computing could be such a transformative game changer for the way government operates globally. >> Well, I think the whole world, and the tech world, can appreciate your work and thank you later, because you broke down those walls asking those questions. So, great stuff. Now I've got to say, you're in kind of a similar role at Flexport. Again, transformative. Supply chain isn't new; computing wasn't new when Cloud came along, either. Supply chain, not a new concept, is undergoing radical change and transformation. Online, software supply chain, hardware supply chain, supply chain in general, shipping: this is a big part of our economy and how life works. A similar kind of thing is going on: build-out, growth, scale. >> It is. It's very much like that, John, I would say. The model with freight forwarding and supply chain is that there's a lot of technology utilized in this global supply chain world, but it's not integrated. You don't have a common operating picture of what you're doing in your global supply chain. You don't have easy access to the information and visibility. And that's really, you know, I was at a conference last week in LA, and the themes were so similar: transparency, access to data and information, being able to act quickly, drive change, know what's happening. I was like, wow, this sounds familiar. Data, AI, machine learning, visibility, a common operating picture. So it's very much the same kind of themes that you heard even with government. I do believe it's an industry that is going through transformation, and Flexport has been a group that's come in and said: look, we have this amazing idea. Number one, to give access to everyone: we want every small business, every large business, every government around the world to be able to trade their goods and think about supply chain logistics in a very different way, with the information they need and want at their fingertips. So that's thing one, and then to apply that technology in a way that's very usable across all systems, from an integration perspective. So it's kind of exciting. I used to tell this story years ago, John, and I don't think Michael Dell would mind that I tell it. One of our first customers when I was at Keyfile Corporation, where we did workflow and document management, was Dell. And I remember going out to visit them, and they had runners who would run around the floor and fulfill the orders, right, to get all those computers out the door. And when I think of global trade, in my mind I still see runners, you know, running around, and that's moved to a very digital world now; for all this stuff, you don't need people doing it.
You have machines doing this now, and you have access to the information. And, you know, we still have issues resulting from COVID, where we have either an under-abundance or an over-abundance in our supply chains. We still have clogs in our shipping, in the shipping yards around the world, and in the ports. So we still have some clearing to do, and that's the reason technology is important and will continue to be very important in this world of global trade. >> Yeah. Great impact for change. I've got to ask you about Flexport's inclusion, diversity, and equity programs. What have you got going on there? That's been a big conversation in the industry around keeping a focus, not favoring one dimension over another, but clearly every company that doesn't have a strong program will be at a disadvantage. That's well reported by McKinsey and other top consultants: diverse, inclusive, equitable workforces all perform better. What's Flexport's strategy, and how are you guys supporting that in the workplace? >> Well, let me just start by saying that, really, at the core of who I am, since the day I started understanding this as an individual and a female leader, I've known that I could have an impact. That the words I used, the actions I took, the information that I pulled together and had knowledge of could be meaningful. And I think each and every one of us is responsible to do what we can to make our workplace, and the world, a more diverse and inclusive place to live and work. And I've always enjoyed the thought that I could help empower women around the world in the tech industry. Now I'm hoping to do my little part, John, in the supply chain and global trade business. And I would tell you, at Flexport we have some amazing women. I'm so excited to get to know them all. I've not been there that long yet, but I'm getting to know them. We have a very diverse leadership team, between men and women, at Dave's level. I have some unbelievable women on my team directly whom I'm getting to know more, and I'm so impressed with what they're doing. And while this industry is different from the world I lived in day to day, it also has a lot of common themes. So for us, we're trying to approach every day by saying: let's make sure that in our interviewing cycles, the jobs we fill, how we recruit people, how we put people out there on the platforms, we have diversity and inclusion in all of that, every day. And I can tell you, from the top, from Dave and all of our leaders: we just had an offsite, and we had a big conversation about this. It's a drumbeat that we have to think about and live by every day, and really check ourselves on regularly. But I do think there's so much more room for women in the world to do great things. And one of the areas, as you know very well, is that we lost a lot of women during COVID who left the workforce. So we went backwards, unfortunately. We have to now move forward and make sure that we are giving women the opportunity to have great jobs, to have the flexibility they need as they build a family, and to have a workplace environment that they trust to come into every day. >> There's now clear visibility, at least in today's world, notwithstanding some of the setbacks from COVID, that a young girl can look at a company and see a path from entry level to the boardroom. That's a big change, a lot different than even 10, 15, 20 years ago.
What's your advice to the folks out there who are paying it forward? You see a lot of executive leadership with a seat at the table. Boards are still underrepresented by most measures, but at least you now have this solidarity at the top, and a lot of people doing a lot more at the next levels down than I've seen before. So now you have this layered approach. Is that something you're seeing more of? And compare and contrast that to 20 years ago, when you were rising through the ranks. What's different? >> Well, one of the main things, and I honestly do not think about it too much, but there were really no women. There were none. When I showed up in the meetings, it was literally me, at the table or in the seat behind the table. The women just weren't in the room, and there were so many more barriers that we had to push through. And that has changed a lot, globally and in the U.S. You know, if you look at just our U.S. House of Representatives and our U.S. Senate, we now have an increasing number of women. Even at leadership levels you're seeing that change. You have a lot more women on boards than we ever thought we would see. While we are not there yet, there are more female CEOs that I get an opportunity to see and talk to, and women starting companies who do not see the barriers. And I will share, John, that globally, one of the things I still see is that women in the U.S. have something many other countries don't, which I'm very proud of: a spirit about them where they just don't see the barriers in the same way. They believe they can accomplish anything. I have two sons, I don't have daughters. I have nieces, and I'm hoping someday to have granddaughters. But I know that a lot of my friends who have granddaughters today talk about the boldness, the fortitude, the belief that there's nothing they can't accomplish. And I think that's what we have to instill in every little girl out there: that they can accomplish anything they want to. The world is theirs, and we need to do that not just in the U.S., but around the world. And it was always the thing that struck me when I did all my travels at AWS, and now with Flexport, where I'm traveling again quite a bit: the differences you see in the cultures around the world. I remember, even in the Middle East, how I started seeing it change. You've heard me talk a lot on this program about the fact that in both Saudi and Bahrain, over 60% of the tech workers were female, and most of them held the hardest jobs: the security, the architecture, the engineering. But many of them did not hold leadership roles, and that is what we've got to change too. To your point, the middle, we want it to get bigger, but the top, we need to get bigger too. We need to make sure women globally have opportunities to hold the most precious leadership roles and demonstrate their capabilities at the very top. But that's changed, and I would say the biggest difference is that when we show up now, we're actually evaluated properly for those kinds of roles. We have a ways to go, but again, that part is really changing. >> Teresa, first of all, that's great work you've done, and I want to give you props for that as well, and for all the work you do. I know you champion a lot of causes in this area. One question that comes up a lot that I would love to get your opinion on, because I think you can contribute heavily here, is mentoring and sponsorship. It's huge; it comes up all the time.
What advice would you share to folks out there who are, I won't say apprehensive, but maybe nervous about how to do the networking and sponsorship and mentoring? It's not just mentoring, it's sponsorship too. What's your best practice? What advice would you give on the best way to handle that? >> Well, yeah, for the women out there, I would say on the mentorship side: I still see mentorship, and I don't think you can ever stop having mentorship. I like to look for my mentors in different parts of my life, because if you want to be a well-rounded person, there are parts of your life every day where you think, I'm doing a great job here, and I'd definitely like to do better there. Whether it's your spiritual life, your physical life, your work life, your leisure life. There are parts of my leadership world where I still seek advice as I try new things, even in this world; I tried some new things in between roles, and I went out and asked the people I respected the most. So I would say, for sure, have different mentorships, and don't be afraid of that diversity. But if you have mentorships, the second important thing is: show up with a real agenda and questions. Don't waste people's time. I'm very sensitive to that today. If you want a mentor, you show up, you use your time super effectively, and you come prepared. Sponsorship is a very different thing, and I don't believe we do that well in companies yet. When I was at AWS, thank goodness for my great HR team, we worked on a few sponsorship programs for diversity in general, where we would nominate individuals in the company that we felt had a lot of opportunity for growth but just weren't getting a seat at the table, and we brought them to the table. We kind of had a Chatham House Rules arrangement where, when they came into the meetings, they had a sponsor, not a mentor; a sponsor who was with them for the full 18 months of the program. We would bring them into executive meetings. They would read docs, they could ask questions. We wanted them to be able to open up and ask crazy questions without feeling, wow, I just couldn't ask this question in a normal environment or setting. And then we tried to make sure, once they got through the program, that we found jobs, support, and other special projects they could go do. But they still had that sponsor, and that group of individuals they'd gone through the program with, John, that they could keep going back to. And I remember sitting there when they asked me what I wanted to get out of the program, and I said two things. One: I want you to leave this program and say to yourself, I would never have had that experience if I hadn't gone through it; I learned so much in 18 months that would probably have taken me five years to learn, and it helped my career. The second thing I told them is that I wanted them to go out and recruit individuals who look like them. I said: we need diversity, and unless you all feel that we are an inclusive environment, sponsoring all types of individuals to be part of this company, we're not going to get the job done. And they said, okay. But it was really, first, very much about them: we took a group of individuals who had high potential and very diverse backgrounds, held them up, taught them things, and gave them access.
And two, selfishly, I said: I want more of you in my business, please help me. And I think those kinds of things are helpful, and you have to be thoughtful about these kinds of programs. To me, that's more what sponsorship is. I still have people reach out to me from years ago, you know, from Microsoft, saying: you were so good with me, can you give me a reference now? Can you talk to me about what I should be doing? And I try to. I can't do it 100%, and some things fall through the cracks, but I always try to make the time to talk to those individuals, because I am where I am today because I got some of the best advice from people like Don Byrne and Linda Zecker and Andy Jassy, who were very honest and upfront with me about my career. >> Awesome. Well, you've got a passion for empowering women in tech and paying it forward, but you're also quite accomplished, and that's why we're so glad to have you on the program here. President and Chief Commercial Officer at Flexport, an obviously storied career, and your other jobs, specifically Amazon, I think, were historic in my mind. This next chapter looks like it's shaping up well. Final question for you, for the few minutes you have left: tell us what you're up to at Flexport. What are your goals as President and Chief Commercial Officer? What are you trying to accomplish? Share a little bit of what's on your mind in your current job. >> Well, you kind of said it earlier. If I look at my own superpowers: I love customers, I love partners. I get my energy, John, from those interactions. So one goal is to come in and really help us build an even better world-class enterprise global sales and marketing team: really listen to our customers, think about how we interact with them, build the best executive programs we can, think about new ways we can offer services to them, and create new services. One of my favorite things about my career is that I think if you're a business leader, it's your job to come back around and tell your product group and your services org what you're hearing from customers. That's how you can be so much more impactful: you listen, you learn, and you deliver. So that's one big job. The second job for me, which I am so excited about, is that I have an amazing group called flexport.org under me. And flexport.org is doing amazing things around the world to help those in need. We just announced a new funding program for Tech for Refugees, which brings assistance to millions of people in Ukraine, Pakistan, the Horn of Africa, and those affected by earthquakes. We just took supplies into Turkey and Syria; Flexport recently sent three air shipments to Turkey and Syria, and I think we did over a hundred trucking shipments for earthquake relief. And as you can imagine, it was not easy to get into Syria. We're also very active in Ukraine. Our goal for flexport.org, John, is to continue to work with our commercial customers and team up with them when they're trying to get supplies in, and to do that in a very cost-effective, easy way, as quickly as we can. So that not-for-profit side of me, I'm so happy about. Ryan Peterson, who was our founder, this was his brainchild, and he's really taken it to the next level. So I'm honored to be able to pick that up and look for new ways to have impact around the world. And I've always found that if you do things right with a company, you can have a beautiful combination of commerciality and giving.
And I think Flexport does it in such an amazing and unique way. >> Well, the impact they have with their system and their technology, with logistics and shipping and supply chain, is a channel for societal change, and I think that's a huge gift to have under your purview. So I'm looking forward to finding out more about flexport.org. I can only imagine all the exciting things around sustainability, and we just had Mobile World Congress as a big CUBE broadcast, with 5G right around the corner. I'm sure that's going to have a huge impact on your business. >> Well, for sure. And just on emissions: that's another thing we are tracking, greenhouse gas emissions. In fact, we've already reduced more than 300,000 tons and supported over 600 organizations doing that. So we're also trying to make sure that we're being climate-aware and doing the best job we can there as well. And that was another thing I was honored to be able to do when we were at AWS: to really cut greenhouse gas emissions and go global with our climate initiatives. >> Well, Teresa, it's great to have you on. Security, data, 5G, sustainability, business transformation, AI, all coming together to change the game. You're in another hot seat, a hot role, a big wave. >> Well, John, it's an honor, and thank you again for doing this, for having women on, and for really representing us in a big way as we celebrate International Women's Day. >> I really appreciate it; it's super important. And these videos have impact, so we're going to do a lot more. I appreciate your leadership in the industry, and thank you so much for taking the time to contribute to our effort. Thank you, Teresa. >> Thank you. Thanks, everybody. >> Teresa Carlson, President and Chief Commercial Officer of Flexport. I'm John Furrier, host of theCUBE. This is our International Women's Day broadcast. Thanks for watching. (upbeat outro music)

Published Date : Mar 6 2023


CUBE Analysis of Day 1 of MWC Barcelona 2023 | MWC Barcelona 2023


 

>> Announcer: theCUBE's live coverage is made possible by funding from Dell Technologies, creating technologies that drive human progress. (upbeat music) >> Hey everyone, welcome back to theCUBE's first day of coverage of MWC 23 from Barcelona, Spain. Lisa Martin here with Dave Vellante and Dave Nicholson. I'm literally in between two Daves. We've had a great first day of coverage of the event. There have been lots of conversations, Dave, on disaggregation, on the change in mobility. I want to get your perspectives from both of you on what you saw on the show floor, and what you saw and heard from our guests today. So we'll start with you, Dave V. What were some of our takeaways from day one for you? >> Well, the big takeaway is the event itself. On day one, you get a feel for what this show is like. Now that we're back, face-to-face, pretty much fully face-to-face, there's a lot of excitement here. 2,000-plus exhibitors; I mean, planes, trains, automobiles, VR, AI, servers, software, everything. Everybody is here. So it's a really comprehensive show. It's not just about mobile; that's why they changed the name from Mobile World Congress. I think the other thing, from the keynotes this morning: there's a lot of action around the telcos and the transformation, but in a lot of ways they're sort of protecting their existing past from the future, and so they have to be careful about how fast they move. But at the same time, if they don't move fast, they're going to get disrupted. We heard some complaints, essentially veiled complaints, that the over-the-top guys aren't paying their fair share and telcos should be able to charge them more. We heard the chairman of Ericsson talk about how we can't let the OTTs do that again: we're going to charge directly for access through APIs to our network, to our data. We heard from Chris Lewis, or maybe it was San Ji Choha, how they've only got eight APIs. So, you know, the developers are the ones who are going to actually build out the innovation at the edge. The telcos are going to provide the connectivity and the infrastructure, companies like Dell as well, but to me it's really all about the developers, and that's where the action's going to be. And it's going to be interesting to see how the developers respond to, you know, the gun to the head: if you want access, you're going to have to pay for it. Now, maybe there's so much money to be made that they'll go for it, but I feel like there's maybe a different model. And I think some of the emerging telcos are going to say: you know what, here, developers, here's a platform, have at it. We're not going to charge you for all the data until you succeed; then we're going to figure out a monetization model. >> Right. A lot of opportunity for the developer. That skill set is certainly one that's in demand here, and in the transformation of the telecom industry there are a lot of conundrums I was hearing about today, kind of chicken-and-egg scenarios. But Dave, you had a chance to walk around the show floor while we were here interviewing all day. What were some of the things you saw that really stuck out to you? >> I think I was struck by how much attention was being paid to private 5G networks. You sort of read between the lines, and it appears as though people accept that the big incumbent telecom players are going to be slower to move.
And this idea of things like open RAN, where you're leveraging open protocols in a stack to deliver more agility and more value: it sort of goes back to the generalized IT discussion of moving to cloud for agility. It appears as though a lot of players realize that the wild wild west, the real opportunity, is in the private sphere. So it's really interesting to see how that works, how 5G implemented into an environment with wifi actually works. It's really interesting. >> So, obviously, when you talk to companies like Dell, and I haven't hit HPE yet, I'm going to go over there and check out their booth, they've got an analyst thing going on, it's really early days for them. I mean, they started in this business by taking an x86 box, putting a name on it that sounded like it was edge, and throwing it over the wall. That's sort of how they all started in this business. And now, you know, they knew they had to form partnerships, they had to build purpose-built systems, and with 16G out, you're seeing that. So it's still really early days. Talking about O-RAN, open RAN, the O-RAN Alliance: the game has barely even started yet, but we heard from Dish today; they're trying to roll out a massive 5G network. Rakuten is really focused on an open RAN that's more reliable, or as reliable as the existing networks, but at not nearly as huge a scale as Dish. So it's going to take a decade for this to evolve. >> Which is surprising for the average consumer to hear, because as far as we know, 5G has been around for a long time. We've been talking about 5G, implementing 5G; you sort of assume it's ubiquitous, but the reality is it's just the beginning. >> Yeah. And you know, there's a fake 5G too, right? I mean, you see it on your phone and you're like, what's the difference here? And it's, you know, just, >> Dave N.: What does it really mean? >> Right. And so I think your point about private is interesting, the conversation, Dave, that we had earlier. I had said, hey, I don't think it's a replacement for wifi, and you said, "well, why not?" I guess it comes down to economics. I mean, if you can get the private network priced close enough, then you're right, why wouldn't it replace wifi? Now you've got WiFi 6 coming in, and wifi's flexible, it's cheap, it's good for homes, good for offices. But these private networks are going to be, like, kickass, right? They're going to be designed to run whatever: warehouses and robots, energy drilling facilities. So, you know, I don't think the economics are there today, but maybe they can be at volume. >> Maybe at some point you sort of see today's science experiment becoming the enterprise-grade solution of the future. I had a chance to have some conversations with folks around the show, and frankly, I wasn't surprised so much as reminded that when we talk about 5G, we're talking about spectrum that is managed by government entities. Of course all broadcast, all spectrum, is managed in one way or another, but in particular, you can't simply put a SIM in every device now, because there are a lot of regulatory hurdles that have to be cleared. So typically, what these things look like today is 5G backhaul to the network, and communication from that box to wifi. That's a huge improvement already. So yeah, to my question about why not put a SIM in everything?
Maybe eventually, but there are other things I was not aware of that are standing in the way. >> Your point about spectrum is an interesting one, though, because private networks are going to be able to leverage that spectrum in different ways, and tune it, essentially: use different parts of the spectrum, make it programmable, so that you can apply it to a specific use case, right? So it's going to be a lot more flexible, because I presume the spectrum needs of a hospital are going to be different from an agribusiness's, which are going to be different from a drilling unit's, an offshore drilling unit's. And so the ability to have the flexibility to use the spectrum in different ways and apply it to the use case, I think, is going to be powerful. But I suspect it's going to be expensive initially. I think the other thing we talked about is public policy and regulation, and San Ji Choha brought up the point: telcos have been highly regulated. They don't just do something and ask for permission; they have to work within the confines of that regulated environment. And there are a lot of these greenfield companies and private networks that don't necessarily have to follow those rules. So that's a potential disruptive force. At the same time, the telcos are spending, what did we hear, a trillion and a half over the next seven years, building out 5G networks. So they've got to figure out how to get a payback on that. They'll get it, I think, on connectivity, because they have a monopoly, but they want more. They're greedy. They see the Netflixes of the world and the Googles and the Amazons mopping up services, and they want a piece of that action, but they've never really been good at it. >> Well, I've got a question for both of you. What do you think the odds are that by the time the Shangri-La of fully deployed 5G happens, we have so much data going through it that effectively it feels exactly the same as 3G? What are the odds? >> That's a good point. Well, the thing that gets me about 5G is that so much of it, on the consumer side, when we're all consumers in our daily lives, is marketing hype. And all the messaging around that, when it's really early innings yet, and they're already talking about 6G. What does actual fully deployed 5G look like? What is that going to enable a hospital to achieve, or an oil refinery out in the middle of the ocean? That's something that interests me: what's next for that? Are we going to hear that at this event? >> I mean, walking around, you see a fair amount of discussion of, you know, the internet of things, edge devices, the increase in connectivity. And again, what I was surprised by was that there's very little talk about a SIM card in every one of those devices at this point. It's like: no, no, no, we've got wifi to handle all that, but we're aggregating it back into a central network that's leveraging 5G. That's really interesting. >> To go back to your question, I think the odds are even money that by the time it's all built out, there's going to be so much data and so much new capability that it's going to work at similar speeds to what we see in the networks today. You're just going to be able to do so many more things. You know, your video's going to look better, the graphics are going to look better. But I think over the course of history, this is what happens.
I mean, even when you go back to dial-up: if you were in an AOL chat room in 1996, it was, you know, yeah, it took a while. You're like, (screeches) (Lisa laughs) the modem and everything else, but once you were in there- >> Once you're there, 2400 baud. >> It was basically real time. And so you could talk to your friends in, you know, a little chat room, but that's all you could do. If you wanted to watch a video, forget it, right? And then, you know, the early days of streaming video: stop, start, stop, start. Look at Amazon Prime when it first started; Prime Video was not that great. It's sort of catching up to Netflix. So I think that question is really prescient, because more data, more capability, more apps means the same speed. >> Well, you know, you've used the phrase "over the top," so, just so we're clear, we're talking about the same thing. Typically we're talking about: you have network providers, and outside of that you've got Netflix. Internet connection, I don't need Comcast, right? Perfect example. Well, what about the over-the-top that's coming from direct satellite communication with devices? There are times when I don't have a signal on my, it happens to be an Apple, iPhone, and I get a little SOS satellite logo, because I can now communicate, under very limited circumstances, directly with the satellite, for very limited text messaging purposes. Here at the show, I think it's a Motorola device: a dongle that allows any mobile device to leverage direct satellite communication, again for texting. Back to the 2,400 baud modem days, you know, 1200 even, 300 even, go back far enough. What's that going to look like? Is it too far in the future to think that eventually it's all going to be over the top? It's all going to be handset to satellite, and we don't need these RANs anymore; it's all going to be satellite networks? >> Dave V.: I think you're going to see- >> A little too science fiction-y? (laughs) >> No, I think it's a good question, and I think you're going to see fragments. I think you're going to see fragmentation of private networks, fragmentation of satellites, and legacy incumbents kind of hanging on, you know, the cable companies. I think that's coming. I think by 2030 the picture will be much more clear. The question is, and I think it comes down to the innovation on top: which platform is going to be the most developer-friendly? Right? And, you know, I've not heard anything from the big carriers that they're going to be developer-friendly. I've heard "we have proprietary data that we're going to charge access for, and developers are going to have to pay for that." But I haven't heard them saying "developers, developers, developers!" You know, Steve Ballmer running around, bending over backwards for developers; they're asking the developers to bend over. And so if a network, let's say the satellite network, is more developer-friendly, you're going to see more innovation there, potentially. Or a Dish network says: "You know what? We're going after developers, we're going after innovation. We're not going to gouge them for all this network data; rather, we're going to make the platform open," or maybe they do an app store-like model where they take a piece of the action after the developers succeed. You know, take it out of the back end, like a Silicon Valley VC, as opposed to an East Coast VC.
They're not going to get you on the front end. (Lisa laughs) >> Well, you can see the sort of disruptive forces at play between open RAN and the legacy, call it proprietary stack, right? But what is the, you know, if that's sort of a horizontal disruptive model, what's the vertically disruptive model? Is it private networks coming in? Is it a private 5G network that comes in that says, "We're starting from the ground up, everything is containerized. We're going to go find people at KubeCon who understand how to orchestrate with Kubernetes and use containers and microservices, and we're going to have this little 5G network that's going to deliver capabilities that you can't get from the big boys." Is there a way to monetize that? Is there a way for them to be disruptive, or are these private 5G networks that everybody's talking about just relegated to industrial use cases where you're just squeezing better economics out of wireless communication amongst all your devices in your factory? >> That's an interesting question. I mean, there are a lot of those smart factory industrial use cases. I mean, it's basically Industry 4.0 use cases. But yeah, I don't count the cloud guys out. You know, everybody says, "oh, the narrative is, well, the latency of the cloud." Well, not if the cloud is at the edge. If you take a local zone and put storage, compute, and data right next to each other, and the cloud model with the cloud APIs, and then you've got an asynchronous, you know, connection back, I think that's a reasonable model. I think the cloud guys figured out developers, right? Pretty well. Certainly Microsoft and Amazon and Google, they know developers. I don't see any reason why they can't bring their model to the edge. So, and that's really disruptive to the legacy telco guys, you know? So they have to be careful. >> One step closer to my dream of eliminating the word "cloud" from the IT lexicon. (Lisa laughs) I contend that it has always been IT, and it will always be IT. And this whole idea of cloud, what is cloud? If AWS, for example, is delivering hardware to the edge where it needs to be, is that cloud? Do we go back to the idea that cloud is an operational model and not a question of physical location? I hope we get to that point. >> Well, what's Apex and GreenLake? Apex is, you know, Dell's as-a-service. GreenLake is- >> HPE. >> HPE's as-a-service. That's Outposts. >> Dave N.: Right. >> Yeah. >> That's their Outposts. >> Yeah. >> Well, AWS's position used to be, you know, to use them as a proxy for hyperscale cloud, we'll just grow in a very straight trajectory forever on the back of net new stuff. Forget about the old stuff. As James T. Kirk said of the Klingons, "let them die." (Lisa laughs) As far as the cloud providers were concerned, just, yeah, let that old stuff go away. Well, then they found out, there came a point in time where they realized there's a lot of friction and stickiness associated with that. So they had to deal with the reality of hybridity, if that's the word, the hybrid nature of things. So what are they doing? They're pushing stuff out to the edge, so... >> With the same operating model. >> With the same operating model. >> Similar. I mean, it's limited, right? >> So you see- >> You can't run a lot of database on Outposts, you can run RDS- >> You see this clash of titans where traditional IT infrastructure vendors might have been written off as part of the past.
Whereas hyperscale cloud providers represent the future. It seems here at this show they're coming head to head and competing evenly. >> And this is where I think a company like Dell or HPE or Cisco has some advantages, in that they're not going to compete with the telcos, but the hyperscalers will. >> Lisa: Right. >> Right. You know, and they're already... Google's, how much undersea cable does Google own? A lot. Probably more than anybody. >> Well, we heard from Google and Microsoft this morning in the keynote. It'd be interesting to see if we hear from AWS over the next couple of days. But guys, clearly this is a great wrap of day one. And the crazy thing is this is only day one. We've got three more days of coverage, more news, more information to break down and unpack on theCUBE. Look forward to doing that with you guys over the next three days. Thank you for sharing what you saw on the show floor, what you heard from our guests today as we had about 10 interviews. Appreciate your insights and your perspectives and can't wait for tomorrow. >> Right on. >> All right. For Dave Vellante and Dave Nicholson, I'm Lisa Martin. You're watching theCUBE's day one wrap from MWC 23. We'll see you tomorrow. (relaxing music)
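One thread from the wrap above, the idea of a private 5G network run like any containerized microservice stack and orchestrated with Kubernetes, can be made concrete. What follows is a minimal sketch in Python using the official Kubernetes client to deploy a 5G core user-plane function (UPF) as an ordinary workload; the container image, registry, and namespace are hypothetical placeholders, not any vendor's actual artifacts, and a reachable cluster with an existing "private-5g" namespace is assumed.

    from kubernetes import client, config

    config.load_kube_config()  # assumes a kubeconfig for a reachable cluster

    upf = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="upf", namespace="private-5g"),
        spec=client.V1DeploymentSpec(
            replicas=2,  # scale the user plane like any other microservice
            selector=client.V1LabelSelector(match_labels={"app": "upf"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "upf"}),
                spec=client.V1PodSpec(containers=[
                    client.V1Container(
                        name="upf",
                        # hypothetical image; a real deployment would use an
                        # open-source or vendor 5G core distribution
                        image="registry.example.com/5g-core/upf:latest",
                        ports=[
                            # PFCP, the control-to-user-plane protocol, runs on UDP 8805
                            client.V1ContainerPort(container_port=8805, protocol="UDP"),
                        ],
                    )
                ]),
            ),
        ),
    )

    client.AppsV1Api().create_namespaced_deployment(namespace="private-5g", body=upf)

The point of the sketch is the operating model raised in the conversation: the user plane scales through a replicas field like any other container workload, which is exactly the KubeCon-style approach contrasted with the legacy proprietary stack.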

Published Date : Feb 27 2023

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Telco | ORGANIZATION | 0.99+
Dave Nicholson | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Dell | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Cisco | ORGANIZATION | 0.99+
Dave Vellante | PERSON | 0.99+
Comcast | ORGANIZATION | 0.99+
Steve Ballmer | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Chris Lewis | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
James T. Kirk | PERSON | 0.99+
Lisa | PERSON | 0.99+
1996 | DATE | 0.99+
Ericsson | ORGANIZATION | 0.99+
Motorola | ORGANIZATION | 0.99+
Amazons | ORGANIZATION | 0.99+
HPE | ORGANIZATION | 0.99+
Netflix | ORGANIZATION | 0.99+
Dave V. | PERSON | 0.99+
Dave N. | PERSON | 0.99+
1200 | QUANTITY | 0.99+
two | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
first day | QUANTITY | 0.99+
Dell Technologies | ORGANIZATION | 0.99+
Barcelona, Spain | LOCATION | 0.99+
Rakuten | ORGANIZATION | 0.99+
2,400 baud | QUANTITY | 0.99+
telcos | ORGANIZATION | 0.99+
both | QUANTITY | 0.99+
today | DATE | 0.99+
Apex | ORGANIZATION | 0.99+
Sarbjeet Johal | PERSON | 0.99+
AOL | ORGANIZATION | 0.99+
Silicon Valley | LOCATION | 0.99+
300 | QUANTITY | 0.99+
Googles | ORGANIZATION | 0.98+
2030 | DATE | 0.98+
GreenLake | ORGANIZATION | 0.98+
iPhone | COMMERCIAL_ITEM | 0.98+
MWC 23 | EVENT | 0.98+
day one | QUANTITY | 0.98+
X86 | COMMERCIAL_ITEM | 0.97+
eight APIs | QUANTITY | 0.97+
One | QUANTITY | 0.96+
2023 | DATE | 0.96+
Dish | ORGANIZATION | 0.96+
Prime | COMMERCIAL_ITEM | 0.95+
this morning | DATE | 0.95+
Day 1 | QUANTITY | 0.95+
a billion, a trillion and a half | QUANTITY | 0.94+
Prime Video | COMMERCIAL_ITEM | 0.94+
three more days | QUANTITY | 0.94+
Apple | ORGANIZATION | 0.93+
first | QUANTITY | 0.92+

Keynote Analysis with Sarbjeet Johal & Chris Lewis | MWC Barcelona 2023


 

(upbeat instrumental music) >> TheCUBE's live coverage is made possible by funding from Dell Technologies, creating technologies that drive human progress. (uplifting instrumental music) >> Hey everyone. Welcome to Barcelona, Spain. It's theCUBE Live at MWC '23. I'm Lisa Martin; Dave Vellante, our co-founder, our co-CEO of theCUBE, you know him, you love him, he's here as my co-host. Dave, we have a great couple of guests here to break down the day one keynote. Lots of meat. I can't wait to be part of this conversation. Chris Lewis joins us, the founder and MD of Lewis Insight. And Sarbjeet Johal, many of you know him as well. He's a Cube contributor, cloud architect. Guys, welcome to the program. Thank you so much for joining Dave and me today. >> Lovely to be here. >> Thank you. >> Chris, I want to start with you. You have covered all aspects of the global telecoms industry over 30 years working as an analyst. Talk about the evolution of the telecom industry that you've witnessed, and what were some of the things you heard in the keynote that excite you about the direction it's going? >> Well, as ever, MWC, there's no lack of glitz and glamour, but it's the underlying issues of the industry that are really at stake here. There's not a lot of new revenue coming into the telecom providers, but there's a lot of adjustment, readjustment, of the underlying operational environment. And also, really importantly, what came out of the keynotes is the willingness and the necessity to really engage with the API community, with the developer community, people who traditionally, telecoms would never have even touched. So they're sorting out their own house, they're cleaning their own stables, getting the cost base down, but they're also now realizing they've got to engage with all the other parties. There's a lot of cloud providers here, there's a lot of other people from outside, so they're realizing they cannot do it all themselves. It's quite a tough lesson for a very conservative, inward-looking industry, right? So should we be spending all this money and all this glitz and glamour of MWC and all be here, or should we be out there really building for the future and making sure the services are right for your needs and mine, in our business and personal lives? So a lot of new changes, a lot of realization of what's going on outside, but underlying it, we've just got to get this right this time. >> And it feels like that monetization is front and center. You mentioned developers, we've got to work with developers, but I'm hearing in the latest keynote from the Ericsson CEO, we're going to monetize through those APIs, we're going to charge the developers. I mean, first of all, Chris, am I getting that right? And Sarbjeet, as somebody who's close to the developer community, is that the right way to build bridges? But Chris, are we getting that right? >> Well, let's take the first steps first. So, Ericsson, of course, acquired Vonage, which is a massive API business, so they want to make money. They expect to make money by bringing that into the mainstream telecom community. Now, whether it's the developers who pay for it, or, let's face it, we are moving into a situation, as the telco moves into a techco model, where the techco means they're going to be selling bits of the technology to developer guys and to other application developers.
So when he says he needs to charge other people for it, it's the way in which people reach in and take, going through those open APIs like the Open Gateway announced today, but also the way they'll reach in and take things like network slicing. So we're opening up the telecom community, the treasure chest, if you like, where developers' applications and other third parties can come in and take those chunks of technology and build them into their services. This is a complete change from the old telecom industry, where everybody used to come and you'd say, "all right, this is my product, you've got to buy it and you're going to pay me a lot of money for it." So we are looking at a more flexible environment where the other parties can take those chunks. And we know we want connectivity built into our financial applications, into our government applications, everything, into the future of the metaverse, whatever it may be. But it requires that change in attitude of the telcos. And they do need more money, 'cause as they've said, the baseline of revenue is pretty static; there's not a lot of growth in there, so they're looking for new revenues. It's in a B2B2X type model. And it's probably the middleman who's going to pay for it rather than the customer. >> But the techco model, Sarbjeet, it looks like the telcos are getting their money on their way in. The techco model's to get them on their way out, like the app store. Go build something of value, build some kind of app or data product, and then when it takes off, we'll take a piece of the action. What are your thoughts from a developer perspective about how the telcos are approaching it? >> Yeah, I think before we came here, like I said, I did some tweets on this: we talk about all kinds of developers, like there's game developers and front end, back end, and they're all talking about, like, what they're building on top of cloud, but nowhere will you hear the term "telco developer"; there's no API from telcos given to the developers to build IoT solutions on top of, because telco and IoT, I think, sort of go hand in hand there. And edge computing as well. The glimmer of hope, if you will, for telcos is edge computing, I believe. And even in edge, I predicted, I said that many times, that cloud players will dominate that market with private 5G. You know that story, right? >> We're going to talk about that. (laughs) >> The key is this: if you see in general where the population lives, in metros, right? That's where the world population is, like, flocking to, and we have cloud providers covering the local zones with, like, heavy-duty presence from the big cloud providers, and then these telcos are getting sidetracked by that. Even the V2X in cars, moving the autonomous cars and all that, even in that space, telcos are getting sidetracked in many ways. What telcos have to do is to join forces, build some standards, if not standards, some consortium sort of thing. They're trying to do that with the Open Gateway here; they have only eight APIs. And it's 2023, eight APIs is nothing, right? (laughs) So they should have started this 10 years back, I think. So, yeah, I think to entice the developers, developers need the employability; we need to train them, we need to show them some light, that hey, you can build a lot on top of it. If you tell developers they can develop two things or five things, nobody will come. >> So, Chris, the cloud will dominate the edge. So A, do you buy it?
B, the telcos obviously are acting like that might happen. >> Do you know I love people when they've got their heads in the clouds. (all laugh) And you're right in so many ways, but if you flip it around and think about how the customers think about this, business customers and consumers, they don't care about all this background shenanigans going on, do they? >> Lisa: No. >> So I think one of the problems we have is that this is a new territory, and whether you call it the edge or whatever you call it, what we need there is connectivity, we need security, we need storage, we need compute, we need analytics, and we need applications. And are any of those more important than the others? It's the collective that actually drives the real value there. So we need all those things together. And of course, the people who are represented at this show, whether it's the cloud guys, the telcos, the Nokia, the Ericssons of this world, they all own little bits of that. So that's why they're all talking partnerships, because they need the combination; they cannot do it on their own. The cloud guys can't do it on their own. >> Well, the cloud guys own all of those things that you just talked about though. (all laugh) >> Well, they don't own the last bit of connectivity, do they? They don't own the access. >> Right, exactly. That's the one thing they don't own. So, okay, we're back to pipes, right? We're back to charging for connectivity- >> Pipes are very valuable things, right? >> Yeah, for sure. >> Never underestimate pipes. I don't know about where you live, plumbers make a lot of money where I live- >> I don't underestimate them, but I'm saying, can the telcos charge for more than that, or are the cloud guys going to mop up the storage, the analytics, the compute, and the apps? >> They may mop it up, but I think what the telcos are doing, and we've seen a lot of it here already, is they are working with all those major cloud guys already. So is it an unequal relationship? The cloud guys are global, massive global scale; the telcos are fundamentally national operators. >> Yep. >> Some have a little bit of regional, nobody has global scale. So who stitches it all together? >> Dave: Keep your friends close and your enemies closer. >> Absolutely. >> I know that saying never gets old. It's true. Well, Sarbjeet, one of the things that you tweeted about, I didn't get to see the keynote but I was looking at your tweets: 46% of telcos think they won't make it to the next decade. That's a big number. Did that surprise you? >> No, actually it didn't surprise me, because the competition is, like, closing in on them, and the telcos are competing with telcos as well, and the telcos are competing with cloud providers on the other side, right? So the smaller ones are getting squeezed. It's the bigger players, they can hook up the newer platforms, I think they will survive. That part is like any other industry, if you will. But the key here, I think why the pain points were sort of described on the main stage, is that they're crying out loud to tell the big tech cloud providers, "hey, you pay your fair share," like we talked, right? You are not paying; you're generating so much content which traverses our networks, and you are not paying for it. So they are not able to recoup the cost of laying down their networks. By the way, one thing actually I want to mention is that they said the cloud needs earth. The cloud and earth, it's like, there's no physical need for the cloud, you know that, right?
So like, I think it's the other way around. I think the earth needs the cloud, because I'm a cloud guy. (Sarbjeet and Lisa laugh) >> I think you need each other, right? >> I think so too. >> They need each other. When they said cloud needs earth, right? I think they're still in denial that the cloud is a big force. They have to partner. When you can't compete with somebody, what do you do? Partner with them. >> Chris, this is your world. Are they in denial? >> No, I think they're waking up to the pragmatism of the situation. >> Yeah. >> They're building... As we said, most of the telcos, you'll find, have relationships with the cloud guys. I think you're right about the industry. I mean, do you think about what's happened in the US since '96, the big telecom act, when we started breaking up all the big telcos and we had lots of competition come in? We're seeing the signs that we might start to aggregate them back up together again. So it's been an interesting experiment for like 30 years, hasn't it? >> It made the US less competitive, I would argue, but carry on. >> Yes, I think it's true. And Europe is maybe too competitive and therefore, it's not driven the investment needed. And by the way, it's not just mobile, it's fixed as well. You saw the Orange CEO was talking about her investment and the massive fiber investments, way ahead of many other countries, way ahead of the UK or Germany. We need that fiber in the ground to carry all your cloud traffic to do this. So there is a scale issue, there is a competition issue, but the telcos are very much aware of it. They need the cloud, by the way, to improve their operational environments as well, to change that whole old IT environment to deliver you and me better service. So no, it absolutely is changing. And they're getting scale, but they're fundamentally offering the basic product; you call it pipes, I'll just say they're offering broadband to you and me and the business community. But they're stepping on dangerous ground, I think, when saying they want to charge the over the top guys for all the traffic they use. Those over the top guys now build a lot of the global networks, the backbone submarine network. They're putting a lot of money into it, and by giving us endless data for our individual usage, that cat is out of the bag, I think, to a large extent. >> Yeah. And the Orange CEO basically said that, that they're not paying their fair share. I'm for net neutrality, but the governments are going to have to fund this unless you let us charge the OTT. >> Well, I mean, we could of course renationalize. Where would that take us? (Dave laughs) That would make MWC very interesting next year, wouldn't it? To renationalize it. So, no, I think you've got to be careful what we wish for here. Creating the absolutely clear product that is required to underpin all of these activities, whether it's IoT or whether it's cloud delivery or whether it's just our own communication stuff, delivering that absolutely ubiquitously and at high quality for business and for consumer is what we have to do. And telcos have been too conservative in the past. >> I think they need to get together and create standards around... I think they have a big opportunity. We know that the clouds are being built in silos, right? So there's the Azure stack, there's AWS and there's Google. And those are the three main ones and a few others, right? So what we are fighting on the cloud side is the multicloud. How do we consume that multicloud without having standards?
So if these people get together and create some standards around, I think, IoT and the edge computing sort of area, people will flock to them to say, "we will use you guys, your API; we don't care behind the scenes if you use AWS or Google Cloud or Azure, we will come to you." So the market, actually, is looking for that solution. I think it's an opportunity for these guys, for telcos. But the problem with telcos is they're nationalized, as you said, Chris, versus the cloud guys, who are still kind of national in a way, but they're global corporations. And some of the telcos are global corporations as well; BT covers so many countries and DT covers so many... DT is in the US as well, so they're all over the place. >> But you know what's interesting is that the TM Forum, which is one of the industry associations, they've had an open digital architecture framework for quite some years now. Google had joined that some years ago, Azure's in there, AWS just joined it a couple of weeks ago. So when people said this morning, why isn't AWS on the keynote? They don't like sharing the limelight, do they? But they're getting very much in bed with the telco. So I think you'll see the marriage. And in fact, there's a really interesting statement, if you look at the IoT you mentioned. Bosch and Nokia have been working together, 'cause they said, the problem we've got: you've got a connectivity network on one hand, you've got the sensor network on the other hand, and you're trying to merge them together; it's a nightmare. So we are finally seeing those sorts of groups talking to each other. So I think the standards are coming, the cooperation is coming, partnerships are coming, but it means that the telco can't dominate the sector like it used to. It's got to play ball with everybody else. >> I think they have to work with the regulators as well to loosen the regulation. Or, as you said before we started this segment, you used, Chris, the analogy of sports, right? In sports, when you're playing fiercely, you commit the fouls and then wait for the ref to blow the whistle. You're not looking at the ref all the time. The telcos are looking at the ref all the time. >> Dave: Yeah, can I do this? Can I do that? Is this a fair move? >> They should be looking for the space in front of the opposition. >> Yeah, they should be just on attack mode and commit these fouls, if you will, and then ask for forgiveness then- >> What do you make of AWS not being there- >> Well, Chris just made a great point that they don't like to share the limelight, 'cause I thought it was very obvious that we had Google Cloud, we had Microsoft there on day one of this 80,000 person event. A lot of people are back from COVID and they weren't there. But Chris, you brought up a great point that kind of made me think, maybe you're right. Maybe they're in the afternoon keynote, they want their own time- >> You think GSMA invited them? >> I imagine so. You'd have to ask GSMA. >> I would think so. >> Get Max on here and ask that. >> I'm going to ask them, I will. >> But no, and they don't like it because I think the misconception, by the way, is that everyone says, "oh, it's AWS, it's Google Cloud and it's Azure." They're not all the same business by any stretch of the imagination. AWS has been doing loads of great work; they've been launching private network stuff over the last couple of weeks. Really interesting. Google's been playing catch-up. We know that they came in relatively late to the market. And Azure, they've all got slightly different angles on it.
So perhaps it just wasn't right for AWS and the way they wanted to pitch things, so they don't have to be there, do they? >> That's a good point. >> But the industry needs them there; that's the number one cloud. >> Dave, they're there, working with the industry. >> Yeah, of course. >> They don't have to be on the keynote stage. And in fact, you think about this show, and you mentioned the 80,000 people, the activity going on around in all these massive areas they're in, it's fantastic. That's where the business is done. The business isn't done up on the keynote stage. >> That's why there's the glitz and the glamour, Chris. (all laugh) >> Yeah. It's not glitz, it's espresso. It's not glamour anymore, it's just espresso. >> We need the espresso. >> Yeah. >> I think another thing is that it's interesting how an average European sees the tech market versus how an average North American, especially you from the US, sees the market. Here, people are more, like, process oriented and they want the rules of the road already established before they can take a step- >> Chris: That's because it's your pension in North America- >> Exactly. So unions are there, and there are more employee rights and everything; you can't fire people easily here, or in Germany, or most of Europe is like that, with the exception of the UK. >> Well, but it's like I said, Silicon Valley gets their money on the way out, you know? And that's how they do it, that's how they think about it. And they don't... They ask for forgiveness. I think the east coast is closer to Europe, but in the EU, highly regulated, really focused on lifetime employment, things like that. >> But Dave, the issue is the telecom industry is brilliant, right? We keep paying every month whatever we do with it. >> It's a great business, to your point- >> It's a brilliant business model. >> Dave: It's fantastic. >> So it's about then getting the structure right behind it. And you know, we've seen a lot of stratification where people are selling off towers; Orange haven't sold their towers off, they made a big point about that. Others are selling their towers off. Some people are selling off their underlying network, Telecom Italia talking about KKR buying the whole underlying network. It's like, what do you want to be in control of? It's a great business. >> But that's why they complain so much, is that they're having to sell their assets because of the onerous CapEx requirements, right? >> Yeah, they've had it good, right? And dare I say, perhaps they've not planned well enough for the future. >> They're trying to protect their past from the future. I mean, that's... >> Actually, look at the... Every "n" number of years, there's a new, faster network. They have to dig the ground, they have to put the fiber in, they have to put this in. Now, there are so many booths showing 6G; we are not even done with 5G yet, now the next 6G, you know, like then- >> 10G's coming- >> 10G, that's a different market. (Dave laughs) >> Actually, they're bogged down by the innovation, I think. >> And the generational thing is really important, because we're planning for 6G in all sorts of good ways, but actually what we use in our daily lives, we've gone through the barrier, we've got enough to do that. So 4G gives us enough; the fiber in the ground or even old copper gives us enough. So the question is, what are we willing to pay for more than that basic connectivity? And the answer, to your point, Dave, is not a lot, right?
So therefore, that's why the emphasis is on the business market, on that B2B and B2B2X. >> But we'll pay for Netflix all day long. >> All day long. (all laugh) >> The one thing, Chris, I don't know, I want to know your viewpoints, and we have talked in the past as well: there's an absence of think tanks in tech, right? So we have think tanks on foreign policy and economic policy in every country, and we have global think tanks, but tech is becoming a huge part of the economy, the global economy as well as national economies, right? But we don't have think tanks on, like, policy around tech. For example, this 4G is good for a lot of use cases. Then 5G is good for a smaller number of use cases. And then 6G will be like, fewer people need 6G, for example. Why can't we have sort of those kinds of entities dictating those kinds of, like, okay, is this a wiser way to go about it? >> Lina Khan wants to. She wants to break up big tech- >> You're too young to remember, but the ITU used to have a show every four years in Geneva; there were standards around there. So I think there are bodies. I think the balance of power obviously has gone from the telecom world to the west coast, to the IT markets. And it's changing the balance; IT moves more quickly, right? Telecoms has never moved quickly enough. I think there is hope, by the way, that telecoms, now that we are moving to a more softwarized environment, and God forbid, we're moving into CI/CD in the telecom world, right? Which is a massive change, but I think there's hope for it to change. The mentality is changing, the culture is changing, but to change those old, structured organizations, from the British Telecom or the France Telecom into the modern world, it's a hell of a long journey. It's not an overnight journey at all. >> Well, of course the theme of the event is velocity. >> Yeah, I know that. >> And it's been interesting sitting here with the three of you talking about, from a historic perspective, how slow and molasseslike telecom has been. They don't have a choice anymore. As consumers, we have this expectation we're going to get anything we want on our mobile device, 24 by seven. We don't care about how the sausage is made, we just want the end result. So do you really think, and we're only on day one, guys... And Chris, we'll start with you. Is the theme really velocity? Is it disruption? Are they able to move faster? >> Actually, I think invisibility is the real answer. (Lisa laughs) We want communication to be invisible, right? >> Absolutely. >> We want it to work. When we switch our phones on, we want it to work and we want to... Well, they're not even phones anymore, are they really? I mean, that's the... So no, velocity, we've got... There is momentum in the industry, there's no doubt about that. The cloud guys coming in, making telecoms think about the way they run their own business, where they meet, that collision point on the edge you talked about, Sarbjeet. We do have velocity, we've got momentum. There are so many interested parties. The way I think of this is that the telecom industry used to be inward-looking: just design its own technology and then expect everyone else to dance to our tune. We're now flipping that 180 degrees, and we are now having to work with all the different outside forces shaping us. Whether it's devices, whether it's smart cities, governments, the hosting guys, the Equinixes, all these things. So everyone wants a piece of this telecom world, so we've got to make ourselves more open. That's why you're getting a more open environment.
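The opening up Chris describes already has a concrete, developer-facing shape in the GSMA Open Gateway APIs mentioned earlier. Below is a minimal sketch of what calling one of those network APIs could look like, modeled loosely on the CAMARA-style Quality-on-Demand interface behind Open Gateway; the base URL, token handling, and exact field names are illustrative assumptions, not any specific carrier's contract.

    import requests

    BASE = "https://api.example-telco.com/qod/v0"  # hypothetical carrier endpoint
    TOKEN = "REPLACE_WITH_OAUTH2_TOKEN"            # obtained out of band

    # Ask the network for a temporary guaranteed-QoS session for one device.
    resp = requests.post(
        f"{BASE}/sessions",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "device": {"ipv4Address": "203.0.113.42"},  # the handset or IoT device
            "qosProfile": "QOS_E",                      # e.g. a low-latency profile
            "duration": 3600,                           # seconds of boosted QoS
        },
        timeout=10,
    )
    resp.raise_for_status()
    print("QoD session created:", resp.json().get("sessionId"))

Whether developers will pay for calls like this, or whether the platform takes a cut on the back end as discussed above, is exactly the monetization question the panel keeps circling.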
>> But you did... I just want to bring back a point you made during COVID, which was when everybody switched to work from home, started using their landlines again, telcos had to respond and nothing broke. I mean, it was pretty amazing. >> Chris: It did a good job. >> It was kind of invisible. So, props to the telcos for making that happen. >> They did a great job. >> So it really did. Now, okay, what have you done for me lately? So now they've got to deal with the future and they're talking monetization. But to me, monetization is all about data, and not necessarily just the network data. Yeah, they can sell that, 'cause they own that, but what kind of incremental value are they going to create for the consumers that... >> Yeah, actually that's a problem. I think the problem is that they have been strangled by regulation for a long time and they cannot look at their data. It's a lot more similar to the FinTech world, right? I used to work at Visa. And at Visa, we did a trillion dollars in transactions in '96. Like, we moved so much money around, but we couldn't look at these things, right? So yeah, I think regulation is a problem that holds you back; it's the antithesis of velocity, it slows you down. >> But data means everything, doesn't it? I mean, it means everything and nothing. So I think the challenge here is, what data do the telcos have that is useful, valuable to me, right? So in the home environment, the fact that my broadband provider says, oh, by the way, you've got 20 gadgets on that network and 20 on that one... That's great, tell me what's on there. I probably don't know what's taking all my valuable bandwidth up. So I think there's security wrapped around that, telling me the way I'm using it, whether I'm getting the best out of my service. >> You pay for that? >> No, I'm saying they don't do it yet. I think- >> But would you pay for that? >> I think I would, yeah. >> Would you pay a lot for that? I would expect it to be there as part of my dashboard for my monthly fee. They're already charging me enough. >> Well, that's fine, but you pay a lot more in North America than I do in Europe, right? >> Yeah, no, that's true. >> You're really overpaying over there, right? >> Way overpaying. >> So, actually everybody's looking at these devices, right? So this is a radio-operated device basically, right? And then why couldn't they benefit from this? This is like, we need to, like, double-click on this, like, 10 times to find out why telcos failed to leverage this device, right? But I think the problem is their reliance on regulations and their being close to the national sort of governments and local bodies and authorities, right? And in some countries, these telcos are totally controlled in very authoritarian ways, right? It's not, like, open, like in the west, most of the west. Like, the world is bigger than five, six countries, and we know that, right? But we end up talking about the major economies most of the time. >> Dave: Always. >> Chris: We have a topic we want to hit on. >> We do have a topic. Our last topic, Chris, it's for you. You guys have done an amazing job for the last 25 minutes talking about the industry, where it's going, the evolution. But Chris, you've been registered blind throughout your career. You're a leading user of assistive technologies. Talk about diversity, equity, inclusion, accessibility, some of the things you're doing there. >> Well, we should have had 25 minutes on that and five minutes on- (all laugh) >> Lisa: You'll have to come back. >> Really interesting.
So I've been looking at it. You're quite right, I've been using accessible technology on my iPhone and on my laptop for 10, 20 years now. It's amazing. And what I'm trying to get across to the industry is to think about inclusive design from day one. When you're designing an app or you're designing a service, make sure you... And telecom's a great example. In fact, there's quite a lot of sign language around here this week; if you look at all the events, it's good to see that coming in. Obviously, no use to me whatsoever, but good for the hearing impaired, which by the way is the biggest category of disability in the world. The biggest chunk is hearing impaired, then vision impaired, and then cognitive and then physical. And therefore, whenever you're designing any service, my call to arms to people is: think about how that's going to be used and how a blind person might use it, or how a deaf person or someone with physical issues or any cognitive issues might use it. And a great example, the GSMA and I have been talking about the app they use for getting into the venue here. I downloaded it. I got the app downloaded and I'm calling my guys going, where's my badge? And he said, "it's top left." And because I work with a screen reader, they hadn't tagged it properly, so I couldn't actually open my badge on my own. Now, they changed it overnight so it worked this morning, which is fantastic work by Trevor and the team. But it's those things that, if you don't build them in from scratch, you really frustrate a whole group of users. And if you think about it, people with disabilities are excluded from so many services if they can't see the screen or they can't hear it. But it's also the elderly community who don't find it easy to get access to things. Smart speakers have been a real blessing in that respect, 'cause you can now talk to that thing and it starts talking back to you. And then there's the people who can't afford it, so we need to come down market. This event is about launching these thousand-dollar-plus devices. Come on, we need devices below a hundred dollars to get to the real mass market and get the next billion people in, and then to educate people how to use them. And I think, to go back to your previous point, I think governments are starting to realize how important this is about building the community within the countries. You've got some massive projects like NEOM in Saudi Arabia. If you have a look at that, if you get a chance, a fantastic development in the desert where they're building a new city from scratch, and they're building it so anyone and everyone can get access to it. So in the past, it was all done very much by individual disability. So I used to use some very expensive, clunky blind tech stuff. I'm now using mostly mainstream. But my call to action is to say: make sure when you develop an app, it's accessible, anyone can use it, you can talk to it, you can get whatever access you need, and it will make all of our lives better. So as we age, and hearing starts to go and sight starts to go and dexterity starts to go, then those things become very useful for everybody. >> That's a great point, and what a great champion they have in you. Chris, Sarbjeet, Dave, thank you so much for kicking things off, analyzing the day one keynote, the ecosystem day, talking about what velocity actually means, where we really are. We're going to have to have you guys back 'cause, as you know, we can keep going, but we are out of time. But thank you. >> Pleasure.
>> We had a very spirited, lively conversation. >> Thanks, Dave. >> Thank you very much. >> For our guests and for Dave Vellante, I'm Lisa Martin, you're watching theCUBE live in Barcelona, Spain at MWC '23. We'll be back after a short break. See you soon. (uplifting instrumental music)
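Chris's badge anecdote comes down to untagged UI elements that a screen reader cannot announce. As a hedged illustration of his "build it in from scratch" point, here is a minimal accessibility lint pass in Python over an HTML view; it is a sketch of one narrow class of checks, not a full WCAG audit, and the sample markup is invented.

    from bs4 import BeautifulSoup

    def audit_accessibility(html: str) -> list[str]:
        """Flag elements a screen reader cannot announce (illustrative checks)."""
        soup = BeautifulSoup(html, "html.parser")
        issues = []
        for img in soup.find_all("img"):
            if not img.get("alt"):
                issues.append(f"image {img.get('src')!r} has no alt text")
        for btn in soup.find_all("button"):
            if not btn.get_text(strip=True) and not btn.get("aria-label"):
                issues.append("button has neither visible text nor aria-label")
        return issues

    # An untagged badge button like the one in the anecdote would be caught here:
    print(audit_accessibility('<img src="badge.png"><button></button>'))

Running a check like this in CI is one way to make inclusive design a default rather than an overnight fix after a blind user gets stuck at the venue door.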

Published Date : Feb 27 2023

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Nokia | ORGANIZATION | 0.99+
Chris | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Chris Lewis | PERSON | 0.99+
Dave | PERSON | 0.99+
Europe | LOCATION | 0.99+
Dave Vellante | PERSON | 0.99+
Lina Khan | PERSON | 0.99+
Lisa | PERSON | 0.99+
Bosch | ORGANIZATION | 0.99+
Germany | LOCATION | 0.99+
Ericsson | ORGANIZATION | 0.99+
Telecom Italia | ORGANIZATION | 0.99+
Sarbjeet | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
KKR | ORGANIZATION | 0.99+
20 gadgets | QUANTITY | 0.99+
Geneva | LOCATION | 0.99+
25 minutes | QUANTITY | 0.99+
10 times | QUANTITY | 0.99+
Saudi Arabia | LOCATION | 0.99+
US | LOCATION | 0.99+
Google | ORGANIZATION | 0.99+
Sarbjeet Johal | PERSON | 0.99+
Trevor | PERSON | 0.99+
Orange | ORGANIZATION | 0.99+
180 degrees | QUANTITY | 0.99+
30 years | QUANTITY | 0.99+
five minutes | QUANTITY | 0.99+
iPhone | COMMERCIAL_ITEM | 0.99+
Ericssons | ORGANIZATION | 0.99+
North America | LOCATION | 0.99+
telco | ORGANIZATION | 0.99+
20 | QUANTITY | 0.99+
46% | QUANTITY | 0.99+
three | QUANTITY | 0.99+
Dell Technologies | ORGANIZATION | 0.99+
next year | DATE | 0.99+
Barcelona, Spain | LOCATION | 0.99+
'96 | DATE | 0.99+
GSMA | ORGANIZATION | 0.99+
telcos | ORGANIZATION | 0.99+
Visa | ORGANIZATION | 0.99+
trillion dollars | QUANTITY | 0.99+
thousand dollars | QUANTITY | 0.99+

Amir Khan & Atif Khan, Alkira | Supercloud2


 

(lively music) >> Hello, everyone. Welcome back to the Supercloud presentation here on theCUBE. I'm John Furrier, your host. What a great segment here. We're going to unpack the networking aspect of the cloud, how that translates into what supercloud architecture and platform deployment scenarios look like, and demystify multi-cloud and hybrid cloud. We've got two great experts: Amir Khan, the Co-Founder and CEO of Alkira, and Atif Khan, Co-Founder and CTO of Alkira. These guys have been around since 2018 with the startup, but before that, a storied history in the tech industry. I mean, routing early days, multiple waves, multiple cycles. >> Welcome three decades. >> Welcome to Supercloud. >> Thanks. >> Thanks for coming on. >> Thank you so much for having us. >> So, let's get your take on supercloud, because it's been one of those conversations that really galvanized the industry, because it kind of highlights almost this next wave, this next side of the street that everyone's going to be on that's going to be successful. The laggards on the legacy seem to be stuck on the old model. SaaS is growing up, it's ISVs, it's ecosystems, hyperscale, full hybrid. And then multi-cloud around the corner causes all this confusion, everyone's hand-waving. You know, this is a solution, that solution, where are we? What do you guys see as this supercloud dynamic? >> So where we start from is always focusing on the customer problem. And in 2018 when we identified the problem, we saw that there were multiple clouds with many diverse ways of doing things from the network perspective, and customers were struggling with that. So we delved deeper into that and looked at each one of the cloud architectures completely independently. And there was no common solution, and customers were struggling with that from that perspective. They wanted to be in multiple clouds, either through mergers and acquisitions, or running an application which may be more cost effective to run in one cloud, or maybe optimized for certain reasons to run in a different cloud. But from the networking perspective, everything needed to come together. So that's... we are starting to define it as a supercloud now, but basically, it's a common infrastructure across all clouds, and then integration of high-lift services like, you know, security or IPAM services, or many other types of services like inter-partner routing and stuff like that. >> So, Amir, you agree then that multi-cloud is simply a default result of having whatever outcomes, either M&A, some productivity software, maybe Azure. >> Yes. >> Amazon has this, and then I've got an on-premise application, so it's kind of a mishmash. >> So, I would qualify it with hybrid multi-cloud, because everything is going to be interconnected. >> John: Got it. >> Whether it's on-premise, remote users or clouds. >> But from a CTO perspective, obviously, you've got developers, multiple stacks, got AWS, Azure and GCP, other. Not everyone wants to, kind of, like, go all in, but yet they don't want to hedge too much, because it's a resource issue. And I got to learn this stack, I got to learn that stack. So then now, you have this default multi-cloud, hybrid multi-cloud; then it's like, okay, what do I do? How do you spread that around? Is it dangerous? What's the approach technically? What are some of the challenges there? >> Yeah, certainly. John, first, thanks for having us here.
So, before I get to that, I'll just add a little bit to what Amir was saying, like how we started, what we were seeing and how it, you know, correlates with the supercloud. So, as you know, before this company, Alkira, we did the SD-WAN company, which was Viptela. So there, we started seeing, when people started deploying SD-WAN at, like, a larger scale, we started, you know, seeing customers coming to us and saying they needed connectivity into the cloud from the SD-WAN. They wanted to extend the SD-WAN fabric to the cloud. So we came up with an architecture, which, like, later we started calling Cloud onRamps, where we built, you know, a transit VPC, put, like, the virtual instances of SD-WAN appliances in there, and extended from there to the cloud. But before we knew it, it started becoming very complicated for the customers, because it wasn't just connectivity; it also required, you know, other use cases. You had to instantiate or bring in security appliances in there. You had to secure all of that stuff. There were requirements for, you know, different regions. So you had to bring up the same thing in different regions. Then multiple clouds, what did you do? You had to replicate the same thing in multiple clouds. And now if there was a requirement between clouds, how were you going to do it? You had to route traffic from somewhere, and come up with all those routing controls and stuff. So, it was very complicated. >> Like spaghetti code, but on the network. >> The games begin. In fact, one of our customers called it a spaghetti mess. And so, that's where, like, we thought about where the industry was going and which direction the industry was going in. And we came up with Alkira, where what we are doing is building a common infrastructure across multiple clouds, across, you know, on-prem locations, be it data centers or physical sites, branch sites, et cetera, with integrated security and networking services inside. And, you know, nowadays, networking is not only about connectivity; you have to secure everything. So, security has to be built in. Redundancy, high availability, disaster recovery. So all of that needs to be built in. So that's, like, you know, kind of a definition of what we thought at that time, which is turning into supercloud now. >> Yeah. It's interesting too, you mentioned, you know, VPCs; the configuration alone is a hassle, never mind the manual mistakes that could be made, but as you decide to do something you got to, "Oh, we got to get these other things." A lot of the hyperscalers and a lot of the alpha cloud players now, and cloud native folks, they're kind of in that mode of, "Wow, look at what we've built." Now they've got to maintain it: how do I refresh it? Like, how do I keep the talent? So they got this similar chaotic environment where it's like, okay, now they're already through, so I think they're going to be okay. But then some people want to bypass it completely. So there's a lot of customers that we see out there that fit the makeup of, I'm cloud first, I've lifted and shifted, I moved some stuff to the cloud. But I want to bypass all those learnings from all the people that have gone through the past three years. Can I just skip that and go to a multi-cloud or coherent infrastructure? What do you think about that? What's your view? >> So yeah, if you look at these enterprises, you know, for many of them, just finding, like, the talent for one cloud, as far as the IT staff is concerned, is hard enough.
And now, when you have multiple clouds, it's hard to find people, the talent which, you know, has expertise across different clouds. So that's where we come into the picture. So our vision was always to simplify all of this stuff. And simplification, it cannot be just simplification, because you cannot just automate the workflows of the cloud providers underneath. So you have to, you know, provide your full data plane on top of it, a full control plane, management plane, policy and management on top of it. And coming back to, like, your question: nowadays, those people who are working on networking, you know, before it used to be, like, CLI. You used to learn about Cisco CLI or Juniper CLI, and you used to work on it. Nowadays, it's very different. So automation, programmability, all of that stuff is the key. So now, you know, the Ops guys, the DevOps guys, these are the people who are in high demand. >> So what do you think about the folks out there that are saying, okay, you got a lot of fragmentation. I got the stacks, I got a lot of stovepipes, if you will, out there on the stack. I got to learn this from Azure. Can you guys, with your product, abstract that away so developers don't need to know the ins and outs of the stacks, almost like a gateway, if you will, from the old days? But like, I'm a developer or a dev team, why should I have to learn the management layer of Azure? >> That's exactly what we set out, you know, to solve. So, what we have built is a platform, and the platform sits inside the cloud. And customers are able to build their own network, or a virtual network, on top using that platform. So the platform has its own data plane, its own control plane and management plane, with a policy layer on top of it. So now, it's the platform which is sitting in different clouds, but from a customer's point of view, it's one way of doing networking. One way of instantiating or bringing in services, or security services, in the middle. Whether those are our security services, or whether those are services from our partners, like Palo Alto or Checkpoint or Cisco. >> So you guys brought the SD-WAN mojo and refactored it for the cloud, it sounds like. >> No. >> No? (chuckles) >> We can't say that. >> All right, explain. >> It's way more than that. >> I mean, SD-WAN was WAN. I mean, you're talking about wide area networks, talking about connectivity, so explain the difference. >> SD-WAN was primarily done for one major reason. MPLS was expensive, very strong SLAs, but very low speed. Internet, on the other hand, you sat at home and you could access your applications much faster. No SLA, very low cost, right? So we wanted to marry the two together, so you could have a purely private infrastructure and a public infrastructure, and secure both of them by creating a common secure fabric across all those environments, and then seamlessly tying it into your internal branch and data center and cloud network. So, it merely brought you to the edge of the cloud. It didn't do anything inside the cloud. Now, the major problem resides inside the clouds, where you have to optimize the clouds themselves. Take a step back. How were the clouds built? Basically, the cloud providers went to the Ciscos and Junipers and the rest of the world, built the network in the data centers or across wide area infrastructure, and brought it all together and tried to create a virtualized layer on top of that. But there were many limitations of this underlying infrastructure that they had built.
So, the number of routes per region, how inter-region connectivity worked, or how many routes you could carry to the VPCs or VNets: for all of those, there was no common policy across, you know, these environments, no segmentation across these environments, right? So the networking constructs that the enterprise customers were used to, the enterprise-class, carrier-class capabilities, they did not exist in the cloud. So what did the customer do? They ended up stitching it together all manually. And that's what Atif was alluding to earlier, that it became a spaghetti mess for the customers. And then what happens is, as a result, day two operations, you know, troubleshooting, everything becomes a nightmare. So what do you do? You have to build an infrastructure inside the cloud. The cloud has enough raw capabilities to build the solutions inside there. The Netflixes of the world and many different companies have been born in the cloud and evolved from there. So why could we not take the raw capabilities of the clouds and build a network cloud, or a supercloud, on top of these clouds, to optimize the whole infrastructure and seamlessly connect it into the on-premise and remote user locations, right? So that's your, you know, hybrid multi-cloud solution. >> Well, great call-out on the SD-WAN comparison versus cloud. 'Cause I think this is important, because you're building a network layer in the cloud that spans out, so the customers don't have to get into the... there's a gap in the system that I'm used to, my operating environment, of having locked-down security and network. >> So yeah. So what you do is you use the raw capabilities like bandwidth or virtual machines, or, you know, containers, or, you know, different types of serverless capabilities. And you bring it all together in a way to solve the networking problems, thereby creating a supercloud, which is an abstraction layer which hides all the complexity of the underlying clouds from the customer, right? And it provides a common infrastructure across all environments to that customer, right? That's the beauty of it. And it does it in a way that it looks like, if they have the networking knowledge, they can apply it to this new environment and carry it forward. One way of doing security across all clouds and hybrid environments. One way of doing routing. One way of doing large-scale network address translation. One way of doing IPAM services. So people are tired of doing individual things in individual clouds and on-premise locations, right? So now they're getting something common. >> You guys brought all that to bear, and it's flexible for the customer to essentially self-serve their network cloud. >> Yes, yeah. >> Is that the wave? >> And nowadays, from a business perspective, agility is the key, right? You have to move at the pace of the business. If you don't, you are losing. >> So, would it be safe to say that you guys have a network supercloud? >> Absolutely, yeah. >> We, pretty much, yeah. Absolutely. >> What does that mean to your customer? What's in it for them? What's the benefit to the customer? I got a network supercloud, it connects, provides SLA, all the capabilities I need. What do they get? What's the endpoint for them? What's the end? >> Atif, maybe you can talk through some examples. >> The IT infrastructure is all, like, distributed now, right? So you have applications running in data centers. You have applications running in one cloud, another cloud, public clouds; enterprises are depending on so many SaaS applications.
So now, these are, you can call these, endpoints. So a supercloud or a network cloud, from our perspective, is a cloud in the middle, or a network in the middle, which provides connectivity from any endpoint to any endpoint. So, you are able to connect to the supercloud or network cloud in one way, no matter where you are. So now, whichever cloud you are in, whichever cloud you need to connect to... And also, it's not just connecting to the cloud. So you need to do a lot of stuff, a lot of networking, inside the cloud also. So now, as Amir was saying, every cloud has its own, from a networking, you know, concept perspective, or the constructs; they are different. There are limitations in there also. So this supercloud, which is sitting on top: basically, your platform is sitting in the cloud, but the supercloud is built on top of, using your platform. So that abstracts all those complexities, all those limitations. So now your limitations are whatever the limitations of that platform are. So now that platform is in our control. So we can keep building it, we can keep scaling it horizontally. Because, you know, in this cloud era, one of the key things is autoscaling these services. So why can't the network now autoscale also, just like your other services? >> Network autoscaling is a genius idea, and I think that's a killer. I want to ask the follow-on question, because I think, first of all, I love what you guys are doing. So, I think it's a great example of this new innovation. It's not obvious until you see it, right? Geographical is huge. So, you know, single instance, global instances, multiple instances, you're seeing global. How do you guys look at that global equation? Because as companies expand their clouds into geos, and then ultimately, you know, it's obviously continent, region and locales, you're going to have geographic issues. So, this is an extension of your network cloud?
In many cases, they have MPLS. They have data centers also. And a lot of these companies are, you know, moving applications from the data center into the cloud. But we still have large enterprise- >> But for you guys, can you sit there too with no server, or is it a box, or what is it? >> It's a software stack, right? We are a software company. >> Okay, so no box. >> No box. >> Okay, got it. >> No box. >> It's even better. So we can connect, as I mentioned, any endpoint, whether it's data centers. What happens is, usually these enterprises, from the data centers- >> John: It's a cloud endpoint for you. >> A cloud endpoint for us. And they need high-speed connectivity into the cloud, and our network cloud, or supercloud, is sitting inside the cloud. So we need high-speed connectivity from the data centers. This is multi-gig type connectivity, and we enable that connectivity as a service. And as Amir was saying, you are able to bring it up in minutes, pretty much. >> John: Well, you guys have a great handle on supercloud. I really appreciate you guys coming on. I have to ask you guys, since you have so much experience in the industry, multiple inflection points you guys have lived through, and we're all old and can remember those glory days: what's the big deal going on right now? Because you can connect the dots and you can imagine, okay, a Lambda function spinning up some connectivity. I need instant access to a new route. I need to send compute to an edge point to process data. A lot of these kinds of ad hoc services are going to start flying around, which used to be manually configured, as you guys remember. >> Amir: And that's been the problem, right? Shadow IT, that was the biggest problem in the enterprise environment. That's what we are trying to get the customers away from. Cloud teams came in; individuals or small groups of people spun up instances in the cloud. It was completely disconnected from the on-premise environment, or the existing IT environment, that the customer had. So how do you bring it together? And that's what we are trying to solve for, right? At a large scale, in a carrier-cloud center (indistinct). >> What do you call that? Shift right or shift left? Shift left is security in the cloud-native world. >> Amir: Yes. >> Networking and security, the two hottest areas. What are you shifting, up or down? I mean, the network's moving up the stack. You're seeing the runtimes at the Kubernetes layer. >> Amir: Right, right. It's truly end-to-end virtualization. You have the plumbing, which is the physical infrastructure. Then on top of that, now for the first time, you have true end-to-end virtualization, which the cloud-like constructs are providing to us. We tried to virtualize the routers, we tried to virtualize instances at the server level. Now we are bringing it all together, in a truly end-to-end virtualized manner, to connect any endpoint anywhere across the globe, whether it's on-premise, home, multiple clouds, or SaaS-type environments. >> Yeah. And if you talk about the technical benefits, beyond virtualization, you kind of see virtualization itself be abstracted away. So you've got end-to-end virtualization, but you don't need to know virtualization to take advantage of it. >> Exactly. Exactly. >> What's some of the tech involved there? What's the trend around on top of virtual? What's the easy button for that?
>> So there are many, many use cases from the customers, and some of those use cases they used to deliver out of their data centers before. But it takes a long time to spin something up in the data center, and the trend, what enterprises are looking for, is agility. To achieve that agility, they are moving those services, those use cases, into the cloud. So another technical benefit of something like a supercloud, and what we are doing, is that we allow customers to move their services from existing data centers into the cloud as well. And I'll give you some examples. These enterprises have tons of partners. They provide connectivity to their partners, to select resources. It used to happen inside the data center: you would bring connectivity into the data center and apply tons of ACLs and whatnot, to make sure partners could connect only to what they were allowed to. Now those use cases need to be enabled inside the cloud, and the customer's customers are not just coming from on-prem, they're coming from the cloud as well. If they're coming from the cloud as well as from on-prem, you need an infrastructure like a supercloud, which is sitting inside the cloud and is able to handle all these use cases. That requires moving those services from the data center into the cloud, or into the supercloud. As we started building this service over the last four years, we have come across so many use cases, and to deliver those use cases, you have to have a platform. You have to have your own platform, because otherwise you are depending on somebody else's capabilities, and every time their capabilities change, you have to change. >> John: I'm glad you brought up the platform, 'cause I want to get both your reactions to this. Bob Muglia just said on theCUBE here at Supercloud that supercloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers. So the question is: is supercloud a platform or an architecture, in your view? >> That's an interesting view on things, you know? I mean, if you think of it, you have to design or architect a solution before you turn it into a platform. >> John: It's a trick question, actually. >> So we look at it as: you have to have an architectural approach, end to end, right? And then you build a solution based on that approach. So I don't think they are mutually exclusive; I think they go hand in hand. It's an architecture that you turn into a solution, with that agility and high availability and disaster-recovery capability built into it. >> It's interesting that these definitions might actually be redefined with this new configuration. >> Amir: Yes. >> Because architecture and platform used to mean something, like, alright, here's a platform, you buy this platform. >> And then you architect a solution. >> Architect it via the vendor. >> Right, right, right. >> Okay. And they have to deal with that architecture in a world of multiple superclouds. If you have too many stovepipes, then what's the purpose of supercloud? >> Right, right, right. And because, you know, historically, you built a router and you sold it to the customer, and the poor customer was supposed to install it all and interconnect all those things.
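[Editor's note: to ground the partner-connectivity use case described above, here is a minimal sketch of the ACLs that used to live on data-center gear expressed as a policy evaluated in the network cloud, applied the same way whether a partner arrives from on-prem or from another cloud. The rule shapes and addresses are invented for illustration.]

```python
# Hypothetical sketch: partner access rules as cloud-side policy.
# Addresses, resources, and rule format are illustrative only.
import ipaddress

PARTNER_RULES = [
    # (partner source network, resource they may reach, allowed port)
    ("203.0.113.0/24", "10.20.0.10/32", 8443),   # partner A -> API gateway
    ("198.51.100.0/24", "10.20.1.0/24", 443),    # partner B -> reports subnet
]

def is_allowed(src_ip: str, dst_ip: str, port: int) -> bool:
    src, dst = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    return any(src in ipaddress.ip_network(s)
               and dst in ipaddress.ip_network(d)
               and port == p
               for s, d, p in PARTNER_RULES)

print(is_allowed("203.0.113.7", "10.20.0.10", 8443))  # True: within partner A's scope
print(is_allowed("203.0.113.7", "10.20.1.5", 443))    # False: outside partner A's scope
```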
And if you have a 40,000- or 50,000-router network, which we saw in our lifetime, 'cause there used to be many more branches when we were growing up in the networking industry, right? You had to create hierarchy and all kinds of things to figure out how to solve that problem. We are no longer living in that world anymore. You cannot deploy individual virtual instances, and that's the approach a lot of people are taking, which is a pure overlay network. You cannot take that approach anymore. You have to evolve the architecture and then build the solution based on that architecture, so that it becomes a platform which is readily available, highly scalable, and highly available, and at the same time very, very easy to deploy. It's a SaaS-type solution, right? >> So you're saying: do the architecture to get the solution for the platform that the customer has. >> Amir: Yes. >> They're not buying a platform; they end up with a platform- >> With the platform. >> as a result of the supercloud path. All right. So you mentioned, that's a great point, I want to double-click on what you just said, 'cause I like what you said. What's the deployment strategy, in your mind, for supercloud? I'm an architect. I'm at an enterprise in the Midwest. I'm an insurance company, got some cloud action going on. I'm mostly on-premise. I've got the mandate to transform the company. We have apps. We'll be fully transformed in five years. What's my strategy? What do I do? >> Amir: The resources. >> What's the deployment strategy? Single global instance, code in every region, on every cloud? >> It needs to be a solution which is available as a SaaS service, right? From the customer's perspective, they are onboarding into the supercloud, and then the supercloud is allowing them to do whatever they used to do historically, and in the new world, right? That needs to come together. And that's what we have built: we have brought everything together in a way that what used to take months or years now takes an hour or two, and then people test it for a week or so and deploy it in production. >> I want to bring up something we were talking about before we were on camera, about TCP/IP and the OSI model. That was a concept that destroyed the proprietary network operating systems of the minicomputers, which brought in an era of tech prosperity for generations. TCP/IP was kind of the magical moment that allowed for that kind of super networking connection. Internetworking is what it's called as a category. It feels like something's going on here with supercloud. The way you describe it, it feels like there's this unification idea. The reality is we've got multiple stuff sitting around by default; you either clean it up or get rid of it, right? It's either a nuisance or chaos. >> Yeah. And we live in the new world now. We don't have the luxury of time, so we need to move as fast as possible to solve the business problems. And that's what we are running into: if we don't have automated solutions which scale, which solve our problems, then it's going to be a problem. And that's why SaaS is so important in today's world. Why should we have to deploy the network piecemeal? Why can't we have a solution where we solve our problem as we move forward, and we accomplish what we need to accomplish and move forward? >> And we don't really need standards here, dude. It's not that we need a standards body if you have unification.
>> So because things move so fast, there's no time to create a standards body. And that's why you see companies like ours popping up, which are trying to create a common infrastructure across all clouds. Otherwise, if we went the standardization path, it may take long. Eventually, we should be going in that direction, but we don't have the luxury of time. That's what I was trying to get to. >> Well, what's interesting, to your point about standards and ratification, is: what ratifies a de facto anything? In the old days there were technical bodies involved, but here, I think developers drive everything. So if you look at the developers and how they're voting with their code, they're instantly, organically defining everything as a collective intelligence. >> And just like putting out a paper and making it available, everybody's contributing to that. That's why you need to have APIs and Terraform-type constructs available, so that the customers can continue to improve upon that. That's the Net DevOps, right? That's what you need to have. >> What was once sacrilege in business school, back in the days when I got my business degree after my CS degree, was, you know, a better mousetrap with a bad business model; nobody wants that. In this case, the better mousetrap, the better solution, actually could be the thing. >> It is that thing. >> I mean, that can trigger, tip over the industry. >> And that's where we are seeing our customers. We have some publicly referenceable customers, like Coke or Warner Music Group or Chart Industries, and multiple others. The way we are solving the problem: they have some of the largest environments in the industry from the cloud perspective, and their whole network infrastructure is running on the Alkira infrastructure. And they're able to adopt new clouds within days, rather than waiting for months to architect, then deploy, then figure out how to manage and operate it. It's available as a service. >> John: And we've heard from your customer Warner; they were just on the program. >> Amir: Yes. Okay, okay. >> So they're building a supercloud. So superclouds aren't just for tech companies. >> Amir: No. >> You guys build a supercloud for networking. >> Amir: It is. >> But people are building their own superclouds on top of all this new stuff. Talk about that dynamic. >> Healthcare providers, financials, high-tech companies, even startups. One of our startup customers, Tekion, right? They have these dealerships that they provide sales and support services to across the globe, and for them to be able to onboard those dealerships, it is 80% less time to production. That is real money, right? So maybe Atif can give you a lot more examples of customers who are deploying. >> Talk about some of the customer activity. What are they like? Are they laggards, are they innovators? Are they trying to hit the easy button? Are they coming in late, or have you got some high-end customers? >> Actually, most of our customers, all of our customers, customers in general, I don't think they have a choice but to move in this direction, because with the cloud, everything is quick now. The cloud teams are moving faster in these enterprises, and the network cannot afford not to keep pace with the cloud teams.
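[Editor's note: the "APIs and Terraform-type constructs" point above is the Net DevOps pattern: the network is described as desired state and reconciled idempotently, the way application infrastructure is. Here is a minimal sketch of that reconciliation loop, with invented state shapes; it names the pattern, not any vendor's real tooling.]

```python
# Hypothetical Net DevOps sketch: network as declarative desired state,
# reconciled idempotently so a CI pipeline can apply it safely on every run.
desired = {
    "segments": {"prod": "10.0.0.0/16", "dev": "10.1.0.0/16"},
    "regions": ["us-east", "eu-central", "apac-south"],
}

def reconcile(current: dict, desired: dict) -> list[str]:
    actions = []
    for name, cidr in desired["segments"].items():
        if current.get("segments", {}).get(name) != cidr:
            actions.append(f"upsert segment {name} ({cidr})")
    for region in desired["regions"]:
        if region not in current.get("regions", []):
            actions.append(f"bring up region {region}")
    return actions  # empty list == already converged, nothing to do

print(reconcile({"segments": {"prod": "10.0.0.0/16"}, "regions": ["us-east"]},
                desired))
```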
So they don't have a choice but to go with something like this, where you can build your network on demand and bring up your network as quickly as possible to meet all those use cases. I'll give you an example. >> John: So the demand's high for what you guys do. >> Demand is very high, because the cloud teams have- >> John: Yeah. They're going fast. >> They're going fast, and there's no stopping. And the network teams have to keep up with them; you cannot keep deploying networks the way you used to deploy back in the day. As far as the use cases are concerned, there are so many use cases our customers are using our platform for. I'll give you an example from our financial customers. Some of the financial customers, like stock exchanges, provide market data to their customers out of data centers. But now their customers are moving into the cloud as well, so they need to come in from the cloud. And when they're coming in from the cloud, you cannot be giving them data from your data center, because that takes time and you're hairpinning everything back. >> Moving data is like moving money, someone said. >> Exactly. >> Exactly. And the other thing is, you have to optimize your traffic flows in the cloud as well, because every time you leave the cloud, you get charged a lot. You don't want your traffic to leave the cloud unless it has to. So you have to build or use a service which allows you to optimize all those traffic flows as well, you know? >> My final question to you guys: first of all, thanks for coming on the Supercloud program. Really appreciate it. Congratulations on your success. You guys have great positioning and I'm a big fan. And I have to ask: you guys are an agile, nimble startup, smart, on the cutting edge. The supercloud concept seems to resonate with people who are on the front range of this major wave, while all the incumbents, like Cisco, Microsoft, even AWS, are looking at it like, what is that? I think it's coming up really fast, this trend. Because I know people talk about multi-cloud, I get that, but this whole supercloud is not just SaaS; there's more going on there. What do you think is going on between the folks who get the supercloud concept and some who are scratching their heads, whether it's the Ciscos or someone, like, I don't get it? Why is supercloud important for the folks that aren't really seeing it? >> So first of all, the customers we saw about six to twelve months ago were a little slower to adopt the supercloud kind of concept, and it was leading-edge customers who were coming and adopting it. Now, all of a sudden, over the last six to nine months, we've seen a flurry of customers coming in, and they are from all disciplines, a very diverse set of customers. They're starting to see the value because of the practical implications of what they're doing. These shadow-IT-type environments are no longer working, and there's a lot of pressure from management to move faster. That's where they're coming in. And perhaps, Atif, you can give a few examples. >> Yeah. And I'll also just add to your point earlier about the network needing to be there, 'cause the cloud teams are like, let's go faster, and the network's always been slow. But now, it's been almost turbocharged.
>> Atif: Yeah. Yeah, exactly. And as I said, there was no choice here; you had to move in this industry. The other thing I would add is, if you look at all these enterprises now, most of their traffic, even the traffic coming from on-prem, is going to cloud SaaS applications or public clouds. More than 50% of traffic is leaving what you used to call your network, the private network. Before, it used to just connect sites to data centers and sites together; now it's the cloud as well as the SaaS applications, so it's either internet-bound or public-cloud-bound. So now you have to build a network quickly which caters to all these use cases. And that's where something- >> And you guys, your solution to me is, you eliminate all that work for the customer. Now they can treat the cloud like a bag of Legos and do their thing. Well, I oversimplify, but you know what I'm talking about. >> Atif: Right, exactly. >> And to answer your question earlier about the big companies coming in and now being slow to adopt: you know what normally happens. When Cisco came up, right, there used to be 16 different protocol suites, DECnet or AppleTalk or XNS or, you know, you name it, and then we finally settled on TCP/IP. The companies that did not adapt to networking the way it was supposed to be done, guess what happened, right? So if the companies in the networking space do not adopt this new concept, this new way of doing things, I think some of them will become extinct over time. >> Well, I think the forcing function too is the cloud teams. So you've got two evolutions. You've got architectural relevance. That's real impact. >> It's very important. >> Cost, speed. >> And I look at it as a very similar disruption to what the Ciscos of the world did in the very early days to, you know, bring networking out, right? And it became the internet. But now we are going through the cloud. It's the cloud era, right? How does the cloud evolve over the next 10, 15, 20 years? Everything is going to be offered as a service, right? So slowly data centers go away, the network becomes a plumbing thing, very simple to deploy, and everything on top of that is virtualized in cloud-like manners. >> And that makes the networks hardened and more secure. >> More secure. >> It's a great way to be secure. You remember the glory days; we'll go back 15 years. The Cisco conversation was, we've got to move up the stack. All the managers would fight each other. Now, what does that actually mean? Stay where we are. Stay in your lane. This is kind of like the network's version of moving up the stack, except it's not so much up the stack: the cloud is everywhere. It's almost horizontally scaled. >> It's extending into the on-premise. It is already moving towards the edge, right? So you will see a lot- >> So, programmability is a big part of this. You guys are hitting programmability, compatibility, getting people into an environment they're comfortable operating in. So the Ops people love it. >> Exactly. >> It spans the clouds to a level of SLA management. It might not perfectly span applications, but you can actually know the latencies between clouds and measure that. So you're basically managing your network now as the overall infrastructure. >> Right. And it needs to be a very intelligent infrastructure going forward, right?
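[Editor's note: on the "know latencies between clouds, measure that" point above, here is a minimal sketch of the measurement primitive: probing latency between region endpoints so routing and placement can be decided from data. The probe hostnames are placeholders, and no vendor's actual telemetry system is implied.]

```python
# Hypothetical sketch: measure inter-region latency over the fabric by
# timing TCP connection setup to per-region probe endpoints (placeholders).
import socket
import statistics
import time

def tcp_latency_ms(host: str, port: int = 443, samples: int = 5) -> float:
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=3):
            times.append((time.perf_counter() - start) * 1000)
    return statistics.median(times)  # median smooths out transient spikes

# usage, against hypothetical per-region probe endpoints:
# for region in ("us-east.probe.example.net", "apac.probe.example.net"):
#     print(region, tcp_latency_ms(region))
```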
Because customers do not want to wait to be able to troubleshoot. They don't want to have to wait to deploy something, right? So there needs to be a level of automation. >> Okay. So the question we'll end on, for you guys both, is about enablement, because you guys are a disruptive enabler, right? You create this fabric. You're going to enable companies to do stuff. What are some of the things that you see, and your customers might be seeing, that they're going to do as a result of having this enablement? >> Amir: Atif, perhaps you can talk through some of the customer experience on that. >> It's agility. We are allowing these customers to move very, very quickly and build these networks which meet all these requirements inside the cloud. Because as Amir was saying, in the cloud era, networking is changing. And going back to your comment about the existing networking vendors: some of them still think that just connecting to the cloud using concepts like Cloud OnRamp is cloud networking, but it's changing now. >> John: 'Cause there's apps that are depending upon it. >> Exactly. And it's all distributed. IT infrastructure, as I said earlier, is all distributed. At the end of the day, you have to make sure that wherever your user is, wherever your app is, you are able to connect them securely. >> Historically, it used to be about building a bigger and bigger router, you know, and then interconnecting those routers. Now, it's all about horizontal scale. You don't need to build big, you need to scale, right? And that's what the cloud brings to the customer. >> It's a cultural change for Cisco and Juniper, because they have to understand that they could still be in the game and still win. >> Exactly. >> The question I have for you: what are your customers telling you? What's some of the anecdotal commentary, 'cause you guys have a good solution? Is it, "Oh my god, you guys saved my butt"? What do you hear from the customers in terms of praise for your solution? >> Oh, some even say, when we do our demo and stuff, they say it's too hard to believe. >> Believe. >> Like, too hard. It's hard, you know. >> I don't believe you. They're skeptics. >> "I don't believe you," because now you're able to bring up a global network within minutes, with networking services. Let's say you have APAC on-prem users, a cloud there, a cloud here, users here: you can bring up a global network with full routed connectivity between all these endpoints, with security services. You can bring up a firewall from a third party, or our services, in the middle. This is a matter of minutes now, and this is all high-speed connectivity with SLAs. Imagine, before, connecting Singapore to U.S. East, or Hong Kong to Frankfurt: if you were putting your infrastructure in colos like Equinix, you would have to go figure out, how am I going to- >> Get a line in, connect to it? Yeah. A lot of hassles. >> If you had to put firewalls in the middle, segmentation, you had to, you know, isolate different entities. >> That's called heavy lifting. >> So what you're seeing is, the customer comes in, there's disbelief: can you really do that? And then they try it out and they go, "Wow, this works." Right? It's deployed in a small environment.
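[Editor's note: the "firewall from a third party in the middle" capability described above is service insertion: an intent that traffic between two endpoints must traverse a named service, expanded into an ordered path by the fabric. Here is a minimal sketch of that expansion; the intent format, service names, and node IDs are all invented for illustration.]

```python
# Hypothetical sketch of service insertion: declare that traffic between two
# endpoint groups must pass through a third-party service, then expand that
# intent into an ordered forwarding path. Names are illustrative only.
INTENT = {
    "from": "apac-users",
    "to": "us-east-app",
    "insert_services": ["third-party-firewall"],  # service in the middle
}

SERVICE_NODES = {"third-party-firewall": "svc-fw-sin-01"}

def build_path(intent: dict) -> list[str]:
    hops = [intent["from"]]
    hops += [SERVICE_NODES[s] for s in intent["insert_services"]]
    hops.append(intent["to"])
    return hops

print(" -> ".join(build_path(INTENT)))
# apac-users -> svc-fw-sin-01 -> us-east-app
```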
And then all of a sudden they start taking off, right? And literally, we have seen customers go from a-few-thousand-dollars-a-month-or-a-year type deployments to multi-million-dollars-a-year type deployments in a very, very short amount of time, in a few months. >> And you guys are pay as you go? >> Pay as you go. >> Pay as you go, usage-based, cloud-based compatibility. >> Exactly. And it's amazing once they get to deploy the solution. >> What's the variable on the cost? >> On the cost? >> Is it traffic, or is it- >> It's multiple different things, packaged into the overall solution. And as a matter of fact, we end up saving a lot of money for the customers, and not only in one way, in multiple different ways. We do a complete ROI analysis for the customers. It's bandwidth, it's the number of connections, it's the amount of compute power that we are using. >> John: Similar things to what they're used to. >> Just like the cloud constructs. Yeah. >> All right. Networking supercloud. Great. Congratulations. >> Thank you so much. >> Thanks for coming on Supercloud. >> Atif: Thank you. >> And looking forward to seeing more of the demand. Translation: instant networking. I'm sure it's going to be huge with the edge exploding. >> Oh yeah, yeah, yeah, yeah. >> Congratulations. >> Thank you so much. >> Thank you so much. >> Okay. So this is the Supercloud 2 event here in Palo Alto. I'm John Furrier. The network supercloud is here. Check out Alkira. I'm John Furrier, the host. Thanks for watching. (lively music)

Published Date : Feb 17 2023
