Joe Nolte, Allegis Group & Torsten Grabs, Snowflake | Snowflake Summit 2022

>>Hey everyone. Welcome back to the cube. Lisa Martin, with Dave ante. We're here in Las Vegas with snowflake at the snowflake summit 22. This is the fourth annual there's close to 10,000 people here. Lots going on. Customers, partners, analysts, cross media, everyone talking about all of this news. We've got a couple of guests joining us. We're gonna unpack snow park. Torston grabs the director of product management at snowflake and Joe. No NTY AI and MDM architect at Allegis group. Guys. Welcome to the program. Thank >>You so much for having >>Us. Isn't it great to be back in person? It is. >>Oh, wonderful. Yes, it >>Is. Indeed. Joe, talk to us a little bit about Allegis group. What do you do? And then tell us a little bit about your role specifically. >>Well, Allegis group is a collection of OPCA operating companies that do staffing. We're one of the biggest staffing companies in north America. We have a presence in AMEA and in the APAC region. So we work to find people jobs, and we help get 'em staffed and we help companies find people and we help individuals find >>People incredibly important these days, excuse me, incredibly important. These days. It is >>Very, it very is right >>There. Tell me a little bit about your role. You are the AI and MDM architect. You wear a lot of hats. >>Okay. So I'm a architect and I support both of those verticals within the company. So I work, I have a set of engineers and data scientists that work with me on the AI side, and we build data science models and solutions that help support what the company wants to do, right? So we build it to make business business processes faster and more streamlined. And we really see snow park and Python helping us to accelerate that and accelerate that delivery. So we're very excited about it. >>Explain snow park for, for people. I mean, I look at it as this, this wonderful sandbox. You can bring your own developer tools in, but, but explain in your words what it >>Is. Yeah. So we got interested in, in snow park because increasingly the feedback was that everybody wants to interact with snowflake through SQL. There are other languages that they would prefer to use, including Java Scala and of course, Python. Right? So then this led down to the, our, our work into snow park where we're building an infrastructure that allows us to host other languages natively on the snowflake compute platform. And now here, what we're, what we just announced is snow park for Python in public preview. So now you have the ability to natively run Python code on snowflake and benefit from the thousands of packages and libraries that the open source community around Python has contributed over the years. And that's a huge benefit for data scientists. It is ML practitioners and data engineers, because those are the, the languages and packages that are popular with them. So yeah, we very much look forward to working with the likes of you and other data scientists and, and data engineers around the Python ecosystem. >>Yeah. And, and snow park helps reduce the architectural footprint and it makes the data pipelines a little easier and less complex. We have a, we had a pipeline and it works on DMV data. And we converted that entire pipeline from Python, running on a VM to directly running down on snowflake. Right. We were able to eliminate code because you don't have to worry about multi threading, right? Because we can just set the warehouse size through a task, no more multi threading, throw that code away. Don't need to do it anymore. Right. We get the same results, but the architecture to run that pipeline gets immensely easier because it's a store procedure that's already there. And implementing that calling to that store procedure is very easy. The architecture that we use today uses six different components just to be able to run that Python code on a VM within our ecosystem to make sure that it runs on time and is scheduled and all of that. Right. But with snowflake, with snowflake and snow park and snowflake Python, it's two components. It's the store procedure and our ETL tool calling it. >>Okay. So you've simplified that, that stack. Yes. And, and eliminated all the other stuff that you had to do that now Snowflake's doing, am I correct? That you're actually taking the application development stack and the analytics stack and bringing them together? Are they merging? >>I don't know. I think in a way I'm not real sure how I would answer that question to be quite honest. I think with stream lit, there's a little bit of application that's gonna be down there. So you could maybe start to say that I'd have to see how that carries out and what we do and what we produce to really give you an answer to that. But yeah, maybe in a >>Little bit. Well, the reason I asked you is because you talk, we always talk about injecting data into apps, injecting machine intelligence and ML and AI into apps, but there are two separate stacks today. Aren't they >>Certainly the two are getting closer >>To Python Python. It gets a little better. Explain that, >>Explain, explain how >>That I just like in the keynote, right? The other day was SRE. When she showed her sample application, you can start to see that cuz you can do some data pipelining and data building and then throw that into a training module within Python, right down inside a snowflake and have it sitting there. Then you can use something like stream lit to, to expose it to your users. Right? We were talking about that the other day, about how do you get an ML and AI, after you have it running in front of people, we have a model right now that is a Mo a predictive and prescriptive model of one of our top KPIs. Right. And right now we can show it to everybody in the company, but it's through a Jupyter notebook. How do I deliver it? How do I get it in the front of people? So they can use it well with what we saw was streamlet, right? It's a perfect match. And then we can compile it. It's right down there on snowflake. And it's completely easier time to delivery to production because since it's already part of snowflake, there's no architectural review, right. As long as the code passes code review, and it's not poorly written code and isn't using a library that's dangerous, right. It's a simple deployment to production. So because it's encapsulated inside of that snowflake environment, we have approval to just use it. However we see fit. >>It's very, so that code delivery, that code review has to occur irrespective of, you know, not always whatever you're running it on. Okay. So I get that. And, and, but you, it's a frictionless environment you're saying, right. What would you have had to do prior to snowflake that you don't have to do now? >>Well, one, it's a longer review process to allow me to push the solution into production, right. Because I have to explain to my InfoSec people, right? My other it's not >>Trusted. >>Well, well don't use that word. No. Right? It got, there are checks and balances in everything that we do, >>It has to be verified. And >>That's all, it's, it's part of the, the, what I like to call the good bureaucracy, right? Those processes are in place to help all of us stay protected. >>It's the checklist. Yeah. That you >>Gotta go to. >>That's all it is. It's like fly on a plane. You, >>But that checklist gets smaller. And sometimes it's just one box now with, with Python through snow park, running down on the snowflake platform. And that's, that's the real advantage because we can do things faster. Right? We can do things easier, right? We're doing some mathematical data science right now and we're doing it through SQL, but Python will open that up much easier and allow us to deliver faster and more accurate results and easier not to mention, we're gonna try to bolt on the hybrid tables to that afterwards. >>Oh, we had talk about that. So can you, and I don't, I don't need an exact metric, but when you say faster talking 10% faster, 20% faster, 50% path >>Faster, it really depends on the solution. >>Well, gimme a range of, of the worst case, best case. >>I, I really don't have that. I don't, I wish I did. I wish I had that for you, but I really don't have >>It. I mean, obviously it's meaningful. I mean, if >>It is meaningful, it >>Has a business impact. It'll >>Be FA I think what it will do is it will speed up our work inside of our iterations. So we can then, you know, look at the code sooner. Right. And evaluate it sooner, measure it sooner, measure it faster. >>So is it fair to say that as a result, you can do more. Yeah. That's to, >>We be able do more well, and it will enable more of our people because they're used to working in Python. >>Can you talk a little bit about, from an enablement perspective, let's go up the stack to the folks at Allegis who are on the front lines, helping people get jobs. What are some of the benefits that having snow park for Python under the hood, how does it facilitate them being able to get access to data, to deliver what they need to, to their clients? >>Well, I think what we would use snowflake for a Python for there is when we're building them tools to let them know whether or not a user or a piece of talent is already within our system. Right. Things like that. Right. That's how we would leverage that. But again, it's also new. We're still figuring out what solutions we would move to Python. We are, we have some targeted, like we're, I have developers that are waiting for this and they're, and they're in private preview. Now they're playing around with it. They're ready to start using it. They're ready to start doing some analytical work on it, to get some of our analytical work out of, out of GCP. Right. Because that's where it is right now. Right. But all the data's in snowflake and it just, but we need to move that down now and take the data outta the data wasn't in snowflake before. So there, so the dashboards are up in GCP, but now that we've moved all of that data down in, down in the snowflake, the team that did that, those analytical dashboards, they want to use Python because that's the way it's written right now. So it's an easier transformation, an easier migration off of GCP and get us into snow, doing everything in snowflake, which is what we want. >>So you're saying you're doing the visualization in GCP. Is that righting? >>It's just some dashboarding. That's all, >>Not even visualization. You won't even give for. You won't even give me that. Okay. Okay. But >>Cause it's not visualization. It's just some D boardings of numbers and percentages and things like that. It's no graphic >>And it doesn't make sense to run that in snowflake, in GCP, you could just move it into AWS or, or >>No, we, what we'll be able to do now is all that data before was in GCP and all that Python code was running in GCP. We've moved all that data outta GCP, and now it's in snowflake and now we're gonna work on taking those Python scripts that we thought we were gonna have to rewrite differently. Right. Because Python, wasn't available now that Python's available, we have an easier way of getting those dashboards back out to our people. >>Okay. But you're taking it outta GCP, putting it to snowflake where anywhere, >>Well, the, so we'll build the, we'll build those, those, those dashboards. And they'll actually be, they'll be displayed through Tableau, which is our enterprise >>Tool for that. Yeah. Sure. Okay. And then when you operationalize it it'll go. >>But the idea is it's an easier pathway for us to migrate our code, our existing code it's in Python, down into snowflake, have it run against snowflake. Right. And because all the data's there >>Because it's not a, not a going out and coming back in, it's all integrated. >>We want, we, we want our people working on the data in snowflake. We want, that's our data platform. That's where we want our analytics done. Right. We don't want, we don't want, 'em done in other places. We when get all that data down and we've, we've over our data cloud journey, we've worked really hard to move all of that data. We use out of existing systems on prem, and now we're attacking our, the data that's in GCP and making sure it's down. And it's not a lot of data. And we, we fixed it with one data. Pipeline exposes all that data down on, down in snowflake now. And we're just migrating our code down to work against the snowflake platform, which is what we want. >>Why are you excited about hybrid tables? What's what, what, what's the >>Potential hybrid tables I'm excited about? Because we, so some of the data science that we do inside of snowflake produces a set of results and there recommendations, well, we have to get those recommendations back to our people back into our, our talent management system. And there's just some delays. There's about an hour delay of delivering that data back to that team. Well, with hybrid tables, I can just write it to the hybrid table. And that hybrid table can be directly accessed from our talent management system, be for the recruiters and for the hiring managers, to be able to see those recommendations and near real time. And that that's the value. >>Yep. We learned that access to real time. Data it in recent years is no longer a nice to have. It's like a huge competitive differentiator for every industry, including yours guys. Thank you for joining David me on the program, talking about snow park for Python. What that announcement means, how Allegis is leveraging the technology. We look forward to hearing what comes when it's GA >>Yeah. We're looking forward to, to it. Nice >>Guys. Great. All right guys. Thank you for our guests and Dave ante. I'm Lisa Martin. You're watching the cubes coverage of snowflake summit 22 stick around. We'll be right back with our next guest.

Published Date : Jun 15 2022

SUMMARY :

This is the fourth annual there's close to Us. Isn't it great to be back in person? Yes, it Joe, talk to us a little bit about Allegis group. So we work to find people jobs, and we help get 'em staffed and we help companies find people and we help It is You are the AI and MDM architect. on the AI side, and we build data science models and solutions I mean, I look at it as this, this wonderful sandbox. and libraries that the open source community around Python has contributed over the years. And implementing that calling to that store procedure is very easy. And, and eliminated all the other stuff that you had to do that now Snowflake's doing, am I correct? we produce to really give you an answer to that. Well, the reason I asked you is because you talk, we always talk about injecting data into apps, It gets a little better. And it's completely easier time to delivery to production because since to snowflake that you don't have to do now? Because I have to explain to my InfoSec we do, It has to be verified. Those processes are in place to help all of us stay protected. It's the checklist. That's all it is. And that's, that's the real advantage because we can do things faster. I don't need an exact metric, but when you say faster talking 10% faster, I wish I had that for you, but I really don't have I mean, if Has a business impact. So we can then, you know, look at the code sooner. So is it fair to say that as a result, you can do more. We be able do more well, and it will enable more of our people because they're used to working What are some of the benefits that having snow park of that data down in, down in the snowflake, the team that did that, those analytical dashboards, So you're saying you're doing the visualization in GCP. It's just some dashboarding. You won't even give for. It's just some D boardings of numbers and percentages and things like that. gonna have to rewrite differently. And they'll actually be, they'll be displayed through Tableau, which is our enterprise And then when you operationalize it it'll go. And because all the data's there And it's not a lot of data. so some of the data science that we do inside of snowflake produces a set of results and We look forward to hearing what comes when it's GA Thank you for our guests and Dave ante.

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Joe	PERSON	0.99+
10%	QUANTITY	0.99+
20%	QUANTITY	0.99+
Dave	PERSON	0.99+
Allegis	ORGANIZATION	0.99+
Las Vegas	LOCATION	0.99+
Allegis Group	ORGANIZATION	0.99+
Joe Nolte	PERSON	0.99+
50%	QUANTITY	0.99+
north America	LOCATION	0.99+
Python	TITLE	0.99+
Java Scala	TITLE	0.99+
SQL	TITLE	0.99+
both	QUANTITY	0.99+
one box	QUANTITY	0.99+
two	QUANTITY	0.99+
thousands	QUANTITY	0.99+
Snowflake Summit 2022	EVENT	0.98+
AWS	ORGANIZATION	0.98+
Tableau	TITLE	0.98+
six different components	QUANTITY	0.98+
two components	QUANTITY	0.98+
Python Python	TITLE	0.98+
Torsten Grabs	PERSON	0.97+
one	QUANTITY	0.96+
today	DATE	0.96+
Torston	PERSON	0.96+
Allegis group	ORGANIZATION	0.96+
OPCA	ORGANIZATION	0.95+
one data	QUANTITY	0.95+
two separate stacks	QUANTITY	0.94+
InfoSec	ORGANIZATION	0.91+
Dave ante	PERSON	0.9+
fourth annual	QUANTITY	0.88+
Jupyter	ORGANIZATION	0.88+
park	TITLE	0.85+
snowflake summit 22	EVENT	0.84+
10,000 people	QUANTITY	0.82+
Snowflake	ORGANIZATION	0.78+
AMEA	LOCATION	0.77+
snow park	TITLE	0.76+
snow	ORGANIZATION	0.66+
couple of guests	QUANTITY	0.65+
NTY	ORGANIZATION	0.6+
Snowflake	EVENT	0.59+
MDM	ORGANIZATION	0.58+
APAC	ORGANIZATION	0.58+
prem	ORGANIZATION	0.52+
GA	LOCATION	0.5+
snow	TITLE	0.46+
SRE	TITLE	0.46+
lit	ORGANIZATION	0.43+
stream	TITLE	0.41+
22	QUANTITY	0.4+

Barb Huelskamp and Tarik Dwiek, Alteryx

>>Okay. We're back here in the cube, focusing on the business promise of the cloud democratizing data, making it accessible and enabling everyone to get value from analytics, insights, and data. We're now moving into the eco systems segment the power of many versus the resources of one. And we're pleased to welcome. Barb Hills camp was the senior vice president partners and alliances at Ultrix and a special guest terror do week head of technology alliances at snowflake folks. Welcome. Good to see you. >>Thank you. Thanks for having me. Good to >>See Dave. Great to see you guys. So cloud migration, it's one of the hottest topics. It's the top one of the top initiatives of senior technology leaders. We have survey data with our partner ETR it's number two behind security and just ahead of analytics. So we're hovering around all the hot topics here. Barb, what are you seeing with respect to customer know cloud migration momentum and how does the Ultrix partner strategy fit? >>Yeah, sure. Partners are central, our company's strategy. They always have been, we recognize that our partners have deep customer relationships. And when you connect that with their domain expertise, they're really helping customers on their cloud and business transformation journey. We've been helping customers achieve their desired outcomes with our partner community for quite some time. And our partner base has been growing an average of 30% year over year, that partner, community and strategy now addresses several kinds of partners, spanning solution providers to global size and technology partners, such as snowflake and together, we help our customers realize that business promise of their journey to the cloud. Snowflake provides a scalable storage system altereds provides the business user friendly front end. So for example, it departments depend on snowflake to consolidate data across systems into one data cloud with Altryx business users can easily unlock that data in snowflake solving real business outcomes. Our GSI and solution provider partners are instrumental in providing that end to end benefit of a modern analytic stack in the cloud providing platform guidance, deployment, support, and other professional services. Okay, >>Great. Let's get a little bit more into the relationship between Altrix and in snowflake the partnership, maybe a little bit about the history, you know, what are the critical aspects that we should really focus on? Barb? Maybe you could start an Interra kindly way in as well. >>Yeah, so the relationship started in 2020 and all shirts made a big bag deep with snowflake co-innovating and optimizing cloud use cases together. We are supporting customers who are looking for that modern analytic stack to replace an old one or to implement their first analytic strategy. And our joint customers want to self-serve with data-driven analytics, leveraging all the benefits of the cloud, scalability, accessibility, governance, and optimizing our costs. Altrix proudly achieves highest elite tier and their partner program last year. And to do that, we completed a rigorous third party testing process, which also helped us make some recommended improvements to our joint stack. We wanted customers to have confidence. They would benefit from high quality and performance in their investment with us then to help customers get the most value out of the strength solution. We developed two great assets. One is the Altrix starter kit for snowflake, and we coauthored a joint best practices guide. >>The starter kit contains documentation, business workflows and videos, helping customers to get going more easily with an Alteryx and snowflake solution. And the best practices guide is more of a technical document, bringing together experiences and guidance on how Ultrix and snowflake can be deployed together. Internally. We also built a full enablement catalog resources, right? We wanted to provide our account executives more about the value of the snowflake relationship. How do we engage and some best practices. And now we have hundreds of joint customers such as Juniper and Sainsbury who are actively using our joint solution, solving big business problems, much faster. Cool. >>Tara, can you give us your perspective on the >>Yeah, definitely. Dave. So as Bart mentioned, we've got this standing very successful partnership going back, whereas with hundreds of happy joint customers. And when I look at the beginning, Ultrix has helped pioneer the concept of self-service analytics actually with use cases that we've worked on with, for, for data prep for BI users like Tableau and as Altrix has evolved to now becoming from data prep to now becoming a full end to end data science platform, it's really opened up a lot more opportunities for our partnership. Ultrix has invested heavily over the last two years in areas of deep integration for customers to fully be able to expand their investment, both technologies. And those investments include things like in database pushed down, right? So customers can, can leverage that elastic platform, that being the snowflake data cloud with Alteryx orchestrating the end to end machine learning workflows, Altryx also invested heavily in snow park, a feature we released last year around this concept of data programmability. So all users were regardless of their business analysts, regardless of their data, scientists can use their tools of choice in order to consume and get at data. And now with Altryx cloud, we think it's going to open up even more opportunities. It's going to be a big year for the partnership. >>Yeah. So, you know, Terike, we we've covered snowflake pretty extensively and you initially solve what I used to call the, I still call the snake swallowing the basketball problem and cloud data warehouse changed all that because you had virtually infinite resources. But so that's obviously one of the problems that you guys solved early on, but what are some of the common challenges or patterns or trends that you see with snowflake customers and where does Altryx come in? >>Sure. Dave there's there's handful that I can come up with today. The big challenges or trends for us, and Altrix really helps us across all of them. There are three particular ones I'm going to talk about the first one being self service analytics. If we think about it, every organization is trying to democratize data. Every organization wants to empower all their users, business users, you know, the, the technology users, but the business users, right? I think every, every organization has realized that if everyone has access to data and everyone can do something with data, it's going to make them competitively, give them a competitive advantage with all traits is something we share that vision of putting that power in the hands of everyday users, regardless of the skillsets. So with self-service analytics, with Ultrix designer, they've they started out with self-service analytics as the forefront, and we're just scratching the surface. >>I think there was an analyst report that shows that less than 20% of organizations are truly getting self-service analytics to their end users. Now with Altryx going to Ultrix cloud, we think that's going to be a huge opportunity for us. And then that opens up the second challenge, which is machine learning and AI, every organization is trying to get predictive analytics into every application that they have in order to be competitive in order to be competitive. And with Altryx creating this platform. So they can cater to both the everyday business user, the quote, unquote, citizen data scientists, and making it code friendly for data scientists, to be able to get at their notebooks and all the different tools that they want to use. They fully integrated in our snow park platform, which I talked about before, so that now we get an end to end solution catering to all, all lines of business. >>And then finally this concept of data marketplaces, right? We, we created snowflake from the ground up to be able to solve the data sharing problem, the big data problem, the data sharing problem. And Altryx, if we look at mobilizing your data, getting access to third-party data sets to enrich with your own data sets to enrich with, with your suppliers and with your partners, data sets, that's what all customers are trying to do in order to get a more comprehensive 360 view within their, their data applications. And so with Altryx is we're working on third-party data sets and marketplaces for quite some time. Now we're working on how do we integrate what Altrix is providing with, with the snowflake data marketplace so that we can enrich these workflows, these great rate workflows that Ultrix rating provides. Now we can add third party data into that workflow. So that opens up a ton of opportunities date. So those are three. I see easily that we're going to be able to solve a lot of customer challenges with. >>Excellent. Thank you for that. Terrick so let's stay on cloud a little bit. I mean, Altrix is undergoing a major transformation, big focus on the cloud. How does this cloud launch impact the partnership Terike from snowflakes perspective and then Barb, maybe, please add some color. >>Yeah, sure. Dave snowflake started as a cloud data platform. We saw our founders really saw the challenges that customers are having with becoming data-driven. And the biggest challenge was the complexity of having a managed infrastructure to even be able to, to get applications off the ground. And so we created something to be Claudia. We created to be a SAS managed service. So now that that Altrix is moving into the same model, right? A cloud platform, a SAS managed service, we're just, we're just removing more of the friction. So we're going to be able to start to package these end to end solutions that are SAS based that are fully managed. So customers can, can go faster. They don't have to worry about all of the underlying complexities of, of, of stitching things together. Right? So, so that's, what's exciting from my viewpoint >>And I'll follow up. So as you said, we're investing heavily in the cloud a year ago, we had to pray desktop products. And today we have four cloud products with cloud. We can provide our users with more flexibility. We want to make it easier for the users to leverage their snowflake data in the Alteryx platform, whether they're using our beloved on-premise solution or the new cloud products, we're committed to that continued investment in the cloud, enabling our joint partner solutions to meet customer requirements, wherever they store their data. And we're working with snowflake, we're doing just that. So as customers look for a modern analytic stack, they expect that data to be easily accessible, right within a fast, secure and scalable platform. And the launch of our cloud strategy is a huge leap forward in making Altrix more widely accessible to all users in all types of roles, our GSI and our solution provider partners have asked for these cloud capabilities at scale, and they're excited to better support our customers cloud and analytic ambitions. >>How about you go to market strategy? How would you describe your joint go to market strategy with snowflake? >>Sure. It's simple. We've got to work backwards from our customer's challenges, right? Driving transformation to solve problems, games agencies, or help them save money. So whether it's with snowflake or other GSI, other partner types, we've outlined a joint journey together from recruit solution development, activation enablement, and then strengthening our go to market strategies to optimize our results together. We launched an updated partner program and within that framework, we've created new benefits for our partners around opportunity registration, new role based enablement and training, basically extending everything we do internally for our own go-to-market teams to our partners. We're offering partner, marketing resources and funding to reach new customers together. And as a matter of fact, we recently launched a fantastic video with snowflake. I love this video that very simply describes the path to insights starting with your snowflake data. Right? We do joint customer webinars. We're working on joint hands-on labs and have a wonderful landing page with a lot of assets for our customers. Once we have an interested customer, we engage our respective account managers, collaborating through questions, proof of concepts really showcasing the desired outcome. And when you combine that with our partners technology or domain expertise, it's quite powerful, >>Tara, how do you see it? You'd go to market strategy. >>Yeah. Dave we've. So we initially started selling, we initially sold snowflake as technology, right? Looking at positioning the diff the architectural differentiators and the scale and concurrency. And we noticed as we got up into the larger enterprise customers, we were starting to see how do they solve their business problems using the technology, as well as them coming to us and saying, look, we want to also know how do you, how do you continue to map back to the specific prescriptive business problems we're having? And so we shifted to an industry focus last year, and this is an area where Ultrix has been mature for probably since their inception selling to the line of business, right? Having prescriptive use cases that are particular to an industry like financial services, like retail, like healthcare and life sciences. And so mark talked about these, these starter kits where it's prescriptive, you've got a demo and a way that customers can get off the ground and running, right? >>Because we want to be able to shrink that time to market, the time to value that customers can watch these applications. And we want to be able to, to, to tell them specifically how we can map back to their business initiatives. So I see a huge opportunity to align on these industry solutions. As BARR mentioned, we're already doing that where we've released a few around financial services working on healthcare and retail as well. So that is going to be a way for us to allow customers to go even faster and start to map to lines of business with Altryx >>Great. Thanks Derek, Bob, what can we expect if we're observing this relationship? What should we look for in the coming year? >>A lot specifically with snowflake, we'll continue to invest in the partnership. We're co innovators in this journey, including snow park extensibility efforts, which Derek will tell you more about shortly. We're also launching these great news strategic solution blueprints, and extending that at no charge to our partners with snowflake, we're already collaborating with their retail and CPG team for industry blueprints. We're working with their data marketplace team to highlight solutions, working with that data in their marketplace. More broadly, as I mentioned, we're relaunching the alternative partner program designed to really better support the unique partner types in our global ecosystem, introducing new benefits so that with every partner, achievement or investment with ultra we're providing our partners with earlier access to benefits, I could talk about our program for 30 minutes. I know we don't have time, but the key message here Alteryx is investing in our partner community across the business, recognizing the incredible value that they bring to our customers every day. >>Great Tarik. We'll give you the last word. What should we be looking for from, >>Yeah. Thanks. Thanks, Dave. As BARR mentioned, Ultrix has been the forefront of innovating with us. They've been integrating into making sure again, that customers get the full investment out of snowflake things like in database push down that I talked about before, but extensibility is really what we're excited about. The ability for Altrix to plug into this extensibility framework that we call snow park and to be able to extend out ways that the end users can consume snowflake through, through sequel, which has traditionally been the way that you consume snowflake as well as Java and Scala now Python. So we're excited about those, those capabilities. And then we're also excited about the ability to plug into the data marketplace to provide third party data sets, right? If they're PI day sets and in financial services, third party, data sets and retail. So now customers can build their data applications from end to end using ultrasound snowflake when the comprehensive 360 view of their customers, of their partners, of even their employees. Right. I think it's exciting to see what we're going to be able to do together with these upcoming innovations. >>Great stuff, Bob, Derek, thanks so much for coming on the program. Got to leave it right there in a moment. I'll be back with some closing thoughts in summary, don't go away.

Published Date : Mar 1 2022

SUMMARY :

We're now moving into the eco systems segment the power of many Good to So cloud migration, it's one of the hottest topics. on snowflake to consolidate data across systems into one data cloud with Altryx business the partnership, maybe a little bit about the history, you know, what are the critical aspects that we should really focus And to do that, we completed a rigorous third party helping customers to get going more easily with an Alteryx and snowflake solution. So customers can, can leverage that elastic platform, that being the snowflake data cloud with one of the problems that you guys solved early on, but what are some of the common challenges or patterns or trends to data and everyone can do something with data, it's going to make them competitively, give them a competitive advantage So they can cater to both the everyday business user, And so with Altryx is we're working on third-party big focus on the cloud. So now that that Altrix is moving into the same model, And today we have four cloud products with cloud. the path to insights starting with your snowflake data. You'd go to market strategy. And so we shifted to an industry focus customers to go even faster and start to map to lines of business with Altryx What should we look for in the coming year? blueprints, and extending that at no charge to our partners with snowflake, we're already collaborating with What should we be looking for from, excited about the ability to plug into the data marketplace to provide third party data sets, Got to leave it right there in a moment.

ENTITIES

Entity	Category	Confidence
Derek	PERSON	0.99+
Dave	PERSON	0.99+
Bob	PERSON	0.99+
Altrix	ORGANIZATION	0.99+
Tara	PERSON	0.99+
Bart	PERSON	0.99+
Altryx	ORGANIZATION	0.99+
30 minutes	QUANTITY	0.99+
Terike	PERSON	0.99+
Tarik Dwiek	PERSON	0.99+
Ultrix	ORGANIZATION	0.99+
Barb	PERSON	0.99+
Barb Huelskamp	PERSON	0.99+
Juniper	ORGANIZATION	0.99+
2020	DATE	0.99+
three	QUANTITY	0.99+
Terrick	PERSON	0.99+
Java	TITLE	0.99+
last year	DATE	0.99+
second challenge	QUANTITY	0.99+
One	QUANTITY	0.99+
less than 20%	QUANTITY	0.99+
Alteryx	ORGANIZATION	0.99+
Claudia	PERSON	0.99+
today	DATE	0.99+
Python	TITLE	0.99+
Scala	TITLE	0.99+
ETR	ORGANIZATION	0.99+
a year ago	DATE	0.98+
30%	QUANTITY	0.98+
first one	QUANTITY	0.98+
both	QUANTITY	0.98+
both technologies	QUANTITY	0.98+
one	QUANTITY	0.98+
first analytic	QUANTITY	0.97+
360 view	QUANTITY	0.97+
two great assets	QUANTITY	0.96+
GSI	ORGANIZATION	0.96+
Tarik	PERSON	0.96+
Tableau	TITLE	0.93+
Alteryx	PERSON	0.92+
SAS	ORGANIZATION	0.91+
BARR	ORGANIZATION	0.87+
three particular ones	QUANTITY	0.85+
last two years	DATE	0.85+
Sainsbury	ORGANIZATION	0.84+
BARR	PERSON	0.8+
hundreds of joint customers	QUANTITY	0.74+
hundreds of happy joint customers	QUANTITY	0.7+
mark	PERSON	0.66+

Accelerating Automated Analytics in the Cloud with Alteryx

>>Alteryx is a company with a long history that goes all the way back to the late 1990s. Now the one consistent theme over 20 plus years has been that Ultrix has always been a data company early in the big data and Hadoop cycle. It saw the need to combine and prep different data types so that organizations could analyze data and take action Altrix and similar companies played a critical role in helping companies become data-driven. The problem was the decade of big data, brought a lot of complexities and required immense skills just to get the technology to work as advertised this in turn limited, the pace of adoption and the number of companies that could really lean in and take advantage of the cloud began to change all that and set the foundation for today's theme to Zuora of digital transformation. We hear that phrase a ton digital transformation. >>People used to think it was a buzzword, but of course we learned from the pandemic that if you're not a digital business, you're out of business and a key tenant of digital transformation is democratizing data, meaning enabling, not just hypo hyper specialized experts, but anyone business users to put data to work. Now back to Ultrix, the company has embarked on a major transformation of its own. Over the past couple of years, brought in new management, they've changed the way in which it engaged with customers with the new subscription model and it's topgraded its talent pool. 2021 was even more significant because of two acquisitions that Altrix made hyper Ana and trifecta. Why are these acquisitions important? Well, traditionally Altryx sold to business analysts that were part of the data pipeline. These were fairly technical people who had certain skills and were trained in things like writing Python code with hyper Ana Altryx has added a new persona, the business user, anyone in the business who wanted to gain insights from data and, or let's say use AI without having to be a deep technical expert. >>And then Trifacta a company started in the early days of big data by cube alum, Joe Hellerstein and his colleagues at Berkeley. They knocked down the data engineering persona, and this gives Altryx a complimentary extension into it where things like governance and security are paramount. So as we enter 2022, the post isolation economy is here and we do so with a digital foundation built on the confluence of cloud native technologies, data democratization and machine intelligence or AI, if you prefer. And Altryx is entering that new era with an expanded portfolio, new go-to market vectors, a recurring revenue business model, and a brand new outlook on how to solve customer problems and scale a company. My name is Dave Vellante with the cube and I'll be your host today. And the next hour, we're going to explore the opportunities in this new data market. And we have three segments where we dig into these trends and themes. First we'll talk to Jay Henderson, vice president of product management at Ultrix about cloud acceleration and simplifying complex data operations. Then we'll bring in Suresh Vetol who's the chief product officer at Altrix and Adam Wilson, the CEO of Trifacta, which of course is now part of Altrix. And finally, we'll hear about how Altryx is partnering with snowflake and the ecosystem and how they're integrating with data platforms like snowflake and what this means for customers. And we may have a few surprises sprinkled in as well into the conversation let's get started. >>We're kicking off the program with our first segment. Jay Henderson is the vice president of product management Altryx and we're going to talk about the trends and data, where we came from, how we got here, where we're going. We get some launch news. Well, Jay, welcome to the cube. >>Great to be here, really excited to share some of the things we're working on. >>Yeah. Thank you. So look, you have a deep product background, product management, product marketing, you've done strategy work. You've been around software and data, your entire career, and we're seeing the collision of software data cloud machine intelligence. Let's start with the customer and maybe we can work back from there. So if you're an analytics or data executive in an organization, w J what's your north star, where are you trying to take your company from a data and analytics point of view? >>Yeah, I mean, you know, look, I think all organizations are really struggling to get insights out of their data. I think one of the things that we see is you've got digital exhaust, creating large volumes of data storage is really cheap, so it doesn't cost them much to keep it. And that results in a situation where the organization's, you know, drowning in data, but somehow still starving for insights. And so I think, uh, you know, when I talk to customers, they're really excited to figure out how they can put analytics in the hands of every single person in their organization, and really start to democratize the analytics, um, and, you know, let the, the business users and the whole organization get value out of all that data they have. >>And we're going to dig into that throughout this program data, I like to say is plentiful insights, not always so much. Tell us about your launch today, Jay, and thinking about the trends that you just highlighted, the direction that your customers want to go and the problems that you're solving, what role does the cloud play in? What is what you're launching? How does that fit in? >>Yeah, we're, we're really excited today. We're launching the Altryx analytics cloud. That's really a portfolio of cloud-based solutions that have all been built from the ground up to be cloud native, um, and to take advantage of things like based access. So that it's really easy to give anyone access, including folks on a Mac. Um, it, you know, it also lets you take advantage of elastic compute so that you can do, you know, in database processing and cloud native, um, solutions that are gonna scale to solve the most complex problems. So we've got a portfolio of solutions, things like designer cloud, which is our flagship designer product in a browser and on the cloud, but we've got ultra to machine learning, which helps up-skill regular old analysts with advanced machine learning capabilities. We've got auto insights, which brings a business users into the fold and automatically unearths insights using AI and machine learning. And we've got our latest edition, which is Trifacta that helps data engineers do data pipelining and really, um, you know, create a lot of the underlying data sets that are used in some of this, uh, downstream analytics. >>Let's dig into some of those roles if we could a little bit, I mean, you've traditionally Altryx has served the business analysts and that's what designer cloud is fit for, I believe. And you've explained, you know, kind of the scope, sorry, you've expanded that scope into the, to the business user with hyper Anna. And we're in a moment we're going to talk to Adam Wilson and Suresh, uh, about Trifacta and that recent acquisition takes you, as you said, into the data engineering space in it. But in thinking about the business analyst role, what's unique about designer cloud cloud, and how does it help these individuals? >>Yeah, I mean, you know, really, I go back to some of the feedback we've had from our customers, which is, um, you know, they oftentimes have dozens or hundreds of seats of our designer desktop product, you know, really, as they look to take the next step, they're trying to figure out how do I give access to that? Those types of analytics to thousands of people within the organization and designer cloud is, is really great for that. You've got the browser-based interface. So if folks are on a Mac, they can really easily just pop, open the browser and get access to all of those, uh, prep and blend capabilities to a lot of the analysis we're doing. Um, it's a great way to scale up access to the analytics and then start to put it in the hands of really anyone in the organization, not just those highly skilled power users. >>Okay, great. So now then you add in the hyper Anna acquisition. So now you're targeting the business user Trifacta comes into the mix that deeper it angle that we talked about, how does this all fit together? How should we be thinking about the new Altryx portfolio? >>Yeah, I mean, I think it's pretty exciting. Um, you know, when you think about democratizing analytics and providing access to all these different groups of people, um, you've not been able to do it through one platform before. Um, you know, it's not going to be one interface that meets the, of all these different groups within the organization. You really do need purpose built specialized capabilities for each group. And finally, today with the announcement of the alternates analytics cloud, we brought together all of those different capabilities, all of those different interfaces into a single in the end application. So really finally delivering on the promise of providing analytics to all, >>How much of this you've been able to share with your customers and maybe your partners. I mean, I know OD is fairly new, but if you've been able to get any feedback from them, what are they saying about it? >>Uh, I mean, it's, it's pretty amazing. Um, we ran a early access, limited availability program that led us put a lot of this technology in the hands of over 600 customers, um, over the last few months. So we have gotten a lot of feedback. I tell you, um, it's been overwhelmingly positive. I think organizations are really excited to unlock the insights that have been hidden in all this data. They've got, they're excited to be able to use analytics in every decision that they're making so that the decisions they have or more informed and produce better business outcomes. Um, and, and this idea that they're going to move from, you know, dozens to hundreds or thousands of people who have access to these kinds of capabilities, I think has been a really exciting thing that is going to accelerate the transformation that these customers are on. >>Yeah, those are good. Good, good numbers for, for preview mode. Let's, let's talk a little bit about vision. So it's democratizing data is the ultimate goal, which frankly has been elusive for most organizations over time. How's your cloud going to address the challenges of putting data to work across the entire enterprise? >>Yeah, I mean, I tend to think about the future and some of the investments we're making in our products and our roadmap across four big themes, you know, in the, and these are really kind of enduring themes that you're going to see us making investments in over the next few years, the first is having cloud centricity. You know, the data gravity has been moving to the cloud. We need to be able to provide access, to be able to ingest and manipulate that data, to be able to write back to it, to provide cloud solution. So the first one is really around cloud centricity. The second is around big data fluency. Once you have all of the data, you need to be able to manipulate it in a performant manner. So having the elastic cloud infrastructure and in database processing is so important, the third is around making AI a strategic advantage. >>So, uh, you know, getting everyone involved and accessing AI and machine learning to unlock those insights, getting it out of the hands of the small group of data scientists, putting it in the hands of analysts and business users. Um, and then the fourth thing is really providing access across the entire organization. You know, it and data engineers, uh, as well as business owners and analysts. So, um, cloud centricity, big data fluency, um, AI is a strategic advantage and, uh, personas across the organization are really the four big themes you're going to see us, uh, working on over the next few months and, uh, coming coming year. >>That's good. Thank you for that. So, so on a related question, how do you see the data organizations evolving? I mean, traditionally you've had, you know, monolithic organizations, uh, very specialized or I might even say hyper specialized roles and, and your, your mission of course is the customer. You, you, you, you and your customers, they want to democratize the data. And so it seems logical that domain leaders are going to take more responsibility for data, life cycles, data ownerships, low code becomes more important. And perhaps this kind of challenges, the historically highly centralized and really specialized roles that I just talked about. How do you see that evolving and, and, and what role will Altryx play? >>Yeah. Um, you know, I think we'll see sort of a more federated systems start to emerge. Those centralized groups are going to continue to exist. Um, but they're going to start to empower, you know, in a much more de-centralized way, the people who are closer to the business problems and have better business understanding. I think that's going to let the centralized highly skilled teams work on, uh, problems that are of higher value to the organization. The kinds of problems where one or 2% lift in the model results in millions of dollars a day for the business. And then by pushing some of the analytics out to, uh, closer to the edge and closer to the business, you'll be able to apply those analytics in every single decision. So I think you're going to see, you know, both the decentralized and centralized models start to work in harmony and a little bit more about almost a federated sort of a way. And I think, you know, the exciting thing for us at Altryx is, you know, we want to facilitate that. We want to give analytic capabilities and solutions to both groups and types of people. We want to help them collaborate better, um, and drive business outcomes with the analytics they're using. >>Yeah. I mean, I think my take on another one, if you could comment is to me, the technology should be an operational detail and it has been the, the, the dog that wags the tail, or maybe the other way around, you mentioned digital exhaust before. I mean, essentially it's digital exhaust coming out of operationals systems that then somehow, eventually end up in the hand of the domain users. And I wonder if increasingly we're going to see those domain users, users, those, those line of business experts get more access. That's your goal. And then even go beyond analytics, start to build data products that could be monetized, and that maybe it's going to take a decade to play out, but that is sort of a new era of data. Do you see it that way? >>Absolutely. We're actually making big investments in our products and capabilities to be able to create analytic applications and to enable somebody who's an analyst or business user to create an application on top of the data and analytics layers that they have, um, really to help democratize the analytics, to help prepackage some of the analytics that can drive more insights. So I think that's definitely a trend we're going to see more. >>Yeah. And to your point, if you can federate the governance and automate that, then that can happen. I mean, that's a key part of it, obviously. So, all right, Jay, we have to leave it there up next. We take a deep dive into the Altryx recent acquisition of Trifacta with Adam Wilson who led Trifacta for more than seven years. It's the recipe. Tyler is the chief product officer at Altryx to explain the rationale behind the acquisition and how it's going to impact customers. Keep it right there. You're watching the cube. You're a leader in enterprise tech coverage. >>It's go time, get ready to accelerate your data analytics journey with a unified cloud native platform. That's accessible for everyone on the go from home to office and everywhere in between effortless analytics to help you go from ideas to outcomes and no time. It's your time to shine. It's Altryx analytics cloud time. >>Okay. We're here with. Who's the chief product officer at Altryx and Adam Wilson, the CEO of Trifacta. Now of course, part of Altryx just closed this quarter. Gentlemen. Welcome. >>Great to be here. >>Okay. So let me start with you. In my opening remarks, I talked about Altrix is traditional position serving business analysts and how the hyper Anna acquisition brought you deeper into the business user space. What does Trifacta bring to your portfolio? Why'd you buy the company? >>Yeah. Thank you. Thank you for the question. Um, you know, we see, uh, we see a massive opportunity of helping, um, brands, um, democratize the use of analytics across their business. Um, every knowledge worker, every individual in the company should have access to analytics. It's no longer optional, um, as they navigate their businesses with that in mind, you know, we know designer and are the products that Altrix has been selling the past decade or so do a really great job, um, addressing the business analysts, uh, with, um, hyper Rana now kind of renamed, um, Altrix auto. We even speak with the business owner and the line of business owner. Who's looking for insights that aren't real in traditional dashboards and so on. Um, but we see this opportunity of really helping the data engineering teams and it organizations, um, to also make better use of analytics. Um, and that's where the drive factor comes in for us. Um, drive factor has the best data engineering cloud in the planet. Um, they have an established track record of working across multiple cloud platforms and helping data engineers, um, do better data pipelining and work better with, uh, this massive kind of cloud transformation that's happening in every business. Um, and so fact made so much sense for us. >>Yeah. Thank you for that. I mean, you, look, you could have built it yourself would have taken, you know, who knows how long, you know, but, uh, so definitely a great time to market move, Adam. I wonder if we could dig into Trifacta some more, I mean, I remember interviewing Joe Hellerstein in the early days. You've talked about this as well, uh, on the cube coming at the problem of taking data from raw refined to an experience point of view. And Joe in the early days, talked about flipping the model and starting with data visualization, something Jeff, her was expert at. So maybe explain how we got here. We used to have this cumbersome process of ETL and you may be in some others changed that model with ELL and then T explain how Trifacta really changed the data engineering game. >>Yeah, that's exactly right. Uh, David, it's been a really interesting journey for us because I think the original hypothesis coming out of the campus research, uh, at Berkeley and Stanford that really birth Trifacta was, you know, why is it that the people who know the data best can't do the work? You know, why is this become the exclusive purview of the highly technical? And, you know, can we rethink this and make this a user experience, problem powered by machine learning that will take some of the more complicated things that people want to do with data and really help to automate those. So, so a broader set of, of users can, um, can really see for themselves and help themselves. And, and I think that, um, there was a lot of pent up frustration out there because people have been told for, you know, for a decade now to be more data-driven and then the whole time they're saying, well, then give me the data, you know, in the shape that I could use it with the right level of quality and I'm happy to be, but don't tell me to be more data-driven and then, and, and not empower me, um, to, to get in there and to actually start to work with the data in meaningful ways. >>And so, um, that was really, you know, what, you know, the origin story of the company and I think is, as we, um, saw over the course of the last 5, 6, 7 years that, um, you know, uh, real, uh, excitement to embrace this idea of, of trying to think about data engineering differently, trying to democratize the, the ETL process and to also leverage all these exciting new, uh, engines and platforms that are out there that allow for processing, you know, ever more diverse data sets, ever larger data sets and new and interesting ways. And that's where a lot of the push-down or the ELT approaches that, you know, I think it could really won the day. Um, and that, and that for us was a hallmark of the solution from the very beginning. >>Yeah, this is a huge point that you're making is, is first of all, there's a large business, it's probably about a hundred billion dollar Tam. Uh, and the, the point you're making, because we've looked, we've contextualized most of our operational systems, but the big data pipeline is hasn't gotten there. But, and maybe we could talk about that a little bit because democratizing data is Nirvana, but it's been historically very difficult. You've got a number of companies it's very fragmented and they're all trying to attack their little piece of the problem to achieve an outcome, but it's been hard. And so what's going to be different about Altryx as you bring these puzzle pieces together, how is this going to impact your customers who would like to take that one? >>Yeah, maybe, maybe I'll take a crack at it. And Adam will, um, add on, um, you know, there hasn't been a single platform for analytics, automation in the enterprise, right? People have relied on, uh, different products, um, to solve kind of, uh, smaller problems, um, across this analytics, automation, data transformation domain. Um, and, um, I think uniquely Alcon's has that opportunity. Uh, we've got 7,000 plus customers who rely on analytics for, um, data management, for analytics, for AI and ML, uh, for transformations, uh, for reporting and visualization for automated insights and so on. Um, and so by bringing drive factor, we have the opportunity to scale this even further and solve for more use cases, expand the scenarios where it's applied and so multiple personas. Um, and we just talked about the data engineers. They are really a growing stakeholder in this transformation of data and analytics. >>Yeah, good. Maybe we can stay on this for a minute cause you, you you're right. You bring it together. Now at least three personas the business analyst, the end user slash business user. And now the data engineer, which is really out of an it role in a lot of companies, and you've used this term, the data engineering cloud, what is that? How is it going to integrate in with, or support these other personas? And, and how's it going to integrate into the broader ecosystem of clouds and cloud data warehouses or any other data stores? >>Yeah, no, that's great. Uh, yeah, I think for us, we really looked at this and said, you know, we want to build an open and interactive cloud platform for data engineers, you know, to collaboratively profile pipeline, um, and prepare data for analysis. And that really meant collaborating with the analysts that were in the line of business. And so this is why a big reason why this combination is so magic because ultimately if we can get the data engineers that are creating the data products together with the analysts that are in the line of business that are driving a lot of the decision making and allow for that, what I would describe as collaborative curation of the data together, so that you're starting to see, um, uh, you know, increasing returns to scale as this, uh, as this rolls out. I just think that is an incredibly powerful combination and, and frankly, something that the market is not crack the code on yet. And so, um, I think when we, when I sat down with Suresh and with mark and the team at Ultrix, that was really part of the, the, the big idea, the big vision that was painted and got us really energized about the acquisition and about the potential of the combination. >>And you're really, you're obviously writing the cloud and the cloud native wave. Um, and, but specifically we're seeing, you know, I almost don't even want to call it a data warehouse anyway, because when you look at what's, for instance, Snowflake's doing, of course their marketing is around the data cloud, but I actually think there's real justification for that because it's not like the traditional data warehouse, right. It's, it's simplified get there fast, don't necessarily have to go through the central organization to share data. Uh, and, and, and, but it's really all about simplification, right? Isn't that really what the democratization comes down to. >>Yeah. It's simplification and collaboration. Right. I don't want to, I want to kind of just what Adam said resonates with me deeply. Um, analytics is one of those, um, massive disciplines inside an enterprise that's really had the weakest of tools. Um, and we just have interfaces to collaborate with, and I think truly this was all drinks and a superpower was helping the analysts get more out of their data, get more out of the analytics, like imagine a world where these people are collaborating and sharing insights in real time and sharing workflows and getting access to new data sources, um, understanding data models better, I think, um, uh, curating those insights. I boring Adam's phrase again. Um, I think that creates a real value inside the organization because frankly in scaling analytics and democratizing analytics and data, we're still in such early phases of this journey. >>So how should we think about designer cloud, which is from Altrix it's really been the on-prem and the server desktop offering. And of course Trifacta is with cloud cloud data warehouses. Right. Uh, how, how should we think about those two products? Yeah, >>I think, I think you should think about them. And, uh, um, as, as very complimentary right designer cloud really shares a lot of DNA and heritage with, uh, designer desktop, um, the low code tooling and that interface, uh, the really appeals to the business analysts, um, and gets a lot of the things that they do well, we've also built it with interoperability in mind, right. So if you started building your workflows in designer desktop, you want to share that with design and cloud, we want to make it super easy for you to do that. Um, and I think over time now we're only a week into, um, this Alliance with, um, with, um, Trifacta, um, I think we have to get deeper inside to think about what does the data engineer really need? What's the business analysts really need and how to design a cloud, and Trifacta really support both of those requirements, uh, while kind of continue to build on the trifecta on the amazing Trifacta cloud platform. >>You know, >>I think we're just going to say, I think that's one of the things that, um, you know, creates a lot of, uh, opportunity as we go forward, because ultimately, you know, Trifacta took a platform, uh, first mentality to everything that we built. So thinking about openness and extensibility and, um, and how over time people could build things on top of factor that are a variety of analytic tool chain, or analytic applications. And so, uh, when you think about, um, Ultrix now starting to, uh, to move some of its capabilities or to provide additional capabilities, uh, in the cloud, um, you know, Trifacta becomes a platform that can accelerate, you know, all of that work and create, uh, uh, a cohesive set of, of cloud-based services that, um, share a common platform. And that maintains independence because both companies, um, have been, uh, you know, fiercely independent, uh, and, and really giving people choice. >>Um, so making sure that whether you're, uh, you know, picking one cloud platform and other, whether you're running things on the desktop, uh, whether you're running in hybrid environments, that, um, no matter what your decision, um, you're always in a position to be able to get out your data. You're always in a position to be able to cleanse transform shape structure, that data, and ultimately to deliver, uh, the analytics that you need. And so I think in that sense, um, uh, you know, this, this again is another reason why the combination, you know, fits so well together, giving people, um, the choice. Um, and as they, as they think about their analytics strategy and their platform strategy going forward, >>Yeah. I make a chuckle, but one of the reasons I always liked Altrix is cause you kinda did the little end run on it. It can be a blocker sometimes, but that created problems, right? Because the organization said, wow, this big data stuff has taken off, but we need security. We need governance. And it's interesting because you've got, you know, ETL has been complex, whereas the visualization tools, they really, you know, really weren't great at governance and security. It took some time there. So that's not, not their heritage. You're bringing those worlds together. And I'm interested, you guys just had your sales kickoff, you know, what was their reaction like? Uh, maybe Suresh, you could start off and maybe Adam, you could bring us home. >>Um, thanks for asking about our sales kickoff. So we met for the first time and you've got a two years, right. For, as, as it is for many of us, um, in person, uh, um, which I think was a, was a real breakthrough as Qualtrics has been on its transformation journey. Uh, we added a Trifacta to, um, the, the potty such as the tour, um, and getting all of our sales teams and product organizations, um, to meet in person in one location. I thought that was very powerful for other the company. Uh, but then I tell you, um, um, the reception for Trifacta was beyond anything I could have imagined. Uh, we were working out him and I will, when he's so hot on, on the deal and the core hypotheses and so on. And then you step back and you're going to share the vision with the field organization, and it blows you away, the energy that it creates among our sellers out of partners. >>And I'm sure Madam will and his team were mocked, um, every single day, uh, with questions and opportunities to bring them in. But Adam, maybe you should share. Yeah, no, it was, uh, it was through the roof. I mean, uh, uh, the, uh, the amount of energy, the, uh, certainly how welcoming everybody was, uh, uh, you know, just, I think the story makes so much sense together. I think culturally, the company is, are very aligned. Um, and, uh, it was a real, uh, real capstone moment, uh, to be able to complete the acquisition and to, and to close and announced, you know, at the kickoff event. And, um, I think, you know, for us, when we really thought about it, you know, when we ended, the story that we told was just, you have this opportunity to really cater to what the end users care about, which is a lot about interactivity and self-service, and at the same time. >>And that's, and that's a lot of the goodness that, um, that Altryx is, has brought, you know, through, you know, you know, years and years of, of building a very vibrant community of, you know, thousands, hundreds of thousands of users. And on the other side, you know, Trifacta bringing in this data engineering focus, that's really about, uh, the governance things that you mentioned and the openness, um, that, that it cares deeply about. And all of a sudden, now you have a chance to put that together into a complete story where the data engineering cloud and analytics, automation, you know, coming together. And, um, and I just think, you know, the lights went on, um, you know, for people instantaneously and, you know, this is a story that, um, that I think the market is really hungry for. And certainly the reception we got from, uh, from the broader team at kickoff was, uh, was a great indication. >>Well, I think the story hangs together really well, you know, one of the better ones I've seen in, in this space, um, and, and you guys coming off a really, really strong quarter. So congratulations on that jets. We have to leave it there. I really appreciate your time today. Yeah. Take a look at this short video. And when we come back, we're going to dig into the ecosystem and the integration into cloud data warehouses and how leading organizations are creating modern data teams and accelerating their digital businesses. You're watching the cube you're leader in enterprise tech coverage. >>This is your data housed neatly insecurely in the snowflake data cloud. And all of it has potential the potential to solve complex business problems, deliver personalized financial offerings, protect supply chains from disruption, cut costs, forecast, grow and innovate. All you need to do is put your data in the hands of the right people and give it an opportunity. Luckily for you. That's the easy part because snowflake works with Alteryx and Alteryx turns data into breakthroughs with just a click. Your organization can automate analytics with drag and drop building blocks, easily access snowflake data with both sequel and no SQL options, share insights, powered by Alteryx data science and push processing to snowflake for lightning, fast performance, you get answers you can put to work in your teams, get repeatable processes they can share in that's exciting because not only is your data no longer sitting around in silos, it's also mobilized for the next opportunity. Turn your data into a breakthrough Alteryx and snowflake >>Okay. We're back here in the queue, focusing on the business promise of the cloud democratizing data, making it accessible and enabling everyone to get value from analytics, insights, and data. We're now moving into the eco systems segment the power of many versus the resources of one. And we're pleased to welcome. Barb Hills camp was the senior vice president partners and alliances at Ultrix and a special guest Terek do week head of technology alliances at snowflake folks. Welcome. Good to see you. >>Thank you. Thanks for having me. Good to see >>Dave. Great to see you guys. So cloud migration, it's one of the hottest topics. It's the top one of the top initiatives of senior technology leaders. We have survey data with our partner ETR it's number two behind security, and just ahead of analytics. So we're hovering around all the hot topics here. Barb, what are you seeing with respect to customer, you know, cloud migration momentum, and how does the Ultrix partner strategy fit? >>Yeah, sure. Partners are central company's strategy. They always have been. We recognize that our partners have deep customer relationships. And when you connect that with their domain expertise, they're really helping customers on their cloud and business transformation journey. We've been helping customers achieve their desired outcomes with our partner community for quite some time. And our partner base has been growing an average of 30% year over year, that partner community and strategy now addresses several kinds of partners, spanning solution providers to global SIS and technology partners, such as snowflake and together, we help our customers realize the business promise of their journey to the cloud. Snowflake provides a scalable storage system altereds provides the business user friendly front end. So for example, it departments depend on snowflake to consolidate data across systems into one data cloud with Altryx business users can easily unlock that data in snowflake solving real business outcomes. Our GSI and solution provider partners are instrumental in providing that end to end benefit of a modern analytic stack in the cloud providing platform, guidance, deployment, support, and other professional services. >>Great. Let's get a little bit more into the relationship between Altrix and S in snowflake, the partnership, maybe a little bit about the history, you know, what are the critical aspects that we should really focus on? Barb? Maybe you could start an Interra kindly way in as well. >>Yeah, so the relationship started in 2020 and all shirts made a big bag deep with snowflake co-innovating and optimizing cloud use cases together. We are supporting customers who are looking for that modern analytic stack to replace an old one or to implement their first analytic strategy. And our joint customers want to self-serve with data-driven analytics, leveraging all the benefits of the cloud, scalability, accessibility, governance, and optimizing their costs. Um, Altrix proudly achieved. Snowflake's highest elite tier in their partner program last year. And to do that, we completed a rigorous third party testing process, which also helped us make some recommended improvements to our joint stack. We wanted customers to have confidence. They would benefit from high quality and performance in their investment with us then to help customers get the most value out of the destroyed solution. We developed two great assets. One is the officer starter kit for snowflake, and we coauthored a joint best practices guide. >>The starter kit contains documentation, business workflows, and videos, helping customers to get going more easily with an altered since snowflake solution. And the best practices guide is more of a technical document, bringing together experiences and guidance on how Altryx and snowflake can be deployed together. Internally. We also built a full enablement catalog resources, right? We wanted to provide our account executives more about the value of the snowflake relationship. How do we engage and some best practices. And now we have hundreds of joint customers such as Juniper and Sainsbury who are actively using our joint solution, solving big business problems much faster. >>Cool. Kara, can you give us your perspective on the partnership? >>Yeah, definitely. Dave, so as Barb mentioned, we've got this standing very successful partnership going back years with hundreds of happy joint customers. And when I look at the beginning, Altrix has helped pioneer the concept of self-service analytics, especially with use cases that we worked on with for, for data prep for BI users like Tableau and as Altryx has evolved to now becoming from data prep to now becoming a full end to end data science platform. It's really opened up a lot more opportunities for our partnership. Altryx has invested heavily over the last two years in areas of deep integration for customers to fully be able to expand their investment, both technologies. And those investments include things like in database pushed down, right? So customers can, can leverage that elastic platform, that being the snowflake data cloud, uh, with Alteryx orchestrating the end to end machine learning workflows Alteryx also invested heavily in snow park, a feature we released last year around this concept of data programmability. So all users were regardless of their business analysts, regardless of their data, scientists can use their tools of choice in order to consume and get at data. And now with Altryx cloud, we think it's going to open up even more opportunities. It's going to be a big year for the partnership. >>Yeah. So, you know, Terike, we we've covered snowflake pretty extensively and you initially solve what I used to call the, I still call the snake swallowing the basketball problem and cloud data warehouse changed all that because you had virtually infinite resources, but so that's obviously one of the problems that you guys solved early on, but what are some of the common challenges or patterns or trends that you see with snowflake customers and where does Altryx come in? >>Sure. Dave there's there's handful, um, that I can come up with today, the big challenges or trends for us, and Altrix really helps us across all of them. Um, there are three particular ones I'm going to talk about the first one being self-service analytics. If we think about it, every organization is trying to democratize data. Every organization wants to empower all their users, business users, um, you know, the, the technology users, but the business users, right? I think every organization has realized that if everyone has access to data and everyone can do something with data, it's going to make them competitively, give them a competitive advantage with Altrix is something we share that vision of putting that power in the hands of everyday users, regardless of the skillsets. So, um, with self-service analytics, with Ultrix designer they've they started out with self-service analytics as the forefront, and we're just scratching the surface. >>I think there was an analyst, um, report that shows that less than 20% of organizations are truly getting self-service analytics to their end users. Now, with Altryx going to Ultrix cloud, we think that's going to be a huge opportunity for us. Um, and then that opens up the second challenge, which is machine learning and AI, every organization is trying to get predictive analytics into every application that they have in order to be competitive in order to be competitive. Um, and with Altryx creating this platform so they can cater to both the everyday business user, the quote unquote, citizen data scientists, and making a code friendly for data scientists to be able to get at their notebooks and all the different tools that they want to use. Um, they fully integrated in our snow park platform, which I talked about before, so that now we get an end to end solution caring to all, all lines of business. >>And then finally this concept of data marketplaces, right? We, we created snowflake from the ground up to be able to solve the data sharing problem, the big data problem, the data sharing problem. And Altryx um, if we look at mobilizing your data, getting access to third-party datasets, to enrich with your own data sets, to enrich with, um, with your suppliers and with your partners, data sets, that's what all customers are trying to do in order to get a more comprehensive 360 view, um, within their, their data applications. And so with Altryx alterations, we're working on third-party data sets and marketplaces for quite some time. Now we're working on how do we integrate what Altrix is providing with the snowflake data marketplace so that we can enrich these workflows, these great, great workflows that Altrix writing provides. Now we can add third party data into that workflow. So that opens up a ton of opportunities, Dave. So those are three I see, uh, easily that we're going to be able to solve a lot of customer challenges with. >>So thank you for that. Terrick so let's stay on cloud a little bit. I mean, Altrix is undergoing a major transformation, big focus on the cloud. How does this cloud launch impact the partnership Terike from snowflakes perspective and then Barb, maybe, please add some color. >>Yeah, sure. Dave snowflake started as a cloud data platform. We saw our founders really saw the challenges that customers are having with becoming data-driven. And the biggest challenge was the complexity of having imagine infrastructure to even be able to do it, to get applications off the ground. And so we created something to be cloud-native. We created to be a SAS managed service. So now that that Altrix is moving to the same model, right? A cloud platform, a SAS managed service, we're just, we're just removing more of the friction. So we're going to be able to start to package these end to end solutions that are SAS based that are fully managed. So customers can, can go faster and they don't have to worry about all of the underlying complexities of, of, of stitching things together. Right? So, um, so that's, what's exciting from my viewpoint >>And I'll follow up. So as you said, we're investing heavily in the cloud a year ago, we had two pre desktop products, and today we have four cloud products with cloud. We can provide our users with more flexibility. We want to make it easier for the users to leverage their snowflake data in the Alteryx platform, whether they're using our beloved on-premise solution or the new cloud products were committed to that continued investment in the cloud, enabling our joint partner solutions to meet customer requirements, wherever they store their data. And we're working with snowflake, we're doing just that. So as customers look for a modern analytic stack, they expect that data to be easily accessible, right within a fast, secure and scalable platform. And the launch of our cloud strategy is a huge leap forward in making Altrix more widely accessible to all users in all types of roles, our GSI and our solution provider partners have asked for these cloud capabilities at scale, and they're excited to better support our customers, cloud and analytic >>Are. How about you go to market strategy? How would you describe your joint go to market strategy with snowflake? >>Sure. It's simple. We've got to work backwards from our customer's challenges, right? Driving transformation to solve problems, gain efficiencies, or help them save money. So whether it's with snowflake or other GSI, other partner types, we've outlined a joint journey together from recruit solution development, activation enablement, and then strengthening our go to market strategies to optimize our results together. We launched an updated partner program and within that framework, we've created new benefits for our partners around opportunity registration, new role based enablement and training, basically extending everything we do internally for our own go-to-market teams to our partners. We're offering partner, marketing resources and funding to reach new customers together. And as a matter of fact, we recently launched a fantastic video with snowflake. I love this video that very simply describes the path to insights starting with your snowflake data. Right? We do joint customer webinars. We're working on joint hands-on labs and have a wonderful landing page with a lot of assets for our customers. Once we have an interested customer, we engage our respective account managers, collaborating through discovery questions, proof of concepts really showcasing the desired outcome. And when you combine that with our partners technology or domain expertise, it's quite powerful, >>Dark. How do you see it? You'll go to market strategy. >>Yeah. Dave we've. Um, so we initially started selling, we initially sold snowflake as technology, right? Uh, looking at positioning the diff the architectural differentiators and the scale and concurrency. And we noticed as we got up into the larger enterprise customers, we're starting to see how do they solve their business problems using the technology, as well as them coming to us and saying, look, we want to also know how do you, how do you continue to map back to the specific prescriptive business problems we're having? And so we shifted to an industry focus last year, and this is an area where Altrix has been mature for probably since their inception selling to the line of business, right? Having prescriptive use cases that are particular to an industry like financial services, like retail, like healthcare and life sciences. And so, um, Barb talked about these, these starter kits where it's prescriptive, you've got a demo and, um, a way that customers can get off the ground and running, right? >>Cause we want to be able to shrink that time to market, the time to value that customers can watch these applications. And we want to be able to, to tell them specifically how we can map back to their business initiatives. So I see a huge opportunity to align on these industry solutions. As BARR mentioned, we're already doing that where we've released a few around financial services working in healthcare and retail as well. So that is going to be a way for us to allow customers to go even faster and start to map two lines of business with Alteryx. >>Great. Thanks Derek. Bob, what can we expect if we're observing this relationship? What should we look for in the coming year? >>A lot specifically with snowflake, we'll continue to invest in the partnership. Uh, we're co innovators in this journey, including snow park extensibility efforts, which Derek will tell you more about shortly. We're also launching these great news strategic solution blueprints, and extending that at no charge to our partners with snowflake, we're already collaborating with their retail and CPG team for industry blueprints. We're working with their data marketplace team to highlight solutions, working with that data in their marketplace. More broadly, as I mentioned, we're relaunching the ultra partner program designed to really better support the unique partner types in our global ecosystem, introducing new benefits so that with every partner, achievement or investment with ultra score, providing our partners with earlier access to benefits, um, I could talk about our program for 30 minutes. I know we don't have time. The key message here Alteryx is investing in our partner community across the business, recognizing the incredible value that they bring to our customers every day. >>Tarik will give you the last word. What should we be looking for from, >>Yeah, thanks. Thanks, Dave. As BARR mentioned, Altrix has been the forefront of innovating with us. They've been integrating into, uh, making sure again, that customers get the full investment out of snowflake things like in database push down that I talked about before that extensibility is really what we're excited about. Um, the ability for Ultrix to plug into this extensibility framework that we call snow park and to be able to extend out, um, ways that the end users can consume snowflake through, through sequel, which has traditionally been the way that you consume snowflake as well as Java and Scala, not Python. So we're excited about those, those capabilities. And then we're also excited about the ability to plug into the data marketplace to provide third party data sets, right there probably day sets in, in financial services, third party, data sets and retail. So now customers can build their data applications from end to end using ultrasound snowflake when the comprehensive 360 view of their customers, of their partners, of even their employees. Right? I think it's exciting to see what we're going to be able to do together with these upcoming innovations. Great >>Barb Tara, thanks so much for coming on the program, got to leave it right there in a moment, I'll be back with some closing thoughts in a summary, don't go away. >>1200 hours of wind tunnel testing, 30 million race simulations, 2.4 second pit stops make that 2.3. The sector times out the wazoo, whites are much of this velocity's pressures, temperatures, 80,000 components generating 11.8 billion data points and one analytics platform to make sense of it all. When McLaren needs to turn complex data into insights, they turn to Altryx Qualtrics analytics, automation, >>Okay, let's summarize and wrap up the session. We can pretty much agree the data is plentiful, but organizations continue to struggle to get maximum value out of their data investments. The ROI has been elusive. There are many reasons for that complexity data, trust silos, lack of talent and the like, but the opportunity to transform data operations and drive tangible value is immense collaboration across various roles. And disciplines is part of the answer as is democratizing data. This means putting data in the hands of those domain experts that are closest to the customer and really understand where the opportunity exists and how to best address them. We heard from Jay Henderson that we have all this data exhaust and cheap storage. It allows us to keep it for a long time. It's true, but as he pointed out that doesn't solve the fundamental problem. Data is spewing out from our operational systems, but much of it lacks business context for the data teams chartered with analyzing that data. >>So we heard about the trend toward low code development and federating data access. The reason this is important is because the business lines have the context and the more responsibility they take for data, the more quickly and effectively organizations are going to be able to put data to work. We also talked about the harmonization between centralized teams and enabling decentralized data flows. I mean, after all data by its very nature is distributed. And importantly, as we heard from Adam Wilson and Suresh Vittol to support this model, you have to have strong governance and service the needs of it and engineering teams. And that's where the trifecta acquisition fits into the equation. Finally, we heard about a key partnership between Altrix and snowflake and how the migration to cloud data warehouses is evolving into a global data cloud. This enables data sharing across teams and ecosystems and vertical markets at massive scale all while maintaining the governance required to protect the organizations and individuals alike. >>This is a new and emerging business model that is very exciting and points the way to the next generation of data innovation in the coming decade. We're decentralized domain teams get more facile access to data. Self-service take more responsibility for quality value and data innovation. While at the same time, the governance security and privacy edicts of an organization are centralized in programmatically enforced throughout an enterprise and an external ecosystem. This is Dave Volante. All these videos are available on demand@theqm.net altrix.com. Thanks for watching accelerating automated analytics in the cloud made possible by Altryx. And thanks for watching the queue, your leader in enterprise tech coverage. We'll see you next time.

Published Date : Mar 1 2022

SUMMARY :

It saw the need to combine and prep different data types so that organizations anyone in the business who wanted to gain insights from data and, or let's say use AI without the post isolation economy is here and we do so with a digital We're kicking off the program with our first segment. So look, you have a deep product background, product management, product marketing, And that results in a situation where the organization's, you know, the direction that your customers want to go and the problems that you're solving, what role does the cloud and really, um, you know, create a lot of the underlying data sets that are used in some of this, into the, to the business user with hyper Anna. of our designer desktop product, you know, really, as they look to take the next step, comes into the mix that deeper it angle that we talked about, how does this all fit together? analytics and providing access to all these different groups of people, um, How much of this you've been able to share with your customers and maybe your partners. Um, and, and this idea that they're going to move from, you know, So it's democratizing data is the ultimate goal, which frankly has been elusive for most You know, the data gravity has been moving to the cloud. So, uh, you know, getting everyone involved and accessing AI and machine learning to unlock seems logical that domain leaders are going to take more responsibility for data, And I think, you know, the exciting thing for us at Altryx is, you know, we want to facilitate that. the tail, or maybe the other way around, you mentioned digital exhaust before. the data and analytics layers that they have, um, really to help democratize the We take a deep dive into the Altryx recent acquisition of Trifacta with Adam Wilson It's go time, get ready to accelerate your data analytics journey the CEO of Trifacta. serving business analysts and how the hyper Anna acquisition brought you deeper into the with that in mind, you know, we know designer and are the products And Joe in the early days, talked about flipping the model that really birth Trifacta was, you know, why is it that the people who know the data best can't And so, um, that was really, you know, what, you know, the origin story of the company but the big data pipeline is hasn't gotten there. um, you know, there hasn't been a single platform for And now the data engineer, which is really And so, um, I think when we, when I sat down with Suresh and with mark and the team and, but specifically we're seeing, you know, I almost don't even want to call it a data warehouse anyway, Um, and we just have interfaces to collaborate And of course Trifacta is with cloud cloud data warehouses. What's the business analysts really need and how to design a cloud, and Trifacta really support both in the cloud, um, you know, Trifacta becomes a platform that can You're always in a position to be able to cleanse transform shape structure, that data, and ultimately to deliver, And I'm interested, you guys just had your sales kickoff, you know, what was their reaction like? And then you step back and you're going to share the vision with the field organization, and to close and announced, you know, at the kickoff event. And certainly the reception we got from, Well, I think the story hangs together really well, you know, one of the better ones I've seen in, in this space, And all of it has potential the potential to solve complex business problems, We're now moving into the eco systems segment the power of many Good to see So cloud migration, it's one of the hottest topics. on snowflake to consolidate data across systems into one data cloud with Altryx business the partnership, maybe a little bit about the history, you know, what are the critical aspects that we should really focus Yeah, so the relationship started in 2020 and all shirts made a big bag deep with snowflake And the best practices guide is more of a technical document, bringing together experiences and guidance So customers can, can leverage that elastic platform, that being the snowflake data cloud, one of the problems that you guys solved early on, but what are some of the common challenges or patterns or trends everyone has access to data and everyone can do something with data, it's going to make them competitively, application that they have in order to be competitive in order to be competitive. to enrich with your own data sets, to enrich with, um, with your suppliers and with your partners, So thank you for that. So now that that Altrix is moving to the same model, And the launch of our cloud strategy How would you describe your joint go to market strategy the path to insights starting with your snowflake data. You'll go to market strategy. And so we shifted to an industry focus So that is going to be a way for us to allow What should we look for in the coming year? blueprints, and extending that at no charge to our partners with snowflake, we're already collaborating with Tarik will give you the last word. Um, the ability for Ultrix to plug into this extensibility framework that we call Barb Tara, thanks so much for coming on the program, got to leave it right there in a moment, I'll be back with 11.8 billion data points and one analytics platform to make sense of it all. This means putting data in the hands of those domain experts that are closest to the customer are going to be able to put data to work. While at the same time, the governance security and privacy edicts

ENTITIES

Entity	Category	Confidence
Derek	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Suresh Vetol	PERSON	0.99+
Altryx	ORGANIZATION	0.99+
Jay	PERSON	0.99+
Joe Hellerstein	PERSON	0.99+
Dave	PERSON	0.99+
Dave Volante	PERSON	0.99+
Altrix	ORGANIZATION	0.99+
Jay Henderson	PERSON	0.99+
David	PERSON	0.99+
Adam	PERSON	0.99+
Barb	PERSON	0.99+
Jeff	PERSON	0.99+
2020	DATE	0.99+
Bob	PERSON	0.99+
Trifacta	ORGANIZATION	0.99+
Suresh Vittol	PERSON	0.99+
Tyler	PERSON	0.99+
Juniper	ORGANIZATION	0.99+
Alteryx	ORGANIZATION	0.99+
Ultrix	ORGANIZATION	0.99+
30 minutes	QUANTITY	0.99+
Terike	PERSON	0.99+
Adam Wilson	PERSON	0.99+
Joe	PERSON	0.99+
Suresh	PERSON	0.99+
Terrick	PERSON	0.99+
demand@theqm.net	OTHER	0.99+
thousands	QUANTITY	0.99+
Alcon	ORGANIZATION	0.99+
Kara	PERSON	0.99+
last year	DATE	0.99+
three	QUANTITY	0.99+
Qualtrics	ORGANIZATION	0.99+
less than 20%	QUANTITY	0.99+
hundreds	QUANTITY	0.99+
one	QUANTITY	0.99+
One	QUANTITY	0.99+
Java	TITLE	0.99+
more than seven years	QUANTITY	0.99+
two acquisitions	QUANTITY	0.99+

Benoit Dageville, Snowflake | AWS re:Invent 2021

(upbeat music) >> Hi, everyone, welcome back to theCUBE's coverage of AWS re:Invent 2021. We're wrapping up four days of coverage, two sets. Two remote sets, one in Boston, one in Palo Alto. And really, it's a pleasure to introduce Benoit Dageville. He's the Press Co-founder of Snowflake and President of Products. Benoit, thanks for taking some time out and coming to theCUBE. >> Yeah, thank you for having me, Dave. >> You know, it's really a pleasure. We've been watching Snowflake since, maybe not 2012, but mid last decade you hit our radar. We said, "Wow, this company is going to go places." And yeah, we made that call correctly. But it's been a pleasure to sort of follow you. We've talked a little bit remotely. I kind of want to go back to some of the fundamentals. First of all, I wanted mention your earnings last night. If you guys didn't see it, again, triple digit growth, $1.8 billion RPO, cashflow actually looking pretty good. So, pretty amazing. Oh, and 173% NRR, you know, wow. And Mike Scarpelli is kind of bummed that you did so well. And I know why, right? Because it's going to be at some point, and he dials it down for the expectations and Wall Street says, "Oh, he's sandbagging." And then at some point you're actually going to meet expectations and people are going to go, "Oh, they met expectations." But anyway, he's a smart guy, he know what he's doing. (Benoit laughing) I loved it, it was so funny listening to him last night. But anyway, I want to go back to, when I talked to practitioners about data warehousing pre-cloud, they would say sound bites like, it's like a snake swallowing a basketball, they would tell me. And the other thing they said, "We just chased the chips. Every time a new Intel chip comes out, we have to bring in new servers, and we're struggling." The cloud changed all that. Your vision and Terry's vision changed all that. Maybe go back to the fundamentals of what you saw. >> Yeah, we really wanted to address what we call the data challenges. And if you remember at that time, data challenge was first of the volume of data, machine-generated data. So it was way more than just structured data, right? Machine-generated data is weblogs, and it's at petabyte scale. And there was no good solution for that type of data. Big data was not a great solution, Hadoop was really bad. And there was no good solution for that. So we thought we should do something for big data. The other aspect was concurrency, right? Everyone wants to use these data analytic platform in an enterprise, right? And you have more and more workload running against the same data, and the systems that were built were not scaling for these workloads. So you had to silo data, right? That's the only way big enterprise could deal with that, is to create many different silos, Oracle, Teradata, data mass, you would hear data mass. All of it was to afloat, right, this data? And then there was the, what do we call, data sharing. How to get access to data which is not born inside the enterprise, right? So with Terry, we wanted to solve all these challenges and we thought the only way to solve it was the cloud. And the cloud has really two free aspects. One is the elasticity, for all of a sudden, you can run every workload that you want concurrently, in parallel, on different computer resources, and you can run them against the same data. So this is kind of the data lake model, if you want. At the same time, you can, in the cloud, create a service. So you can remove complexity from users and make it really easy for new workloads to be added to the system, because you can manage, you can create a managed service, where all the sudden our customers, they don't need to manage infrastructure, they don't need to patch, they don't need to tune. Everything is done by Snowflake, the service, and they can just load in and run their query. And the third aspect is really collaboration. Is how to connect data sets together. And that's almost a new product for Snowflake, this data sharing. So we really at Snowflake was all about combining big data and data warehouse in one system in the cloud, and have only one single system where you can put all your data and all your workload. >> So you weren't necessarily trying to solve the data warehouse problem, you were trying to solve a data problem. And then it just so happened data warehouse was a logical entry point for you. >> It's really not that. Yes, we wanted to solve the data problem. And for us big data was a really important problem to solve. So from day one, Snowflake was all about machine generated data, petabyte scale, but we wanted to do it right. And for us, right was not compromising on data warehouse principle, which is a CDT of transaction, which is really fast response time, and which is also simplicity. So as I said, we wanted to solve kind of all the problems at the time of volume of data, concurrency, and these sharing aspects. >> This was 2012. You knew at that time that Hadoop wasn't going to be the answer. >> No, I mean, we were really, I mean, everyone knew that. Everyone knew Hadoop was really bad. You know, complex to manage, really slow. It had good aspects, right? This was the only system that could manage petabyte scale data sets. That's the only thing- >> Cheaply. >> Yeah, and cheaply which was good. And we wanted really to do that, plus have all the good attributes of data warehouse system. And at the same time, we wanted to build a system where if you are data warehouse customer, if you are coming from Teradata, you can migrate to Snowflake and you will get to a system which is faster than what you had on-premise, right. That's why it's pretty cool. So we wanted to do big data without compromising on data warehouse. >> So several years ago we looked at the hyperscalers and said, "Wow, last year they spent $100 billion in CapEx." And so, we started to think about this abstraction layer. And then we saw what you guys announced with the data cloud. We call it super clouds. And we see that as exactly what you're building. So that's clearly not just a data warehouse or database, it's technology that really hides the underlying complexity of all those clouds, and it allows you to have federated governance and data sharing, all those things. Can you talk about sort of how you think about that architecture? >> So for me, what I say is that really Snowflake is the worldwide web of data. And we are indeed a super cloud, or we are super-posed to the infrastructure cloud, which is our friends at Amazon, and of course, Azure, I mean, Microsoft and Google. And as any cloud, we have regions, Snowflake regions all over the world, and located on different cloud providers. At the same time, our platform is global in the sense that every region interconnects with all the other regions, this is our snow grid and data mesh, if you want. So that as an organization you can have your presence on several Snowflake region. It doesn't matter which cloud provider, so you can mix AWS with Azure. You can use our cloud like that. And indeed you can, this is a cloud where you can store your data, that's the thing that really matters, and data is structured, but it's machine structure, as I say, machine generated, petabyte scale, but there's also unstructured, right? We have added support for images, text, videos, where you can process this data in our system, and that's the workload spout. And workload, what is very important is that you can run this workload, any number of workloads. So the number of workloads is effectively unlimited with Snowflake because each workload can have its dedicated set of compute resources all operating on the same data set. And the type of workloads is also very important. It's not only about dashboards and data warehouse, it's data engineering, it's data science, it's building application. We have many of our customers who are building full-scale cloud applications on top of Snowflake. >> Yeah so the other thing, if you're not familiar with Snowflake, I don't know, maybe your head has been in the sand for a while, but separating compute and storage, I don't know if you were the first, but you were certainly the first to popularize it. And that allowed you to solve that chasing the chips problem and the swallowing the basketball, right? Because you have virtually infinite resources now at your disposal. >> Yeah, this is really the concurrency challenge that I was mentioning. Everyone wants to access the data. And of course, if everyone runs on the same set of compute resources, you have a bottleneck. So Snowflake was really about this multi-workload. We call it Multi-Cluster Shared Data Architecture. But it's not difficult to run multiple cluster if you don't have consistency of data. So how to do that while maintaining transactional property of data as CDT, right? You cannot modify data from different clusters. And when you commit, every other cluster will immediately see the change, right, as if everyone was running on the same cluster. So that was the challenge that we solve when we started Snowflake. >> Used the term data mesh. What is data mesh to Snowflake? Is it a concept, is it fabric? >> No, it's a very interesting point. As much as we like to centralize data, this becomes a bottleneck, right? When you are a large organization with different independent units, everyone wants to manage their own data and they have domain-specific expertise about that data. So having it centralized in IT is not practical. At the same time, you really want to be able to connect these different data sets together and join different data together, right? So that's the data mesh architecture. Each data set is managed independently by business owners, and then there is a contract which is exposed to others, and you can combine. And Snowflake architectures with data sharing, right. Data sharing that can happen within an organization, or across organization, allows you to connect any data with any other data on our platform. >> Yeah, so when I first heard that term, you guys using the term data mesh, I got very excited because it was kind of the data mesh is, my view, anyway, is going to be the fundamental architecture of this decade and beyond. And the principles, if I understand it correctly, you're applying the principles of Jim Octagon's data mesh within Snowflake. So decentralized data doesn't have to be physically in one place. Logically it's in the data cloud. >> It's logically decentralized, right? It's independently managed, and the reason, right, is the data that you need to use is not produced by your, even if in your company you want to centralize the data and having only one organization, let's say IT managing that, let's say, pretend. Yet you need to connect with other datasets, which is managed by other organizations. So by nature, the data that you use cannot be centralized, right? So now that you have this principle, if you have a platform where you can store all the data, wherever it is, and you can connect these data very seamlessly, then we can use that platform for your enterprise, right? To have different business units independently manage their data sets, connects these together so that as a company you have a 360 view of your customers, for example. But you can expand that outside of your enterprise and connect with data sets, which are from your vertical, for example, financial data set that you don't have in your company, or any public data set. >> And the other key principles, I think, that you've touched on really is the line of business now. Increasingly they're building data products that are creating value, and then also there's a self-service component. Assuming there's the fourth principle, governance. You got to have federated governance. And it seems like you've kind of ticked the boxes, more than tick the boxes, but engineered a solution to solve for those. >> No, it's very true. So Snowflake was really built to be really simple to use. And you're right. Our vision was, it would be more than IT, right? Who is going to use Snowflake is going now to be business unit, because you do not have to manage infrastructure. You do not have to patch. You do not have to do these things that business cannot do. You just have to load your data and run your queries, and run your applications. So now business can directly use Snowflake and create value from that. And yes, you're right, then connect that data with other data sets and to get maximum insights. >> Can you please talk about some of the things you do with AWS here at the event. I'm interested in what you're doing with your machine learning initiatives that you've recently announced, the AI piece. >> Yes, so one key aspects is data is not only about SQL, right? We started with SQL, but we expanded our platform to what we call data programmability, which is really about running program at scale across a large volume of data. And this was made popular with a programming model which was introduced by Pendal, DataFrames. Later taken by Spark, and now we have DataFrames in Snowflake, Where we are different than other systems, is that these DataFrame programs, which are in Python, or Java, or Scala, you program with data. These DataFrames are compiled to our single execution platforms. So we have one single execution platform, which is a data flow execution platform, which can run both SQL very efficiently, as I said, data warehouse speed, and also these very complex programs running Python and Java against this data. And this is a single platform. You don't need to use two different systems. >> Now so, you kind of really attack the traditional analytics base. People said, "Wow, Snowflake's really easy." Now you're injecting AI and machine intelligence. I see Databricks coming at it from the other angle. They started with machine learning, now they're sort of going after the analytics. Does there need to be a semantic layer to connect, 'cause it's the same raw data. Does there need to be a semantic layer to connect those two worlds? >> Yes, and that's what we are doing in our platform. And that's very novel to Snowflake. As I said, you interact with data in different program. You pick your program. You are a SQL programmer, use SQL. You are a Python programmer, use DataFrames with Python. It doesn't really matter. And then the semantic layer is our compiler and our processing engine, is going to translate both your program and my program in Python, your program in SQL, to the same execution platform and to the same programming language that Snowflake internally, we don't expose our programming language, but it's a data flow programming language that our execution platform executes. So at the end, we might execute exactly the same program, potentially. And that's very important because we spent all our IP and all our time, engineering time to optimize this platform, to make it the fastest platform. And we want to use that platform for any type of workloads, whether it's data programs or SQL. >> Now, you and Terry were at Oracle, so you know a lot about bench marketing. As Larry would stand up and say, "We killed the competition." You guys are probably behind it, right. So you know all about that. >> We are very behind it. >> So you know a lot about that. I've had some experience, I'm not a technologist, but I'm an observer and analyst. You have to take benchmarking with a very big grain of salt. So you guys have generally stayed away from that. Databricks came out and they came up with all these benchmarks. So you had to respond, because otherwise it's out there. Now you reran the benchmarks, you took out the materialized views and all the expensive stuff that they included in your cost, your price performance, but then you wrote, I thought, a very cogent blog. Maybe you could talk about sort of why you did that and your general philosophy around bench marketing. >> Yeah, from day one, with Terry we say never again we will participate in this really stupid benchmark war, because it's really not in the interest of customers. And we have been really at the frontline of that war with Terry, both of us, really doing special tricks, right? And optimizing this query to death, this query that no one runs apart from the synthetic benchmark. We optimize them to death to have the best number when we were at Oracle. And we decided that this is really not helping customers in the end. So we said, with Snowflake, we'll not do that. And actually, we are not the only one not to do that. If you look at who has published TPC-DS, you will see no one, none of the big vendors. It's not because they cannot run TPC-DS, Oracle can run it, I know that. And all the other big data warehouse vendor can, but it's something of a little bit of past. And TPC was really important at some point, and is not really relevant now. So we are not going to compete. And that's what we said is basically now our blog. We are not interesting in participating in this war. We want to invest our engineering effort and our IP in solving real world issues and performance issues that we have. And we want to improve our engine for these real world customers. And the nice thing with Snowflake, because it's a service, we see exactly all the queries that our customers are executing. So we know where we are struggling as a system, and that's where we want to invest and we want to improve. And if you look at many announcements that we made, it's all about under-the-cover improving Snowflake and getting the benefit of this improvement to our customer. So that was the message of that blog. And yes, the message was okay. Mr. Databricks, it's nice, and it's perfect that, I mean, everyone makes a decision, right? We made the decision not to participate. Databricks made another decision, which is very fine, and that's fine that they publish their number on their system. Where it is not fine is that they published number using Snowflake and misrepresenting our performance. And that's what we wanted also to correct. >> Yeah, well, thank you for going into that. I know it's, look, leaders don't necessarily have to get involved in that mudslide. (crosstalk) Enough said about that, so that's cool. I want to ask you, I interviewed Frank last spring, right after the lockdown, he was kind enough to come on virtually, and I asked him about on-prem. And he was, you know Frank, he doesn't mix words, He said, "We're not getting into a halfway house. That's not going to happen." And of course, you really can't do what you do on-prem. You can't separate compute, some have tried, but it's not the same. But at the same time that you see like Andreessen comes out with this blog that says a huge portion of your cost of goods sold is going to be the cloud, so you're going to have to repatriate. Help me square that circle. Is it cloud forever? Is it will you never say never? What can you share of that? >> I will never say never, it's not my style. I always say you can always change your mind, and maybe different factors can change your mind. What was true at some point might not be true at a later point. But as of now, I don't see any reason for us to go on-premise. As you mentioned at the beginning, right, Snowflake is growing like crazy. The world is moving to the cloud. I think maybe it goes both ways, but I would say 90% or 99% of the world is moving to the cloud. Maybe 1% is coming back for some very specific reasons. I don't think that the world is going to move back on-premise. So in the end we might miss a small percentage of the workload that will stay on-premise and that's okay. >> And as well, if you dig into some of the financial statements you'll see, read the notes where you've renegotiated, right? We're talking big numbers. Hundreds and hundreds of millions of dollars of cost reduction, actually more, over a 10 year period. Billions of your cloud bills. So the cloud suppliers, they don't want to lose you as a customer, right? You're one of their biggest customer. So it's awesome. Last question is kind of, your work now is to really drive the data cloud, get adoption up, build that supercloud, we call it. Maybe you could talk a little bit about how you see the future. >> The future is really broadened, the scope of Snowflake, and really, I would say the marketplace, and data sharing, and services, which are directly built natively on Snowflake and are shared through our platform, and can operate, it can mix data on provider-side with data on consumer-side, and creating this collaboration within the Snowflake data cloud, I think is really the future. And we are really only scratching the surface of that. And you can see the enthusiasm of Snowflake data cloud and vertical industry We have nuanced the final show data cloud. Industry, complete vertical industry, latching on that concept and collaborating via Snowflake, which was not possible before. And I think you talked about machine learning, for example. Machine learning, collaboration through machine learning, the ones who are building this advanced model might not be the same as the one who are consuming this model, right? It might be this collaboration between expertise and consumer of that expertise. So we are really at the beginning of this interconnected world. And to me the world wide web of data that we are creating is really going to be amazing. And it's all about connecting. >> And I'm glad you mentioned the ecosystem. I didn't give enough attention to that. Because as a cloud provider, which essentially you are, you've got to have a strong ecosystem. That's a hallmark of cloud. And then the other, vertical, that we didn't touch on, is media and entertainment. A lot of direct-to-consumer. I think healthcare is going to be a huge vertical for you guys. All right we got to go, Terry. Thanks so much for coming on "theCUBE." I really appreciate you. >> Thanks, Dave. >> And thank you for watching. This a wrap from AWS re:Invent 2021. "theCUBE," the leader in global tech coverage. We'll see you next time. (upbeat music)

Published Date : Dec 3 2021

SUMMARY :

and coming to theCUBE. and he dials it down for the expectations At the same time, you can, in So you weren't So as I said, we wanted to You knew at that time that Hadoop That's the only thing- And at the same time, we And then we saw what you guys is that you can run this And that allowed you to solve that And when you commit, every other cluster What is data mesh to Snowflake? At the same time, you really And the principles, if I is the data that you need to And the other key principles, I think, and to get maximum insights. some of the things you do and now we have DataFrames in Snowflake, 'cause it's the same raw data. and to the same programming language So you know all about that. and all the expensive stuff And the nice thing with But at the same time that you see So in the end we might And as well, if you dig into And I think you talked about And I'm glad you And thank you for watching.

ENTITIES

Entity	Category	Confidence
Frank	PERSON	0.99+
Mike Scarpelli	PERSON	0.99+
Benoit Dageville	PERSON	0.99+
Larry	PERSON	0.99+
Terry	PERSON	0.99+
Boston	LOCATION	0.99+
$1.8 billion	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Benoit	PERSON	0.99+
Palo Alto	LOCATION	0.99+
Oracle	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
90%	QUANTITY	0.99+
$100 billion	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
Dave	PERSON	0.99+
last year	DATE	0.99+
Google	ORGANIZATION	0.99+
99%	QUANTITY	0.99+
2012	DATE	0.99+
Teradata	ORGANIZATION	0.99+
SQL	TITLE	0.99+
two sets	QUANTITY	0.99+
Snowflake	TITLE	0.99+
one	QUANTITY	0.99+
Andreessen	PERSON	0.99+
Two remote sets	QUANTITY	0.99+
one system	QUANTITY	0.99+
One	QUANTITY	0.99+
both	QUANTITY	0.99+
first	QUANTITY	0.99+
Hundreds	QUANTITY	0.99+
1%	QUANTITY	0.99+
third aspect	QUANTITY	0.99+
Scala	TITLE	0.99+
Snowflake	ORGANIZATION	0.99+
Python	TITLE	0.99+
Intel	ORGANIZATION	0.99+
Databricks	PERSON	0.99+
two free aspects	QUANTITY	0.99+
mid last decade	DATE	0.99+
Java	TITLE	0.99+
Jim Octagon	PERSON	0.99+
both ways	QUANTITY	0.99+
fourth principle	QUANTITY	0.98+
two worlds	QUANTITY	0.98+
last night	DATE	0.98+
173%	QUANTITY	0.98+
360 view	QUANTITY	0.98+
several years ago	DATE	0.98+
each workload	QUANTITY	0.97+
last spring	DATE	0.97+
CapEx	ORGANIZATION	0.97+
Wall Street	ORGANIZATION	0.97+
one organization	QUANTITY	0.95+
single platform	QUANTITY	0.95+
four days	QUANTITY	0.95+
First	QUANTITY	0.95+
Snowflake	EVENT	0.94+
Azure	ORGANIZATION	0.94+

Greg Rokita, Edmunds.com & Joel Minnick, Databricks | AWS re:Invent 2021

>>We'll come back to the cubes coverage of AWS reinvent 2021, the industry's most important hybrid event. Very few hybrid events, of course, in the last two years. And the cube is excited to be here. Uh, this is our ninth year covering AWS reinvent this the 10th reinvent we're here with Joel Minnick, who the vice president of product and partner marketing at smoking hot company, Databricks and Greg Rokita, who is executive director of technology at Edmonds. If you're buying a car or leasing a car, you gotta go to Edmund's. We're gonna talk about busting data silos, guys. Great to see you again. >>Welcome. Welcome. Glad to be here. >>All right. So Joel, what the heck is a lake house? This is all over the place. Everybody's talking about lake house. What is it? >>And it did well in a nutshell, a Lakehouse is the ability to have one unified platform to handle all of your traditional analytics workloads. So your BI and reporting Trisha, the lake, the workloads that you would have for your data warehouse on the same platform as the workloads that you would have for data science and machine learning. And so if you think about kind of the way that, uh, most organizations have built their infrastructure in the cloud today, what we have is generally customers will land all their data in a data lake and a data lake is fantastic because it's low cost, it's open. It's able to handle lots of different kinds of data. Um, but the challenges that data lakes have is that they don't necessarily scale very well. It's very hard to govern data in a data lake house. It's very hard to manage that data in a data lake, sorry, in a, in a data lake. >>And so what happens is that customers then move the data out of a data lake into downstream systems and what they tend to move it into our data warehouses to handle those traditional reporting kinds of workloads that they have. And they do that because data warehouses are really great at being able to have really great scale, have really great performance. The challenge though, is that data warehouses really only work for structured data. And regardless of what kind of data warehouse you adopt, all data warehouse and platforms today are built on some kind of proprietary format. So once you've put that data into the data warehouse, that's, that is kind of what you're locked into. The promise of the data lake house was to say, look, what if we could strip away all of that complexity and having to move data back and forth between all these different systems and keep the data exactly where it is today and where it is today is in the data lake. >>And then being able to apply a transaction layer on top of that. And the Databricks case, we do that through a technology and open source technology called data lake, or sorry, Delta lake. And what Delta lake allows us to do is when you need it, apply that performance, that reliability, that quality, that scale that you would expect out of a data warehouse directly on your data lake. And if I can do that, then what I'm able to do now is operate from one single source of truth that handles all of my analytics workloads, both my traditional analytics workloads and my data science and machine learning workloads, and being able to have all of those workloads on one common platform. It means that now not only do I get much, much more simple in the way that my infrastructure works and therefore able to operate at much lower costs, able to get things to production much, much faster. >>Um, but I'm also able to now to leverage open source in a much bigger way being that lake house is inherently built on an open platform. Okay. So I'm no longer locked into any kind of data format. And finally, probably one of the most, uh, lasting benefits of a lake house is that all the roles that have to take that have to touch my data for my data engineers, to my data analyst, my data scientists, they're all working on the same data, which means that collaboration that has to happen to go answer really hard problems with data. I'm now able to do much, much more easy because those silos that traditionally exist inside of my environment no longer have to be there. And so Lakehouse is that is the promise to have one single source of truth, one unified platform for all of my data. Okay, >>Great. Thank you for that very cogent description of what a lake house is now. Let's I want to hear from the customer to see, okay, this is what he just said. True. So actually, let me ask you this, Greg, because the other problem that you, you didn't mention about the data lake is that with no schema on, right, it gets messy and Databricks, I think, correct me if I'm wrong, has begun to solve that problem, right? Through series of tooling and AI. That's what Delta liked us. It's a man, like it's a managed service. Everybody thought you were going to be like the cloud era of spark and Brittany Britain, a brilliant move to create a managed service. And it's worked great. Now everybody has a managed service, but so can you paint a picture at Edmonds as to what you're doing with, maybe take us through your journey the early days of a dupe, a data lake. Oh, that sounds good. Throw it in there, paint a picture as to how you guys are using data and then tie it into what y'all just said. >>As Joel said, that they'll the, it simplifies the architecture quite a bit. Um, in a modern enterprise, you have to deal with a variety of different data sources, structured semi-structured and unstructured in the form of images and videos. And with Delta lake and built a lake, you can have one system that handles all those data sources. So what that does is that basically removes the issue of multiple systems that you have to administer. It lowers the cost, and it provides consistency. If you have multiple systems that deal with data, you always arise as the issue as to which data has to be loaded into which system. And then you have issues with consistency. Once you have issues with consistency, business users, as analysts will stop trusting your data. So that was very critical for us to unify the system of data handling in the one place. >>Additionally, you have a massive scalability. So, um, I went to the talk with from apple saying that, you know, he can process two years worth of data. Instead of just two days in an Edmonds, we have this use case of backfilling the data. So often we changed the logic and went to new. We need to reprocess massive amounts of data with the lake house. We can reprocess months worth of data in, in a matter of minutes or hours. And additionally at the data lake houses based on open, uh, open standards, like parquet that allowed us, allowed us to basically hope open source and third-party tools on top of the Delta lake house. Um, for example, a Mattson, we use a Matson for data discovery, and finally, uh, the lake house approach allows us for different skillsets of people to work on the same source data. We have analysts, we have, uh, data engineers, we have statisticians and data scientists using their own programming languages, but working on the same core of data sets without worrying about duplicating data and consistency issues between the teams. >>So what, what is, what are the primary use cases where you're using house Lakehouse Delta? >>So, um, we work, uh, we have several use cases, one of them more interesting and important use cases as vehicle pricing, you have used the Edmonds. So, you know, you go to our website and you use it to research vehicles, but it turns out that pricing and knowing whether you're getting a good or bad deal is critical for our, uh, for our business. So with the lake house, we were able to develop a data pipeline that ingests the transactions, curates the transactions, cleans them, and then feeds that curated a curated feed into the machine learning model that is also deployed on the lake house. So you have one system that handles this huge complexity. And, um, as you know, it's very hard to find unicorns that know all those technologies, but because we have flexibility of using Scala, Java, uh, Python and SQL, we have different people working on different parts of that pipeline on the same system and on the same data. So, um, having Lakehouse really enabled us to be very agile and allowed us to deploy new sources easily when we, when they arrived and fine tune the model to decrease the error rates for the price prediction. So that process is ongoing and it's, it's a very agile process that kind of takes advantage of the, of the different skill sets of different people on one system. >>Because you know, you guys democratized by car buying, well, at least the data around car buying because as a consumer now, you know, I know what they're paying and I can go in of course, but they changed their algorithms as well. I mean, the, the dealers got really smart and then they got kickbacks from the manufacturer. So you had to get smarter. So it's, it's, it's a moving target, I guess. >>Great. The pricing is actually very complex. Like I, I don't have time to explain it to you, but knowing, especially in this crazy market inflationary market where used car prices are like 38% higher year over year, and new car prices are like 10% higher and they're changing rapidly. So having very responsive pricing model is, is extremely critical. Uh, you, I don't know if you're familiar with Zillow. I mean, they almost went out of business because they mispriced their, uh, their houses. So, so if you own their stock, you probably under shorthand of it, but, you know, >>No, but it's true because I, my lease came up in the middle of the pandemic and I went to Edmonds, say, what's this car worth? It was worth like $7,000. More than that. Then the buyout costs the residual value. I said, I'm taking it, can't pass up that deal. And so you have to be flexible. You're saying the premises though, that open source technology and Delta lake and lake house enabled that flexible. >>Yes, we are able to ingest new transactions daily recalculate our model within less than an hour and deploy the new model with new pricing, you know, almost real time. So, uh, in this environment, it's very critical that you kind of keep up to date and ingest their latest transactions as they prices change and recalculate your model that predicts the future prices. >>Because the business lines inside of Edmond interact with the data teams, you mentioned data engineers, data scientists, analysts, how do the business people get access to their data? >>Originally, we only had a core team that was using Lakehouse, but because the usage was so powerful and easy, we were able to democratize it across our units. So other teams within software engineering picked it up and then analysts picked it up. And then even business users started using the dashboarding and seeing, you know, how the price has changed over time and seeing other, other metrics within the, >>What did that do for data quality? Because I feel like if I'm a business person, I might have context of the data that an analyst might not have. If they're part of a team that's servicing all these lines of business, did you find that the data quality, the collaboration affected data? >>Th the biggest thing for us was the fact that we don't have multiple systems now. So you don't have to load the data. Whenever you have to load the data from one system to another, there is always a lag. There's always a delay. There is always a problematic job that didn't do the copy correctly. And the quality is uncertain. You don't know which system tells you the truth. Now we just have one layer of data. Whether you do reports, whether you're data processing or whether you do modeling, they all read the same data. And the second thing is that with the dashboarding capabilities, people that were not very technical that before we could only use Tableau and Tableau is not the easiest thing to use as if you're not technical. Now they can use it. So anyone can see how our pricing data looks, whether you're an executive, whether you're an analyst or a casual business users, >>But Hey, so many questions, you guys are gonna have to combat. I'm gonna run out of time, but you now allow a consumer to buy a car directly. Yes. Right? So that's a new service that you launched. I presume that required new data. We give, we >>Give consumers offers. Yes. And, and that offer you >>Offered to buy my league. >>Exactly. And that offer leverages the pricing that we develop on top of the lake house. So the most important thing is accurately giving you a very good offer price, right? So if we give you a price, that's not so good. You're going to go somewhere else. If we give you price, that's too high, we're going to go bankrupt like Zillow debt, right. >>It took to enable that you're working off the same dataset. Yes. You're going to have to spin up a, did you have to inject new data? Was there a new data source that we're working on? >>Once we curate the data sources and once we clean it, we see the directly to the model. And all of those components are running on the lake house, whether you're curating the data, cleaning it or running the model. The nice thing about lake house is that machine learning is the first class citizen. If you use something like snowflake, I'm not going to slam snowflake here, but you >>Have two different use case. You have >>To, you have to load it into a different system later. You have to load it into a different system. So like good luck doing machine learning on snowflake. Right. >>Whereas, whereas Databricks, that's kind of your raison d'etre >>So what are your, your, your data engineer? I feel like I should be a salesman or something. Yeah. I'm not, I'm not saying that. Just, just because, you know, I was told to, like, I'm saying it because of that's our use case, >>Your use case. So question for each of you, what, what business results did you see when you went to kind of pre lake house, post lake house? What are the, any metrics you can share? And then I wonder, Joel, if you could share a sort of broader what you're seeing across your customer base, but Greg, what can you tell us? Well, >>Uh, before their lake house, we had two different systems. We had one for processing, which was still data breaks. And the second one for serving and we iterated over Nateeza or Redshift, but we figured that maintaining two different system and loading data from one to the other was a huge overhead administration security costs. Um, the fact that you had to consistency issues. So the fact that you can have one system, um, with, uh, centralized data, solves all those issues. You have to have one security mechanism, one administrative mechanism, and you don't have to load the data from one system to the other. You don't have to make compromises. >>It's scale is not a problem because of the cloud, >>Because you can spend clusters at will for different use cases. So your clusters are independent. You have processing clusters that are not affecting your serving clusters. So, um, in the past, if you were running a serving, say on Nateeza or Redshift, if you were doing heavy processing, your reports would be affected, but now all those clusters are separated. So >>Consumer data consumer can take that data from the producer independ >>Using its own cluster. Okay. >>Yeah. I'll give you the final word, Joel. I know it's been, I said, you guys got to come back. This is what have you seen broadly? >>Yeah. Well, I mean, I think Greg's point about scale. It's an interesting one. So if you look at cross the entire Databricks platform, the platform is launching 9 million VMs every day. Um, and we're in total processing over nine exabytes a month. So in terms of just how much data the platform is able to flow through it, uh, and still maintain a extremely high performance is, is bar none out there. And then in terms of, if you look at just kind of the macro environment of what's happening out there, you know, I think what's been most exciting to watch or what customers are experiencing traditionally or, uh, on the traditional data warehouse and kinds of workloads, because I think that's where the promise of lake house really comes into its own is saying, yes, I can run these traditional data warehousing workloads that require a high concurrency high scale, high performance directly on my data lake. >>And, uh, I think probably the two most salient data points to raise up there is, uh, just last month, Databricks announced it's set the world record for the, for the, uh, TPC D S 100 terabyte benchmark. So that is a place where Databricks at the lake house architecture, that benchmark is built to measure data warehouse performance and the lake house beat data warehouse and sat their own game in terms of overall performance. And then what's that spends from a price performance standpoint, it's customers on Databricks right now are able to enjoy that level of performance at 12 X better price performance than what cloud data warehouses provide. So not only are we jumping on this extremely high scale and performance, but we're able to do it much, much more efficiently. >>We're gonna need a whole nother section second segment to talk about benchmarking that guys. Thanks so much, really interesting session and thank you and best of luck to both join the show. Thank you for having us. Very welcome. Okay. Keep it right there. Everybody you're watching the cube, the leader in high-tech coverage at AWS reinvent 2021

Published Date : Nov 30 2021

SUMMARY :

Great to see you again. Glad to be here. This is all over the place. and reporting Trisha, the lake, the workloads that you would have for your data warehouse on And regardless of what kind of data warehouse you adopt, And what Delta lake allows us to do is when you need it, that all the roles that have to take that have to touch my data for as to how you guys are using data and then tie it into what y'all just said. And with Delta lake and built a lake, you can have one system that handles all Additionally, you have a massive scalability. So you have one system that So you had to get smarter. So, so if you own their stock, And so you have to be flexible. less than an hour and deploy the new model with new pricing, you know, you know, how the price has changed over time and seeing other, other metrics within the, lines of business, did you find that the data quality, the collaboration affected data? So you don't have to load But Hey, so many questions, you guys are gonna have to combat. So the most important thing is accurately giving you a very good offer did you have to inject new data? I'm not going to slam snowflake here, but you You have To, you have to load it into a different system later. Just, just because, you know, I was told to, And then I wonder, Joel, if you could share a sort of broader what you're seeing across your customer base, but Greg, So the fact that you can have one system, So, um, in the past, if you were running a serving, Okay. This is what have you seen broadly? So if you look at cross the entire So not only are we jumping on this extremely high scale and performance, but we're able to do it much, Thanks so much, really interesting session and thank you and best of luck to both join the show.

ENTITIES

Entity	Category	Confidence
Joel	PERSON	0.99+
Greg	PERSON	0.99+
Joel Minnick	PERSON	0.99+
$7,000	QUANTITY	0.99+
Greg Rokita	PERSON	0.99+
38%	QUANTITY	0.99+
two days	QUANTITY	0.99+
10%	QUANTITY	0.99+
Java	TITLE	0.99+
Databricks	ORGANIZATION	0.99+
two years	QUANTITY	0.99+
one system	QUANTITY	0.99+
one	QUANTITY	0.99+
Scala	TITLE	0.99+
apple	ORGANIZATION	0.99+
Python	TITLE	0.99+
SQL	TITLE	0.99+
ninth year	QUANTITY	0.99+
last month	DATE	0.99+
lake house	ORGANIZATION	0.99+
two different systems	QUANTITY	0.99+
Tableau	TITLE	0.99+
2021	DATE	0.99+
9 million VMs	QUANTITY	0.99+
second thing	QUANTITY	0.99+
less than an hour	QUANTITY	0.99+
Lakehouse	ORGANIZATION	0.98+
12 X	QUANTITY	0.98+
Delta	ORGANIZATION	0.98+
Delta lake house	ORGANIZATION	0.98+
one layer	QUANTITY	0.98+
one common platform	QUANTITY	0.98+
both	QUANTITY	0.97+
AWS	ORGANIZATION	0.97+
Zillow	ORGANIZATION	0.97+
Brittany Britain	PERSON	0.97+
Edmunds.com	ORGANIZATION	0.97+
two different system	QUANTITY	0.97+
Edmonds	ORGANIZATION	0.97+
over nine exabytes a month	QUANTITY	0.97+
today	DATE	0.96+
Lakehouse Delta	ORGANIZATION	0.96+
Delta lake	ORGANIZATION	0.95+
Trisha	PERSON	0.95+
data lake	ORGANIZATION	0.94+
Mattson	ORGANIZATION	0.92+
second segment	QUANTITY	0.92+
each	QUANTITY	0.92+
Matson	ORGANIZATION	0.91+
two most salient data points	QUANTITY	0.9+
Edmonds	LOCATION	0.89+
100 terabyte	QUANTITY	0.87+
one single source	QUANTITY	0.86+
first class	QUANTITY	0.85+
Nateeza	TITLE	0.85+
one security	QUANTITY	0.85+
Redshift	TITLE	0.84+

Sean Knapp, Ascend.io & Jason Robinson, Steady | AWS Startup Showcase

(upbeat music) >> Hello and welcome to today's session, theCUBE's presentation of the AWS Startup Showcase, New Breakthroughs in DevOps, Data Analytics, Cloud Management Tools, featuring Ascend.io for the data and analytics track. I'm your host, John Furrier with theCUBE. Today, we're proud joined by Sean Knapp, CEO and founder of Ascend.io and Jason Robinson who's the VP of Data Science and Engineering at Steady. Guys, thanks for coming on and congratulations, Sean, for the continued success, loves our cube conversation and Jason, nice to meet you. >> Great to meet you. >> Thanks for having us. >> So, the session today is really kind of looking at automating analytics workloads, right? So, and Steady as a customer. Sean, talk about the relationship with the customer Steady. What's the main product, what's the core relationship? >> Yeah, it's a really great question. when we work with a lot of companies like Steady we're working hand in hand with their data engineering teams, to help them onboard onto the Ascend platform, build these really powerful data pipelines, fueling their analytics and other workloads, and really helping to ensure that they can be successful at getting more leverage and building faster than ever before. So we tend to partner really closely with each other's teams and really think of them even as extensions of each other's own teams. I watch in slack oftentimes and our teams just go back and forth. And it's like, as if we were all just part of the same company. >> It's a really exciting time, Jason, great to have you on as a person cutting your teeth into this kind of what I call next gen data as intellectual property. Sean and I chat on theCUBE conversation previous to this event where every company is a data company, right? And we've heard that cliche. >> Right. >> But it's true, right? It's going to, it's getting more powerful with the edge. You seeing more diverse data, faster data, small, big, large, medium, all kinds of different aspects and patterns. And it's becoming a workflow kind of intellectual property paradigm for companies, not so much. >> That's right. >> Just the tech it's the database is you can, it's the data itself, data in flight, it's moving around, it's got value. What's your take-- >> Absolutely. >> On this trend? >> Basically, Steady helps our members and we have a community of members earn more income. So we want to help them steady their financial lives. And that's all based on data, so we have a web app, you could go to the iOS Store, you could go to the Google Play Store, you can download the app. And we have a large number of members, 3 million plus, who are actively using this. And we also have a very exciting new product called income passport. And this helps 1099 and mixed wage earners verify their income, which is very important for different government benefits. And then third, we help people with emergency cash grants as well as awards. So all of that is built on a bedrock of data, so if you're using our apps, it's all data powered. So what you were mentioning earlier from pipelines that are running it real time to yeah, anything, that's a kind of a small data aggregation, we do everything from small to real-time and large. >> You guys are like a multiple sided marketplace here, you've got it, you're a FinTech app, as well as the future of work and with virtual space-- >> That's right. >> Happening now, this is becoming, actually encapsulates kind of the critical problems that people trying to solve right now, you've got multiple stakeholders. >> That's right. >> In the data. >> Yes, we absolutely do. So we have our members, but we also, within the company, we have product, we have strategy, we have a growth team, we have operations. So data engineering and data science also work with a data analytics organization. So at Steady we're very much a data company. And we have a data organization led by our chief data officer and we have data engineering and data science, which are my teams, but also that business insights and analytics. So a lot of what we're building on the data engineering side is powering those insights and analytics that the business stakeholders use every day to run the organization. >> Sean, I want to get your thoughts on this because we heard from Emily Freeman in the keynote about how this revolution in DevOps or for premiering her talk around how, it's not just one persona anymore, I'm a release engineer, I'm this kind of engineer, you're seeing now all engineering, all developers are developers. You have some specialty, but for the most part, the team makeups are changing. We touched on this in our cube conversation. The journey of data is not just the data people, the data folks. It's like there's, they're developers too. So the confluence of data science, data management, developing, is changing the team and cultural makeup of companies. Could you share your thoughts on this dynamic and how it impacts customers? >> Absolutely, I think the, we're finding a similar trend to what we saw a number of years ago, when we talked about how software was eating the world and every company was now becoming a software company. And as a result, we saw this proliferation and expansion of what the software roles look like and thought of a company pulled through this entire new era of DevOps. We were finding that same pattern now emerging around data as not only is every company a software company, every company is a data company and data really is that field, that oil that fuels the business and in doing so, we're finding that as Jason describes it's pervasive across the team, it is no longer just one team that is creating some insights and reports around operational analytics, or maybe a team over here doing data science or machine learning. It is expensive. And I think the really interesting challenges that start to come with this too, are so many data teams are so over capacity. We did a recent study that highlighted that 96% of data teams are at, or over capacity, only 4% had spare capacity. But as a result, the net is being cast even wider to pull in people from even broader and more adjacent domains to all participate in the data future of their organization. >> Yeah, and I think I'd love to get your guys react to this conversation with Andy Jassy, who's now the CEO of Amazon, but when he was the CEO of AWS last year, I talked with him about how the old guard and new guard are thinking around team formations. Obviously team capacity is growing and challenged when you've got the right formula. So that's one thing, right? But what if you don't have the right formula? If you're in the skills gap, problem, or team formation side of it, where you maybe there was two years ago where the mandate came down? Well, we got to build a data team even in two years, if you're not inquisitive. And this is what Andy and I were talking about is the thinking and the mindset of that mission and being open to discovering and understanding the changes, because if you were deciding what your team was two, three years ago, that might have changed a lot. So team capacity, Sean, to your point, if you got it right, and that's a challenge in and of itself, but what if you don't have it, right? What do you guys think about this? >> Yeah, I think that's exactly right. Basically trying to see, look and gaze into the crystal ball and see what's going to happen in a year or two years, even six months is quite difficult. And if you don't have it right, you do spend a lot of time because of the technical debt that you've amassed. And we certainly spend quite a bit of time with technical debt for things we wanted to build. So, deconvolving that, getting those ETLs to a runnable state, getting performance there, that's what we spend a bit of time on. And yeah, it's something that it's really part of the package. >> What do you guys see as the big challenge on teams? The scaling challenge okay. Formation is one thing, Sean, but like, okay, getting it right, getting it formed properly and then scaling it, what are the big things you're seeing? >> One of the, I think the overarching management themes in general, it is the highest out by the highest performing teams are those where the individual with the context and the idea is able to execute as far and as fast and as efficiently as possible, and removing a lot of those encumbrances and put it a slightly different way. If DevOps was all basically boiled down to, how do we help more people write more software faster and safely data ops would be very similarly, how do we enable more people to do more things with data faster and safely? And to do that, I think the era of these massive multi-year efforts around data are gone and hopefully in the not too distant future, even these multi-quarter efforts around data are gone and we get into a much more agile, nimble methodology where smaller initiatives and smaller efforts are possible by more diverse skillsets across the business. And really what we should be doing is leveraging technology and automation to ensure that people are able to be productive and efficient and that we can trust our data and that systems are automated. And these are problems that technology is good at. And so in many ways, how in the early days Amazon would described as getting people out of the muck of DevOps. I think we're going to do the same thing around getting people out of the muck of the data and get them really focused on the higher level aspects. >> Yeah, we're going to get into that complexity, heavy lifting side muck, and then the heavy lifting taking away from the customers. But I want to go back to real quick with Jason while we're on this topic. Jason, I was just curious, how much has your team grown in the recent year and how much could've, should've grown, what's the status and how has Ascend helped you guys? What's the dynamic there? ' Cause that's their value proposition. So, take us through that. >> Absolutely, so, since the beginning of the year data engineering has doubled. So, we're a lean team, we certainly use the agile mindset and methodologies, but we have gone from, yeah, we've essentially doubled. So a lot of that is there's just so much to do and the capacity problem is certainly there. So we also spend a lot of time figuring out exactly what the right tooling is. And I was mentioning the technical debt. So you have those, there's the big O notation of whatever that involves technical debt. And when you're building new things, you're fixing old things. And then you're trying to maintain everything. That scaling starts to hit hard. So even if we continue to double, I mean, we could easily add more data engineers. And a lot of that is, I mean, you know about the hiring cycles, like, a lot of of great talent, but it's difficult to make all of those hires. So, we do spend quite a bit of time thinking about exactly what tools data engineering is using day-to-day. And what I mentioned, were technologies on the streaming side all the way to like the small batch things, but, like something that starts as a small batch getting grow and grow and grow and take, say 15 hours, it's possible, I've seen it. But, and getting that back down and managing that complexity while not overburdening people who probably don't want to spend all their waking hours building ETLs, maintaining ETL, putting in monitoring, putting in alerting, that I think is quite a challenge. >> It's so funny because you mentioned 18 hours, you got to kind of being, you didn't roll your eyes, but you almost did, but this is, but people want it yesterday, they want real time, so there's a lot of demand-- >> Yes. >> On the minds of the business outcome side of it. So, I got to ask you, because this comes up a lot with technical debt, and now we're starting to see that come into the data conversation. And so I always curious, is there a different kind of technical debt with data? Because again, data is like software, but it's a little bit of more elusive in the sense it's always changing. So is there, what kind of technical debt do you see in the data side that's different than say software side? >> Absolutely, now that's a great question. So a lot of thinking about your data and structuring your data and how you want to use that data going into a particular project might be different from what happens after stakeholders have a new considerations and new products and new items that need to be built. So thinking about how that, let's say you have a document store, or you have something that you thought was going to be nice and structured, how that can evolve and support those particular products can essentially, unless you take the time and go through and say, well, let's architect it perfectly so that we can handle that. You're going to make trade-offs and choices, and essentially that debt builds up. So you start cutting corners, you start changing your normalization. You start essentially taking those implicit schema that then tend to build into big things, big implicit schema. And then of course, with implicit schema, you're going to have a lot of null values, you're going to have a lot of items to deal with. So, how do you deal with that? And then you also have the opportunity to create keys and values and oops, do we take out those keys that were slightly misspelled? So, I could go on for hours, but basically the technical debt certainly is there with on data. I see a lot of this as just a spectrum of technical debt, because it's all trade-offs that you made to build a product, and the efficiency has start to hit you. So, the 15 hour ETL, I was mentioning, basically you start with something and you were building things for stakeholders and essentially you have so much complex logic within that. So for the transforms that you're doing from if you're thinking of the bronze, silver, gold, kind of a framework, going from that bronze to a silver, you may have a massive number of transformations or just a few, just to lightly dust it. But you could also go to gold with many more transformations and managing that, managing the complexity, managing what you're spending for servers day after day after day. That's another real challenge of that technical debt stuff. >> That's a great lead into my next question, for Sean, this is the disparate system complexity, technical debt and software was always kind of the belief was, oh yeah, I'll take some technical debt on and work it off once I get visibility and say, unit economics or some sort of platform or tool feature, and then you work it off as fast as possible. I was, this becomes the art and science of technical debt. Jason, what you're saying is that this can be unwieldy pretty quickly. You got state and you got a lot of different inter moving parts. This is a huge issue, Sean, this is where it's, technical debt in the data world is much different architecturally. If you don't get it right, this is a huge, huge issue. Could you aluminate why that is and what you guys are doing to help unify and change some of those conditions? >> Yeah, absolutely. When we think about technical debt and I'll keep drawing some parallels between DevOps and data ops, 'cause I think there's a tremendous number of similarities in these worlds. We used to always have the saying that "Your tech debt grows manually across microservices, "but exponentially within services." And so you want that right level of architecture and composibility if you will, of your systems where you can deploy changes, you can test, you can have high degrees of competence and the roll-outs. And I think the interesting part in the data side, as Jason highlighted, the big O-notation for tech debt in the data ecosystem, is still fairly exponential or polynomial in nature. As right now, we don't have great decomposition of the components. We have different systems. We have a streaming system, we have a databases, we have documents, doors and so on, but how the whole data pipeline data engineering part works generally tends to be pretty monolithic in nature. You take your whole data pipeline and you deploy the whole thing and you basically just cross your fingers, and hopefully it's not 15 hours, but if it is 15 hours, you go to sleep, you wake up the next morning, grab a coffee and then maybe it worked. And that iteration cycle is really slow. And so when we think about how we can improve these things, right? This is combinations of intelligent systems that do instantaneous schema detection, and validation, excuse me, it's combinations of things that do instantaneous schema detection and validation. It's things like automated lineage and dependency tracking. So you know that when you deploy code, what piece of data it affects, it's things like automated testing on individual core parts of your data pipelines to validate that you're getting the expected output that you need. So it's pulling a lot of these same DevOps style principles into the data world, which is really designed to going back to how do you help more people build more things faster and safely really rapid iterations for rapid feedback. So you know if there's breaks in the system much earlier on. >> Well, I think Sean, you're onto something really big there. And I think this is something that's emerging pretty quickly in the cloud scale that I called, 2.0, whatever, what version we're in, is the systems thinking mindset. 'Cause you mentioned the model that that was essentially a silo or subsystem. It was cohesive in it's own way, but now it's been monolithic. Now you have a broken down set of decomposed sets of data pieces that have to work together. So Jason, this is the big challenge that everyone, not really people are talking about, I think most these guys are, and you're using them. What are you unifying? Because this is a systems operating systems thinking, this is not like a database problem. It's a systems problem applied to data where databases are just pieces of it, what's your thoughts? >> That's absolutely right. And I would, so Sean touched on composibility of ETL and thinking about reusable components, thinking about pieces that all fit together, because as you're building something as complex as some of these ETS are, we do think about the platform itself and how that lends to the overarching output. So one thing, being able to actually see the different components of an ETL and blend those in and you as the dry principal, don't repeat yourself. So you essentially are able to take pieces that one person built, maybe John builds a couple of our connectors coming in, Sean also has a bunch of transforms and I just want this stuff out, so I can use a lot of what you guys have already built. I think that's key, because a lot of engineering and data engineering is about managing complexity. So taking that complexity and essentially getting it out fast and getting out error free, is where we're going with all of the data products we're building. >> What are some of the complexity that you guys have that you're dealing with? Can you be specific and share what these guys are doing to solve that problem for you? That's, this is a big problem everyone's having, I'm seeing that all over the place. >> Absolutely, so I could start at a couple of places. So I don't know if you guys are on the three Vs, four Vs or five Vs, but we have all of those. And if you go to that five, four or five V model, there is the veracity piece, which you have to ask yourself, is it true? Is it accurate when? So change happens throughout the pipeline, change can come from web hooks, change can come from users. You have to make sure that you're managing that complexity and what we we're building, I mentioned that we are paying down a lot of tech debt, but we're also building new products. And one pretty challenging, quite challenging ETL that we're building is something going from a document store to an analytical application. So in that document store, we talked about flexible schema. Basically, you don't really know exactly what you're going to get day to day, and you need to be able to manage that change through the whole process in a way that the ultimate business users find value. So, that's one of the key applications that we're using right now. And that's one that the team at Ascend and my team are working hand in hand going through a lot of those challenges. And it's, I also watch the slack just as Sean does, and it's a very active discussion board. So it is essentially like they're just partnering together. It's fabulous, but yeah-- >> And you're seeing kind of a value on this too, I mean, in terms of output what's the business results? >> Yes, absolutely. So essentially this is all, so yes, the fifth V value. So, getting to that value is essentially, there were a few pieces of the, to the value. So there's some data products that we're building within that product and their data science, data analytics based products that essentially do things with the data that help the user. There's also the question of exactly the usage and those kinds of metrics that people in ops want to understand as well as our growth team. So we have internal and external stakeholders for that. >> Jason, this is a great use case, a great customer, Sean, you guys are automating. For the folks watching, who were seeing their peer living the dream here and the data journey, as we say, things are happening. What's the message to customers that you guys want to send because you guys are really cutting your teeth into a whole another level of data engineering, data platform. That's really about the systems view and about cloud. What's the pitch, Sean? What should people know about the company? >> Absolutely, yeah, well, so one, I'd say even before the pitch, I would encourage people to not accept the status quo. And in particular, in data engineering today, the status quo is an incredibly high degree of pain and discomfort. And I think the important part of why Ascend exists and why we're so helpful for our customers, there is a much more automated future of how we build data products, how we optimize those and how we can get a larger cohort of builders into the data ecosystem. And that helps us get out of the muck as we talked about before and put really advanced technology to work for more people inside of our companies to build these data products, leveraging the latest and greatest technologies to drive increased business value faster. >> Jason, what's your assessment of these guys, as people are watching might say, hey, you know what, I'm going to contact them, I need this. How would you talk about Ascend into your peers? >> Absolutely, so I think just thinking about the whole process has been a great partnership. We started with a POC, I think Ascend likes to start with three use cases, I think we came out with four and we went through the ones that we really cared about and really wanted to bring value to the company with. So we have roadmaps for some, as we're paying down technical debt and transitioning, others we can go directly to. And I think that thinking about just like you're saying, John, that systems view of everything you're building, where that makes sense, you can actually take a lot of that complexity and encapsulate it in a way that you can essentially manage it all in that platform. So the Ascend platform has the composibility piece that we touched on. It also, not only can you compose it, but you can drill into it. And my team is super talented and is going to drill into it. So basically loves to open up each of those data flows each of the components therein and has the control there with the combination of Spark Sequel, PI Spark SQL Scala and so on. And I think that the variety of connections is also quite helpful. So thinking about the dry principle from a systems perspective is extremely useful because it's dry, you often get that in a code review, right? I think you can be a little bit more dry here. >> Yeah. >> But you can really do that in the way that you're composing your systems as well. >> That's a great, great point. One quick thing for the folks that they're watching that are trying to figure this out, and a lot of architecture is going on. A lot of people are looking at different solutions. What things have you learned that you could give them a tip like to avoid like maybe some scar tissue or tips of the trade, where you can say, hey, this way, be careful, what's some of the learnings? Could you give a few pointers to folks out there, if they're kicking tires on the direction, what's the wrong direction? What's the right direction look like? >> Absolutely, I think that, I think it through, and I don't know how much time we have that, that feels like a few days conversation as far as ways to go wrong. But absolutely, I think that thinking through exactly where want to be is the key. Otherwise it's kind of like when you're writing a ticket on Jarrah, if you don't have clear success criteria, if you don't know where you going to go, then you'll end up somewhere building something and it might work. But if you think through your exact destination that you want to be at, that will drive a lot of the decisions as you think backwards to where you started. And also I think that, so Sean also mentioned challenging the status quo. I think that you really have to be ready to challenge the status quo at every step of that journey. So if you start with some particular service that you had and its legacy, if it's not essentially performing what you need, then it's okay to just take a step back and say, well, maybe that's not the one. So I think that thinking through the system, just like you were saying, John, and also I think that having a visual representation of where you want to go is critical. So hopefully that encapsulates a lot of it, but yes, the destination is key. >> Yeah, and having an engineering platform that also unifies the multiple components and it's agile. >> That's right. >> It gets you out of the muck and on the last day and differentiate heavy lifting is a cloud plan. >> Absolutely. >> Sean, wrap it up for us here. What's the bumper sticker for your vision, share your founding principles of the company. >> Absolutely, for us, we started the company as a former in recovery and CTO. The last company I founded, we had nearly 60 people on our data team alone and had invested tremendous amounts of effort over the course of eight years. And one of the things that I've learned is that over time innovation comes just as much from deciding what you're no longer going to do as what you're going to do. And focusing heavily around, how do you get out of that muck? How do you continue to climb up that technology stack? Is incredibly important. And so really we are excited to be a part of it and taking the industry is continuing to climb higher and higher level. We're building more and more advanced levels of automation and what we call our data awareness into the automated engine of the Ascend platform that takes us across the entire data ecosystem, connecting and automating all data movement. And so we have a very exciting vision for this fabric that's emerging over time. >> Awesome, Sean, thank you so much for that insight, Jason, thanks for coming on customer of Ascend.io. >> Thank you. >> I appreciate it, gentlemen, thank you. This is the track on automating analytic workloads. We here at the end of us showcase, startup showcase, the hottest companies here at Ascend.io, I'm John Furrier, with theCUBE, thanks for watching. (upbeat music)

Published Date : Sep 22 2021

SUMMARY :

and Jason, nice to meet you. So, and Steady as a customer. and really helping to ensure great to have you on as a person kind of intellectual property the database is you can, So all of that is built of the critical problems that the business and cultural makeup of companies. and data really is that field, that oil but what if you don't have it, right? that it's really part of the package. What do you guys see as and the idea is able to execute as far grown in the recent year And a lot of that is, I mean, that come into the data conversation. and essentially you have so and then you work it and you basically just cross your fingers, And I think this is something and how that lends to complexity that you guys have and you need to be able of exactly the usage that you guys want to send of builders into the data ecosystem. hey, you know what, I'm going and has the control there in the way that you're that you could give them a tip of where you want to go is critical. Yeah, and having an and on the last day and What's the bumper sticker for your vision, and taking the industry is continuing Awesome, Sean, thank you This is the track on

ENTITIES

Entity	Category	Confidence
Andy	PERSON	0.99+
Jason	PERSON	0.99+
Sean	PERSON	0.99+
Emily Freeman	PERSON	0.99+
Sean Knapp	PERSON	0.99+
Jason Robinson	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
John	PERSON	0.99+
Andy Jassy	PERSON	0.99+
AWS	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
15 hours	QUANTITY	0.99+
Ascend	ORGANIZATION	0.99+
last year	DATE	0.99+
96%	QUANTITY	0.99+
eight years	QUANTITY	0.99+
15 hour	QUANTITY	0.99+
iOS Store	TITLE	0.99+
18 hours	QUANTITY	0.99+
Google Play Store	TITLE	0.99+
Ascend.io	ORGANIZATION	0.99+
Steady	ORGANIZATION	0.99+
yesterday	DATE	0.99+
six months	QUANTITY	0.99+
five	QUANTITY	0.99+
third	QUANTITY	0.99+
Spark Sequel	TITLE	0.99+
two	DATE	0.98+
Today	DATE	0.98+
a year	QUANTITY	0.98+
two years	QUANTITY	0.98+
two years ago	DATE	0.98+
today	DATE	0.98+
four	QUANTITY	0.98+
Jarrah	PERSON	0.98+
each	QUANTITY	0.97+
theCUBE	ORGANIZATION	0.97+
three years ago	DATE	0.97+
one	QUANTITY	0.97+
3 million plus	QUANTITY	0.97+
4%	QUANTITY	0.97+
one thing	QUANTITY	0.96+
one team	QUANTITY	0.95+
three use cases	QUANTITY	0.94+
one person	QUANTITY	0.93+
nearly 60 people	QUANTITY	0.93+
one persona	QUANTITY	0.91+

Dirk Didascalou, AWS | AWS re:Invent 2020

>>From around the globe. It's the cube with digital coverage of AWS reinvent 2020 sponsored by Intel and AWS. >>Hey, welcome back to the cubes. Live coverage here for re-invent 2020 Amazon web services. I'm John for your host with the cube. We are the cube virtual. Normally we're in person this year. We're remote because of the pandemic. It's a virtual event on both sides. Got a great guest here. Derek did a Scala vice-president of IOT at AWS. Um, Derek, did I get the name right last year? I think I got it right. Did a scholar, >>You still did a good ride last year and this year. It's exactly it's Greek. >>Great to see you. Um, keep alumni and last year's talk was phenomenal. Really a precursor to what you, you did this year and your keynote leadership session, which you just came off of. Um, really kind of extending the conversation around new news and announcements around what's going on in the complex system. That is the edge and or IOT, some really awesome announcements. So give us a quick overview of, you know, what was the main theme of the keynote. And then I got some specific questions on the news. >>Uh, so the main theme was connected. They transform tomorrow. And I think the idea was that, um, in order to do complex IOT solutions, um, which, which they are, as you said, complex systems you need in principle three different types of elements, software that runs on devices that you connect then services that you have in the cloud that you manage all of the devices and then, uh, technology like services again in the cloud that can make sense of data, um, so that you can do your business logic. And what I was walking the audience through was what is IOT? What are the use cases that we empower today? And then of course I have a bunch of, uh, new launches actually 19 launched new 19 very significant features at reinvent this this morning about what else can you do? And some of them hopefully we'll talk about today. >>Well, we don't have all that time ago to check out for the folks watching, go to the Amazon re-invent site, log in and watch the replays playing multiple times in different time zones and it's, and it's on demand. The thing that got me was impressive to me, I loved your talk. And one of the key news was this I, uh, AWS IOT core for low Rowan, which is fully managed service on AWS. One of the highlights of the, of the presentation. So this is interesting, right? So it's all this a whole nother way. It's kind of a disconnected kind of system. Then you've got fleet as well. You announced, but to what is a low Rawan, can you explain what that is? >>Ryan stands for long range wide area network, and it's a type of connectivity standard, um, which uses very little energy on devices. So think about your own level cellular or wifi, which are connectivity standards. Some of them are for high throughput, but if you have low data rates like for sensors and you want to have those sensors, um, having a lifespan of let's send 10 years for the same battery, then you need very specific standards. Don't require a lot of compute and Laura ran as one of those standards. And the other thing is as long range. So that means you can put sensors pretty far away. Um, you penetrates also concrete or, uh, normally basements, which you counted differently. So if you think about asset tracking or a large scale monitoring off of sensors, Laura van is the standard to go. It's also a similar technology that powers the sidewalk network for Amazon, which is a public offering that we have as well. And the announcement that we did is that we now have this technology fully integrated with AWS IOT core. So customers who want to spin up those Laurel when networks, they don't have to do it themselves. We do it for them. The only thing they need to do is just buy or acquire a specific gateway, which is also pre-certified in our device kind of. And every sensor that is Laura, when standard specific can immediately connect securely to the AWS IOT cloud. >>Okay. So two questions. One is use cases. What does this use for, and you mentioned long range, I'm assuming it's radio-frequency so there's a, uh, um, radio and design a battery power. I mean, how you drive those long rain signals and what are the use cases? I mean, it's just for like manufacturing, is it for like buildings? I mean, would it be, >>We'll use it for all of them? So I give you a great example. We had their compliance mate as one of our launch customers for one a Lang. And what they do is they put sensors in refrigeration units in restaurants, and they are typically agreeably big metal, shielded refrigeration units, and basements. And if you're trying to get what seller or 5g take your phone down in the basement, there's no reception anymore. But Laura ran because it's a low frequency. It can actually penetrate a concrete quite a bit. And because it sends very more data rates, because it only tells you the temperature instead of a streaming video and uses very little battery. So they can put the sensor in all of the refrigeration units and all of the rest ones. And you don't have to touch them for years to come. So that's, for example, one use case, or you want to asset tracking, you put those small little sensors, I don't know, on containers, on pallets and ship them all, all the country. So that's parts where you can more or less than how these assets. >>And so is like a base station. Is there an antenna? Is there a main antenna that goes for walls? It sounds like it's yeah. >>What'd you do your bite. What is called a LoRa LAN gateway? That is a gateway, which has, if you like, it's a mini base station that you can buy from multiple suppliers and partners of ours actually be pre-certified 13 of those with 13 different suppliers in our device, uh, catalog, and then you buy them and more or less, and then you just connect them directly to the internet because everything else, what we'll do, we'll just call this LoRa network server, which normally is the backend infrastructure runs. They're not in the AWS cloud. These gateways act as base stations. Think of them. It's like your wifi router in the home. It's then a LoRa gateway device, which then has a longer range than a wifi would have. And we don't talk about just a few meters used. So it's, it's much further along. I'd love to follow >>Up. I don't have a lot of time, but that was a fascinating announcement, really kind of core, uh, fleet hub and other one that got my attention. Um, this is managing IOT to AWS IOT devices from anywhere, from anywhere from any device. Give us quick tutorial on fleet hub, >>Really tough. So I would take coral, any managers, a lot of devices you have, as I said, more than half a billion devices now going, or end points as we call them through our service 70 months. And if you have so many devices, then you would like to understand, okay, is something going wrong? Is everything fine in order to do so? You can't just probe every single device who typically buy a, built an application that the motor shows you, this fleet management dashboards. And that's exactly what feed pump fleet have is with very little effort. Actually, an it administrator cannot click a button and it has these applications that everybody in the company can log in with their standard logins. And then they can see, okay, all the entire fleets, they see there's something wrong. It can identify issues and they can also do remediations like, okay, maybe reboot a device or make a firmware update or security tunnel into a more complicated device for troubleshooting. >>Awesome. And the other one, by the way, that's awesome. People love those dashboards. Sitewise edge software. This was interesting localizing data for developers to process their run visualization on a connected or disconnected scenario. This sounds really cool and relevant. What's the point? Yeah, >>Well Sitewise edge is for industrial customers. This is a really big deal. So imagine that you would like to optimize your main function. Um, our dedicated industrial services called Sightlines edge came to the gateway component, took all of the data out of the manufacturing plants into the cloud, where you could model them and you'd do cool stuff with it. Um, the problem is in very many of the scenarios, you don't want to sync all of the data to the cloud, or you can't send all of the data to the cloud. So customers were saying is okay, can I do all this good stuff that I can do in the cloud locally and DH even disconnected? And that's what we know. We launched the sideways. It's the same capabilities that you'd have in the cloud, which is not can run on gateways on outpost, on snow devices, which is data ingestion, data modeling, ETL metrics calculation. And you also have a dashboard application that we have in the cloud called side-by-side monitor. And the exact same application can run locally so that you can log in again, like with three tab locally in that URL. And you see what's actually happening with your equipment, all that it disconnected. >>Awesome. Great job there. Finally, the other one got my attention as James Gosling tweeted about the open source of green grass, which was awesome. He obviously he's a legend in the programming and systems world. Um, now works for AWS. You guys are getting all the great talent, um, Greengrass 2.0 at the edge. This is, uh, a new announcement. Take us through that. And obviously the open sourcing with Gosling involved pretty much >>Big deal. Oh yeah. So I don't know for everybody Greenglass alias besotted, reinvest, that's our runtime environment, which brings typical IOT core to the cloud from the cloud to the edge. It can be Lunda runtimes, including containers, including machine learning inferencing. And over the last few years, James and our team together, we were working actually to revamp this completely. It's a complete rewrite of the entire software that runs on the edge. It's no JVM based. It's not modular. And as you said, we just open-sourced it. So, um, there was an enormous effort into how can I modularize this because there are so many applications and sometimes you have a very powerful machine is what all the features together, or you have a much cheaper device where you said, Hey, you know what? I only want specific applications. And then how do you modularize this? And you also need a deployable at the edge of the past. You always needed the cloud in order to provision stuff. Now I can actually code and deploy all locally by doing that at scale. And of course, open sourcing. This is a pretty big deal because everybody can now inspect the code and you can extend it to whatever you would like it to ask. So >>What is someone going to do with the open source, given an example of some innovation, a bar raising activity app that someone could take with the, with the green grass open source, what would it be? What would you envision? >>So what you can do with green goes open source in the past. If you wanted to put it, for example, put on a very specific proprietary system and the past, we only shifted as binary code working from the next for example, but now I can see no one, I have a mix, so I have it a windows. So I want to have an Q and X on any type of operating system. And you can now have the code and therefore adopted yourself. You can also extend it if you'd like, because all of them, of course the short support is available. And then the modularization is that you can also build your own mind >>And it's an Apache license. So follows that >>Super easy. You can do whatever you like with a code, by the way, open sourcing, doesn't change anything to pricing when it's wherever. So you get the code, you do what you like with this Apache 2.0, not to be confounded. You have another open source, which is free. Artose, that's our real-time operating system. That's under the MIT license that they have. We also had some great news at reinvent. We have no long-term support for free, right? >>I think there's going to be a tsunami of innovation and creative thinking around the edge. Um, real quick, final comment edge is a complex system. One of the themes that reinvent this year is, you know, re re-imagined reinvent everything. Um, when you have complexity came in, complexity is the number one challenge that we're hearing from customers, your customers and people in the industry saying, we love it. It keeps getting better and better with AWS, but, you know, putting it behind the curtain of SAS and plot pass and it, I got to tame the complexity. What do you say to that? >>It's true, particularly in IOT, it's true because we need to somehow manage complexity from embedded software and hardware and fleet management. As we said, uh, clouds, capabilities, AI, it's really, really complex. If you try to muscle this all yourself. So that's why we try to integrate our offerings. I don't know whether you've realized we didn't announce any new services. All of our capabilities are part of what we have and trying to combine. So if you like, Sitewise edge is bringing sideways to the edge, but under the hood, it's using green grass in order to make the work freed up as well. Um, everything we've done in fleet hub is based on device management. Greenglass V2 itself is not under the hood using also device management for the fleet provisioning. So we try now to combine all of the dots, make it easier to access. And then as we set for this web applications, whether it's Sightlines monitor or from the top, you don't even have to be a developer anymore. You can more or less just directly access a dashboarding app and just see what's happening without that. You need to >>Turk exciting times, congratulations a lot more to dig into, um, tons of videos on demand on the re-invent site, of course, uh, comes to the cube and we got more coverage on siliconangle.com. Dirk. Thanks for your time. Congratulations. >>Can I just one thing which I would like to still denounce or people understand, communicate for everybody. If you go to amazon.com and look for AWS IOT, educate for $42, you can buy now a tiny little device. It's not about the device, it's about a curriculum. It shows you everybody can code. How do I use IOT? How easy it is and how do I do the invoice and the amount. So it's an awesome thing for students and everybody else who would like to understand how IOT works. So check it out@amazon.com. >>Okay. We'll get it out. Educate, check it out. Learn it's easy. Next level. Programming, complexity, Turk. Thanks for coming on. >>Appreciate it. I'm John >>Florio, host of the cube here. Eight hours coverage reinvent 20, 20 virtual. We are the cube virtual. Thanks for watching.

Published Date : Dec 15 2020

SUMMARY :

It's the cube with digital coverage of AWS We are the cube virtual. It's exactly it's Greek. So give us a quick overview of, you know, what was the main theme of the keynote. of data, um, so that you can do your business logic. You announced, but to what is a low Rawan, can you explain what that is? So that means you can put sensors pretty far away. What does this use for, and you mentioned long range, And you don't have to touch them And so is like a base station. which has, if you like, it's a mini base station that you can buy from multiple suppliers Um, this is managing IOT to AWS IOT devices from anywhere, And if you have so many devices, then you would like to understand, okay, is something going wrong? And the other one, by the way, that's awesome. many of the scenarios, you don't want to sync all of the data to the cloud, or you can't send all of the data And obviously the open sourcing with Gosling involved pretty much This is a pretty big deal because everybody can now inspect the code and you So what you can do with green goes open source in the past. And it's an Apache license. So you get the code, you do what you like with this Apache 2.0, not to be confounded. Um, when you have complexity came in, complexity is the number one challenge that we're hearing So if you like, comes to the cube and we got more coverage on siliconangle.com. you can buy now a tiny little device. Educate, check it out. I'm John Florio, host of the cube here.

ENTITIES

Entity	Category	Confidence
James	PERSON	0.99+
Derek	PERSON	0.99+
Dirk Didascalou	PERSON	0.99+
AWS	ORGANIZATION	0.99+
James Gosling	PERSON	0.99+
Gosling	PERSON	0.99+
10 years	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
$42	QUANTITY	0.99+
John	PERSON	0.99+
Laura	PERSON	0.99+
last year	DATE	0.99+
two questions	QUANTITY	0.99+
Eight hours	QUANTITY	0.99+
13 different suppliers	QUANTITY	0.99+
13	QUANTITY	0.99+
Dirk	PERSON	0.99+
this year	DATE	0.99+
IOT	ORGANIZATION	0.99+
both sides	QUANTITY	0.99+
One	QUANTITY	0.99+
one	QUANTITY	0.99+
70 months	QUANTITY	0.99+
tomorrow	DATE	0.98+
more than half a billion devices	QUANTITY	0.98+
IOT	TITLE	0.98+
Intel	ORGANIZATION	0.98+
three tab	QUANTITY	0.97+
today	DATE	0.97+
Florio	PERSON	0.96+
siliconangle.com	OTHER	0.95+
Apache 2.0	TITLE	0.94+
2020	DATE	0.93+
20	QUANTITY	0.93+
Greenglass	ORGANIZATION	0.92+
19 very significant features	QUANTITY	0.92+
Ryan	PERSON	0.91+
V2	COMMERCIAL_ITEM	0.89+
one thing	QUANTITY	0.88+
MIT	ORGANIZATION	0.88+
19	QUANTITY	0.86+
pandemic	EVENT	0.86+
Scala	ORGANIZATION	0.84+
5g	ORGANIZATION	0.83+
windows	TITLE	0.82+
tons of videos	QUANTITY	0.82+
every single device	QUANTITY	0.81+
three	QUANTITY	0.81+
so many devices	QUANTITY	0.77+
this morning	DATE	0.77+
Sitewise edge	TITLE	0.77+
2020	TITLE	0.74+
one use	QUANTITY	0.72+
Sightlines	ORGANIZATION	0.71+
Apache	TITLE	0.69+
Artose	ORGANIZATION	0.67+
amazon.com	ORGANIZATION	0.66+
Greek	LOCATION	0.62+
years	DATE	0.59+
reinvent 2020	EVENT	0.59+

Christian Keynote with Disclaimer

(upbeat music) >> Hi everyone, thank you for joining us at the Data Cloud Summit. The last couple of months have been an exciting time at Snowflake. And yet, what's even more compelling to all of us at Snowflake is what's ahead. Today I have the opportunity to share new product developments that will extend the reach and impact of our Data Cloud and improve the experience of Snowflake users. Our product strategy is focused on four major areas. First, Data Cloud content. In the Data Cloud silos are eliminated and our vision is to bring the world's data within reach of every organization. You'll hear about new data sets and data services available in our data marketplace and see how previous barriers to sourcing and unifying data are eliminated. Second, extensible data pipelines. As you gain frictionless access to a broader set of data through the Data Cloud, Snowflakes platform brings additional capabilities and extensibility to your data pipelines, simplifying data ingestion, and transformation. Third, data governance. The Data Cloud eliminates silos and breaks down barriers and in a world where data collaboration is the norm, the importance of data governance is ratified and elevated. We'll share new advancements to support how the world's most demanding organizations mobilize your data while maintaining high standards of compliance and governance. Finally, our fourth area focuses on platform performance and capabilities. We remain laser focused on continuing to lead with the most performant and capable data platform. We have some exciting news to share about the core engine of Snowflake. As always, we love showing you Snowflake in action, and we prepared some demos for you. Also, we'll keep coming back to the fact that one of the characteristics of Snowflake that we're proud as staff is that we offer a single platform from which you can operate all of your data workloads, across clouds and across regions, which workloads you may ask, specifically, data warehousing, data lake, data science, data engineering, data applications, and data sharing. Snowflake makes it possible to mobilize all your data in service of your business without the cost, complexity and overhead of managing multiple systems, tools and vendors. Let's dive in. As you heard from Frank, the Data Cloud offers a unique capability to connect organizations and create collaboration and innovation across industries fueled by data. The Snowflake data marketplace is the gateway to the Data Cloud, providing visibility for organizations to browse and discover data that can help them make better decisions. For data providers on the marketplace, there is a new opportunity to reach new customers, create new revenue streams, and radically decrease the effort and time to data delivery. Our marketplace dramatically reduces the friction of sharing and collaborating with data opening up new possibilities to all participants in the Data Cloud. We introduced the Snowflake data marketplace in 2019. And it is now home to over 100 data providers, with half of them having joined the marketplace in the last four months. Since our most recent product announcements in June, we have continued broadening the availability of the data marketplace, across regions and across clouds. Our data marketplace provides the opportunity for data providers to reach consumers across cloud and regional boundaries. A critical aspect of the Data Cloud is that we envisioned organizations collaborating not just in terms of data, but also data powered applications and services. Think of instances where a provider doesn't want to open access to the entirety of a data set, but wants to provide access to business logic that has access and leverages such data set. That is what we call data services. And we want Snowflake to be the platform of choice for developing discovering and consuming such rich building blocks. To see How the data marketplace comes to live, and in particular one of these data services, let's jump into a demo. For all of our demos today, we're going to put ourselves in the shoes of a fictional global insurance company. We've called it Insureco. Insurance is a data intensive and highly regulated industry. Having the right access control and insight from data is core to every insurance company's success. I'm going to turn it over to Prasanna to show how the Snowflake data marketplace can solve a data discoverability and access problem. >> Let's look at how Insureco can leverage data and data services from the Snowflake data marketplace and use it in conjunction with its own data in the Data Cloud to do three things, better detect fraudulent claims, arm its agents with the right information, and benchmark business health against competition. Let's start with detecting fraudulent claims. I'm an analyst in the Claims Department. I have auto claims data in my account. I can see there are 2000 auto claims, many of these submitted by auto body shops. I need to determine if they are valid and legitimate. In particular, could some of these be insurance fraud? By going to the Snowflake data marketplace where numerous data providers and data service providers can list their offerings, I find the quantifying data service. It uses a combination of external data sources and predictive risk typology models to inform the risk level of an organization. Quantifying external sources include sanctions and blacklists, negative news, social media, and real time search engine results. That's a wealth of data and models built on that data which we don't have internally. So I'd like to use Quantifind to determine a fraud risk score for each auto body shop that has submitted a claim. First, the Snowflake data marketplace made it really easy for me to discover a data service like this. Without the data marketplace, finding such a service would be a lengthy ad hoc process of doing web searches and asking around. Second, once I find Quantifind, I can use Quantifind service against my own data in three simple steps using data sharing. I create a table with the names and addresses of auto body shops that have submitted claims. I then share the table with Quantifind to start the risk assessment. Quantifind does the risk scoring and shares the data back with me. Quantifind uses external functions which we introduced in June to get results from their risk prediction models. Without Snowflake data sharing, we would have had to contact Quantifind to understand what format they wanted the data in, then extract this data into a file, FTP the file to Quantifind, wait for the results, then ingest the results back into our systems for them to be usable. Or I would have had to write code to call Quantifinds API. All of that would have taken days. In contrast, with data sharing, I can set this up in minutes. What's more, now that I have set this up, as new claims are added in the future, they will automatically leverage Quantifind's data service. I view the scores returned by Quantifind and see the two entities in my claims data have a high score for insurance fraud risk. I open up the link returned by Quantifind to read more, and find that this organization has been involved in an insurance crime ring. Looks like that is a claim that we won't be approving. Using the Quantifind data service through the Snowflake data marketplace gives me access to a risk scoring capability that we don't have in house without having to call custom APIs. For a provider like Quantifind this drives new leads and monetization opportunities. Now that I have identified potentially fraudulent claims, let's move on to the second part. I would like to share this fraud risk information with the agents who sold the corresponding policies. To do this, I need two things. First, I need to find the agents who sold these policies. Then I need to share with these agents the fraud risk information that we got from Quantifind. But I want to share it such that each agent only sees the fraud risk information corresponding to claims for policies that they wrote. To find agents who sold these policies, I need to look up our Salesforce data. I can find this easily within Insureco's internal data exchange. I see there's a listing with Salesforce data. Our sales Ops team has published this listing so I know it's our officially blessed data set, and I can immediately access it from my Snowflake account without copying any data or having to set up ETL. I can now join Salesforce data with my claims to identify the agents for the policies that were flagged to have fraudulent claims. I also have the Snowflake account information for each agent. Next, I create a secure view that joins on an entitlements table, such that each agent can only see the rows corresponding to policies that they have sold. I then share this directly with the agents. This share contains the secure view that I created with the names of the auto body shops, and the fraud risk identified by Quantifind. Finally, let's move on to the third and last part. Now that I have detected potentially fraudulent claims, I'm going to move on to building a dashboard that our executives have been asking for. They want to see how Insureco compares against other auto insurance companies on key metrics, like total claims paid out for the auto insurance line of business nationwide. I go to the Snowflake data marketplace and find SNL U.S. Insurance Statutory Data from SNP. This data is included with Insureco's existing subscription with SMP so when I request access to it, SMP can immediately share this data with me through Snowflake data sharing. I create a virtual database from the share, and I'm ready to query this data, no ETL needed. And since this is a virtual database, pointing to the original data in SNP Snowflake account, I have access to the latest data as it arrives in SNPs account. I see that the SNL U.S. Insurance Statutory Data from SNP has data on assets, premiums earned and claims paid out by each us insurance company in 2019. This data is broken up by line of business and geography and in many cases goes beyond the data that would be available from public financial filings. This is exactly the data I need. I identify a subset of comparable insurance companies whose net total assets are within 20% of Insureco's, and whose lines of business are similar to ours. I can now create a Snow site dashboard that compares Insureco against similar insurance companies on key metrics, like net earned premiums, and net claims paid out in 2019 for auto insurance. I can see that while we are below median our net earned premiums, we are doing better than our competition on total claims paid out in 2019, which could be a reflection of our improved claims handling and fraud detection. That's a good insight that I can share with our executives. In summary, the Data Cloud enabled me to do three key things. First, seamlessly fine data and data services that I need to do my job, be it an external data service like Quantifind and external data set from SNP or internal data from Insureco's data exchange. Second, get immediate live access to this data. And third, control and manage collaboration around this data. With Snowflake, I can mobilize data and data services across my business ecosystem in just minutes. >> Thank you Prasanna. Now I want to turn our focus to extensible data pipelines. We believe there are two different and important ways of making Snowflakes platform highly extensible. First, by enabling teams to leverage services or business logic that live outside of Snowflake interacting with data within Snowflake. We do this through a feature called external functions, a mechanism to conveniently bring data to where the computation is. We announced this feature for calling regional endpoints via AWS gateway in June, and it's currently available in public preview. We are also now in public preview supporting Azure API management and will soon support Google API gateway and AWS private endpoints. The second extensibility mechanism does the converse. It brings the computation to Snowflake to run closer to the data. We will do this by enabling the creation of functions and procedures in SQL, Java, Scala or Python ultimately providing choice based on the programming language preference for you or your organization. You will see Java, Scala and Python available through private and public previews in the future. The possibilities enabled by these extensibility features are broad and powerful. However, our commitment to being a great platform for data engineers, data scientists and developers goes far beyond programming language. Today, I am delighted to announce Snowpark a family of libraries that will bring a new experience to programming data in Snowflake. Snowpark enables you to write code directly against Snowflake in a way that is deeply integrated into the languages I mentioned earlier, using familiar concepts like DataFrames. But the most important aspect of Snowpark is that it has been designed and optimized to leverage the Snowflake engine with its main characteristics and benefits, performance, reliability, and scalability with near zero maintenance. Think of the power of a declarative SQL statements available through a well known API in Scala, Java or Python, all these against data governed in your core data platform. We believe Snowpark will be transformative for data programmability. I'd like to introduce Sri to showcase how our fictitious insurance company Insureco will be able to take advantage of the Snowpark API for data science workloads. >> Thanks Christian, hi, everyone? I'm Sri Chintala, a product manager at Snowflake focused on extensible data pipelines. And today, I'm very excited to show you a preview of Snowpark. In our first demo, we saw how Insureco could identify potentially fraudulent claims. Now, for all the valid claims InsureCo wants to ensure they're providing excellent customer service. To do that, they put in place a system to transcribe all of their customer calls, so they can look for patterns. A simple thing they'd like to do is detect the sentiment of each call so they can tell which calls were good and which were problematic. They can then better train their claim agents for challenging calls. Let's take a quick look at the work they've done so far. InsureCo's data science team use Snowflakes external functions to quickly and easily train a machine learning model in H2O AI. Snowflake has direct integrations with H2O and many other data science providers giving Insureco the flexibility to use a wide variety of data science libraries frameworks or tools to train their model. Now that the team has a custom trained sentiment model tailored to their specific claims data, let's see how a data engineer at Insureco can use Snowpark to build a data pipeline that scores customer call logs using the model hosted right inside of Snowflake. As you can see, we have the transcribed call logs stored in the customer call logs table inside Snowflake. Now, as a data engineer trained in Scala, and used to working with systems like Spark and Pandas, I want to use familiar programming concepts to build my pipeline. Snowpark solves for this by letting me use popular programming languages like Java or Scala. It also provides familiar concepts in APIs, such as the DataFrame abstraction, optimized to leverage and run natively on the Snowflake engine. So here I am in my ID, where I've written a simple scalar program using the Snowpark libraries. The first step in using the Snowpark API is establishing a session with Snowflake. I use the session builder object and specify the required details to connect. Now, I can create a DataFrame for the data in the transcripts column of the customer call logs table. As you can see, the Snowpark API provides native language constructs for data manipulation. Here, I use the Select method provided by the API to specify the column names to return rather than writing select transcripts as a string. By using the native language constructs provided by the API, I benefit from features like IntelliSense and type checking. Here you can see some of the other common methods that the DataFrame class offers like filters like join and others. Next, I define a get sentiment user defined function that will return a sentiment score for an input string by using our pre trained H2O model. From the UDF, we call the score method that initializes and runs the sentiment model. I've built this helper into a Java file, which along with the model object and license are added as dependencies that Snowpark will send to Snowflake for execution. As a developer, this is all programming that I'm familiar with. We can now call our get sentiment function on the transcripts column of the DataFrame and right back the results of the score transcripts to a new target table. Let's run this code and switch over to Snowflake to see the score data and also all the work that Snowpark has done for us on the back end. If I do a select star from scored logs, we can see the sentiment score of each call right alongside the transcript. With Snowpark all the logic in my program is pushed down into Snowflake. I can see in the query history that Snowpark has created a temporary Java function to host the pre trained H20 model, and that the model is running right in my Snowflake warehouse. Snowpark has allowed us to do something completely new in Snowflake. Let's recap what we saw. With Snowpark, Insureco was able to use their preferred programming language, Scala and use the familiar DataFrame constructs to score data using a machine learning model. With support for Java UDFs, they were able to run a train model natively within Snowflake. And finally, we saw how Snowpark executed computationally intensive data science workloads right within Snowflake. This simplifies Insureco's data pipeline architecture, as it reduces the number of additional systems they have to manage. We hope that extensibility with Scala, Java and Snowpark will enable our users to work with Snowflake in their preferred way while keeping the architecture simple. We are very excited to see how you use Snowpark to extend your data pipelines. Thank you for watching and with that back to you, Christian. >> Thank you Sri. You saw how Sri could utilize Snowpark to efficiently perform advanced sentiment analysis. But of course, if this use case was important to your business, you don't want to fully automate this pipeline and analysis. Imagine being able to do all of the following in Snowflake, your pipeline could start far upstream of what you saw in the demo. By storing your actual customer care call recordings in Snowflake, you may notice that this is new for Snowflake. We'll come back to the idea of storing unstructured data in Snowflake at the end of my talk today. Once you have the data in Snowflake, you can use our streams and past capabilities to call an external function to transcribe these files. To simplify this flow even further, we plan to introduce a serverless execution model for tasks where Snowflake can automatically size and manage resources for you. After this step, you can use the same serverless task to execute sentiment scoring of your transcript as shown in the demo with incremental processing as each transcript is created. Finally, you can surface the sentiment score either via snow side, or through any tool you use to share insights throughout your organization. In this example, you see data being transformed from a raw asset into a higher level of information that can drive business action, all fully automated all in Snowflake. Turning back to Insureco, you know how important data governance is for any major enterprise but particularly for one in this industry. Insurance companies manage highly sensitive data about their customers, and have some of the strictest requirements for storing and tracking such data, as well as managing and governing it. At Snowflake, we think about governance as the ability to know your data, manage your data and collaborate with confidence. As you saw in our first demo, the Data Cloud enables seamless collaboration, control and access to data via the Snowflake data marketplace. And companies may set up their own data exchanges to create similar collaboration and control across their ecosystems. In future releases, we expect to deliver enhancements that create more visibility into who has access to what data and provide usage information of that data. Today, we are announcing a new capability to help Snowflake users better know and organize your data. This is our new tagging framework. Tagging in Snowflake will allow user defined metadata to be attached to a variety of objects. We built a broad and robust framework with powerful implications. Think of the ability to annotate warehouses with cost center information for tracking or think of annotating tables and columns with sensitivity classifications. Our tagging capability will enable the creation of companies specific business annotations for objects in Snowflakes platform. Another key aspect of data governance in Snowflake is our policy based framework where you specify what you want to be true about your data, and Snowflake enforces those policies. We announced one such policy earlier this year, our dynamic data masking capability, which is now available in public preview. Today, we are announcing a great complimentary a policy to achieve row level security to see how role level security can enhance InsureCo's ability to govern and secure data. I'll hand it over to Artin for a demo. >> Hello, I'm Martin Avanes, Director of Product Management for Snowflake. As Christian has already mentioned, the rise of the Data Cloud greatly accelerates the ability to access and share diverse data leading to greater data collaboration across teams and organizations. Controlling data access with ease and ensuring compliance at the same time is top of mind for users. Today, I'm thrilled to announce our new row access policies that will allow users to define various rules for accessing data in the Data Cloud. Let's check back in with Insureco to see some of these in action and highlight how those work with other existing policies one can define in Snowflake. Because Insureco is a multinational company, it has to take extra measures to ensure data across geographic boundaries is protected to meet a wide range of compliance requirements. The Insureco team has been asked to segment what data sales team members have access to based on where they are regionally. In order to make this possible, they will use Snowflakes row access policies to implement row level security. We are going to apply policies for three Insureco's sales team members with different roles. Alice, an executive must be able to view sales data from both North America and Europe. Alex in North America sales manager will be limited to access sales data from North America only. And Jordan, a Europe sales manager will be limited to access sales data from Europe only. As a first step, the security administrator needs to create a lookup table that will be used to determine which data is accessible based on each role. As you can see, the lookup table has the row and their associated region, both of which will be used to apply policies that we will now create. Row access policies are implemented using standard SQL syntax to make it easy for administrators to create policies like the one our administrators looking to implement. And similar to masking policies, row access policies are leveraging our flexible and expressive policy language. In this demo, our admin users to create a row access policy that uses the row and region of a user to determine what row level data they have access to when queries are executed. When users queries are executed against the table protected by such a row access policy, Snowflakes query engine will dynamically generate and apply the corresponding predicate to filter out rows the user is not supposed to see. With the policy now created, let's log in as our Sales Users and see if it worked. Recall that as a sales executive, Alice should have the ability to see all rows from North America and Europe. Sure enough, when she runs her query, she can see all rows so we know the policy is working for her. You may also have noticed that some columns are showing masked data. That's because our administrator's also using our previously announced data masking capabilities to protect these data attributes for everyone in sales. When we look at our other users, we should notice that the same columns are also masked for them. As you see, you can easily combine masking and row access policies on the same data sets. Now let's look at Alex, our North American sales manager. Alex runs to st Korea's Alice, row access policies leverage the lookup table to dynamically generate the corresponding predicates for this query. The result is we see that only the data for North America is visible. Notice too that the same columns are still masked. Finally, let's try Jordan, our European sales manager. Jordan runs the query and the result is only the data for Europe with the same columns also masked. And you reintroduced masking policies, today you saw row access policies in action. And similar to our masking policies, row access policies in Snowflake will be accepted Hands of capability integrated seamlessly across all of Snowflake everywhere you expect it to work it does. If you're accessing data stored in external tables, semi structured JSON data, or building data pipelines via streams or plan to leverage Snowflakes data sharing functionality, you will be able to implement complex row access policies for all these diverse use cases and workloads within Snowflake. And with Snowflakes unique replication feature, you can instantly apply these new policies consistently to all of your Snowflake accounts, ensuring governance across regions and even across different clouds. In the future, we plan to demonstrate how to combine our new tagging capabilities with Snowflakes policies, allowing advanced audit and enforcing those policies with ease. And with that, let's pass it back over to Christian. >> Thank you Artin. We look forward to making this new tagging and row level security capabilities available in private preview in the coming months. One last note on the broad area of data governance. A big aspect of the Data Cloud is the mobilization of data to be used across organizations. At the same time, privacy is an important consideration to ensure the protection of sensitive, personal or potentially identifying information. We're working on a set of product capabilities to simplify compliance with privacy related regulatory requirements, and simplify the process of collaborating with data while preserving privacy. Earlier this year, Snowflake acquired a company called Crypto Numerix to accelerate our efforts on this front, including the identification and anonymization of sensitive data. We look forward to sharing more details in the future. We've just shown you three demos of new and exciting ways to use Snowflake. However, I want to also remind you that our commitment to the core platform has never been greater. As you move workloads on to Snowflake, we know you expect exceptional price performance and continued delivery of new capabilities that benefit every workload. On price performance, we continue to drive performance improvements throughout the platform. Let me give you an example comparing an identical set of customers submitted queries that ran both in August of 2019, and August of 2020. If I look at the set of queries that took more than one second to compile 72% of those improved by at least 50%. When we make these improvements, execution time goes down. And by implication, the required compute time is also reduced. Based on our pricing model to charge for what you use, performance improvements not only deliver faster insights, but also translate into cost savings for you. In addition, we have two new major announcements on performance to share today. First, we announced our search optimization service during our June event. This service currently in public preview can be enabled on a table by table basis, and is able to dramatically accelerate lookup queries on any column, particularly those not used as clustering columns. We initially support equality comparisons only, and today we're announcing expanded support for searches in values, such as pattern matching within strings. This will unlock a number of additional use cases such as analytics on logs data for performance or security purposes. This expanded support is currently being validated by a few customers in private preview, and will be broadly available in the future. Second, I'd like to introduce a new service that will be in private preview in a future release. The query acceleration service. This new feature will automatically identify and scale out parts of a query that could benefit from additional resources and parallelization. This means that you will be able to realize dramatic improvements in performance. This is especially impactful for data science and other scan intensive workloads. Using this feature is pretty simple. You define a maximum amount of additional resources that can be recruited by a warehouse for acceleration, and the service decides when it would be beneficial to use them. Given enough resources, a query over a massive data set can see orders of magnitude performance improvement compared to the same query without acceleration enabled. In our own usage of Snowflake, we saw a common query go 15 times faster without changing the warehouse size. All of these performance enhancements are extremely exciting, and you will see continued improvements in the future. We love to innovate and continuously raise the bar on what's possible. More important, we love seeing our customers adopt and benefit from our new capabilities. In June, we announced a number of previews, and we continue to roll those features out and see tremendous adoption, even before reaching general availability. Two have those announcements were the introduction of our geospatial support and policies for dynamic data masking. Both of these features are currently in use by hundreds of customers. The number of tables using our new geography data type recently crossed the hundred thousand mark, and the number of columns with masking policies also recently crossed the same hundred thousand mark. This momentum and level of adoption since our announcements in June is phenomenal. I have one last announcement to highlight today. In 2014, Snowflake transformed the world of data management and analytics by providing a single platform with first class support for both structured and semi structured data. Today, we are announcing that Snowflake will be adding support for unstructured data on that same platform. Think of the abilities of Snowflake used to store access and share files. As an example, would you like to leverage the power of SQL to reason through a set of image files. We have a few customers as early adopters and we'll provide additional details in the future. With this, you will be able to leverage Snowflake to mobilize all your data in the Data Cloud. Our customers rely on Snowflake as the data platform for every part of their business. However, the vision and potential of Snowflake is actually much bigger than the four walls of any organization. Snowflake has created a Data Cloud a data connected network with a vision where any Snowflake customer can leverage and mobilize the world's data. Whether it's data sets, or data services from traditional data providers for SaaS vendors, our marketplace creates opportunities for you and raises the bar in terms of what is possible. As examples, you can unify data across your supply chain to accelerate your time and quality to market. You can build entirely new revenue streams, or collaborate with a consortium on data for good. The possibilities are endless. Every company has the opportunity to gain richer insights, build greater products and deliver better services by reaching beyond the data that he owns. Our vision is to enable every company to leverage the world's data through seamless and governing access. Snowflake is your window into this data network into this broader opportunity. Welcome to the Data Cloud. (upbeat music)

Published Date : Nov 19 2020

SUMMARY :

is the gateway to the Data Cloud, FTP the file to Quantifind, It brings the computation to Snowflake and that the model is running as the ability to know your data, the ability to access is the mobilization of data to

ENTITIES

Entity	Category	Confidence
Insureco	ORGANIZATION	0.99+
Christian	PERSON	0.99+
Alice	PERSON	0.99+
August of 2020	DATE	0.99+
August of 2019	DATE	0.99+
June	DATE	0.99+
InsureCo	ORGANIZATION	0.99+
Martin Avanes	PERSON	0.99+
Europe	LOCATION	0.99+
Quantifind	ORGANIZATION	0.99+
Prasanna	PERSON	0.99+
15 times	QUANTITY	0.99+
2019	DATE	0.99+
Alex	PERSON	0.99+
SNP	ORGANIZATION	0.99+
2014	DATE	0.99+
Jordan	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Scala	TITLE	0.99+
Java	TITLE	0.99+
72%	QUANTITY	0.99+
SQL	TITLE	0.99+
Today	DATE	0.99+
North America	LOCATION	0.99+
each agent	QUANTITY	0.99+
SMP	ORGANIZATION	0.99+
second part	QUANTITY	0.99+
First	QUANTITY	0.99+
Second	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
Snowflake	TITLE	0.99+
Python	TITLE	0.99+
each call	QUANTITY	0.99+
Sri Chintala	PERSON	0.99+
each role	QUANTITY	0.99+
today	DATE	0.99+
Two	QUANTITY	0.99+
two	QUANTITY	0.99+
Both	QUANTITY	0.99+
Crypto Numerix	ORGANIZATION	0.99+
two entities	QUANTITY	0.99+

Scott Hanselman, Microsoft | Microsoft Ignite 2019

>> Announcer: Live from Orlando, Florida it's theCUBE! Covering Microsoft Ignite, brought to you by Cohesity. >> Hello, and happy taco Tuesday CUBE viewers! You are watching theCUBE's live coverage of Microsoft's Ignite here in Orlando, Florida. I'm your host Rebecca Knight, along with Stu Miniman. We're joined by Scott Hanselman, he is the partner program manager at Microsoft. Thank you so much for coming on theCUBE! >> Absolutely, my pleasure! >> Rebecca: And happy taco Tuesday to you! Will code for tacos. >> Will code for tacos. >> I'm digging it, I'm digging it >> I'm a very inexpensive coder. >> So you are the partner program manager, but you're really the people's programmer at Microsoft. Satya Nadella up on the main stage yesterday, talking about programming for everyone, empowering ordinary citizen developers, and you yourself were on the main stage this morning, "App Development for All", why is this such a priority for Microsoft at this point in time? >> Well there's the priority for Microsoft, and then I'll also speak selfishly as a priority for me, because when we talk about inclusion, what does that really mean? Well it is the opposite of exclusion. So when we mean inclusion, we need to mean everyone, we need to include everyone. So what can we do to make technology, to make programming possible, to make everyone enabled, whether that be something like drag and drop, and PowerApps, and the Power platform, all the way down to doing things like we did in the keynote this morning with C# on a tiny micro-controller, and the entire spectrum in between, whether it be citizen programmers in Excel using Power BI to go and do machine learning, or the silly things that we did in the keynote with rock, paper scissors that we might be able to talk about. All of that means including everyone and if the site isn't accessible, if Visual Studio as a tool isn't accessible, if you're training your AI in a non-ethical way, you are consciously excluding people. So back to what Satya thinks is why can't everyone do this? SatyaSacha thinks is why can't everyone do this? Why are we as programmers having any gate keeping, or you know, "You can't do that you're not a programmer, "you know, I'm a programmer, you can't have that." >> So what does the future look like, >> Rebecca: So what does the future look like, if everyone knows how to do it? I mean, do some imagining, visioning right now about if everyone does know how to do this, or at least can learn the building blocks for it, what does technology look like? >> Well hopefully it will be ethical, and it'll be democratized so that everyone can do it. I think that the things that are interesting, or innovative today will become commoditized tomorrow, like, something as simple as a webcam detecting your face, and putting a square around it and then you move around, and the square, we were like, "Oh my God, that was amazing!" And now it's just a library that you can download. What is amazing and interesting today, like AR and VR, where it's like, "Oh wow, I've never seen augmented reality work like that!" My eight-year-old will be able to do it in five years, and they'll be older than eight. >> So Scott, one of the big takeaways I had from the app dev keynote that you did this morning was in the past it was trying to get everybody on the same page, let's move them to our stack, let's move them to our cloud, let's move them on this programming language, and you really talked about how the example of Chipotle is different parts of the organization will write in a different language, and there needs to be, it's almost, you know, that service bus that you have between all of these environments, because we've spent, a lot of us, I know in my career I've spent decades trying to help break down those silos, and get everybody to work together, but we're never going to have everybody doing the same jobs, so we need to meet them where they are, they need to allow them to use the tools, the languages, the platforms that they want, but they need to all be able to work together, and this is not the Microsoft that I grew up with that is now an enabler of that environment. The word we keep coming back to is trust at the keynote. I know there's some awesome, cool new stuff about .net which is a piece of it, but it's all of the things together. >> Right, you know I was teaching a class at Mesa Community College down in San Diego a couple of days ago and they were trying, they were all people who wanted jobs, just community college people, I went to community college and it's like, I just want to know how to get a job, what is the thing that I can do? What language should I learn? And that's a tough question. They wonder, do I learn Java, do I learn C#? And someone had a really funny analogy, and I'll share it with you. They said, well you know English is the language, right? Why don't the other languages just give up? They said, you know, Finland, they're not going to win, right? Their language didn't win, so they should just give up, and they should all speak English, and I said, What an awful thing! They like their language! I'm not going to go to people who do Haskell, or Rust, or Scala, or F#, and say, you should give up! You're not going to win because C won, or Java won, or C# won. So instead, why don't we focus on standards where we can inter-operate, where we can accept that the reality is a hybrid cloud things like Azure Arc that allows us to connect multiple clouds, multi-vendor clouds together. That is all encompassing the concept of inclusion, including everyone means including every language, and as many standards as you can. So it might sound a little bit like a Tower of Babel, but we do have standards and the standards are HTTP, REST, JSON, JavaScript. It may not be the web we deserve, but it's the web that we have, so we'll use those building block technologies, and then let people do their own thing. >> So speaking of the keynote this morning, one of the cool things you were doing was talking about the rock, paper, scissors game, and how it's expanding. Tell our viewers a little bit more about the new elements to rock, paper, scissors. >> So folks named Sam Kass, a gentleman named Sam Kass many, many years ago on the internet, when the internet was much simpler web pages, created a game called Rock, Paper, Scissors, Lizard, Spock, and a lot of people will know that from a popular TV show on CBS, and they'll give credit to that show, in fact it was Sam Kass and Karen Bryla who created that, and we sent them a note and said, "Hey can I write a game about this?" And we basically built a Rock, Paper, Scissors, Lizard, Spock game in the cloud containerized at scale with multiple languages, and then we also put it on a tiny device, and what's fun about the game from a complexity perspective is that rock, paper, scissors is easy. There's only three rules, right? Paper covers rock, which makes no sense, but when you have five, it's hard! Spock shoots the Rock with his phaser, and then the lizard poisons Spock, and the paper disproves, and it gets really hard and complicated, but it's also super fun and nerdy. So we went and created a containerized app where we had all different bots, we had node, Python, Java, C#, and PHP, and then you can say, I'm going to pick Spock and .net, or node and paper, and have them fight, and then we added in some AI, and some machine learning, and some custom vision such that if you sign in with Twitter in this game, it will learn your patterns, and try to defeat you using your patterns and then, clicking on your choices and fun, snd then, clicking on your choices and fun, because we all want to go, "Rock, Paper, Scissors shoot!" So we made a custom vision model that would go, and detect your hand or whatever that is saying, this is Spock and then it would select it and play the game. So it was just great fun, and it was a lot more fun than a lot of the corporate demos that you see these days. >> All right Scott, you're doing a lot of different things at the show here. We said there's just a barrage of different announcements that were made. Love if you could share some of the things that might have flown under the radar. You know, Arc, everyone's talking about, but some cool things or things that you're geeking out on that you'd want to share with others? >> Two of the things that I'm most excited, one is an announcement that's specific to Ignite, and one's a community thing, the announcement is that .net Core 3.1 is coming. .net Core 3 has been a long time coming as we have began to mature, and create a cross platform open source .net runtime, but .net Core 3.1 LTS Long Term Support means that that's a version of .net core that you can put on a system for three years and be supported. Because a lot of people are saying, "All this open source is moving so fast! "I just upgraded to this, "and I don't want to upgrade to that". LTS releases are going to happen every November in the odd numbered years. So that means 2019, 2021, 2023, there's going to be a version of .net you can count on for three years, and then if you want to follow that train, the safe train, you can do that. In the even numbered years we're going to come out with a version of .net that will push the envelope, maybe introduce a new version of C#, it'll do something interesting and new, then we tighten the screws and then the following year that becomes a long term support version of .net. >> A question for you on that. One of the challenges I hear from customers is, when you talk about hybrid cloud, they're starting to get pulled apart a little bit, because in the public cloud, if I'm running Azure, I'm always on the latest version, but in my data center, often as you said, I want longer term support, I'm not ready to be able to take that CICD push all of the time, so it feels like I live, maybe call it bimodal if you want, but I'm being pulled with the am I always on the latest, getting the latest security, and it's all tested by them? Or am I on my own there? How do you help customers with that, when Microsoft's developing things, how do you live in both of those worlds or pull them together? >> Well, we're really just working on this idea of side-by-side, whether it be different versions of Visual Studio that are side-by-side, the stable one that your company is paying for, and then the preview version that you can go have side-by-side, or whether you could have .net Core 3, 3.1, or the next version, a preview version, and a safe version side-by-side. We want to enable people to experiment without fear of us messing up their machine, which is really, really important. >> One of the other things you were talking about is a cool community announcement. Can you tell us a little bit more about that? >> So this is a really cool product from a very, very small company out of Oregon, from a company called Wilderness Labs, and Wilderness Labs makes a micro-controller, not a micro-processor, not a raspberry pie, it doesn't run Linux, what it runs is .net, so we're actually playing Rock, Paper, Scissors, Lizard, Spock on this device. We've wired it all up, this is a screen from our friends at Adafruit, and I can write .net, so somehow if someone is working at, I don't know, the IT department at Little Debbie Snack Cakes, and they're making WinForms applications, they're suddenly now an IOT developer, 'cause they can go and write C# code, and control a device like this. And when you have a micro-controller, this will run for weeks on a battery, not hours. You go and 3D print a case, make this really tiny, it could become a sensor, it could become an IOT device, or one of thousands of devices that could check crops, check humidity, moisture wetness, whatever you want, and we're going to enable all kinds of things. This is just a commodity device here, this screen, it's not special. The actual device, this is the development version, size of my finger, it could be even smaller if we wanted to make it that way, and these are our friends at Wilderness Labs. and they had a successful Kickstarter, and I just wanted to give them a shout out, and I just wanted to give them a shoutout, I don't have any relationship with them, I just think they're great. >> Very cool, very cool. So you are a busy guy, and as Stu said, you're in a lot of different things within Microsoft, and yet you still have time to teach at community college. I'm interested in your perspective of why you do that? Why do you think it's so important to democratize learning about how to do this stuff? >> I am very fortunate and I think that we people, who have achieved some amount of success in our space, need to recognize that luck played a factor in that. That privilege played a factor in that. But, why can't we be the luck for somebody else, the luck can be as simple as a warm introduction. I believe very strongly in what I call the transitive value of friendship, so if we're friends, and you're friends, then the hypotenuse can be friends as well. A warm intro, a LinkedIn, a note that like, "Hey, I met this person, you should talk to them!" Non-transactional networking is really important. So I can go to a community college, and talk to a person that maybe wanted to quit, and give a speech and give them, I don't know, a week, three months, six months, more whatever, chutzpah, moxie, something that will keep them to finish their degree and then succeed, then I'm going to put good karma out into the world. >> Paying it forward. >> Exactly. >> So Scott, you mentioned that when people ask for advice, it's not about what language they do, is to, you know, is to,q you know, we talk in general about intellectual curiosity of course is good, being part of a community is a great way to participate, and Microsoft has a phenomenal one, any other tips you'd give for our listeners out there today? >> The fundamentals will never go out of style, and rather than thinking about learning how to code, why not think about learning how to think, and learning about systems thinking. One of my friends, Kishau Rogers, talked about systems thinking, I've hade her on my podcast a number of times, and we were giving a presentation at Black Girls Code, and I was talking to a fifteen-year-old young woman, and we were giving a presentation. It was clear that her mom wanted her to be there, and she's like, "Why are we here?" And I said, "All right, let's talk about programming "everybody, we're talking about programming. "My toaster is broken and the toast is not working. "What do you think is wrong?" Big, long, awkward pause and someone says, "Well is the power on?" I was like, "Well, I plugged a light in, "and nothing came on" and they were like, "Well is the fuse blown?" and then one little girl said "Well did the neighbors have power?", And I said, "You're debugging, we are debugging right?" This is the thing, you're a systems thinker, I don't know what's going on with the computer when my dad calls, I'm just figuring it out like, "Oh, I'm so happy, you work for Microsoft, "you're able to figure it out." >> Rebecca: He has his own IT guy now in you! >> Yeah, I don't know, I unplug the router, right? But that ability to think about things in the context of a larger system. I want toast, power is out in the neighborhood, drawing that line, that makes you a programmer, the language is secondary. >> Finally, the YouTube videos. Tell our viewers a little bit about those. you can go to D-O-T.net, so dot.net, the word dot, you can go to d-o-t.net, so dot.net, the word dot, slash videos and we went, and we made a 100 YouTube videos on everything from C# 101, .net, all the way up to database access, and putting things in the cloud. A very gentle, "Mr. Rodgers' Neighborhood" on-ramp. A lot of things, if you've ever seen that cartoon that says, "Want to draw an owl? "Well draw two circles, "and then draw the rest of the fricking owl." A lot of tutorials feel like that, and we don't want to do that, you know. We've got to have an on-ramp before we get on the freeway. So we've made those at dot.net/videos. >> Excellent, well that's a great plug! Thank you so much for coming on the show, Scott. >> Absolutely my pleasure! >> I'm Rebecca Knight, for Stu Miniman., stay tuned for more of theCUBE's live coverage of Microsoft Ignite. (upbeat music)

Published Date : Nov 5 2019

SUMMARY :

Covering Microsoft Ignite, brought to you by Cohesity. he is the partner program manager at Microsoft. Rebecca: And happy taco Tuesday to you! and you yourself were on the main stage this morning, and if the site isn't accessible, and the square, we were like, "Oh my God, that was amazing!" and there needs to be, it's almost, you know, and as many standards as you can. one of the cool things you were doing was talking about and then you can say, I'm going to pick Spock and Love if you could share some of the things and then if you want to follow that train, the safe train, but in my data center, often as you said, that you can go have side-by-side, One of the other things you were talking about and I just wanted to give them a shout out, and yet you still have time to teach at community college. and talk to a person that maybe wanted to quit, and we were giving a presentation at Black Girls Code, drawing that line, that makes you a programmer, and we don't want to do that, you know. Thank you so much for coming on the show, Scott. of Microsoft Ignite.

ENTITIES

Entity	Category	Confidence
Rebecca	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Scott Hanselman	PERSON	0.99+
Scott	PERSON	0.99+
Karen Bryla	PERSON	0.99+
Satya Nadella	PERSON	0.99+
Stu Miniman	PERSON	0.99+
Wilderness Labs	ORGANIZATION	0.99+
Oregon	LOCATION	0.99+
Microsoft	ORGANIZATION	0.99+
San Diego	LOCATION	0.99+
five	QUANTITY	0.99+
six months	QUANTITY	0.99+
three years	QUANTITY	0.99+
Excel	TITLE	0.99+
Kishau Rogers	PERSON	0.99+
Sam Kass	PERSON	0.99+
2019	DATE	0.99+
Visual Studio	TITLE	0.99+
three months	QUANTITY	0.99+
Java	TITLE	0.99+
Orlando, Florida	LOCATION	0.99+
Two	QUANTITY	0.99+
2023	DATE	0.99+
Python	TITLE	0.99+
Linux	TITLE	0.99+
2021	DATE	0.99+
five years	QUANTITY	0.99+
100	QUANTITY	0.99+
CBS	ORGANIZATION	0.99+
Little Debbie Snack Cakes	ORGANIZATION	0.99+
Satya	PERSON	0.99+
three rules	QUANTITY	0.99+
PHP	TITLE	0.99+
Adafruit	ORGANIZATION	0.99+
LinkedIn	ORGANIZATION	0.99+
English	OTHER	0.99+
yesterday	DATE	0.99+
Rodgers'	PERSON	0.99+
tomorrow	DATE	0.99+
eight-year	QUANTITY	0.99+
Scala	TITLE	0.98+
One	QUANTITY	0.98+
Stu	PERSON	0.98+
Rust	TITLE	0.98+
C#	TITLE	0.98+
Twitter	ORGANIZATION	0.98+
YouTube	ORGANIZATION	0.98+
both	QUANTITY	0.98+
node	TITLE	0.98+
Chipotle	ORGANIZATION	0.98+
Haskell	TITLE	0.97+
Azure	TITLE	0.97+
Tower of Babel	TITLE	0.97+
one	QUANTITY	0.97+
Power BI	TITLE	0.97+
SatyaSacha	PERSON	0.97+
Azure Arc	TITLE	0.97+
Spock	PERSON	0.97+
today	DATE	0.96+
.net	OTHER	0.96+
C won	TITLE	0.95+
3.1	TITLE	0.94+
a week	QUANTITY	0.94+

Jim HPE DCE 3 Segment

thanks Peter I'm here with James kabila Sawicki bonds lead analyst for data science Jim been hearing a lot about data science and how machine learning is coming into this environment give it give us a little bit of a guidance as to how that this this whole space fits into data science you know how does that that infrastructure fit in with data science today yeah well stew data science is a set of practices for building and training statistical models often known as machine learning models to be deployed into applications to do things like predictive analysis automating next best offers and marketing and so forth so what machine learning is all about is the statistical model and those are built by a category of professionals known as data scientists but data scientists operated in teams there are data engineers who manage your data lake there are data modelers who build the models themselves there are there are professionals who specialize in training the models and deploying them trainings like Quality Assurance so what it's all about is really these part these functions are increasingly being combined into workflows they have to conform with DevOps practices because this is an important set of application development capabilities that are absolutely essential to deploy machine learning into AI in AI is really the secret sauce of so many apps nowadays all right Jim is we've looked at data center Ops walk us through the tech the process and the people okay data center else really is data science ops or often well wiki bomb we've referred to as DevOps for data science and really what we start with the the start with the people I've already been to sketch those up so in terms of the people the professionals involved in building and training and deploying and evaluating and iterating machine learning models there are the data scientists who are this justjust iskele modelers you might call them the algorithm jockeys though that may be regarded as a pejorative but nonetheless these are the high-powered professionals who who build who know which algorithm is correct for what the challenge they build the models on there are the data engineers who not only manage your data lakes the data lakes is where the training data is maintained the data for building the model and for training the models are maintained in data lakes the data engineers manage that they also manage data preparation data transformation data cleansing to get the data clean and correct so that it can be used to build high quality models there are other functions that are absolutely essential there are as what some call ml architour machine learning architects I like to think of them as subject-matter experts who work with the data scientist to build what are called the feature sets the predictors that need to be built into machine learning models for those models to do therefore perform their function correctly whether it be a prediction or like face recognition or natural language processing for your ear your chat BOTS and so forth you need the subject matter experts with you to provide guidance to the data scientists as to what variables to build into these models there was also coders there's a lot of coding that's done in data science and ml ops that's done in Python and Java and Scala and a variety of other languages and there's other functions as well but these are the core functions that need to be performed in a team environment really in a workflow and that is where the process comes in the workflow for data science in teams is DevOps it's really the continuous integration of different data sets as well as different models as well as different features into the building and training of AI so these need to be pretty nice functions need to be performed in a workflow that's highly structured where there's checkpoints and there's governance and there's a transparency and auditability so it really all this needs to be performed in a DevOps environment where you have the data lake which is the source of the the data of course we also have a source repository for managing the current and past versions of the models themselves where you also do governance on the code builds that are with each of the models that are deployed into your application environment so that's the process site at all and then the platform our tech side is really revolves around with some colleague data science workbench or a data science platform there's a variety of terms for it but essentially it is a development environment that enables a high degree of automation and all across all these functions because automation is absolutely essential for speed and consistency in terms of how models are built and entrained there's also a need for our collaboration capability strong ones within these platforms so these different human roles can work together in a cohesive fashion and really like a well-oiled machine screaming there's a need for repositories - like I said managed in govern the current versions of all the artifacts be they data be they models be they code bills and so forth that are essential so all of these people processes and Texas and there is building high-quality AI yeah so Jim I noticed you call it DevOps for data science so yes there's a real emphasis there on how we get all of these new things aligned with the process for DevOps and maybe help us put a point on you know why that's so important well because DevOps is how applications are built and deployed now everywhere which is essentially it so it's a workflow it's a standard workflow that involves a scaleable organization where you have code that is built and managed and governed according to a standard workflow standard repositories with checkpoints and transparency as a way of consensual e ensuring that high quality code is deployed into working applications according to essentially a factory-style automation or an industrialized workflow so data science is a development discipline data science needs to as a as a workflow needs to conform with the established DevOps practices that your application developers your coders have already established in fact most AI applications most machine learning applications involve code involved machine learning models but also involve containers and kubernetes and increasingly serverless interfaces and so forth so data science is not separated from the other aspects of the DevOps workflow itting and christie is a unified and integrated piece of your operations and they needs to be managed as such all right well Jim appreciate you going through the evolution on that I know you've written quite a bit about this topic on the wiki bond website and Peter will send it back to you

Published Date : Sep 6 2019

**Summary and Sentiment Analysis are not been shown because of improper transcript**

ENTITIES

Entity	Category	Confidence
Jim	PERSON	0.99+
Peter	PERSON	0.99+
Java	TITLE	0.99+
Python	TITLE	0.99+
Scala	TITLE	0.99+
DevOps	TITLE	0.99+
Texas	LOCATION	0.98+
today	DATE	0.96+
James kabila Sawicki	PERSON	0.96+
each	QUANTITY	0.91+
christie	PERSON	0.91+
apps	QUANTITY	0.66+
wiki bond	ORGANIZATION	0.66+
Ops	ORGANIZATION	0.55+

Nanda Vijaydev, HPE (BlueData) | CUBE Conversation, September 2019

from our studios in the heart of Silicon Valley Palo Alto California this is a cute conversation hi and welcome to the cube Studios for another cube conversation where we go in-depth with thought leaders driving innovation across the tech industry I'm your host Peter Burris AI is on the forefront of every board in every enterprise on a global basis as well as machine learning deep learning and other advanced technologies that are intended to turn data into business action that differentiates the business leads to more revenue leads to more profitability but the challenge is is that all of these new use cases are not able to be addressed with the traditional ways that we've set up the workflows that we've set up to address them so as a consequence we're going to need greater opera's the operationalization of how we translate business problems into ml and related technology solutions big challenge we've got a great guest today to talk about it non-division diof is a distinguished technologist and lead data scientists at HPE in the blue data team nonde welcome to the cube thank you happy to be here so ananda let's start with this notion of a need for an architected approach to how we think about matching AI ml technology to operations so that we get more certain results better outcomes more understanding of where we're going and how the technology is working within the business absolutely yeah ai and doing AI in an enterprise is not new there have been enterprise-grade tools in the space before but most of them have a very prescribed way of doing things sometimes you use custom sequel to use that particular tool or the way you present data to that tool requires some level of pre-processing which makes you copy the data into the tool so you have already data fidelity maybe at risk and you have a data duplication happening and then the scale right when you talk about doing AI at the scale that is required now considering data is so big and there is a variety of data sets for the scale it can probably be done but there is a huge cost associated with that and you may still not meet the variety of use cases that you want to actually work on so the problem now is to make sure that you empower your users who are working in the space and augment them with the right set of technologies and the ability to bring data in a timely manner for them to work on these solutions so it sounds as though what we're trying to do is simplify the process of taking great ideas and turn it into great outcomes but you mentioned users I think it's got to start with or let me ask you if we have to start here that we've always thought about how is going to center in the data science or the data scientist as these solutions have start to become more popularized if diffused across the industry a lot more people are engaging are all roles being served as well as you need to be absolutely I think that's the biggest challenge right in the past you know when we talk about very prescribed solutions end to end was happening within those tools so the different user persona were probably part of that particular solution and also the way these models came into production which is really making it available for a consumer is read coding or redeveloping this in technologies that were production friendly which is you're rewriting that and sequel you're recording that and C so there is a lot of details that are lost in translation and the third big problem was really having visibility or having a say from a developer's point of view or a data scientist point of view in how these things are performing in production that how do you actually take it back take that feedback back into deciding you know is this model still good or how do you retrain so when you look at this lifecycle holistically this is an iterative process it is no longer you know workflow where you hand things off this is not a water flow methodology anymore this is a very very continuous and iterative process especially in the New Age data science the tools that are developing where you build the model that developer decides what the run time is and the run times are capable of serving those models as is you don't have to recode you don't have to lose things during translation so with this back to your question of how do you serve two different roles now all those personas and all those roles have to be part of the same project and they have to be part of the same experiment they're just serving different parts of the lifecycle and now you've whatever tooling you provide or whatever architecture technologies you provide have to look at it holistically there has to be continuous development there has to be collaboration there has to be central repositories that actually cater to those needs so each so the architected approach needs to be able to serve each of the roles but in a way that is collaborative and is ultimately put in service to the outcome and driving the use of the technology forward well that leads to another question should it should the should this architected approach be tied to one or another set of algorithms or one or another set of implementation infrastructure or does it have to be able to serve a wide array of Technology types yeah great question right this is a living ecosystem we can no longer build for you know you plant something for the next two years or the next three years technologies are coming every day and the reason is because the types of use cases are evolving and what you need to solve that use case is completely different when you look at two different use cases so whatever standards you come up with you know the consistency has to be across how a user is on-boarded into the system a consistency has to be about data access about security about how does one provision these environments but as far as what tool is used or how is that tool being applied to a specific problem there's a lot of variability in there and it has to cater your architecture has to make sure that this variability is addressed and it is growing so HPE spends a lot of time with customers and you're learning from your customer successes and how you turn that into tooling that leads to this type of operator operationalization but give us some visibility into some of those successes that really stand out for you that have been essential to how HP has participated in this journey to create better tools for better AI and m/l absolutely you know traditionally with blue data HPE now you know we've been exposed to a lot of big data processing technologies where the current landscape the data is different data is not always at rest data is not structured you know data is coming it could be a stream of data it could be a picture and in the use cases like we talked about you know it could be image recognition or a voice recognition where the type of data is very different right so back to how we've learnt from our customers like in my role I talked to you know tens of customers on a daily or weekly basis and each one of them are at a different level of maturity in their life cycle and these are some very established customers but you know the various groups that are adopting this new age technologies even within an organization there is a lot of variability so whatever we offered them we have to help support all of that particular user groups there are some who are coming from the classic or language background there are some that are coming from Python background some are doing things in Scala someone doing things in SPARC and there are some commercial tools that they're using like h2o driverless AI or data iku so what we have to look at is in this life cycle we have to make sure that all these communities are represented and/or addressed and if they build a model in a specific technology how do we consume that how do we take it in then how do we deploy that from an end to point of view it doesn't matter where a model gets built it does matter how end-users access it it doesn't matter how security is applied to it it does matter how scaling is applied to it so really there is a lot of consistency is required in the operationalization and also in how you onboard those different tools how do you make sure that consistency or methodology or standard practices are applied in this entire lifecycle and also monitoring that's a huge aspect right when you have deployed a model and it's in production monitoring means two different things to people where is it even available you know when you go to a website when you click on something is a website available very similarly when you go to an endpoint or you're scoring against a model is that model available do you have enough resources can it scale depending on how much requests come in that's one aspect of monitoring and the second aspect is really how was the model performing you know is that what is the accuracy what is the drift when is it time to retrain so you no longer have the luxury to look at these things in isolation right so it we want to make sure that all these things can be addressed in a manner knowing that this iteration sometimes can be a month sometimes it can be a day sometimes it's probably a few hours and that is why it can no longer be an isolated and even infrastructure point of view some of these workloads may need things like GPU and you may need it for a very short amount of time let how do you make sure that you give what is needed for that duration that is required and take it back and assign it to something else because these are very valuable resources so I want to build on if I may on that notion of onboarding the tools we're talking about use cases that enterprises are using today to create business value we're talking about HPE as an example delivering tooling that operationalize is how that's done today but the reality is we're gonna see the state of the art still evolve pretty dramatically over the next few years how is HPE going about ensuring that your approach and the approach you working with your customers does not get balkanized does not get you know sclerotic that it's capable of evolving and changing as folks learn new approaches to doing things absolutely you know it this has to start with having an open architecture you know you have to there has to be standards without which enterprises can't run but at the same time those standards shouldn't be so constricting that it doesn't allow you to expand into newer use cases right so what HP EML ops offers is really making sure that you can do what you do today in a best-practice manner or in the most efficient manner bringing time to value you know making sure that there is you know instant provisioning or access to beta or making sure that you don't duplicate data compute storage separation containerization you know these are some of the standard best practice technologies that are out there making sure that you adopt those and what these sets users for is to make sure that they can evolve with the later use cases you can never have you know you can never have things you know frozen in time you just want to make sure that you can evolve and this is what it sets them up for and you evolve with different use cases and different tools as they come along nada thanks very much has been a very it's been a great conversation we appreciate you being on the cube thank you Peter so my guest has been non Division I of the distinguished technologists and lead data scientists at HPE blue data and for all of you thanks for joining us again for another cube conversation on Peter burst see you next time you [Music]

Published Date : Sep 5 2019

**Summary and Sentiment Analysis are not been shown because of improper transcript**

ENTITIES

Entity	Category	Confidence
September 2019	DATE	0.99+
Nanda Vijaydev	PERSON	0.99+
Scala	TITLE	0.99+
Python	TITLE	0.99+
second aspect	QUANTITY	0.99+
tens of customers	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
HPE	ORGANIZATION	0.99+
Peter Burris	PERSON	0.99+
Peter	PERSON	0.98+
HP	ORGANIZATION	0.98+
BlueData	ORGANIZATION	0.97+
each	QUANTITY	0.97+
two different use cases	QUANTITY	0.97+
a day	QUANTITY	0.97+
third big problem	QUANTITY	0.97+
a month	QUANTITY	0.96+
two different things	QUANTITY	0.96+
each one	QUANTITY	0.95+
two different roles	QUANTITY	0.94+
one	QUANTITY	0.94+
today	DATE	0.92+
Palo Alto California	LOCATION	0.92+
SPARC	TITLE	0.91+
lot of details	QUANTITY	0.87+
New Age	DATE	0.83+
blue data	ORGANIZATION	0.83+
lot	QUANTITY	0.81+
ananda	PERSON	0.76+
h2o	TITLE	0.76+
lot more	QUANTITY	0.74+
a few hours	QUANTITY	0.72+
few years	DATE	0.7+
next two years	DATE	0.69+
daily	QUANTITY	0.65+
years	QUANTITY	0.62+
weekly	QUANTITY	0.62+
next three	DATE	0.61+
HPE	TITLE	0.59+
time	QUANTITY	0.55+
Division I	QUANTITY	0.54+

Rob Thomas, IBM | IBM Innovation Day 2018

(digital music) >> From Yorktown Heights, New York It's theCUBE! Covering IBM Cloud Innovation Day. Brought to you by IBM. >> Hi, it's Wikibon's Peter Burris again. We're broadcasting on The Cube from IBM Innovation Day at the Thomas J Watson Research Laboratory in Yorktown Heights, New York. Have a number of great conversations, and we got a great one right now. Rob Thomas, who's the General Manager of IBM Analytics, welcome back to theCUBE. >> Thanks Peter, great to see you. Thanks for coming out here to the woods. >> Oh, well it's not that bad. I actually live not to far from here. Interesting Rob, I was driving up the Taconic Parkway and I realized I hadn't been on it in 40 years, so. >> Is that right? (laugh) >> Very exciting. So Rob let's talk IBM analytics and some of the changes that are taking place. Specifically, how are customers thinking about achieving their AI outcomes. What's that ladder look like? >> Yeah. We call it the AI ladder. Which is basically all the steps that a client has to take to get to get to an AI future, is the best way I would describe it. From how you collect data, to how you organize your data. How you analyze your data, start to put machine learning into motion. How you infuse your data, meaning you can take any insights, infuse it into other applications. Those are the basic building blocks of this laddered AI. 81 percent of clients that start to do something with AI, they realize their first issue is a data issue. They can't find the data, they don't have the data. The AI ladder's about taking care of the data problem so you can focus on where the value is, the AI pieces. >> So, AI is a pretty broad, hairy topic today. What are customers learning about AI? What kind of experience are they gaining? How is it sharpening their thoughts and their pencils, as they think about what kind of outcomes they want to achieve? >> You know, its... For some reason, it's a bit of a mystical topic, but to me AI is actually quite simple. I'd like to say AI is not magic. Some people think it's a magical black box. You just, you know, put a few inputs in, you sit around and magic happens. It's not that, it's real work, it's real computer science. It's about how do I put, you know, how do I build models? Put models into production? Most models, when they go into production, are not that good, so how do I continually train and retrain those models? Then the AI aspect is about how do I bring human features to that? How do I integrate that with natural language, or with speech recognition, or with image recognition. So, when you get under the covers, it's actually not that mystical. It's about basic building blocks that help you start to achieve business outcomes. >> It's got to be very practical, otherwise the business has a hard time ultimately adopting it, but you mentioned a number of different... I especially like the 'add the human features' to it of the natural language. It also suggests that the skill set of AI starts to evolve as companies mature up this ladder. How is that starting to change? >> That's still one of the biggest gaps, I would say. Skill sets around the modern languages of data science that lead to AI: Python, AR, Scala, as an example of a few. That's still a bit of a gap. Our focus has been how do we make tools that anybody can use. So if you've grown up doing SPSS or SaaS, something like that, how do you adopt those skills for the open world of data science? That can make a big difference. On the human features point, we've actually built applications to try to make that piece easy. Great example is with Royal Bank of Scotland where we've created a solution called Watson Assistant which is basically how do we arm their call center representatives to be much more intelligent and engaging with clients, predicting what clients may do. Those types of applications package up the human features and the components I talked about, makes it really easy to get AI into production. >> Now many years ago, the genius Turing, noted the notion of the Turing machine where you couldn't tell the difference between the human and a machine from an engagement standpoint. We're actually starting to see that happen in some important ways. You mentioned the call center. >> Yep. >> How are technologies and agency coming together? By that I mean, the rate at which businesses are actually applying AI to act as an agent for them in front of customers? >> I think it's slow. What I encourage clients to do is, you have to do a massive number of experiments. So don't talk to me about the one or two AI projects you're doing, I'm thinking like hundreds. I was with a bank last week in Japan, and they're comment was in the last year they've done a hundred different AI projects. These are not one year long projects with hundreds of people. It's like, let's do a bunch of small experiments. You have to be comfortable that probably half of your experiments are going to fail, that's okay. The goal is how do you increase your win rate. Do you learn from the ones that work, and from the ones that don't work, so that you can apply those. This is all, to me at this stage, is about experimentation. Any enterprise right now, has to be thinking in terms of hundreds of experiments, not one, not two or 'Hey, should we do that project?' Think in terms of hundreds of experiments. You're going to learn a lot when you do that. >> But as you said earlier, AI is not magic and it's grounded in something, and it's increasingly obvious that it's grounded in analytics. So what is the relationship between AI analytics, and what types of analytics are capable of creating value independent of AI? >> So if you think about how I kind of decomposed AI, talked about human features, I talked about, it kind of starts with a model, you train the model. The model is only as good as the data that you feed it. So, that assumes that one, that your data's not locked into a bunch of different silos. It assumes that your data is actually governed. You have a data catalog or that type of capability. If you have those basics in place, once you have a single instantiation of your data, it becomes very easy to train models, and you can find that the more that you feed it, the better the model's going to get, the better your business outcomes are going to get. That's our whole strategy around IBM Cloud Private for Data. Basically, one environment, a console for all your data, build a model here, train it in all your data, no matter where it is, it's pretty powerful. >> Let me pick up on that where it is, 'cause it's becoming increasingly obvious, at least to us and our clients, that the world is not going to move all the data over to a central location. The data is going to be increasingly distributed closer to the sources, closer to where the action is. How does AI and that notion of increasing distributed data going to work together for clients. >> So we've just released what's called IBM Data Virtualization this month, and it is a leapfrog in terms of data virtualization technology. So the idea is leave your data where ever it is, it could be in a data center, it could be on a different data center, it could be on an automobile if you're an automobile manufacturer. We can federate data from anywhere, take advantage of processing power on the edge. So we're breaking down that problem. Which is, the initial analytics problem was before I do this I've got to bring all my data to one place. It's not a good use of money. It's a lot of time and it's a lot of money. So we're saying leave your data where it is, we will virtualize your data from wherever it may be. >> That's really cool. What was it called again? >> IBM Data Virtualization and it's part of IBM Cloud Private for Data. It's a feature in that. >> Excellent, so one last question Rob. February's coming up, IBM Think San Francisco thirty plus thousand people, what kind of conversations do you anticipate having with you customers, your partners, as they try to learn, experiment, take away actions that they can take to achieve their outcomes? >> I want to have this AI experimentation discussion. I will be encouraging every client, let's talk about hundreds of experiments not 5. Let's talk about what we can get started on now. Technology's incredibly cheap to get started and do something, and it's all about rate and pace, and trying a bunch of things. That's what I'm going to be encouraging. The clients that you're going to see on stage there are the ones that have adopted this mentality in the last year and they've got some great successes to show. >> Rob Thomas, general manager IBM Analytics, thanks again for being on theCUBE. >> Thanks Peter. >> Once again this is Peter Buriss of Wikibon, from IBM Innovation Day, Thomas J Watson Research Center. We'll be back in a moment. (techno beat)

Published Date : Dec 7 2018

SUMMARY :

Brought to you by IBM. at the Thomas J Watson Research Laboratory Thanks for coming out here to the woods. I actually live not to far from here. and some of the changes care of the data problem What kind of experience are they gaining? blocks that help you How is that starting to change? that lead to AI: Python, AR, notion of the Turing so that you can apply those. But as you said earlier, AI that the more that you feed it, that the world is not So the idea is leave your What was it called again? of IBM Cloud Private for Data. that they can take to going to see on stage there Rob Thomas, general Peter Buriss of Wikibon,

ENTITIES

Entity	Category	Confidence
Peter Buriss	PERSON	0.99+
Japan	LOCATION	0.99+
Rob Thomas	PERSON	0.99+
Peter	PERSON	0.99+
one	QUANTITY	0.99+
IBM	ORGANIZATION	0.99+
one year	QUANTITY	0.99+
Royal Bank of Scotland	ORGANIZATION	0.99+
Rob	PERSON	0.99+
81 percent	QUANTITY	0.99+
last week	DATE	0.99+
last year	DATE	0.99+
two	QUANTITY	0.99+
Peter Burris	PERSON	0.99+
February	DATE	0.99+
first issue	QUANTITY	0.99+
Yorktown Heights, New York	LOCATION	0.99+
IBM Innovation Day	EVENT	0.99+
IBM Analytics	ORGANIZATION	0.99+
hundreds	QUANTITY	0.99+
Wikibon	ORGANIZATION	0.98+
Python	TITLE	0.98+
Taconic Parkway	LOCATION	0.98+
40 years	QUANTITY	0.98+
Scala	TITLE	0.98+
thirty plus thousand people	QUANTITY	0.97+
IBM Cloud Innovation Day	EVENT	0.96+
hundreds of experiments	QUANTITY	0.96+
today	DATE	0.96+
Watson Assistant	TITLE	0.96+
one place	QUANTITY	0.94+
IBM Innovation Day 2018	EVENT	0.93+
Thomas J Watson Research Center	ORGANIZATION	0.93+
SPSS	TITLE	0.89+
this month	DATE	0.88+
one environment	QUANTITY	0.86+
San Francisco	LOCATION	0.8+
half of	QUANTITY	0.79+
hundreds of people	QUANTITY	0.78+
many years ago	DATE	0.77+
hundreds of experiments	QUANTITY	0.76+
single instantiation	QUANTITY	0.76+
hundred different AI projects	QUANTITY	0.76+
one last question	QUANTITY	0.73+
SaaS	TITLE	0.71+
Turing	ORGANIZATION	0.71+
AR	TITLE	0.7+
IBM Think	ORGANIZATION	0.69+
J Watson Research	ORGANIZATION	0.67+
Thomas	LOCATION	0.62+
The Cube	TITLE	0.58+
money	QUANTITY	0.58+
Virtualization	COMMERCIAL_ITEM	0.55+
Laboratory	LOCATION	0.54+
Turing	PERSON	0.51+
Cloud Private	COMMERCIAL_ITEM	0.49+
Private for	COMMERCIAL_ITEM	0.47+
Cloud	TITLE	0.3+

Rob Thomas, IBM | Change the Game: Winning With AI 2018

>> [Announcer] Live from Times Square in New York City, it's theCUBE covering IBM's Change the Game: Winning with AI, brought to you by IBM. >> Hello everybody, welcome to theCUBE's special presentation. We're covering IBM's announcements today around AI. IBM, as theCUBE does, runs of sessions and programs in conjunction with Strata, which is down at the Javits, and we're Rob Thomas, who's the General Manager of IBM Analytics. Long time Cube alum, Rob, great to see you. >> Dave, great to see you. >> So you guys got a lot going on today. We're here at the Westin Hotel, you've got an analyst event, you've got a partner meeting, you've got an event tonight, Change the game: winning with AI at Terminal 5, check that out, ibm.com/WinWithAI, go register there. But Rob, let's start with what you guys have going on, give us the run down. >> Yeah, it's a big week for us, and like many others, it's great when you have Strata, a lot of people in town. So, we've structured a week where, today, we're going to spend a lot of time with analysts and our business partners, talking about where we're going with data and AI. This evening, we've got a broadcast, it's called Winning with AI. What's unique about that broadcast is it's all clients. We've got clients on stage doing demonstrations, how they're using IBM technology to get to unique outcomes in their business. So I think it's going to be a pretty unique event, which should be a lot of fun. >> So this place, it looks like a cool event, a venue, Terminal 5, it's just up the street on the west side highway, probably a mile from the Javits Center, so definitely check that out. Alright, let's talk about, Rob, we've known each other for a long time, we've seen the early Hadoop days, you guys were very careful about diving in, you kind of let things settle and watched very carefully, and then came in at the right time. But we saw the evolution of so-called Big Data go from a phase of really reducing investments, cheaper data warehousing, and what that did is allowed people to collect a lot more data, and kind of get ready for this era that we're in now. But maybe you can give us your perspective on the phases, the waves that we've seen of data, and where we are today and where we're going. >> I kind of think of it as a maturity curve. So when I go talk to clients, I say, look, you need to be on a journey towards AI. I think probably nobody disagrees that they need something there, the question is, how do you get there? So you think about the steps, it's about, a lot of people started with, we're going to reduce the cost of our operations, we're going to use data to take out cost, that was kind of the Hadoop thrust, I would say. Then they moved to, well, now we need to see more about our data, we need higher performance data, BI data warehousing. So, everybody, I would say, has dabbled in those two area. The next leap forward is self-service analytics, so how do you actually empower everybody in your organization to use and access data? And the next step beyond that is, can I use AI to drive new business models, new levers of growth, for my business? So, I ask clients, pin yourself on this journey, most are, depends on the division or the part of the company, they're at different areas, but as I tell everybody, if you don't know where you are and you don't know where you want to go, you're just going to wind around, so I try to get them to pin down, where are you versus where do you want to go? >> So four phases, basically, the sort of cheap data store, the BI data warehouse modernization, self-service analytics, a big part of that is data science and data science collaboration, you guys have a lot of investments there, and then new business models with AI automation running on top. Where are we today? Would you say we're kind of in-between BI/DW modernization and on our way to self-service analytics, or what's your sense? >> I'd say most are right in the middle between BI data warehousing and self-service analytics. Self-service analytics is hard, because it requires you, sometimes to take a couple steps back, and look at your data. It's hard to provide self-service if you don't have a data catalog, if you don't have data security, if you haven't gone through the processes around data governance. So, sometimes you have to take one step back to go two steps forward, that's why I see a lot of people, I'd say, stuck in the middle right now. And the examples that you're going to see tonight as part of the broadcast are clients that have figured out how to break through that wall, and I think that's pretty illustrative of what's possible. >> Okay, so you're saying that, got to maybe take a step back and get the infrastructure right with, let's say a catalog, to give some basic things that they have to do, some x's and o's, you've got the Vince Lombardi played out here, and also, skillsets, I imagine, is a key part of that. So, that's what they've got to do to get prepared, and then, what's next? They start creating new business models, imagining this is where the cheap data officer comes in and it's an executive level, what are you seeing clients as part of digital transformation, what's the conversation like with customers? >> The biggest change, the great thing about the times we live in, is technology's become so accessible, you can do things very quickly. We created a team last year called Data Science Elite, and we've hired what we think are some of the best data scientists in the world. Their only job is to go work with clients and help them get to a first success with data science. So, we put a team in. Normally, one month, two months, normally a team of two or three people, our investment, and we say, let's go build a model, let's get to an outcome, and you can do this incredibly quickly now. I tell clients, I see somebody that says, we're going to spend six months evaluating and thinking about this, I was like, why would you spend six months thinking about this when you could actually do it in one month? So you just need to get over the edge and go try it. >> So we're going to learn more about the Data Science Elite team. We've got John Thomas coming on today, who is a distinguished engineer at IBM, and he's very much involved in that team, and I think we have a customer who's actually gone through that, so we're going to talk about what their experience was with the Data Science Elite team. Alright, you've got some hard news coming up, you've actually made some news earlier with Hortonworks and Red Hat, I want to talk about that, but you've also got some hard news today. Take us through that. >> Yeah, let's talk about all three. First, Monday we announced the expanded relationship with both Hortonworks and Red Hat. This goes back to one of the core beliefs I talked about, every enterprise is modernizing their data and application of states, I don't think there's any debate about that. We are big believers in Kubernetes and containers as the architecture to drive that modernization. The announcement on Monday was, we're working closer with Red Hat to take all of our data services as part of Cloud Private for Data, which are basically microservice for data, and we're running those on OpenShift, and we're starting to see great customer traction with that. And where does Hortonworks come in? Hadoop has been the outlier on moving to microservices containers, we're working with Hortonworks to help them make that move as well. So, it's really about the three of us getting together and helping clients with this modernization journey. >> So, just to remind people, you remember ODPI, folks? It was all this kerfuffle about, why do we even need this? Well, what's interesting to me about this triumvirate is, well, first of all, Red Hat and Hortonworks are hardcore opensource, IBM's always been a big supporter of open source. You three got together and you're proving now the productivity for customers of this relationship. You guys don't talk about this, but Hortonworks had to, when it's public call, that the relationship with IBM drove many, many seven-figure deals, which, obviously means that customers are getting value out of this, so it's great to see that come to fruition, and it wasn't just a Barney announcement a couple years ago, so congratulations on that. Now, there's this other news that you guys announced this morning, talk about that. >> Yeah, two other things. One is, we announced a relationship with Stack Overflow. 50 million developers go to Stack Overflow a month, it's an amazing environment for developers that are looking to do new things, and we're sponsoring a community around AI. Back to your point before, you said, is there a skills gap in enterprises, there absolutely is, I don't think that's a surprise. Data science, AI developers, not every company has the skills they need, so we're sponsoring a community to help drive the growth of skills in and around data science and AI. So things like Python, R, Scala, these are the languages of data science, and it's a great relationship with us and Stack Overflow to build a community to get things going on skills. >> Okay, and then there was one more. >> Last one's a product announcement. This is one of the most interesting product annoucements we've had in quite a while. Imagine this, you write a sequel query, and traditional approach is, I've got a server, I point it as that server, I get the data, it's pretty limited. We're announcing technology where I write a query, and it can find data anywhere in the world. I think of it as wide-area sequel. So it can find data on an automotive device, a telematics device, an IoT device, it could be a mobile device, we think of it as sequel the whole world. You write a query, you can find the data anywhere it is, and we take advantage of the processing power on the edge. The biggest problem with IoT is, it's been the old mantra of, go find the data, bring it all back to a centralized warehouse, that makes it impossible to do it real time. We're enabling real time because we can write a query once, find data anywhere, this is technology we've had in preview for the last year. We've been working with a lot of clients to prove out used cases to do it, we're integrating as the capability inside of IBM Cloud Private for Data. So if you buy IBM Cloud for Data, it's there. >> Interesting, so when you've been around as long as I have, long enough to see some of the pendulums swings, and it's clearly a pendulum swing back toward decentralization in the edge, but the key is, from what you just described, is you're sort of redefining the boundary, so I presume it's the edge, any Cloud, or on premises, where you can find that data, is that correct? >> Yeah, so it's multi-Cloud. I mean, look, every organization is going to be multi-Cloud, like 100%, that's going to happen, and that could be private, it could be multiple public Cloud providers, but the key point is, data on the edge is not just limited to what's in those Clouds. It could be anywhere that you're collecting data. And, we're enabling an architecture which performs incredibly well, because you take advantage of processing power on the edge, where you can get data anywhere that it sits. >> Okay, so, then, I'm setting up a Cloud, I'll call it a Cloud architecture, that encompasses the edge, where essentially, there are no boundaries, and you're bringing security. We talked about containers before, we've been talking about Kubernetes all week here at a Big Data show. And then of course, Cloud, and what's interesting, I think many of the Hadoop distral vendors kind of missed Cloud early on, and then now are sort of saying, oh wow, it's a hybrid world and we've got a part, you guys obviously made some moves, a couple billion dollar moves, to do some acquisitions and get hardcore into Cloud, so that becomes a critical component. You're not just limiting your scope to the IBM Cloud. You're recognizing that it's a multi-Cloud world, that' what customers want to do. Your comments. >> It's multi-Cloud, and it's not just the IBM Cloud, I think the most predominant Cloud that's emerging is every client's private Cloud. Every client I talk to is building out a containerized architecture. They need their own Cloud, and they need seamless connectivity to any public Cloud that they may be using. This is why you see such a premium being put on things like data ingestion, data curation. It's not popular, it's not exciting, people don't want to talk about it, but we're the biggest inhibitors, to this AI point, comes back to data curation, data ingestion, because if you're dealing with multiple Clouds, suddenly your data's in a bunch of different spots. >> Well, so you're basically, and we talked about this a lot on theCUBE, you're bringing the Cloud model to the data, wherever the data lives. Is that the right way to think about it? >> I think organizations have spoken, set aside what they say, look at their actions. Their actions say, we don't want to move all of our data to any particular Cloud, we'll move some of our data. We need to give them seamless connectivity so that they can leave their data where they want, we can bring Cloud-Native Architecture to their data, we could also help move their data to a Cloud-Native architecture if that's what they prefer. >> Well, it makes sense, because you've got physics, latency, you've got economics, moving all the data into a public Cloud is expensive and just doesn't make economic sense, and then you've got things like GDPR, which says, well, you have to keep the data, certain laws of the land, if you will, that say, you've got to keep the data in whatever it is, in Germany, or whatever country. So those sort of edicts dictate how you approach managing workloads and what you put where, right? Okay, what's going on with Watson? Give us the update there. >> I get a lot of questions, people trying to peel back the onion of what exactly is it? So, I want to make that super clear here. Watson is a few things, start at the bottom. You need a runtime for models that you've built. So we have a product called Watson Machine Learning, runs anywhere you want, that is the runtime for how you execute models that you've built. Anytime you have a runtime, you need somewhere where you can build models, you need a development environment. That is called Watson Studio. So, we had a product called Data Science Experience, we've evolved that into Watson Studio, connecting in some of those features. So we have Watson Studio, that's the development environment, Watson Machine Learning, that's the runtime. Now you move further up the stack. We have a set of APIs that bring in human features, vision, natural language processing, audio analytics, those types of things. You can integrate those as part of a model that you build. And then on top of that, we've got things like Watson Applications, we've got Watson for call centers, doing customer service and chatbots, and then we've got a lot of clients who've taken pieces of that stack and built their own AI solutions. They've taken some of the APIs, they've taken some of the design time, the studio, they've taken some of the Watson Machine Learning. So, it is really a stack of capabilities, and where we're driving the greatest productivity, this is in a lot of the examples you'll see tonight for clients, is clients that have bought into this idea of, I need a development environment, I need a runtime, where I can deploy models anywhere. We're getting a lot of momentum on that, and then that raises the question of, well, do I have expandability, do I have trust in transparency, and that's another thing that we're working on. >> Okay, so there's API oriented architecture, exposing all these services make it very easy for people to consume. Okay, so we've been talking all week at Cube NYC, is Big Data is in AI, is this old wine, new bottle? I mean, it's clear, Rob, from the conversation here, there's a lot of substantive innovation, and early adoption, anyway, of some of these innovations, but a lot of potential going forward. Last thoughts? >> What people have to realize is AI is not magic, it's still computer science. So it actually requires some hard work. You need to roll up your sleeves, you need to understand how I get from point A to point B, you need a development environment, you need a runtime. I want people to really think about this, it's not magic. I think for a while, people have gotten the impression that there's some magic button. There's not, but if you put in the time, and it's not a lot of time, you'll see the examples tonight, most of them have been done in one or two months, there's great business value in starting to leverage AI in your business. >> Awesome, alright, so if you're in this city or you're at Strata, go to ibm.com/WinWithAI, register for the event tonight. Rob, we'll see you there, thanks so much for coming back. >> Yeah, it's going to be fun, thanks Dave, great to see you. >> Alright, keep it right there everybody, we'll be back with our next guest right after this short break, you're watching theCUBE.

Published Date : Sep 18 2018

SUMMARY :

brought to you by IBM. Long time Cube alum, Rob, great to see you. But Rob, let's start with what you guys have going on, it's great when you have Strata, a lot of people in town. and kind of get ready for this era that we're in now. where you want to go, you're just going to wind around, and data science collaboration, you guys have It's hard to provide self-service if you don't have and it's an executive level, what are you seeing let's get to an outcome, and you can do this and I think we have a customer who's actually as the architecture to drive that modernization. So, just to remind people, you remember ODPI, folks? has the skills they need, so we're sponsoring a community and it can find data anywhere in the world. of processing power on the edge, where you can get data a couple billion dollar moves, to do some acquisitions This is why you see such a premium being put on things Is that the right way to think about it? to a Cloud-Native architecture if that's what they prefer. certain laws of the land, if you will, that say, for how you execute models that you've built. I mean, it's clear, Rob, from the conversation here, and it's not a lot of time, you'll see the examples tonight, Rob, we'll see you there, thanks so much for coming back. we'll be back with our next guest

ENTITIES

Entity	Category	Confidence
IBM	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
six months	QUANTITY	0.99+
Rob	PERSON	0.99+
Rob Thomas	PERSON	0.99+
John Thomas	PERSON	0.99+
two months	QUANTITY	0.99+
one month	QUANTITY	0.99+
Germany	LOCATION	0.99+
last year	DATE	0.99+
Red Hat	ORGANIZATION	0.99+
Monday	DATE	0.99+
one	QUANTITY	0.99+
100%	QUANTITY	0.99+
GDPR	TITLE	0.99+
three people	QUANTITY	0.99+
first	QUANTITY	0.99+
two	QUANTITY	0.99+
ibm.com/WinWithAI	OTHER	0.99+
Watson Studio	TITLE	0.99+
Python	TITLE	0.99+
Scala	TITLE	0.99+
First	QUANTITY	0.99+
Data Science Elite	ORGANIZATION	0.99+
both	QUANTITY	0.99+
Cube	ORGANIZATION	0.99+
one step	QUANTITY	0.99+
One	QUANTITY	0.99+
Times Square	LOCATION	0.99+
today	DATE	0.99+
Vince Lombardi	PERSON	0.98+
three	QUANTITY	0.98+
Stack Overflow	ORGANIZATION	0.98+
tonight	DATE	0.98+
Javits Center	LOCATION	0.98+
Barney	ORGANIZATION	0.98+
Terminal 5	LOCATION	0.98+
IBM Analytics	ORGANIZATION	0.98+
Watson	TITLE	0.97+
two steps	QUANTITY	0.97+
New York City	LOCATION	0.97+
Watson Applications	TITLE	0.97+
Cloud	TITLE	0.96+
This evening	DATE	0.95+
Watson Machine Learning	TITLE	0.94+
two area	QUANTITY	0.93+
seven-figure deals	QUANTITY	0.92+
Cube	PERSON	0.91+

Sreesha Rao, Niagara Bottling & Seth Dobrin, IBM | Change The Game: Winning With AI 2018

>> Live, from Times Square, in New York City, it's theCUBE covering IBM's Change the Game: Winning with AI. Brought to you by IBM. >> Welcome back to the Big Apple, everybody. I'm Dave Vellante, and you're watching theCUBE, the leader in live tech coverage, and we're here covering a special presentation of IBM's Change the Game: Winning with AI. IBM's got an analyst event going on here at the Westin today in the theater district. They've got 50-60 analysts here. They've got a partner summit going on, and then tonight, at Terminal 5 of the West Side Highway, they've got a customer event, a lot of customers there. We've talked earlier today about the hard news. Seth Dobern is here. He's the Chief Data Officer of IBM Analytics, and he's joined by Shreesha Rao who is the Senior Manager of IT Applications at California-based Niagara Bottling. Gentlemen, welcome to theCUBE. Thanks so much for coming on. >> Thank you, Dave. >> Well, thanks Dave for having us. >> Yes, always a pleasure Seth. We've known each other for a while now. I think we met in the snowstorm in Boston, sparked something a couple years ago. >> Yep. When we were both trapped there. >> Yep, and at that time, we spent a lot of time talking about your internal role as the Chief Data Officer, working closely with Inderpal Bhandari, and you guys are doing inside of IBM. I want to talk a little bit more about your other half which is working with clients and the Data Science Elite Team, and we'll get into what you're doing with Niagara Bottling, but let's start there, in terms of that side of your role, give us the update. >> Yeah, like you said, we spent a lot of time talking about how IBM is implementing the CTO role. While we were doing that internally, I spent quite a bit of time flying around the world, talking to our clients over the last 18 months since I joined IBM, and we found a consistent theme with all the clients, in that, they needed help learning how to implement data science, AI, machine learning, whatever you want to call it, in their enterprise. There's a fundamental difference between doing these things at a university or as part of a Kaggle competition than in an enterprise, so we felt really strongly that it was important for the future of IBM that all of our clients become successful at it because what we don't want to do is we don't want in two years for them to go "Oh my God, this whole data science thing was a scam. We haven't made any money from it." And it's not because the data science thing is a scam. It's because the way they're doing it is not conducive to business, and so we set up this team we call the Data Science Elite Team, and what this team does is we sit with clients around a specific use case for 30, 60, 90 days, it's really about 3 or 4 sprints, depending on the material, the client, and how long it takes, and we help them learn through this use case, how to use Python, R, Scala in our platform obviously, because we're here to make money too, to implement these projects in their enterprise. Now, because it's written in completely open-source, if they're not happy with what the product looks like, they can take their toys and go home afterwards. It's on us to prove the value as part of this, but there's a key point here. My team is not measured on sales. They're measured on adoption of AI in the enterprise, and so it creates a different behavior for them. So they're really about "Make the enterprise successful," right, not "Sell this software." >> Yeah, compensation drives behavior. >> Yeah, yeah. >> So, at this point, I ask, "Well, do you have any examples?" so Shreesha, let's turn to you. (laughing softly) Niagara Bottling -- >> As a matter of fact, Dave, we do. (laughing) >> Yeah, so you're not a bank with a trillion dollars in assets under management. Tell us about Niagara Bottling and your role. >> Well, Niagara Bottling is the biggest private label bottled water manufacturing company in the U.S. We make bottled water for Costcos, Walmarts, major national grocery retailers. These are our customers whom we service, and as with all large customers, they're demanding, and we provide bottled water at relatively low cost and high quality. >> Yeah, so I used to have a CIO consultancy. We worked with every CIO up and down the East Coast. I always observed, really got into a lot of organizations. I was always observed that it was really the heads of Application that drove AI because they were the glue between the business and IT, and that's really where you sit in the organization, right? >> Yes. My role is to support the business and business analytics as well as I support some of the distribution technologies and planning technologies at Niagara Bottling. >> So take us the through the project if you will. What were the drivers? What were the outcomes you envisioned? And we can kind of go through the case study. >> So the current project that we leveraged IBM's help was with a stretch wrapper project. Each pallet that we produce--- we produce obviously cases of bottled water. These are stacked into pallets and then shrink wrapped or stretch wrapped with a stretch wrapper, and this project is to be able to save money by trying to optimize the amount of stretch wrap that goes around a pallet. We need to be able to maintain the structural stability of the pallet while it's transported from the manufacturing location to our customer's location where it's unwrapped and then the cases are used. >> And over breakfast we were talking. You guys produce 2833 bottles of water per second. >> Wow. (everyone laughs) >> It's enormous. The manufacturing line is a high speed manufacturing line, and we have a lights-out policy where everything runs in an automated fashion with raw materials coming in from one end and the finished goods, pallets of water, going out. It's called pellets to pallets. Pellets of plastic coming in through one end and pallets of water going out through the other end. >> Are you sitting on top of an aquifer? Or are you guys using sort of some other techniques? >> Yes, in fact, we do bore wells and extract water from the aquifer. >> Okay, so the goal was to minimize the amount of material that you used but maintain its stability? Is that right? >> Yes, during transportation, yes. So if we use too much plastic, we're not optimally, I mean, we're wasting material, and cost goes up. We produce almost 16 million pallets of water every single year, so that's a lot of shrink wrap that goes around those, so what we can save in terms of maybe 15-20% of shrink wrap costs will amount to quite a bit. >> So, how does machine learning fit into all of this? >> So, machine learning is way to understand what kind of profile, if we can measure what is happening as we wrap the pallets, whether we are wrapping it too tight or by stretching it, that results in either a conservative way of wrapping the pallets or an aggressive way of wrapping the pallets. >> I.e. too much material, right? >> Too much material is conservative, and aggressive is too little material, and so we can achieve some savings if we were to alternate between the profiles. >> So, too little material means you lose product, right? >> Yes, and there's a risk of breakage, so essentially, while the pallet is being wrapped, if you are stretching it too much there's a breakage, and then it interrupts production, so we want to try and avoid that. We want a continuous production, at the same time, we want the pallet to be stable while saving material costs. >> Okay, so you're trying to find that ideal balance, and how much variability is in there? Is it a function of distance and how many touches it has? Maybe you can share with that. >> Yes, so each pallet takes about 16-18 wraps of the stretch wrapper going around it, and that's how much material is laid out. About 250 grams of plastic that goes on there. So we're trying to optimize the gram weight which is the amount of plastic that goes around each of the pallet. >> So it's about predicting how much plastic is enough without having breakage and disrupting your line. So they had labeled data that was, "if we stretch it this much, it breaks. If we don't stretch it this much, it doesn't break, but then it was about predicting what's good enough, avoiding both of those extremes, right? >> Yes. >> So it's a truly predictive and iterative model that we've built with them. >> And, you're obviously injecting data in terms of the trip to the store as well, right? You're taking that into consideration in the model, right? >> Yeah that's mainly to make sure that the pallets are stable during transportation. >> Right. >> And that is already determined how much containment force is required when your stretch and wrap each pallet. So that's one of the variables that is measured, but the inputs and outputs are-- the input is the amount of material that is being used in terms of gram weight. We are trying to minimize that. So that's what the whole machine learning exercise was. >> And the data comes from where? Is it observation, maybe instrumented? >> Yeah, the instruments. Our stretch-wrapper machines have an ignition platform, which is a Scada platform that allows us to measure all of these variables. We would be able to get machine variable information from those machines and then be able to hopefully, one day, automate that process, so the feedback loop that says "On this profile, we've not had any breaks. We can continue," or if there have been frequent breaks on a certain profile or machine setting, then we can change that dynamically as the product is moving through the manufacturing process. >> Yeah, so think of it as, it's kind of a traditional manufacturing production line optimization and prediction problem right? It's minimizing waste, right, while maximizing the output and then throughput of the production line. When you optimize a production line, the first step is to predict what's going to go wrong, and then the next step would be to include precision optimization to say "How do we maximize? Using the constraints that the predictive models give us, how do we maximize the output of the production line?" This is not a unique situation. It's a unique material that we haven't really worked with, but they had some really good data on this material, how it behaves, and that's key, as you know, Dave, and probable most of the people watching this know, labeled data is the hardest part of doing machine learning, and building those features from that labeled data, and they had some great data for us to start with. >> Okay, so you're collecting data at the edge essentially, then you're using that to feed the models, which is running, I don't know, where's it running, your data center? Your cloud? >> Yeah, in our data center, there's an instance of DSX Local. >> Okay. >> That we stood up. Most of the data is running through that. We build the models there. And then our goal is to be able to deploy to the edge where we can complete the loop in terms of the feedback that happens. >> And iterate. (Shreesha nods) >> And DSX Local, is Data Science Experience Local? >> Yes. >> Slash Watson Studio, so they're the same thing. >> Okay now, what role did IBM and the Data Science Elite Team play? You could take us through that. >> So, as we discussed earlier, adopting data science is not that easy. It requires subject matter, expertise. It requires understanding of data science itself, the tools and techniques, and IBM brought that as a part of the Data Science Elite Team. They brought both the tools and the expertise so that we could get on that journey towards AI. >> And it's not a "do the work for them." It's a "teach to fish," and so my team sat side by side with the Niagara Bottling team, and we walked them through the process, so it's not a consulting engagement in the traditional sense. It's how do we help them learn how to do it? So it's side by side with their team. Our team sat there and walked them through it. >> For how many weeks? >> We've had about two sprints already, and we're entering the third sprint. It's been about 30-45 days between sprints. >> And you have your own data science team. >> Yes. Our team is coming up to speed using this project. They've been trained but they needed help with people who have done this, been there, and have handled some of the challenges of modeling and data science. >> So it accelerates that time to --- >> Value. >> Outcome and value and is a knowledge transfer component -- >> Yes, absolutely. >> It's occurring now, and I guess it's ongoing, right? >> Yes. The engagement is unique in the sense that IBM's team came to our factory, understood what that process, the stretch-wrap process looks like so they had an understanding of the physical process and how it's modeled with the help of the variables and understand the data science modeling piece as well. Once they know both side of the equation, they can help put the physical problem and the digital equivalent together, and then be able to correlate why things are happening with the appropriate data that supports the behavior. >> Yeah and then the constraints of the one use case and up to 90 days, there's no charge for those two. Like I said, it's paramount that our clients like Niagara know how to do this successfully in their enterprise. >> It's a freebie? >> No, it's no charge. Free makes it sound too cheap. (everybody laughs) >> But it's part of obviously a broader arrangement with buying hardware and software, or whatever it is. >> Yeah, its a strategy for us to help make sure our clients are successful, and I want it to minimize the activation energy to do that, so there's no charge, and the only requirements from the client is it's a real use case, they at least match the resources I put on the ground, and they sit with us and do things like this and act as a reference and talk about the team and our offerings and their experiences. >> So you've got to have skin in the game obviously, an IBM customer. There's got to be some commitment for some kind of business relationship. How big was the collective team for each, if you will? >> So IBM had 2-3 data scientists. (Dave takes notes) Niagara matched that, 2-3 analysts. There were some working with the machines who were familiar with the machines and others who were more familiar with the data acquisition and data modeling. >> So each of these engagements, they cost us about $250,000 all in, so they're quite an investment we're making in our clients. >> I bet. I mean, 2-3 weeks over many, many weeks of super geeks time. So you're bringing in hardcore data scientists, math wizzes, stat wiz, data hackers, developer--- >> Data viz people, yeah, the whole stack. >> And the level of skills that Niagara has? >> We've got actual employees who are responsible for production, our manufacturing analysts who help aid in troubleshooting problems. If there are breakages, they go analyze why that's happening. Now they have data to tell them what to do about it, and that's the whole journey that we are in, in trying to quantify with the help of data, and be able to connect our systems with data, systems and models that help us analyze what happened and why it happened and what to do before it happens. >> Your team must love this because they're sort of elevating their skills. They're working with rock star data scientists. >> Yes. >> And we've talked about this before. A point that was made here is that it's really important in these projects to have people acting as product owners if you will, subject matter experts, that are on the front line, that do this everyday, not just for the subject matter expertise. I'm sure there's executives that understand it, but when you're done with the model, bringing it to the floor, and talking to their peers about it, there's no better way to drive this cultural change of adopting these things and having one of your peers that you respect talk about it instead of some guy or lady sitting up in the ivory tower saying "thou shalt." >> Now you don't know the outcome yet. It's still early days, but you've got a model built that you've got confidence in, and then you can iterate that model. What's your expectation for the outcome? >> We're hoping that preliminary results help us get up the learning curve of data science and how to leverage data to be able to make decisions. So that's our idea. There are obviously optimal settings that we can use, but it's going to be a trial and error process. And through that, as we collect data, we can understand what settings are optimal and what should we be using in each of the plants. And if the plants decide, hey they have a subjective preference for one profile versus another with the data we are capturing we can measure when they deviated from what we specified. We have a lot of learning coming from the approach that we're taking. You can't control things if you don't measure it first. >> Well, your objectives are to transcend this one project and to do the same thing across. >> And to do the same thing across, yes. >> Essentially pay for it, with a quick return. That's the way to do things these days, right? >> Yes. >> You've got more narrow, small projects that'll give you a quick hit, and then leverage that expertise across the organization to drive more value. >> Yes. >> Love it. What a great story, guys. Thanks so much for coming to theCUBE and sharing. >> Thank you. >> Congratulations. You must be really excited. >> No. It's a fun project. I appreciate it. >> Thanks for having us, Dave. I appreciate it. >> Pleasure, Seth. Always great talking to you, and keep it right there everybody. You're watching theCUBE. We're live from New York City here at the Westin Hotel. cubenyc #cubenyc Check out the ibm.com/winwithai Change the Game: Winning with AI Tonight. We'll be right back after a short break. (minimal upbeat music)

Published Date : Sep 13 2018

SUMMARY :

Brought to you by IBM. at Terminal 5 of the West Side Highway, I think we met in the snowstorm in Boston, sparked something When we were both trapped there. Yep, and at that time, we spent a lot of time and we found a consistent theme with all the clients, So, at this point, I ask, "Well, do you have As a matter of fact, Dave, we do. Yeah, so you're not a bank with a trillion dollars Well, Niagara Bottling is the biggest private label and that's really where you sit in the organization, right? and business analytics as well as I support some of the And we can kind of go through the case study. So the current project that we leveraged IBM's help was And over breakfast we were talking. (everyone laughs) It's called pellets to pallets. Yes, in fact, we do bore wells and So if we use too much plastic, we're not optimally, as we wrap the pallets, whether we are wrapping it too little material, and so we can achieve some savings so we want to try and avoid that. and how much variability is in there? goes around each of the pallet. So they had labeled data that was, "if we stretch it this that we've built with them. Yeah that's mainly to make sure that the pallets So that's one of the variables that is measured, one day, automate that process, so the feedback loop the predictive models give us, how do we maximize the Yeah, in our data center, Most of the data And iterate. the Data Science Elite Team play? so that we could get on that journey towards AI. And it's not a "do the work for them." and we're entering the third sprint. some of the challenges of modeling and data science. that supports the behavior. Yeah and then the constraints of the one use case No, it's no charge. with buying hardware and software, or whatever it is. minimize the activation energy to do that, There's got to be some commitment for some and others who were more familiar with the So each of these engagements, So you're bringing in hardcore data scientists, math wizzes, and that's the whole journey that we are in, in trying to Your team must love this because that are on the front line, that do this everyday, and then you can iterate that model. And if the plants decide, hey they have a subjective and to do the same thing across. That's the way to do things these days, right? across the organization to drive more value. Thanks so much for coming to theCUBE and sharing. You must be really excited. I appreciate it. I appreciate it. Change the Game: Winning with AI Tonight.

ENTITIES

Entity	Category	Confidence
Shreesha Rao	PERSON	0.99+
Seth Dobern	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Walmarts	ORGANIZATION	0.99+
Costcos	ORGANIZATION	0.99+
Dave	PERSON	0.99+
30	QUANTITY	0.99+
Boston	LOCATION	0.99+
New York City	LOCATION	0.99+
California	LOCATION	0.99+
Seth Dobrin	PERSON	0.99+
60	QUANTITY	0.99+
Niagara	ORGANIZATION	0.99+
Seth	PERSON	0.99+
Shreesha	PERSON	0.99+
U.S.	LOCATION	0.99+
Sreesha Rao	PERSON	0.99+
third sprint	QUANTITY	0.99+
90 days	QUANTITY	0.99+
two	QUANTITY	0.99+
first step	QUANTITY	0.99+
Inderpal Bhandari	PERSON	0.99+
Niagara Bottling	ORGANIZATION	0.99+
Python	TITLE	0.99+
both	QUANTITY	0.99+
tonight	DATE	0.99+
ibm.com/winwithai	OTHER	0.99+
one	QUANTITY	0.99+
Terminal 5	LOCATION	0.99+
two years	QUANTITY	0.99+
about $250,000	QUANTITY	0.98+
Times Square	LOCATION	0.98+
Scala	TITLE	0.98+
2018	DATE	0.98+
15-20%	QUANTITY	0.98+
IBM Analytics	ORGANIZATION	0.98+
each	QUANTITY	0.98+
today	DATE	0.98+
each pallet	QUANTITY	0.98+
Kaggle	ORGANIZATION	0.98+
West Side Highway	LOCATION	0.97+
Each pallet	QUANTITY	0.97+
4 sprints	QUANTITY	0.97+
About 250 grams	QUANTITY	0.97+
both side	QUANTITY	0.96+
Data Science Elite Team	ORGANIZATION	0.96+
one day	QUANTITY	0.95+
every single year	QUANTITY	0.95+
Niagara Bottling	PERSON	0.93+
about two sprints	QUANTITY	0.93+
one end	QUANTITY	0.93+
R	TITLE	0.92+
2-3 weeks	QUANTITY	0.91+
one profile	QUANTITY	0.91+
50-60 analysts	QUANTITY	0.91+
trillion dollars	QUANTITY	0.9+
2-3 data scientists	QUANTITY	0.9+
about 30-45 days	QUANTITY	0.88+
almost 16 million pallets of water	QUANTITY	0.88+
Big Apple	LOCATION	0.87+
couple years ago	DATE	0.87+
last 18 months	DATE	0.87+
Westin Hotel	ORGANIZATION	0.83+
pallet	QUANTITY	0.83+
#cubenyc	LOCATION	0.82+
2833 bottles of water per second	QUANTITY	0.82+
the Game: Winning with AI	TITLE	0.81+

Rob Thomas, IBM | Change the Game: Winning With AI

>> Live from Times Square in New York City, it's The Cube covering IBM's Change the Game: Winning with AI, brought to you by IBM. >> Hello everybody, welcome to The Cube's special presentation. We're covering IBM's announcements today around AI. IBM, as The Cube does, runs of sessions and programs in conjunction with Strata, which is down at the Javits, and we're Rob Thomas, who's the General Manager of IBM Analytics. Long time Cube alum, Rob, great to see you. >> Dave, great to see you. >> So you guys got a lot going on today. We're here at the Westin Hotel, you've got an analyst event, you've got a partner meeting, you've got an event tonight, Change the game: winning with AI at Terminal 5, check that out, ibm.com/WinWithAI, go register there. But Rob, let's start with what you guys have going on, give us the run down. >> Yeah, it's a big week for us, and like many others, it's great when you have Strata, a lot of people in town. So, we've structured a week where, today, we're going to spend a lot of time with analysts and our business partners, talking about where we're going with data and AI. This evening, we've got a broadcast, it's called Winning with AI. What's unique about that broadcast is it's all clients. We've got clients on stage doing demonstrations, how they're using IBM technology to get to unique outcomes in their business. So I think it's going to be a pretty unique event, which should be a lot of fun. >> So this place, it looks like a cool event, a venue, Terminal 5, it's just up the street on the west side highway, probably a mile from the Javits Center, so definitely check that out. Alright, let's talk about, Rob, we've known each other for a long time, we've seen the early Hadoop days, you guys were very careful about diving in, you kind of let things settle and watched very carefully, and then came in at the right time. But we saw the evolution of so-called Big Data go from a phase of really reducing investments, cheaper data warehousing, and what that did is allowed people to collect a lot more data, and kind of get ready for this era that we're in now. But maybe you can give us your perspective on the phases, the waves that we've seen of data, and where we are today and where we're going. >> I kind of think of it as a maturity curve. So when I go talk to clients, I say, look, you need to be on a journey towards AI. I think probably nobody disagrees that they need something there, the question is, how do you get there? So you think about the steps, it's about, a lot of people started with, we're going to reduce the cost of our operations, we're going to use data to take out cost, that was kind of the Hadoop thrust, I would say. Then they moved to, well, now we need to see more about our data, we need higher performance data, BI data warehousing. So, everybody, I would say, has dabbled in those two area. The next leap forward is self-service analytics, so how do you actually empower everybody in your organization to use and access data? And the next step beyond that is, can I use AI to drive new business models, new levers of growth, for my business? So, I ask clients, pin yourself on this journey, most are, depends on the division or the part of the company, they're at different areas, but as I tell everybody, if you don't know where you are and you don't know where you want to go, you're just going to wind around, so I try to get them to pin down, where are you versus where do you want to go? >> So four phases, basically, the sort of cheap data store, the BI data warehouse modernization, self-service analytics, a big part of that is data science and data science collaboration, you guys have a lot of investments there, and then new business models with AI automation running on top. Where are we today? Would you say we're kind of in-between BI/DW modernization and on our way to self-service analytics, or what's your sense? >> I'd say most are right in the middle between BI data warehousing and self-service analytics. Self-service analytics is hard, because it requires you, sometimes to take a couple steps back, and look at your data. It's hard to provide self-service if you don't have a data catalog, if you don't have data security, if you haven't gone through the processes around data governance. So, sometimes you have to take one step back to go two steps forward, that's why I see a lot of people, I'd say, stuck in the middle right now. And the examples that you're going to see tonight as part of the broadcast are clients that have figured out how to break through that wall, and I think that's pretty illustrative of what's possible. >> Okay, so you're saying that, got to maybe take a step back and get the infrastructure right with, let's say a catalog, to give some basic things that they have to do, some x's and o's, you've got the Vince Lombardi played out here, and also, skillsets, I imagine, is a key part of that. So, that's what they've got to do to get prepared, and then, what's next? They start creating new business models, imagining this is where the cheap data officer comes in and it's an executive level, what are you seeing clients as part of digital transformation, what's the conversation like with customers? >> The biggest change, the great thing about the times we live in, is technology's become so accessible, you can do things very quickly. We created a team last year called Data Science Elite, and we've hired what we think are some of the best data scientists in the world. Their only job is to go work with clients and help them get to a first success with data science. So, we put a team in. Normally, one month, two months, normally a team of two or three people, our investment, and we say, let's go build a model, let's get to an outcome, and you can do this incredibly quickly now. I tell clients, I see somebody that says, we're going to spend six months evaluating and thinking about this, I was like, why would you spend six months thinking about this when you could actually do it in one month? So you just need to get over the edge and go try it. >> So we're going to learn more about the Data Science Elite team. We've got John Thomas coming on today, who is a distinguished engineer at IBM, and he's very much involved in that team, and I think we have a customer who's actually gone through that, so we're going to talk about what their experience was with the Data Science Elite team. Alright, you've got some hard news coming up, you've actually made some news earlier with Hortonworks and Red Hat, I want to talk about that, but you've also got some hard news today. Take us through that. >> Yeah, let's talk about all three. First, Monday we announced the expanded relationship with both Hortonworks and Red Hat. This goes back to one of the core beliefs I talked about, every enterprise is modernizing their data and application of states, I don't think there's any debate about that. We are big believers in Kubernetes and containers as the architecture to drive that modernization. The announcement on Monday was, we're working closer with Red Hat to take all of our data services as part of Cloud Private for Data, which are basically microservice for data, and we're running those on OpenShift, and we're starting to see great customer traction with that. And where does Hortonworks come in? Hadoop has been the outlier on moving to microservices containers, we're working with Hortonworks to help them make that move as well. So, it's really about the three of us getting together and helping clients with this modernization journey. >> So, just to remind people, you remember ODPI, folks? It was all this kerfuffle about, why do we even need this? Well, what's interesting to me about this triumvirate is, well, first of all, Red Hat and Hortonworks are hardcore opensource, IBM's always been a big supporter of open source. You three got together and you're proving now the productivity for customers of this relationship. You guys don't talk about this, but Hortonworks had to, when it's public call, that the relationship with IBM drove many, many seven-figure deals, which, obviously means that customers are getting value out of this, so it's great to see that come to fruition, and it wasn't just a Barney announcement a couple years ago, so congratulations on that. Now, there's this other news that you guys announced this morning, talk about that. >> Yeah, two other things. One is, we announced a relationship with Stack Overflow. 50 million developers go to Stack Overflow a month, it's an amazing environment for developers that are looking to do new things, and we're sponsoring a community around AI. Back to your point before, you said, is there a skills gap in enterprises, there absolutely is, I don't think that's a surprise. Data science, AI developers, not every company has the skills they need, so we're sponsoring a community to help drive the growth of skills in and around data science and AI. So things like Python, R, Scala, these are the languages of data science, and it's a great relationship with us and Stack Overflow to build a community to get things going on skills. >> Okay, and then there was one more. >> Last one's a product announcement. This is one of the most interesting product annoucements we've had in quite a while. Imagine this, you write a sequel query, and traditional approach is, I've got a server, I point it as that server, I get the data, it's pretty limited. We're announcing technology where I write a query, and it can find data anywhere in the world. I think of it as wide-area sequel. So it can find data on an automotive device, a telematics device, an IoT device, it could be a mobile device, we think of it as sequel the whole world. You write a query, you can find the data anywhere it is, and we take advantage of the processing power on the edge. The biggest problem with IoT is, it's been the old mantra of, go find the data, bring it all back to a centralized warehouse, that makes it impossible to do it real time. We're enabling real time because we can write a query once, find data anywhere, this is technology we've had in preview for the last year. We've been working with a lot of clients to prove out used cases to do it, we're integrating as the capability inside of IBM Cloud Private for Data. So if you buy IBM Cloud for Data, it's there. >> Interesting, so when you've been around as long as I have, long enough to see some of the pendulums swings, and it's clearly a pendulum swing back toward decentralization in the edge, but the key is, from what you just described, is you're sort of redefining the boundary, so I presume it's the edge, any Cloud, or on premises, where you can find that data, is that correct? >> Yeah, so it's multi-Cloud. I mean, look, every organization is going to be multi-Cloud, like 100%, that's going to happen, and that could be private, it could be multiple public Cloud providers, but the key point is, data on the edge is not just limited to what's in those Clouds. It could be anywhere that you're collecting data. And, we're enabling an architecture which performs incredibly well, because you take advantage of processing power on the edge, where you can get data anywhere that it sits. >> Okay, so, then, I'm setting up a Cloud, I'll call it a Cloud architecture, that encompasses the edge, where essentially, there are no boundaries, and you're bringing security. We talked about containers before, we've been talking about Kubernetes all week here at a Big Data show. And then of course, Cloud, and what's interesting, I think many of the Hadoop distral vendors kind of missed Cloud early on, and then now are sort of saying, oh wow, it's a hybrid world and we've got a part, you guys obviously made some moves, a couple billion dollar moves, to do some acquisitions and get hardcore into Cloud, so that becomes a critical component. You're not just limiting your scope to the IBM Cloud. You're recognizing that it's a multi-Cloud world, that' what customers want to do. Your comments. >> It's multi-Cloud, and it's not just the IBM Cloud, I think the most predominant Cloud that's emerging is every client's private Cloud. Every client I talk to is building out a containerized architecture. They need their own Cloud, and they need seamless connectivity to any public Cloud that they may be using. This is why you see such a premium being put on things like data ingestion, data curation. It's not popular, it's not exciting, people don't want to talk about it, but we're the biggest inhibitors, to this AI point, comes back to data curation, data ingestion, because if you're dealing with multiple Clouds, suddenly your data's in a bunch of different spots. >> Well, so you're basically, and we talked about this a lot on The Cube, you're bringing the Cloud model to the data, wherever the data lives. Is that the right way to think about it? >> I think organizations have spoken, set aside what they say, look at their actions. Their actions say, we don't want to move all of our data to any particular Cloud, we'll move some of our data. We need to give them seamless connectivity so that they can leave their data where they want, we can bring Cloud-Native Architecture to their data, we could also help move their data to a Cloud-Native architecture if that's what they prefer. >> Well, it makes sense, because you've got physics, latency, you've got economics, moving all the data into a public Cloud is expensive and just doesn't make economic sense, and then you've got things like GDPR, which says, well, you have to keep the data, certain laws of the land, if you will, that say, you've got to keep the data in whatever it is, in Germany, or whatever country. So those sort of edicts dictate how you approach managing workloads and what you put where, right? Okay, what's going on with Watson? Give us the update there. >> I get a lot of questions, people trying to peel back the onion of what exactly is it? So, I want to make that super clear here. Watson is a few things, start at the bottom. You need a runtime for models that you've built. So we have a product called Watson Machine Learning, runs anywhere you want, that is the runtime for how you execute models that you've built. Anytime you have a runtime, you need somewhere where you can build models, you need a development environment. That is called Watson Studio. So, we had a product called Data Science Experience, we've evolved that into Watson Studio, connecting in some of those features. So we have Watson Studio, that's the development environment, Watson Machine Learning, that's the runtime. Now you move further up the stack. We have a set of APIs that bring in human features, vision, natural language processing, audio analytics, those types of things. You can integrate those as part of a model that you build. And then on top of that, we've got things like Watson Applications, we've got Watson for call centers, doing customer service and chatbots, and then we've got a lot of clients who've taken pieces of that stack and built their own AI solutions. They've taken some of the APIs, they've taken some of the design time, the studio, they've taken some of the Watson Machine Learning. So, it is really a stack of capabilities, and where we're driving the greatest productivity, this is in a lot of the examples you'll see tonight for clients, is clients that have bought into this idea of, I need a development environment, I need a runtime, where I can deploy models anywhere. We're getting a lot of momentum on that, and then that raises the question of, well, do I have expandability, do I have trust in transparency, and that's another thing that we're working on. >> Okay, so there's API oriented architecture, exposing all these services make it very easy for people to consume. Okay, so we've been talking all week at Cube NYC, is Big Data is in AI, is this old wine, new bottle? I mean, it's clear, Rob, from the conversation here, there's a lot of substantive innovation, and early adoption, anyway, of some of these innovations, but a lot of potential going forward. Last thoughts? >> What people have to realize is AI is not magic, it's still computer science. So it actually requires some hard work. You need to roll up your sleeves, you need to understand how I get from point A to point B, you need a development environment, you need a runtime. I want people to really think about this, it's not magic. I think for a while, people have gotten the impression that there's some magic button. There's not, but if you put in the time, and it's not a lot of time, you'll see the examples tonight, most of them have been done in one or two months, there's great business value in starting to leverage AI in your business. >> Awesome, alright, so if you're in this city or you're at Strata, go to ibm.com/WinWithAI, register for the event tonight. Rob, we'll see you there, thanks so much for coming back. >> Yeah, it's going to be fun, thanks Dave, great to see you. >> Alright, keep it right there everybody, we'll be back with our next guest right after this short break, you're watching The Cube.

Published Date : Sep 13 2018

SUMMARY :

brought to you by IBM. Rob, great to see you. what you guys have going on, it's great when you have on the phases, the waves that we've seen where you want to go, you're the BI data warehouse modernization, a data catalog, if you and get the infrastructure right with, and help them get to a first and I think we have a as the architecture to news that you guys announced that are looking to do new things, I point it as that server, I get the data, of processing power on the the edge, where essentially, it's not just the IBM Cloud, Is that the right way to think about it? We need to give them seamless connectivity certain laws of the land, that is the runtime for people to consume. and it's not a lot of time, register for the event tonight. Yeah, it's going to be fun, we'll be back with our next guest

ENTITIES

Entity	Category	Confidence
IBM	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
John Thomas	PERSON	0.99+
two months	QUANTITY	0.99+
six months	QUANTITY	0.99+
six months	QUANTITY	0.99+
Rob	PERSON	0.99+
Rob Thomas	PERSON	0.99+
Monday	DATE	0.99+
last year	DATE	0.99+
one month	QUANTITY	0.99+
Red Hat	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
Germany	LOCATION	0.99+
New York City	LOCATION	0.99+
one	QUANTITY	0.99+
Vince Lombardi	PERSON	0.99+
GDPR	TITLE	0.99+
three people	QUANTITY	0.99+
Watson Studio	TITLE	0.99+
Cube	ORGANIZATION	0.99+
ibm.com/WinWithAI	OTHER	0.99+
two	QUANTITY	0.99+
Times Square	LOCATION	0.99+
both	QUANTITY	0.99+
tonight	DATE	0.99+
First	QUANTITY	0.99+
today	DATE	0.98+
Data Science Elite	ORGANIZATION	0.98+
The Cube	TITLE	0.98+
two steps	QUANTITY	0.98+
Scala	TITLE	0.98+
Python	TITLE	0.98+
One	QUANTITY	0.98+
three	QUANTITY	0.98+
Barney	ORGANIZATION	0.98+
Javits Center	LOCATION	0.98+
Watson	TITLE	0.98+
This evening	DATE	0.98+
IBM Analytics	ORGANIZATION	0.97+
one step	QUANTITY	0.97+
Stack Overflow	ORGANIZATION	0.96+
Cloud	TITLE	0.96+
seven-figure deals	QUANTITY	0.96+
Terminal 5	LOCATION	0.96+
Watson Applications	TITLE	0.95+
Watson Machine Learning	TITLE	0.94+
a month	QUANTITY	0.94+
50 million developers	QUANTITY	0.92+

Hartej Sawhney, Hosho | Blockchain Futurist Conference 2018

>> Live, from Toronto Canada, it's the CUBE! Covering Blockchain Futurist Conference 2018. Brought to you by the CUBE. >> Hello everyone and welcome back. This is the CUBE's exclusive coverage here in Toronto for the Blockchain Futurist Conference, we're here all week. Yesterday we were at the Global Cloud and Blockchain Summit put on by DigitalBits and the community, here is the big show around thought leadership around the future of blockchain and where it's going. Certainly token economics is the hottest thing with blockchain, although the markets are down the market is not down when it comes to building things. I'm John Furrier with Dave Vellante, here with CUBE alumni and special guest Hartej Sawhney who is the founder of Hosho doing a lot of work on security space and they have a conference coming up that the CUBE will be broadcasting live at, HoshoCon this coming fall, it's in October I believe, welcome to the CUBE. >> Thank you so much for having me. >> Always great to see you man. >> What's the date of the event, real quick, what's the date on your event? >> It's October 9th to the 11th, Hard Rock Hotel & Casino, we rented out the entire property, we want everyone only to bump into the people that we're inviting and they're coming. And the focus is blockchain security. We attend over 130 conferences a year, and there's never enough conversation about blockchain security, so we figured, y'know, Defcon is still pure cybersecurity, Devcon from Ethereum is more for Ethereum developers only, and every other conference is more of a traditional blockchain conference with ICO pitch competitions. We figured we're not going to do that, and we're going to try to combine the worlds, a Defcon meets Devcon vibe, and have hackers welcome, have white hat hackers host a bug bounty, invite bright minds in the space like Max Keiser and Stacy Herbert, the founder of the Trezor wallet, RSA, y'know we've even invited everyone from our competitors to everyone in the media, to everyone that are leading the blockchain whole space. >> That's the way to run an event with community, congratulations. Mark your calendar we've got HoshoCon coming up in October. Hartej, I want to ask you, I know Dave wants to ask you your trip around the world kind of questions, but I want to get your take on something we're seeing emerging, and I know you've been talking about, I want to get your thoughts and reaction and vision on: we're starting to see the world, the losers go out of the market, and certainly prices are down on the coins, and the coins are a lot of tokens out there, >> Too many damn tokens! (laughing) >> The losers are the only ones who borrowed money to buy bitcoin. >> (laughs) Someone shorted bitcoin. >> That's it. >> But there's now an emphasis on builders and there's always been an entrepreneurial market here, alpha entrepreneurs are coming into the space you're starting to see engineers really building great stuff, there's an emphasis on builders, not just the quick hit ponies. >> Yep. >> So your thoughts on that trend. >> It's during the down-market that you can really focus on building real businesses that solve problems, that have some sort of foresight into how they're going to make real money with a product that's built and tested, and maybe even enterprise grade. And I also think that the future of fundraising is going to be security tokens, and we don't really have a viable security exchange available yet, but giving away actual equity in your business through a security token is something very exciting for sophisticated investors to participate in this future tokenized economy. >> But you're talking about real equity, not just percentage of coin. >> Yeah, y'know, actual equity in the business, but in the form of a security token. I think that's the future of fundraising to some extent. >> Is that a dual sort of vector, two vectors there, one is the value of the token itself and the equity that you get, right? >> Correct, I mean you're basically getting equity in the company, securitized in token form, and then maybe a platform like Securitize or Polymath, the security exchanges that are coming out, will list them. And so I think during the down-markets, when prices are down, again I said before the joke but it's also the truth: the only people losing in this market are the ones who borrowed to buy bitcoin. The people who believe in the technology remain to ignore the price more or less. And if you're focused on building a company this is the time to focus on building a real business. A lot of times in an up-market you think you see a business opportunity just because of the amount of money surely available to be thrown at any project, you can ICO just about any idea and get a couple a million dollars to work on it, not as easy during a down-market so you're starting to take a step back, and ask yourself questions like how do we hit $20,000 of monthly recurring revenue? And that shouldn't be such a crazy thing to ask. When you go to Silicon Valley, unless you're two-time exited, or went to Stanford, or you were an early employee at Facebook, you're not getting your first million dollar check for 15 or 20 percent of your business, even, until you make 20, 25K monthly recurring revenue. I say this on stage at a lot of my keynotes, and I feel like some people glaze their eyes over like, "obviously I know that", the majority are running an ICO where they are nowhere close to making 20K monthly recurring and when you say what's your project they go, "well, our latest traction is that we've closed about "1.5 million in our private pre-sale." That's not traction, you don't have a product built. You raised money. >> And that's a dotcom bubble dynamic where the milestone of fundraising was the traction and that really had nothing to do with building a viable business. And the benefit of blockchain is to do things differently, but achieve the same outcome, either more efficient or faster, in a new way, whether it's starting a company or achieving success. >> Yep, but at the same time, blockchain technology is relatively immature for some products to go, at least for the Fortune 500 today, for them to take a blockchain product out of R&D to the mainstream isn't going to happen right now. Right now the Fortune 500 is investing into blockchain tech but it's in R&D, and they're quickly training their employees to understand what is a smart contract?, who is Nick Szabo?, when did he come up with this word smart contracts? I was just privy to seeing some training information for multiple Fortune 500 companies training their employees on what are smart contracts. Stuff that we read four or five years ago from Nick Szabo's essays is now hitting what I would consider the mainstream, which is mid-level talent, VP-level talent at Fortune 500 companies, who know that this is the next wave. And so when we're thinking about fundraising it's the companies who raise enough money are going to be able to survive the storm, right? In this down-market, if you raised enough money in your ICO, for this vision that you have that's going to be revolutionary, a lot of times I read an ICO's white paper and all I can think is well I hope this happens, because if it does that's crazy. But the question is, did they raise enough money to survive? So that's kind of another reason why people are raising more money than they need. Do people need $100 million to do the project? I don't know. >> It's an arm's race. >> But they need to last 10 years to make this vision come true. >> Hey, so, I want to ask you about your whirlwind tour. And I want to ask in the context of something we've talked about before. You've mentioned on the CUBE that Solidity, very complex, there's a lot of bugs and a lot of security flaws as a result in some of the code. A lot of the code. You're seeing people now try to develop tooling to open up blockchain development to Java programmers, for example, which probably exacerbates the problem. So, in that context, what are you seeing around the world, what are you seeing in terms of the awareness of that problem, and how are you helping solve it? >> So, starting with Fortune 500 companies, they have floors on floors around the world full of Java engineers. Full Stack Engineers who, of course, know Java, they know C#, and they're prepared to build in this language. And so this is why I think IBM's Hyperledger went in that direction. This is why even some people have taken the Ethereum virtual machine and tried to completely rebuild it and rewrite it into functional programming languages like Clojure and Scala. Just so it's more accessible and you can do more with the functional programming language. Very few lines of code are equivalent to hundreds of lines of code in linear languages, and in functional programming languages things are concurrent and linear and you're able to build large-scale enterprise-grade solutions with very small lines of code. So I'm personally excited, I think, about seeing different types of blockchains cater more towards Fortune 500 companies being able to take advantage, right off the bat, of rooms full of Java engineers. The turn to teaching of Solidity, it's been difficult, at least from the cybersecurity perspective we're not looking for someone who's a software engineer who can teach themselves Solidity really fast. We're looking for a cybersecurity, QA-minded, quality-assurance mindset, someone who has an OPSEC mindset to learn Solidity and then audit code with the cybersecurity mindset. And we've found that to be easier than an engineer who knows Java to learn Solidity. Education is hard, we have a global shortage of qualified engineers in this space. >> So cybersecurity is a good cross-over bridge to Solidity. Skills matters. >> If you're in cybersecurity and you're a full sec engineer you can learn just about any language like anyone else. >> The key is to start at the core. >> The key is to have a QA mindset, to have the mindset of actually doing quality assurance, on code and finding vulnerabilities. >> Not as an afterthought, but as a fundamental component of the development process. >> I could be a good engineer and make an app like Angry Birds, upload it, and even before uploading it I'll get it audited by some third party professional, and once it's uploaded I can fix the bugs as we go and release another version. Most smart contracts that have money behind them are written to be irreversible. So if they get hacked, money gets stolen. >> Yeah, that's real. >> And so the mindset is shifting because of this space. >> Alright, so on your tour, paint a picture, what did you see? >> First of all, how many cities, how long? Give us the stats. >> I just did about 80 days and I hit 10 countries. Most of it was between Europe and Asia. I'll start with saying that, right now, there's a race amongst smaller nations, like Malta, Bermuda, Belarus, Panama, the island nations, where they're racing to say that "we have clarity on regulation when it comes to "the blockchain cryptocurrency industries," and this is a big deal, I'd say, mainly for cryptocurrency exchanges, that are fleeing and navigating global regulation. Like in India, Unocoin's bank has been shutdown by the RBI. And they're going up against the RBI and the central government of India because, as an exchange, their banks have been shut down. And they're being forced to navigate waters and unique waves around the world globally. You have people like the world's biggest exchange, at least by volume today is Binance. Binance has relocated 100 people to the island of Malta. For a small island nation that's still technically a part of the European Union, they've made significant progress on bringing clarity on what is legal and what is not, eventually they're saying they want to have a crypto-bank, they want to help you go from IPO to ICO from the Maltese stock exchange. Similarly also Gibraltar, and there's a law firm out there, Hassans, which is like the best law firm in Gibraltar, and they have really led the way on helping the regulators in Gibraltar bring clarity. Both Gibraltar and Malta, what's similar between them is they've been home to online gambling companies. So a lot of online casinos have been in both of their markets. >> They understand. >> They've been very innovative, in many different ways. And so even conversations with the regulators in both Malta and Gibraltar, you can hear their maturity, they understand what a smart contract is. They understand how important it is to have a smart contract audited. They already understand that every exchange in their jurisdiction has to go through regular penetration testing. That if this exchange changes its code that the code opens it up to vulnerabilities, and is the exchange going through penetration testing? So the smaller nations are moving fast. >> But they're operationalizing it faster, and it's the opportunity for them is the upside. >> My only fear is that they're still small nations, and maybe not what they want to hear but it's the truth. Operating in larger nations like the United States, Canada, Germany, even Japan, Korea, we need to see clarity in much larger nations and I think that's something that's exciting that's going to happen possibly after we have the blueprint laid out by places like Malta and Gibraltar and Bermuda. >> And what's the Wild West look like, or Wild East if you will in Asia, a lot of activity, it's a free-for-all, but there's so much energy both on the money-making side and on the capital formation side and the entrepreneurial side. Lay that out, what's that look like? >> By far the most exciting thing in Asia was Korea, Seoul, out of all the Asian tiger countries today, in August 2018, Seoul, Korea has a lot of blockchain action going on right now. It feels like you're in the future, there's actually physical buildings that say Blockchain Academy, and Blockchain Building and Bitcoin Labs, you feel like you're in 2028! (laughs) And today it's 2018. You have a lot of syndication going on, some of it illegal, it's illegal if you give a guarantee to the investor you're going to see some sort of return, as a guarantee. It's not illegal if you're putting together accredited investors who are willing to do KYC and AML and be interested in investing a couple of hundred ETH in a project. So, I would say today a lot of ICOs are flocking to Korea to do a quick fundraising round because a lot of successful syndication is happening there. Second to Korea, I would say, is a battle between Singapore and Hong Kong. They're both very interesting, It's the one place where you can find people who speak English, but also all four of the languages of the tiger nations: Japanese, Mandarin, Cantonese, Korean, all in one place in Hong Kong and Singapore. But Singapore, you still can't get a bank account as an ICO. So they're bringing clarity on regulation and saying you can come here and you can get a lawyer and you can incorporate, but an ICO still has trouble getting a bank account. Hong Kong is simply closer in proximity to China, and China has a lot of ICOs that cannot raise money from Chinese citizens. So they can raise from anybody that's not Chinese, and they don't even have a white paper, a website, or even anybody in-house that can speak English. So they're lacking English materials, English websites, and people in their company that can communicate with the rest of the world in other languages other than Mandarin or Cantonese. And that's a problem that can be solved and bridges need to be built. People are looking in China for people to build that bridge, there's a lot of action going on in Hong Kong for that reason since even though technically it's a part of China it's still not a part of China, it's a tricky gray line. >> Right, in Japan a lot going on but it's still, it's Japan, it's kind of insulated. >> The Japanese government hasn't provided clarity on regulation yet. Just like in India we're waiting for September 11th for some clarity on regulation, same way in Japan, I don't know the exact date but we don't have enough clarity on regulation. I'm seeing good projects pop up in Korea, we're even doing some audits for some projects out of Japan, but we see them at other conferences outside of Japan as well. Coming up in Singapore is consensus, I'm hoping that Singapore will turn into a better place for quality conferences, but I'm not seeing a lot of quality action out of Singapore itself. Y'know, who's based in Singapore? Lots of family funds, lots of new exchanges, lots of big crypto advisory funds have offices there, but core ICOs, there was still a higher number of them in Korea, even in Japan, even. I'm not sure about the comparison between Japan and Singapore, but there is definitely a lot more in Korea. >> What about Switzerland, do you have any visibility there? Did you visit Switzerland? >> I was Zug, I was in Crypto Valley, visited Crypto Valley labs... >> What feels best for you? >> I don't know, Mother Earth! (laughs) >> All of the above. >> The point of bitcoin is for us to start being able to treat this earth as one, and as you navigate through the crypto circuit one thing as that is becoming more visible is the power of China partnering up with the Middle East and building a One Belt, One Road initiative. I feel like One Belt, One Road ties right into the future of crypto, and it's opening up the power of markets like the Philippines, Thailand, Malaysia, Singapore. >> What Gabriel's doing in the Caribbean with Barbados. >> Gabriel from Bit, yeah. >> Yeah, Bit, he's bringing them all together. >> Yeah, I mean the island nations are open arms to companies, and I think they will attract a lot of American companies for sure. >> So you're seeing certainly more, in some pockets, more advanced regulatory climates, outside of the United States, and the talent pool is substantial. >> So then, when it comes to talent pools, I believe it was in global commits for the language of Python, China is just on the verge of surpassing the United States, and there's a lot of just global breakthroughs happening, there's a large number of Full Stack engineers at a very high level in countries like China, India, Ukraine. These are three countries that I think are outliers in that a Full Stack Engineer, at the highest level in a country like India or Ukraine for example, would cost a company between $2,000 to $5,000 a month, to employ full time, in a country where they likely won't take stock to work for your company. >> Fifteen years ago those countries were outsource, "hey, outsource some cheap labor," no, now they're product teams or engineers, they're really building value. >> They're building their own things, in-house. >> And the power of new markets are opening up as you said, this is huge, huge. OK, Hartej, thanks so much for coming on, I know you got to go, you got your event October 9th to 11th in Las Vegas, Blockchain Security Conference. >> The CUBE will be there. >> I look forward to having you there. >> You guys are the leader in Blockchain security, congratulations, hosho.io, check it out. Hosho.io, October 9th, mark your calendars. The CUBE, we are live here in Toronto, for the Blockchain Futurist Conference, with our good friend, CUBE alumni Hartej. I'm John Furrier, Dave Vellante, be right back with more live coverage from the Untraceable event here in Toronto, after this short break.

Published Date : Aug 15 2018

SUMMARY :

Live, from Toronto Canada, it's the CUBE! that the CUBE will be broadcasting live at, And the focus is blockchain security. and the coins are a lot of tokens out there, The losers are the only ones who not just the quick hit ponies. It's during the down-market that you can really focus on But you're talking about real equity, but in the form of a security token. just because of the amount of money And the benefit of blockchain is to do things differently, But the question is, did they raise enough money to survive? But they need to last 10 years to and a lot of security flaws as a result in some of the code. at least from the cybersecurity perspective So cybersecurity is a good cross-over bridge to Solidity. you can learn just about any language like anyone else. The key is to have a QA mindset, of the development process. and even before uploading it I'll get it audited First of all, how many cities, how long? Like in India, Unocoin's bank has been shutdown by the RBI. and is the exchange going through penetration testing? But they're operationalizing it faster, and it's the Operating in larger nations like the United States, and the entrepreneurial side. It's the one place where you can find people Right, in Japan a lot going on but it's still, I'm not sure about the comparison between I was Zug, I was in Crypto Valley, is the power of China partnering up with the Middle East Yeah, I mean the island nations are and the talent pool is substantial. China is just on the verge of surpassing the United States, no, now they're product teams or engineers, They're building their own things, And the power of new markets for the Blockchain Futurist Conference,

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Stacy Herbert	PERSON	0.99+
Hartej Sawhney	PERSON	0.99+
Dave	PERSON	0.99+
Bermuda	LOCATION	0.99+
Singapore	LOCATION	0.99+
Japan	LOCATION	0.99+
Korea	LOCATION	0.99+
John Furrier	PERSON	0.99+
15	QUANTITY	0.99+
August 2018	DATE	0.99+
Max Keiser	PERSON	0.99+
Switzerland	LOCATION	0.99+
September 11th	DATE	0.99+
$20,000	QUANTITY	0.99+
Hong Kong	LOCATION	0.99+
China	LOCATION	0.99+
Asia	LOCATION	0.99+
Gibraltar	LOCATION	0.99+
Hartej	PERSON	0.99+
20	QUANTITY	0.99+
IBM	ORGANIZATION	0.99+
$100 million	QUANTITY	0.99+
RSA	ORGANIZATION	0.99+
Nick Szabo	PERSON	0.99+
Malta	LOCATION	0.99+
October 9th	DATE	0.99+
Toronto	LOCATION	0.99+
2018	DATE	0.99+
European Union	ORGANIZATION	0.99+
India	LOCATION	0.99+
CUBE	ORGANIZATION	0.99+
Binance	ORGANIZATION	0.99+
Gabriel	PERSON	0.99+
Angry Birds	TITLE	0.99+
Facebook	ORGANIZATION	0.99+
20 percent	QUANTITY	0.99+
Hassans	ORGANIZATION	0.99+
October	DATE	0.99+
Unocoin	ORGANIZATION	0.99+
United States	LOCATION	0.99+
10 countries	QUANTITY	0.99+
2028	DATE	0.99+
Silicon Valley	LOCATION	0.99+
100 people	QUANTITY	0.99+
Caribbean	LOCATION	0.99+
Fortune 500	ORGANIZATION	0.99+
three countries	QUANTITY	0.99+
20K	QUANTITY	0.99+
Trezor	ORGANIZATION	0.99+
Second	QUANTITY	0.99+
Blockchain Academy	ORGANIZATION	0.99+
Europe	LOCATION	0.99+
Bitcoin Labs	ORGANIZATION	0.99+
Las Vegas	LOCATION	0.99+
Panama	LOCATION	0.99+
Belarus	LOCATION	0.99+
two vectors	QUANTITY	0.99+
first million dollar	QUANTITY	0.99+
two-time	QUANTITY	0.99+
RBI	ORGANIZATION	0.99+
Scala	TITLE	0.99+
both	QUANTITY	0.99+

Arun Murthy, Hortonworks | DataWorks Summit 2018

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost, Jim Kobielus. We're joined by Aaron Murphy, Arun Murphy, sorry. He is the co-founder and chief product officer of Hortonworks. Thank you so much for returning to theCUBE. It's great to have you on >> Yeah, likewise. It's been a fun time getting back, yeah. >> So you were on the main stage this morning in the keynote, and you were describing the journey, the data journey that so many customers are on right now, and you were talking about the cloud saying that the cloud is part of the strategy but it really needs to fit into the overall business strategy. Can you describe a little bit about how you're approach to that? >> Absolutely, and the way we look at this is we help customers leverage data to actually deliver better capabilities, better services, better experiences, to their customers, and that's the business we are in. Now with that obviously we look at cloud as a really key part of it, of the overall strategy in terms of how you want to manage data on-prem and on the cloud. We kind of joke that we ourself live in a world of real-time data. We just live in it and data is everywhere. You might have trucks on the road, you might have drawings, you might have sensors and you have it all over the world. At that point, we've kind of got to a point where enterprise understand that they'll manage all the infrastructure but in a lot of cases, it will make a lot more sense to actually lease some of it and that's the cloud. It's the same way, if you're delivering packages, you don't got buy planes and lay out roads you go to FedEx and actually let them handle that view. That's kind of what the cloud is. So that is why we really fundamentally believe that we have to help customers leverage infrastructure whatever makes sense pragmatically both from an architectural standpoint and from a financial standpoint and that's kind of why we talked about how your cloud strategy, is part of your data strategy which is actually fundamentally part of your business strategy. >> So how are you helping customers to leverage this? What is on their minds and what's your response? >> Yeah, it's really interesting, like I said, cloud is cloud, and infrastructure management is certainly something that's at the foremost, at the top of the mind for every CIO today. And what we've consistently heard is they need a way to manage all this data and all this infrastructure in a hybrid multi-tenant, multi-cloud fashion. Because in some GEOs you might not have your favorite cloud renderer. You know, go to parts of Asia is a great example. You might have to use on of the Chinese clouds. You go to parts of Europe, especially with things like the GDPR, the data residency laws and so on, you have to be very, very cognizant of where your data gets stored and where your infrastructure is present. And that is why we fundamentally believe it's really important to have and give enterprise a fabric with which it can manage all of this. And hide the details of all of the underlying infrastructure from them as much as possible. >> And that's DataPlane Services. >> And that's DataPlane Services, exactly. >> The Hortonworks DataPlane Services we launched in October of last year. Actually I was on CUBE talking about it back then too. We see a lot of interest, a lot of excitement around it because now they understand that, again, this doesn't mean that we drive it down to the least common denominator. It is about helping enterprises leverage the key differentiators at each of the cloud renderers products. For example, Google, which we announced a partnership, they are really strong on AI and MO. So if you are running TensorFlow and you want to deal with things like Kubernetes, GKE is a great place to do it. And, for example, you can now go to Google Cloud and get DPUs which work great for TensorFlow. Similarly, a lot of customers run on Amazon for a bunch of the operational stuff, Redshift as an example. So the world we live in, we want to help the CIO leverage the best piece of the cloud but then give them a consistent way to manage and count that data. We were joking on stage that IT has just about learned how deal with Kerberos and Hadoob And now we're telling them, "Oh, go figure out IM on Google." which is also IM on Amazon but they are completely different. The only thing that's consistent is the name. So I think we have a unique opportunity especially with the open source technologies like Altas, Ranger, Knox and so on, to be able to draw a consistent fabric over this and secured occurrence. And help the enterprise leverage the best parts of the cloud to put a best fit architecture together, but which also happens to be a best of breed architecture. >> So the fabric is everything you're describing, all the Apache open source projects in which HortonWorks is a primary committer and contributor, are able to scheme as in policies and metadata and so forth across this distributed heterogeneous fabric of public and private cloud segments within a distributed environment. >> Exactly. >> That's increasingly being containerized in terms of the applications for deployment to edge nodes. Containerization is a big theme in HTP3.0 which you announced at this show. >> Yeah. >> So, if you could give us a quick sense for how that containerization capability plays into more of an edge focus for what your customers are doing. >> Exactly, great point, and again, the fabric is obviously, the core parts of the fabric are the open source projects but we've also done a lot of net new innovation with data plans which, by the way, is also open source. Its a new product and a new platform that you can actually leverage, to lay it out over the open source ones you're familiar with. And again, like you said, containerization, what is actually driving the fundamentals of this, the details matter, the scale at which we operate, we're talking about thousands of nodes, terabytes of data. The details really matter because a 5% improvement at that scale leads to millions of dollars in optimization for capex and opex. So that's why all of that, the details are being fueled and driven by the community which is kind of what we tell over HDP3 Until the key ones, like you said, are containerization because now we can actually get complete agility in terms of how you deploy the applications. You get isolation not only at the resource management level with containers but you also get it at the software level, which means, if two data scientists wanted to use a different version of Python or Scala or Spark or whatever it is, they get that consistently and holistically. That now they can actually go from the test dev cycle into production in a completely consistent manner. So that's why containers are so big because now we can actually leverage it across the stack and the things like MiNiFi showing up. We can actually-- >> Define MiNiFi before you go further. What is MiNiFi for our listeners? >> Great question. Yeah, so we've always had NiFi-- >> Real-time >> Real-time data flow management and NiFi was still sort of within the data center. What MiNiFi does is actually now a really, really small layer, a small thin library if you will that you can throw on a phone, a doorbell, a sensor and that gives you all the capabilities of NiFi but at the edge. >> Mmm Right? And it's actually not just data flow but what is really cool about NiFi it's actually command and control. So you can actually do bidirectional command and control so you can actually change in real-time the flows you want, the processing you do, and so on. So what we're trying to do with MiNiFi is actually not just collect data from the edge but also push the processing as much as possible to the edge because we really do believe a lot more processing is going to happen at the edge especially with the A6 and so on coming out. There will be custom hardware that you can throw and essentially leverage that hardware at the edge to actually do this processing. And we believe, you know, we want to do that even if the cost of data not actually landing up at rest because at the end of the day we're in the insights business not in the data storage business. >> Well I want to get back to that. You were talking about innovation and how so much of it is driven by the open source community and you're a veteran of the big data open source community. How do we maintain that? How does that continue to be the fuel? >> Yeah, and a lot of it starts with just being consistent. From day one, James was around back then, in 2011 we started, we've always said, "We're going to be open source." because we fundamentally believed that the community is going to out innovate any one vendor regardless of how much money they have in the bank. So we really do believe that's the best way to innovate mostly because their is a sense of shared ownership of that product. It's not just one vendor throwing some code out there try to shove it down the customers throat. And we've seen this over and over again, right. Three years ago, we talk about a lot of the data plane stuff comes from Atlas and Ranger and so on. None of these existed. These actually came from the fruits of the collaboration with the community with actually some very large enterprises being a part of it. So it's a great example of how we continue to drive it6 because we fundamentally believe that, that's the best way to innovate and continue to believe so. >> Right. And the community, the Apache community as a whole so many different projects that for example, in streaming, there is Kafka, >> Okay. >> and there is others that address a core set of common requirements but in different ways, >> Exactly. >> supporting different approaches, for example, they are doing streaming with stateless transactions and so forth, or stateless semantics and so forth. Seems to me that HortonWorks is shifting towards being more of a streaming oriented vendor away from data at rest. Though, I should say HDP3.0 has got great scalability and storage efficiency capabilities baked in. I wonder if you could just break it down a little bit what the innovations or enhancements are in HDP3.0 for those of your core customers, which is most of them who are managing massive multi-terabyte, multi-petabyte distributed, federated, big data lakes. What's in HDP3.0 for them? >> Oh for lots. Again, like I said, we obviously spend a lot of time on the streaming side because that's where we see. We live in a real-time world. But again, we don't do it at the cost of our core business which continues to be HDP. And as you can see, the community trend is drive, we talked about continuization massive step up for the Hadoob Community. We've also added support for GPUs. Again, if you think about Trove's at scale machine learning. >> Graphing processing units, >> Graphical-- >> AI, deep learning >> Yeah, it's huge. Deep learning, intensive flow and so on, really, really need a custom, sort of GPU, if you will. So that's coming. That's an HDP3. We've added a whole bunch of scalability improvements with HDFS. We've added federation because now we can go from, you can go over a billion files a billion objects in HDFS. We also added capabilities for-- >> But you indicated yesterday when we were talking that very few of your customers need that capacity yet but you think they will so-- >> Oh for sure. Again, part of this is as we enable more source of data in real-time that's the fuel which drives and that was always the strategy behind the HDF product. It was about, can we leverage the synergies between the real-time world, feed that into what you do today, in your classic enterprise with data at rest and that is what is driving the necessity for scale. >> Yes. >> Right. We've done that. We spend a lot of work, again, loading the total cost of ownership the TCO so we added erasure coding. >> What is that exactly? >> Yeah, so erasure coding is a classic sort of storage concept which allows you to actually in sort of, you know HTFS has always been three replicas So for redundancy, fault tolerance and recovery. Now, it sounds okay having three replicas because it's cheap disk, right. But when you start to think about our customers running 70, 80 hundred terabytes of data those three replicas add up because you've now gone from 80 terabytes of effective data where actually two 1/4 of an exobyte in terms of raw storage. So now what we can do with erasure coding is actually instead of storing the three blocks we actually store parody. We store the encoding of it which means we can actually go down from three to like two, one and a half, whatever we want to do. So, if we can get from three blocks to one and a half especially for your core data, >> Yeah >> the ones you're not accessing every day. It results in a massive savings in terms of your infrastructure costs. And that's kind of what we're in the business doing, helping customers do better with the data they have whether it's on-prem or on the cloud, that's sort of we want to help customers be comfortable getting more data under management along with secured and the lower TCO. The other sort of big piece I'm really excited about HDP3 is all the work that's happened to Hive Community for what we call the real-time database. >> Yes. >> As you guys know, you follow the whole sequel of ours in the Doob Space. >> And hive has changed a lot in the last several years, this is very different from what it was five years ago. >> The only thing that's same from five years ago is the name (laughing) >> So again, the community has done a phenomenal job, kind of, really taking sort of a, we used to call it like a sequel engine on HDFS. From there, to drive it with 3.0, it's now like, with Hive 3 which is part of HDP3 it's a full fledged database. It's got full asset support. In fact, the asset support is so good that writing asset tables is at least as fast as writing non-asset tables now. And you can do that not only on-- >> Transactional database. >> Exactly. Now not only can you do it on prem, you can do it on S3. So you can actually drive the transactions through Hive on S3. We've done a lot of work to actually, you were there yesterday when we were talking about some of the performance work we've done with LAP and so on to actually give consistent performance both on-prem and the cloud and this is a lot of effort simply because the performance characteristics you get from the storage layer with HDFS versus S3 are significantly different. So now we have been able to bridge those with things with LAP. We've done a lot of work and sort of enhanced the security model around it, governance and security. So now you get things like account level, masking, row-level filtering, all the standard stuff that you would expect and more from an Enprise air house. We talked to a lot of our customers, they're doing, literally tens of thousands of views because they don't have the capabilities that exist in Hive now. >> Mmm-hmm 6 And I'm sitting here kind of being amazed that for an open source set of tools to have the best security and governance at this point is pretty amazing coming from where we started off. >> And it's absolutely essential for GDPR compliance and compliance HIPA and every other mandate and sensitivity that requires you to protect personally identifiable information, so very important. So in many ways HortonWorks has one of the premier big data catalogs for all manner of compliance requirements that your customers are chasing. >> Yeah, and James, you wrote about it in the contex6t of data storage studio which we introduced >> Yes. >> You know, things like consent management, having--- >> A consent portal >> A consent portal >> In which the customer can indicate the degree to which >> Exactly. >> they require controls over their management of their PII possibly to be forgotten and so forth. >> Yeah, it's going to be forgotten, it's consent even for analytics. Within the context of GDPR, you have to allow the customer to opt out of analytics, them being part of an analytic itself, right. >> Yeah. >> So things like those are now something we enable to the enhanced security models that are done in Ranger. So now, it's sort of the really cool part of what we've done now with GDPR is that we can get all these capabilities on existing data an existing applications by just adding a security policy, not rewriting It's a massive, massive, massive deal which I cannot tell you how much customers are excited about because they now understand. They were sort of freaking out that I have to go to 30, 40, 50 thousand enterprise apps6 and change them to take advantage, to actually provide consent, and try to be forgotten. The fact that you can do that now by changing a security policy with Ranger is huge for them. >> Arun, thank you so much for coming on theCUBE. It's always so much fun talking to you. >> Likewise. Thank you so much. >> I learned something every time I listen to you. >> Indeed, indeed. I'm Rebecca Knight for James Kobeilus, we will have more from theCUBE's live coverage of DataWorks just after this. (Techno music)

Published Date : Jun 19 2018

SUMMARY :

brought to you by Hortonworks. It's great to have you on Yeah, likewise. is part of the strategy but it really needs to fit and that's the business we are in. And hide the details of all of the underlying infrastructure for a bunch of the operational stuff, So the fabric is everything you're describing, in terms of the applications for deployment to edge nodes. So, if you could give us a quick sense for Until the key ones, like you said, are containerization Define MiNiFi before you go further. Yeah, so we've always had NiFi-- and that gives you all the capabilities of NiFi the processing you do, and so on. and how so much of it is driven by the open source community that the community is going to out innovate any one vendor And the community, the Apache community as a whole I wonder if you could just break it down a little bit And as you can see, the community trend is drive, because now we can go from, you can go over a billion files the real-time world, feed that into what you do today, loading the total cost of ownership the TCO sort of storage concept which allows you to actually is all the work that's happened to Hive Community in the Doob Space. And hive has changed a lot in the last several years, And you can do that not only on-- the performance characteristics you get to have the best security and governance at this point and sensitivity that requires you to protect possibly to be forgotten and so forth. Within the context of GDPR, you have to allow The fact that you can do that now Arun, thank you so much for coming on theCUBE. Thank you so much. we will have more from theCUBE's live coverage of DataWorks

ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
James	PERSON	0.99+
Aaron Murphy	PERSON	0.99+
Arun Murphy	PERSON	0.99+
Arun	PERSON	0.99+
2011	DATE	0.99+
Google	ORGANIZATION	0.99+
5%	QUANTITY	0.99+
80 terabytes	QUANTITY	0.99+
FedEx	ORGANIZATION	0.99+
two	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
Hortonworks	ORGANIZATION	0.99+
San Jose	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
Arun Murthy	PERSON	0.99+
HortonWorks	ORGANIZATION	0.99+
yesterday	DATE	0.99+
San Jose, California	LOCATION	0.99+
three replicas	QUANTITY	0.99+
James Kobeilus	PERSON	0.99+
three blocks	QUANTITY	0.99+
GDPR	TITLE	0.99+
Python	TITLE	0.99+
Europe	LOCATION	0.99+
millions of dollars	QUANTITY	0.99+
Scala	TITLE	0.99+
Spark	TITLE	0.99+
theCUBE	ORGANIZATION	0.99+
five years ago	DATE	0.99+
one and a half	QUANTITY	0.98+
Enprise	ORGANIZATION	0.98+
three	QUANTITY	0.98+
Hive 3	TITLE	0.98+
Three years ago	DATE	0.98+
both	QUANTITY	0.98+
Asia	LOCATION	0.97+
50 thousand	QUANTITY	0.97+
TCO	ORGANIZATION	0.97+
MiNiFi	TITLE	0.97+
Apache	ORGANIZATION	0.97+
40	QUANTITY	0.97+
Altas	ORGANIZATION	0.97+
Hortonworks DataPlane Services	ORGANIZATION	0.96+
DataWorks Summit 2018	EVENT	0.96+
30	QUANTITY	0.95+
thousands of nodes	QUANTITY	0.95+
A6	COMMERCIAL_ITEM	0.95+
Kerberos	ORGANIZATION	0.95+
today	DATE	0.95+
Knox	ORGANIZATION	0.94+
one	QUANTITY	0.94+
hive	TITLE	0.94+
two data scientists	QUANTITY	0.94+
each	QUANTITY	0.92+
Chinese	OTHER	0.92+
TensorFlow	TITLE	0.92+
S3	TITLE	0.91+
October of last year	DATE	0.91+
Ranger	ORGANIZATION	0.91+
Hadoob	ORGANIZATION	0.91+
HIPA	TITLE	0.9+
CUBE	ORGANIZATION	0.9+
tens of thousands	QUANTITY	0.9+
one vendor	QUANTITY	0.89+
last several years	DATE	0.88+
a billion objects	QUANTITY	0.86+
70, 80 hundred terabytes of data	QUANTITY	0.86+
HTP3.0	TITLE	0.86+
two 1/4 of an exobyte	QUANTITY	0.86+
Atlas and	ORGANIZATION	0.85+
DataPlane Services	ORGANIZATION	0.84+
Google Cloud	TITLE	0.82+

Joel Horwitz, IBM | IBM CDO Summit Sping 2018

(techno music) >> Announcer: Live, from downtown San Francisco, it's theCUBE. Covering IBM Chief Data Officer Strategy Summit 2018. Brought to you by IBM. >> Welcome back to San Francisco everybody, this is theCUBE, the leader in live tech coverage. We're here at the Parc 55 in San Francisco covering the IBM CDO Strategy Summit. I'm here with Joel Horwitz who's the Vice President of Digital Partnerships & Offerings at IBM. Good to see you again Joel. >> Thanks, great to be here, thanks for having me. >> So I was just, you're very welcome- It was just, let's see, was it last month, at Think? >> Yeah, it's hard to keep track, right. >> And we were talking about your new role- >> It's been a busy year. >> the importance of partnerships. One of the things I want to, well let's talk about your role, but I really want to get into, it's innovation. And we talked about this at Think, because it's so critical, in my opinion anyway, that you can attract partnerships, innovation partnerships, startups, established companies, et cetera. >> Joel: Yeah. >> To really help drive that innovation, it takes a team of people, IBM can't do it on its own. >> Yeah, I mean look, IBM is the leader in innovation, as we all know. We're the market leader for patents, that we put out each year, and how you get that technology in the hands of the real innovators, the developers, the longtail ISVs, our partners out there, that's the challenging part at times, and so what we've been up to is really looking at how we make it easier for partners to partner with IBM. How we make it easier for developers to work with IBM. So we have a number of areas that we've been adding, so for example, we've added a whole IBM Code portal, so if you go to developer.ibm.com/code you can actually see hundreds of code patterns that we've created to help really any client, any partner, get started using IBM's technology, and to innovate. >> Yeah, and that's critical, I mean you're right, because to me innovation is a combination of invention, which is what you guys do really, and then it's adoption, which is what your customers are all about. You come from the data science world. We're here at the Chief Data Officer Summit, what's the intersection between data science and CDOs? What are you seeing there? >> Yeah, so when I was here last, it was about two years ago in 2015, actually, maybe three years ago, man, time flies when you're having fun. >> Dave: Yeah, the Spark Summit- >> Yeah Spark Technology Center and the Spark Summit, and we were here, I was here at the Chief Data Officer Summit. And it was great, and at that time, I think a lot of the conversation was really not that different than what I'm seeing today. Which is, how do you manage all of your data assets? I think a big part of doing good data science, which is my kind of background, is really having a good understanding of what your data governance is, what your data catalog is, so, you know we introduced the Watson Studio at Think, and actually, what's nice about that, is it brings a lot of this together. So if you look in the market, in the data market, today, you know we used to segment it by a few things, like data gravity, data movement, data science, and data governance. And those are kind of the four themes that I continue to see. And so outside of IBM, I would contend that those are relatively separate kind of tools that are disconnected, in fact Dinesh Nirmal, who's our engineer on the analytic side, Head of Development there, he wrote a great blog just recently, about how you can have some great machine learning, you have some great data, but if you can't operationalize that, then really you can't put it to use. And so it's funny to me because we've been focused on this challenge, and IBM is making the right steps, in my, I'm obviously biased, but we're making some great strides toward unifying the, this tool chain. Which is data management, to data science, to operationalizing, you know, machine learning. So that's what we're starting to see with Watson Studio. >> Well, I always push Dinesh on this and like okay, you've got a collection of tools, but are you bringing those together? And he flat-out says no, we developed this, a lot of this from scratch. Yes, we bring in the best of the knowledge that we have there, but we're not trying to just cobble together a bunch of disparate tools with a UI layer. >> Right, right. >> It's really a fundamental foundation that you're trying to build. >> Well, what's really interesting about that, that piece, is that yeah, I think a lot of folks have cobbled together a UI layer, so we formed a partnership, coming back to the partnership view, with a company called Lightbend, who's based here in San Francisco, as well as in Europe, and the reason why we did that, wasn't just because of the fact that Reactive development, if you're not familiar with Reactive, it's essentially Scala, Akka, Play, this whole framework, that basically allows developers to write once, and it kind of scales up with demand. In fact, Verizon actually used our platform with Lightbend to launch the iPhone 10. And they show dramatic improvements. Now what's exciting about Lightbend, is the fact that application developers are developing with Reactive, but if you turn around, you'll also now be able to operationalize models with Reactive as well. Because it's basically a single platform to move between these two worlds. So what we've continued to see is data science kind of separate from the application world. Really kind of, AI and cloud as different universes. The reality is that for any enterprise, or any company, to really innovate, you have to find a way to bring those two worlds together, to get the most use out of it. >> Fourier always says "Data is the new development kit". He said this I think five or six years ago, and it's barely becoming true. You guys have tried to make an attempt, and have done a pretty good job, of trying to bring those worlds together in a single platform, what do you call it? The Watson Data Platform? >> Yeah, Watson Data Platform, now Watson Studio, and I think the other, so one side of it is, us trying to, not really trying, but us actually bringing together these disparate systems. I mean we are kind of a systems company, we're IT. But not only that, but bringing our trained algorithms, and our trained models to the developers. So for example, we also did a partnership with Unity, at the end of last year, that's now just reaching some pretty good growth, in terms of bringing the Watson SDK to game developers on the Unity platform. So again, it's this idea of bringing the game developer, the application developer, in closer contact with these trained models, and these trained algorithms. And that's where you're seeing incredible things happen. So for example, Star Trek Bridge Crew, which I don't know how many Trekkies we have here at the CDO Summit. >> A few over here probably. >> Yeah, a couple? They're using our SDK in Unity, to basically allow a gamer to use voice commands through the headset, through a VR headset, to talk to other players in the virtual game. So we're going to see more, I can't really disclose too much what we're doing there, but there's some cool stuff coming out of that partnership. >> Real immersive experience driving a lot of data. Now you're part of the Digital Business Group. I like the term digital business, because we talk about it all the time. Digital business, what's the difference between a digital business and a business? What's the, how they use data. >> Joel: Yeah. >> You're a data person, what does that mean? That you're part of the Digital Business Group? Is that an internal facing thing? An external facing thing? Both? >> It's really both. So our Chief Digital Officer, Bob Lord, he has a presentation that he'll give, where he starts out, and he goes, when I tell people I'm the Chief Digital Officer they usually think I just manage the website. You know, if I tell people I'm a Chief Data Officer, it means I manage our data, in governance over here. The reality is that I think these Chief Digital Officer, Chief Data Officer, they're really responsible for business transformation. And so, if you actually look at what we're doing, I think on both sides is we're using data, we're using marketing technology, martech, like Optimizely, like Segment, like some of these great partners of ours, to really look at how we can quickly A/B test, get user feedback, to look at how we actually test different offerings and market. And so really what we're doing is we're setting up a testing platform, to bring not only our traditional offers to market, like DB2, Mainframe, et cetera, but also bring new offers to market, like blockchain, and quantum, and others, and actually figure out how we get better product-market fit. What actually, one thing, one story that comes to mind, is if you've seen the movie Hidden Figures- >> Oh yeah. >> There's this scene where Kevin Costner, I know this is going to look not great for IBM, but I'm going to say it anyways, which is Kevin Costner has like a sledgehammer, and he's like trying to break down the wall to get the mainframe in the room. That's what it feels like sometimes, 'cause we create the best technology, but we forget sometimes about the last mile. You know like, we got to break down the wall. >> Where am I going to put it? >> You know, to get it in the room! So, honestly I think that's a lot of what we're doing. We're bridging that last mile, between these different audiences. So between developers, between ISVs, between commercial buyers. Like how do we actually make this technology, not just accessible to large enterprise, which are our main clients, but also to the other ecosystems, and other audiences out there. >> Well so that's interesting Joel, because as a potential partner of IBM, they want, obviously your go-to-market, your massive company, and great distribution channel. But at the same time, you want more than that. You know you want to have a closer, IBM always focuses on partnerships that have intrinsic value. So you talked about offerings, you talked about quantum, blockchain, off-camera talking about cloud containers. >> Joel: Yeah. >> I'd say cloud and containers may be a little closer than those others, but those others are going to take a lot of market development. So what are the offerings that you guys are bringing? How do they get into the hands of your partners? >> I mean, the commonality with all of these, all the emerging offerings, if you ask me, is the distributed nature of the offering. So if you look at blockchain, it's a distributed ledger. It's a distributed transaction chain that's secure. If you look at data, really and we can hark back to say, Hadoop, right before object storage, it's distributed storage, so it's not just storing on your hard drive locally, it's storing on a distributed network of servers that are all over the world and data centers. If you look at cloud, and containers, what you're really doing is not running your application on an individual server that can go down. You're using containers because you want to distribute that application over a large network of servers, so that if one server goes down, you're not going to be hosed. And so I think the fundamental shift that you're seeing is this distributed nature, which in essence is cloud. So I think cloud is just kind of a synonym, in my opinion, for distributed nature of our business. >> That's interesting and that brings up, you're right, cloud and Big Data/Hadoop, we don't talk about Hadoop much anymore, but it kind of got it all started, with that notion of leave the data where it is. And it's the same thing with cloud. You can't just stuff your business into the public cloud. You got to bring the cloud to your data. >> Joel: That's right. >> But that brings up a whole new set of challenges, which obviously, you're in a position just to help solve. Performance, latency, physics come into play. >> Physics is a rough one. It's kind of hard to avoid that one. >> I hear your best people are working on it though. Some other partnerships that you want to sort of, elucidate. >> Yeah, no, I mean we have some really great, so I think the key kind of partnership, I would say area, that I would allude to is, one of the things, and you kind of referenced this, is a lot of our partners, big or small, want to work with our top clients. So they want to work with our top banking clients. They want, 'cause these are, if you look at for example, MaRisk and what we're doing with them around blockchain, and frankly, talk about innovation, they're innovating containers for real, not virtual containers- >> And that's a joint venture right? >> Yeah, it is, and so it's exciting because, what we're bringing to market is, I also lead our startup programs, called the Global Entrepreneurship Program, and so what I'm focused on doing, and you'll probably see more to come this quarter, is how do we actually bridge that end-to-end? How do you, if you're startup or a small business, ultimately reach that kind of global business partner level? And so kind of bridging that, that end-to-end. So we're starting to bring out a number of different incentives for partners, like co-marketing, so I'll help startups when they're early, figure out product-market fit. We'll give you free credits to use our innovative technology, and we'll also bring you into a number of clients, to basically help you not burn all of your cash on creating your own marketing channel. God knows I did that when I was at a start-up. So I think we're doing a lot to kind of bridge that end-to-end, and help any partner kind of come in, and then grow with IBM. I think that's where we're headed. >> I think that's a critical part of your job. Because I mean, obviously IBM is known for its Global 2000, big enterprise presence, but startups, again, fuel that innovation fire. So being able to attract them, which you're proving you can, providing whatever it is, access, early access to cloud services, or like you say, these other offerings that you're producing, in addition to that go-to-market, 'cause it's funny, we always talk about how efficient, capital efficient, software is, but then you have these companies raising hundreds of millions of dollars, why? Because they got to do promotion, marketing, sales, you know, go-to-market. >> Yeah, it's really expensive. I mean, you look at most startups, like their biggest ticket item is usually marketing and sales. And building channels, and so yeah, if you're, you know we're talking to a number of partners who want to work with us because of the fact that, it's not just like, the direct kind of channel, it's also, as you kind of mentioned, there's other challenges that you have to overcome when you're working with a larger company. for example, security is a big one, GDPR compliance now, is a big one, and just making sure that things don't fall over, is a big one. And so a lot of partners work with us because ultimately, a number of the decision makers in these larger enterprises are going, well, I trust IBM, and if IBM says you're good, then I believe you. And so that's where we're kind of starting to pull partners in, and pull an ecosystem towards us. Because of the fact that we can take them through that level of certification. So we have a number of free online courses. So if you go to partners, excuse me, ibm.com/partners/learn there's a number of blockchain courses that you can learn today, and will actually give you a digital certificate, that's actually certified on our own blockchain, which we're actually a first of a kind to do that, which I think is pretty slick, and it's accredited at some of the universities. So I think that's where people are looking to IBM, and other leaders in this industry, is to help them become experts in their, in this technology, and especially in this emerging technology. >> I love that blockchain actually, because it's such a growing, and interesting, and innovative field. But it needs players like IBM, that can bring credibility, enterprise-grade, whether it's security, or just, as I say, credibility. 'Cause you know, this is, so much of negative connotations associated with blockchain and crypto, but companies like IBM coming to the table, enterprise companies, and building that ecosystem out is in my view, crucial. >> Yeah, no, it takes a village. I mean, there's a lot of folks, I mean that's a big reason why I came to IBM, three, four years ago, was because when I was in start-up land, I used to work for H20, I worked for Alpine Data Labs, Datameer, back in the Hadoop days, and what I realized was that, it's an opportunity cost. So you can't really drive true global innovation, transformation, in some of these bigger companies because there's only so much that you can really kind of bite off. And so you know at IBM it's been a really rewarding experience because we have done things like for example, we partnered with Girls Who Code, Treehouse, Udacity. So there's a number of early educators that we've partnered with, to bring code to, to bring technology to, that frankly, would never have access to some of this stuff. Some of this technology, if we didn't form these alliances, and if we didn't join these partnerships. So I'm very excited about the future of IBM, and I'm very excited about the future of what our partners are doing with IBM, because, geez, you know the cloud, and everything that we're doing to make this accessible, is bar none, I mean, it's great. >> I can tell you're excited. You know, spring in your step. Always a lot of energy Joel, really appreciate you coming onto theCUBE. >> Joel: My pleasure. >> Great to see you again. >> Yeah, thanks Dave. >> You're welcome. Alright keep it right there, everybody. We'll be back. We're at the IBM CDO Strategy Summit in San Francisco. You're watching theCUBE. (techno music) (touch-tone phone beeps)

Published Date : May 2 2018

SUMMARY :

Brought to you by IBM. Good to see you again Joel. that you can attract partnerships, To really help drive that innovation, and how you get that technology Yeah, and that's critical, I mean you're right, Yeah, so when I was here last, to operationalizing, you know, machine learning. that we have there, but we're not trying that you're trying to build. to really innovate, you have to find a way in a single platform, what do you call it? So for example, we also did a partnership with Unity, to basically allow a gamer to use voice commands I like the term digital business, to look at how we actually test different I know this is going to look not great for IBM, but also to the other ecosystems, But at the same time, you want more than that. So what are the offerings that you guys are bringing? So if you look at blockchain, it's a distributed ledger. You got to bring the cloud to your data. But that brings up a whole new set of challenges, It's kind of hard to avoid that one. Some other partnerships that you want to sort of, elucidate. and you kind of referenced this, to basically help you not burn all of your cash early access to cloud services, or like you say, that you can learn today, but companies like IBM coming to the table, that you can really kind of bite off. really appreciate you coming onto theCUBE. We're at the IBM CDO Strategy Summit in San Francisco.

ENTITIES

Entity	Category	Confidence
Joel	PERSON	0.99+
Joel Horwitz	PERSON	0.99+
Europe	LOCATION	0.99+
IBM	ORGANIZATION	0.99+
Kevin Costner	PERSON	0.99+
Dave	PERSON	0.99+
Dinesh Nirmal	PERSON	0.99+
Alpine Data Labs	ORGANIZATION	0.99+
Lightbend	ORGANIZATION	0.99+
Verizon	ORGANIZATION	0.99+
San Francisco	LOCATION	0.99+
Hidden Figures	TITLE	0.99+
Bob Lord	PERSON	0.99+
Both	QUANTITY	0.99+
MaRisk	ORGANIZATION	0.99+
both	QUANTITY	0.99+
iPhone 10	COMMERCIAL_ITEM	0.99+
2015	DATE	0.99+
Datameer	ORGANIZATION	0.99+
both sides	QUANTITY	0.99+
one story	QUANTITY	0.99+
Think	ORGANIZATION	0.99+
five	DATE	0.99+
hundreds	QUANTITY	0.99+
Treehouse	ORGANIZATION	0.99+
three years ago	DATE	0.99+
developer.ibm.com/code	OTHER	0.99+
Unity	ORGANIZATION	0.98+
two worlds	QUANTITY	0.98+
Reactive	ORGANIZATION	0.98+
GDPR	TITLE	0.98+
one side	QUANTITY	0.98+
Digital Business Group	ORGANIZATION	0.98+
today	DATE	0.98+
Udacity	ORGANIZATION	0.98+
ibm.com/partners/learn	OTHER	0.98+
last month	DATE	0.98+
Watson Studio	ORGANIZATION	0.98+
each year	QUANTITY	0.97+
three	DATE	0.97+
single platform	QUANTITY	0.97+
Girls Who Code	ORGANIZATION	0.97+
Parc 55	LOCATION	0.97+
one thing	QUANTITY	0.97+
four themes	QUANTITY	0.97+
Spark Technology Center	ORGANIZATION	0.97+
six years ago	DATE	0.97+
H20	ORGANIZATION	0.97+
four years ago	DATE	0.97+
martech	ORGANIZATION	0.97+
Unity	TITLE	0.96+
hundreds of millions of dollars	QUANTITY	0.94+
Watson Studio	TITLE	0.94+
Dinesh	PERSON	0.93+
one server	QUANTITY	0.93+

Piotr Mierzejewski, IBM | Dataworks Summit EU 2018

>> Announcer: From Berlin, Germany, it's theCUBE covering Dataworks Summit Europe 2018 brought to you by Hortonworks. (upbeat music) >> Well hello, I'm James Kobielus and welcome to theCUBE. We are here at Dataworks Summit 2018, in Berlin, Germany. It's a great event, Hortonworks is the host, they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski, your focus is on data science machine learning and data science experience which is one of the IBM Products for working data scientists to build and to train models in team data science enterprise operational environments, so Piotr, welcome to theCUBE. I don't think we've had you before. >> Thank you. >> You're a program manager. I'd like you to discuss what you do for IBM, I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers, data scientists and others in those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now. >> Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and I would like to begin, before I get actually to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning about data. And second, machine learning is not easy. Especially, especially-- >> James: I never said it was! (Piotr laughs) >> Well there is this kind of perception, like you can have a data scientist working on their Mac, working on some machine learning algorithms and they can create a recommendation engine, let's say in a two, three days' time. This is because of the explosion of open-source in that space. You have thousands of libraries, from Python, from R, from Scala, you have access to Spark. All these various open-source offerings that are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, the data access, actual model deployments, which are not trivial. When you have to expose this in a uniform fashion to actually various business units. Now all this has to actually work in a private cloud, public clouds environment, on a variety of hardware, a variety of different operating systems. Now that is not trivial. (laughs) Now when you deploy a model, as the data scientist is going to deploy the model, he needs to be able to actually explain how the model was created. He has to be able to explain what the data was used. He needs to ensure-- >> Explicable AI, or explicable machine learning, yeah, that's a hot focus of our concern, of enterprises everywhere, especially in a world where governance and tracking and lineage GDPR and so forth, so hot. >> Yes, you've mentioned all the right things. Now, so given those two things, there's no ML web data, and ML is not easy, why the partnership between Hortonworks and IBM makes sense, well, you're looking at the number one industry leading big data plot from Hortonworks. Then, you look at a DSX local, which, I'm proud to say, I've been there since the first line of code, and I'm feeling very passionate about the product, is the merger between the two, ability to integrate them tightly together gives your data scientists secure access to data, ability to leverage the spark that runs inside a Hortonworks cluster, ability to actually work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, ability to actually work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that not only DSX and Hortonworks integrated to the point that you can actually manage your Hadoop clusters, Hadoop environments within a DSX, you can actually work on your Python models and your analytics within DSX and then push it remotely to be executed where your data is. Now, why is this important? If you work with the data that's megabytes, gigabytes, maybe you know you can pull it in, but in truly what you want to do when you move to the terabytes and the petabytes of data, what happens is that you actually have to push the analytics to where your data resides, and leverage for example YARN, a resource manager, to distribute your workloads and actually train your models on your actually HDP cluster. That's one of the huge volume propositions. Now, mind you to say this is all done in a secure fashion, with ability to actually install DSX on the edge notes of the HDP clusters. >> James: Hmm... >> As of HDP 264, DSX has been certified to actually work with HDP. Now, this partnership embarked, we embarked on this partnership about 10 months ago. Now, often happens that there is announcements, but there is not much materializing after such announcement. This is not true in case of DSX and HDP. We have had, just recently we have had a release of the DSX 1.2 which I'm super excited about. Now, let's talk about those open-source toolings in the various platforms. Now, you don't want to force your data scientists to actually work with just one environment. Some of them might prefer to work on Spark, some of them like their RStudio, they're statisticians, they like R, others like Python, with Zeppelin, say Jupyter Notebook. Now, how about Tensorflow? What are you going to do when actually, you know, you have to do the deep learning workloads, when you want to use neural nets? Well, DSX does support ability to actually bring in GPU notes and do the Tensorflow training. As a sidecar approach, you can append the note, you can scale the platform horizontally and vertically, and train your deep learning workloads, and actually remove the sidecar out. So you should put it towards the cluster and remove it at will. Now, DSX also actually not only satisfies the needs of your programmer data scientists, that actually code in Python and Scala or R, but actually allows your business analysts to work and create models in a visual fashion. As of DSX 1.2, you can actually, we have embedded, integrated, an SPSS modeler, redesigned, rebranded, this is an amazing technology from IBM that's been on for a while, very well established, but now with the new interface, embedded inside a DSX platform, allows your business analysts to actually train and create the model in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper to enable the business analysts of the world to build and do data-mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely. (laughs) >> But I see it's the same phenomenon, you bring the same capability to greatly expand the range of data professionals who can do, in this case, do machine learning hopefully as well as professional, dedicated data scientists. >> Certainly, now what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization. From executive, that actually gives you the business use case to your data engineers that actually understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, they actually manage your relational databases, because we have to realize that not all the data is in the datalinks yet, you have legacy systems, which DSX allows you to actually connect to and integrate to get data from. It also allows you to actually consume data from streaming sources, so if you actually have a Kafka message cob and actually were streaming data from your applications or IoT devices, you can actually integrate all those various data sources and federate them within the DSX to use for machine training models. Now, this is all around predictive analytics. But what if I tell you that right now with the DSX you can actually do prescriptive analytics as well? With the 1.2, again I'm going to be coming back to this 1.2 DSX with the most recent release we have actually added decision optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis. So now if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in let's say, a utility company, whether it be for power or could it be for, let's say for water, you can actually create and train prescriptive models within DSX and deploy them the same fashion as you will deploy and manage your SPSS streams as well as the machine learning models from Spark, from Python, so with XGBoost, Tensorflow, Keras, all those various aspects. >> James: Mmmhmm. >> Now what's going to get really exciting in the next two months, DSX will actually bring in natural learning language processing and text analysis and sentiment analysis by Vio X. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative message platform, extendable! Extendable collaborative platform that can actually install and run in your data centers without the need to access internet. That's actually critical. Yes, we can deploy an IWS. Yes we can deploy an Azure. On Google Cloud, definitely we can deploy in Softlayer and we're very good at that, however in the majority of cases we find that the customers have challenges for bringing the data out to the cloud environments. Hence, with DSX, we designed it to actually deploy and run and scale everywhere. Now, how we have done it, we've embraced open source. This was a huge shift within IBM to realize that yes we do have 350,000 employees, yes we could develop container technologies, but why? Why not embrace what is actually industry standards with the Docker and equivalent as they became industry standards? Bring in RStudio, the Jupyter, the Zeppelin Notebooks, bring in the ability for a data scientist to choose the environments they want to work with and actually extend them and make the deployments of web services, applications, the models, and those are actually full releases, I'm not only talking about the model, I'm talking about the scripts that can go with that ability to actually pull the data in and allow the models to be re-trained, evaluated and actually re-deployed without taking them down. Now that's what actually becomes, that's what is the true differentiator when it comes to DSX, and all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no. It's my job as the host to let you know that. >> Of course. (laughs) >> So if you could summarize where DSX is going in 30 seconds or less as a product, the next version is, what is it? >> It's going to be the 1.2.1. >> James: Okay. >> 1.2.1 and we're expecting to release at the end of June. What's going to be unique in the 1.2.1 is infusing the text and sentiment analysis, so natural language processing with predictive and prescriptive analysis for both developers and your business analysts. >> James: Yes. >> So essentially a platform not only for your data scientist but pretty much every single persona inside the organization >> Including your marketing professionals who are baking sentiment analysis into what they do. Thank you very much. This has been Piotr Mierzejewski of IBM. He's a Program Manager for DSX and for ML, AI, and data science solutions and of course a strong partnership is with Hortonworks. We're here at Dataworks Summit in Berlin. We've had two excellent days of conversations with industry experts including Piotr. We want to thank everyone, we want to thank the host of this event, Hortonworks for having us here. We want to thank all of our guests, all these experts, for sharing their time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations, the breakouts have been great, the whole buzz here is exciting. GDPR's coming down and everybody's gearing up and getting ready for that, but everybody's also focused on innovative and disruptive uses of AI and machine learning and business, and using tools like DSX. I'm James Kobielus for the entire CUBE team, SiliconANGLE Media, wishing you all, wherever you are, whenever you watch this, have a good day and thank you for watching theCUBE. (upbeat music)

Published Date : Apr 19 2018

SUMMARY :

brought to you by Hortonworks. and to train models in team data science and how you and Hortonworks are serving your customers, Thank you for inviting me here, very excited. from Python, from R, from Scala, you have access to Spark. GDPR and so forth, so hot. that doesn't limit you to just one kind of technology Very good, now that part I'm going to get into very shortly, and then push it remotely to be executed where your data is. Now, you don't want to force your data scientists of the world to build and do data-mining (laughs) you bring the same capability the business use case to your data engineers James: They manage the Hadoop clusters, With the 1.2, again I'm going to be coming back to this as you will deploy and manage your SPSS streams in the next two months, DSX will actually bring in and allow the models to be re-trained, evaluated It's my job as the host to let you know that. (laughs) is infusing the text and sentiment analysis, and of course a strong partnership is with Hortonworks.

ENTITIES

Entity	Category	Confidence
Piotr Mierzejewski	PERSON	0.99+
James Kobielus	PERSON	0.99+
James	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Piotr	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
30 seconds	QUANTITY	0.99+
Berlin	LOCATION	0.99+
IWS	ORGANIZATION	0.99+
Python	TITLE	0.99+
Spark	TITLE	0.99+
two	QUANTITY	0.99+
First	QUANTITY	0.99+
Scala	TITLE	0.99+
Berlin, Germany	LOCATION	0.99+
350,000 employees	QUANTITY	0.99+
DSX	ORGANIZATION	0.99+
Mac	COMMERCIAL_ITEM	0.99+
two things	QUANTITY	0.99+
RStudio	TITLE	0.99+
DSX	TITLE	0.99+
DSX 1.2	TITLE	0.98+
both developers	QUANTITY	0.98+
second	QUANTITY	0.98+
GDPR	TITLE	0.98+
Watson Explorer	TITLE	0.98+
Dataworks Summit 2018	EVENT	0.98+
first line	QUANTITY	0.98+
Dataworks Summit Europe 2018	EVENT	0.98+
SiliconANGLE Media	ORGANIZATION	0.97+
end of June	DATE	0.97+
TensorFlow	TITLE	0.97+
thousands of libraries	QUANTITY	0.96+
R	TITLE	0.96+
Jupyter	ORGANIZATION	0.96+
1.2.1	OTHER	0.96+
two excellent days	QUANTITY	0.95+
Dataworks Summit	EVENT	0.94+
Dataworks Summit EU 2018	EVENT	0.94+
SPSS	TITLE	0.94+
one	QUANTITY	0.94+
Azure	TITLE	0.92+
one kind	QUANTITY	0.92+
theCUBE	ORGANIZATION	0.92+
HDP	ORGANIZATION	0.91+

Greg Benson, SnapLogic | Flink Forward 2018

>> Announcer: Live from San Francisco, it's theCUBE covering Flink Forward brought to you by Data Artisans. >> Hi this is George Gilbert. We are at Flink Forward on the ground in San Francisco. This is the user conference for the Apache Flink Community. It's the second one in the US and this is sponsored by Data Artisans. We have with us Greg Benson, who's Chief Scientist at Snap Logic and also professor of computer science at University of San Francisco. >> Yeah that's great, thanks for havin' me. >> Good to have you. So, Greg, tell us a little bit about how Snap Logic currently sets up its, well how it builds its current technology to connect different applications. And then talk about, a little bit, where you're headed and what you're trying to do. >> Sure, sure, so Snap Logic is a data and app integration Cloud platform. We provide a graphical interface that lets you drag and drop. You can open components that we call Snaps and you kind of put them together like Lego pieces to define relatively sophisticated tasks so that you don't have to write Java code. We use machine learning to help you build out these pipelines quickly so we can anticipate based on your data sources, what you are going to need next, and that lends itself to rapid building of these pipelines. We have a couple of different ways to execute these pipelines. You can think of it as sort of this specification of what the pipeline's supposed to do. We have a proprietary engine that we can execute on single notes, either in the Cloud or behind your firewall in your data center. We also have a mode which can translate these pipelines into Spark code and then execute those pipelines at scale. So, you can do sort of small, low latency processing to sort of larger, batch processing on very large data sets. >> Okay, and so you were telling me before that you're evaluating Flink or doing research with Flink as another option. Tell us what use cases that would address that the first two don't. >> Yeah, good question. I'd love to just back up a little bit. So, because I have this dual role of Chief Scientist and as a professor of Computer Science, I'm able to get graduate students to work on research projects for credit, and then eventually as interns at SnapLogic. A recent project that we've been working on since we started last fall so working on about six or seven months now is investigating Flink as a possible new back end for the SnapLogic platform. So this allows us to you know, to explore and prototype and just sort of figure out if there's going to be a good match between an emerging technology and our platform. So, to go back to your question. What would this address? Well, so, without going into too much of the technical differences between Flink and Spark which I imagine has come up in some of your conversations or it comes up here because they can solve similar use cases our experience with Flink is the code base is easy to work with both from taking our specification of pipelines and then converting them into Flink code that can run. But there's another benefit that we see from Flink and that is, whenever any product, whether it's our product or anybody else's product, that uses something like Spark or Flink as a back end, there's this challenge because you're converting something that your users understand into this target, right, this Spark API code or Flink API code. And the challenge there is if something goes wrong, how do you propagate that back to the users so the user doesn't have to read log files or get into the nuts and bolts of how Spark really works. >> It's almost like you've compiled the code, and now if something doesn't work right, you need to work at the source level. >> That's exactly right, and that's what we don't want our users to do, right? >> Right. >> So one promising thing about Flink is that we're able to integrate the code base in such a way that we have a better understanding of what's happening in the failure conditions that occur. And we're working on ways to propagate those back to the user so they can take actionable steps to remedy those without having to understand the Flink API code iself. >> And what is it, then, about Flink or its API that gives you that feedback about errors or you know, operational status that gives you better visibility than you would get in something else like Spark. >> Yeah, so without getting too too deep on the subject, what we have found is, one thing nice about the Flink code base is the core is written in Scala, but there's a lot of, all the IO and memory handling is written in Java and that's where we need to do our primary interfacing and the building blocks, sort of the core building blocks to get to, for example, something that you build with a dataset API to execution. We have found it easier to follow the transformation steps that Flink takes to end up with the resulting sort of optimized, optimized Flink pipeline. Now by understanding that transformation, like you were saying, the compilation step, by understanding it, then we can work backwards, and understand how, when something happens, how to trace it back to what the user was originally trying to specify. >> The GUI specification. >> Yeah. Right. >> So, help me understand though it sounds like you're the one essentially building a compiler from a graphical specification language down to Spark as the, you know, sort of, pseudo, you know, psuedo compile code, >> Yep. >> Or Flink. And, but if you're the one doing that compilation, I'm still struggling to understand why you would have better reverse engineering capabilities with one. >> It just is a matter of getting visibility into the steps that the underlying frameworks are taking and so, I'm not saying this is impossible to do in Spark, but we have found that we've had, it's been easier for us to get into the transformation steps that Flink is taking. >> Almost like, for someone who's had as much programming as a one semester in night school, like a variable and specter that's already there, >> Yeah, that's a good, there you go, yeah, yeah, yeah. >> Okay, so you don't have to go try and you can't actually add it, and you don't have to then infer it from all this log data. >> Now, I should add, there's another potential Flink. You were asking about use cases and what does Flink address. As you know, Flink is a streaming platform, in addition to being a batch platform, and Flink does streaming differently than how Spark does. Spark takes a microbatch approach. What we're also looking at in my research effort is how to take advantage of Flink's streaming approach to allow the SnapLogic GUI to be used to specify streaming Flink applications. Initially we're just focused on the batch mode but now we're also looking at the potential to convert these graphical pipelines into streaming Flink applications, which would be a great benefit to customers who want-- >> George: Real time integration. >> Want to do what Alibaba and all the other companies are doing but take advantage of it without having to get to the nuts and bolts of the programming. Do it through the GUI. >> Wow, so it's almost like, it's like, Flink, Beam, in terms of obstruction layers, >> Sure. >> And then SnapLogic. >> Greg: Sure, yes. >> Not that you would compile the beam, but the idea that you would have perv and processing and a real-time pipeline. >> Yes. >> Okay. So that's actually interesting, so that would open up a whole new set of capabilities. >> Yeah and, you know, it follows our you know, company's vision in allowing lots of users to do very sophisticated things without being, you know, Hadoop developers or Spark developers, or even Flink developers, we do a lot of the hard work of trying to give you a representation that's easier to work with, right but, also allow you to sort of evolve that and de-bug it and also eventually get the performance out of these systems One of the challenges of course of Spark and Flink is that they have to be tuned, and you have to, and so what we're trying to do is, using some of our machine learning, is eventually gather information that can help us identify how to tune different types of work flows in different environments. And that, if we're able to do that in it's entirety, then we, you know, we take out a lot of the really hard work that goes into making a lot of these streaming applications both scalable and performing. >> Performimg. So this would be, but you would have, to do that, you would probably have to collect well, what's the term? I guess data from the operations of many customers, >> Right. >> Because, as training data, just as the developer alone, you won't really have enough. >> Absolutely, and that's, so that you have to bootstrap that. For our machine learning that we currently use today, we leverage, you know, the thousands of pipelines, the trillions of documents that we now process on a monthly basis, and that allows us to provide good recommendations when you're building pipelines, because we have a lot of information. >> Oh, so you are serving the runtime, these runtime compilations. >> Yes. >> Oh, they're not all hosted on the customer premises. >> Oh, no no no, we do both. So it's interesting, we do both. So you can, you can deploy completely in the cloud, we're a complete SASS provider for you. Most of our customers though, you know, Banks Healthcare, want to run our engine behind their firewalls. Even when we do that though, we still have metadata that we can get introspection, sort of anonymized, but we can get introspection into how things are behaving. >> Okay. That's very interesting. Alright, Greg we're going to have to end it on that note, but uh you know, I guess everyone stay tuned. That sounds like a big step forward in sort of specification of real time pipelines at a graphical level. >> Yeah, well, it's, I hope to be talking to you again soon with more results. >> Looking forward to it. With that, this is George Gilbert, we are at Flink Forward, the user conference for the Apache Flink conference, sorry for the Apache Flink user community, sponsored by Data Artisans, we will be back shortly. (upbeat music)

Published Date : Apr 11 2018

SUMMARY :

brought to you by Data Artisans. We are at Flink Forward on the ground in San Francisco. and what you're trying to do. so that you don't have to write Java code. Okay, and so you were telling me before So this allows us to you know, to explore and prototype you need to work at the source level. so they can take actionable steps to remedy those that gives you that feedback something that you build with a dataset API to execution. you would have better and so, I'm not saying this is impossible to do in Spark, and you don't have to then infer it from all this log data. As you know, Flink is a streaming platform, Want to do what Alibaba and all the other companies the idea that you would have perv and processing so that would open up a whole new is that they have to be tuned, and you have to, So this would be, but you would have, to do that, just as the developer alone, you won't really have enough. we leverage, you know, the thousands of pipelines, Oh, so you are serving the runtime, Most of our customers though, you know, Banks Healthcare, you know, I guess everyone stay tuned. Yeah, well, it's, I hope to be talking to you again soon Looking forward to it.

ENTITIES

Entity	Category	Confidence
Greg Benson	PERSON	0.99+
George Gilbert	PERSON	0.99+
Greg	PERSON	0.99+
US	LOCATION	0.99+
Alibaba	ORGANIZATION	0.99+
Java	TITLE	0.99+
San Francisco	LOCATION	0.99+
Snap Logic	ORGANIZATION	0.99+
George	PERSON	0.99+
Data Artisans	ORGANIZATION	0.99+
both	QUANTITY	0.99+
Spark	TITLE	0.99+
Scala	TITLE	0.99+
Flink	ORGANIZATION	0.99+
thousands	QUANTITY	0.99+
Banks Healthcare	ORGANIZATION	0.99+
second one	QUANTITY	0.99+
Lego	ORGANIZATION	0.99+
last fall	DATE	0.98+
one semester	QUANTITY	0.98+
SnapLogic	ORGANIZATION	0.98+
SnapLogic	TITLE	0.97+
first two	QUANTITY	0.97+
today	DATE	0.97+
Flink	TITLE	0.96+
single notes	QUANTITY	0.96+
about six	QUANTITY	0.96+
trillions of documents	QUANTITY	0.95+
Flink Forward	ORGANIZATION	0.95+
seven months	QUANTITY	0.94+
One	QUANTITY	0.94+
University of San Francisco	ORGANIZATION	0.93+
one	QUANTITY	0.92+
one thing	QUANTITY	0.91+
Apache Flink Community	ORGANIZATION	0.89+
Spark	ORGANIZATION	0.85+
Apache Flink	ORGANIZATION	0.82+
Flink Forward	EVENT	0.82+
2018	DATE	0.81+
pipelines	QUANTITY	0.81+
Flink	EVENT	0.76+
SASS	ORGANIZATION	0.73+
dual	QUANTITY	0.72+
Logic	TITLE	0.72+
Apache	ORGANIZATION	0.57+
Hadoop	ORGANIZATION	0.54+
thing	QUANTITY	0.54+
Beam	TITLE	0.51+

David Abercrombie, Sharethrough & Michael Nixon, Snowflake | Big Data SV 2018

>> Narrator: Live from San Jose, it's theCUBE. Presenting Big Data, Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hi, I'm George Gilbert, and we are broadcasting from the Strata Data Conference, we're right around the corner at the Forager Tasting Room & Eatery. We have this wonderful location here, and we are very lucky to have with us Michael Nixon, from Snowflake, which is a leading cloud data warehouse. And David Abercrombie from Sharethrough which is a leading ad tech company. And between the two of them, they're going to tell us some of the most advance these cases we have now for cloud-native data warehousing. Michael, why don't you start with giving us some context for how on a cloud platform one might rethink a data warehouse? >> Yeah, thank you. That's a great question because let me first answer it from the end-user, business value perspective, when you run a workload on a cloud, there's a certain level of expectation you want out of the cloud. You want scalability, you want unlimited scalability, you want to be able to support all your users, you want to be able to support the data types, whatever they may be that comes in into your organization. So, there's a level of expectation that one should expect from a service point of view once you're in a cloud. So, a lot of the technology that were built up to this point have been optimized for on-premises types of data warehousing where perhaps that level of service and currency and unlimited scalability was not really expected but, guess what? Once it comes to the cloud, it's expected. So those on-premises technologies aren't suitable in the cloud, so for enterprises and, I mean, companies, organizations of all types from finance, banking, manufacturing, ad tech as we'll have today, they want that level of service in the cloud. And so, those technologies will not work, and so it requires a rethinking of how those architectures are built. And it requires being built for the cloud. >> And just to, alright, to break this down and be really concrete, some of the rethinking. We separate compute from storage, which is a familiar pattern that we've learned in the cloud but we also then have to have this sort of independent elasticity between-- >> Yes. Storage and the compute, and then Snowflake's taken it even a step further where you can spin out multiple compute clusters. >> Right. >> Tell us how that works and why that's so difficult and unique. >> Yeah, you know, that's taking us under the covers a little bit, but what makes our infrastructure unique is that we have a three-layer architecture. We separate, just as you said, storage from the compute layer, from the services layer. And that's really important because as I mentioned before, you want unlimited capacity, unlimited resources. So, if you scale, compute, and today's world on on-premises MPP, what that really means is that you have to bring the storage along with the compute because compute is tied to the storage so when you scale the storage along with the compute, usually that involves a lot of burden on the data warehouse manager because now they have to redistribute the data and that means redistributing keys, managing keys if you will. And that's a burden, and by the reverse, if all you wanted to do was increase storage but not the compute, because compute was tied to storage. Why you have to buy these additional compute notes, and that might add to the cost when, in fact, all you really wanted to pay for was for additional storage? So, by separating those, you keep them independent, and so you can scale storage apart from compute and then, once you have your compute resources in place, the virtual warehouses that you're talking about that have completed the job, you spun them up, it's done its job, and you take it down, guess what? You can release those resources, and of course, in releasing those resources, basically you can cut your cost as well because, for us, it's pure usage-based pricing. You only pay for what you use, and that's really fantastic. >> Very different from the on-prem model where, as you were saying, tied compute and storage together, so. >> Yeah, let's think about what that means architecturally, right? So if you have an on-premises data warehouse, and you want to scale your capacity, chances are you'll have to have that hardware in place already. And having that hardware in place already means you're paying that expense and, so you may pay for that expense six months prior to need it. Let's take a retailer example. >> Yeah. >> You're gearing up for a peak season, which might be Christmas, and so you put that hardware in place sometime in June, you'll always put it in advanced because why? You have to bring up the environment, so you have to allow time for implementation or, if you will, deployment to make sure everything is operational. >> Okay. >> And then what happens is when that peak period comes, you can't expand in that capacity. But what happens once that peak period is over? You paid for that hardware, but you don't really need it. So, our vision is, or the vision we believe you should have when you move workloads to the cloud is, you pay for those when you need them. >> Okay, so now, David, help us understand, first, what was the business problem you were trying to solve? And why was Snowflake, you know, sort of uniquely suited for that? >> Well, let me talk a little bit about Sharethrough. We're ad tech, at the core of our business we run an ad exchange, where we're doing programmatic training with the bids, with the real-time bidding spec. The data is very high in volume, with 12 billion impressions a month, that's a lot of bids that we have to process, a lot of bid requests. The way it operates, the bids and the bid responses and programmatic training are encoded in JSONs, so our ad exchange is basically exchanging messages in JSON with our business partners. And the JSONs are very complicated, there's a lot of richness and detail, such that the advertisers can decide whether or not they want to bid. Well, this data is very complicated, very high-volume. And advertising, like any business, we really need to have good analytics to understand how our business is operating, how our publishers are doing, how our advertisers are doing. And it all depends upon this very high-volume, very complex JSON event data stream. So, Snowflake was able to ingest our high-volume data very gracefully. The JSON parsing techniques of Snowflake allow me to expose the complicated data structure in a way that's very transparent and usable to our analysts. Our use of Snowflake has replaced clunkier tools where the analysts basically had to be programmers, writing programs in Scala or something to do in analysis. And now, because we've transparently and easily exposed the complicated structures within Snowflake in a relational database, they can use good old-fashioned SQL to run their queries, literally, afternoon analysis is now a five-minute query. >> So, let me, as I'm listening to you describe this. We've had various vendors telling us about these workflows in the sort of data prep and data science tool change. It almost sounds to me like Snowflake is taking semi-structured or complex data and it's sort of unraveling it and normalizing is kind of an overloaded term but it's making it business-ready, so you don't need as much of that manual data prep. >> Yeah, exactly, you don't need as much manual data prep, or you don't need as much expertise. For instance, Snowflake's JSON capabilities, in terms of drilling down the JSON tree with dot path notation, or expanding nested objects is very expressive, very powerful, but still your typical analyst or your BI tool certainly wouldn't know how to do that. So, in Snowflake, we sort of have our cake and eat it too. We can have our JSONs with their full richness in our database, but yet we can simplify and expose the data elements that are needed for analysis, so that an analyst, their first day on the job, they can get right to work and start writing queries. >> So let me ask you about, a little more about the programmatic ad use case. So if you have billions of impressions per month, I'm guessing that means you have quite a few times more, in terms of bids, and then there's the, you know once you have, I guess a successful one, you want to track what happens. >> Correct. >> So tell us a little more about that, what that workload looks like, in terms of, what analytics you're trying to perform, what's your tracking? >> Yeah, well, you're right. There's different steps in our funnel. The impression request expands out by a factor of a dozen as we send it to all the different potential bidders. We track all that data, the responses come back, we track that, we track our decisions and why we selected the bidder. And then, once the ad is shown, of course there's various beacons and tracking things that fire. We'd have to track all of that data, and the only way we could make sense out of our business is by bringing all that data together. And in a way that is reliable, transparent, and visible, and also has data integrity, that's another thing I like about the Snowflake database is that it's a good old-fashioned SQL database that I can declare my primary keys, I can run QC checks, I can ensure high data integrity that is demanded by BI and other sorts of analytics. >> What would be, as you continue to push the boundaries of the ad tech service, what's some functionality that you're looking to add, and Snowflake as your partner, either that's in there now that you still need to take advantage of or things that you're looking to in the future? >> Well, moving forward, of course, we, it's very important for us to be able to quickly gauge the effectiveness of new products. The ad tech market is fast-changing, there's always new ways of bidding, new products that are being developed, new ways for the ad ecosystem to work. And so, as we roll those out, we need to be able to quickly analyze, you know, "Is this thing working or not?" You know, kind of an agile environment, pivot or prove it. Does this feature work or not? So, having all the data in one place makes that possible for that very quick assessment of the viability of a new feature, new product. >> And, dropping down a little under the covers for how that works, does that mean, like you still have the base JSON data that you've absorbed, but you're going to expose it with different schemas or access patterns? >> Yeah, indeed. For instance, we make use of the SQL schemas, roles, and permissions internally where we can have the different teams have their own domain of data that they can expose internally, and looking forward, there's the share house feature of Snowflake that we're looking to implement with our partners, where, rather than sending them data, like a daily dump of data, we can give them access to their data in our database through this top layer that Michael mentioned, the service layer, essentially allows me to create a view grant select onto another customer. So I no longer have to send daily data dumps to partners or have some sort of API for getting data. They can simply query the data themselves so we'll be implementing that feature with our major partners. >> I would be remiss in not asking at a data conference like this, now that there's the tie-in with CuBOL and Spark Integration and Machine Learning, is there anything along that front that you're planning to exploit in the near future? >> Well, yeah, Sharethrough, we're very experimental, playful, we're always examining new data technologies and new ways of doing things but now with Snowflake as sort of our data warehouse of curated data. I've got two petabytes of referential integrity data, and that is reliable. We can move forward into our other analyses and other uses of data knowing that we have captured every event exactly once, and we know exactly where it fits in a business context, in a relational manner. It's clean, good data integrity, reliable, accessible, visible, and it's just plain old SQL. (chuckles) >> That's actually a nice way to sum it up. We've got the integrity that we've come to expect and love from relational databases. We've got the flexibility of machine-oriented data, or JSON. But we don't have to give up the query engine, and then now you have more advanced features, analytic features that you can take advantage of coming down the pipe. >> Yeah, again we're a modern platform for the modern age, that's basically cloud-based computing. With a platform like Snowflake in the backend, you can now move those workloads that you're accustomed to to the cloud and have in the environment that you're familiar with, and it saves you a lot of time and effort. You can focus on more strategic projects. >> Okay, well, with that, we're going to take a short break. This has been George Gilbert, we're with Michael Nixon of Snowflake, and David Abercrombie of Sharethrough listening to how the most modern ad tech companies are taking advantage of the most modern cloud data warehouses. And we'll be back after a short break here at the Strata Data Conference, thanks. (quirky music)

Published Date : Mar 9 2018

SUMMARY :

Brought to you by SiliconANGLE Media some of the most advance these cases we have now a certain level of expectation you want out of the cloud. concrete, some of the rethinking. Storage and the compute, and then Snowflake's taken it and unique. that have completed the job, you spun them up, Very different from the on-prem model where, as you and you want to scale your capacity, chances are You have to bring up the environment, so you have to allow You paid for that hardware, but you don't really need it. of richness and detail, such that the advertisers can So, let me, as I'm listening to you describe this. of drilling down the JSON tree with dot path notation, I'm guessing that means you have quite a few times more, I like about the Snowflake database analyze, you know, "Is this thing working or not?" the service layer, essentially allows me to create and that is reliable. and then now you have more you can now move those workloads that you're accustomed to at the Strata Data Conference, thanks.

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
George Gilbert	PERSON	0.99+
David Abercrombie	PERSON	0.99+
Michael Nixon	PERSON	0.99+
Michael	PERSON	0.99+
June	DATE	0.99+
two	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
San Jose	LOCATION	0.99+
Scala	TITLE	0.99+
first	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
five-minute	QUANTITY	0.99+
Snowflake	TITLE	0.99+
Christmas	EVENT	0.98+
Strata Data Conference	EVENT	0.98+
three-layer	QUANTITY	0.98+
first day	QUANTITY	0.98+
a dozen	QUANTITY	0.98+
two petabytes	QUANTITY	0.97+
Sharethrough	ORGANIZATION	0.97+
JSON	TITLE	0.97+
SQL	TITLE	0.96+
one place	QUANTITY	0.95+
six months	QUANTITY	0.94+
Forager Tasting Room & Eatery	ORGANIZATION	0.91+
today	DATE	0.89+
Snowflake	ORGANIZATION	0.87+
Spark	TITLE	0.87+
12 billion impressions a month	QUANTITY	0.87+
Machine Learning	TITLE	0.84+
Big Data	ORGANIZATION	0.84+
billions of impressions	QUANTITY	0.8+
CuBOL	TITLE	0.79+
Big Data SV 2018	EVENT	0.77+
once	QUANTITY	0.72+
theCUBE	ORGANIZATION	0.63+
JSONs	TITLE	0.61+
times	QUANTITY	0.55+

Rob Thomas, IBM | Machine Learning Everywhere 2018

>> Announcer: Live from New York, it's theCUBE, covering Machine Learning Everywhere: Build Your Ladder to AI, brought to you by IBM. >> Welcome back to New York City. theCUBE continue our coverage here at IBM's event, Machine Learning Everywhere: Build Your Ladder to AI. And with us now is Rob Thomas, who is the vice president of, or general manager, rather, of IBM analytics. Sorry about that, Rob. Good to have you with us this morning. Good to see you, sir. >> Great to see you John. Dave, great to see you as well. >> Great to see you. >> Well let's just talk about the event first. Great lineup of guests. We're looking forward to visiting with several of them here on theCUBE today. But let's talk about, first off, general theme with what you're trying to communicate and where you sit in terms of that ladder to success in the AI world. >> So, maybe start by stepping back to, we saw you guys a few times last year. Once in Munich, I recall, another one in New York, and the theme of both of those events was, data science renaissance. We started to see data science picking up steam in organizations. We also talked about machine learning. The great news is that, in that timeframe, machine learning has really become a real thing in terms of actually being implemented into organizations, and changing how companies run. And that's what today is about, is basically showcasing a bunch of examples, not only from our clients, but also from within IBM, how we're using machine learning to run our own business. And the thing I always remind clients when I talk to them is, machine learning is not going to replace managers, but I think machine learning, managers that use machine learning will replace managers that do not. And what you see today is a bunch of examples of how that's true because it gives you superpowers. If you've automated a lot of the insight, data collection, decision making, it makes you a more powerful manager, and that's going to change a lot of enterprises. >> It seems like a no-brainer, right? I mean, or a must-have. >> I think there's a, there's always that, sometimes there's a fear factor. There is a culture piece that holds people back. We're trying to make it really simple in terms of how we talk about the day, and the examples that we show, to get people comfortable, to kind of take a step onto that ladder back to the company. >> It's conceptually a no-brainer, but it's a challenge. You wrote a blog and it was really interesting. It was, one of the clients said to you, "I'm so glad I'm not in the technology industry." And you went, "Uh, hello?" (laughs) "I've got news for you, you are in the technology industry." So a lot of customers that I talk to feel like, meh, you know, in our industry, it's really not getting disrupted. That's kind of taxis and retail. We're in banking and, you know, but, digital is disrupting every industry and every industry is going to have to adopt ML, AI, whatever you want to call it. Can traditional companies close that gap? What's your take? >> I think they can, but, I'll go back to the word I used before, it starts with culture. Am I accepting that I'm a technology company, even if traditionally I've made tractors, as an example? Or if traditionally I've just been you know, selling shirts and shoes, have I embraced the role, my role as a technology company? Because if you set that culture from the top, everything else flows from there. It can't be, IT is something that we do on the side. It has to be a culture of, it's fundamental to what we do as a company. There was an MIT study that said, data-driven cultures drive productivity gains of six to 10 percent better than their competition. You can't, that stuff compounds, too. So if your competitors are doing that and you're not, not only do you fall behind in the short term but you fall woefully behind in the medium term. And so, I think companies are starting to get there but it takes a constant push to get them focused on that. >> So if you're a tractor company, you've got human expertise around making tractors and messaging and marketing tractors, and then, and data is kind of there, sort of a bolt-on, because everybody's got to be data-driven, but if you look at the top companies by market cap, you know, we were talking about it earlier. Data is foundational. It's at their core, so, that seems to me to be the hard part, Rob, I'd like you to comment in terms of that cultural shift. How do you go from sort of data in silos and, you know, not having cloud economics and, that are fundamental, to having that dynamic, and how does IBM help? >> You know, I think, to give companies credit, I think most organizations have developed some type of data practice or discipline over the last, call it five years. But most of that's historical, meaning, yeah, we'll take snapshots of history. We'll use that to guide decision making. You fast-forward to what we're talking about today, just so we're on the same page, machine learning is about, you build a model, you train a model with data, and then as new data flows in, your model is constantly updating. So your ability to make decisions improves over time. That's very different from, we're doing historical reporting on data. And so I think it's encouraging that companies have kind of embraced that data discipline in the last five years, but what we're talking about today is a big next step and what we're trying to break it down to what I call the building blocks, so, back to the point on an AI ladder, what I mean by an AI ladder is, you can't do AI without machine learning. You can't do machine learning without analytics. You can't do analytics without the right data architecture. So those become the building blocks of how you get towards a future of AI. And so what I encourage companies is, if you're not ready for that AI leading edge use case, that's okay, but you can be preparing for that future now. That's what the building blocks are about. >> You know, I think we're, I know we're ahead of, you know, Jeremiah Owyang on a little bit later, but I was reading something that he had written about gut and instinct, from the C-Suite, and how, that's how companies were run, right? You had your CEO, your president, they made decisions based on their guts or their instincts. And now, you've got this whole new objective tool out there that's gold, and it's kind of taking some of the gut and instinct out of it, in a way, and maybe there are people who still can't quite grasp that, that maybe their guts and their instincts, you know, what their gut tells them, you know, is one thing, but there's pretty objective data that might indicate something else. >> Moneyball for business. >> A little bit of a clash, I mean, is there a little bit of a clash in that respect? >> I think you'd be surprise by how much decision making is still pure opinion. I mean, I see that everywhere. But we're heading more towards what you described for sure. One of the clients talking here today, AMC Networks, think it's a great example of a company that you wouldn't think of as a technology company, primarily a content producer, they make great shows, but they've kind of gone that extra step to say, we can integrate data sources from third parties, our own data about viewer habits, we can do that to change our relationship with advertisers. Like, that's a company that's really embraced this idea of being a technology company, and you can see it in their results, and so, results are not coincidence in this world anymore. It's about a practice applied to data, leveraging machine learning, on a path towards AI. If companies are doing that, they're going to be successful. >> And we're going to have the tally from AMC on, but so there's a situation where they have embraced it, that they've dealt with that culture, and data has become foundational. Now, I'm interested as to what their journey look like. What are you seeing with clients? How they break this down, the silos of data that have been built up over decades. >> I think, so they get almost like a maturity curve. You've got, and the rule I talk about is 40-40-20, where 40% of organizations are really using data just to optimize costs right now. That's okay, but that's on the lower end of the maturity curve. 40% are saying, all right, I'm starting to get into data science. I'm starting to think about how I extend to new products, new services, using data. And then 20% are on the leading edge. And that's where I'd put AMC Networks, by the way, because they've done unique things with integrating data sets and building models so that they've automated a lot of what used to be painstakingly long processes, internal processes to do it. So you've got this 40-40-20 of organizations in terms of their maturity on this. If you're not on that curve right now, you have a problem. But I'd say most are somewhere on that curve. If you're in the first 40% and you're, right now data for you is just about optimizing cost, you're going to be behind. If you're not right now, you're going to be behind in the next year, that's a problem. So I'd kind of encourage people to think about what it takes to be in the next 40%. Ultimately you want to be in the 20% that's actually leading this transformation. >> So change it to 40-20-40. That's where you want it to go, right? You want to flip that paradigm. >> I want to ask you a question. You've done a lot of M and A in the past. You spent a lot of time in Silicon Valley and Silicon Valley obviously very, very disruptive, you know, cultures and organizations and it's always been a sort of technology disruption. It seems like there's a ... another disruption going on, not just horizontal technologies, you know, cloud or mobile or social, whatever it is, but within industries. Some industries, as we've been talking, radically disrupted. Retail, taxis, certainly advertising, et cetera et cetera. Some have not yet, the client that you talked to. Do you see, technology companies generally, Silicon Valley companies specifically, as being able to pull off a sort of disruption of not only technologies but also industries and where does IBM play there? You've made a sort of, Ginni in particular has made a deal about, hey, we're not going to compete with our customers. So talking about this sort of dual disruption agenda, one on the technology side, one within industries that Apple's getting into financial services and, you know, Amazon getting into grocery, what's your take on that and where does IBM fit in that world? >> So, I mean, IBM has been in Silicon Valley for a long time, I would say probably longer than 99.9% of the companies in Silicon Valley, so, we've got a big lab there. We do a lot of innovation out of there. So love it, I mean, the culture of the valley is great for the world because it's all about being the challenger, it's about innovation, and that's tremendous. >> No fear. >> Yeah, absolutely. So, look, we work with a lot of different partners, some who are, you know, purely based in the valley. I think they challenge us. We can learn from them, and that's great. I think the one, the one misnomer that I see right now, is there's a undertone that innovation is happening in Silicon Valley and only in Silicon Valley. And I think that's a myth. Give you an example, we just, in December, we released something called Event Store which is basically our stab at reinventing the database business that's been pretty much the same for the last 30 to 40 years. And we're now ingesting millions of rows of data a second. We're doing it in a Parquet format using a Spark engine. Like, this is an amazing innovation that will change how any type of IOT use case can manage data. Now ... people don't think of IBM when they think about innovations like that because it's not the only thing we talk about. We don't have, the IBM website isn't dedicated to that single product because IBM is a much bigger company than that. But we're innovating like crazy. A lot of that is out of what we're doing in Silicon Valley and our labs around the world and so, I'm very optimistic on what we're doing in terms of innovation. >> Yeah, in fact, I think, rephrase my question. I was, you know, you're right. I mean people think of IBM as getting disrupted. I wasn't posing it, I think of you as a disruptor. I know that may sound weird to some people but in the sense that you guys made some huge bets with things like Watson on solving some of the biggest, world's problems. And so I see you as disrupting sort of, maybe yourselves. Okay, frame that. But I don't see IBM as saying, okay, we are going to now disrupt healthcare, disrupt financial services, rather we are going to help our, like some of your comp... I don't know if you'd call them competitors. Amazon, as they say, getting into content and buying grocery, you know, food stores. You guys seems to have a different philosophy. That's what I'm trying to get to is, we're going to disrupt ourselves, okay, fine. But we're not going to go hard into healthcare, hard into financial services, other than selling technology and services to those organizations, does that make sense? >> Yeah, I mean, look, our mission is to make our clients ... better at what they do. That's our mission, we want to be essential in terms of their journey to be successful in their industry. So frankly, I love it every time I see an announcement about Amazon entering another vertical space, because all of those companies just became my clients. Because they're not going to work with Amazon when they're competing with them head to head, day in, day out, so I love that. So us working with these companies to make them better through things like Watson Health, what we're doing in healthcare, it's about making companies who have built their business in healthcare, more effective at how they perform, how they drive results, revenue, ROI for their investors. That's what we do, that's what IBM has always done. >> Yeah, so it's an interesting discussion. I mean, I tend to agree. I think Silicon Valley maybe should focus on those technology disruptions. I think that they'll have a hard time pulling off that dual disruption and maybe if you broadly define Silicon Valley as Seattle and so forth, but, but it seems like that formula has worked for decades, and will continue to work. Other thoughts on sort of the progression of ML, how it gets into organizations. You know, where you see this going, again, I was saying earlier, the parlance is changing. Big data is kind of, you know, mm. Okay, Hadoop, well, that's fine. We seem to be entering this new world that's pervasive, it's embedded, it's intelligent, it's autonomous, it's self-healing, it's all these things that, you know, we aspire to. We're now back in the early innings. We're late innings of big data, that's kind of ... But early innings of this new era, what are your thoughts on that? >> You know, I'd say the biggest restriction right now I see, we talked before about somehow, sometimes companies don't have the desire, so we have to help create the desire, create the culture to go do this. Even for the companies that have a burning desire, the issue quickly becomes a skill gap. And so we're doing a lot to try to help bridge that skill gap. Let's take data science as an example. There's two worlds of data science that I would describe. There's clickers, and there's coders. Clickers want to do drag and drop. They will use traditional tools like SPSS, which we're modernizing, that's great. We want to support them if that's how they want to work and build models and deploy models. There's also this world of coders. This is people that want to do all their data science in ML, and Python, and Scala, and R, like, that's what they want to do. And so we're supporting them through things like Data Science Experience, which is built on Apache Jupiter. It's all open source tooling, it'd designed for coders. The reason I think that's important, it goes back to the point on skill sets. There is a skill gap in most companies. So if you walk in and you say, this is the only way to do this thing, you kind of excluded half the companies because they say, I can't play in that world. So we are intentionally going after a strategy that says, there's a segmentation in skill types. In places there's a gap, we can help you fill that gap. That's how we're thinking about them. >> And who does that bode well for? If you say that you were trying to close a gap, does that bode well for, we talked about the Millennial crowd coming in and so they, you know, do they have a different approach or different mental outlook on this, or is it to the mid-range employee, you know, who is open minded, I mean, but, who is the net sweet spot, you think, that say, oh, this is a great opportunity right now? >> So just take data science as an example. The clicker coder comment I made, I would put the clicker audience as mostly people that are 20 years into their career. They've been around a while. The coder audience is all the Millennials. It's all the new audience. I think the greatest beneficiary is the people that find themselves kind of stuck in the middle, which is they're kind of interested in this ... >> That straddle both sides of the line yeah? >> But they've got the skill set and the desire to do some of the new tooling and new approaches. So I think this kind of creates an opportunity for that group in the middle to say, you know, what am I going to adopt as a platform for how I go forward and how I provide leadership in my company? >> So your advice, then, as you're talking to your clients, I mean you're also talking to their workforce. In a sense, then, your advice to them is, you know, join, jump in the wave, right? You've got your, you can't straddle, you've got to go. >> And you've got to experiment, you've got to try things. Ultimately, organizations are going to gravitate to things that they like using in terms of an approach or a methodology or a tool. But that comes with experimentation, so people need to get out there and try something. >> Maybe we could talk about developers a little bit. We were talking to Dinesh earlier and you guys of course have focused on data scientists, data engineers, obviously developers. And Dinesh was saying, look, many, if not most, of the 10 million Java developers out there, they're not, like, focused around the data. That's really the data scientist's job. But then, my colleague John Furrier says, hey, data is the new development kit. You know, somebody said recently, you know, Andreessen's comment, "software is eating the world." Well, data is eating software. So if Furrier is right and that comment is right, it seems like developers increasingly have to become more data aware, fundamentally. Blockchain developers clearly are more data focused. What's your take on the developer community, where they fit into this whole AI, machine learning space? >> I was just in Las Vegas yesterday and I did a session with a bunch of our business partners. ISVs, so software companies, mostly a developer audience, and the discussion I had with them was around, you're doing, you're building great products, you're building great applications. But your product is only as good as the data and the intelligence that you embed in your product. Because you're still putting too much of a burden on the user, as opposed to having everything happen magically, if you will. So that discussion was around, how do you embed data, embed AI, into your products and do that at the forefront versus, you deliver a product and the client has to say, all right, now I need to get my data out of this application and move it somewhere else so I can do the data science that I want to do. That's what I see happening with developers. It's kind of ... getting them to think about data as opposed to just thinking about the application development framework, because that's where most of them tend to focus. >> Mm, right. >> Well, we've talked about, well, earlier on about the governance, so just curious, with Madhu, which I'll, we'll have that interview in just a little bit here. I'm kind of curious about your take on that, is that it's a little kinder, gentler, friendlier than maybe some might look at it nowadays because of some organization that it causes, within your group and some value that's being derived from that, that more efficiency, more contextual information that's, you know, more relevant, whatever. When you talk to your clients about meeting rules, regs, GDPR, all these things, how do you get them to see that it's not a black veil of doom and gloom but it really is, really more of an opportunity for them to cash in? >> You know, my favorite question to ask when I go visit clients is I say, I say, just show of hands, how many people have all the data they need to do their job? To date, nobody has ever raised their hand. >> Not too many hands up. >> The reason I phrased it that way is, that's fundamentally a governance challenge. And so, when you think about governance, I think everybody immediately thinks about compliance, GDPR, types of things you mentioned, and that's great. But there's two use cases for governance. One is compliance, the other one is self service analytics. Because if you've done data governance, then you can make your data available to everybody in the organization because you know you've got the right rules, the right permissions set up. That will change how people do their jobs and I think sometimes governance gets painted into a compliance corner, when organizations need to think about it as, this is about making data accessible to my entire workforce. That's a big change. I don't think anybody has that today. Except for the clients that we're working with, where I think we've made good strides in that. >> What's your sort of number one, two, and three, or pick one, advice for those companies that as you blogged about, don't realize yet that they're in the software business and the technology business? For them to close the ... machine intelligence, machine learning, AI gap, where should they start? >> I do think it can be basic steps. And the reason I say that is, if you go to a company that hasn't really viewed themselves as a technology company, and you start talking about machine intelligence, AI, like, everybody like, runs away scared, like it's not interesting. So I bring it back to building blocks. For a client to be great in data, and to become a technology company, you really need three platforms for how you think about data. You need a platform for how you manage your data, so think of it as data management. You need a platform for unified governance and integration, and you need a platform for data science and business analytics. And to some extent, I don't care where you start, but you've got to start with one of those. And if you do that, you know, you'll start to create a flywheel of momentum where you'll get some small successes. Then you can go in the other area, and so I just encourage everybody, start down that path. Pick one of the three. Or you may already have something going in one of them, so then pick one where you don't have something going. Just start down the path, because, those building blocks, once you have those in place, you'll be able to scale AI and ML in the future in your organization. But without that, you're going to always be limited to kind of a use case at a time. >> Yeah, and I would add, this is, you talked about it a couple times today, is that cultural aspect, that realization that in order to be data driven, you know, buzzword, you have to embrace that and drive that through the culture. Right? >> That starts at the top, right? Which is, it's not, you know, it's not normal to have a culture of, we're going to experiment, we're going to try things, half of them may not work. And so, it starts at the top in terms of how you set the tone and set that culture. >> IBM Think, we're less than a month away. CUBE is going to be there, very excited about that. First time that you guys have done Think. You've consolidated all your big, big events. What can we expect from you guys? >> I think it's going to be an amazing show. To your point, we thought about this for a while, consolidating to a single IBM event. There's no question just based on the response and the enrollment we have so far, that was the right answer. We'll have people from all over the world. A bunch of clients, we've got some great announcements that will come out that week. And for clients that are thinking about coming, honestly the best thing about it is all the education and training. We basically build a curriculum, and think of it as a curriculum around, how do we make our clients more effective at competing with the Amazons of the world, back to the other point. And so I think we build a great curriculum and it will be a great week. >> Well, if I've heard anything today, it's about, don't be afraid to dive in at the deep end, just dive, right? Get after it and, looking forward to the rest of the day. Rob, thank you for joining us here and we'll see you in about a month! >> Sounds great. >> Right around the corner. >> All right, Rob Thomas joining us here from IBM Analytics, the GM at IBM Analytics. Back with more here on theCUBE. (upbeat music)

Published Date : Feb 27 2018

SUMMARY :

Build Your Ladder to AI, brought to you by IBM. Good to have you with us this morning. Dave, great to see you as well. and where you sit in terms of that ladder And what you see today is a bunch of examples I mean, or a must-have. onto that ladder back to the company. So a lot of customers that I talk to And so, I think companies are starting to get there to be the hard part, Rob, I'd like you to comment You fast-forward to what we're talking about today, and it's kind of taking some of the gut But we're heading more towards what you described for sure. Now, I'm interested as to what their journey look like. to think about what it takes to be in the next 40%. That's where you want it to go, right? I want to ask you a question. So love it, I mean, the culture of the valley for the last 30 to 40 years. but in the sense that you guys made some huge bets in terms of their journey to be successful Big data is kind of, you know, mm. create the culture to go do this. The coder audience is all the Millennials. for that group in the middle to say, you know, you know, join, jump in the wave, right? so people need to get out there and try something. and you guys of course have focused on data scientists, that you embed in your product. When you talk to your clients about have all the data they need to do their job? And so, when you think about governance, and the technology business? And to some extent, I don't care where you start, that in order to be data driven, you know, buzzword, Which is, it's not, you know, it's not normal CUBE is going to be there, very excited about that. I think it's going to be an amazing show. and we'll see you in about a month! from IBM Analytics, the GM at IBM Analytics.

ENTITIES

Entity	Category	Confidence
Amazon	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
December	DATE	0.99+
Rob Thomas	PERSON	0.99+
New York	LOCATION	0.99+
Dinesh	PERSON	0.99+
AMC Networks	ORGANIZATION	0.99+
John	PERSON	0.99+
Jeremiah Owyang	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
Rob	PERSON	0.99+
20 years	QUANTITY	0.99+
Dave	PERSON	0.99+
Munich	LOCATION	0.99+
IBM Analytics	ORGANIZATION	0.99+
Las Vegas	LOCATION	0.99+
MIT	ORGANIZATION	0.99+
10 million	QUANTITY	0.99+
Apple	ORGANIZATION	0.99+
20%	QUANTITY	0.99+
last year	DATE	0.99+
Furrier	PERSON	0.99+
AMC	ORGANIZATION	0.99+
One	QUANTITY	0.99+
yesterday	DATE	0.99+
six	QUANTITY	0.99+
New York City	LOCATION	0.99+
GDPR	TITLE	0.99+
40%	QUANTITY	0.99+
both	QUANTITY	0.99+
three	QUANTITY	0.99+
one	QUANTITY	0.99+
Seattle	LOCATION	0.99+
Scala	TITLE	0.99+
two use cases	QUANTITY	0.99+
today	DATE	0.99+
Python	TITLE	0.98+
Andreessen	PERSON	0.98+
both sides	QUANTITY	0.98+
two	QUANTITY	0.98+
Watson Health	ORGANIZATION	0.98+
millions of rows	QUANTITY	0.98+
five years	QUANTITY	0.97+
next year	DATE	0.97+
less than a month	QUANTITY	0.97+
Madhu	PERSON	0.97+
Amazons	ORGANIZATION	0.96+

John Thomas, IBM | IBM Data Science For All

(upbeat music) >> Narrator: Live from New York City, it's the Cube, covering IBM Data Science for All. Brought to you by IMB. >> Welcome back to Data Science for All. It's a whole new game here at IBM's event, two-day event going on, 6:00 tonight the big keynote presentation on IBM.com so be sure to join the festivities there. You can watch it live stream, all that's happening. Right now, we're live here on the Cube, along with Dave Vellente, I'm John Walls and we are joined by John Thomas who is a distinguished engineer and director at IBM. John, thank you for your time, good to see you. >> Same here, John. >> Yeah, pleasure, thanks for being with us here. >> John Thomas: Sure. >> I know, in fact, you just wrote this morning about machine learning, so that's obviously very near and dear to you. Let's talk first off about IBM, >> John Thomas: Sure. >> Not a new concept by any means, but what is new with regard to machine learning in your work? >> Yeah, well, that's a good question, John. Actually, I get that question a lot. Machine learning itself is not new, companies have been doing it for decades, so exactly what is new, right? I actually wrote this in a blog today, this morning. It's really three different things, I call them democratizing machine learning, operationalizing machine learning, and hybrid machine learning, right? And we can talk through each of these if you like. But I would say hybrid machine learning is probably closest to my heart. So let me explain what that is because it's sounds fancy, right? (laughter) >> Right. It's what we need is another hybrid something, right? >> In reality, what it is is let data gravity decide where your data stays and let your performance requirements, your SLA's, dictate where your machine learning models go, right? So what do I mean by that? You might have sensitive data, customer data, which you want to keep on a certain platform, right? Instead of moving data off that platform to do machine learning, bring machine learning to that platform, whether that be the mainframe or specialized appliances or hadoop clusters, you name it, right? Bring machine learning to where the data is. Do the training, building of the model, where that is, but then have complete flexibility in terms of where you deploy that model. As an example, you might choose to build and train your model on premises behind the firewall using very sensitive data, but the model that has been built, you may choose to deploy that into a Cloud environment because you have other applications that need to consume it. That flexibility is what I mean by hybrid. Another example is, especially when you get into so many more complex machine learning, deep learning domains, you need exploration and there is hardware that provides that exploration, right? For example, GPU's provide exploration. Well, you need to have the flexibility to train and build the models on hardware that provides that kind of exploration, but then the model that has been built might go into inside of a CICS mainframe transaction for some second scoring of a credit card transaction as to whether it's fraudulent or not, right? So there's flexibility off peri, on peri, different platforms, this is what I mean by hybrid. >> What is the technical enabler to allow that to happen? Is it just a modern software architecture, microservices, containers, blah, blah, blah? Explain that in more detail. >> Yeah, that's a good question and we're not, you know, it's a couple different things. One is bringing native machine learning to these platforms themselves. So you need native machine learning on the mainframe, in the Cloud, in a hadoop cluster environment, in an appliance, right? So you need the run times, the libraries, the frameworks running native on those platforms. And that is not easy to do that, you know? You've got machine learning running native on ZOS, not even Linux on Z. It's native to ZOS on the mainframe. >> At the very primitive level you're talking about. >> Yeah. >> So you get the performance you need. >> You have the runtime environments there and then what you need is a seamless experience across all of these platforms. You need way to export models, repositories into which you can save models, the same API's to save models into a different repository and then consume from them there. So it's a bit of engineering that IBM is doing to enable this, right? Native capabilities on the platforms, the same API's to talk to repositories and consume from the repositories. >> So the other piece of that architecture is talking a lot of tooling that integrated and native. >> John Thomas: Yes. >> And the tooling, as you know, changes, I feel like daily. There's a new tool out there and everybody gloms onto it, so the architecture has to be able to absorb those. What is the enabler there? >> Yeah, so you actually bring up a very good point. There is a new language, a new framework everyday, right? I mean, we all know that, in the world of machine learning, Python and R and Scala. Frameworks like Spark and TensorFlow, they're table scapes now, you know? You have to support all of these, scikit-learning, you name it, right? Obviously, you need a way to support all these frameworks on the platforms you want to enable, right? And then you need an environment which lets you work with the tools of your choice. So you need an environment like a workbench which can allow you to work in the language, the framework that you are the most comfortable with. And that's what we are doing with data science experience. I don't know if you have thought of this, but data science experience is an enterprise ML platform, right, runs in the Cloud, on prem, on x86 machines, you can have it on a (mumbles) box. The idea here is support for a variety of open languages, frameworks, enable through a collaborative workbench kind of interface. >> And the decision to move, whether it's on-prem or in the Cloud, it's a function of many things, but let's talk about those. I mean, data volume is one. You can't just move your business into the Cloud. It's not going to work that well. >> It's a journey, yeah. >> It's too expensive. But then there's others, there's governance edicts and security edicts, not that the security in the Cloud is any worse, it might just different than what your organization requires, and the Cloud supplier might not support that. It's different Clouds, it's location, etc. When you talked about the data thing being on trend, maybe training a model, and then that model moving to the Cloud, so obviously, it's a lighter weight ... It's not as much-- >> Yeah, yeah, yeah, you're not moving the entire data. Right. >> But I have a concern. I wonder if clients as you about this. Okay, well, it's my data, my data, I'm going to keep behind my firewall. But that data trained that model and I'm really worried that that model is now my IP that's going to seep out into the industry. What do you tell a client? >> Yeah, that's a fair point. Obviously, you still need your security mechanisms, you access control mechanisms, your governance control mechanisms. So you need governance whether you are on the Cloud or on prem. And your encryption mechanisms, your version control mechanisms, your governance mechanisms, all need to be in place, regardless of where you deploy, right? And to your question of how do you decide where the model should go, as I said earlier to John, you know, let data gravity SLA's performance security requirements dictate where the model should go. >> We're talking so much about concepts, right, and theories that you have. Lets roll up our sleeves and get to the nitty-gritty a little bit here and talk about what are people really doing out there? >> John Thomas: Oh yeah, use cases. >> Yeah, just give us an idea for some of the ... Kind of the latest and greatest that you're seeing. >> Lots of very interesting, interesting use cases out there so actually, a part of what IBM calls a data science elite team. We go out and engage with customers on very interesting use cases, right? And we see a lot of these hybrid discussions happen as well. On one end of the spectrum is understanding customers better. So I call this reading the customer's mind. So can you understand what is in the customer's mind and have an interaction with the client without asking a bunch of questions, right? Can you look at his historical data, his browsing behavior, his purchasing behavior, and have an offer that he will really love? Can you really understand him and give him a celebrity experience? That's one class of use cases, right? Another class of use cases is around improving operations, improving your own internal processes. One example is fraud detection, right? I mean, that is a hot topic these days. So how do you, as the credit card is swiped, right, it's just a few milliseconds before that travels through a network and kicks you back in mainframe and a scoring is done to as to whether this should be approved or not. Well, you need to have a prediction of how likely this is to be fraudulent or not in the span of the transaction. Here's another one. I don't know if you call help desks now. I sometimes call them "helpless desks." (laughter) >> Try not to. >> Dave: Hell desks. >> Try not to helpless desks but, you know, for pretty every enterprise that I am talking to, there is a goal to optimize their help desk, their call centers. And call center optimization is good. So as the customer calls in, can you understand the intent of the customer? See, he may start off talking about something, but as the call progresses, the intent might change. Can you understand that? In fact, not just understand, but predict it and intercept with something that the client will love before the conversation takes a bad turn? (laughter) >> You must be listening in on my calls. >> Your calls, must be your calls! >> I meander, I go every which way. >> I game the system and just go really mad and go, let me get you an operator. (laughter) Agent, okay. >> You tow guys, your data is a special case. >> Dave: Yeah right, this guy's pissed. >> We are red-flagged right off the top. >> We're not even analyzing you. >> Day job, forget about, you know. What about things, you know, because they're moving so far out to the edge and now with mobile and that explosion there, and sensor data being what it is and all this is tremendous growth. Tough to manage. >> Dave: It is, it really is. >> I guess, maybe tougher to make sense of it, so how are you helping people make sense of this so they can really filter through and find the data that matters? >> Yeah, this is a lot of things rolled up into that question, right? One is just managing those devices, those endpoints in multiple thousands, tens of thousands, millions of these devices. How would you manage them? Then, are you doing the processing of the data and applying ML and DL right at the edge, or are you bringing the data back behind the firewall or into Cloud and then processing it there? If you are doing image reduction in a car, in a self-driving car, can you allow the latency of data being shipping of an image of a pedestrian jumping in front, do we ship across the Cloud for a deep-learning network to process it and give you an answer - oh, that's a pedestrian? You know, you may not have that latency now. So you may want to do some processing on the edge, so that is another interesting discussion, right? And you need exploration there as well. Another aspect now is, as you said, separating the signal from the noise, you know. It's just really, really coming down to the different industries that we go into, what are the signals that we understand now? Can we build on them and can we re-use them? That is an interesting discussion as well. But, yeah, you're right. With the world of exploding data that we are in, with all these devices, it's very important to have systematic approach to managing your data, cataloging it, understanding where to apply ML, where to apply exploration, governance. All of these things become important. >> I want to ask you about, come back to the use cases for a moment. You talk about celebrity experiences, I put that in sort of a marketing category. Fraud detection's always been one of the favorite, big data use cases, help desks, recommendation engines and so forth. Let's start with the fraud detection. About a year ago, first of all, fraud detection in the last six, seven years, has been getting immensely better, no question. And it's great. However, the number of false positives, about a year ago, it was too many. We're a small company but we buy a lot of equipment and lights and cameras and stuff. The number of false positives that I personally get was overwhelming. >> Yeah. >> They've gone down dramatically. >> Yeah. >> In the last 12 months. Is that just a coincidence, happenstance, or is it getting better? >> No, it's not that the bad guys have gone down in number. It's not that at all, no. (laughter) >> Well, that, I know. >> No, I think there is a lot of sophistication in terms of the algorithms that are available now. In terms of ... If you have tens of thousands of features that you're looking at, how do you collapse that space and how do you do that efficiently, right? There are techniques that are evolving in terms of handing that kind of information. In terms of the actual algorithms, are different types of innovations that are happening in that space. But I think, perhaps, the most important one is that things that use to take weeks or days to train and test, now can be done in days or minutes, right? The exploration that comes from GPU's, for example, allows you to test out different algorithms, different models and say, okay, well, this performs well enough for me to roll it out and try this out, right? It gives you a very quick cycle of innovation. >> The time to value is really compressed. Okay, now let's take one that's not so good. Ad recommendations, the Google ads that pop up. One in a hundred are maybe relevant, if that, right? And they pop up on the screen and they're annoying. I worry that Siri's listening somehow. I talk to my wife about Israel and then next thing I know, I'm getting ads for going to Israel. Is that a coincidence or are they listening? What's happening there? >> I don't know about what Google's doing. I can't comment on that. (laughter) I don't want to comment on that. >> Maybe just from a technology perspective. >> From a technology perspective, this notion of understanding what is in the customer's mind and really getting to a customer segment at one, this is top interest for many, many organizations. Regardless of which industry you are, insurance or banking or retail, doesn't matter, right? And it all comes down to the fundamental principles about how efficiently can you do. Now, can you identify the features that have the most predictive power? This is a level of sophistication in terms of the feature engineering, in terms of collapsing that space of features that I had talked about, and then, how do I actually go to the latest science of this? How do I do the exploratory analysis? How do I actually build and test my machine learning models quickly? Do the tools allow me to be very productive about this? Or do I spend weeks and weeks coding in lower-level formats? Or do I get help, do I get guided interfaces, which guide me through the process, right? And then, the topic of exploration we talk about, right? These things come together and then couple that with cognitive API's. For example, speech to text, the word (mumbles) have gone down dramatically now. So as you talk on the phone, with a very high accuracy, we can understand what is being talked about. Image recognition, the accuracy has gone up dramatically. You can create custom classifiers for industry-specific topics that you want to identify in pictures. Natural language processing, natural language understanding, all of these have evolved in the last few years. And all these come together. So machine learning's not an island. All these things coming together is what makes these dramatic advancements possible. >> Well, John, if you've figured out anything about the past 20 minutes or so, is that Dave and I want ads delivered that matter and we want our help desk questions answered right away. (laugher) so if you can help us with that, you're welcome back on the Cube anytime, okay? >> We will try, John. >> That's all we want, that's all we ask. >> You guys, your calls are still being screened. (laughter) >> John Thomas, thank you for joining us, we appreciate that. >> Thank you. >> Our panel discussion coming up at 4:00 Eastern time. Live here on the Cube, we're in New York City. Be back in a bit. (upbeat music)

Published Date : Nov 1 2017

SUMMARY :

Brought to you by IMB. John, thank you for your time, good to see you. I know, in fact, you just wrote this morning And we can talk through each of these if you like. It's what we need is another hybrid something, right? of where you deploy that model. What is the technical enabler to allow that to happen? And that is not easy to do that, you know? and then what you need is a seamless experience So the other piece of that architecture is And the tooling, as you know, changes, I feel like daily. the framework that you are the most comfortable with. And the decision to move, whether it's on-prem and security edicts, not that the security in the Cloud is Yeah, yeah, yeah, you're not moving the entire data. I wonder if clients as you about this. So you need governance whether you are and theories that you have. Kind of the latest and greatest that you're seeing. I don't know if you call help desks now. So as the customer calls in, can you understand and go, let me get you an operator. What about things, you know, because they're moving the signal from the noise, you know. I want to ask you about, come back to the use cases In the last 12 months. No, it's not that the bad guys have gone down in number. and how do you do that efficiently, right? I talk to my wife about Israel and then next thing I know, I don't know about what Google's doing. So as you talk on the phone, with a very high accuracy, so if you can help us with that, You guys, your calls are still being screened. Live here on the Cube, we're in New York City.

ENTITIES

Entity	Category	Confidence
Dave Vellente	PERSON	0.99+
John	PERSON	0.99+
John Thomas	PERSON	0.99+
Dave	PERSON	0.99+
IBM	ORGANIZATION	0.99+
John Walls	PERSON	0.99+
Israel	LOCATION	0.99+
Google	ORGANIZATION	0.99+
New York City	LOCATION	0.99+
Siri	TITLE	0.99+
ZOS	TITLE	0.99+
today	DATE	0.99+
Linux	TITLE	0.99+
One example	QUANTITY	0.99+
Python	TITLE	0.99+
thousands	QUANTITY	0.99+
One	QUANTITY	0.99+
Scala	TITLE	0.99+
Spark	TITLE	0.98+
tens of thousands	QUANTITY	0.98+
this morning	DATE	0.98+
each	QUANTITY	0.98+
IMB	ORGANIZATION	0.96+
one	QUANTITY	0.96+
TensorFlow	TITLE	0.95+
millions	QUANTITY	0.95+
About a year ago	DATE	0.95+
first	QUANTITY	0.94+
one class	QUANTITY	0.92+
Z.	TITLE	0.91+
4:00 Eastern time	DATE	0.9+
decades	QUANTITY	0.9+
6:00 tonight	DATE	0.9+
CICS	ORGANIZATION	0.9+
about a year ago	DATE	0.89+
second	QUANTITY	0.88+
two-day event	QUANTITY	0.86+
three different things	QUANTITY	0.85+
last 12 months	DATE	0.84+
IBM Data Science	ORGANIZATION	0.82+
Cloud	TITLE	0.8+
R	TITLE	0.78+
past 20 minutes	DATE	0.77+
Cube	COMMERCIAL_ITEM	0.75+
a hundred	QUANTITY	0.72+
one end	QUANTITY	0.7+
seven years	QUANTITY	0.69+
features	QUANTITY	0.69+
couple	QUANTITY	0.67+
last six	DATE	0.66+
few milliseconds	QUANTITY	0.63+
last few years	DATE	0.59+
x86	QUANTITY	0.55+
IBM.com	ORGANIZATION	0.53+
SLA	ORGANIZATION	0.49+

James Bellenger, Twitter | Node Summit 2017

>> Hey welcome back everybody. Jeff Frick, with the Cube. We're at Node Summit 2017 in downtown San Francisco. About 800 people, developers talking about Node and Node GS. And really the crazy adoption of Node as a development platform. Enterprise adoption. Everything's up and to the right. Some crazy good stories. And we're excited to have somebody coming right off his keynote. It's James Bellenger. He is an engineer at Twitter. James, welcome. >> Thank you, thank you for having me. >> Yeah, absolutely. So you just got off stage and you were talking all about Twitter Lite. What is Twitter Lite? I like Twitter as it is. >> Ah, so Twitter Lite is an optimized, it's a mobile web app. So if you pull up your phone, open up the web browser and go to twitter.com, in your smart phone web browser, you get a Twitter experience that we're calling Twitter Lite. >> Okay. >> And it used to be a little bit out of date. But we've been able to update it using a lot of new exciting web technologies. And so now we have this thing that feels very much like a native web app. >> Okay. >> They call them progressive web apps these days. And so we're using that as sort of a way to sort of compete in areas and markets where maybe a native apps are less able to compete. Where you know, people don't want to download a 200 megabyte iOS app. They want something that fits under 600 kilobytes. >> Okay. So you had the Twitter Lite app before. And then this was really a re-deployment? Or am I getting it wrong? >> I think, well we had We had a web app at mobile.twitter.com. >> Okay. >> And it was just sort of the mobile web app. >> Okay. >> But you know we sort of really rewrote everything. And that includes the back end on Node. And then we're now sort of pushing that and calling it Twitter Lite. >> Okay. And when did that go live or GA? >> About three months ago. >> Three months ago, okay. Super. So obviously you're here at Node. You just spoke at Node. You know, how was the experience using a Node tool set versus whatever you had it built on before? >> It's definitely faster in every way. Well, I mean, >> Faster in every way. That's a good thing. >> So well, let me Let me catch that. Be more specific. It is ... >> It's those benchmarking people. We need them back over here. >> It is very fast for how we apply it. It's really fast for development speed. And perhaps the biggest win is that on both sort of areas of our stack whether it's the part of the application that runs on the browser or it's the part of the application that runs inside the Twitter data center. We have one language and technology. So when a problem comes up and an engineer needs to like go and find the problem and fix it they don't need to sort of "Oh, well that's server code. "I don't know how it works. "And it's written in this language I don't understand." We really just have one application and it happens to run in both places. And so it really improves engineering efficiency. >> And you saw that in the development process, QA and the ongoing. >> Yeah. >> Yeah. And was it more ... So it's more like the guys that were more front end that now have access to the back end and then the other way around. Is that correct? Yeah, it's a little bit of both. >> Okay. >> You know, I think before I think there's people that they really like Scala. And they only want to work in Scala. Or there's people that really don't like it. So you end up, I think, having engineers kind of get bulkanized by their technology choices, and their preferred systems. But I think it really sort of tears down a couple walls. And so it makes, it improves engineering efficiency that way. But we found also that some of the tool sets and the tool chains that we're using allow engineers to just sort of like move faster. >> Right. >> So you know, whether that's like recompiling the service in like one second. Instead of having to wait for multiple minutes. There's just sort of less time spent waiting. >> Right. And in terms of don't share anything you're not supposed to share but in terms of, you know, frequency of releases and ongoing maintenance and kind of the development of the I won't say the app, not the app. I guess it is the app. Going forward, you know, how has that been impacted by moving to this platform? >> I think it might be too early to say. >> Okay. >> We've, you know, right now we've got about 12 to 15 engineers and we're ramping up. And it, I think it might, we're kind of looking to finish around 25 engineers, by the end of the year. >> Okay. >> So the sort of team and contributor base of the kind of like core team that are working on the app is growing. But you know, otherwise there's, you know, we're releasing every day. We're, you know, we try to you know, we're always pushing code. We're running experiments a lot. >> Right. I don't know if that answers your question but. >> So it sounds like it's a little easier but you're still doing everything you were doing before but now it just feels like it's easier because of this. >> Well, you know, talk to me in a couple months. >> Okay. >> Then maybe we'll have some better answers for you. >> Okay. So the other thing I want, if I talk to you in a couple months, I talk to you a year from now, just in terms of as you look down the road, you know, what this opens up. You know, kind of what are some of your priorities now that you've got it out. You said you've been out there for three months. What's kind of next on your roadmap, your horizon? >> So far, I think we've been really encouraged by the success of using this stack for development. So we're looking to kind of double down on that. >> Okay. >> So that means looking at some of the other Twitter web apps. Oh, sorry, Twitter apps in general. The other ways people use Twitter. And to sort of look at how they were built. And to see, because we're using React, and because we're using, I think technologies that make it very easy to you know, be responsive and you know, either be have a wide layout or a very narrow layout, or work offline. We have a lot of potential to sort of cannibalize or replace and also update some of the existing apps >> Right. >> That maybe don't get the attention that they need. >> Right. >> So there's some of that. And then I think Twitter Lite as a product I think that we're going, you know, we're looking to really expand it's reach. And make a big push in some of the developing areas. >> Yeah. Because the other thing people don't know, I mean, Twitter's acquired a bunch of companies, you know, over the years. So we've heard some examples earlier today, where that's a use case when you do have the opportunity to maybe redo an acquired application. You know, that those are kind of natural opportunities to look to redo them with this method. >> Yeah. Sure. >> All right. Cool. Well, James, thanks for taking a few minutes. >> Thank you. >> Congratulations on the talk. And I'll think of you next time I go to Twitter Lite. >> Yeah. Thank you so much. >> All righty. He's James Bellenger from Twitter. I'm Jeff Frick. You're watching the Cube from Node Summit 2017. Thanks for watching. (techno music)

Published Date : Jul 28 2017

SUMMARY :

And really the crazy adoption of Node So you just got off stage and you were talking all about So if you pull up your phone, open up the web browser And it used to be a little bit out of date. And so we're using that as sort of a way to And then this was really a re-deployment? I think, well we had And that includes the back end on Node. a Node tool set versus whatever you had it built on before? It's definitely faster in every way. Faster in every way. So well, let me We need them back over here. And perhaps the biggest win is that on both And you saw that in the development process, QA So it's more like the guys that were more front end that So you end up, I think, having So you know, whether that's like recompiling the service in terms of, you know, frequency of releases and And it, I think it might, we're kind of looking to finish But you know, otherwise there's, you know, I don't know if that answers your question but. So it sounds like it's a little easier but Well, you know, I talk to you a year from now, So we're looking to kind of double down on that. So that means looking at some of the other And make a big push in some of the developing areas. you know, over the years. Well, James, thanks for taking a few minutes. And I'll think of you next time I go to Twitter Lite. I'm Jeff Frick.

ENTITIES

Entity	Category	Confidence
Tim Yokum	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Brian	PERSON	0.99+
Anna	PERSON	0.99+
James Bellenger	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Dave Valante	PERSON	0.99+
James	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
three months	QUANTITY	0.99+
16 times	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
Python	TITLE	0.99+
mobile.twitter.com	OTHER	0.99+
Influx Data	ORGANIZATION	0.99+
iOS	TITLE	0.99+
Twitter	ORGANIZATION	0.99+
30,000 feet	QUANTITY	0.99+
Russ Foundation	ORGANIZATION	0.99+
Scala	TITLE	0.99+
Twitter Lite	TITLE	0.99+
two rows	QUANTITY	0.99+
200 megabyte	QUANTITY	0.99+
Node	TITLE	0.99+
Three months ago	DATE	0.99+
one application	QUANTITY	0.99+
both places	QUANTITY	0.99+
each row	QUANTITY	0.99+
Par K	TITLE	0.99+
Anais Dotis Georgiou	PERSON	0.99+
one language	QUANTITY	0.98+
first one	QUANTITY	0.98+
15 engineers	QUANTITY	0.98+
Anna East Otis Georgio	PERSON	0.98+
both	QUANTITY	0.98+
one second	QUANTITY	0.98+
25 engineers	QUANTITY	0.98+
About 800 people	QUANTITY	0.98+
sql	TITLE	0.98+
Node Summit 2017	EVENT	0.98+
two temperature values	QUANTITY	0.98+
one times	QUANTITY	0.98+
c plus plus	TITLE	0.97+
Rust	TITLE	0.96+
SQL	TITLE	0.96+
today	DATE	0.96+
Influx	ORGANIZATION	0.95+
under 600 kilobytes	QUANTITY	0.95+
first	QUANTITY	0.95+
c plus plus	TITLE	0.95+
Apache	ORGANIZATION	0.95+
par K	TITLE	0.94+
React	TITLE	0.94+
Russ	ORGANIZATION	0.94+
About three months ago	DATE	0.93+
8:30 AM Pacific time	DATE	0.93+
twitter.com	OTHER	0.93+
last decade	DATE	0.93+
Node	ORGANIZATION	0.92+
Hadoop	TITLE	0.9+
InfluxData	ORGANIZATION	0.89+
c c plus plus	TITLE	0.89+
Cube	ORGANIZATION	0.89+
each column	QUANTITY	0.88+
InfluxDB	TITLE	0.86+
Influx DB	TITLE	0.86+
Mozilla	ORGANIZATION	0.86+
DB IOx	TITLE	0.85+

Rob Thomas, IBM Analytics | IBM Fast Track Your Data 2017

>> Announcer: Live from Munich, Germany, it's theCUBE. Covering IBM: Fast Track Your Data. Brought to you by IBM. >> Welcome, everybody, to Munich, Germany. This is Fast Track Your Data brought to you by IBM, and this is theCUBE, the leader in live tech coverage. We go out to the events, we extract the signal from the noise. My name is Dave Vellante, and I'm here with my co-host Jim Kobielus. Rob Thomas is here, he's the General Manager of IBM Analytics, and longtime CUBE guest, good to see you again, Rob. >> Hey, great to see you. Thanks for being here. >> Dave: You're welcome, thanks for having us. So we're talking about, we missed each other last week at the Hortonworks DataWorks Summit, but you came on theCUBE, you guys had the big announcement there. You're sort of getting out, doing a Hadoop distribution, right? TheCUBE gave up our Hadoop distributions several years ago so. It's good that you joined us. But, um, that's tongue-in-cheek. Talk about what's going on with Hortonworks. You guys are now going to be partnering with them essentially to replace BigInsights, you're going to continue to service those customers. But there's more than that. What's that announcement all about? >> We're really excited about that announcement, that relationship, just to kind of recap for those that didn't see it last week. We are making a huge partnership with Hortonworks, where we're bringing data science and machine learning to the Hadoop community. So IBM will be adopting HDP as our distribution, and that's what we will drive into the market from a Hadoop perspective. Hortonworks is adopting IBM Data Science Experience and IBM machine learning to be a core part of their Hadoop platform. And I'd say this is a recognition. One is, companies should do what they do best. We think we're great at data science and machine learning. Hortonworks is the best at Hadoop. Combine those two things, it'll be great for clients. And, we also talked about extending that to things like Big SQL, where they're partnering with us on Big SQL, around modernizing data environments. And then third, which relates a little bit to what we're here in Munich talking about, is governance, where we're partnering closely with them around unified governance, Apache Atlas, advancing Atlas in the enterprise. And so, it's a lot of dimensions to the relationship, but I can tell you since I was on theCUBE a week ago with Rob Bearden, client response has been amazing. Rob and I have done a number of client visits together, and clients see the value of unlocking insights in their Hadoop data, and they love this, which is great. >> Now, I mean, the Hadoop distro, I mean early on you got into that business, just, you had to do it. You had to be relevant, you want to be part of the community, and a number of folks did that. But it's really sort of best left to a few guys who want to do that, and Apache open source is really, I think, the way to go there. Let's talk about Munich. You guys chose this venue. There's a lot of talk about GDPR, you've got some announcements around unified government, but why Munich? >> So, there's something interesting that I see happening in the market. So first of all, you look at the last five years. There's only 10 companies in the world that have outperformed the S&P 500, in each of those five years. And we started digging into who those companies are and what they do. They are all applying data science and machine learning at scale to drive their business. And so, something's happening in the market. That's what leaders are doing. And I look at what's happening in Europe, and I say, I don't see the European market being that aggressive yet around data science, machine learning, how you apply data for competitive advantage, so we wanted to come do this in Munich. And it's a bit of a wake-up call, almost, to say hey, this is what's happening. We want to encourage clients across Europe to think about how do they start to do something now. >> Yeah, of course, GDPR is also a hook. The European Union and you guys have made some talk about that, you've got some keynotes today, and some breakout sessions that are discussing that, but talk about the two announcements that you guys made. There's one on DB2, there's another one around unified governance, what do those mean for clients? >> Yeah, sure, so first of all on GDPR, it's interesting to me, it's kind of the inverse of Y2K, which is there's very little hype, but there's huge ramifications. And Y2K was kind of the opposite. So look, it's coming, May 2018, clients have to be GDPR-compliant. And there's a misconception in the market that that only impacts companies in Europe. It actually impacts any company that does any type of business in Europe. So, it impacts everybody. So we are announcing a platform for unified governance that makes sure clients are GDPR-compliant. We've integrated software technology across analytics, IBM security, some of the assets from the Promontory acquisition that IBM did last year, and we are delivering the only platform for unified governance. And that's what clients need to be GDPR-compliant. The second piece is data has to become a lot simpler. As you think about my comment, who's leading the market today? Data's hard, and so we're trying to make data dramatically simpler. And so for example, with DB2, what we're announcing is you can download and get started using DB2 in 15 minutes or less, and anybody can do it. Even you can do it, Dave, which is amazing. >> Dave: (laughs) >> For the first time ever, you can-- >> We'll test that, Rob. >> Let's go test that. I would love to see you do it, because I guarantee you can. Even my son can do it. I had my son do it this weekend before I came here, because I wanted to see how simple it was. So that announcement is really about bringing, or introducing a new era of simplicity to data and analytics. We call it Download And Go. We started with SPSS, we did that back in March. Now we're bringing Download And Go to DB2, and to our governance catalog. So the idea is make data really simple for enterprises. >> You had a community edition previous to this, correct? There was-- >> Rob: We did, but it wasn't this easy. >> Wasn't this simple, okay. >> Not anybody could do it, and I want to make it so anybody can do it. >> Is simplicity, the rate of simplicity, the only differentiator of the latest edition, or I believe you have Kubernetes support now with this new addition, can you describe what that involves? >> Yeah, sure, so there's two main things that are new functionally-wise, Jim, to your point. So one is, look, we're big supporters of Kubernetes. And as we are helping clients build out private clouds, the best answer for that in our mind is Kubernetes, and so when we released Data Science Experience for Private Cloud earlier this quarter, that was on Kubernetes, extending that now to other parts of the portfolio. The other thing we're doing with DB2 is we're extending JSON support for DB2. So think of it as, you're working in a relational environment, now just through SQL you can integrate with non-relational environments, JSON, documents, any type of no-SQL environment. So we're finally bringing to fruition this idea of a data fabric, which is I can access all my data from a single interface, and that's pretty powerful for clients. >> Yeah, more cloud data development. Rob, I wonder if you can, we can go back to the machine learning, one of the core focuses of this particular event and the announcements you're making. Back in the fall, IBM made an announcement of Watson machine learning, for IBM Cloud, and World of Watson. In February, you made an announcement of IBM machine learning for the z platform. What are the machine learning announcements at this particular event, and can you sort of connect the dots in terms of where you're going, in terms of what sort of innovations are you driving into your machine learning portfolio going forward? >> I have a fundamental belief that machine learning is best when it's brought to the data. So, we started with, like you said, Watson machine learning on IBM Cloud, and then we said well, what's the next big corpus of data in the world? That's an easy answer, it's the mainframe, that's where all the world's transactional data sits, so we did that. Last week with the Hortonworks announcement, we said we're bringing machine learning to Hadoop, so we've kind of covered all the landscape of where data is. Now, the next step is about how do we bring a community into this? And the way that you do that is we don't dictate a language, we don't dictate a framework. So if you want to work with IBM on machine learning, or in Data Science Experience, you choose your language. Python, great. Scala or Java, you pick whatever language you want. You pick whatever machine learning framework you want, we're not trying to dictate that because there's different preferences in the market, so what we're really talking about here this week in Munich is this idea of an open platform for data science and machine learning. And we think that is going to bring a lot of people to the table. >> And with open, one thing, with open platform in mind, one thing to me that is conspicuously missing from the announcement today, correct me if I'm wrong, is any indication that you're bringing support for the deep learning frameworks like TensorFlow into this overall machine learning environment. Am I wrong? I know you have Power AI. Is there a piece of Power AI in these announcements today? >> So, stay tuned on that. We are, it takes some time to do that right, and we are doing that. But we want to optimize so that you can do machine learning with GPU acceleration on Power AI, so stay tuned on that one. But we are supporting multiple frameworks, so if you want to use TensorFlow, that's great. If you want to use Caffe, that's great. If you want to use Theano, that's great. That is our approach here. We're going to allow you to decide what's the best framework for you. >> So as you look forward, maybe it's a question for you, Jim, but Rob I'd love you to chime in. What does that mean for businesses? I mean, is it just more automation, more capabilities as you evolve that timeline, without divulging any sort of secrets? What do you think, Jim? Or do you want me to ask-- >> What do I think, what do I think you're doing? >> No, you ask about deep learning, like, okay, that's, I don't see that, Rob says okay, stay tuned. What does it mean for a business, that, if like-- >> Yeah. >> If I'm planning my roadmap, what does that mean for me in terms of how I should think about the capabilities going forward? >> Yeah, well what it means for a business, first of all, is what they're going, they're using deep learning for, is doing things like video analytics, and speech analytics and more of the challenges involving convolution of neural networks to do pattern recognition on complex data objects for things like connected cars, and so forth. Those are the kind of things that can be done with deep learning. >> Okay. And so, Rob, you're talking about here in Europe how the uptick in some of the data orientation has been a little bit slower, so I presume from your standpoint you don't want to over-rotate, to some of these things. But what do you think, I mean, it sounds like there is difference between certainly Europe and those top 10 companies in the S&P, outperforming the S&P 500. What's the barrier, is it just an understanding of how to take advantage of data, is it cultural, what's your sense of this? >> So, to some extent, data science is easy, data culture is really hard. And so I do think that culture's a big piece of it. And the reason we're kind of starting with a focus on machine learning, simplistic view, machine learning is a general-purpose framework. And so it invites a lot of experimentation, a lot of engagement, we're trying to make it easier for people to on-board. As you get to things like deep learning as Jim's describing, that's where the market's going, there's no question. Those tend to be very domain-specific, vertical-type use cases and to some extent, what I see clients struggle with, they say well, I don't know what my use case is. So we're saying, look, okay, start with the basics. A general purpose framework, do some tests, do some iteration, do some experiments, and once you find out what's hunting and what's working, then you can go to a deep learning type of approach. And so I think you'll see an evolution towards that over time, it's not either-or. It's more of a question of sequencing. >> One of the things we've talked to you about on theCUBE in the past, you and others, is that IBM obviously is a big services business. This big data is complicated, but great for services, but one of the challenges that IBM and other companies have had is how do you take that service expertise, codify it to software and scale it at large volumes and make it adoptable? I thought the Watson data platform announcement last fall, I think at the time you called it Data Works, and then so the name evolved, was really a strong attempt to do that, to package a lot of expertise that you guys had developed over the years, maybe even some different software modules, but bring them together in a scalable software package. So is that the right interpretation, how's that going, what's the uptake been like? >> So, it's going incredibly well. What's interesting to me is what everybody remembers from that announcement is the Watson Data Platform, which is a decomposable framework for doing these types of use cases on the IBM cloud. But there was another piece of that announcement that is just as critical, which is we introduced something called the Data First method. And that is the recipe book to say to a client, so given where you are, how do you get to this future on the cloud? And that's the part that people, clients, struggle with, is how do I get from step to step? So with Data First, we said, well look. There's different approaches to this. You can start with governance, you can start with data science, you can start with data management, you can start with visualization, there's different entry points. You figure out the right one for you, and then we help clients through that. And we've made Data First method available to all of our business partners so they can go do that. We work closely with our own consulting business on that, GBS. But that to me is actually the thing from that event that has had, I'd say, the biggest impact on the market, is just helping clients map out an approach, a methodology, to getting on this journey. >> So that was a catalyst, so this is not a sequential process, you can start, you can enter, like you said, wherever you want, and then pick up the other pieces from majority model standpoint? Exactly, because everybody is at a different place in their own life cycle, and so we want to make that flexible. >> I have a question about the clients, the customers' use of Watson Data Platform in a DevOps context. So, are more of your customers looking to use Watson Data Platform to automate more of the stages of the machine learning development and the training and deployment pipeline, and do you see, IBM, do you see yourself taking the platform and evolving it into a more full-fledged automated data science release pipelining tool? Or am I misunderstanding that? >> Rob: No, I think that-- >> Your strategy. >> Rob: You got it right, I would just, I would expand a little bit. So, one is it's a very flexible way to manage data. When you look at the Watson Data Platform, we've got relational stores, we've got column stores, we've got in-memory stores, we've got the whole suite of open-source databases under the composed-IO umbrella, we've got cloud in. So we've delivered a very flexible data layer. Now, in terms of how you apply data science, we say, again, choose your model, choose your language, choose your framework, that's up to you, and we allow clients, many clients start by building models on their private cloud, then we say you can deploy those into the Watson Data Platform, so therefore then they're running on the data that you have as part of that data fabric. So, we're continuing to deliver a very fluid data layer which then you can apply data science, apply machine learning there, and there's a lot of data moving into the Watson Data Platform because clients see that flexibility. >> All right, Rob, we're out of time, but I want to kind of set up the day. We're doing CUBE interviews all morning here, and then we cut over to the main tent. You can get all of this on IBMgo.com, you'll see the schedule. Rob, you've got, you're kicking off a session. We've got Hilary Mason, we've got a breakout session on GDPR, maybe set up the main tent for us. >> Yeah, main tent's going to be exciting. We're going to debunk a lot of misconceptions about data and about what's happening. Marc Altshuller has got a great segment on what he calls the death of correlations, so we've got some pretty engaging stuff. Hilary's got a great piece that she was talking to me about this morning. It's going to be interesting. We think it's going to provoke some thought and ultimately provoke action, and that's the intent of this week. >> Excellent, well Rob, thanks again for coming to theCUBE. It's always a pleasure to see you. >> Rob: Thanks, guys, great to see you. >> You're welcome; all right, keep it right there, buddy, We'll be back with our next guest. This is theCUBE, we're live from Munich, Fast Track Your Data, right back. (upbeat electronic music)

Published Date : Jun 22 2017

SUMMARY :

Brought to you by IBM. This is Fast Track Your Data brought to you by IBM, Hey, great to see you. It's good that you joined us. and machine learning to the Hadoop community. You had to be relevant, you want to be part of the community, So first of all, you look at the last five years. but talk about the two announcements that you guys made. Even you can do it, Dave, which is amazing. I would love to see you do it, because I guarantee you can. but it wasn't this easy. and I want to make it so anybody can do it. extending that now to other parts of the portfolio. What are the machine learning announcements at this And the way that you do that is we don't dictate I know you have Power AI. We're going to allow you to decide So as you look forward, maybe it's a question No, you ask about deep learning, like, okay, that's, and speech analytics and more of the challenges But what do you think, I mean, it sounds like And the reason we're kind of starting with a focus One of the things we've talked to you about on theCUBE And that is the recipe book to say to a client, process, you can start, you can enter, and deployment pipeline, and do you see, IBM, models on their private cloud, then we say you can deploy and then we cut over to the main tent. and that's the intent of this week. It's always a pleasure to see you. This is theCUBE, we're live from Munich,

ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Jim	PERSON	0.99+
Europe	LOCATION	0.99+
Rob	PERSON	0.99+
Marc Altshuller	PERSON	0.99+
Hilary	PERSON	0.99+
Hilary Mason	PERSON	0.99+
Rob Bearden	PERSON	0.99+
February	DATE	0.99+
Dave	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Rob Thomas	PERSON	0.99+
May 2018	DATE	0.99+
March	DATE	0.99+
Munich	LOCATION	0.99+
Scala	TITLE	0.99+
Apache	ORGANIZATION	0.99+
second piece	QUANTITY	0.99+
Last week	DATE	0.99+
Java	TITLE	0.99+
last year	DATE	0.99+
two announcements	QUANTITY	0.99+
10 companies	QUANTITY	0.99+
GDPR	TITLE	0.99+
Python	TITLE	0.99+
DB2	TITLE	0.99+
15 minutes	QUANTITY	0.99+
last week	DATE	0.99+
IBM Analytics	ORGANIZATION	0.99+
European Union	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
JSON	TITLE	0.99+
Watson Data Platform	TITLE	0.99+
third	QUANTITY	0.99+
One	QUANTITY	0.99+
this week	DATE	0.98+
today	DATE	0.98+
a week ago	DATE	0.98+
two things	QUANTITY	0.98+
SQL	TITLE	0.98+
last fall	DATE	0.98+
2017	DATE	0.98+
Munich, Germany	LOCATION	0.98+
each	QUANTITY	0.98+
Y2K	ORGANIZATION	0.98+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for Scala: