Sri Satish Ambati, H2O.ai | CUBE Conversation, May 2020
>> Connecting with thought leaders all around the world, this is a CUBE Conversation. Hi, everybody, this is Dave Vellante of theCUBE, and welcome back to my CXO series. I've been running this really since the start of the COVID-19 crisis to understand how leaders are dealing with this pandemic. Sri Ambati is here, he's the CEO and founder of H2O.ai. Sri, it's great to see you again, thanks for coming on. >> Thank you for having us. >> Yeah, so this pandemic has obviously given people fits, no question, but it's also given opportunities for companies to kind of reassess where they are. Automation is a huge watchword, flexibility, business resiliency, and people who maybe really hadn't fully leaned into things like the cloud and AI and automation are now realizing, wow, we have no choice, it's about survival. Your thoughts as to what you're seeing in the marketplace. >> Thanks for having us. I think first of all, kudos to the frontline health workers who have been relentlessly saving lives across the country and the world; what we're really doing is a fraction of what we could have done or should be doing to stave off the next big pandemic. But that apart, I usually tend to say BC is before COVID. So if the world was thinking about going digital, after COVID-19 they have been forced to go digital, and as a result you're seeing tremendous transformation across our customers, and a lot of effort to kind of go in and reinvent their business models so that they can scale as effortlessly as they can using digital means. >> So, think about doctors and diagnosis: machines, in some cases, are helping doctors make diagnoses, they're sometimes making even better diagnoses, (mumbles) is informing. There's been a lot of talk about the models, you know how... Yeah, I know you've been working with a lot of healthcare organizations, you're probably familiar with, you know, the Medium post, The Hammer and the Dance, and people criticize the models, of course, they're just models, right? And you iterate models, and machine intelligence can help us improve. So, in this, you know, you talk about BC and post-C, how have you seen the data and machine intelligence informing the models and improving what we know about this pandemic? I mean, it changed literally daily, what are you seeing? >> Yeah, and I think it started with Wuhan and we saw the best application of AI in trying to trace, literally from Alipay to WeChat, track down the first folks who were spreading it across China and then eventually the rest of the world. I think contact tracing, for example, has become a really interesting problem. Supply chain has been disrupted like never before. We're beginning to see customers trying to reinvent their distribution mechanisms in the second-order effects of COVID, and the prime concern was hospital staffing, how many ventilators would be needed, in the first few weeks of the COVID crisis as it evolved in the US.
We've been busy working with some of the local healthcare communities to predict how staffing in hospitals will work, how many PPE and ventilators will be needed, and so forth, and when the peak surge will be. Those were the problems at the beginning, and many of our customers have begun to build these models, iterate, improve, and kind of educate the community to practice social distancing, and that led to a lot of flattening of the curve; when you're talking about flattening the curve, you're really talking about data science and analytics in public-speak. That led to kind of the next level: now that we have somewhat brought a semblance of order to the reaction to COVID, I think what we are beginning to figure out is, is there going to be a second surge, which elective procedures that were postponed will be top of mind for customers, and so these are the kinds of things that hospitals are beginning to plan out for the second half of the year. And as businesses try to open up, certain things were highly correlated to a surge in cases, such as cleaning supplies, for example, the obvious one, or pantry buying. So retailers are beginning to see which online stores are doing well, e-commerce, online purchases, electronic goods. Everyone essentially started working from home, and so homes needed to have the same kind of bandwidth that offices and commercial enterprises needed to have. A lot of interesting shifts: on one side you saw airlines go away, on the other side you saw the likes of Zoom and video take off. So you're kind of seeing a real digital divide happening, and AI is here to play a very good role in figuring out how to enhance your profitability as you're looking at planning out the next two years. >> Yeah, you know, and obviously, these things they get, they get partisan, it gets political. I mean, our job as an industry is to report, your job is to help people understand, I mean, let the data inform and then let public policy, you know, fight it out. So who are some of the people that you're working with, you know, as a result of COVID-19? What's some of the work that H2O has done? I want to better understand what role you're playing. >> So we were kind of privileged as a company to come into the crisis with a strong balance sheet and the ability to actually have the right kind of momentum behind the company in terms of great talent, and so we have 10% of the world's top data scientists, in the form of Kaggle Grand Masters, in the company. And so we put most of them to work, and they started collecting data sets, curating data sets and making them more qualitative, picking up public data sources; for example, there's a tremendous amount of job loss out there, figuring out which are the more difficult kind of sectors in the economy. And then we started looking at the exodus from the cities, we're looking at mobility data that's publicly available, mobility data through the data exchanges. You're able to find which cities, which rural areas: as the New Yorkers left the city, which places did they go to, and the same for Californians, when they left Los Angeles, which are the new places they have settled in?
These are the places which are now busy places for the same kind of items that you need to sell if you're a retailer. But if you go one step further, we started engaging with FEMA, we started engaging with universities, like Imperial College London or Berkeley, and started figuring out how best to improve the models and automate them. The SEIR model, the most popular epidemiological model, we added that into our Driverless AI product as a recipe and made that accessible to our customers in testing, to customers in healthcare who are trying to predict where the surge is likely to come. But it's mostly about information, right? So the AI at the end of it is all about intelligence and being prepared. Predictive is all about being prepared, and that's kind of what we did in general, lots of blogs, topical blog articles, and working with the largest health organizations and starting to kind of inform them on the most stable models. What we found, to not so much of our surprise, is that the simplest, very interpretable models are actually the most widely usable, because historical data is actually no longer as effective. You need to build a model that you can quickly understand and retrain, again and again, through the feedback loop of back-testing that model against what really happened. >> Yeah, so I want to double down on that. So really, two things I want to understand, if you have visibility on it, sounds like you do. Just in terms of the surge and the comeback, you know, kind of what those models say, based upon, you know, we have some advance information coming from the global market, for sure, but it seems like every situation is different. What's the data telling you? Just in terms of, okay, we're coming into the spring and the summer months, maybe it'll come down a little bit. Everybody says it... We fully expect it to come back in the fall, go back to college, don't go back to college. What is the data telling you at this point in time, with an understanding that, you know, we're still iterating every day? >> Well, I think, I mean, we're not epidemiologists, but at the same time, the science of it is a highly local response, a very hyper-local response to COVID-19 is what we've seen. Santa Clara, which is just a county, I mean, is different from San Francisco, right, sort of. So you're beginning to see, like we saw in Brooklyn, it's very different, and the Bronx, very different from Manhattan. So you're seeing a very, very local response to this disease, and I'm talking about the US. You see the likes of Brazil, which we're worried about, has picked up quite a bit of cases now. I think the silver lining, I would say, is that China is up and running to a large degree; a large part of our user base there is back active, you can see the traffic patterns there. So two months after their last reported cases, the business and economic activity is back and thriving. And so you can kind of estimate from that, that this can be done, where you can actually contain the rise of active cases, and it will take masking of the entire community, masking and a healthy dose of increased testing. One of our offices is in Prague, and the Czech Republic has done an incredible job in trying to contain this; they've essentially masked everybody, and as a result they're back thinking about opening offices and schools later this month.
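Sri mentions above that H2O packaged the SEIR model as a Driverless AI recipe. For readers unfamiliar with it, here is a minimal, illustrative sketch of the classic SEIR compartmental equations in Python; the parameter values and population size are made-up placeholders for demonstration, not H2O's actual recipe or fitted numbers.

```python
# Minimal SEIR (Susceptible-Exposed-Infectious-Recovered) simulation.
# Illustrative only: the rates below are placeholders, not fitted values.
import numpy as np
from scipy.integrate import odeint

def seir(y, t, beta, sigma, gamma, n):
    s, e, i, r = y
    ds = -beta * s * i / n                 # new exposures
    de = beta * s * i / n - sigma * e      # exposed become infectious at rate sigma
    di = sigma * e - gamma * i             # infectious recover at rate gamma
    dr = gamma * i
    return ds, de, di, dr

n = 1_000_000                              # assumed population size
y0 = (n - 10, 0, 10, 0)                    # start with 10 infectious people
t = np.linspace(0, 180, 181)               # simulate 180 days
beta, sigma, gamma = 0.4, 1 / 5.2, 1 / 10  # transmission, incubation, recovery rates (illustrative)

s, e, i, r = odeint(seir, y0, t, args=(beta, sigma, gamma, n)).T
print(f"Peak infectious: {i.max():,.0f} on day {i.argmax()}")
```

A forecasting recipe would fit the rates to observed case counts and then back-test the fit against what actually happened, which is the feedback loop Sri describes.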
So I think that's a very, very local response, a hyper-local response; no one country and no one community is symmetrical with the others, and I think we have a unique situation where in the United States you have a very, very highly connected world, a highly connected economy, and I think we have quite a problem on our hands on how to safeguard our economy while also safeguarding life. >> Yeah, so you can't just take Norway and apply it, or South Korea and apply it, every situation is different. And then I want to ask you about, you know, the economy in terms of, you know, how much can AI actually, you know, how can it work in this situation? Where you have, you know, for example, okay, so the Fed, yes, it started doing asset buys back in 2008, but still, very hard to predict. I mean, at the time of this interview, you know, the stock market is up 900 points, very difficult to predict that, but some event happens in the morning, somebody, you know, Powell says something positive and it goes crazy. But just sort of even modeling out the V recovery, the W recovery, deep recession, the comeback. You have to have enough data, do you not, in order for AI to be reasonably accurate? How does it work? And at what pace can you iterate and improve on the models? >> So I think that's exactly where I would say continuous modeling, continuous learning, that's where the vision of the world is headed: data is coming in, you build a model, and then you iterate, try it out and come back. That kind of rapid, continuous learning would probably be needed for all our models, as opposed to the typical, I'm pushing a model to production once a year, or once every quarter. I think what we're beginning to see is where companies are beginning to kind of plan out. A lot of people lost their jobs in the last couple of months, right, sort of. And so upskilling and trying to kind of bring these jobs back, both from the manufacturing side, but also, we lost a lot of jobs in transportation and the airlines-slash-hotel industries, right, sort of. So it's about trying to bring back that sense of confidence, and it will take a lot more testing, a lot more masking, a lot more social empathy. I think some of the things that we are missing while we are socially distant, we know that we are so connected as a species, we need to kind of start having that empathy: we need to wear a mask not for ourselves, but for our neighbors and the people we may run into. And I think that same kind of thinking has to kind of pervade before we can open up the economy in a big way. The data, I mean, we can do a lot of transfer learning, right, sort of. There are new methods, like trying to model it similar to 1918, where we had a second bump, or a lot of little bumps, and that's kind of where your W-shaped pieces come from, but governments are doing very well in seeing stimulus dollars being pumped through banks. So one of the use cases we're looking at for banks is which small and medium businesses, especially in unsecured lending, which businesses to lend to, (mumbles) there are so many applications that have come to banks across the world, it's not just in the US, and banks are caught up with the problem of which of these businesses are a going concern, and are they really accurate about the number of employees they say they have?
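Sri's point earlier in this answer about continuous modeling, retraining as new data arrives and back-testing against what actually happened, can be made concrete with a small sketch. This is an illustrative rolling-retrain loop, not H2O's product code; the weekly cadence, the column names, and the model choice are assumptions.

```python
# Illustrative continuous-modeling loop: retrain each week on all data seen so far,
# forecast the next week, and back-test against what actually happened.
# Assumes a pandas DataFrame `df` with a 'week' column, feature columns, and a target column.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

def rolling_backtest(df, features, target="target"):
    scores = []
    weeks = sorted(df["week"].unique())
    for cutoff in weeks[4:]:                       # need a few weeks of history first
        train = df[df["week"] < cutoff]            # everything observed so far
        test = df[df["week"] == cutoff]            # the week we are predicting
        model = GradientBoostingRegressor().fit(train[features], train[target])
        err = mean_absolute_error(test[target], model.predict(test[features]))
        scores.append((cutoff, err))               # keep the back-test error per week
    return pd.DataFrame(scores, columns=["week", "mae"])
```

The point is the cadence: instead of pushing a model to production once a quarter, the model is rebuilt every cycle and judged on how well the previous cycle's forecast held up.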
Then the next-level problems, around forbearance and mortgages, that side of things is coming up at some of these banks as well. One of the questions one of our customers, Wells Fargo, is looking at is which branches to open, right, sort of, and that itself needs a different kind of modeling. So everything has become very highly segmented models, and so AI is absolutely not just a good-to-have, it has become a must-have for most of our customers in how they go about their business. (mumbles) >> I want to talk a little bit about your business. You have been on a mission to democratize AI since the beginning, open source. Explain your business model, how you guys make money, and then I want to help people understand some basic comparisons and current affairs. >> Yeah, that's great. I think the last time we spoke was probably at the Spark Summit, Dave, and we were talking about Sparkling Water and H2O, our open source platforms, which are premier platforms for democratizing machine learning and math at scale, and that's been a tremendous brand for us. Over the last couple of years, we have essentially built a platform called Driverless AI, which is licensed software that automates machine learning. We took the best practices of all these data scientists and combined them to essentially build recipes that allow people to build the best forecasting models, the best fraud prevention models, or the best recommendation engines, and so we started augmenting traditional data scientists with this automatic machine learning, called AutoML, that essentially allows them to build models without necessarily having the same level of talent as these great Kaggle Grand Masters. And so that has democratized, allowed ordinary companies to start producing models of high caliber and high quality that would otherwise have been the pedigree of Google, Microsoft or Amazon, or some of these top-tier AI houses like Netflix and others. So what we've done is democratize not just the algorithms at the open source level. Now we've made it easy for kind of rapid adoption of AI across every branch inside a company, a large organization, and also across smaller organizations which don't have access to the same kind of talent. Now, the third level, you know, what we've brought to market is the ability to augment data sets, especially public and private data sets, the alternative data sets that can increase the signal. And that's where we've started working on a new platform called Q, again, more licensed software. And, I mean, to give you an idea there from a business model standpoint, now the majority of our software sales is coming from closed source software. And so we've made that transition; we still make our open source widely accessible, we continue to improve it, a large chunk of the teams are improving it and participating in building the communities, but I think from a business model standpoint, as of last year, 51% of our revenues are now coming from closed source software, and that share is continuing to grow. >> And this is the point I wanted to get to. So you know, the open source model, Red Hat was the one company that, you know, succeeded wildly, and it was: put it out there open source, come up with a service, maintain the software, you've got to buy the subscription, okay, fine. And everybody thought that, you know, you were going to do that, they thought that's what Databricks was going to do, and that changed.
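Sri describes AutoML above: automatically trying many model families and ranking them so that teams without Kaggle Grand Masters can still get strong models. Driverless AI itself is closed source, but the open source h2o Python package exposes the same idea; the sketch below is illustrative, and the file name and column names are assumptions, not a real data set.

```python
# Illustrative AutoML run with the open source h2o package (not Driverless AI itself).
# Assumes a CSV of historical claims with a binary 'fraud' label.
import h2o
from h2o.automl import H2OAutoML

h2o.init()
claims = h2o.import_file("claims_history.csv")        # hypothetical file
claims["fraud"] = claims["fraud"].asfactor()          # treat the target as a classification label
train, test = claims.split_frame(ratios=[0.8], seed=42)

aml = H2OAutoML(max_models=20, max_runtime_secs=600, seed=42)
aml.train(y="fraud", training_frame=train)            # tries GBMs, GLMs, deep nets, stacked ensembles

print(aml.leaderboard.head())                         # models ranked by cross-validated metric
print(aml.leader.model_performance(test).auc())       # hold-out AUC of the best model
```

The leaderboard is the "recipe" idea in miniature: the search over model families and hyperparameters is automated, and the practitioner's job shifts to framing the problem and validating the winner.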
But I want to take two examples, Hortonworks, which kind of took the Red Hat model, and Cloudera, which kept its own IP. And neither really lived up to the expectations, but now there seems to be sort of a new breed, I mentioned you guys, Databricks, there are others, that seem to be working. You with your licensed software model, Databricks with a managed service, and so it's becoming clear that there's got to be some level of IP that can be licensed in order to really thrive in the open source community, to be able to fund the committers that you have to put forth to open source. I wonder if you could give me your thoughts on that narrative. >> So on Driverless AI, which is the closed source platform I mentioned, we opened up the layers in open source as recipes. So for example, different companies handle their zip codes differently, right, so domain-specific recipes, we put about 150 of them in open source, again on top of our Driverless AI platform. And the idea there is that open source is about freedom, right? It's not a philosophy, it's not a business model; it allows freedom for rapid adoption of a platform and complete democratization and commodification of a space. And that allows a small company like ours to compete at the level of a SAS or a Google or a Microsoft, because you have the same level of voice as a very large company, and you're focused on using code as a community-building exercise as opposed to a business model, right? So that's kind of the heart of open source, allowing that freedom for our end users and customers to kind of innovate at the same level as a Silicon Valley company or one of these large tech giants building software. So it's really a maker culture, as opposed to a consumer culture, around software. Now, if you look at the Red Hat model, and the others who have tried to replicate it, the difficult part there was: if the product is very good, customers are self-sufficient, and if it becomes a standard, then customers know how to use it. If the product is crippled or difficult to use, then you put in a lot of services, and that's where you saw the classic Hadoop companies get pulled into a lot of services, which is a reasonably difficult business to scale. So what we chose instead was a great product that builds a fantastic brand that makes AI accessible; we were one of the first or second .ai domains, and for us to see thousands of companies which are not AI become AI-first, and even more companies adopting AI and talking about AI in a major way, that was possible because of open source. If we had chosen closed source, and many of our peers did, they all vanished. So that's kind of how open source is really about building the ecosystem and having the patience to build a company that takes 10, 20 years to build. And what the market expects, unfortunately, is a fast rise up to become a unicorn. In that race, you essentially sacrifice building a long ecosystem play, and that ecosystem play is kind of what we chose to do, and that took a little longer. Now, if you think about how you truly monetize open source, it takes a little longer and it's a much more difficult sales machine to scale, right, sort of. Our open source business actually is a reasonably positive-EBITDA business because it makes more money than we spend on it. But trying to teach sales teams how to sell open source, that's a rate-limiting step.
And that's why we chose, and it's also about explaining to investors how open source is invested in as you go closer to the IPO markets, that's where we chose: let's go with a licensed software model and scale that as a regular business. >> So I've said a few times, it's kind of ironic that this pandemic comes just as we're entering a new decade. You know, we're exiting the era, I mean, the many, many decades of Moore's Law being the source of innovation, and now it's a combination of data, applying machine intelligence, and being able to scale with cloud. Well, my question is, what do we expect out of AI this decade, if those are sort of the three, the cocktail of innovation, if you will, what should we expect? Is it really just about, I suggest, automating, you know, businesses, giving them more agility, flexibility, you know, etc.? Or should we expect more from AI this decade? >> Well, I mean, if you think about the decade starting 2010, 2011, that was defined by software is eating the world, right? And now you can say software is the world, right? I mean, pretty much almost everything is digital. And AI is eating software, right? (mumbling) A lot of cloud transitions are happening, and are now happening at a much faster rate, but cloud and AI are kind of leading each other; AI is essentially one of the biggest drivers of cloud adoption for many of our customers. So in the enterprise world, you're seeing the rebuilding of a lot of fast, data-driven applications that use AI; instead of rule-based software, you're beginning to see pattern-based, machine learning AI software, and you're seeing that in spades. And, of course, that is just the tip of the iceberg. AI has been with us for 100 years, and it's going to be ahead of us another hundred years, right, sort of. It is really, fundamentally, a math movement, and a math movement at the beginning of a century leads to 100 years of phenomenal discovery. So AI is essentially making discoveries faster. AI is producing entertainment, AI is producing music, AI is producing choreography, you're seeing AI in every walk of life, AI summarization of Zoom meetings, right, you're beginning to see a lot of AI-enabled ETF picking of stocks, right, sort of. We reprice 20,000 corporate bonds every 15 seconds using H2O AI. And one of our customers is one of the fastest growing stocks; mostly, AI is powering a lot of these insights in a fast-changing world which is globally connected. No one of us is able to combine all the multiple dimensions that are changing, and AI has that incredible opportunity to be a partner for every... (mumbling) For a hospital looking at what the second half will look like, for physicians looking at what is the sentiment of... what is the surge to expect, to what is the market demand, looking at the sentiment of the customers. AI is the ultimate Moneyball in business, and I think it's just showing its depth at this point. >> Yeah, I mean, I think you're right on. I mean, basically AI is going to transform every piece of software, every application, or those tools aren't going to have much use. Sri, we've got to go, but thanks so much for coming on theCUBE and the great work you guys are doing. Really appreciate your insights. Stay safe, and best of luck to you guys. >> Likewise, thank you so much. >> You're welcome, and thank you for watching, everybody, this is Dave Vellante for the CXO series on theCUBE.
We'll see you next time.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave | PERSON | 0.99+ |
2008 | DATE | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Wells Fargo | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
San Francisco | LOCATION | 0.99+ |
Prague | LOCATION | 0.99+ |
Brooklyn | LOCATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
51% | QUANTITY | 0.99+ |
May 2020 | DATE | 0.99+ |
China | LOCATION | 0.99+ |
United States | LOCATION | 0.99+ |
100 years | QUANTITY | 0.99+ |
Bronx | LOCATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
Manhattan | LOCATION | 0.99+ |
US | LOCATION | 0.99+ |
Santa Clara | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
10% | QUANTITY | 0.99+ |
20,000 bonds | QUANTITY | 0.99+ |
Imperial College London | ORGANIZATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
One | QUANTITY | 0.99+ |
COVID-19 | OTHER | 0.99+ |
Los Angeles | LOCATION | 0.99+ |
Netflix | ORGANIZATION | 0.99+ |
H20 | ORGANIZATION | 0.99+ |
Red Hat | ORGANIZATION | 0.99+ |
South Korea | LOCATION | 0.99+ |
Sri Satish Ambati | PERSON | 0.99+ |
thousands | QUANTITY | 0.99+ |
FEMA | ORGANIZATION | 0.99+ |
Brazil | LOCATION | 0.99+ |
second half | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
second surge | QUANTITY | 0.99+ |
two months | QUANTITY | 0.99+ |
one | QUANTITY | 0.98+ |
second bump | QUANTITY | 0.98+ |
two things | QUANTITY | 0.98+ |
H2O | ORGANIZATION | 0.98+ |
both | QUANTITY | 0.98+ |
Czech Republic | LOCATION | 0.98+ |
Silicon Valley | LOCATION | 0.98+ |
TITLE | 0.98+ | |
three | QUANTITY | 0.98+ |
hundred years | QUANTITY | 0.98+ |
once a year | QUANTITY | 0.97+ |
Powell | PERSON | 0.97+ |
Sparkling Water | ORGANIZATION | 0.97+ |
Alipay | TITLE | 0.97+ |
Norway | LOCATION | 0.97+ |
pandemic | EVENT | 0.97+ |
second order | QUANTITY | 0.97+ |
third level | QUANTITY | 0.97+ |
first folks | QUANTITY | 0.97+ |
COVID-19 crisis | EVENT | 0.96+ |
Fed | ORGANIZATION | 0.95+ |
1918 | DATE | 0.95+ |
later this month | DATE | 0.95+ |
one side | QUANTITY | 0.94+ |
Sri Ambati | PERSON | 0.94+ |
two examples | QUANTITY | 0.93+ |
Moore | PERSON | 0.92+ |
Californians | PERSON | 0.92+ |
CXO | TITLE | 0.92+ |
last couple of months | DATE | 0.92+ |
COVID | OTHER | 0.91+ |
Spark Summit | EVENT | 0.91+ |
one step | QUANTITY | 0.91+ |
The Hammer | TITLE | 0.9+ |
COVID crisis | EVENT | 0.87+ |
every 15 seconds | QUANTITY | 0.86+ |
Christian Keynote | Snowflake Data Cloud Summit
(upbeat music) >> Hi everyone, thank you for joining us at the Data Cloud Summit. The last couple of months have been an exciting time at Snowflake. And yet, what's even more compelling to all of us at Snowflake is what's ahead. Today I have the opportunity to share new product developments that will extend the reach and impact of our Data Cloud and improve the experience of Snowflake users. Our product strategy is focused on four major areas. First, Data Cloud content. In the Data Cloud, silos are eliminated, and our vision is to bring the world's data within reach of every organization. You'll hear about new data sets and data services available in our data marketplace and see how previous barriers to sourcing and unifying data are eliminated. Second, extensible data pipelines. As you gain frictionless access to a broader set of data through the Data Cloud, Snowflake's platform brings additional capabilities and extensibility to your data pipelines, simplifying data ingestion and transformation. Third, data governance. The Data Cloud eliminates silos and breaks down barriers, and in a world where data collaboration is the norm, the importance of data governance is ratified and elevated. We'll share new advancements to support how the world's most demanding organizations mobilize your data while maintaining high standards of compliance and governance. Finally, our fourth area focuses on platform performance and capabilities. We remain laser focused on continuing to lead with the most performant and capable data platform. We have some exciting news to share about the core engine of Snowflake. As always, we love showing you Snowflake in action, and we prepared some demos for you. Also, we'll keep coming back to the fact that one of the characteristics of Snowflake that we're proudest of is that we offer a single platform from which you can operate all of your data workloads, across clouds and across regions. Which workloads, you may ask? Specifically, data warehousing, data lake, data science, data engineering, data applications, and data sharing. Snowflake makes it possible to mobilize all your data in service of your business without the cost, complexity and overhead of managing multiple systems, tools and vendors. Let's dive in. As you heard from Frank, the Data Cloud offers a unique capability to connect organizations and create collaboration and innovation across industries, fueled by data. The Snowflake data marketplace is the gateway to the Data Cloud, providing visibility for organizations to browse and discover data that can help them make better decisions. For data providers on the marketplace, there is a new opportunity to reach new customers, create new revenue streams, and radically decrease the effort and time to data delivery. Our marketplace dramatically reduces the friction of sharing and collaborating with data, opening up new possibilities to all participants in the Data Cloud. We introduced the Snowflake data marketplace in 2019, and it is now home to over 100 data providers, with half of them having joined the marketplace in the last four months. Since our most recent product announcements in June, we have continued broadening the availability of the data marketplace across regions and across clouds. Our data marketplace provides the opportunity for data providers to reach consumers across cloud and regional boundaries. A critical aspect of the Data Cloud is that we envision organizations collaborating not just in terms of data, but also data-powered applications and services. Think of instances where a provider doesn't want to open access to the entirety of a data set, but wants to provide access to business logic that has access to and leverages such a data set. That is what we call data services. And we want Snowflake to be the platform of choice for developing, discovering and consuming such rich building blocks. To see how the data marketplace comes to life, and in particular one of these data services, let's jump into a demo. For all of our demos today, we're going to put ourselves in the shoes of a fictional global insurance company. We've called it Insureco. Insurance is a data-intensive and highly regulated industry. Having the right access control and insight from data is core to every insurance company's success. I'm going to turn it over to Prasanna to show how the Snowflake data marketplace can solve a data discoverability and access problem. >>
A critical aspect of the Data Cloud is that we envisioned organizations collaborating not just in terms of data, but also data powered applications and services. Think of instances where a provider doesn't want to open access to the entirety of a data set, but wants to provide access to business logic that has access and leverages such data set. That is what we call data services. And we want Snowflake to be the platform of choice for developing discovering and consuming such rich building blocks. To see How the data marketplace comes to live, and in particular one of these data services, let's jump into a demo. For all of our demos today, we're going to put ourselves in the shoes of a fictional global insurance company. We've called it Insureco. Insurance is a data intensive and highly regulated industry. Having the right access control and insight from data is core to every insurance company's success. I'm going to turn it over to Prasanna to show how the Snowflake data marketplace can solve a data discoverability and access problem. >> Let's look at how Insureco can leverage data and data services from the Snowflake data marketplace and use it in conjunction with its own data in the Data Cloud to do three things, better detect fraudulent claims, arm its agents with the right information, and benchmark business health against competition. Let's start with detecting fraudulent claims. I'm an analyst in the Claims Department. I have auto claims data in my account. I can see there are 2000 auto claims, many of these submitted by auto body shops. I need to determine if they are valid and legitimate. In particular, could some of these be insurance fraud? By going to the Snowflake data marketplace where numerous data providers and data service providers can list their offerings, I find the quantifying data service. It uses a combination of external data sources and predictive risk typology models to inform the risk level of an organization. Quantifying external sources include sanctions and blacklists, negative news, social media, and real time search engine results. That's a wealth of data and models built on that data which we don't have internally. So I'd like to use Quantifind to determine a fraud risk score for each auto body shop that has submitted a claim. First, the Snowflake data marketplace made it really easy for me to discover a data service like this. Without the data marketplace, finding such a service would be a lengthy ad hoc process of doing web searches and asking around. Second, once I find Quantifind, I can use Quantifind service against my own data in three simple steps using data sharing. I create a table with the names and addresses of auto body shops that have submitted claims. I then share the table with Quantifind to start the risk assessment. Quantifind does the risk scoring and shares the data back with me. Quantifind uses external functions which we introduced in June to get results from their risk prediction models. Without Snowflake data sharing, we would have had to contact Quantifind to understand what format they wanted the data in, then extract this data into a file, FTP the file to Quantifind, wait for the results, then ingest the results back into our systems for them to be usable. Or I would have had to write code to call Quantifinds API. All of that would have taken days. In contrast, with data sharing, I can set this up in minutes. 
What's more, now that I have set this up, as new claims are added in the future, they will automatically leverage Quantifind's data service. I view the scores returned by Quantifind and see that two entities in my claims data have a high score for insurance fraud risk. I open up the link returned by Quantifind to read more, and find that this organization has been involved in an insurance crime ring. Looks like that is a claim that we won't be approving. Using the Quantifind data service through the Snowflake data marketplace gives me access to a risk scoring capability that we don't have in house, without having to call custom APIs. For a provider like Quantifind, this drives new leads and monetization opportunities. Now that I have identified potentially fraudulent claims, let's move on to the second part. I would like to share this fraud risk information with the agents who sold the corresponding policies. To do this, I need two things. First, I need to find the agents who sold these policies. Then I need to share with these agents the fraud risk information that we got from Quantifind. But I want to share it such that each agent only sees the fraud risk information corresponding to claims for policies that they wrote. To find agents who sold these policies, I need to look up our Salesforce data. I can find this easily within InsureCo's internal data exchange. I see there's a listing with Salesforce data. Our Sales Ops team has published this listing, so I know it's our officially blessed data set, and I can immediately access it from my Snowflake account without copying any data or having to set up ETL. I can now join Salesforce data with my claims to identify the agents for the policies that were flagged to have fraudulent claims. I also have the Snowflake account information for each agent. Next, I create a secure view that joins on an entitlements table, such that each agent can only see the rows corresponding to policies that they have sold. I then share this directly with the agents. This share contains the secure view that I created, with the names of the auto body shops and the fraud risk identified by Quantifind. Finally, let's move on to the third and last part. Now that I have detected potentially fraudulent claims, I'm going to move on to building a dashboard that our executives have been asking for. They want to see how InsureCo compares against other auto insurance companies on key metrics, like total claims paid out for the auto insurance line of business nationwide. I go to the Snowflake data marketplace and find SNL U.S. Insurance Statutory Data from S&P. This data is included with InsureCo's existing subscription with S&P, so when I request access to it, S&P can immediately share this data with me through Snowflake data sharing. I create a virtual database from the share, and I'm ready to query this data, no ETL needed. And since this is a virtual database, pointing to the original data in S&P's Snowflake account, I have access to the latest data as it arrives in S&P's account. I see that the SNL U.S. Insurance Statutory Data from S&P has data on assets, premiums earned, and claims paid out by each US insurance company in 2019. This data is broken up by line of business and geography, and in many cases goes beyond the data that would be available from public financial filings. This is exactly the data I need. I identify a subset of comparable insurance companies whose net total assets are within 20% of InsureCo's, and whose lines of business are similar to ours.
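The per-agent sharing step above hinges on a secure view joined to an entitlements table. A rough sketch of what that could look like follows; the table and column names are illustrative, not the demo's actual schema.

```sql
-- Entitlements: which Snowflake account belongs to which agent
CREATE TABLE claims_db.public.agent_entitlements (
  agent_id          STRING,
  snowflake_account STRING
);

-- Secure view: each consuming account sees only the rows for its own agent
CREATE SECURE VIEW claims_db.public.agent_fraud_alerts AS
SELECT c.claim_id, c.body_shop_name, c.fraud_risk_score
FROM scored_claims c
JOIN policies p           ON p.policy_id = c.policy_id
JOIN agent_entitlements e ON e.agent_id  = p.agent_id
WHERE e.snowflake_account = CURRENT_ACCOUNT();

-- Publish the view through a share that the agents' accounts can mount
CREATE SHARE agent_fraud_share;
GRANT USAGE ON DATABASE claims_db TO SHARE agent_fraud_share;
GRANT USAGE ON SCHEMA claims_db.public TO SHARE agent_fraud_share;
GRANT SELECT ON VIEW claims_db.public.agent_fraud_alerts TO SHARE agent_fraud_share;
```

Because the filter runs inside the secure view, the row restriction travels with the share to every agent account that mounts it.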
I can now create a Snowsight dashboard that compares InsureCo against similar insurance companies on key metrics, like net earned premiums and net claims paid out in 2019 for auto insurance. I can see that while we are below median on net earned premiums, we are doing better than our competition on total claims paid out in 2019, which could be a reflection of our improved claims handling and fraud detection. That's a good insight that I can share with our executives. In summary, the Data Cloud enabled me to do three key things. First, seamlessly find data and data services that I need to do my job, be it an external data service like Quantifind, an external data set from S&P, or internal data from InsureCo's data exchange. Second, get immediate live access to this data. And third, control and manage collaboration around this data. With Snowflake, I can mobilize data and data services across my business ecosystem in just minutes. >> Thank you Prasanna. Now I want to turn our focus to extensible data pipelines. We believe there are two different and important ways of making Snowflake's platform highly extensible. First, by enabling teams to leverage services or business logic that live outside of Snowflake, interacting with data within Snowflake. We do this through a feature called external functions, a mechanism to conveniently bring data to where the computation is. We announced this feature for calling regional endpoints via AWS API Gateway in June, and it's currently available in public preview. We are also now in public preview supporting Azure API Management and will soon support Google API Gateway and AWS private endpoints. The second extensibility mechanism does the converse. It brings the computation to Snowflake, to run closer to the data. We will do this by enabling the creation of functions and procedures in SQL, Java, Scala or Python, ultimately providing choice based on the programming language preference for you or your organization. You will see Java, Scala and Python available through private and public previews in the future. The possibilities enabled by these extensibility features are broad and powerful. However, our commitment to being a great platform for data engineers, data scientists and developers goes far beyond programming language. Today, I am delighted to announce Snowpark, a family of libraries that will bring a new experience to programming data in Snowflake. Snowpark enables you to write code directly against Snowflake in a way that is deeply integrated into the languages I mentioned earlier, using familiar concepts like DataFrames. But the most important aspect of Snowpark is that it has been designed and optimized to leverage the Snowflake engine with its main characteristics and benefits: performance, reliability, and scalability with near zero maintenance. Think of the power of declarative SQL statements available through a well-known API in Scala, Java or Python, all of this against data governed in your core data platform. We believe Snowpark will be transformative for data programmability. I'd like to introduce Sri to showcase how our fictitious insurance company InsureCo will be able to take advantage of the Snowpark API for data science workloads. >> Thanks, Christian. Hi, everyone. I'm Sri Chintala, a product manager at Snowflake focused on extensible data pipelines. And today, I'm very excited to show you a preview of Snowpark. In our first demo, we saw how InsureCo could identify potentially fraudulent claims.
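(As background for both the Quantifind demo earlier and the Snowpark demo that follows, the external functions Christian mentioned are declared in SQL roughly as sketched below. The integration name, IAM role, and endpoint URL are placeholders, not values from the keynote.)

```sql
-- An API integration pointing at the account's API gateway
CREATE OR REPLACE API INTEGRATION scoring_api
  API_PROVIDER = aws_api_gateway
  API_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/external-fn-role'
  API_ALLOWED_PREFIXES = ('https://abc123.execute-api.us-east-1.amazonaws.com/prod/')
  ENABLED = TRUE;

-- An external function that proxies rows to the remote service and returns its response
CREATE OR REPLACE EXTERNAL FUNCTION risk_score(name STRING, address STRING)
  RETURNS VARIANT
  API_INTEGRATION = scoring_api
  AS 'https://abc123.execute-api.us-east-1.amazonaws.com/prod/score';

-- Used like any other scalar function, row by row
SELECT shop_name, risk_score(shop_name, shop_address) AS risk
FROM claims_body_shops;
```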
Now, for all the valid claims, InsureCo wants to ensure they're providing excellent customer service. To do that, they put in place a system to transcribe all of their customer calls, so they can look for patterns. A simple thing they'd like to do is detect the sentiment of each call, so they can tell which calls were good and which were problematic. They can then better train their claim agents for challenging calls. Let's take a quick look at the work they've done so far. InsureCo's data science team used Snowflake's external functions to quickly and easily train a machine learning model in H2O.ai. Snowflake has direct integrations with H2O and many other data science providers, giving InsureCo the flexibility to use a wide variety of data science libraries, frameworks or tools to train their model. Now that the team has a custom trained sentiment model tailored to their specific claims data, let's see how a data engineer at InsureCo can use Snowpark to build a data pipeline that scores customer call logs using the model, hosted right inside of Snowflake. As you can see, we have the transcribed call logs stored in the customer call logs table inside Snowflake. Now, as a data engineer trained in Scala, and used to working with systems like Spark and Pandas, I want to use familiar programming concepts to build my pipeline. Snowpark solves for this by letting me use popular programming languages like Java or Scala. It also provides familiar concepts and APIs, such as the DataFrame abstraction, optimized to leverage and run natively on the Snowflake engine. So here I am in my IDE, where I've written a simple Scala program using the Snowpark libraries. The first step in using the Snowpark API is establishing a session with Snowflake. I use the session builder object and specify the required details to connect. Now, I can create a DataFrame for the data in the transcripts column of the customer call logs table. As you can see, the Snowpark API provides native language constructs for data manipulation. Here, I use the select method provided by the API to specify the column names to return, rather than writing "select transcripts" as a string. By using the native language constructs provided by the API, I benefit from features like IntelliSense and type checking. Here you can see some of the other common methods that the DataFrame class offers, like filter, join and others. Next, I define a get sentiment user-defined function that will return a sentiment score for an input string by using our pre-trained H2O model. From the UDF, we call the score method that initializes and runs the sentiment model. I've built this helper into a Java file, which, along with the model object and license, are added as dependencies that Snowpark will send to Snowflake for execution. As a developer, this is all programming that I'm familiar with. We can now call our get sentiment function on the transcripts column of the DataFrame and write back the results of the scored transcripts to a new target table. Let's run this code and switch over to Snowflake to see the scored data and also all the work that Snowpark has done for us on the back end. If I do a select star from scored logs, we can see the sentiment score of each call right alongside the transcript. With Snowpark, all the logic in my program is pushed down into Snowflake. I can see in the query history that Snowpark has created a temporary Java function to host the pre-trained H2O model, and that the model is running right in my Snowflake warehouse.
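The temporary Java function Sri points out is generated by Snowpark, but the same pattern can be sketched directly in SQL once Java UDFs are available. The stage, file, and handler names below are illustrative assumptions, not the demo's actual objects.

```sql
-- A Java UDF whose handler wraps the pre-trained sentiment model
CREATE OR REPLACE FUNCTION get_sentiment(transcript STRING)
  RETURNS FLOAT
  LANGUAGE JAVA
  IMPORTS = ('@model_stage/sentiment_udf.jar', '@model_stage/sentiment_model.zip')
  HANDLER = 'SentimentUDF.score';

-- Score every call log and persist the results, entirely inside the warehouse
CREATE OR REPLACE TABLE scored_logs AS
SELECT transcript,
       get_sentiment(transcript) AS sentiment_score
FROM customer_call_logs;
```

Either route, Snowpark or hand-written SQL, ends the same way: the model executes next to the data instead of the data being exported to the model.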
Snowpark has allowed us to do something completely new in Snowflake. Let's recap what we saw. With Snowpark, InsureCo was able to use their preferred programming language, Scala, and use the familiar DataFrame constructs to score data using a machine learning model. With support for Java UDFs, they were able to run a trained model natively within Snowflake. And finally, we saw how Snowpark executed computationally intensive data science workloads right within Snowflake. This simplifies InsureCo's data pipeline architecture, as it reduces the number of additional systems they have to manage. We hope that extensibility with Scala, Java and Snowpark will enable our users to work with Snowflake in their preferred way while keeping the architecture simple. We are very excited to see how you use Snowpark to extend your data pipelines. Thank you for watching, and with that, back to you, Christian. >> Thank you Sri. You saw how Sri could utilize Snowpark to efficiently perform advanced sentiment analysis. But of course, if this use case was important to your business, you would want to fully automate this pipeline and analysis. Imagine being able to do all of the following in Snowflake. Your pipeline could start far upstream of what you saw in the demo, by storing your actual customer care call recordings in Snowflake. You may notice that this is new for Snowflake. We'll come back to the idea of storing unstructured data in Snowflake at the end of my talk today. Once you have the data in Snowflake, you can use our streams and tasks capabilities to call an external function to transcribe these files. To simplify this flow even further, we plan to introduce a serverless execution model for tasks, where Snowflake can automatically size and manage resources for you. After this step, you can use the same serverless task to execute sentiment scoring of your transcripts as shown in the demo, with incremental processing as each transcript is created. Finally, you can surface the sentiment score either via Snowsight, or through any tool you use to share insights throughout your organization. In this example, you see data being transformed from a raw asset into a higher level of information that can drive business action, all fully automated, all in Snowflake. Turning back to InsureCo, you know how important data governance is for any major enterprise, but particularly for one in this industry. Insurance companies manage highly sensitive data about their customers, and have some of the strictest requirements for storing and tracking such data, as well as managing and governing it. At Snowflake, we think about governance as the ability to know your data, manage your data and collaborate with confidence. As you saw in our first demo, the Data Cloud enables seamless collaboration, control and access to data via the Snowflake data marketplace. And companies may set up their own data exchanges to create similar collaboration and control across their ecosystems. In future releases, we expect to deliver enhancements that create more visibility into who has access to what data and provide usage information of that data. Today, we are announcing a new capability to help Snowflake users better know and organize your data. This is our new tagging framework. Tagging in Snowflake will allow user-defined metadata to be attached to a variety of objects. We built a broad and robust framework with powerful implications.
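Tag syntax was not shown in the keynote, so the following is only a sketch of what object tagging along these lines could look like; the tag names and values are made up for illustration.

```sql
-- Define tags once, then attach them to objects across the platform
CREATE TAG cost_center;
CREATE TAG sensitivity;

-- Annotate a warehouse for cost tracking
ALTER WAREHOUSE analytics_wh SET TAG cost_center = 'claims-analytics';

-- Annotate a table and a column with a sensitivity classification
ALTER TABLE customer_call_logs SET TAG sensitivity = 'confidential';
ALTER TABLE customer_call_logs MODIFY COLUMN transcript SET TAG sensitivity = 'pii';
```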
Think of the ability to annotate warehouses with cost center information for tracking, or think of annotating tables and columns with sensitivity classifications. Our tagging capability will enable the creation of company-specific business annotations for objects in Snowflake's platform. Another key aspect of data governance in Snowflake is our policy-based framework, where you specify what you want to be true about your data, and Snowflake enforces those policies. We announced one such policy earlier this year, our dynamic data masking capability, which is now available in public preview. Today, we are announcing a great complementary policy to achieve row-level security. To see how row-level security can enhance InsureCo's ability to govern and secure data, I'll hand it over to Artin for a demo. >> Hello, I'm Artin Avanes, Director of Product Management for Snowflake. As Christian has already mentioned, the rise of the Data Cloud greatly accelerates the ability to access and share diverse data, leading to greater data collaboration across teams and organizations. Controlling data access with ease and ensuring compliance at the same time is top of mind for users. Today, I'm thrilled to announce our new row access policies that will allow users to define various rules for accessing data in the Data Cloud. Let's check back in with InsureCo to see some of these in action and highlight how they work with other existing policies one can define in Snowflake. Because InsureCo is a multinational company, it has to take extra measures to ensure data across geographic boundaries is protected to meet a wide range of compliance requirements. The InsureCo team has been asked to segment what data sales team members have access to based on where they are regionally. In order to make this possible, they will use Snowflake's row access policies to implement row-level security. We are going to apply policies for three of InsureCo's sales team members with different roles. Alice, an executive, must be able to view sales data from both North America and Europe. Alex, a North America sales manager, will be limited to access sales data from North America only. And Jordan, a Europe sales manager, will be limited to access sales data from Europe only. As a first step, the security administrator needs to create a lookup table that will be used to determine which data is accessible based on each role. As you can see, the lookup table has the roles and their associated regions, both of which will be used to apply the policies that we will now create. Row access policies are implemented using standard SQL syntax to make it easy for administrators to create policies like the one our administrator is looking to implement. And similar to masking policies, row access policies leverage our flexible and expressive policy language. In this demo, our admin user creates a row access policy that uses the role and region of a user to determine what row-level data they have access to when queries are executed. When users' queries are executed against a table protected by such a row access policy, Snowflake's query engine will dynamically generate and apply the corresponding predicate to filter out rows the user is not supposed to see. With the policy now created, let's log in as our sales users and see if it worked. Recall that as a sales executive, Alice should have the ability to see all rows from North America and Europe. Sure enough, when she runs her query, she can see all rows, so we know the policy is working for her.
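A rough sketch of the pattern Artin walks through, including the masking policy shown in the next part of the demo; the table, role, and region names are illustrative, not the actual demo objects.

```sql
-- Lookup table mapping roles to the regions they are entitled to see
CREATE TABLE region_entitlements (role_name STRING, region STRING);
INSERT INTO region_entitlements VALUES
  ('SALES_EXEC', 'NA'), ('SALES_EXEC', 'EU'),
  ('SALES_NA',   'NA'),
  ('SALES_EU',   'EU');

-- Row access policy: the engine appends this predicate to every query on the table
CREATE ROW ACCESS POLICY sales_region_policy AS (sales_region STRING) RETURNS BOOLEAN ->
  EXISTS (
    SELECT 1
    FROM region_entitlements e
    WHERE e.role_name = CURRENT_ROLE()
      AND e.region    = sales_region
  );

ALTER TABLE sales ADD ROW ACCESS POLICY sales_region_policy ON (region);

-- A masking policy can protect sensitive columns on the same table
CREATE MASKING POLICY mask_contact AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() = 'SALES_ADMIN' THEN val ELSE '*** MASKED ***' END;

ALTER TABLE sales MODIFY COLUMN customer_email SET MASKING POLICY mask_contact;
```

With both policies attached, a plain SELECT against the table returns only the rows the current role is entitled to, with the protected columns masked for everyone outside the privileged role.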
You may also have noticed that some columns are showing masked data. That's because our administrator is also using our previously announced data masking capabilities to protect these data attributes for everyone in sales. When we look at our other users, we should notice that the same columns are also masked for them. As you see, you can easily combine masking and row access policies on the same data sets. Now let's look at Alex, our North American sales manager. Alex runs the same query as Alice; row access policies leverage the lookup table to dynamically generate the corresponding predicates for this query. The result is we see that only the data for North America is visible. Notice too that the same columns are still masked. Finally, let's try Jordan, our European sales manager. Jordan runs the query and the result is only the data for Europe, with the same columns also masked. In June we introduced masking policies, and today you saw row access policies in action. And similar to our masking policies, row access policies in Snowflake will be a first-class capability, integrated seamlessly across all of Snowflake: everywhere you expect it to work, it does. If you're accessing data stored in external tables or semi-structured JSON data, building data pipelines via streams, or planning to leverage Snowflake's data sharing functionality, you will be able to implement complex row access policies for all these diverse use cases and workloads within Snowflake. And with Snowflake's unique replication feature, you can instantly apply these new policies consistently to all of your Snowflake accounts, ensuring governance across regions and even across different clouds. In the future, we plan to demonstrate how to combine our new tagging capabilities with Snowflake's policies, allowing advanced auditing and enforcement of those policies with ease. And with that, let's pass it back over to Christian. >> Thank you Artin. We look forward to making these new tagging and row-level security capabilities available in private preview in the coming months. One last note on the broad area of data governance. A big aspect of the Data Cloud is the mobilization of data to be used across organizations. At the same time, privacy is an important consideration to ensure the protection of sensitive, personal or potentially identifying information. We're working on a set of product capabilities to simplify compliance with privacy-related regulatory requirements, and simplify the process of collaborating with data while preserving privacy. Earlier this year, Snowflake acquired a company called CryptoNumerics to accelerate our efforts on this front, including the identification and anonymization of sensitive data. We look forward to sharing more details in the future. We've just shown you three demos of new and exciting ways to use Snowflake. However, I want to also remind you that our commitment to the core platform has never been greater. As you move workloads on to Snowflake, we know you expect exceptional price performance and continued delivery of new capabilities that benefit every workload. On price performance, we continue to drive performance improvements throughout the platform. Let me give you an example comparing an identical set of customer-submitted queries that ran both in August of 2019 and August of 2020. If I look at the set of queries that took more than one second to compile, 72% of those improved by at least 50%. When we make these improvements, execution time goes down.
And by implication, the required compute time is also reduced. Based on our pricing model of charging for what you use, performance improvements not only deliver faster insights, but also translate into cost savings for you. In addition, we have two new major announcements on performance to share today. First, we announced our search optimization service during our June event. This service, currently in public preview, can be enabled on a table-by-table basis, and is able to dramatically accelerate lookup queries on any column, particularly those not used as clustering columns. We initially support equality comparisons only, and today we're announcing expanded support for searches on values, such as pattern matching within strings. This will unlock a number of additional use cases, such as analytics on log data for performance or security purposes. This expanded support is currently being validated by a few customers in private preview, and will be broadly available in the future. Second, I'd like to introduce a new service that will be in private preview in a future release: the query acceleration service. This new feature will automatically identify and scale out parts of a query that could benefit from additional resources and parallelization. This means that you will be able to realize dramatic improvements in performance. This is especially impactful for data science and other scan-intensive workloads. Using this feature is pretty simple. You define a maximum amount of additional resources that can be recruited by a warehouse for acceleration, and the service decides when it would be beneficial to use them. Given enough resources, a query over a massive data set can see orders of magnitude performance improvement compared to the same query without acceleration enabled. In our own usage of Snowflake, we saw a common query go 15 times faster without changing the warehouse size. All of these performance enhancements are extremely exciting, and you will see continued improvements in the future. We love to innovate and continuously raise the bar on what's possible. More important, we love seeing our customers adopt and benefit from our new capabilities. In June, we announced a number of previews, and we continue to roll those features out and see tremendous adoption, even before reaching general availability. Two of those announcements were the introduction of our geospatial support and policies for dynamic data masking. Both of these features are currently in use by hundreds of customers. The number of tables using our new geography data type recently crossed the hundred thousand mark, and the number of columns with masking policies also recently crossed the same hundred thousand mark. This momentum and level of adoption since our announcements in June is phenomenal. I have one last announcement to highlight today. In 2014, Snowflake transformed the world of data management and analytics by providing a single platform with first-class support for both structured and semi-structured data. Today, we are announcing that Snowflake will be adding support for unstructured data on that same platform. Think of the ability to use Snowflake to store, access and share files. As an example, would you like to leverage the power of SQL to reason over a set of image files? We have a few customers as early adopters and we'll provide additional details in the future. With this, you will be able to leverage Snowflake to mobilize all your data in the Data Cloud.
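For context on how the two performance features just described are switched on, here is a rough sketch. The warehouse and table names are illustrative, and the query acceleration parameters in particular are an assumption, since only the behavior, not the syntax, was described in the keynote.

```sql
-- Search optimization is enabled per table and accelerates selective lookups
ALTER TABLE security_event_logs ADD SEARCH OPTIMIZATION;

-- A point lookup on a non-clustering column can now skip most of the table
SELECT * FROM security_event_logs WHERE session_id = 'a1b2c3d4';

-- Query acceleration is configured at the warehouse level; the parameter names
-- below are assumptions used for illustration only
ALTER WAREHOUSE analytics_wh SET
  ENABLE_QUERY_ACCELERATION = TRUE
  QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;
```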
Our customers rely on Snowflake as the data platform for every part of their business. However, the vision and potential of Snowflake is actually much bigger than the four walls of any organization. Snowflake has created the Data Cloud, a data-connected network with a vision where any Snowflake customer can leverage and mobilize the world's data. Whether it's data sets or data services from traditional data providers or SaaS vendors, our marketplace creates opportunities for you and raises the bar in terms of what is possible. As examples, you can unify data across your supply chain to accelerate your time to market and improve quality. You can build entirely new revenue streams, or collaborate with a consortium on data for good. The possibilities are endless. Every company has the opportunity to gain richer insights, build greater products and deliver better services by reaching beyond the data that it owns. Our vision is to enable every company to leverage the world's data through seamless and governed access. Snowflake is your window into this data network and into this broader opportunity. Welcome to the Data Cloud. (upbeat music)
Matt Carroll, Immuta | CUBEConversation, November 2019
>> From the Silicon Angle Media office, in Boston Massachusetts, it's the Cube. Now, here's your host, Dave Vellante. >> Hi everybody, welcome to this Cube Conversation here in our studios, outside of Boston. My name is Dave Vellante. I'm here with Matt Carroll, who's the CEO of Immuta. Matt, good to see ya. >> Good, nice to have me on. >> So we're going to talk about governance, how to automate governance, data privacy, but let me start with Immuta. What is Immuta, why did you guys start this company? >> Yeah, Immuta is an automated data governance platform. We started this company back in 2014 because we saw a gap in the market to be able to control data. What's changed in the market is that every enterprise wants to leverage their data. Data's the new app. But governments want to regulate it and consumers want to protect it. These were at odds with one another, so we saw a need to create a platform that could meet the needs of everyone. To democratize access to data in the enterprise, but at the same time, provide the necessary controls on the data to enforce any regulation, and ensure that there was transparency as to who is using it and why. >> So let's unpack that a little bit. Just try to dig into the problem here. So we all know about the data explosion, of course, and I often say data used to be a liability, now it's turned into an asset. People used to say get rid of the data, now everybody wants to mine it, and they want to take advantage of it, but that causes privacy concerns for individuals. We've seen this with Facebook and many others. Regulations now come into play, GDPR, different states applying different regulations, so you have all these competing forces. The business guys just want to go and get out to the market, but then the lawyers and the compliance officers and others weigh in. So are you attacking that problem? Maybe you could describe that problem a little further and talk about how you guys... >> Yeah, absolutely. As you described, there are over 150 privacy regulations being proposed across more than 25 states, just in 2019 alone. GDPR has opened the floodgates, if you will, for people to start thinking about how do we want to insert our values into data? How should people use it? And so, the challenge now is, you're right, your most sensitive data in an enterprise is most likely going to give you the most insight into driving your business forward, creating new revenue channels, and being able to optimize your operational expenses. But the challenge is that consumers have awoken to, we're not exactly sure we're okay with that, right? We signed a EULA with you to just use our data for marketing, but now you're using it for other revenue channels? Why? And so, where Immuta is trying to play in there is how do we give the line of business the ability to access that instantaneously, but also give the CISO, the Chief Information Security Officer, and the governance teams the ability to take control back. So it's a delicate balance between speed and safety. And I think what's really happening in the market is we used to think about security from building firewalls; we invested in physical security controls around preventing external adversaries from stealing our data. But now it's not necessarily someone trying to steal it, it's just someone potentially misusing it by accident in the enterprise. And the CISO is having to step in and provide that level of control. And it's also the collision of the cloud and these privacy regulations.
Cause now, we have data everywhere, it's not just in our firewalls. And that's the big challenge. That's the opportunity at hand, democratization of data in the enterprise. The problem is data's not all in the enterprise. Data's in the cloud, data's in SaaS, data's in the infrastructure. >> It's distributed by its very nature. All right, so there's a lot of things I want to follow up on. So first, there's GDPR. When GDPR came out of course, it was May of 2018 I think. It went into effect. It actually came out in 2017, but the penalties didn't take effect till '18. And I thought, okay, maybe this can be a framework for governments around the world and states. It sounds like yeah, sort of, but not really. Maybe there's elements of GDPR that people are adopting, but then it sounds like they're putting in their own twists, which is going to be a nightmare for companies. So, are you not seeing a sort of, GDPR becoming this global standard? It sounds like, no. >> I don't think it's going to be necessarily a global standard, but I do think the spirit of the GDPR, and at the core of it is, why are you using my data? What was the purpose? So traditionally, when we think about using data, we think about all right, who's the user, and what authorizations do they have, right? But now, there's a third question. Sure, you're authorized to see this data, depending on your role or organization, right? But why are you using it? Are you using it for certain business use? Are you using it for personal use? Why are you using this? That's the spirit of GDPR that everyone is adopting across the board. And then of course, each state, or each federal organization, is thinking about their unique lens on it, right? And so you're right. This is going to be incredibly complex. And think about the number of policies being enforced at query time. I'm in my favorite tool, let's just say I'm in Tableau or Looker, right? I'm just some simple analyst, I'm a young kid, I'm 22, my first job, right? And I'm running these queries, I don't know where the data is, right? I don't know what I'm combining. And what we found is on average in these large enterprises, any query at any moment in time might have over 500 thousand policies that need to be enforced in real time. >> Wow. >> And it's only getting worse. We have to automate it. No human can handle all those edge cases. We have to automate. >> So, I want to get into how you guys actually do that. Before I do, there seems to be... There's a lot of confusion in the marketplace. Take the words data management, data protection. All the backup guys are using that term, the database guys use that term, GRC folks use that term, so there's a lot of confusion there. You have all these adjacent markets coming together. You've got the whole governance, risk and compliance space, you've got cyber security, there's privacy concerns, which is kind of two sides of the same coin. How do you see these adjacencies coming together? It seems like you sit in the middle of all that. >> Yeah, welcome to why my marketing budget is getting bigger and bigger. The challenge we're facing now is, I think, who owns the problem, right? The Chief Data Officer is taking on a much larger role in these organizations, the CISO is taking a much larger role in reporting up to the board. You have the line of business who now is almost self-sustaining, they don't have to depend on IT as much any longer because of the cloud and because of the new compute layers that make it easier. So who owns it?
At the end of the day, where we see it is we think there's a next generation of cyber tools that are coming out. We think that the CISO has to own this. And the reason is that the CISO's job is to protect the enterprise from cyber risk. And at the core of cyber risk is data. And they must own the data problem. The CDO must find the data, and explain what that data is, and make sure of its quality, but it is the CISO that must protect the enterprise from these threats. And so, I see us as part of this next wave of cyber tools that are coming out. There's other companies that are equally in our stratosphere, like BigID, we're seeing AWS with Macie doing sensitive data discovery, Google has their data loss prevention service. So the cloud players are starting to see, hey, we've got to identify sensitive data. There's other startups that are saying hey, we got to identify and catalog sensitive data. And for us, we're saying hey, we need to be able to consume all that cataloging, understand what's sensitive, and automatically apply policies to ensure that any regulation in that environment is met. >> I want to ask you about the cloud too. So much to talk to you about here, Matt. So, I also wanted to get your perspective on variances within industries. So you mentioned Chief Data Officers. The ascendancy of the Chief Data Officer started in financial services, healthcare, and government, where we had highly regulated industries. And now it's sort of seeped into more commercial sectors. But in terms of those regulated industries, take healthcare for example. There are specific nuances. Can you talk about what you're seeing in terms of industry variance? >> Yeah, it's a great point. Starting with like, healthcare. What does it mean to be HIPAA compliant anymore? There are different types of devices now where I can point one at your heartbeat from a distance away and have 99 percent accuracy in identifying you, right? It takes three data points in any data set to identify 87 percent of US citizens. If I have your age, sex, location, I can identify you. So, what does it mean anymore to be HIPAA compliant? So the challenge is how do we build guarantees of trust that we've de-identified these data sets, cause we have to use them, right? No one's going to go into a hospital and say, "You know what, I don't want you to save my life, cause I want my data protected," right? No one's ever going to say that. So the challenge we face now across these regulated industries is that the most sensitive data sets are critical for those businesses to operate. So there has to be a compromise. So, what we're trying to do in these organizations is help them leverage their data and build levels of proportionality to access it, right? So, the key isn't to stop people from using data. The key is to build the controls necessary to leverage a small bit of the data. Let's just say we've made it indistinguishable; you can only ask aggregate statistical questions. Well, you know what, we actually found some really interesting things there, we need it to be a little bit more useful. It's this trade-off between privacy and utility. It's a pendulum that swings back and forth. As someone proves I need more of this, you can swing it, or just mask it. I need more of it? All right, we'll just redact certain things. Nope, this is really important, it's going to save someone's life. Okay, completely unmasked, you have the raw data. But it's that control that's necessary in these environments, that's what's missing.
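(To make that pendulum concrete, here is a generic SQL illustration of graduated access levels. This is a conceptual sketch only, not Immuta's policy language or implementation, and every table, column, and role name in it is made up.)

```sql
-- Level 1: aggregate-only access, individuals are indistinguishable
CREATE VIEW patients_aggregate AS
SELECT diagnosis_code,
       COUNT(*) AS patient_count,
       AVG(age) AS avg_age
FROM patients
GROUP BY diagnosis_code
HAVING COUNT(*) >= 10;              -- suppress small, re-identifiable groups

-- Level 2: masked and generalized record-level access
CREATE VIEW patients_masked AS
SELECT sha2(patient_id)                  AS patient_token,  -- pseudonymize
       date_trunc('year', date_of_birth) AS birth_year,     -- generalize age
       left(zip_code, 3)                 AS zip3,           -- coarsen location
       diagnosis_code
FROM patients;

-- Level 3: raw data, reserved for an approved purpose and role
GRANT SELECT ON patients TO ROLE care_team_emergency;
```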
You know, we came out of the US Intelligence community. We understood this better than anyone. Because it was highly regulated, very sensitive data, but we knew we needed the ability to rapidly control it. Well, is this just a hunch, or is this a 9/11 event? And you need the ability to switch like that. That's the difference, and so healthcare is going through a change of, we have all these new algorithms. Like Facebook the other day said, hey, we have machine learning algorithms that can look at MRI scans, and we're going to be better than anyone in the world at identifying these. Do you feel good about giving your data to Facebook? I don't know, but we can maybe provide guaranteed anonymization to them, to prove to the world they're going to do right. That's where we have to get to. >> Well, this is huge, especially for the consumer, cause you just gave several examples. Facebook's going to know a lot about me, a mobile device, a Fitbit, and yet, if I want to get access to my own medical records, it's like Fort Knox to try to get, please, give this to my insurance company. You know, you got to go through all these forms. So, you've got those diverging objectives and so, as a consumer, I want to be able to trust that when I say yes you can use it, go, and I can get access to it, and others can get access to it. I want to understand exactly what it is that you guys do, what you sell. Is it software, is it SaaS, and then let's get into how it works. So what is it? >> Yeah, so we're a software platform. We deploy into any infrastructure, but it is not multi-tenant, so we can deploy on any cloud, or on premises for any customer, and we do that with customers across the world. But if you think about the core of what Immuta is, think of Immuta as a system of record for the CISO or the line of business where I can connect to any data, on any infrastructure, on any compute layer, and we connect into over 61 different storage platforms. We then have built a UI where lawyers... We actually have three lawyers as employees that act as product managers to help any lawyer of any stature take what's on paper, these regulations, these rules and policies, and digitize it, essentially, into active code. So they can build any policy they want on any data in the ecosystem, in the enterprise, and enforce it globally without having to write any code. And then because we're this plane where you can connect any tool to this data, and enforce any regulation because we're the man in the middle, we can audit who is using what data and why, in every action, in any change in policy. So, if you think about it, it's connect any tool to any data, control it with any regulation, and prove compliance in a court of law. >> So you can set the policy at the data set level? >> Correct. >> And so, how does one do that? Can you automate that on the creation of that data set? I mean you've got, you know, dependencies. How does that all work? >> Yeah, a really interesting part of our secret sauce is that, one, we can do that at the column level, we can do it at the row level, we can do it at the cell level. >> So very granular. >> Very, very granular. This is something again, we learned from the US Intelligence community, that we have to have very fine-grained access to every little bit of the data. The reason is that, especially in the age of data, people are going to combine many data sets together.
The challenge isn't enforcing the policy on a static data set, the challenge is enforcing the policy across three data sets where you merge three pieces of data together, which have conflicting policies. What do you do then? That's the beauty of our system. We deal with that policy inheritance, we manage that lineage of the policy, and can tell you here's what the policy will be. >> In other words, you can manage to the highest common denominator as an example. >> Or we can automate it to the lowest common denominator, where you can work in projects together recognizing, hey, we're going to bring someone into the project who's not going to have the same level of access. Everyone else will automatically be brought to the lowest common denominator. But then you share that work with another team and it'll automatically be brought to the highest common denominator. And we've built all these workflows in. That was what was missing, and that's why I call it a system of record. It's really a symbiotic relationship between IT, the data owner, governance, the CISO, who are trying to protect the data, and the consumer, and all they want to do is access the data as fast as possible to make better, more informed decisions. >> So the other mega-trend you have is obviously the superpower of machine intelligence, or artificial intelligence, and then you've got edge devices and machine-to-machine communication, where it's just an explosion of IP addresses and data, and so, it sounds like you guys can attack that problem as well. >> Any of this data coming in on any system, the idea is that eventually it's going to land somewhere, right? And you got to protect it. We call that like rogue data, right? This is why I said earlier, when we talk about data, we have to start thinking about it as it's not in some building anymore. Data's everywhere. It's going to be on a cloud infrastructure, it's going to be on premises, and it's likely, in the future, going to be in many distributed data centers around the world, cause business is global. And so, what's interesting to us is no matter where the data's sitting, we can protect it, we can connect to it, and we allow people to access it. And that's the key thing: it's not about worrying how to lock down your physical infrastructure, it's about logically separating it. And what differentiates us from other people is, one, we don't copy the data, right? That's always the barrier for these types of platforms. We leave the data where it is. The second is we take all those regulations and we can actually, at query time, push them down to where that data is. So rather than bring it to us, we push the policy to the data. And that's what differentiates us from everyone else: it allows us to guarantee that protection, no matter where the data's living. >> So you're essentially virtualizing the data? >> Yeah, yeah. It's virtual views of data, but it's not all the data. What people have to realize is that in the day of apps, we cared about storage. We put all the data into a database, we built some services on top of it and a UI, and it was controlled that way, right? You had all the nice business logic to control it. In the age of data, right? Data is the new app, right? We have all these automation tools, DataRobot, and H2O, and Domino, and Tableau's building all these automation workflows. >> The robotic process automation. >> Yeah, RPA, UiPath, WorkFusion, right?
They're making it easier and easier for any user to connect to any data and then automate the process around it. They don't need an app to build unique workflows, these new tools do that for them. The key is getting to the data. And the challenge with the supply chain of data is that time to data is the most critical aspect of it. Cause the time to insight is perishable. And so, what I always tell people, a little story: I came from the government, I worked in Baghdad, we had 42 minutes to know whether or not a bad guy was in the environment so we could go after him. After that, that data was perishable, right? We didn't know where he was. It's the same thing in the real world. It's like imagine if Google told you, well, in 42 minutes it might be a good time to take 495. (laughter) It's not very useful, I need to know the information now. That's the key. What we see is policy enforcement and regulations are the key barrier to entry. So our ability to rapidly, with no latency, be able to connect anyone to that data and enforce those policies where the data lives, that's the critical nature. >> Okay, so you can apply the policies and you do it quickly, and so now you can help solve the problem. You mentioned cloud before, or on prem. What is the strategy there with regard to various clouds and how do you approach multi-clouds? >> I think cloud, what used to be an infrastructure-as-a-service game, is now becoming a compute game. I think large, regulated enterprises, government, healthcare, financial services, insurance, are all moving to cloud now in a different way. >> What do you mean by that? Cause people think infrastructure as a service, they'll say oh, that's compute, storage and some networking. What do you mean by that? >> I think there's a whole new age of software that's being laid on top of the availability of compute and the availability of storage. That's companies like Databricks, companies like Snowflake, and what they're doing is dramatically changing how people interact with data. The availability zones, the different types of features, the ability to rip and replace legacy warehouses and mainframes. It's changing not just access, but also the types of users that could even come on to leverage this data. And so these enterprises are now thinking through, "How do I move my entire infrastructure of data to them? And what are these new capabilities that I could get out of that?" And that is just happening now. A lot of people have been thinking, "Oh, this has been happening over the past five years," no, the compute game is now the new war. I used to think of like, Big Data, right? Big Data created, everyone started to understand, "Ah, if we've got our data assets together, we can get value." Now they're thinking, "All right, let's move beyond that." The new cloud, as it currently works, is Snowflake and Databricks. What they're thinking about is, "How do I take all your metadata and allow anyone to connect any BI tool, any data science tool, and provide highly performant and highly dependable compute services to process petabytes of data?" It's pretty fantastic. >> And very cost efficient, and being able to scale compute independent of storage, from an architectural perspective. A lot of people claim they can do that, but it doesn't scale the same way. >> Yeah, when you're talking about... Cause that's the thing you got to remember: these financial systems especially, they depend on these transactions.
They cannot go down, and they're processing petabytes of data. That's what the new war is over, is that data in the compute layer. >> And the opportunity for you is that data that can come from anywhere, it's not sitting in a God box, where you can enforce policies on that corpus. You don't know where it's coming from. >> We want to be invisible to that, right? You're using Snowflake, it's just automatically enforced. You're using Databricks, it's automatically enforced. All these policies are enforced in flight. No one should even truly care about us. We just want to allow you to use the data the way you're used to using it. >> And you do this, this secret sauce you talked about is math, it's artificial intelligence? >> It's math. I wish I could say it was like super fancy, unsupervised neural nets or what not, it's 15 years of working in the most regulated, sticky environments. We learned about very simple, novel ways of pushing it down. Great engineering's always simple. But what we've done is... At query time, what's really neat is we figured out a way to take user attributes from an identity management system and combine that with a purpose, and then what we do is we've built all these libraries to connect into all these disparate storage and compute systems, to push it in there. The nice thing about that is, prior to this, what people were doing was making copies. They'd go to the data engineering team and they'd say hey, "I need to ETL this and get a copy and it'll be anonymized." Think about that for a second. One, the load on your production systems, of all these copies, all the time, right? The second is, for the CISO, the surface area. Now you've got all this data that in a snapshot in time is legal and ethical, but that might change tomorrow. And so, now you've got an increased surface area of risk. Hence that no-copy aspect. So the pushing it down and then the no-copy aspect really changed the game for enterprises. >> And you've got provenance issues, like you say. You've got governance and compliance. >> And imagine trying, if someone said to you, imagine Congress said hey, "Any data source that you've processed over the past five years, I want to know if there were these three people in any of these data sources and if there were, who touched that data and why did they touch it?" >> Yeah, and storage is cheap, but there's unintended consequences. People aren't, and management isn't. >> We just don't have a unified way to look at all of the logs cross-listed. >> So we started to talk about cloud and then I took you down a different path. But you offer your software on any cloud, is that right? >> Yeah, so right now, we are in production on the AWS Marketplace. And that is a managed service, so you can go deploy it there, it'll go into your VPC, and we can manage the updates for you, we have no insight into your infrastructure, but we can push those updates, it'll automatically update, so you're getting our quarterly releases, we release every season. But yeah, we started with AWS, and then we will grow out. We see cloud is just too ubiquitous. Currently, we also support BigQuery, Dataproc, and on Azure, Data Lake Storage Gen2, as well as Azure Databricks. But you can get us through the AWS Marketplace. We're also investing in re:Invent, we'll be out there in Vegas in a couple weeks. It's a big event for us just because obviously, the government has a very big stake in AWS, but also commercial customers. It's been a massive endeavor to move; we've seen lots of infrastructure.
Most of our deals now are on cloud infrastructure. >> Great, so tell us about the company. You've raised, I think in a Series B, about 28 million to date. Maybe you could give us the headcount, and whatever you can share about momentum, maybe customer examples. >> Yeah, so we've raised 32 million to date. >> 32 million. >> From some great investors. The company's about 70 people now. So not too big, but not small anymore. Just this year, at this point, I haven't closed my fiscal year, so I don't want to give too much, but we've doubled our ARR and we've tripled our logo count this year alone, and we've still got one more quarter here. We just started our fourth quarter. And some customer cases, the way I think about our business is I love healthcare, I love government, I love finance. To give you an example, Cognoa is a really great one. Cognoa, and what they're trying to solve is, can they predict where a child is on the autism spectrum? And they're trying to use machine learning to be able to narrow these children down so that they can see patterns as to how a provider, a therapist, is helping these families give these kids the skills to operate in the real world. And so it's like this symbiotic relationship utilizing software, surveys and video and whatnot, to help connect these kids that are in similar areas of the spectrum, to help say hey, this is a successful treatment, right? The problem with that is we need lots of training data. And this is children, one; two, this is healthcare; and so, how do you guarantee HIPAA compliance? How do you get through FDA trials, through third-party blind testing? And still continue to validate and retrain your models, while protecting the identity of these children? So we provide a platform where we can anonymize all the data for them, we can guarantee that there are blind studies, where the company doesn't have access to certain subsets of the data. We can also then connect providers to gain access to the HIPAA data as needed. We can automate the whole thing for them. And they're a startup too, they're about 100 people. But imagine if you were a startup in this health-tech industry and you had to invest in the backend infrastructure to handle all of that. It's too expensive. What we're unlocking for them, I mean yes, it's great that they're HIPAA compliant and all that, that's what we want, right? But the more important thing is, we're providing a value add: the ability to innovate in areas utilizing machine learning that regulations would've otherwise stymied, right? We're allowing startups in that ecosystem to really push us forward and help those families. >> Cause HIPAA compliance is table stakes, it's compulsory. But now you're talking about enabling new business models. >> Yeah, yeah exactly. >> How did you get into all this? You're CEO, you're business savvy, but it sounds like you're pretty technical as well. What's your background? >> Yeah I mean, so I worked in the intelligence community before this. And most of my focus was on how do we take data and be able to leverage it, from counter-terrorism missions to different non-kinetic operations. And so, where I kind of grew up is in this age of, think about, billions of dollars in Baghdad. Where I learned is that through the computing infrastructure there, everything changed. 2006 Baghdad created this boom of technology. We had drones, right? We had all these devices on our trucks that were collecting information in real time and telling us things.
And then we started building computing infrastructure, and it birthed Hadoop. So, I kind of grew up in this era of Big Data. We were collecting it all, we had no idea what to do with it. We had nowhere to process it. And so, I kind of saw like, there's a problem here. If we can find the unique little, you know, nuggets of information out of that, we can make some really smart decisions and save lives. So once I left that community, I kind of dedicated myself to that. The birth of this company, again, was spun out of the US Intelligence community, and it was really a simple problem. It was, they had a bunch of data scientists that couldn't access data fast enough. So they couldn't solve problems at the speed they needed to. It took four to six months to get to data; the mission said they needed it in less than 72 hours. So the two were orthogonal to one another, and so it was very clear we had to solve that problem fast. So there was that weird world of very secure, really sensitive data, but also the success that we saw from using data. It was so obvious that we needed to democratize access to data, but we needed to do it securely and we needed to be able to prove it. We worked with more lawyers in the intelligence community than you could ever imagine, so the goal was always, how do we make a lawyer happy? If you figure that problem out, you have some success, and I think we've done it. >> Well that's awesome in applying that example to the commercial business world. Scott McNealy's famous for saying there is no privacy on the internet, get over it. Well guess what, people aren't going to get over it. It's the individuals that are much more concerned with it after the whole Facebook and fake news debacle. And as well, organizations putting data in the cloud. They need to govern their data, they need that privacy. So Matt, thanks very much for sharing with us your perspectives on the market, and the best of luck with Immuta. >> Thanks so much, I appreciate it. Thanks for having me out. >> All right, you're welcome. All right and thank you everybody for watching this Cube Conversation. This is Dave Vellante, we'll see ya next time. (digital music)
Sri Satish Ambati, H2O.ai | CUBE Conversation, August 2019
(upbeat music) >> Woman Voiceover: From our studios in the heart of Silicon Valley, Palo Alto, California this is a CUBE Conversation. >> Hello and welcome to this special CUBE Conversation here in Palo Alto, California, CUBE Studios, I'm John Furrier, host of theCUBE, here with Sri Ambati. He's the founder and CEO of H2O.ai. CUBE Alum, hot start up right in the action of all the machine learning, artificial intelligence, with democratization, the role of data in the future, it's all happening with Cloud 2.0, DevOps 2.0, Sri, great to see you. Thanks for coming by. You're a neighbor, you're right down the street from us at our studio here. >> It's exciting to be at theCUBE Com. >> That's KubeCon, that's Kubernetes Con. CUBEcon, coming soon, not to be confused with KubeCon. Great to see you. So tell us about the company, what's going on, you guys are smoking hot, congratulations. You got the right formula here with AI. Explain what's going on. >> It started about seven years ago, and .ai was just a new fad that arrived in Silicon Valley. And today we have thousands of companies in AI, and we're very excited to be partners in making more companies become AI-first. And our vision here is to democratize AI, and we've made it simple with our open source, made it easy for people to start adopting data science and machine learning in different functions inside their large organizations. And apply that for different use cases across financial services, insurance, health care. We leapfrogged in 2016 and built our first closed source product, Driverless AI, we made it on GPUs using the latest hardware and software innovations. Open source AI has funded the rise of automatic machine learning, which further reduces the need for extraordinary talent to do the machine learning. No one has time today, and then we're trying to really bring that automatic machine learning at a very significant crunch time for AI, so people can consume AI better. >> You know, this is one of the things that I love about the current state of the market right now, the entrepreneur market as well as startups and growing companies that are going to go public. Is that there's a new breed of entrepreneurship going on around large scale, standing up infrastructure, shortening the time it takes to do something. Like provisioning. The old AIs, you got to be a PhD. And we're seeing this in data science, you don't have to be a Python coder. This democratization is not just a tag line, actually the reality of it is a business opportunity. Whoever can provide the infrastructure and the systems for people to do it. It is an opportunity, you guys are doing that. This is a real dynamic. This is a new way, a new kind of dynamic in an industry. >> The three real characteristics on the ability to adopt AI: one is data is a team sport. Which means you've got to bring different dimensions within your organization to be able to take advantage of data and AI. And you've got to bring in your domain scientists, work closely with your data scientists, work closely with your data engineers, produce applications that can be deployed, and then get your design on top of it that can convince users or strategists to make those decisions that the data is showing. So that takes a multi-dimensional workforce to work closely together. The real problem in adoption of AI today is not just technology, it's also culture. So we're kind of bringing those aspects together in formal products. One of our products, for example, Explainable AI.
It's helping the data scientists tell a story that businesses can understand. Why is the model deciding I need to take this test in this direction? Why is this model giving this particular nurse a high credit score even though she doesn't have a high school graduation? That kind of figuring out, that democratization, goes all the way down. Why is the model deciding what it's deciding, and explaining and breaking that down into English. And building trust is a huge aspect in AI right now. >> Well I want to get to the talent, and the time, and the trust equation on the next talk, but I want to get the hard news out there. You guys have some news, Driverless AI is one of your core things. Explain the news, what's the big news? >> The big news has been that... AI's a Moneyball for business, right? And Moneyball as it has been played out has been the experts were left out of the field, and algorithms taking over. And there is no participation between experts, the domain scientists, and the data scientists. And what we're bringing with the new product in Driverless AI, is an ability for companies to take our AI and become AI companies themselves. The real AI race is not between the Googles and the Amazons and the Microsofts and other AI companies, AI software companies. The real AI race is in the verticals and how can a company which is a bank, or an insurance giant, or a healthcare company take AI platforms and become, take the data and monetize the data and become AI companies themselves. >> Yeah, that's a really profound statement I would agree with 100% on that. I think we saw that early on in the big data world around Hadoop, well Hadoop kind of died by the wayside, but Dave Vellante and the Wikibon team have observed, and they actually predicted, that the most value was going to come from practitioners, not the vendors. 'Cause they're the ones who have the data. And you mentioned verticals, this is another interesting point I want to get more explanation from you on, is that apps are driven by data. Data needs domain-specific information. So you can't just say "I have data, therefore magic happens," it's really at the edge of the domain speak or the domain feature of the application. This is where the data is, so this kind of supports your idea that the AI's about the companies that are using it, not the suppliers of the technology. >> Our vision has always been how we make our customers satisfied. We focus on the customer, and through that we actually make the customer one of the product managers inside the company. And the doors that open from working very closely with some of our leading customers is that we need to get them to participate and take AIs, algorithms, and platforms, that can automatically tune the algorithms, and have the right hyperparameter optimizations, the right features. And augment the right data sets that they have. There's a whole data lake around there, around data architecture today. Which data sets am I not using in my current problem I'm solving, that's a reasonable problem I'm looking at. That combination of these various pieces has been automated in Driverless AI. And the new version that we're now bringing to market is able to allow them to create their own recipes, bring their own transformers, and make an automatic fit for their particular race. So if you think about this as we built all the components of a race car, you're going to take it and apply it for that particular race to win. >> John: So that's where the word driverless comes in.
It's driverless in the sense of you don't really need a full operator, it kind of operates on its own. >> In some sense it's driverless. They're taking the data scientists, giving them a power tool. Historically, before automatic machine learning (Driverless is in the umbrella of machine learning), they would fine-tune, learning the nuances of the data, and the problem at hand, what they're optimizing for, and the right tweaks in the algorithm. So they have to understand how deep the trees are going to be, how many layers of deep learning they need, what variation of deep learning they should put, and in natural language processing, what context they need. Long short-term memory, all these pieces they have to learn themselves. And there were only a few grand masters or big data scientists in the world who could come up with the right answer for different problems. >> So you're spreading the love of AI around. >> Simplifying that. >> You get the big brains to work on it, and democratization means people can participate and the machines also can learn. Both humans and machines. >> Between our open source and the very maker-centric culture, we've been able to attract some of the world's top data scientists, physicists, and compiler engineers. To bring in a form factor that businesses can use. One data scientist in a company like Franklin Templeton can operate at a level of ten or hundreds of them, and then bring the best in data science in a form factor that they can plug in and play. >> I was having a conversation with Kent Libby, who works with me on our platform team. We have all this data with theCUBE, and we were just talking, we need to hire a data scientist and AI specialist. And you go out and look around, you've got Google, Amazon, all these big players spending between 3 and 4 million per machine learning engineer. And that might be someone under the age of 30 with no experience. So the talent war is huge. The cost to just hire, we can't hire these people. >> It's a global war. There's a talent shortage in China, there's a talent shortage in India, there's a talent shortage in Europe, and we have offices in Europe and India. There's a talent shortage in Toronto and Ottawa. So it's a global shortage of physicists and mathematicians and data scientists. So that's where our tools can help. And we see Driverless AI as, you can drive to New York or you can fly to New York. >> I was talking to my son the other day, he's taking computer science classes in night school. And it's like, well you know, the machine learning in AI is kind of like dog training. You have dog training, you train the dog to do some tricks, it does some tricks. Well, if you're a coder you want to train the machine. This is the machine training. This is data science, is what AI possibility is there. Machines have to be taught something. There's a base input, machines just aren't self-learning on their own. So as you look at the science of AI, this becomes the question on the talent gap. Can the talent gap be closed by machines? And you got the time, you want speed, low latency, and trust. All these things are hard to do. All three, balancing all three is extremely difficult. What's your thoughts on those three variables? >> So that's why we brought AI to help with AI. Driverless AI is a concept of bringing AI to simplify. It's an expert system to do AI better. So you can actually give it to the hands of the new data scientists, so you can perform at the power of an advanced data scientist.
We're not disempowering the data scientist, the product's still for a data scientist. When you start with a confusion matrix, false positives, false negatives, that's something a data scientist can understand. When you talk about feature engineering, that's something a data scientist can understand. And what Driverless AI is really doing is helping them do that rapidly, automated on the latest hardware, that's where the time is coming into. GPUs, FPGAs, TPUs, different forms of clouds. Cheaper, right. So faster, cheaper, easier, that's the democratization aspect. But it's really targeted at the data scientist to prevent experimental error. In science, data science is a search for truth, but it's a lot of experiments to get to truth. If you can make the cost of experiments really simple, cheaper, and prevent overfitting. That's a common problem in our science. Prevent bias, accidental bias that you introduce because the data is biased, right. So trying to prevent the flaws in doing data science. Leakage, usually your signal leaks, and how do you prevent those common pieces. That's where Driverless AI is coming at it. But if you put that in a box, what that really unlocks is imagination. The real hard problems in the world are still the same. >> AI for creative people, for instance. They want infrastructure, they don't want to have to be an expert. They want that value. That's the consumerization. >> AI is really the co-founder for someone who's highly imaginative and has courage, right. And you don't have to look for founders to look for courage and imagination. A lot of entrepreneurs in large companies, who are trying to bring change to their organizations. >> Yeah, we always say, the intellectual property game is changing from protocols, locked in, patented, to you could have a workflow innovation. Change one little tweak of a process with data and powerful AI, that's the new magic IP equation. It's in the workflow, it's in the application, it's new opportunities. Do you agree with that? >> Absolutely. The leapfrog from here is businesses will come up with new business processes. So we looked at business process optimization, and globalization's going to help there. But AI, as you rightfully said earlier, is training computers. Not just programming them, you're schooling them. A host of computers that can now, with data, think almost at the same level as a Go player. The world's leading Go player. They can think at the same level of an expert in that space. And if that's happening, now I can transform. My business can run 24 by 7 at the rate at which I can assemble machines and feed them data. Data creation becomes, making new data becomes, the real value that AI can- >> H2O.ai announcing Driverless AI, part of their flagship product around recipes and democratizing AI. Congratulations. Final point, take a minute to explain to the folks just the product, how they buy it, what's it made of, what's the commitment, how do they engage with you guys? >> It's an annual license, a software license people can download on our website. Get a three week trial, try it on their own. >> Free trial? >> A free trial, our recipes are open-source. About a hundred recipes, built by grand masters, have been made open source. And they can be plugged, and tried. Customers of course don't have to make their software open source. They can take this, make it theirs. And our vision here is to make every company an AI company.
And that means that they have to embrace AI, learn it, tweak it, participate, some of the leading conservation companies are giving it back in the open source. But the real vision here is to build that community of AI practitioners inside large organizations. We are here, our teams are global, and we're here to support that transformation of some large customers. >> So my problem of hiring an AI person, you could help me solve that. >> Right today. >> Okay, so anyone who's watching, please get their stuff and come get an opening here. That's the goal. But that is the dream, we want AI in our system. >> I have watched you the last ten years, you've been an entrepreneur with a fierce passion, you want AI to be a partner so you can take your message to a wider audience and build monetization around the data you have created. Businesses are the largest, after the big data warlords we have, and data privacy's going to come eventually, but I think businesses are the second largest owners of data, they just don't know how to monetize it, unlock value from it, and AI will help. >> Well you know we love data, we want to be data-driven, we want to go faster. Love the driverless vision, Driverless AI, H2O.ai. Here in theCUBE I'm John Furrier with breaking news here in Silicon Valley from hot startup H2O.ai. Thanks for watching.
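A minimal sketch of what this kind of automatic machine learning looks like in code, using the open-source H2O-3 AutoML Python API as a stand-in for the commercial Driverless AI product discussed here; the CSV path and column names are placeholders:

```python
# Sketch using open-source H2O-3 AutoML, not the commercial Driverless AI product.
# The CSV path and column names are placeholders.
import h2o
from h2o.automl import H2OAutoML

h2o.init()  # start, or attach to, a local H2O cluster

frame = h2o.import_file("credit_applications.csv")  # hypothetical training data
frame["defaulted"] = frame["defaulted"].asfactor()  # mark the target as categorical

train, valid = frame.split_frame(ratios=[0.8], seed=42)

# AutoML handles algorithm selection, hyperparameter tuning and stacked ensembles;
# Driverless AI layers automated feature engineering and custom "recipes" on top of this idea.
aml = H2OAutoML(max_runtime_secs=600, seed=42)
aml.train(y="defaulted", training_frame=train, validation_frame=valid)

print(aml.leaderboard.head())            # candidate models ranked by validation performance
predictions = aml.leader.predict(valid)  # score data with the best model found
print(predictions.head())
```

The point of the sketch is the division of labor Ambati describes: the data scientist states the target and the time budget, and the tool handles the experiment management that would otherwise consume scarce expert hours.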
Joel Horwitz, IBM | IBM CDO Summit Spring 2018
(techno music) >> Announcer: Live, from downtown San Francisco, it's theCUBE. Covering IBM Chief Data Officer Strategy Summit 2018. Brought to you by IBM. >> Welcome back to San Francisco everybody, this is theCUBE, the leader in live tech coverage. We're here at the Parc 55 in San Francisco covering the IBM CDO Strategy Summit. I'm here with Joel Horwitz who's the Vice President of Digital Partnerships & Offerings at IBM. Good to see you again Joel. >> Thanks, great to be here, thanks for having me. >> So I was just, you're very welcome- It was just, let's see, was it last month, at Think? >> Yeah, it's hard to keep track, right. >> And we were talking about your new role- >> It's been a busy year. >> the importance of partnerships. One of the things I want to, well let's talk about your role, but I really want to get into, it's innovation. And we talked about this at Think, because it's so critical, in my opinion anyway, that you can attract partnerships, innovation partnerships, startups, established companies, et cetera. >> Joel: Yeah. >> To really help drive that innovation, it takes a team of people, IBM can't do it on its own. >> Yeah, I mean look, IBM is the leader in innovation, as we all know. We're the market leader for patents, that we put out each year, and how you get that technology in the hands of the real innovators, the developers, the longtail ISVs, our partners out there, that's the challenging part at times, and so what we've been up to is really looking at how we make it easier for partners to partner with IBM. How we make it easier for developers to work with IBM. So we have a number of areas that we've been adding, so for example, we've added a whole IBM Code portal, so if you go to developer.ibm.com/code you can actually see hundreds of code patterns that we've created to help really any client, any partner, get started using IBM's technology, and to innovate. >> Yeah, and that's critical, I mean you're right, because to me innovation is a combination of invention, which is what you guys do really, and then it's adoption, which is what your customers are all about. You come from the data science world. We're here at the Chief Data Officer Summit, what's the intersection between data science and CDOs? What are you seeing there? >> Yeah, so when I was here last, it was about two years ago in 2015, actually, maybe three years ago, man, time flies when you're having fun. >> Dave: Yeah, the Spark Summit- >> Yeah Spark Technology Center and the Spark Summit, and we were here, I was here at the Chief Data Officer Summit. And it was great, and at that time, I think a lot of the conversation was really not that different than what I'm seeing today. Which is, how do you manage all of your data assets? I think a big part of doing good data science, which is my kind of background, is really having a good understanding of what your data governance is, what your data catalog is, so, you know we introduced the Watson Studio at Think, and actually, what's nice about that, is it brings a lot of this together. So if you look in the market, in the data market, today, you know we used to segment it by a few things, like data gravity, data movement, data science, and data governance. And those are kind of the four themes that I continue to see. 
And so outside of IBM, I would contend that those are relatively separate kind of tools that are disconnected, in fact Dinesh Nirmal, who's our engineer on the analytic side, Head of Development there, he wrote a great blog just recently, about how you can have some great machine learning, you have some great data, but if you can't operationalize that, then really you can't put it to use. And so it's funny to me because we've been focused on this challenge, and IBM is making the right steps, in my, I'm obviously biased, but we're making some great strides toward unifying the, this tool chain. Which is data management, to data science, to operationalizing, you know, machine learning. So that's what we're starting to see with Watson Studio. >> Well, I always push Dinesh on this and like okay, you've got a collection of tools, but are you bringing those together? And he flat-out says no, we developed this, a lot of this from scratch. Yes, we bring in the best of the knowledge that we have there, but we're not trying to just cobble together a bunch of disparate tools with a UI layer. >> Right, right. >> It's really a fundamental foundation that you're trying to build. >> Well, what's really interesting about that, that piece, is that yeah, I think a lot of folks have cobbled together a UI layer, so we formed a partnership, coming back to the partnership view, with a company called Lightbend, who's based here in San Francisco, as well as in Europe, and the reason why we did that, wasn't just because of the fact that Reactive development, if you're not familiar with Reactive, it's essentially Scala, Akka, Play, this whole framework, that basically allows developers to write once, and it kind of scales up with demand. In fact, Verizon actually used our platform with Lightbend to launch the iPhone 10. And they showed dramatic improvements. Now what's exciting about Lightbend, is the fact that application developers are developing with Reactive, but if you turn around, you'll also now be able to operationalize models with Reactive as well. Because it's basically a single platform to move between these two worlds. So what we've continued to see is data science kind of separate from the application world. Really kind of, AI and cloud as different universes. The reality is that for any enterprise, or any company, to really innovate, you have to find a way to bring those two worlds together, to get the most use out of it. >> Furrier always says "Data is the new development kit". He said this I think five or six years ago, and it's barely becoming true. You guys have tried to make an attempt, and have done a pretty good job, of trying to bring those worlds together in a single platform, what do you call it? The Watson Data Platform? >> Yeah, Watson Data Platform, now Watson Studio, and I think the other, so one side of it is, us trying to, not really trying, but us actually bringing together these disparate systems. I mean we are kind of a systems company, we're IT. But not only that, but bringing our trained algorithms, and our trained models to the developers. So for example, we also did a partnership with Unity, at the end of last year, that's now just reaching some pretty good growth, in terms of bringing the Watson SDK to game developers on the Unity platform. So again, it's this idea of bringing the game developer, the application developer, in closer contact with these trained models, and these trained algorithms. And that's where you're seeing incredible things happen.
So for example, Star Trek Bridge Crew, which I don't know how many Trekkies we have here at the CDO Summit. >> A few over here probably. >> Yeah, a couple? They're using our SDK in Unity, to basically allow a gamer to use voice commands through the headset, through a VR headset, to talk to other players in the virtual game. So we're going to see more, I can't really disclose too much what we're doing there, but there's some cool stuff coming out of that partnership. >> Real immersive experience driving a lot of data. Now you're part of the Digital Business Group. I like the term digital business, because we talk about it all the time. Digital business, what's the difference between a digital business and a business? What's the, how they use data. >> Joel: Yeah. >> You're a data person, what does that mean? That you're part of the Digital Business Group? Is that an internal facing thing? An external facing thing? Both? >> It's really both. So our Chief Digital Officer, Bob Lord, he has a presentation that he'll give, where he starts out, and he goes, when I tell people I'm the Chief Digital Officer they usually think I just manage the website. You know, if I tell people I'm a Chief Data Officer, it means I manage our data, in governance over here. The reality is that I think these Chief Digital Officer, Chief Data Officer, they're really responsible for business transformation. And so, if you actually look at what we're doing, I think on both sides is we're using data, we're using marketing technology, martech, like Optimizely, like Segment, like some of these great partners of ours, to really look at how we can quickly A/B test, get user feedback, to look at how we actually test different offerings and market. And so really what we're doing is we're setting up a testing platform, to bring not only our traditional offers to market, like DB2, Mainframe, et cetera, but also bring new offers to market, like blockchain, and quantum, and others, and actually figure out how we get better product-market fit. What actually, one thing, one story that comes to mind, is if you've seen the movie Hidden Figures- >> Oh yeah. >> There's this scene where Kevin Costner, I know this is going to look not great for IBM, but I'm going to say it anyways, which is Kevin Costner has like a sledgehammer, and he's like trying to break down the wall to get the mainframe in the room. That's what it feels like sometimes, 'cause we create the best technology, but we forget sometimes about the last mile. You know like, we got to break down the wall. >> Where am I going to put it? >> You know, to get it in the room! So, honestly I think that's a lot of what we're doing. We're bridging that last mile, between these different audiences. So between developers, between ISVs, between commercial buyers. Like how do we actually make this technology, not just accessible to large enterprise, which are our main clients, but also to the other ecosystems, and other audiences out there. >> Well so that's interesting Joel, because as a potential partner of IBM, they want, obviously your go-to-market, your massive company, and great distribution channel. But at the same time, you want more than that. You know you want to have a closer, IBM always focuses on partnerships that have intrinsic value. So you talked about offerings, you talked about quantum, blockchain, off-camera talking about cloud containers. >> Joel: Yeah. 
>> I'd say cloud and containers may be a little closer than those others, but those others are going to take a lot of market development. So what are the offerings that you guys are bringing? How do they get into the hands of your partners? >> I mean, the commonality with all of these, all the emerging offerings, if you ask me, is the distributed nature of the offering. So if you look at blockchain, it's a distributed ledger. It's a distributed transaction chain that's secure. If you look at data, really and we can hark back to say, Hadoop, right before object storage, it's distributed storage, so it's not just storing on your hard drive locally, it's storing on a distributed network of servers that are all over the world and data centers. If you look at cloud, and containers, what you're really doing is not running your application on an individual server that can go down. You're using containers because you want to distribute that application over a large network of servers, so that if one server goes down, you're not going to be hosed. And so I think the fundamental shift that you're seeing is this distributed nature, which in essence is cloud. So I think cloud is just kind of a synonym, in my opinion, for distributed nature of our business. >> That's interesting and that brings up, you're right, cloud and Big Data/Hadoop, we don't talk about Hadoop much anymore, but it kind of got it all started, with that notion of leave the data where it is. And it's the same thing with cloud. You can't just stuff your business into the public cloud. You got to bring the cloud to your data. >> Joel: That's right. >> But that brings up a whole new set of challenges, which obviously, you're in a position just to help solve. Performance, latency, physics come into play. >> Physics is a rough one. It's kind of hard to avoid that one. >> I hear your best people are working on it though. Some other partnerships that you want to sort of, elucidate. >> Yeah, no, I mean we have some really great, so I think the key kind of partnership, I would say area, that I would allude to is, one of the things, and you kind of referenced this, is a lot of our partners, big or small, want to work with our top clients. So they want to work with our top banking clients. They want, 'cause these are, if you look at for example, Maersk and what we're doing with them around blockchain, and frankly, talk about innovation, they're innovating containers for real, not virtual containers- >> And that's a joint venture right? >> Yeah, it is, and so it's exciting because, what we're bringing to market is, I also lead our startup programs, called the Global Entrepreneurship Program, and so what I'm focused on doing, and you'll probably see more to come this quarter, is how do we actually bridge that end-to-end? How do you, if you're a startup or a small business, ultimately reach that kind of global business partner level? And so kind of bridging that, that end-to-end. So we're starting to bring out a number of different incentives for partners, like co-marketing, so I'll help startups when they're early, figure out product-market fit. We'll give you free credits to use our innovative technology, and we'll also bring you into a number of clients, to basically help you not burn all of your cash on creating your own marketing channel. God knows I did that when I was at a start-up. So I think we're doing a lot to kind of bridge that end-to-end, and help any partner kind of come in, and then grow with IBM. I think that's where we're headed.
>> I think that's a critical part of your job. Because I mean, obviously IBM is known for its Global 2000, big enterprise presence, but startups, again, fuel that innovation fire. So being able to attract them, which you're proving you can, providing whatever it is, access, early access to cloud services, or like you say, these other offerings that you're producing, in addition to that go-to-market, 'cause it's funny, we always talk about how efficient, capital efficient, software is, but then you have these companies raising hundreds of millions of dollars, why? Because they got to do promotion, marketing, sales, you know, go-to-market. >> Yeah, it's really expensive. I mean, you look at most startups, like their biggest ticket item is usually marketing and sales. And building channels, and so yeah, if you're, you know we're talking to a number of partners who want to work with us because of the fact that, it's not just like, the direct kind of channel, it's also, as you kind of mentioned, there's other challenges that you have to overcome when you're working with a larger company. for example, security is a big one, GDPR compliance now, is a big one, and just making sure that things don't fall over, is a big one. And so a lot of partners work with us because ultimately, a number of the decision makers in these larger enterprises are going, well, I trust IBM, and if IBM says you're good, then I believe you. And so that's where we're kind of starting to pull partners in, and pull an ecosystem towards us. Because of the fact that we can take them through that level of certification. So we have a number of free online courses. So if you go to partners, excuse me, ibm.com/partners/learn there's a number of blockchain courses that you can learn today, and will actually give you a digital certificate, that's actually certified on our own blockchain, which we're actually a first of a kind to do that, which I think is pretty slick, and it's accredited at some of the universities. So I think that's where people are looking to IBM, and other leaders in this industry, is to help them become experts in their, in this technology, and especially in this emerging technology. >> I love that blockchain actually, because it's such a growing, and interesting, and innovative field. But it needs players like IBM, that can bring credibility, enterprise-grade, whether it's security, or just, as I say, credibility. 'Cause you know, this is, so much of negative connotations associated with blockchain and crypto, but companies like IBM coming to the table, enterprise companies, and building that ecosystem out is in my view, crucial. >> Yeah, no, it takes a village. I mean, there's a lot of folks, I mean that's a big reason why I came to IBM, three, four years ago, was because when I was in start-up land, I used to work for H20, I worked for Alpine Data Labs, Datameer, back in the Hadoop days, and what I realized was that, it's an opportunity cost. So you can't really drive true global innovation, transformation, in some of these bigger companies because there's only so much that you can really kind of bite off. And so you know at IBM it's been a really rewarding experience because we have done things like for example, we partnered with Girls Who Code, Treehouse, Udacity. So there's a number of early educators that we've partnered with, to bring code to, to bring technology to, that frankly, would never have access to some of this stuff. 
Some of this technology, if we didn't form these alliances, and if we didn't join these partnerships. So I'm very excited about the future of IBM, and I'm very excited about the future of what our partners are doing with IBM, because, geez, you know the cloud, and everything that we're doing to make this accessible, is bar none, I mean, it's great. >> I can tell you're excited. You know, spring in your step. Always a lot of energy Joel, really appreciate you coming onto theCUBE. >> Joel: My pleasure. >> Great to see you again. >> Yeah, thanks Dave. >> You're welcome. Alright keep it right there, everybody. We'll be back. We're at the IBM CDO Strategy Summit in San Francisco. You're watching theCUBE. (techno music) (touch-tone phone beeps)
Frederick Reiss, IBM STC - Big Data SV 2017 - #BigDataSV - #theCUBE
>> Narrator: Live from San Jose, California it's the Cube, covering Big Data Silicon Valley 2017. (upbeat music) >> Big Data SV 2016, day two of our wall to wall coverage of Strata Hadoop Conference, Big Data SV, really what we call Big Data Week because this is where all the action is going on down in San Jose. We're at the historic Pagoda Lounge in the back of the Fairmont, come on by and say hello, we've got a really cool space and we're excited and never been in this space before, so we're excited to be here. So we got George Gilbert here from Wikibon, we're really excited to have our next guest, he's Fred Reiss, he's the chief architect at IBM Spark Technology Center in San Francisco. Fred, great to see you. >> Thank you, Jeff. >> So I remember when Rob Thomas, we went up and met with him in San Francisco when you guys first opened the Spark Technology Center a couple of years ago now. Give us an update on what's going on there, I know IBM's putting a lot of investment in this Spark Technology Center in the San Francisco office specifically. Give us kind of an update of what's going on. >> That's right, Jeff. Now we're in the new Watson West building in San Francisco on 505 Howard Street, colocated, we have about a 50 person development organization. Right next to us we have about 25 designers and on the same floor a lot of developers from Watson doing a lot of data science, from the Weather Underground, doing weather and data analysis, so it's a really exciting place to be, lots of interesting work in data science going on there. >> And it's really great to see how IBM is taking the core Watson, obviously enabled by Spark and other core open source technology and now applying it, we're seeing Watson for Health, Watson for autonomous vehicles, Watson for Marketing, Watson for this, and really bringing that type of machine learning power to all the various verticals in which you guys play. >> Absolutely, that's been what Watson has been about from the very beginning, bringing the power of machine learning, the power of artificial intelligence to real world applications. >> Jeff: Excellent. >> So let's tie it back to the Spark community. Most folks understand how Databricks builds out the core or does most of the core work for, like, the SQL workload, the streaming and machine learning and I guess graph is still immature. We were talking earlier about IBM's contributions in helping to build up the machine learning side. Help us understand what the Databricks core technology for machine learning is and how IBM is building beyond that. >> So the core technology for machine learning in Apache Spark comes out, actually, of the machine learning department at UC Berkeley as well as a lot of different members from the community. Some of those community members also work for Databricks. We actually at the IBM Spark Technology Center have made a number of contributions to the core Apache Spark and the libraries, for example recent contributions in neural nets. In addition to that, we also work on a project called Apache System ML, which used to be proprietary IBM technology, but the IBM Spark Technology Center has turned System ML into Apache System ML, it's now an open Apache incubating project that's been moving forward out in the open. You can now download the latest release online and that provides a piece that we saw was missing from Spark and a lot of other similar environments, an optimizer for machine learning algorithms.
So in Spark, you have the Catalyst optimizer for data analysis, data frames, SQL, you write your queries in terms of those high level APIs and Catalyst figures out how to make them go fast. In System ML, we have an optimizer for high level languages like Spark and Python where you can write algorithms in terms of linear algebra, in terms of high level operations on matrices and vectors and have the optimizer take care of making those algorithms run in parallel, run in scale, taking account of the data characteristics. Does the data fit in memory, and if so, keep it in memory. Does the data not fit in memory? Stream it from disk. >> Okay, so there was a ton of stuff in there. >> Fred: Yep. >> And if I were to refer to that as so densely packed as to be a black hole, that might come across wrong, so I won't refer to that as a black hole. But let's unpack that, so the, and I meant that in a good way, like high bandwidth, you know. >> Fred: Thanks, George. >> Um, so the traditional Spark, the machine learning that comes with Spark's MLlib, one of its distinguishing characteristics is that the models, the algorithms that are in there, have been built to run on a cluster. >> Fred: That's right. >> And very few have, very few others have built machine learning algorithms to run on a cluster, but as you were saying, you don't really have an optimizer for finding something where a couple of the algorithms would be fit optimally to solve a problem. Help us understand, then, how System ML solves a more general problem for, say, ensemble models and for scale out, I guess I'm, help us understand how System ML fits relative to Spark's MLlib and the more general problems it can solve. >> So, MLlib and a lot of other packages such as Sparkling Water from H2O, for example, provide you with a toolbox of algorithms and each of those algorithms has been hand-tuned for a particular range of problem sizes and problem characteristics. This works great as long as the particular problem you're facing as a data scientist is a good match to that implementation that you have in your toolbox. What System ML provides is less like having a toolbox and more like having a machine shop. You can, you have a lot more flexibility, you have a lot more power, you can write down an algorithm as you would write it down if you were implementing it just to run on your laptop and then let the System ML optimizer take care of producing a parallel version of that algorithm that is customized to the characteristics of your cluster, customized to the characteristics of your data. >> So let me stop you right there, because I want to use an analogy that others might find easy to relate to for all the people who understand SQL and scale out SQL. So, the way you were describing it, it sounds like oh, if I were a SQL developer and I wanted to get at some data on my laptop, I would find it pretty easy to write the SQL to do that. Now, let's say I had a bunch of servers, each with its own database, and I wanted to get data from each database. If I didn't have a scale out database, I would have to figure out physically how to go to each server in the cluster to get it. What I'm hearing for System ML is it will take that query that I might have written on my one server and it will transparently figure out how to scale that out, although in this case not queries, machine learning algorithms. >> The database analogy is very apt.
Just like SQL and query optimization, by allowing you to separate that logical description of what you're looking for from the physical description of how to get at it, it lets you have a parallel database with the exact same language as a single machine database. In System ML, because we have an optimizer that separates that logical description of the machine learning algorithm from the physical implementation, we can target a lot of parallel systems, we can also target a large server and the code, the code that implements the algorithm stays the same. >> Okay, now let's take that a step further. You refer to matrix math and I think linear algebra and a whole lot of other things that I never quite made it to since I was a humanities major but when we're talking about those things, my understanding is that those are primitives that Spark doesn't really implement so that if you wanted to do neural nets, which rely on some of those constructs for high performance, >> Fred: Yes. >> Then, um, that's not built into Spark. Can you get to that capability using System ML? >> Yes. System ML, at its core, provides you with a library, provides you as a user with a library of machine, rather, linear algebra primitives, just like a language like R or a library like NumPy gives you matrices and vectors and all of the operations you can do on top of those primitives. And just to be clear, linear algebra really is the language of machine learning. If you pick up a paper about an advanced machine learning algorithm, chances are the specification for what that algorithm does and how that algorithm works is going to be written in the paper literally in linear algebra and the implementation that was used in that paper is probably written in the language where linear algebra is built in, like R, like NumPy. >> So it sounds to me like Spark has done the work of sort of the blocking and tackling of machine learning to run in parallel. And that's I mean, to be clear, since we haven't really talked about it, that's important when you're handling data at scale and you want to train, you know, models on very, very large data sets. But it sounds like when we want to go to some of the more advanced machine learning capabilities, the ones that today are making all the noise with, you know, speech to text, text to speech, natural language, understanding those neural network based capabilities are not built into the core Spark MLlib, that, would it be fair to say you could start getting at them through System ML? >> Yes, System ML is a much better way to do scalable linear algebra on top of Spark than the very limited linear algebra that's built into Spark. >> So alright, let's take the next step. Can System ML be grafted onto Spark in some way or would it have to be in an entirely new API that doesn't take, integrate with all the other Spark APIs? In a way, that has differentiated Spark, where each API is sort of accessible from every other. Can you tie System ML in or do the Spark guys have to build more primitives into their own sort of engine first? >> A lot of the work that we've done with the Spark Technology Center as part of bringing System ML into the Apache ecosystem has been to build a nice, tight integration with Apache Spark so you can pass Spark data frames directly into System ML and you can get data frames back. Your System ML algorithm, once you've written it, in terms of one of System ML's main scripting languages, it just plugs into Spark like all the algorithms that are built into Spark.
>> Okay, so that's, that would keep Spark competitive with more advanced machine learning frameworks for a longer period of time, in other words, it wouldn't hit the wall the way it would if it encountered TensorFlow from Google, for Google's way of doing deep learning, Spark wouldn't hit the wall once it needed, like, a TensorFlow as long as it had System ML so deeply integrated the way you're doing it. >> Right, with a system like System ML, you can quickly move into new domains of machine learning. So for example, this afternoon I'm going to give a talk with one of our machine learning developers, Mike Dusenberry, about our recent efforts to implement deep learning in System ML, like full scale, convolutional neural nets running on a cluster in parallel, processing many gigabytes of images, and we implemented that with very little effort because we have this optimizer underneath that takes care of a lot of the details of how you get that data into the processing, how you get the data spread across the cluster, how you get the processing moved to the data or vice versa. All those decisions are taken care of in the optimizer, you just write down the linear algebra parts and let the system take care of it. That let us implement deep learning much more quickly than we would have if we had done it from scratch. >> So it's just this ongoing cadence of basically removing the infrastructure management from the data scientists and enabling them to concentrate really where their value is, is on the algorithms themselves, so they don't have to worry about how many clusters it's running on, and that configuration kind of typical dev ops that we see on the regular development side, but now you're really bringing that into the machine learning space. >> That's right, Jeff. Personally, I find all the minutia of making a parallel algorithm work really fascinating but a lot of people working in data science really see parallelism as a tool. They want to solve the data science problem and System ML lets you focus on solving the data science problem because the system takes care of the parallelism. >> You guys could go on in the weeds for probably three hours but we don't have enough coffee and we're going to set up a follow up time because you're both in San Francisco. But before we let you go, Fred, as you look forward into 2017, kind of the advances that you guys have done there at the IBM Spark Center in the city, what's kind of the next couple great hurdles that you're looking to cross, new challenges that are getting you up every morning that you're excited to come back a year from now and be able to say wow, these are the one or two things that we were able to take down in 2017? >> We're moving forward on several different fronts this year. On one front, we're helping to get the notebook experience with Spark notebooks consistent across the entire IBM product portfolio. We helped a lot with the rollout of notebooks on Data Science Experience on z, for example, and we're working actively with the Data Science Experience and with the Watson Data Platform. On the other hand, we're contributing to Spark 2.2. There are some exciting features, particularly in SQL, that we're hoping to get into that release as well as some new improvements to MLlib. We're moving forward with Apache System ML, we just cut Version 0.13 of that. We're talking right now on the mailing list about getting System ML out of incubation, making it a full, top level project.
And we're also continuing to help with the adoption of Apache Spark technology in the enterprise. Our latest focus has been on deep learning on Spark. >> Well, I think we found him! Smartest guy in the room. (laughter) Thanks for stopping by, and good luck on your talk this afternoon. >> Thank you, Jeff. >> Absolutely. Alright, he's Fred Reiss, he's George Gilbert, and I'm Jeff Rick, you're watching theCUBE from Big Data SV, part of Big Data Week in San Jose, California. (upbeat music)
Jean Francois Puget, IBM | IBM Machine Learning Launch 2017
>> Announcer: Live from New York, it's theCUBE, covering the IBM machine learning launch event. Brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman. >> Alright, we're back. Jean Francois Puget is here, he's the distinguished engineer for machine learning and optimization at IBM Analytics, a CUBE alum. Good to see you again. >> Yes. >> Thanks very much for coming on, big day for you guys. >> Jean Francois: Indeed. >> It's like giving birth every time you guys launch one of these products. We saw you a little bit in the analyst meeting, pretty well attended. Give us the highlights from your standpoint. What are the key things that we should be focused on in this announcement? >> For most people, machine learning equals machine learning algorithms. Algorithms, when you look at newspapers or blogs or social media, it's all about algorithms. Our view is that, sure, you need algorithms for machine learning, but you need steps before you run algorithms, and after. So before, you need to get data, to transform it, to make it usable for machine learning. And then, you run algorithms. These produce models, and then, you need to move your models into a production environment. For instance, you use an algorithm to learn from past credit card transaction fraud. You can learn models, patterns, that correspond to fraud. Then, you want to use those models, those patterns, in your payment system. And moving from where you run the algorithm to the operational system is a nightmare today, so our value is to automate what you do before you run algorithms, and then what you do after. That's our differentiator. >> I've had some folks on theCUBE in the past who said, years ago actually, "You know what, algorithms are plentiful." I think he made the statement, I remember my friend Avi Mehta, "Algorithms are free. It's what you do with them that matters." >> Exactly. It's our belief that open source has won for machine learning algorithms. Now the future is with open source, clearly. But it solves only a part of the problem you're facing if you want to put machine learning into action. So, exactly what you said. What you do with the results of the algorithm is key. And open source people don't care much about it, for good reasons. They are focusing on producing the best algorithm. We are focusing on creating value for our customers. It's different. >> In terms of, you mentioned open source a couple of times, in terms of customer choice, what's your philosophy with regard to the various tooling and platforms for open source, how do you go about selecting which to support? >> Machine learning is fascinating. It's overhyped, maybe, but it's also moving very quickly. Every year there is new cool stuff. Five years ago, nobody spoke about deep learning. Now it's everywhere. Who knows what will happen next year? Our take is to support open source, to support the top open source packages. We don't know which one will win in the future. We don't even know if one will be enough for all needs. We believe one size does not fit all, so our take is to support a curated list of major open source packages. We start with Spark ML for many reasons, but we won't stop at Spark ML. >> Okay, I wonder if we can talk use cases. Two of my favorite, well, let's just start with fraud. Fraud detection has become much, much better over the past, certainly 10 years, but it's still not perfect. I don't know if perfection is achievable, but a lot of false positives. How will machine learning affect that?
Can we expect, as consumers, even better fraud detection in more real time? >> If we think of the full life cycle going from data to value, we will provide a better answer. We still use machine learning algorithms to create models, but a model does not tell you what to do. It will tell you, okay, for this credit card transaction coming in, it has a high probability of being fraud. Or this one has a lower probability. But then it's up to the designer of the overall application to make decisions, so what we recommend is to use machine learning for prediction, but not only that, and then use, maybe, (murmuring). For instance, if your machine learning model tells you this is a fraud with a high probability, say 90%, and this is a customer you know very well, a 10-year customer you know very well, then you can be confident that it's a fraud. Then if the next prediction tells you this is a 70% probability, but it's a customer of one week, in a week we don't know the customer, so the confidence we can get in machine learning should be low, and there you will not reject the transaction immediately. Maybe you don't approve it automatically, maybe you send a one-time passcode, or you route it through a verification system, but you don't reject it outright. Really, the idea is to use machine learning predictions as yet another input for making decisions. You're making decisions informed by what you could learn from your past. But it's not replacing human decision-making. Our approach at IBM, you don't see IBM speak much about artificial intelligence in general, because we don't believe we're here to replace humans. We're here to assist humans, so we say augmented intelligence, or assistance. That's the role we see for machine learning. It will give you additional data so that you make better decisions. >> It's not the concept that you object to, it's the term artificial intelligence. It's really machine intelligence, it's not fake. >> I started my career with a PhD in artificial intelligence, I won't say when, but long enough ago. At that time, there were already promises that we would have Terminator in the next decade, and this and that. And the same happened in the '60s, or just after the '60s. And then there was an AI winter, and we have a risk here of another AI winter, because some people are just raising expectations that are not substantiated, I believe. I don't think the technology is here to replace human decision-making altogether any time soon, but we can help. We can certainly make some professions more efficient, more productive with machine learning. >> Having said that, there are a lot of cognitive functions that are getting replaced, maybe not by so-called artificial intelligence, but certainly by machines and automation. >> Yes, so we're automating a number of things, and maybe we won't need to have people do quality checks and can just have an automated vision system detect defects. Sure, so we're automating more and more, but this is not new, it has been going on for centuries. >> Well, the list evolves. So, what can humans do that machines can't, and how would you expect that to change? >> We're moving away from IBM machine learning, but it is interesting. You know, each time there is a capability that a machine can automate, we basically redefine intelligence to exclude it, so you know. That's what I foresee. >> Yeah, well, robots a while ago, Stu, couldn't climb stairs, and now, look at that. >> Do we feel threatened because a robot can climb stairs faster than us?
Not necessarily. >> No, it doesn't bother us, right. Okay, question? >> Yeah, so I guess, bringing it back down to the solution that we're talking about today, if I'm now doing the analytics, the machine learning, on the mainframe, how do we make sure that we don't overrun and blow out all our MIPS? >> We recommend, so we are not using the mainframe's base compute system. We recommend using zIIPs, so additional cores, to not overload it, so it's a very important point. We claim, okay, if you do everything on the mainframe, you can learn from operational data. You don't want to disturb, and "you don't want to disturb" takes a lot of different meanings. One that you just said, you don't want to slow down your operational processing because you're going to hurt your business. But you also want to be careful. Say we have a payment system where there is a machine learning model predicting fraud probability as part of the system. You don't want a young, bright data scientist to decide that he has a great idea, a great model, and to push his model into production without asking anyone. So you want to control that. That's why we insist, we are providing governance that includes a lot of things, like keeping track of how models were created from which data sets, so lineage. We also want to have access control and not allow just anyone to deploy a new model because we make it easy to deploy, so we want to have role-based access, and only someone with the right authority, well, it depends on the customer, but not everybody can update the production system, and we want to support that. And that's something that differentiates us from open source. Open source developers, they don't care about governance. It's not their problem, but it is our customers' problem, so this solution will come with all the governance and integrity constraints you can expect from us. >> Can you speak to, the first solution's going to be on z/OS, what does the roadmap look like and what are some of those challenges of rolling this out to other private cloud solutions? >> We are going to ship IBM machine learning for Z this quarter. It starts with Spark ML as the base open source. This is interesting, but it's not all there is for machine learning. So that's how we start. We're going to add more in the future. Last week we announced we will ship Anaconda, which is a major distribution for the Python ecosystem, and it includes a number of machine learning open source packages. We announced it for next quarter. >> I believe in the press release it said down the road things like TensorFlow are coming, H2O. >> But Anaconda we announced for next quarter, so we will leverage this when it's out. Then indeed, we have a roadmap to include major open source, so the major open source are the ones from Anaconda (murmuring), mostly. Key deep learning, so TensorFlow and probably one or two additional, we're still discussing. One that I'm very keen on, it's called XGBoost, in one word. People don't speak about it in newspapers, but this is what wins all Kaggle competitions. Kaggle is a machine learning competition site. When I say all, all that are not image recognition competitions. >> Dave: And that was ex-- >> XGBoost, X-G-B-O-O-S-T. >> Dave: XGBoost, okay. >> XGBoost, and it's-- >> Dave: X-ray gamma, right? >> It's really a package. When I say we don't know which package will win, XGBoost was introduced a year ago, or maybe a bit more, but not so long ago, and now, if you have structured data, it is the best choice today.
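For readers who want to see what XGBoost on structured data looks like, here is a minimal sketch using the package's standard scikit-learn-style Python API; the synthetic data and parameter values are invented for illustration and have nothing to do with IBM's product or roadmap.

```python
# Illustrative only: a small XGBoost classifier on tabular (structured) data.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 10))                 # tabular features
y = (2 * X[:, 0] + X[:, 3] > 0).astype(int)     # synthetic target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```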
It's really fast-moving, but so, we will support the major deep learning packages and the major classical machine learning packages, like the ones from Anaconda or XGBoost. The other thing, we start with Z. We announced in the analyst session that we will have a Power version and a private cloud, meaning x86, version as well. I can't tell you when because it's not firm, but it will come. >> And in public cloud as well, I guess, we'll, you've got components in the public cloud today, like the Watson Data Platform, that you've extracted and put here. >> We have extracted part of the Data Science Experience, so we've extracted notebooks and a graphical tool called ModelBuilder from DSX as part of IBM machine learning now, and we're going to add more of DSX as we go. But the goal is to really share code and function across private cloud and public cloud. As Rob Thomas defined it, we want with private cloud to offer all the features and functionality of public cloud, except that it would run inside a firewall. We are really developing machine learning and Watson machine learning on a common code base. It's an internal open source project. We share code, and then we ship on different platforms. >> I mean, you haven't, just now, used the word hybrid. Every now and then IBM does, but do you see that so-called hybrid use case as viable, or do you see it more, some workloads should run on prem, some should run in the cloud, and maybe they'll never come together? >> Machine learning, you basically have two phases, one is training and the other is scoring. I see people moving training to cloud quite easily, unless there is some regulation about data privacy. But training is a good fit for cloud because usually you need a large computing system but only for a limited time, so elasticity's great. But then deployment, if you want to score a transaction inside a CICS transaction, it has to run beside CICS, not in the cloud. If you want to score data on an IoT gateway, you want to score at the gateway, not in a data center. I would say that may not be what people think of first, but what will really drive the split between public cloud, private, and on prem is where you want to apply your machine learning models, where you want to score. For instance, smart watches, they are turning into fitness measurement systems. You want to score your health data on the watch, not on the internet somewhere. >> Right, and in that CICS example that you gave, you'd essentially be bringing the model to the CICS data, is that right? >> Yes, that's what we do. That's the value of machine learning for Z: if you want to score transactions happening on Z, you need to be running on Z. So it's clear, mainframe people, they don't want to hear about public cloud, so they will be the last ones moving. They have their reasons, but they like the mainframe because it stays really, really secure and private. >> Dave: Public cloud's a dirty word. >> Yes, yes, for Z users. At least that's what I was told, and I could check with many people. But we know that in general the move is toward public cloud, so we want to help people depending on where they are in their journey to the cloud. >> You've got one of those, too. Jean Francois, thanks very much for coming on theCUBE, it was really a pleasure having you back. >> Thank you. >> You're welcome. Alright, keep it right there, everybody. We'll be back with our next guest. This is theCUBE, we're live from the Waldorf Astoria. IBM's machine learning announcement, be right back. (electronic keyboard music)
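The fraud-scoring discussion earlier in this conversation pairs a model's probability with business context before acting on a transaction. Here is a toy sketch of that decisioning pattern in Python; the thresholds, the tenure rule, and all the names are invented for illustration and are not part of IBM Machine Learning or any scoring API.

```python
# Toy decisioning layer: the model supplies a fraud probability, and simple
# business rules (here, customer tenure) decide what to do with it.
from dataclasses import dataclass

@dataclass
class Transaction:
    fraud_probability: float    # output of the trained model
    customer_tenure_days: int   # how long we have known this customer

def decide(txn: Transaction) -> str:
    trusted = txn.customer_tenure_days > 365
    if txn.fraud_probability >= 0.9 and trusted:
        return "reject"                   # high score on a well-known customer: confident call
    if txn.fraud_probability >= 0.7 and not trusted:
        return "step_up_authentication"   # e.g. send a one-time passcode
    if txn.fraud_probability >= 0.7:
        return "manual_review"
    return "approve"

print(decide(Transaction(0.92, 3650)))   # -> reject
print(decide(Transaction(0.70, 7)))      # -> step_up_authentication
```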