Ravi Dharnikota, SnapLogic & Katharine Matsumoto, eero - Big Data SV 17 - #BigDataSV - #theCUBE
>> Announcer: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. (light techno music) >> Hey, welcome back everybody. Jeff Frick here with theCUBE. We're at Big Data SV, wrapping up two days of wall-to-wall coverage of Big Data SV, which is associated with Strata Conf, which is part of Big Data Week, which always becomes the epicenter of the big data world for a week here in San Jose. We're at the historic Pagoda Lounge, and we're excited to have our next two guests, talking a little bit of a different twist on big data that maybe you hadn't thought of. We've got Ravi Dharnikota, he is the Chief Enterprise Architect at SnapLogic, welcome. - Hello. >> Jeff: And he has brought along a customer, Katharine Matsumoto, she is a Data Scientist at eero, welcome. >> Thank you, thanks for having us. >> Jeff: Absolutely, so we had SnapLogic on a little earlier with Gaurav, but tell us a little bit about eero. I've never heard of eero before, for folks that aren't familiar with the company. >> Yeah, so eero is a start-up based in San Francisco. We are sort of driven to increase home connectivity, both the performance and the ease of use, as wifi becomes totally a part of everyday life. We do that. We've created the world's first mesh wifi system. >> Okay. >> So that means you have, for an average home, three different individual units, and you plug one in to replace your router, and then the other two get plugged in throughout the home, just to power, and they're able to spread coverage, reliability, speed, throughout your home. No more buffering, dead zones, in that way-back bedroom. >> Jeff: And it's a consumer product-- >> Yes. >> So you've got all the fun and challenges of manufacturing, you've got the fun challenges of distribution, consumer marketing, so a lot of challenges for a start-up. But you guys are doing great. Why SnapLogic? >> Yeah, so in addition to the challenges with the hardware, we're also really a software company.
So, everything is either set up via the app. We are not just the backbone to your home's connectivity, but also part of it, so we're sending a lot of information back from our devices to be able to learn and improve the wifi that we're delivering based on the data we get back. So that's a lot of data, a lot of different teams working on different pieces. So when we were looking at launch, how do we integrate all of that information together to make it accessible to business users across different teams, and also how do we handle the scale? I made a checklist (laughs), and SnapLogic was really the only one that seemed to be able to deliver on both of those promises, with a look to the future of, like, I don't know what my next SaaS product is, I don't know what the next API endpoint we're going to need to hit is, sort of the flexibility of that, as well as the fact that we have analysts who were able to pick it up, engineers who were able to pick it up, and I could still manage all the software written by, or the pipelines written by, each of those different groups without having to read whatever version of code they're writing. >> Right, so Ravi, we heard you guys are like doubling your customer base every year, and lots of big names, Adobe we talked about earlier today. But I don't know that most people would think of SnapLogic, really, as a solution for a start-up mesh network company.
So we want to dispel that and really provide a unified platform for both apps and data. So remember, we're seeing all the apps move into the cloud and being provided as services, but the data systems are also moving to the cloud. You've got your data warehouses, databases, your BI systems, analytical tools, all being provided to you as services. So, in this world, data is data. If it's apps, it's probably schema mapping. If it's data systems, it's transformations moving from one end to the other. So, we're here to solve both those challenges in this new world with a unified platform. And it also helps that our lineage and the brain trust that brings us here, we did this a couple of decades ago and we're here to reinvent that space. >> Well, we expect you to bring Clayton Christensen on next time you come to visit, because he needs a new book, and I think that's a good one. (all laugh) But I think a really interesting part of the story, too, is that you have such a dynamic product. Right, if you looked at your boxes, I've got the website pulled up, you wouldn't necessarily think of the dynamic nature, that you're constantly tweaking and taking the data from the boxes to change the service that you're delivering. It's not just this thing that you made to a spec that you shipped out the door. >> Yeah, and that's really where the connected product comes in; we did 20 firmware updates last year. In the past, customers would have the same box for three years, and the technology changes, the chips change, but their wifi service stays the same. We're constantly innovating and being able to push those updates out. But if you're going to do that many updates, you need a lot of feedback on the updates, because things sometimes break when you update, and we've been able to build systems that catch that, that are able to identify changes no one person could spot by looking at their own devices or just with support.
We have leading indicators across all sorts of different stability and performance measures and different devices, so if Xbox changes their protocols, we can identify that really quickly. And that's sort of the goal of having all the data in one place, across customer support and manufacturing: we can easily pinpoint, among the many different complicated factors, where the problem is. >> Have issues. - Yeah. >> So, I've actually got questions for both of you. Ravi, starting with you, it sounds like you're trying to tackle a challenge that in today's tools would have included Kafka at the data integration level, and there it's very much a hub-and-spoke approach. And I guess you would also think of the application-level integration more like the TIBCO and other EAI vendors in a previous generation-- - [Ravi] Yeah. >> Which I don't think was hub and spoke, it was more point to point, and I'm curious how you resolve that, in other words, how you'd tackle both together in a unified architecture? >> Yeah, that's an excellent question. In fact, in the integrators' dilemma that I spoke about, you've got the problem set where you've got the high-latency, high-volume case, where you go to ETL tools. And then for low-latency, low-volume, you immediately go to the TIBCOs of the world, and that's the ESB, EAI sort of tool set that you look to. So what we've done is we've thought about it hard. At one level we've just said, why can integration not be offered as a service? So that's step number one, where the design experience is through the cloud, and then execution can happen anywhere: behind your firewall, or in the cloud, or in a big data system, so it caters to all of that. But the data itself is also changing. You're seeing a lot of the document data models being offered by the SaaS services. The old ETL companies were built before all of this social, mobile sort of stuff came around; it was all row and column oriented.
So how do you deal with the more document-oriented JSON sort of stuff? We built the platform to be able to handle that kind of data. Streaming is an interesting and important question. For pretty much everyone I spoke to last year, streaming was a big thing: let's do streaming, I want everything in real time. But batch also has its place. So you've got to have a system that does batch as well as real time, or as near real time as needed. So we solve for all of those problems. >> Okay, so Katharine, coming to you, each customer has a different, well, every consumer has a different, essentially, install base. To bring all the telemetry back, to make sense out of what's working and what's not working, or how their environment is changing. How do you make sense out of all that, considering that it's not B2B, it's B2C, so, I don't know how many customers you have, but it must be in the tens or hundreds. >> I'm sure I'm not allowed to say (laughs). >> No. But it's the distinctness of each customer that I gather makes the support challenge for you. >> Yeah, and part of that's exposing as much information to the different sources, and starting to automate the ways in which we do it. There's certainly a lot; we are very early on as a company. We hit our year mark for public availability at the end of last month, so-- >> Jeff: Congratulations. >> Thank you, it's been a long year. But with that we learn more, constantly, and different people come to different views as new questions come up. On the special-snowflake aspect of each customer, there's a balance between how much actually is special and how much you can find patterns. And that's really where you get into much more interesting things on the statistics and machine learning side: how do you identify those patterns that you may not even know you're looking for? We are still beginning to understand our customers from a qualitative standpoint.
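A toy illustration of the kind of pattern-finding Katharine describes, flagging a customer population that behaves differently from the fleet-wide baseline. The segment names, the metric, and the threshold are all invented for this sketch; the real analysis would run over far richer telemetry.

```python
# Sketch: flag customer segments whose metric deviates from the fleet-wide
# baseline by more than a z-score threshold. All names/values are made up.
from statistics import mean, stdev

def flag_outlier_segments(segment_metrics, threshold=2.0):
    """Return segments whose metric is more than `threshold` standard
    deviations away from the fleet-wide mean, sorted by name."""
    values = list(segment_metrics.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return sorted(
        name for name, v in segment_metrics.items()
        if abs(v - mu) / sigma > threshold
    )

# Hypothetical daily disconnect rates per customer segment
rates = {
    "homes_small": 0.8, "homes_large": 1.1, "apartments": 0.9,
    "small_business": 6.5,  # "this population looks kind of weird"
    "townhomes": 1.0,
}
print(flag_outlier_segments(rates, threshold=1.5))
```

In practice this is the easy half; the hard part, as she notes, is the feature extraction that produces metrics like these in the first place.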
It actually came up this week, where I was doing an analysis and I was like, this population looks kind of weird, and with two clicks I was able to send a list over to our CX team. They have access to all the same systems, because all of our data is connected; through SnapLogic, we're joining all the data together. We use Looker as our BI tool, so they were just able to start going into all the tickets and doing a deep dive, and that's being presented later this week as, like, hey, what is this population doing? >> So, for you to do this, that must mean you have at least some data that's common to every customer. For you to be able to use something like Looker, I imagine. If every customer was a distinct snowflake, it would be very hard to find patterns across them.
(all laugh) With 20 firmware updates a year (laughs). >> Yeah, there's another interesting point that we were just discussing offline: it's a start-up. They obviously don't have the resources, or the appetite, to set up a large IT department to run these systems. So, as Katharine mentioned, it was a one-person team initially when they started, and they have to be able to integrate with who knows which system comes next. Maybe they experiment with one cloud service, it scales to their liking or not, and then they quickly change and go to another one. You cannot change the integration underneath every time; you've got to be able to adjust to that. So that flexibility, and the other thing is, what they've done with making their business users self-sufficient is another very fascinating thing. It's like, give them the power. Why should IT or that small team become the bottleneck? Don't come to me; I'll just empower you with the right tool set and the patterns, and then from there, you change and put in your business logic and be productive immediately. >> Let me drill down on that, 'cause my understanding, at least in the old world, was that ETL was kind of brittle, and if you're constantly ... Part of, actually, the genesis of Hadoop, certainly at Yahoo, was, we're going to bring all the data we might ever possibly need into the repository so we don't have to keep re-writing the pipeline. And it sounds like you have the capability to evolve the pipeline rather quickly as you want to bring more data into this sort of central resource. Am I getting that about right? >> Yeah, it's a little bit of both. We do have a central, I think data lake's the fancy term for that, so we're bringing everything into S3, dumping it in those raw JSONs, you know, whatever nested format it comes in, so that extraction is easy.
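The land-it-raw approach Katharine describes, keeping nested JSON verbatim and flattening it only when it's time to extract, can be sketched roughly like this. The field names are made up; this is a stand-in for what the real pipeline does at much larger scale in S3.

```python
# Sketch: keep the raw nested JSON event as-is, and flatten it into
# row/column form only at extraction time. Field names are hypothetical.
import json

def flatten(record, prefix=""):
    """Flatten nested dicts into dot-separated columns."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=column + "."))
        else:
            flat[column] = value
    return flat

raw_event = json.loads(
    '{"device_id": "abc123", "wifi": {"band": "5GHz", "rssi": -52}}'
)
row = flatten(raw_event)
print(row)
```

Keeping the raw payload means a later change in the "last mile" business logic can reprocess history without the source having to resend anything.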
Then there's also, as part of ETL, that last mile, which is a lot of business logic, and that's where you run into teams starting to diverge very quickly if you don't have a way for them to give feedback into the process. We've really focused on empowering business users to be self-service in terms of answering their own questions, and that's freed up our analysts to add more value back into the greater group, as well as answer harder questions that both beget more questions and also feed insights back into that data source, because they have access to their piece of that last business logic. By changing the way that one JSON field maps, or combining two, they've suddenly created an entirely new variable that's accessible to everyone. So it's sort of last-leg business logic versus the full transport layer. We have a whole platform that's designed to transport everything and be much more robust to changes. >> Alright, so let me make sure I understand this: it sounds like the less-trained or more self-sufficient go after the central repository, and then the more highly-trained and scarcer resources are responsible for owning one or more of the feeds, and they enrich those or make them more flexible and general-purpose so that those who are more self-sufficient can get at them in the center. >> Yeah, and also you're able to make use of the business context. So we have sort of a hybrid model, with our analysts really closely embedded into the teams, so they have all that context that you need. If you're relying on, say, a central IT team, you have to go back and forth, like, why are you doing this, what does this mean? They're able to build all that into the logic.
And then the goal of our platform team is really to focus on building technologies that complement what we have with SnapLogic, or others that are custom to our data systems, that enable that same sort of self-service for creating specific definitions, or are able to do it intelligently based on agreed-upon patterns of extraction. >> George: Okay. >> Heavy science. Alright, well unfortunately we are out of time. I really appreciate the story, I love the site, I'll have to check out the boxes, because I know I have a bunch of dead spots in my house. (all laugh) But Ravi, I want to give you the last word, really about how is it working with a small start-up doing some cool, innovative stuff, but it's not your Adobes, it's not a lot of the huge enterprise clients that you have. What have you taken away, why does it add value to SnapLogic to work with kind of a cool, fun, small start-up? >> Yeah, so the enterprise is always a retrofit job. You have to sort of go back to the SAPs and the Oracle databases and make sure that we are able to connect the legacy with a new cloud application. Whereas with a start-up, it's all new stuff. But their volumes are constantly changing, they probably have spikes, they have burst volumes, they're thinking about this differently, enabling everyone else, quickly changing and adopting newer technologies. So we have to be able to adjust to that agility along with them. So we're very excited to be partnering with them and going along with them on this journey. And as they start looking at other things, the machine learning and the AI and the IoT space, we're very excited to have that partnership and learn from them and evolve our platform as well. >> Clearly. You're smiling ear-to-ear, Katharine's excited, you're solving problems. So thanks again for taking a few minutes, and good luck with your talk tomorrow. Alright, I'm Jeff Frick, he's George Gilbert, you're watching theCUBE from Big Data SV.
We'll be back after this short break. Thanks for watching. (light techno music)
Darren Chinen, Malwarebytes - Big Data SV 17 - #BigDataSV - #theCUBE
>> Announcer: Live from San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017. >> Hey, welcome back everybody. Jeff Frick here with The Cube. We are at Big Data SV in San Jose at the Historic Pagoda Lounge, part of Big Data Week, which is associated with Strata + Hadoop. We've been coming here for eight years and we're excited to be back. The innovation and dynamism of big data, and its evolution now with machine learning and artificial intelligence, just continues to roll, and we're really excited to be here talking about one of the nasty aspects of this world, unfortunately: malware. So we're excited to have Darren Chinen. He's the senior director of data science and engineering from Malwarebytes. Darren, welcome. >> Darren: Thank you. >> So for folks that aren't familiar with the company, give us just a little bit of background on Malwarebytes. >> So Malwarebytes is basically next-generation anti-virus software. We started off from humble roots, with our founder at 14 years old getting infected with a piece of malware; he reached out into the community and, at 14 years old, with the help of some people, wrote his first lines of code to remediate a couple of pieces of malware. It grew from there, and I think by the ripe old age of 18 he founded the company. He's now, I want to say, 26 or 27, and we're doing quite well. >> It was interesting, before we went live you were talking about his philosophy, how important that is to the company, and how it has now turned into really a strategic asset: that no one should have to suffer from malware, and he decided to offer a solution for free to help people rid themselves of this bad software. >> Darren: That's right.
Yeah, so Malwarebytes was founded on the principle, which Marcin believes, that everyone has the right to a malware-free existence, and so we've always offered a free version of Malwarebytes that will help you remediate if your machine does get infected with a piece of malware. And that's actually still going to this day. >> And that's now given you the ability to have a significant amount of endpoint data, transactional data, trend data, that you can now bake back into the solution. >> Darren: That's right. It's turned into a strategic advantage for the company; it's not something I think we could have planned at 18 years old when he was doing this. But we've instrumented it so that we can get some anonymous-level telemetry, and we can understand how malware proliferates. For many, many years we've been positioned as a second-opinion scanner, and so we're able to see a lot of things, some trends happening in there, and we can actually now see that in real time. >> So, starting out as a second-opinion scanner, you're basically finding what others have missed. What do you have to do to become the first line of defense? >> Well, with our new product Malwarebytes 3.0, I think some of that landscape is changing. We have a very complete and layered offering. I'm not the product manager, so as the data science guy I don't know that I'm qualified to give you the ins and outs, but I think some of that is changing, as we've combined a lot of products and we have a much more complete sweep of layered protection built into the product. >> And so, maybe tell us, without giving away all the secret sauce, what sort of platform technologies did you use that enabled you to scale to these hundreds of millions of endpoints, and then to be fast enough at identifying things that were trending bad, that you had to prioritize?
>> Right, so traditionally, I think AV companies have these honeypots, right, where they go and collect a piece of a virus or a piece of malware, take the MD5 hash of that, and then basically insert that into a definitions database. And that's a very exact way to do it. The problem is that there's so much malware and so many viruses out there in the wild, it's impossible to get all of them. I think one of the things that we did was set up telemetry, and we have a phenomenal research team that's able to catch entire families of malware, and that's really the secret sauce to Malwarebytes. There are several other levels, but that's where we're helping out in the immediate term. What we do is we have, internally, what we sort of jokingly call a Lambda Two architecture. We had considered Lambda long ago, and I say about a year ago, when we first started this journey. But Lambda is riddled with, as you know, a number of issues. If you've ever talked to Jay Kreps from Confluent, he has a lot of opinions on that, right? And one of the key problems is that if you do a traditional Lambda, you have to implement your code in two places, it's very difficult, things get out of sync, and you have to have replay frameworks. These are some of the challenges with Lambda. So we do processing in a number of areas. The first thing we did was implement Kafka to handle all of the streaming data. We use Kafka Streams to do inline stateless transformations, and then we also use Kafka Connect. We write all of our data both into HBase, which we may swap out later for something like Redis, and that would be a thin speed layer. And then we also move the data into S3, and we use some ephemeral clusters to do very large-scale batch processing, and that really provides our data lake.
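As a rough sketch of what a stateless inline transformation looks like, here in plain Python rather than the Kafka Streams Java API: each record is mapped independently, with no state kept between records, which is what makes this kind of step safe to run inline on the stream. The record fields are assumptions, not Malwarebytes' actual schema.

```python
# Sketch: a stateless per-record transform, as Kafka Streams would apply it.
# No lookups, no accumulated state: each record maps independently.
def transform(record):
    """Normalize one raw telemetry record (hypothetical field names)."""
    return {
        "detection_md5": record["md5"].lower(),
        "family": record.get("family", "unknown"),
        "endpoint": record["host"],
    }

stream = [
    {"md5": "D41D8CD9", "family": "adware.xyz", "host": "ep-1"},
    {"md5": "9E107D9D", "host": "ep-2"},  # family not yet classified
]
out = [transform(r) for r in stream]  # map over the stream, one record at a time
print(out)
```

Because the function is pure, scaling it out is just running more consumers over more partitions; anything that needs windowing or joins would instead land in the batch/speed layers described above.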
>> When you call that Lambda Two, is that because you're still working essentially on two different infrastructures, so your code isn't quite the same? You still have to check the results on either fork. >> That's right. Yeah, we did evaluate doing everything in the stream. But there are certain operations that are difficult to do with purely stream processing, and so we did need a thin speed layer, what we call real-time indicators, to supplement what we were doing in the stream. And that's the differentiating factor from a traditional Lambda architecture, where you'd want to have everything in the stream and everything in batch, and the batch is really more of a truing mechanism. Our real time is really directional. So if you look at traditional business intelligence, you'd have KPIs that would allow you to gauge the health of your business. We have RTIs, Real Time Indicators, that allow us to gauge, directionally, what is important to look at this day, this hour, this minute. >> This thing is burning up the charts, >> Exactly. >> Therefore it's priority one. >> That's right, you got it. >> Okay. And maybe tell us a little more, because everyone I'm sure is familiar with Kafka, but the Streams product from them is a little newer, as is Kafka Connect, so it sounds like it's not just the transport: you've got some basic analytics, and you've got the ability to do the ETL because you've got Connect, which comes with sources and sinks. Tell us how you've used that. >> Well, the Streams product is quite different than something like Spark Streaming. It's not working off micro-batching, it's actually working off the stream. And the second thing is, it's not a separate cluster. It's just a library, effectively a .jar file, right?
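The Real Time Indicators Darren mentioned a moment ago might be sketched, in spirit, as a directional spike ranking: not an absolute KPI, but "what is trending right now against its trailing baseline." The malware family names, counts, and ratio threshold here are invented for illustration.

```python
# Sketch: rank what is "burning up the charts" this hour relative to a
# trailing hourly baseline. All families/counts are hypothetical.
def real_time_indicators(this_hour, baseline_hourly_avg, min_ratio=3.0):
    """Return (family, ratio) pairs where the current-hour count is at
    least `min_ratio` times the trailing average, most anomalous first."""
    spikes = []
    for family, count in this_hour.items():
        base = baseline_hourly_avg.get(family, 1)
        ratio = count / max(base, 1)
        if ratio >= min_ratio:
            spikes.append((family, ratio))
    return sorted(spikes, key=lambda item: item[1], reverse=True)

this_hour = {"ransom.locky": 900, "adware.foo": 120, "trojan.bar": 40}
baseline = {"ransom.locky": 100, "adware.foo": 110, "trojan.bar": 5}
print(real_time_indicators(this_hour, baseline))
```

The point of the directional framing is that the ratio only has to be right enough to set priorities this hour; the batch layer trues up the exact numbers later.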
And so because it works natively with Kafka, it handles certain things there quite well. It handles back pressure, and when you expand the cluster it's pretty good with things like that. We've found it to be a fairly stable technology. It's just a library, and we've worked very closely with Confluent to develop that. Kafka Connect, on the other hand, is really something that we use to write out to S3. In fact, Confluent just released a new, direct S3 connector. We were using StreamX, which was a wrapper on top of an HDFS connector, and they rigged that up to write to S3 for us. >> So tell us, as you look out, what sorts of technologies do you see as enabling you to build a richer platform, and then how would that show up in the functionality consumers like us would see? >> Darren: With respect to the architecture? >> Yeah. >> Well, one of the things we had to do is evaluate where we wanted to spend our time. We're a very small team; the entire data science and engineering team is, I think, less than 10 months old. So all of us got hired, we've started this platform, and we've gone very, very fast. And we had to decide: we've made this big investment, how are we going to get value to our end customer quickly, so that they're not waiting around and you get the traditional big-data story where we've spent all this money and now we're not getting anything out of it? And so we had to make some of those strategic decisions, and because the data was truly big data in nature, there's just a huge amount of work that has to be done in these open-source technologies. They're not baked; it's not like going out to Oracle, giving them a purchase order, and you install it and away you go. There's a tremendous amount of work, and so we've made some strategic decisions on what we're going to do in open-source and what we're going to do with a third-party vendor solution.
And one of those solutions was workload automation. I just did a talk on this, about how Control-M from BMC was really the tool that we chose to handle a lot of the coordination, the sophisticated coordination, and the workload automation on the batch side, and we're about to implement it in a data-quality monitoring framework. That's turned out to be an incredibly stable solution for us. It's allowed us to not spend time with open-source solutions that do the same things, like Airflow, which may or may not work well but has really no support around it, and to focus our efforts on what we believe to be the really, really hard problems to tackle in Kafka, Kafka Streams, Connect, et cetera. >> Is it fair to say that Kafka plus Kafka Connect solves many of the old ETL problems, or do you still need some sort of orchestration tool on top of it to completely commoditize, essentially, moving and transforming data from an OLTP or operational system to a decision support system? >> I guess the answer to that is, it depends on your use case. I think there are a lot of things that Kafka and the Streams jobs can solve for you, but I don't think we're at the point where everything can be streaming. I think that's a ways off. There are legacy systems that really don't natively stream to you anyway, and there are certain operations that are just more efficient to do in batch. That's why batch isn't going away for us any time soon, and it's one of the reasons why workload automation in the batch layer was initially so important, and why we've decided to extend that, actually, into building out a data-quality monitoring framework to put a collar around how accurate our data is on the real-time side. >> Cuz it's really horses for courses: it's not one or the other, it's application-specific what the best solution for that particular use is.
>> Yeah, I don't think that there's, if there was a one-size-fits-all, it'd be a company, and there would be no need for architects, so I think that you have to look at your use case, your company, what kind of data, what style of data, what type of analysis you need. Do you really actually need the data in real time, and if you do put in all the work to get it in real time, are you going to be able to take action on it? And I think Malwarebytes was a great candidate. When it came in, I said, "Well, it does look like we can justify the need for real-time data, and the effort that goes into building out a real-time framework." >> Jeff: Right, right. And we always say, what is real time? In time to do something about it, (all chuckle) and if there's not time to do something about it, depending on how you define real time, really what difference does it make if you can't do anything about it that fast. So as you look out in the future with IoT, all these connected devices, this is a hugely increased attack surface, as we just read about a few weeks back. How does that work into your planning? What do you guys think about the future, where there's so many more connected devices out on the edge, with various degrees of intelligence, and opportunities to hijack, if you will? >> Yeah, I think, I don't think I'm qualified to speak about the Malwarebytes product roadmap as far as IoT goes. >> But more philosophically, from a professional point of view, cuz every coin has two sides, there's a lot of good stuff coming from IoT and connected devices, but as we keep hearing over and over, just this massive attack surface expansion. >> Well I think, for us, the key is we're small and we're not operating, like I came from Apple where we operated on a budget of infinity, so we're not-- >> Have to build the infinity or the address infinity (Darren laughs) with an actual budget. >> We're small and we have to make sure that whatever we do creates value.
And so what I'm seeing in the future is, as we get more into the IoT space and logs begin to proliferate and data just exponentiates in size, it's really how do we do the same thing, and how are we going to manage that in terms of cost? Generally, big data is very low in information density. It's not like transactional systems, where you get the data, it's effectively an Excel spreadsheet, and you can go run some pivot tables and filters and away you go. I think big data in general requires a tremendous amount of massaging to get to the point where a data scientist or an analyst can actually extract some insight and some value. And the question is, how do you massage that data in a way that's going to be cost-effective as IoT expands and proliferates? So that's the question that we're dealing with. We're, at this point, all in with cloud technologies; we're leveraging quite a few of Amazon's services, server-less technologies as well. We're just in the process of moving to Athena as an on-demand query service. And we use a lot of ephemeral clusters as well, and that allows us to actually run all of our ETL in about two hours. And so these are some of the things that we're doing to prepare for this explosion of data and making sure that we're in a position where we're not spending a dollar to gain a penny, if that makes sense. >> That's his business. Well, he makes fun of that business model. >> I think you could do it, you want to drive revenue to sell dollars for 90 cents. >> That's the dot-com model, I was there. >> Exactly, and make it up in volume. All right, Darren Chinen, thanks for taking a few minutes out of your day and giving us the story on Malwarebytes, sounds pretty exciting and a great opportunity. >> Thanks, I enjoyed it. >> Absolutely, he's Darren, he's George, I'm Jeff, you're watching The Cube. We're at Big Data SV at the historic Pagoda Lounge. Thanks for watching, we'll be right back after this short break. (upbeat techno music)
Holden Karau, IBM Big Data SV 17 #BigDataSV #theCUBE
>> Announcer: Big Data Silicon Valley 2017. >> Hey, welcome back, everybody, Jeff Frick here with The Cube. We are live at the historic Pagoda Lounge in San Jose for Big Data SV, which is associated with Strata + Hadoop World, across the street, as well as Big Data Week, so everything big data is happening in San Jose. We're happy to be here, love the new venue, if you're around, stop by, back of the Fairmont, Pagoda Lounge. We're excited to be joined in this next segment by, who's now become a regular, any time we're at a Big Data event, a Spark event, Holden always stops by. Holden Karau, she's the principal software engineer at IBM. Holden, great to see you. >> Thank you, it's wonderful to be back yet again. >> Absolutely, so the big data meme just keeps rolling, Google Cloud Next was last week, a lot of talk about AI and ML, and of course you're very involved in Spark, so what are you excited about these days? What are you, I'm sure you've got a couple presentations going on across the street. >> Yeah, so my two presentations this week, oh wow, I should remember them. So the one that I'm doing today is with my co-worker Seth Hendrickson, also at IBM, and we're going to be focused on how to use structured streaming for machine learning. And sort of, I think that's really interesting, because streaming machine learning is something a lot of people seem to want to do but aren't yet doing in production, so it's always fun to talk to people before they've built their systems. And then tomorrow I'm going to be talking with Joey on how to debug Spark, which is something that, you know, a lot of people ask questions about, but I tend to not talk about, because it tends to scare people away, and so I try to keep the happy going. >> Jeff: Bugs are never fun. >> No, no, never fun.
>> Just picking up on that structured streaming and machine learning, so there's this issue of, as we move more and more towards the industrial internet of things, like having to process events as they come in, make a decision. How, there's a range of latency that's required. Where does structured streaming and ML fit today, and where might that go? >> So structured streaming today, latency-wise, is probably not something I would use for something like that right now. It's in, like, the sub-second range. Which is nice, but it's not what you want for, like, live serving of decisions for your car, right? That's just not going to be feasible. But I think it certainly has the potential to get a lot faster. We've seen a lot of renewed interest in MLlib-local, which is really about making it so that we can take the models that we've trained in Spark and really push them out to the edge and sort of serve them in the edge, and apply our models on end devices. So I'm really excited about where that's going. To be fair, part of my excitement is someone else is doing that work, so I'm very excited that they're doing this work for me. >> Let me clarify on that, just to make sure I understand. So there's a lot of overhead in Spark, because it runs on a cluster, because you have an optimizer, because you have the high availability or the resilience, and so you're saying we can preserve the predict and maybe serve part and carve out all the other overhead for running in a very small environment. >> Right, yeah. So I think for a lot of these IoT devices and stuff like that it actually makes a lot more sense to do the predictions on the device itself, right. These models generally are megabytes in size, and we don't need a cluster to do predictions on these models, right. We really need the cluster to train them, but I think for a lot of cases, pushing the prediction out to the edge node is actually a pretty reasonable use case.
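The "push the model to the edge" idea she describes reduces, at its simplest, to shipping learned parameters instead of a cluster. A hedged sketch in plain Python; the weights below are made up for illustration, not exported from any real Spark job:

```python
import math

# Hypothetical coefficients a cluster-side training job might export.
# On the device, scoring one event needs only this handful of numbers.
WEIGHTS = [0.8, -1.2]
BIAS = 0.1

def predict(event):
    """Score a single event locally -- no batch window, no network hop."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, event))
    p = 1.0 / (1.0 + math.exp(-z))  # logistic link
    return p >= 0.5
```

This is the latency argument in miniature: the expensive part (training) stays on the cluster, while the cheap part (a dot product and a sigmoid) runs on the end device.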
And so I'm really excited that we've got some work going on there. >> Taking that one step further, we've talked to a bunch of people, both like at GE, at their Minds and Machines show, and IBM's Genius of Things, where you want to be able to train the models up in the cloud, where you're getting data from all the different devices, and then push the retrained model out to the edge. Can that happen in Spark, or do we have to have something else orchestrating all that? >> So actually pushing the model out isn't something that I would do in Spark itself, I think that's better served by other tools. Spark is not really well suited to large amounts of internet traffic, right. But it's really well suited to the training, and I think with MLlib-local it'll essentially, we'll be able to provide both sides of it, and the copy part will be left up to whoever it is that's doing their work, right, because like if you're copying over a cell network you need to do something very different than if you're broadcasting over terrestrial XM or something like that; you need to do something very different for satellite. >> If you're at the edge on a device, would you be actually running, like you were saying earlier, structured streaming, with the prediction? >> Right, I don't think you would use structured streaming per se on the edge device, but essentially there would be a lot of code share between structured streaming and the code that you'd be using on the edge device. And it's being vectored out now so that we can have this code sharing and Spark machine learning. And you would use structured streaming maybe on the training side, and then on the serving side you would use your custom local code. >> Okay, so tell us a little more about Spark ML today and how we can democratize machine learning, you know, for a bigger audience. >> Right, I think machine learning is great, but right now you really need a strong statistical background to really be able to apply it effectively.
And we probably can't get rid of that for all problems, but I think for a lot of problems, doing things like hyperparameter tuning can actually give really powerful tools to just, like, regular engineering folks who, they're smart, but maybe they don't have a strong machine learning background. And Spark's ML pipelines make it really easy to sort of construct multiple stages, and then just be like, okay, I don't know what these parameters should be; I want you to do a search over what these different parameters could be for me, and it makes it really easy to do this as just a regular engineer with less of an ML background. >> Would that be like, just for those of us who don't know what hyperparameter tuning is, that would be the knobs, the variables? >> Yeah, it's going to spin the knobs on, like, our regularization parameter on our regression, and it can also spin some knobs on maybe the n-gram sizes that we're using on the inputs to something else, right. And it can compare how these knobs sort of interact with each other, because often you can tune one knob, but you actually have six different knobs that you want to tune, and if you just explore each one individually, you're not going to find the best setting for them working together. >> So this would make it easier for, as you're saying, someone who's not a data scientist to set up a pipeline that lets you predict. >> I think so, very much. I think it brings a lot of the benefits from sort of the SciPy world to the big data world. And SciPy is really wonderful about making machine learning really accessible, but it's just not ready for big data, and I think this does a good job of bringing these same concepts, if not the code, but the same concepts, to big data.
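The point about knobs interacting is worth a concrete, if toy, example. This is not Spark's `CrossValidator`; it is a hedged pure-Python sketch with an invented objective, where the best regularization value depends on the n-gram size, so the knobs have to be searched jointly:

```python
from itertools import product

def toy_loss(reg, ngram):
    # Invented objective: the best reg shifts as the ngram size changes,
    # so tuning each knob on its own can miss the joint optimum.
    return (reg - 0.1 * ngram) ** 2 + 0.05 * ngram

def grid_search(regs, ngrams):
    """Evaluate every combination of the two knobs and keep the best pair."""
    return min(product(regs, ngrams), key=lambda p: toy_loss(*p))

best = grid_search([0.0, 0.1, 0.2, 0.3], [1, 2, 3])
```

Spark's ML pipelines expose the same idea declaratively: you hand a parameter grid to a cross-validator and it searches the combinations for you.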
>> And so to make that sit on Spark means that you could then run it on a cluster-- >> So this isn't actually taking SciPy and distributing it, this is just like stealing the good concepts from SciPy and making them available for big data people. Because SciPy's done a really good job of making a very intuitive machine learning interface. >> So just to put a fine sort of qualifier on one thing, if you're doing the internet of things and you have Spark at the edge and you're running the model there, it's the programming model, so structured streaming is one way of programming Spark, but if you don't have structured streaming at the edge, would you just be using the core batch Spark programming model? >> So at the edge you'd just be using, you wouldn't even be using batch, right, because you're trying to predict individual events, right, so you'd just be calling predict with every new event that you're getting in. And you might have a q mechanism of some type. But essentially if we had this batch, we would be adding additional latency, and I think at the edge we really, the reason we're moving the models to the edge is to avoid the latency. >> So just to be clear then, is the programming model, so it wouldn't be structured streaming, and we're taking out all the overhead that forced us to use batch with Spark. So the reason I'm trying to clarify is a lot of people had this question for a long time, which is are we going to have a different programming model at the edge from what we have at the center? >> Yeah, that's a great question. And I don't think the answer is finished yet, but I think the work is being done to try and make it look the same. 
Of course, you know, trying to make it look the same, this is Boosh, it's not like actually barking at us right now, even though she looks like a dog, she is, there will always be things which are a little bit different from the edge to your cluster, but I think Spark has done a really good job of making things look very similar on single node cases to multi node cases, and I think we can probably bring the same things to ML. >> Okay, so it's almost time, we're coming back, Spark took us from single machine to cluster, and now we have to essentially bring it back for an edge device that's really light weight. >> Yeah, I think at the end of the day, just from a latency point of view, that's what we have to do for serving. For some models, not for everyone. Like if you're building a website with a recommendation system, you don't need to serve that model like on the edge node, that's fine, but like if you've got a car device we can't depend on cell latency, right, you have to serve that in car. >> So what are some of the things, some of the other things that IBM is contributing to the ecosystem that you see having a big impact over the next couple years? >> So there's a lot of really exciting things coming out of IBM. And I'm obviously pretty biased. I spend a lot of time focused on Python support in Spark, and one of the most exciting things is coming from my co-worker Brian, I'm not going to say his last name in case I get it wrong, but Brian is amazing, and he's been working on integrating Arrow with Spark, and this can make it so that it's going to be a lot easier to sort of interoperate between JVM languages and Python and R, so I'm really optimistic about the sort of Python and R interfaces improving a lot in Spark and getting a lot faster as well. And we're also, in addition to the Arrow work, we've got some work around making it a lot easier for people in R and Python to get started. 
The R stuff is mostly actually the Microsoft people, thanks Felix, you're awesome. I don't actually know which camera I should have done that to, but that's okay. >> I think you got it! >> But Felix is amazing, and the other people working on R are too. But I think we've both been pursuing sort of making it so that people who are in the R or Python spaces can just use, like, pip install, conda install, or whatever tool it is they're used to working with, to just bring Spark into their machine really easily, just like they would sort of any other software package that they're using. Because right now, for someone getting started in Spark, if you're in the Java space it's pretty easy, but if you're in R or Python you have to do sort of a lot of weird setup work, and it's worth it, but like if we can get rid of that friction, I think we can get a lot more people in these communities using Spark. >> Let me see, just as a scenario: R Server is getting fairly well integrated into SQL Server, so would it be, would you be able to use R as the language with a Spark execution engine to somehow integrate it into SQL Server as an execution engine for doing the machine learning and predicting? >> You definitely, well I shouldn't say definitely, you probably could do that. I don't necessarily know if that's a good idea, but that's the kind of stuff that this would enable, right, it'll make it so that people that are making tools in R or Python can just use Spark as another library, right, and it doesn't have to be this really special setup. It can just be this library and they point at the cluster and they can do whatever work it wants to do. That being said, with the SQL Server R integration, if you find yourself using that to do, like, distributed computing, you should probably take a step back and rethink what you're doing.
And you might be better off doing this with, like, connecting your Spark cluster to your SQL Server instance using JDBC or a special driver, and doing it that way, but you definitely could do it in another inverted sort of way.
>> You can like install Spark easily, you can, you know, set up an ML pipeline, you can train your model, you can start doing predictions, you can, people that haven't been able to do machine learning at scale can get started super easily, and build a recommendation system for their small little online shop and be like, hey, you bought this, you might also want to buy Boosh, he's really cute, but you can't have this one. No no no, not this one. >> Such a tease! >> Holden: I'm sorry, I'm sorry. >> Well Holden, that will, we'll say goodbye for now, I'm sure we will see you in June in San Francisco at the Spark Summit, and look forward to the update. >> Holden: I look forward to chatting with you then. >> Absolutely, and break a leg this afternoon at your presentation. >> Holden: Thank you. >> She's Holden Karau, I'm Jeff Frick, he's George Gilbert, you're watching The Cube, we're at Big Data SV, thanks for watching. (upbeat music)
Gaurav Dhillon | Big Data SV 17
>> Hey, welcome back everybody. Jeff Frick here with the Cube. We are live in downtown San Jose at the historic Pagoda Lounge, part of Big Data SV, which is part of Strata + Hadoop Conference, which is part of Big Data Week, because everything big data is pretty much in San Jose this week. So we're excited to be here. We're here with George Gilbert, our big data analyst from Wikibon, and a great guest, Gaurav Dhillon, Chairman and CEO of SnapLogic. Gaurav, great to see you. >> Pleasure to be here, Jeff. Thank you for having me. George, good to see you. >> You guys have been very busy since we last saw you about a year ago. >> We have. We had a pretty epic year. >> Yeah, give us an update, funding, and customers, and you guys have a little momentum. >> It's a good thing. It's a good thing, you know. A friend and a real mentor to us, Dan Warmenhoven, longtime CEO of NetApp, always likes to joke that growth cures all startup problems. And you know what, that's the truth. >> Jeff: Yes. >> So we had a scorching year, you know. 2016 was a year of continuing to strengthen our products, getting a bunch more customers. We got about 300 new customers. >> Jeff: 300 new customers? >> Yes, and as you know, we don't sell to small business. We sell to the enterprise. >> Right, right. >> So, this is the who's who of pharmaceuticals, continued strength in high-tech, continued strength in retail. You know, all the way from Subway Sandwich to folks like AstraZeneca and Amgen and Bristol-Myers Squibb. >> Right. >> So, some phenomenal growth for the company. But, you know, we look at it very simply. We want to double our company every year. We want to do it in a responsible way. In other words, we are growing our business in such a way that we can sail over to cash-flow break-even at any time. So responsibly doubling your business is a wonderful thing.
>> So when you look at it, obviously, you guys are executing, you've got good products, people are buying. But what are some of the macro-trends that you're seeing talking to all these customers that are really helping push you guys along? >> Right, right. So what we see is, and it used to be the majority of our business. It's now getting to be 50/50. But still I would say, historically, the primary driver for 2016 of our business was a digital transformation at a boardroom level causing a rethinking of the appscape and people bringing in cloud applications like Workday. So, one of the big drivers of our growth is helping fit Workday into the new fabric in many enterprises: Vassar College, into Capital One, into finance and various other sectors. Where people bring in Workday, they want to make that work with what they have and what they're going to buy in the future, whether it's more applications or new types of data strategies. And that is the primary driver for growth. In the past, it was probably a secondary driver, this new world of data warehousing. We like to think of it as a post-modern era in the use of data and the use of analytics. But this year, it's trending to be probably 50/50 between apps and data. And that is a shift towards people deploying in the same way that they moved from on-premise apps to SAS apps, a move towards looking at data platforms in the cloud for all the benefits of racking and stacking and having the capability rather than being in the air-conditioning, HVAC, and power consumption business. And that has been phenomenal. We've seen great growth with some of the work from Microsoft Azure with the Insights products, AWS's Redshift is a fantastic growth area for us. And these sorts of technologies, we think are going to be of significant impact to the everyday, the work clothing types of analytics. 
Maybe the more exotic stuff will stay on prem, but a lot of the regular business-like stuff, you know, stuff in suits and ties, is moving into the cloud at a rapid pace. >> And we just came off the Google Next show last week. And Google really is helping continue to push kind of ML and AI out front. And so, maybe it's not the blue-suit analytics. >> Gaurav: Indeed, yes. >> But it does drive expectations. And you know, the expectations of what we can get, what we should get, what we should be moving towards is rapidly changing. >> Rapidly changing, for example, we saw it at The New York Times, which, like many of Google's flagship enterprise customers, is media-related. >> Jeff: Right. >> No accident, they're so proficient themselves, being in the consumer internet space. So as we encountered in places like The New York Times, there's a shift away from a legacy data warehouse, which people like me and others in the last century, back in my time in Informatica, might have sold them, towards a cloud-first strategy of using, in their case, Google products, Bigtable, et cetera. And also, they're doing that because they aspirationally want to get at consumer prices without having to have a campus and the expense of Google's big brain. They want to benefit from some of those things like TensorFlow, et cetera, through the machine learning and other developer capabilities that are now coming along with that in the cloud. And by the way, Microsoft has amazing machine learning capability in its Azure, from Microsoft Research, as well. >> So Gaurav, it's interesting to hear sort of the two drivers. We know PeopleSoft took off starting with HR first and then would add on financials and stumble a little bit with manufacturing. So, when someone wants to bring in Workday, is it purely an efficiency value prop? And then, how are you helping them tie into the existing fabric of applications? >> Look, I think you have to ask Dave or Aneel, or ask them together, more about that dynamic.
What I know, as a friend of the firm and as somebody we collaborate with, and, you know, this is an interesting statistic, is that 20 percent of Workday's financial customers are using SnapLogic, 20 percent. Now, it's a nascent business for them, and you and I were around in the last century of ERP. We saw the evolution of functional winners. Some made it into suites and some didn't. Siebel never did. PeopleSoft at least made a significant impact on a variety of other things. Yes, there was Baan and other things that prevented their domination of manufacturing, and, of course, the small company in Walldorf did a very good job on it too. But that said, what we find is it's very typical, in a sense, how people using TIBCO and Informatica in the last century are looking at SnapLogic. And it's no accident, because we saw Workday's go-to-market motion, and in a sense are trying to do the same thing Dave and Aneel have done, being a bunch of ex-Informatica guys. So here's what it is. When you look at your legacy installation, and you want to modernize it, what are your choices? You can do a big old upgrade because it's on-premise software. Or you can say, "You know what? For 20% more, I could just get the new thing." And guess what? A lot of people want to get the new thing. And that's what you're going to see all the time. And that's what's happening with companies like SnapLogic and Workday. Right here locally, Adobe, it's an icon in technology, and certainly in San Jose that logo is very big. A few years ago, they decided to make the jump from legacy middleware, TIBCO, Informatica, webMethods, and they've replaced everything globally with SnapLogic.
So in that same way, instead of trying to upgrade this version and that version and what about what we do in Japan, what do we do in Sweden, why don't you just find a platform as a service that lets you elevate your success and go towards a better product, more of a self-service better UX, millennial-friendly type of product? So that's what's happening out there. >> But even that three-letter company from Walldorf was on-stage last week. You can now get SAP on the Google Cloud Platform which I thought was pretty amazing. And the other piece I just love, though there's still a few doubters out there on SaaS, is now there's a really visual representation. >> Gaurav: There is. >> Of the dominance of that style going up in downtown San Francisco. It's 60 stories high, and it's taken over the landscape. So if there's ever any doubt of enterprise adoption of SaaS, and if anything, I would wonder if kind of the proliferation of apps now within the SaaS environment inside the enterprise starts to become a problem in and of its own self. Because now you have so many different apps that you're working on and working. God help us if the internet goes down, right? >> It's true, and you know, and how do you make e pluribus unum, out of many, one, right? So it's hilarious. It is almost at proliferation at this point. You know, our CFO tapped me the other day. He said, "Hey, you've got to check this out." "They're using a SaaS application which they got "from a law firm to track stock options "inside the company." I'm like, "Wow, that is a job title and a vertical." So only high growth private venture backed companies need this, and typically it's high tech. And you have very capable SaaS, even in the small grid squares in the enterprise. >> Jeff: Right, right. >> So, it's a sign, and I think that's probably another way to think about the work that we do at SnapLogic and others. >> Jeff: Right, right. >> Other people in the marketplace like us.
What we do essentially is we give you the ERP of one. Because if you could choose things that make sense for you and they could work together in a very good way to give you very good fabric for your purposes, you've essentially bought a bespoke suit at rack prices. Right? Without that nine times multiplier of the last century of having to have consultants without end, darkening the sky with consultants to make that happen. You know? So that, yes, SaaS proliferation is happening. That is the opportunity, also the problem. For us, it's an opportunity where that glass is half-full: we come in with SnapLogic and knit it together for you to give you fabric back. And people love that because the businesses can buy what they want, and the enterprise gets a comprehensive solution. >> Jeff: Right, right. >> Well, at the risk of taking a very short tangent, that comment about darkening the skies, if I recall, was the Persians threatening the 300 Greeks at the battle of Thermopylae. >> Gaurav: Yes. >> And they said, "We'll darken the skies with our arrows." And so the Greek. >> Gaurav: Come and get 'em. >> No, no. >> The famous line was, he said, "Give us your weapons." And the guy says, "Come and get 'em." (laughs) >> To get to that point, the Greek general says, "Well, we'll fight in the shade." (all laughing) But I wanted to ask you. >> This is the movie 300 as well, right? >> Yes. >> The famous line is, "Give us your weapons." He said, "Come and get 'em." (all laughing) >> But I'm thinking also of the use case where a customer brings in Workday and you help essentially instrument it so it can be a good citizen. So what does that make, or connect it so it can be a good citizen. How much easier does that make fitting in other SaaS apps or any other app into the fabric, application fabric? >> Right, right. Look, George.
As you and I know, we both had some wonderful runs in the last century, and here we are doing version 2.0 in many ways, again, very similar to the Workday management. The enterprise is hip to the fact that there is a Switzerland nature to making things work together. So they want amazing products like Workday. They want amazing products like the SAP Cloud Suite, now with Concur, SuccessFactors in there. Some very cool things happening in the analytics world which you'll see at Sapphire and so on. So some very, very capable products coming from, I mean, Oracle's bought 80 SaaS companies or 87 SaaS companies. And so, what you're seeing is the enterprise understands that there's going to be red versus blue and a couple other stripes and colors and that they want their businesspeople to buy whatever works for them. But they want to make them work together. All right? So there is a natural sort of geographic or structural nature to this business where there is a need for Switzerland and there is a need for amazing technology, some of which can only come from large companies with big balance sheets and vertical understanding and a legacy of success. But if a customer like an AstraZeneca, where you have a CIO like Dave Smoley who transformed Flextronics and is now doing the same thing at AstraZeneca, bringing cloud apps, is able to use companies like SnapLogic and then deploy Workday appropriately, SAP appropriately, have his own custom development, some domestic, some overseas, all over the world, then you've got the ability again to get something very custom, and you can do that at a fraction of the cost of overconsulting or darkening the skies in the way that things were done in the last century.
>> So, then tell us about maybe the convergence of the new age data warehousing, the data science pipeline, and then this bespoke collection of applications, not bespoke the way Oracle tried it 20 years ago where you had to upgrade every app tied into every other app on prem, but perhaps the integration, more from many to one because they're in the cloud. There's only one version of each. How do you tie those two worlds together? >> You know, it's like that old bromide, "Know when to hold 'em. "Know when to fold 'em." There is a tendency, when programming becomes more approachable, you have more millennials who are able to pick up technology in a way. I mean, it's astounding what my children can do. So what you want to do is, as an enterprise, you want to very carefully build those things that you want to build, make sure you don't overbuild. Or, say, if you have a development capability, then every problem looks like a development nail and you have a hammer called development. "Let's hire more Java programmers." That's not the answer. Conversely, you don't want to lose sight of the fact that to really be successful in this millennium, you have to have a core competence around technology. So you want to carefully assemble and build your capability. Now, nobody should ever outsource management. That's a bad idea. (chuckles) But what you want to do is you want to think about those things that you want to buy as a package. Is that a core competence? So, there are excellent products for finance, for human capital management, for travel expense management. Coupa just announced today their offering for managing your spend. Some of the work at Ariba, now the Ariba Cloud at SAP, are excellent products to help you do certain job titles really well. So you really shouldn't be building those things. But what you should be doing is doing the right element of build and buy. So now, what does that mean for the world of analytics?
In my view, people building data platforms or using a lot of open source and a lot of DevOps labor and virtualization engineering and all that stuff may be less valuable over time, because where the puck is going, where a lot of people should skate to, is developing certain machine learning and certain kinds of AI capabilities that I think are going to be transformational for almost every industry. It is hard to imagine anything in a more mechanized back office, moving paper, manufacturing, that cannot go through a quantum of improvement through AI. There are obviously moral and certain humanity dystopia issues around that to be dealt with. But what people should be doing is, I think, building out the AI capabilities, because those are very custom to that business. Those have to do with the business's core competence, its milieu of markets and competitors. But there should be, in a sense, stroking a purchase order in the direction of a SaaS provider, a cloud data provider like Microsoft Azure or Redshift, and shrinking down their lift-and-shift bill and their data center bill by doing that. >> It's fascinating how long it took enterprises to figure that out. Just like they've been leveraging ADP for God knows how many years, you know, there's a lot of other SaaS applications you can use to do your non-differentiated heavy lifting, but they're clearly all in now. So Gaurav, we're running low on time. I just want to say, when we get you here next year, what's top of your plate? What's top of priorities for 2017? 'Cause obviously you guys are knocking down things left and right. >> Thank you, Jeff. Look, priority for us is growth. We're a growth company. We grow responsibly. We've seen a return to quality on the part of investors, on the part of public and private investors. And you know, you'll see us continue to sort of go at that growth opportunity in a manner consistent with our core values of building product with incredible success.
99% of our customers renewed with us last quarter. >> Jeff: Ninety-nine percent? >> Yes sir. >> That says it all. >> And in the world of enterprise software where there's a lot of snake oil, I'm proud to say that we are building new product with old-fashioned values, and that's what you see from us. >> Well, 99% customer retention, you can't beat that. >> Gaurav: Hard to beat! There's no way but down from there, right? (laughing) >> Exactly. Alright Gaurav, well, thanks. >> Pleasure. >> For taking a few minutes out of your busy day. >> Thank you, Jeff. >> And I really appreciate the time. >> Thank you, Jeff, thank you, George. >> Alright, he's George Gilbert. I'm Jeff Frick. You're watching theCUBE from the historic Pagoda Lounge in downtown San Jose. Thanks for watching.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
George Gilbert | PERSON | 0.99+ |
Dave Smoley | PERSON | 0.99+ |
Dan Wormenhoven | PERSON | 0.99+ |
Jeff | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Gaurav Dhillon | PERSON | 0.99+ |
George | PERSON | 0.99+ |
2017 | DATE | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
AstraZeneca | ORGANIZATION | 0.99+ |
Jeff Rick | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
Amgen | ORGANIZATION | 0.99+ |
NetApp | ORGANIZATION | 0.99+ |
Ariba | ORGANIZATION | 0.99+ |
PeopleSoft | ORGANIZATION | 0.99+ |
Japan | LOCATION | 0.99+ |
Gaurav | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
Vassar College | ORGANIZATION | 0.99+ |
2016 | DATE | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Sweden | LOCATION | 0.99+ |
20% | QUANTITY | 0.99+ |
20 percent | QUANTITY | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
99% | QUANTITY | 0.99+ |
Walldorf | LOCATION | 0.99+ |
80 | QUANTITY | 0.99+ |
Aneel | PERSON | 0.99+ |
SnapLogic | ORGANIZATION | 0.99+ |
TIBCO | ORGANIZATION | 0.99+ |
87 | QUANTITY | 0.99+ |
next year | DATE | 0.99+ |
Informatica | ORGANIZATION | 0.99+ |
300 new customers | QUANTITY | 0.99+ |
last week | DATE | 0.99+ |
Bristol-Myers Squibb | ORGANIZATION | 0.99+ |
60 stories | QUANTITY | 0.99+ |
Ninety-nine percent | QUANTITY | 0.99+ |
Adobe | ORGANIZATION | 0.99+ |
Switzerland | LOCATION | 0.99+ |
last century | DATE | 0.99+ |
Wikibon | ORGANIZATION | 0.99+ |
SAP | ORGANIZATION | 0.99+ |
Coupa | ORGANIZATION | 0.98+ |
two drivers | QUANTITY | 0.98+ |
WebMethods | ORGANIZATION | 0.98+ |
two worlds | QUANTITY | 0.98+ |
Flextronics | ORGANIZATION | 0.98+ |
Sapphire | ORGANIZATION | 0.98+ |
SAP Cloud Suite | TITLE | 0.98+ |
this year | DATE | 0.98+ |
Frederick Reiss, IBM STC - Big Data SV 2017 - #BigDataSV - #theCUBE
>> Narrator: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. (upbeat music) >> Big Data SV 2017, day two of our wall-to-wall coverage of Strata Hadoop Conference, Big Data SV, really what we call Big Data Week because this is where all the action is going on down in San Jose. We're at the historic Pagoda Lounge in the back of the Fairmont, come on by and say hello, we've got a really cool space and we're excited and never been in this space before, so we're excited to be here. So we got George Gilbert here from Wikibon, we're really excited to have our next guest, he's Fred Reiss, he's the chief architect at the IBM Spark Technology Center in San Francisco. Fred, great to see you. >> Thank you, Jeff. >> So I remember when Rob Thomas, we went up and met with him in San Francisco when you guys first opened the Spark Technology Center a couple of years ago now. Give us an update on what's going on there, I know IBM's putting a lot of investment in this Spark Technology Center in the San Francisco office specifically. Give us kind of an update of what's going on. >> That's right, Jeff. Now we're in the new Watson West building in San Francisco at 505 Howard Street, colocated, we have about a 50 person development organization. Right next to us we have about 25 designers, and on the same floor a lot of developers from Watson doing a lot of data science, from the Weather Underground, doing weather and data analysis, so it's a really exciting place to be, lots of interesting work in data science going on there. >> And it's really great to see how IBM is taking the core Watson, obviously enabled by Spark and other core open source technology and now applying it, we're seeing Watson for Health, Watson for Autonomous Vehicles, Watson for Marketing, Watson for this, and really bringing that type of machine learning power to all the various verticals in which you guys play.
>> Absolutely, that's been what Watson has been about from the very beginning, bringing the power of machine learning, the power of artificial intelligence to real world applications. >> Jeff: Excellent. >> So let's tie it back to the Spark community. Most folks understand how Databricks builds out the core or does most of the core work for, like, the SQL workload, the streaming, and machine learning, and I guess graph is still immature. We were talking earlier about IBM's contributions in helping to build up the machine learning side. Help us understand what the Databricks core technology for machine learning is and how IBM is building beyond that. >> So the core technology for machine learning in Apache Spark comes out, actually, of the machine learning department at UC Berkeley, as well as a lot of different members of the community. Some of those community members also work for Databricks. We actually at the IBM Spark Technology Center have made a number of contributions to the core Apache Spark and the libraries, for example recent contributions in neural nets. In addition to that, we also work on a project called Apache System ML, which used to be proprietary IBM technology, but the IBM Spark Technology Center has turned System ML into Apache System ML, it's now an open Apache incubating project that's been moving forward out in the open. You can now download the latest release online, and that provides a piece that we saw was missing from Spark and a lot of other similar environments: an optimizer for machine learning algorithms. So in Spark, you have the Catalyst optimizer for data analysis, DataFrames, SQL, you write your queries in terms of those high level APIs and Catalyst figures out how to make them go fast.
In System ML, we have an optimizer for high level languages like R and Python where you can write algorithms in terms of linear algebra, in terms of high level operations on matrices and vectors, and have the optimizer take care of making those algorithms run in parallel, run at scale, taking account of the data characteristics. Does the data fit in memory? If so, keep it in memory. Does the data not fit in memory? Stream it from disk. >> Okay, so there was a ton of stuff in there. >> Fred: Yep. >> And if I were to refer to that as so densely packed as to be a black hole, that might come across wrong, so I won't refer to that as a black hole. But let's unpack that, so the, and I meant that in a good way, like high bandwidth, you know. >> Fred: Thanks, George. >> Um, so the traditional Spark, the machine learning that comes with Spark's MLlib, one of its distinguishing characteristics is that the models, the algorithms that are in there, have been built to run on a cluster. >> Fred: That's right. >> And very few have, very few others have built machine learning algorithms to run on a cluster, but as you were saying, you don't really have an optimizer for finding something where a couple of the algorithms would be fit optimally to solve a problem. Help us understand, then, how System ML solves a more general problem for, say, ensemble models and for scale out, I guess I'm, help us understand how System ML fits relative to Spark's MLlib and the more general problems it can solve. >> So, MLlib and a lot of other packages, such as Sparkling Water from H2O, for example, provide you with a toolbox of algorithms, and each of those algorithms has been hand tuned for a particular range of problem sizes and problem characteristics. This works great as long as the particular problem you're facing as a data scientist is a good match to that implementation that you have in your toolbox.
What System ML provides is less like having a toolbox and more like having a machine shop. You have a lot more flexibility, you have a lot more power: you can write down an algorithm as you would write it down if you were implementing it just to run on your laptop, and then let the System ML optimizer take care of producing a parallel version of that algorithm that is customized to the characteristics of your cluster, customized to the characteristics of your data. >> So let me stop you right there, because I want to use an analogy that others might find easy to relate to, for all the people who understand SQL and scale-out SQL. So, the way you were describing it, it sounds like oh, if I were a SQL developer and I wanted to get at some data on my laptop, I would find it pretty easy to write the SQL to do that. Now, let's say I had a bunch of servers, each with its own database, and I wanted to get data from each database. If I didn't have a scale-out database, I would have to figure out physically how to go to each server in the cluster to get it. What I'm hearing for System ML is it will take that query that I might have written on my one server and it will transparently figure out how to scale that out, although in this case not queries, machine learning algorithms. >> The database analogy is very apt. Just like SQL and query optimization: by allowing you to separate that logical description of what you're looking for from the physical description of how to get at it, it lets you have a parallel database with the exact same language as a single machine database. In System ML, because we have an optimizer that separates that logical description of the machine learning algorithm from the physical implementation, we can target a lot of parallel systems, we can also target a large server, and the code, the code that implements the algorithm, stays the same. >> Okay, now let's take that a step further.
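The separation Fred describes, one logical algorithm mapping to different physical plans, can be pictured with a toy cost rule in plain Python. This is purely an illustration, not SystemML code; the function name, the byte-counting rule, and the plan labels are all invented for the example.

```python
def choose_plan(rows, cols, mem_budget_bytes, cell_bytes=8):
    """Toy stand-in for an optimizer's physical-plan decision:
    estimate the operand's memory footprint and pick a strategy."""
    footprint = rows * cols * cell_bytes
    if footprint <= mem_budget_bytes:
        return "in-memory, single-node"
    return "streamed / distributed"

# A 1,000 x 100 matrix fits in a 1 GB budget; a 10M x 10K one does not.
print(choose_plan(1_000, 100, mem_budget_bytes=10**9))
print(choose_plan(10**7, 10**4, mem_budget_bytes=10**9))
```

A real optimizer weighs many more factors (cluster size, data sparsity, operator fusion), but the shape of the decision is the same: the program the data scientist writes never changes, only the plan chosen for it.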
You refer to matrix math and I think linear algebra and a whole lot of other things that I never quite made it to since I was a humanities major, but when we're talking about those things, my understanding is that those are primitives that Spark doesn't really implement, so that if you wanted to do neural nets, which rely on some of those constructs for high performance, >> Fred: Yes. >> Then, um, that's not built into Spark. Can you get to that capability using System ML? >> Yes. System ML, at its core, provides you with a library, provides you as a user with a library of machine, rather, linear algebra primitives, just like a language like R or a library like NumPy gives you matrices and vectors and all of the operations you can do on top of those primitives. And just to be clear, linear algebra really is the language of machine learning. If you pick up a paper about an advanced machine learning algorithm, chances are the specification for what that algorithm does and how that algorithm works is going to be written in the paper literally in linear algebra, and the implementation that was used in that paper is probably written in a language where linear algebra is built in, like R, like NumPy. >> So it sounds to me like Spark has done the work of sort of the blocking and tackling of machine learning to run in parallel. And that's, I mean, to be clear, since we haven't really talked about it, that's important when you're handling data at scale and you want to train, you know, models on very, very large data sets. But it sounds like when we want to go to some of the more advanced machine learning capabilities, the ones that today are making all the noise with, you know, speech to text, text to speech, natural language, understanding those neural network based capabilities are not built into the core Spark MLlib, that, would it be fair to say you could start getting at them through System ML?
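To make "linear algebra is the language of machine learning" concrete, here is ordinary least squares written exactly as the math reads, w = (X^T X)^(-1) X^T y, in dependency-free Python. This is a toy sketch for illustration only, not SystemML code; in practice you would hand the same formula to NumPy, R, or a SystemML script and let the system execute it.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Fit y = w0 + w1*x through points that lie exactly on y = 2x + 1.
X = [[1.0, x] for x in [0.0, 1.0, 2.0, 3.0]]  # column of ones + feature
y = [[1.0], [3.0], [5.0], [7.0]]

Xt = transpose(X)
w = matmul(inv2(matmul(Xt, X)), matmul(Xt, y))  # w = (X^T X)^-1 X^T y
print(w)  # approximately [[1.0], [2.0]]: intercept 1, slope 2
```

The point is the one Fred makes: the specification in the paper and the code that runs are the same linear algebra, and a system with those primitives built in can take that expression and decide how to execute it.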
>> Yes, System ML is a much better way to do scalable linear algebra on top of Spark than the very limited linear algebra that's built into Spark. >> So alright, let's take the next step. Can System ML be grafted onto Spark in some way, or would it have to be an entirely new API that doesn't integrate with all the other Spark APIs? In a way, that has differentiated Spark, where each API is sort of accessible from every other. Can you tie System ML in, or do the Spark guys have to build more primitives into their own sort of engine first? >> A lot of the work that we've done with the Spark Technology Center as part of bringing System ML into the Apache ecosystem has been to build a nice, tight integration with Apache Spark, so you can pass Spark data frames directly into System ML and you can get data frames back. Your System ML algorithm, once you've written it in terms of one of System ML's main scripting languages, just plugs into Spark like all the algorithms that are built into Spark. >> Okay, so that's, that would keep Spark competitive with more advanced machine learning frameworks for a longer period of time, in other words, it wouldn't hit the wall the way it would if it encountered TensorFlow from Google for Google's way of doing deep learning, Spark wouldn't hit the wall once it needed, like, a TensorFlow as long as it had System ML so deeply integrated the way you're doing it. >> Right, with a system like System ML, you can quickly move into new domains of machine learning.
So for example, this afternoon I'm going to give a talk with one of our machine learning developers, Mike Dusenberry, about our recent efforts to implement deep learning in System ML, like full scale, convolutional neural nets running on a cluster in parallel, processing many gigabytes of images, and we implemented that with very little effort because we have this optimizer underneath that takes care of a lot of the details of how you get that data into the processing, how you get the data spread across the cluster, how you get the processing moved to the data or vice versa. All those decisions are taken care of in the optimizer, you just write down the linear algebra parts and let the system take care of it. That let us implement deep learning much more quickly than we would have if we had done it from scratch. >> So it's just this ongoing cadence of basically removing the infrastructure management from the data scientists and enabling them to concentrate really where their value is, on the algorithms themselves, so they don't have to worry about how many clusters it's running on, and that configuration, kind of typical dev ops that we see on the regular development side, but now you're really bringing that into the machine learning space. >> That's right, Jeff. Personally, I find all the minutiae of making a parallel algorithm work really fascinating, but a lot of people working in data science really see parallelism as a tool. They want to solve the data science problem, and System ML lets you focus on solving the data science problem because the system takes care of the parallelism. >> You guys could go on in the weeds for probably three hours, but we don't have enough coffee and we're going to set up a follow up time because you're both in San Francisco.
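"You just write down the linear algebra parts and let the system take care of it" can be pictured as one logical operation with two physical executions. A toy, dependency-free sketch, not actual Spark or SystemML code (the partitioning scheme here is invented for the example): the same matrix-vector product computed on a single node and over row partitions, the way a cluster backend might split the work across workers.

```python
def matvec_local(A, x):
    """Single-node matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matvec_partitioned(A, x, num_parts=2):
    """Same logical operation, executed over row partitions
    (a stand-in for how a cluster backend could split the work)."""
    step = (len(A) + num_parts - 1) // num_parts
    parts = [A[i:i + step] for i in range(0, len(A), step)]
    result = []
    for part in parts:  # each partition could run on a separate worker
        result.extend(matvec_local(part, x))
    return result

A = [[1, 2], [3, 4], [5, 6]]
x = [10, 1]
# Both physical plans produce the same logical answer.
assert matvec_local(A, x) == matvec_partitioned(A, x) == [12, 34, 56]
```

The data scientist's code corresponds to the one-line logical operation; the partitioning, shipping, and concatenating is what the optimizer and runtime hide.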
But before we let you go, Fred, as you look forward into 2017, kind of the advances that you guys have done there at the IBM Spark Center in the city, what's kind of the next couple great hurdles that you're looking to cross, new challenges that are getting you up every morning, that you're excited to come back a year from now and be able to say wow, these are the one or two things that we were able to take down in 2017? >> We're moving forward on several different fronts this year. On one front, we're helping to get the notebook experience with Spark notebooks consistent across the entire IBM product portfolio. We helped a lot with the rollout of notebooks on Data Science Experience on z, for example, and we're working actively with the Data Science Experience and with the Watson Data Platform. On the other hand, we're contributing to Spark 2.2. There are some exciting features, particularly in SQL, that we're hoping to get into that release, as well as some new improvements to MLlib. We're moving forward with Apache System ML, we just cut version 0.13 of that. We're talking right now on the mailing list about getting System ML out of incubation, making it a full, top level project. And we're also continuing to help with the adoption of Apache Spark technology in the enterprise. Our latest focus has been on deep learning on Spark. >> Well, I think we found him! Smartest guy in the room. (laughter) Thanks for stopping by and good luck on your talk this afternoon. >> Thank you, Jeff. >> Absolutely. Alright, he's Fred Reiss, he's George Gilbert, and I'm Jeff Frick, you're watching theCUBE from Big Data SV, part of Big Data Week in San Jose, California. (upbeat music) (mellow music) >> Hi, I'm John Furrier, the cofounder of SiliconANGLE Media, cohost of the Cube. I've been in the tech business since I was 19, first programming on mini computers.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
George Gilbert | PERSON | 0.99+ |
Jeff Rick | PERSON | 0.99+ |
George | PERSON | 0.99+ |
Jeff | PERSON | 0.99+ |
Fred Rice | PERSON | 0.99+ |
Mike Dusenberry | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
2017 | DATE | 0.99+ |
San Francisco | LOCATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
505 Howard Street | LOCATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Frederick Reiss | PERSON | 0.99+ |
Spark Technology Center | ORGANIZATION | 0.99+ |
Fred | PERSON | 0.99+ |
IBM Spark Technology Center | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
San Jose, California | LOCATION | 0.99+ |
Spark 2.2 | TITLE | 0.99+ |
three hours | QUANTITY | 0.99+ |
Watson | ORGANIZATION | 0.99+ |
UC Berkeley | ORGANIZATION | 0.99+ |
one server | QUANTITY | 0.99+ |
Spark | TITLE | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
Python | TITLE | 0.99+ |
each server | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
each | QUANTITY | 0.99+ |
each database | QUANTITY | 0.98+ |
Big Data Week | EVENT | 0.98+ |
Pagoda Lounge | LOCATION | 0.98+ |
Strata Hadoob Conference | EVENT | 0.98+ |
System ML | TITLE | 0.98+ |
Big Data SV | EVENT | 0.97+ |
each API | QUANTITY | 0.97+ |
ML Live | TITLE | 0.96+ |
today | DATE | 0.96+ |
Thomas Vehicles | ORGANIZATION | 0.96+ |
Apache System ML | TITLE | 0.95+ |
Big Data | EVENT | 0.95+ |
Apache Spark | TITLE | 0.94+ |
Watson for Marketing | ORGANIZATION | 0.94+ |
Sparking Water | TITLE | 0.94+ |
first | QUANTITY | 0.94+ |
one front | QUANTITY | 0.94+ |
Big Data SV 2016 | EVENT | 0.94+ |
IBM Spark Technology Center | ORGANIZATION | 0.94+ |
about 25 designers | QUANTITY | 0.93+ |