
Search Results for Jay Kreps:

Clarke Patterson, Confluent - #SparkSummit - #theCUBE


 

>> Announcer: Live from San Francisco, it's theCUBE, covering Spark Summit 2017, brought to you by Databricks. (techno music)
>> Welcome to theCUBE, at Spark Summit here in San Francisco, at the Moscone Center West, and we're going to be competing with all the excitement happening behind us. They're going to be going off with raffles, and I don't know what all. But we'll just have to talk above them, right?
>> Clarke: Well, at least we didn't get to win.
>> Our next guest here on the show is Clarke Patterson from Confluent. You're the Senior Director of Product Marketing, is that correct?
>> Yeah, you got it.
>> All right, well it's exciting --
>> Clarke: Pleasure to be here.
>> To have you on the show.
>> Clarke: It's my first time here.
>> David: First time on theCUBE?
>> I feel like one of those radio people, first-time caller, here I am. Yup, first time on theCUBE.
>> Well, long-time listener too, I hope.
>> Clarke: Yes, I am.
>> And so, have you announced anything new that you want to talk about from Confluent?
>> Yeah, not particularly at this show per se, but most recently we've done a lot of work to enable customers to adopt Confluent in the cloud. So we came up with a Confluent Cloud offering, which is a managed service of our Confluent platform, a couple weeks ago at our event around Kafka. So we're really excited about that. It really fits that need where cloud-first or operations-starved organizations really want to do things with streaming platforms based on Kafka, but they just don't have the means to make it happen. And so we're now standing this up as a managed service that lets them get their hands on this great set of capabilities, with us as the backstop, to do things with it.
>> And you said Kafka is not just a publish-and-subscribe engine, right?
>> Yeah, I'm glad that you asked that. That's one of the big misconceptions, I think, of Kafka. It's made its way into a lot of organizations from the early use case of publish and subscribe for data. But over the last 12 to 18 months in particular, there have been a lot of interesting advancements. Two things in particular: one is the ability to connect, which is called the Connect API in Kafka. It essentially simplifies how you integrate large numbers of producers and consumers of data as information flows through. So, a modernization of ETL, if you will. The second thing is stream processing. There's a Kafka Streams API that's built in now as well that allows you to do lightweight transformations of data as it flows from point A to point B, and you can publish out new topics if you need to manipulate things. And it expands the overall capabilities of what Kafka can do.
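To make the Streams API Patterson describes concrete, here is a minimal sketch of an in-flight transformation with the Kafka Streams library. It is not from the interview: the topic names, the application id, and the uppercasing step are hypothetical stand-ins, and it assumes a broker running on localhost.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class LightweightTransform {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "transform-example");  // hypothetical app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumes a local broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> source = builder.stream("raw-events");        // hypothetical input topic
            source
                .filter((key, value) -> value != null && !value.isEmpty())        // drop empty records in flight
                .mapValues(value -> value.toUpperCase())                          // stand-in for a real transformation
                .to("clean-events");                                              // publish a new topic downstream

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

Because Streams is just a library, this runs as an ordinary Java process next to the application; the Connect API would handle getting data into "raw-events" from outside systems and shipping "clean-events" onward.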
>> Okay, and I'm going to ask George here to dive in, if you could.
>> And I was just going to ask you.
>> David: I can feel it. (laughing)
>> So, this is interesting. But I want to frame this in terms of what people understand from, I don't want to say prehistoric eras, but earlier approaches to similar problems. So let's say, in days gone by, you had an ETL solution.
>> Clarke: Yup.
>> So now, let's put Connect together with stream processing. How does that change the whole architecture of integrating your systems?
>> Yeah, I think the easiest way to think about this is to think about some of the different market segments that have existed over the last 10 to 20 years. Data integration was all about how do I get a lot of different systems to integrate a bunch of data, transform it in some manner, and ship it off to some other place in my business. It was really good at building these end-to-end workflows, moving big quantities of data, but it was generally kind of batch-oriented, and so we've been fixated on how to make that process faster. The other segment is application integration, which said, hey, when I want applications to talk to one another, it doesn't have the scale of information exchange, but it needs to happen a whole lot faster. So real-time integration systems, ESBs, and things like that came along, and they were able to serve that particular need. But as we move forward into this world that we're in now, where there's just all sorts of information and companies want to become event-centric, you need to be able to get the best of both of those worlds. And this is really where Kafka is starting to sit. It's saying, hey, let's take massive amounts of data producers that need to connect to massive amounts of data consumers, be able to ship a super-granular level of information, transform it as you need, and do that in real time so that everything can get served out very, very fast.
>> That's a wonderful, kind of pithy way to distill it. But now that we have this new way of thinking about app integration and data integration, the best of both worlds, that has second-order consequences in terms of how we build applications and connect them. So what does that look like? What did applications look like in the old world, and what enables them now to be re-factored? Or for new apps, how do you build them differently?
>> Yeah, we see a lot of people going to a microservices-oriented architecture. So, moving away from one big monolithic app that takes an inordinate amount of effort to change in some capacity, and quite frankly changes very, very slowly. They look to microservices to split that up into very small, functional components that they can integrate a whole lot faster, decouple engineering teams so they're not dependent on one another, and just make things happen a whole lot quicker than they could before. But obviously when you do that, you need something that can connect all those pieces, and Kafka's a great thing to sit in there as a way to exchange state across all of them. So that's a massive use case for us, and for Kafka specifically, in terms of what we're seeing people do.
>> You said something in there at the end that I want to key off, which is, "to exchange state." In the old world, we used a massive shared database to share state for a monolithic app, or sometimes between monolithic apps. So what's the state-of-the-art way that's done now with microservices? If there's more than one, how does that work?
>> Yeah, this is kind of rooted in the way we do stream processing. There's this concept of topics, which effectively can align to individual microservices. And you're able to make sure that the most recent state of any particular one is stored in the central repository of Kafka. But then, given that we take an API approach to stream processing, it's easy to embed those types of capabilities in any of the endpoints. So some of the activity can happen on that particular front, and then it all gets synchronized down into the centralized hub.
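A hedged sketch of the pattern Patterson outlines: one microservice publishes its state changes to a topic, and any other service can materialize the latest state per key by reading that topic as a table. Everything here (topic names, application id, broker address) is a hypothetical stand-in, not something named in the interview.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KTable;

    public class SharedStateView {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-state-view");   // hypothetical
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumes a local broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // "order-status" stands in for a (typically log-compacted) topic that one
            // service writes its state changes to. Reading it as a KTable keeps the
            // most recent value per key, synchronized through Kafka rather than
            // through a shared database.
            KTable<String, String> orderState = builder.table("order-status");

            // Publish every observed change back out so yet another service can react.
            orderState.toStream().to("order-status-changes");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

With log compaction turned on for the topic, Kafka itself retains the latest value per key, so a restarted service can rebuild its view of the world just by re-reading the topic.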
>> Okay, let me unpack that a little bit. Because you take an API approach, that means that if you're manipulating a topic, the processing happens in a microservice, and that microservice has state in it? Is that the right way to think about it?
>> I think that's the easiest way to think about it, yeah.
>> Okay. So where are we? Is this a 10-year migration? Or will a certain class of apps lend themselves well to microservices, legacy apps stay monolithic, and some new Greenfield apps still be database-centric? How should customers think about that mix?
>> Yeah, that's a great question. I don't know that I have the answer to it. The best gauge I have is just the amount of interest and the conversations that we have on this particular topic. I will say that of the topics we engage on, it's easily one of the most popular. So if that's a data point, there are definitely a lot of interested people trying to figure out how to do this stuff very, very fast.
>> How to do the microservices?
>> Yeah. And I think if you look at some of the more notable tech companies of late, they're architected this way from the start. So everyone's kind of looking at the Netflixes of the world and the Ubers of the world saying, I want to be like those guys, how do I do that? And it's driving them down this path. So competitive pressure, I think, will help force people's hands. The more your competitors get in front of you and deliver a better customer experience through some sort of mobile app or something like that, the more it's going to force people to make these changes quicker. But how long that takes, it'll be interesting to see.
>> Great! Great stuff. Let's switch gears just a little bit. Talk about maybe why you're working with Databricks and some of the key value you've gotten out of that.
>> Yeah, so I wouldn't say that we're using Databricks per se, but we integrate directly with Spark. If you look at a lot of the use cases that people use Spark for, they obviously need to get data to where it is. And per the principles I described before, Kafka generally is a very flexible, very dynamic mechanism for taking lots of sources of information, culling all that down into one centralized place, and then distributing it to places such as Spark. So we see a lot of people using the technologies together: get the data from point A to point B, do some transformation as needed, and then obviously apply some amazing computing horsepower and whatnot in Spark itself.
>> David: All right.
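For reference, this is roughly what the Kafka-to-Spark handoff looks like from the Spark side, offered as a hedged sketch rather than anything shown at the event. It assumes Spark Structured Streaming with the Kafka source package on the classpath, a local broker, and the hypothetical "clean-events" topic from the earlier sketch.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class KafkaToSpark {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("kafka-to-spark")       // hypothetical app name
                    .getOrCreate();

            // Kafka feeds Spark: subscribe to the topic and read it as a stream.
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "localhost:9092")  // assumes a local broker
                    .option("subscribe", "clean-events")                  // hypothetical topic
                    .load()
                    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");

            // Do the heavy computation in Spark, then write results wherever needed;
            // console output here just keeps the sketch self-contained.
            events.writeStream()
                    .format("console")
                    .start()
                    .awaitTermination();
        }
    }

Kafka does the collection and distribution; Spark does the heavy computation on whatever it is handed.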
>> I'm processing this, and it's tough because you can go in so many different directions, especially with the question about Spark. I guess, give us some of the scenarios where Spark would fit. Would it be microservices that require more advanced analytics, which then feed other topics or feed consumers? And then where might you stick with a shared database that a couple of services communicate with, rather than maintaining the state within the microservice?
>> Let me see if I can kind of unpack that myself a little bit.
>> George: I know, it was packed pretty hard. (laughing)
>> Got a lot packed in there. I guess, think about it like an overall business process. If you think about something like an order-to-cash business process these days, it has a whole bunch of different systems that hang off it. It's got your order processing. You've got inventory management. Maybe you've got some real-time pricing. You've got shipments. Things like that all just kind of hang off of the flow of data across there. Now, the system you use for addressing each of those problems could be vastly different. It could be Spark. It could be a relational database. It could be a whole bunch of different things. Where the centralization of data comes in, for us, is to make sure all those components can communicate with each other based on the last thing that happened within each of them individually. They have the ability to embed data transformations and data processing in themselves, and then publish any change they had back out into the shared cluster, which subsequently makes that state available to everybody else so that, if necessary, they can react to it. So in a lot of ways we're agnostic to the type of processing that happens on the endpoints. It's more about the free movement of all the data to all those things. And if they have any relevant updates that need to make it back to any of the other components hanging on that process flow, they should have the ability to publish that back down it.
>> And so one thing that Jay Kreps, Founder and CEO, talks about is that Kafka may ultimately, or in his language will ultimately, grow into something that rivals the relational database. Tell us what that world would look like.
>> It would be controversial. (laughing)
>> George: That's okay.
>> You want me to be the bad guy? So it's interesting, because we did Kafka Summit about a month ago, and there are a lot of people, a lot of companies I should say, that are actually using and calling Kafka an enterprise data hub, a central hub for data, a data distribution network. And they are literally storing all sorts (raffle announcements begin on loudspeaker) of different kinds of data. One interesting example was the New York Times. They use Kafka and literally store every piece of content that has ever been generated at that publisher, since the beginning of time, in Kafka. So all the way back to 1851, they've obviously digitized everything, it all sits in there, and then they disposition it back out to various parts of the business.
>> They replay it, they pull it. They replay and pull, wow, okay.
>> So that has some very interesting implications. You can replay data. If you ran some analytics on something and didn't get the result you wanted and want to redo it, that's really easy and really fast to do. If you want to bring on a new system with some new functionality, you can do it really quickly, because you have the full pedigree of everything sitting in there. And then imagine a world where you could actually start to ask questions of it directly. That's where it starts to get very, very profound, and it will be interesting to see where that goes.
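What makes the New York Times pattern possible is that retention in Kafka is just a per-topic setting. Below is a hedged sketch of creating a keep-everything topic with the Kafka admin client; the topic name, partition count, and replication factor are hypothetical.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateArchiveTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumes a local broker

            Map<String, String> configs = new HashMap<>();
            configs.put("retention.ms", "-1");     // never expire records: the full history stays replayable
            configs.put("retention.bytes", "-1");  // no size cap either

            NewTopic archive = new NewTopic("published-content", 12, (short) 3)     // hypothetical name and sizing
                    .configs(configs);

            try (AdminClient admin = AdminClient.create(props)) {
                admin.createTopics(Collections.singleton(archive)).all().get();
            }
        }
    }

On the consuming side, "replay" is just rewinding: a KafkaConsumer can call seekToBeginning and re-read the entire history into a new system or a re-run analysis.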
>> Two things, then. First, a database takes updates, so you don't have a perfect historical record; you have a snapshot of current values. Whereas in a log like Kafka, a log-structured data structure, you have every event that ever happened.
>> Clarke: Correct.
>> Now, what's the impact on performance if you want to pull, you know --
>> Clarke: That much data?
>> Yeah.
>> Yeah, it all comes down to managing the environment you run it on. Obviously, the more data you're going to store in here, and the more types of things you're going to try to connect to it, the more you're going to have to take that into account.
>> And you mentioned just a moment ago directly asking about the data contained in the hub, in the data hub.
>> Clarke: Correct.
>> How would that work?
>> Not quite sure today, to be honest with you. And I think this is where that question is a pretty provocative one. What does it mean to have this entire view of all granular event streams, not in some aggregated form over time? I think the key will be some mechanism coming onto an environment like this to make it more consumable for more business-type users. And that's probably one of the areas we'll want to watch to see how that's (background noise drowns out speaker).
>> Okay, only one unanswered question. But you answered all the other ones really well. So we're going to wrap it up here. We're up against a loud break right now. I want to thank Clarke Patterson from Confluent for joining us. Thank you so much for being on the show.
>> Clarke: Thank you for having me.
>> Appreciate it so much. And thank you for watching theCUBE. We'll be back after the raffle in just a few minutes. We have one more guest. Stay with us, thank you. (techno music)

Published Date : Jun 8 2017


Darren Chinen, Malwarebytes - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Announcer: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017.
>> Hey, welcome back everybody. Jeff Frick here with theCUBE. We are at Big Data SV in San Jose at the Historic Pagoda Lounge, part of Big Data week, which is associated with Strata + Hadoop. We've been coming here for eight years and we're excited to be back. The innovation and dynamism of big data, and its evolutions now with machine learning and artificial intelligence, just continue to roll, and we're really excited to be here talking about one of the nasty aspects of this world, unfortunately: malware. So we're excited to have Darren Chinen. He's the senior director of data science and engineering at Malwarebytes. Darren, welcome.
>> Darren: Thank you.
>> So for folks who aren't familiar with the company, give us just a little bit of background on Malwarebytes.
>> Malwarebytes is basically a next-generation anti-virus software. We started off from humble roots, with our founder at 14 years old getting infected with a piece of malware. He reached out into the community and, at 14 years old, wrote his first lines of code, with the help of some people, to remediate a couple of pieces of malware. It grew from there, and I think by the ripe old age of 18 he founded the company. He's now, I want to say, 26 or 27, and we're doing quite well.
>> It was interesting, before we went live you were talking about his philosophy, how important that is to the company, and how it has now turned into a real strategic asset: that no one should have to suffer from malware. And he decided to offer a solution for free to help people rid themselves of this bad software.
>> Darren: That's right. Malwarebytes was founded under the principle, which Marcin believes, that everyone has the right to a malware-free existence, and so we've always offered a free version of Malwarebytes that will help you remediate if your machine does get infected with a piece of malware. And that's still going to this day.
>> And that's now given you the ability to have a significant amount of endpoint data, transactional data, trend data, that you can bake back into the solution.
>> Darren: That's right. It's turned into a strategic advantage for the company. It's not something I think we could have planned at 18 years old when he was doing this. But we've instrumented it so that we can get some anonymous-level telemetry, and we can understand how malware proliferates. For many years we were positioned as a second-opinion scanner, and so we're able to see a lot of things, some trends happening in there, and we can actually now see that in real time.
>> So, starting out as a second-opinion scanner, you're basically finding what others have missed. What do you have to do to become the first line of defense?
>> Well, with our new product Malwarebytes 3.0, I think some of that landscape is changing. We have a very complete and layered offering. I'm not the product manager, so as the data science guy I don't know that I'm qualified to give you the ins and outs, but I think some of that is changing as we've combined a lot of products and we have a much more complete suite of layered protection built into the product.
>> And so, maybe tell us, without giving away all the secret sauce, what sort of platform technologies you used that enabled you to scale to these hundreds of millions of endpoints, and to be fast enough at identifying trending threats that you had to prioritize.
>> Right. So traditionally, I think, AV companies have these honeypots, where they go and collect a piece of a virus or a piece of malware, take the MD5 hash of it, and then basically insert that into a definitions database. And that's a very exact way to do it. The problem is that there's so much malware, so many viruses out there in the wild, that it's impossible to get all of them. One of the things we did was set up telemetry, and we have a phenomenal research team that is able to catch entire families of malware, and that's really the secret sauce to Malwarebytes. There are several other levels, but that's where we're helping out in the immediate term. Internally, we sort of jokingly call our architecture Lambda Two. We had considered Lambda long ago, and I say "long ago" meaning about a year ago, when we first started this journey. But Lambda is riddled with, as you know, a number of issues. If you've ever talked to Jay Kreps from Confluent, he has a lot of opinions on that, right? One of the key problems is that in a traditional Lambda you have to implement your code in two places, it's very difficult, things get out of sync, and you have to have replay frameworks. These are some of the challenges with Lambda. So we do processing in a number of areas. The first thing we did was implement Kafka to handle all of the streaming data. We use Kafka Streams to do inline stateless transformations, and we also use Kafka Connect. We write all of our data into HBase, which we may swap out later for something like Redis, and that serves as a thin speed layer. And then we also move the data into S3, and we use some ephemeral clusters to do very large-scale batch processing, and that really provides our data lake.
>> When you call that Lambda Two, is that because you're still working essentially on two different infrastructures, so your code isn't quite the same? You still have to check the results on either fork.
>> That's right. We did evaluate doing everything in the stream, but there are certain operations that are difficult to do with purely stream processing, and so we did need a thin layer of what we call real-time indicators, a speed layer, to supplement what we were doing in the stream. That's the differentiating factor from a traditional Lambda architecture, where you'd have everything in the stream and everything in batch, and the batch is really more of a truing mechanism. Our real time is really directional. If you look at traditional business intelligence, you'd have KPIs that allow you to gauge the health of your business. We have RTIs, real-time indicators, that allow us to gauge, directionally, what is important to look at this day, this hour, this minute.
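A rough sketch of what such a thin speed layer can look like: a consumer that keeps only the latest value per key in HBase, so dashboards can read current indicators cheaply. This is an assumption-laden illustration, not Malwarebytes' actual code; the topic, table, and column names are hypothetical.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SpeedLayerWriter {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // assumes a local broker
            props.put("group.id", "rti-speed-layer");          // hypothetical consumer group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            Configuration conf = HBaseConfiguration.create();
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 Connection hbase = ConnectionFactory.createConnection(conf);
                 Table table = hbase.getTable(TableName.valueOf("rti"))) {  // hypothetical table

                consumer.subscribe(Collections.singletonList("detections"));  // hypothetical topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> r : records) {
                        if (r.key() == null) continue;  // skip unkeyed records in this sketch
                        // Overwrite the row for this key, so reads always see the latest value.
                        Put put = new Put(Bytes.toBytes(r.key()));
                        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes(r.value()));
                        table.put(put);
                    }
                }
            }
        }
    }

Because each write overwrites the row for its key, the table always holds the current directional picture, while the full history continues on to S3 for batch processing.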
>> This thing is burning up the charts,
>> Exactly.
>> Therefore it's priority one.
>> That's right, you got it.
>> Okay. And maybe tell us a little more, because everyone I'm sure is familiar with Kafka, but the Streams product from them is a little newer, as is Kafka Connect. So it sounds like it's not just the transport: you've got some basic analytics, and you've got the ability to do the ETL because you've got Connect, which handles sources and sinks. Tell us how you've used that.
>> Well, the Streams product is quite different from something like Spark Streaming. It's not working off micro-batching; it's actually working off the stream. And the second thing is, it's not a separate cluster. It's just a library, effectively a .jar file, right? And because it works natively with Kafka, it handles certain things quite well: it handles back pressure, and when you expand the cluster it's pretty good with things like that. We've found it to be a fairly stable technology. It's just a library, and we've worked very closely with Confluent to develop that. Whereas Kafka Connect is really something that we use to write out to S3. In fact, Confluent just released a new S3 connector that writes direct. We had been using StreamX, which was a wrapper on top of an HDFS connector, and they rigged that up to write to S3 for us.
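Since the interview mentions the Confluent S3 connector without showing any configuration, here is a hedged sketch of registering such a sink through Kafka Connect's REST API, which workers expose on port 8083 by default. The connector class and config keys follow Confluent's S3 sink connector as documented; the bucket, region, topic, and flush size are hypothetical values.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class RegisterS3Sink {
        public static void main(String[] args) throws Exception {
            // Connector class and config keys per the Confluent S3 sink docs;
            // the bucket, topic, and sizing values are hypothetical.
            String body = "{"
                    + "\"name\": \"s3-sink\","
                    + "\"config\": {"
                    + "  \"connector.class\": \"io.confluent.connect.s3.S3SinkConnector\","
                    + "  \"tasks.max\": \"1\","
                    + "  \"topics\": \"detections\","
                    + "  \"s3.bucket.name\": \"telemetry-archive\","
                    + "  \"s3.region\": \"us-west-2\","
                    + "  \"storage.class\": \"io.confluent.connect.s3.storage.S3Storage\","
                    + "  \"format.class\": \"io.confluent.connect.s3.format.json.JsonFormat\","
                    + "  \"flush.size\": \"10000\""
                    + "}}";

            // Kafka Connect exposes a REST API for managing connectors.
            HttpURLConnection conn =
                    (HttpURLConnection) new URL("http://localhost:8083/connectors").openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("Connect responded: " + conn.getResponseCode());
        }
    }

Once registered, Connect does the plumbing on its own: it consumes "detections" continuously and writes batched objects to the bucket, which is where the large-scale batch processing picks up.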
>> So tell us, as you look out, what sorts of technologies do you see enabling you to build a richer platform, and how would that show up in functionality consumers like us would see?
>> Darren: With respect to the architecture?
>> Yeah.
>> Well, one of the things we had to do was evaluate where we wanted to spend our time. We're a very small team; the entire data science and engineering team is, I think, less than 10 months old. All of us got hired, we started this platform, and we've gone very, very fast. And we had to decide: we've made this big investment, so how are we going to get value to our end customer quickly, so that they're not waiting around and you get the traditional big-data story of "we've spent all this money and now we're not getting anything out of it"? So we had to make some strategic decisions, and because the data was truly big data in nature, there's just a huge amount of work that has to be done in these open-source technologies. They're not baked. It's not like going out to Oracle, giving them a purchase order, installing it, and away you go. There's a tremendous amount of work, and so we made some strategic decisions about what we'd do with open source and what we'd do with a third-party vendor solution. One of those decisions was workload automation. I just did a talk on how Control-M from BMC was really the tool we chose to handle a lot of the sophisticated coordination and workload automation on the batch side, and we're about to implement it in a data-quality monitoring framework as well. It's turned out to be an incredibly stable solution for us. It's allowed us not to spend time on open-source solutions that do the same things, like Airflow, which may or may not work well but has really no support around it, and to focus our efforts on what we believe are the really, really hard problems to tackle in Kafka, Kafka Streams, Connect, et cetera.
>> Is it fair to say that Kafka plus Kafka Connect solves many of the old ETL problems, or do you still need some sort of orchestration tool on top of it to completely commoditize moving and transforming data from an OLTP or operational system to a decision-support system?
>> I guess the answer to that is: it depends on your use case. There are a lot of things that Kafka and the Streams job can solve for you, but I don't think we're at the point where everything can be streaming. I think that's a ways off. There are legacy systems that don't natively stream to you anyway, and there are certain operations that are just more efficient to do in batch. That's why batch, for us, isn't going away any time soon, and it's one of the reasons workload automation in the batch layer was so important initially. We've decided to extend that, actually, into building out a data-quality monitoring framework to put a collar around how accurate our data is on the real-time side.
>> 'Cause it's really horses for courses; it's not one or the other, it's application-specific what the best solution is for that particular case.
>> Yeah, if there were a one-size-fits-all, it'd be a company, and there'd be no need for architects. So I think you have to look at your use case, your company, what kind of data, what style of data, what type of analysis you need. Do you actually need the data in real time, and if you do put in all the work to get it in real time, are you going to be able to take action on it? I think Malwarebytes was a great candidate. When it came in, I said, well, it does look like we can justify the need for real-time data and the effort that goes into building out a real-time framework.
>> Jeff: Right, right. And we always say, what is real time? In time to do something about it, (all chuckle) and if there's not time to do something about it, depending on how you define real time, what difference does it really make if you can't act on it that fast? So as you look out into the future with IoT and all these connected devices, this is a hugely increased attack surface, as we just read about a few weeks back. How does that work into your planning? What do you guys think about a future where there are so many more connected devices out on the edge, with various degrees of intelligence and opportunities to hijack, if you will?
>> Yeah, I don't think I'm qualified to speak about the Malwarebytes product roadmap as far as IoT goes.
>> But more philosophically, from a professional point of view, because every coin has two sides: there's a lot of good stuff coming from IoT and connected devices, but, as we keep hearing over and over, just this massive expansion of the attack surface.
>> Well, I think, for us, the key is we're small, and we're not operating... like, I came from Apple, where we operated on a budget of infinity, so we're not --
>> You had to build for infinity, or address infinity, (Darren laughs) with an actual budget.
>> We're small, and we have to make sure that whatever we do creates value. So what I'm seeing in the future is, as we get more into the IoT space, and logs begin to proliferate, and data just grows exponentially in size, it's really: how do we do the same thing, and how are we going to manage it in terms of cost? Generally, big data is very low in information density. It's not like transactional systems, where the data you get is effectively an Excel spreadsheet and you can run some pivot tables and filters and away you go. Big data in general requires a tremendous amount of massaging to get to the point where a data scientist or an analyst can actually extract some insight and some value. And the question is, how do you massage that data in a way that's cost-effective as IoT expands and proliferates? That's the question we're dealing with. We're, at this point, all in with cloud technologies. We're leveraging quite a few Amazon services, serverless technologies as well. We're just in the process of moving to Athena as an on-demand query service. And we use a lot of ephemeral clusters as well, which allows us to run all of our ETL in about two hours. These are some of the things we're doing to prepare for this explosion of data, and to make sure we're in a position where we're not spending a dollar to gain a penny, if that makes sense.
>> That's his business. Well, he makes fun of that business model.
>> I think you could do it: you want to drive revenue, sell dollars for 90 cents.
>> That's the dot-com model, I was there.
>> Exactly, and make it up in volume. All right, Darren Chinen, thanks for taking a few minutes out of your day and giving us the story on Malwarebytes. Sounds pretty exciting and a great opportunity.
>> Thanks, I enjoyed it.
>> Absolutely. He's Darren, he's George, I'm Jeff, you're watching theCUBE. We're at Big Data SV at the Historic Pagoda Lounge. Thanks for watching; we'll be right back after this short break. (upbeat techno music)

Published Date : Mar 15 2017
