Basil Faruqui, BMC | theCUBE NYC 2018
(upbeat music) >> Live from New York, it's theCUBE. Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Okay, welcome back everyone to theCUBE NYC. This is theCUBE's live coverage of CubeNYC and the Strata Data Conference. All things data happen here in New York this week. I'm John Furrier with Peter Burris. Our next guest is Basil Faruqui, lead solutions marketing manager for digital business automation at BMC. He returns; he was here last year with us and also at Big Data SV, which has been renamed CubeNYC and Cube SV because it's not just big data anymore. We're hearing words like multi-cloud, Istio, all those Kubernetes things. Data is now so important, it's up and down the stack, impacting everyone. We talked about this last year with Control-M, how you guys are automating in a hurry, the four pillars of pipelining data. The setup days are over; welcome to theCUBE. >> Well thank you, and it's great to be back on theCUBE. And yeah, what you said is exactly right. So you know, big data has really, I think, now been distilled down to data. Everybody understands data is big, and it's important, and it is really, you know, it's quite a cliche, but to a large degree, data is the new oil, as some people say. And I think what you said earlier is important in that we've been very fortunate to be able to not only follow the journey of our customers but be a part of it. So about six years ago, some of the early adopters of Hadoop came to us and said that look, we use your products for traditional data warehousing on the ERP side for orchestration workloads. We're about to take some of these projects on Hadoop into production and really feel that the Hadoop ecosystem is lacking enterprise-grade workflow orchestration tools.
So we partnered with them, and some of the earliest goals they wanted to achieve were to build a data lake and provide richer and wider data sets to the end users to be able to do some dashboarding, customer 360, and things of that nature. Very quickly, in about five years' time, we have seen a lot of these projects mature from how do I build a data lake to now applying cutting-edge ML and AI, and cloud is a major enabler of that. You know, it's really, as we were talking about earlier, it's really taking away excuses for not being able to scale quickly from an infrastructure perspective. Now you're talking about is it Hadoop or is it S3 or is it Azure Blob Storage, is it Snowflake? And from a Control-M perspective, we're very platform and technology agnostic, so some of our customers who had started with Hadoop as a platform, they are now looking at other technologies like Snowflake, so one of our customers describes it as kind of the spine or a power strip of orchestration where regardless of what technology you have, you can just plug and play and not worry about how do I rewire the orchestration workflows, because Control-M is taking care of it. >> Well you probably always will have to worry about that to some degree. But I think where you're going, and this is where I'm going to test with you, is that as data is increasingly recognized as a strategic asset, as analytics is increasingly recognized as the way that you create value out of those data assets, and as a business becomes increasingly dependent upon the output of analytics to make decisions and ultimately, through AI, to act differently in markets, you are embedding these capabilities or these technologies deeper into the business. They have to become capabilities. They have to become dependable. They have to become reliable, predictable: cost, performance, all these other things.
That suggests that ultimately, the historical approach of focusing on the technology and trying to apply it to a periodic series of data science problems has to become a little bit more mature so it actually becomes a strategic capability. So the business can say we're operating on this, but taking that underlying data science technology and turning it into business operations, that's where a lot of the net new work has to happen. Is that what you guys are focused on? >> Yeah, absolutely, and I think one of the big differences that we're seeing in general in the industry is that this time around, the pull of how do you enable technology to drive the business is really coming from the line of business, versus starting on the technology side of the house and then coming to the business and saying hey, we've got some cool technologies that can probably help you. It's really the line of business now saying no, I need better analytics so I can drive new business models for my company, right? So the need for speed is greater than ever because the pull is from the line of business side. And this is another area where we are unique, in that, you know, Control-M has been designed in a way where it's not just a set of solutions or tools for the technical guys. Now, the line of business is getting closer and closer, you know, it's blending into the technical side as well. They have a very, very keen interest in understanding: are the dashboards going to be refreshed on time? Are we going to be able to get all the right promotional offers at the right time? I mean, we're here at NYC Strata, and there's a lot of real-time promotion happening here. The line of business has a direct interest in the delivery and the timing of all of this, so we have always had multiple interfaces to Control-M, where a business user who has an interest in understanding whether the promotional offers are going to happen at the right time and on schedule has a mobile app to do that.
A developer who's building a complex, multi-application platform has an API and a programmatic interface to do that. Operations, which has to monitor all of this, has rich dashboards to be able to do that. That's one of the areas that has been key to our success over the last couple of decades, and we're seeing that translate very well into the big data space. >> So I just want to go under the hood for a minute, because I love that answer. And I'd like to pivot off what Peter said, tying it back to the business, okay, that's awesome. And I want to learn a little bit more about this because we talked about this last year and I'm kind of seeing it now. Kubernetes and all this orchestration is about workloads. You guys nailed the workflow issue, complex workflows. Because if you look at it, if you're adding line of business into the equation, that's just complexity in and of itself. As more workflows exist within each line of business, whether it's recommendations and offers and workflow issues, more lines of business in there is complex for even IT to deal with, so you guys have nailed that. How does that work? Do you plug it in and the lines of business have their own developers, so the people who work with the workflows engage how? >> So that's a good question. With sort of orchestration and automation now becoming very, very generic, it's kind of important to classify where we play. So there's a lot of tools that do release and build automation. There's a lot of tools that'll do infrastructure automation and orchestration. All of this infrastructure and release management process is done ultimately to run applications on top of it, and the workflows of the application need orchestration, and that's the layer that we play in. And if you think about it, how does the end user, the business, and the consumer interact with all of this technology? It's through applications, okay?
So the orchestration of the workflows inside the applications, whether you start all the way from an ERP or a CRM and then you land into a data lake and then do an ML model, and then out come the recommendations and analytics, that's the layer we are automating today. Obviously, all of this-- >> By the way, the technical complexity for the user is in the app. >> Correct, so the line of business obviously has a lot more control. You're seeing roles like chief digital officer emerge, you're seeing CTOs that have mandates like, okay, you're going to be responsible for all applications that are customer facing, where the CIO is going to take care of everything that's inward facing. There's not a settled structure or science to it yet. >> It's evolving fast. >> It's evolving fast. But what's clear is that the line of business has a lot more interest and influence in driving these technology projects, and it's important that technologies evolve in a way where the line of business can not only understand them but take advantage of them. >> So I think it's a great question, John, and I want to build on that and then ask you something. So the way we look at the world is we say the first fifty years of computing were known process, unknown technology. The next fifty years are going to be unknown process, known technology. It's all going to look like a cloud. But think about what that means. Known process, unknown technology: Control-M and related types of technologies tended to focus on how you put in place predictable workflows in the technology layer. And now, unknown process, known technology, driven by the line of business; now we're talking about controlling process flows that are being created bespoke, strategic, differentiating ways of doing business. >> Well, dynamic, too, I mean, dynamic. >> Highly dynamic, and those workflows in many respects, those technologies, piecing applications and services together, become the process that differentiates the business.
Again, you're still focused on the infrastructure a bit, but you've moved it up. Is that right? >> Yeah, that's exactly right. We see our goal as abstracting the complexity of the underlying application, data, and infrastructure. So, I mean, it's quite amazing-- >> So it could be easily reconfigured to a business's needs. >> Exactly, so whether you're on Hadoop and now you're thinking about moving to Snowflake, or tomorrow something else comes up, the orchestration or the workflow, you know, as a business, as a product, our goal is to continue to evolve quickly and in a manner where we continue to abstract the complexity, so from-- >> So I've got to ask you, we've been having a lot of conversations around Hadoop versus Kubernetes on multi-cloud, so as cloud has certainly come in and changed the game, there's no debate on that. How it changes is debatable, but we know that multiple clouds are going to be the modus operandi for customers. >> Correct. >> So I've got a lot of data and now I've got pipelining complexities, and workflows are going to get even more complex, potentially. How do you see the impact of the cloud, how are you guys looking at that, and what are some customer use cases that you see for you guys? >> So what I mentioned earlier, being platform and technology agnostic, is actually one of the unique differentiating factors for us, so whether you are on AWS or Azure or Google or on-prem or still on a mainframe, and a lot of, we're in New York, a lot of the banks and insurance companies here still do some of the most critical processing on the mainframe, the ability to abstract all of that, whether it's cloud or legacy solutions, is one of our key enablers for our customers, and I'll give you an example. So Malwarebytes is one of our customers, and they've been using Control-M for several years.
Primarily their entire infrastructure is built on AWS, but they are now utilizing Google Cloud for some of their recommendation analysis and sentiment analysis, because their goal is to pick the best-of-breed technology for the problem they're looking to solve. >> Service, the best-of-breed service is in the cloud. >> The best-of-breed service is in the cloud to solve the business problem. So from Control-M's perspective, transcending from AWS to Google Cloud is completely abstracted for them, so if it runs on Google today and tomorrow it's Azure, or they decide to build a private cloud, they will be able to extend the same workflow orchestration. >> But you can build these workflows across whatever set of services are available. >> Correct, and you bring up an important point. It's not only being able to build the workflows across platforms but being able to define dependencies and track the dependencies across all of this, because none of this is happening in silos. If you want to use Google's API to do the recommendations, well, you've got to feed it the data, and the data pipeline, like we talked about last time, data ingestion, data storage, data processing, and analytics, has very, very intricate dependencies, and these solutions should be able to manage not only the building of the workflow but the dependencies as well. >> But you're defining those elements as fundamental building blocks through a control model >> Correct. >> That allows you to treat the higher-level services as reliable, consistent capabilities.
>> Correct, and the other thing I would like to add here is not only just building complex multi-platform, multi-application workflows, but never losing focus of the business service or the business process there, so you can tie all of this to a business service, and then, these things are complex, there are problems, let's say there's an ETL job that fails somewhere upstream, Control-M will immediately be able to predict the impact and be able to tell you this means the recommendation engine will not be able to make the recommendations. Now, the staff that's going to work on the remediation understands the business impact, versus looking at a screen where there's 500 jobs and one of them has failed. What does that really mean? >> Set priorities and focal points and everything else. >> Right. >> So I just want to wrap up by asking you how your talk went at the Strata Data Conference. What were you talking about, what was the core message? Was it Control-M, was it customer presentations? What was the focus? >> So the focus of yesterday's talk was actually, you know, one of the things is academic talks are great, but it's important to, you know, show how things work in real life. The session was focused on a real use case from a customer. Navistar, they have IoT data-driven pipelines where they are predicting failures of parts inside trucks and buses that they manufacture, you know, reducing vehicle downtime. So we wanted to simulate a demo like that, so that's exactly what we did. It was very well received. In real time, we spun up an EMR environment in AWS, automatically provisioned the infrastructure there, we applied Spark and machine-learning algorithms to the data, and out came the recommendation at the end, which was, you know, here are the vehicles that are-- >> Fix their brakes. (laughing) >> Exactly, so it was very, very well received. >> I mean, there's a real-world example, there's real money to be saved: maintenance, scheduling, potential liability, accidents.
>> Liability is a huge issue for a lot of manufacturers. >> And Navistar has been at the leading edge of how to apply technologies in that business. >> They really have been a poster child for digital transformation. >> They sure have. >> Here's a company that's been around for 100-plus years, and when we talk to them they tell us that we have every technology under the sun that has come since the mainframe, and for them to be transforming and leading in this way, we're very fortunate to be part of their journey. >> Well, we'd love to talk more about some of these customer use cases. That's what people love about theCUBE; we want to do more of them and share those examples. People love to see proof in real-world examples, not just talk, so appreciate you sharing. >> Absolutely. >> Thanks for sharing, thanks for the insights. We're here with theCUBE, live in New York City, part of CubeNYC. We're getting all the data and sharing that with you. I'm John Furrier with Peter Burris. Stay with us for more day two coverage after this short break. (upbeat music)
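Basil's point about tying workflow failures back to business services (an upstream ETL failure means the recommendation engine can't make its recommendations) is, at its core, a reachability question over the job-dependency graph. Here is a minimal sketch of that idea; the graph, job names, and service mapping are entirely hypothetical illustrations, not Control-M's actual model or API:

```python
from collections import deque

# Hypothetical job-dependency graph: each edge points from a job to
# the jobs that consume its output. All names are illustrative.
DOWNSTREAM = {
    "ingest_telemetry": ["build_data_lake"],
    "build_data_lake": ["etl_customer_360", "train_ml_model"],
    "etl_customer_360": ["refresh_dashboards"],
    "train_ml_model": ["recommendation_engine"],
}

# The business service each terminal job delivers.
BUSINESS_SERVICE = {
    "refresh_dashboards": "executive dashboards",
    "recommendation_engine": "promotional offers",
}

def impacted_services(failed_job):
    """Breadth-first walk from the failed job, collecting every
    business service that can no longer be delivered."""
    seen, queue, services = set(), deque([failed_job]), set()
    while queue:
        job = queue.popleft()
        if job in seen:
            continue
        seen.add(job)
        if job in BUSINESS_SERVICE:
            services.add(BUSINESS_SERVICE[job])
        queue.extend(DOWNSTREAM.get(job, []))
    return services

print(sorted(impacted_services("build_data_lake")))
# → ['executive dashboards', 'promotional offers']
```

So an operator paging through 500 jobs sees "promotional offers at risk" rather than a bare failed job ID; a production scheduler maintains this dependency model for you instead of hand-built dictionaries.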
Darren Chinen, Malwarebytes - Big Data SV 17 - #BigDataSV - #theCUBE
>> Announcer: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. >> Hey, welcome back everybody. Jeff Frick here with theCUBE. We are at Big Data SV in San Jose at the Historic Pagoda Lounge, part of Big Data week, which is associated with Strata + Hadoop. We've been coming here for eight years and we're excited to be back. The innovation and dynamism of big data, and its evolution now with machine learning and artificial intelligence, just continues to roll, and we're really excited to be here talking about one of the nasty aspects of this world, unfortunately: malware. So we're excited to have Darren Chinen. He's the senior director of data science and engineering at Malwarebytes. Darren, welcome. >> Darren: Thank you. >> So for folks that aren't familiar with the company, give us just a little bit of background on Malwarebytes. >> So Malwarebytes is basically next-generation anti-virus software. We started off from humble roots, with our founder at 14 years old getting infected with a piece of malware, and he reached out into the community and, at 14 years old, wrote his first lines of code, with the help of some people, to remediate a couple of pieces of malware. It grew from there, and I think by the ripe old age of 18 he founded the company. And he's now, I want to say, 26 or 27, and we're doing quite well. >> It was interesting, before we went live you were talking about his philosophy and how important that is to the company, and it has now turned into really a strategic asset: that no one should have to suffer from malware, and he decided to really offer a solution for free to help people rid themselves of this bad software. >> Darren: That's right.
Yeah, so Malwarebytes was founded under the principle, Marcin believes, that everyone has the right to a malware-free existence, and so we've always offered a free version of Malwarebytes that will help you to remediate if your machine does get infected with a piece of malware. And that's actually still going to this day. >> And that's now given you the ability to have a significant amount of endpoint data, transactional data, trend data, that now you can bake back into the solution. >> Darren: That's right. It's turned into a strategic advantage for the company; it's not something, I think, that we could have planned at 18 years old when he was doing this. But we've instrumented it so that we can get some anonymous-level telemetry, and we can understand how malware proliferates. For many, many years we've been positioned as a second-opinion scanner, and so we're able to see a lot of things, some trends happening out there, and we can actually now see that in real time. >> So, starting out as a second-opinion scanner, you're basically looking at, you're finding what others have missed. And how can you, what do you have to do to become the first line of defense? >> Well, with our new product, Malwarebytes 3.0, I think some of that landscape is changing. We have a very complete and layered offering. I'm not the product manager, so as the data science guy, I don't know that I'm qualified to give you the ins and outs, but I think some of that is changing as we have, we've combined a lot of products and we have a much more complete suite of layered protection built into the product. >> And so, maybe tell us, without giving away all the secret sauce, what sort of platform technologies did you use that enabled you to scale to these hundreds of millions of endpoints, and then to be fast enough at identifying things that were trending bad that you had to prioritize?
>> Right, so traditionally, I think AV companies have these honeypots, right, where they go and collect a piece of virus or a piece of malware, and they'll take the MD5 hash of that and then they'll basically insert that into a definitions database. And that's a very exact way to do it. The problem is that there's so much malware, so many viruses out there in the wild, it's impossible to get all of them. I think one of the things that we did was we set up telemetry, and we have a phenomenal research team where we're able to actually have our team catch entire families of malware, and that's really the secret sauce to Malwarebytes. There's several other levels, but that's where we're helping out in the immediate term. What we do is we have, internally, we sort of jokingly call it a Lambda Two architecture. We had considered Lambda long ago, long ago, and I say about a year ago when we first started this journey. But Lambda is riddled with, as you know, a number of issues. If you've ever talked to Jay Kreps from Confluent, he has a lot of opinions on that, right? And one of the key problems with that is that if you do a traditional Lambda, you have to implement your code in two places, it's very difficult, things get out of sync, and you have to have replay frameworks. These are some of the challenges with Lambda. So we do processing in a number of areas. The first thing that we did was we implemented Kafka to handle all of the streaming data. We use Kafka Streams to do inline stateless transformations, and then we also use Kafka Connect. And we write all of our data both into HBase, we use that, we may swap that out later for something like Redis, and that would be a thin speed layer. And then we also move the data into S3, and we use some ephemeral clusters to do very large-scale batch processing, and that really provides our data lake.
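The shape Darren describes (stateless transforms applied per record in the stream, with a thin speed layer keeping counts that feed real-time indicators) can be sketched without any Kafka infrastructure at all. The sketch below simulates consuming a topic with plain Python; every field name, malware family, and class name is illustrative, not Malwarebytes' actual schema or code:

```python
from collections import Counter

def transform(event):
    """A stateless inline transformation of the kind a Kafka Streams
    job might apply per record (field names are illustrative)."""
    return {
        "family": event["detection"].strip().lower(),
        "minute": event["ts"] // 60,  # bucket epoch seconds by minute
    }

class SpeedLayer:
    """Thin speed layer: per-minute detection counts, queried for
    'what is burning up the charts right now' (a real-time indicator)."""
    def __init__(self):
        self.counts = Counter()

    def update(self, record):
        self.counts[(record["minute"], record["family"])] += 1

    def top_families(self, minute, n=3):
        ranked = [(fam, c) for (m, fam), c in self.counts.items() if m == minute]
        return sorted(ranked, key=lambda x: (-x[1], x[0]))[:n]

# Simulated consumption of a telemetry topic.
events = [
    {"ts": 120, "detection": "Trojan.Emotet"},
    {"ts": 130, "detection": "trojan.emotet"},
    {"ts": 140, "detection": "Adware.Yontoo"},
]
layer = SpeedLayer()
for e in events:
    layer.update(transform(e))

print(layer.top_families(minute=2))
# → [('trojan.emotet', 2), ('adware.yontoo', 1)]
```

In the real architecture the transform would run inside the stream processor and the counters would live in HBase (or Redis), but the directional "this minute, this family" ranking is the same idea.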
When you call that Lambda Two, is that because you're still working essentially on two different infrastructures, so your code isn't quite the same? You still have to check the results on either fork. >> That's right. Yeah, we didn't feel like it was, we did evaluate doing everything in the stream. But there are certain operations that are difficult to do with purely stream processing, and so we did need a little bit, we did need to have a thin speed layer, what we call real-time indicators, to supplement what we were doing in the stream. And so that's the differentiating factor between a traditional Lambda architecture, where you'd want to have everything in the stream and everything in batch, and the batch is really more of a truing mechanism; as opposed to that, our real time is really directional. So in the traditional sense, if you look at traditional business intelligence, you'd have KPIs that would allow you to gauge the health of your business. We have RTIs, real-time indicators, that allow us to gauge, directionally, what is important to look at this day, this hour, this minute? >> This thing is burning up the charts, >> Exactly. >> Therefore it's priority one. >> That's right, you got it. >> Okay. And maybe tell us a little more, because everyone I'm sure is familiar with Kafka, but the Streams product from them is a little newer, as is Kafka Connect, so it sounds like you've got, it's not just the transport, but you've got some basic analytics and you've got the ability to do the ETL, because you've got Connect that comes from sources and destinations, sources and sinks. Tell us how you've used that. >> Well, the Streams product is, it's quite different than something like Spark Streaming. It's not working off micro-batching, it's actually working off the stream. And the second thing is, it's not a separate cluster. It's just a library, effectively a .jar file, right?
And so because it works natively with Kafka, it handles certain things there quite well. It handles back pressure, and when you expand the cluster, it's pretty good with things like that. We've found it to be a fairly stable technology. It's just a library, and we've worked very closely with Confluent to develop that. Whereas Kafka Connect is really something that we use to write out to S3. In fact, Confluent just released a new, direct S3 connector. We were using StreamX, which was a wrapper on top of an HDFS connector, and they rigged that up to write to S3 for us. >> So tell us, as you look out, what sorts of technologies do you see as enabling you to build a richer platform, and then how would that show up in the functionality consumers like us would see? >> Well, one of the things that we had to do was evaluate where we wanted to spend our time. We're a very small team; the entire data science and engineering team is, I think, less than 10 months old. So all of us got hired, we've started this platform, we've gone very, very fast. And we had to decide: we've made this big investment, how are we going to get value to our end customer quickly, so that they're not waiting around and you get the traditional big-data story where we've spent all this money and now we're not getting anything out of it. And so we had to make some of those strategic decisions, and because the data was really truly big data in nature, there's just a huge amount of work that has to be done in these open-source technologies. They're not baked; it's not like going out to Oracle and giving them a purchase order and you install it and away you go. There's a tremendous amount of work, and so we've made some strategic decisions on what we're going to do in open-source and what we're going to do with a third-party vendor solution.
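For reference, a Kafka Connect S3 sink of the kind Darren mentions is configured roughly like this. The connector class and property names below come from Confluent's S3 sink connector, but the name, bucket, region, topic, and sizing values are placeholders, not Malwarebytes' actual configuration:

```json
{
  "name": "telemetry-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "4",
    "topics": "telemetry-events",
    "s3.bucket.name": "example-data-lake",
    "s3.region": "us-west-2",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "10000"
  }
}
```

POSTing a config like this to the Connect REST API (`/connectors`) creates the sink; exact property names should be checked against the connector version in use.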
And one of those solutions that we decided on was workload automation. So I just did a talk on this, about how Control-M from BMC was really the tool that we chose to handle a lot of the coordination, the sophisticated coordination, and the workload automation on the batch side, and we're about to implement that in a data-quality monitoring framework. And that's turned out to be an incredibly stable solution for us. It's allowed us to not spend time with open-source solutions that do the same things, like Airflow, which may or may not work well, but there's really no support around that, and to focus our efforts on what we believe to be the really, really hard problems to tackle in Kafka, Kafka Streams, Connect, et cetera. >> Is it fair to say that Kafka plus Kafka Connect solves many of the old ETL problems, or do you still need some sort of orchestration tool on top of it to completely commoditize, essentially, moving and transforming data from an OLTP or operational system to a decision support system? >> I guess the answer to that is, it depends on your use case. I think there's a lot of things that Kafka and the Streams job can solve for you, but I don't think that we're at the point where everything can be streaming. I think that's a ways off. There's legacy systems that really don't natively stream to you anyway, and there's just certain operations that are just more efficient to do in batch. And so that's why, I don't think batch for us is going away any time soon, and that's one of the reasons why workload automation in the batch layer initially was so important, and we've decided to extend that, actually, into building out a data-quality monitoring framework to put a collar around how accurate our data is on the real-time side.
>> Yeah, I don't think that there's, if there was a one-size-fits-all it'd be a company, and there would be no need for architects, so I think that you have to look at your use case, your company, what kind of data, what style of data, what type of analysis do you need. Do you really actually need the data in real time and if you do put in all the work to get it in real time, are you going to be able to take action on it? And I think Malwarebytes was a great candidate. When it came in, I said, "Well, it does look like we can justify "the need for real time data, and the effort "that goes into building out a real-time framework." >> Jeff: Right, right. And we always say, what is real time? In time to do something about it, (all chuckle) and if there's not time to do something about it, depending on how you define real time, really what difference does it make if you can't do anything about it that fast. So as you look out in the future with IoT, all these connected devices, this is a hugely increased attack surface as we just read our essay a few weeks back. How does that work into your planning? What do you guys think about the future where there's so many more connected devices out on the edge and various degrees of intelligence and opportunities to hi-jack, if you will? >> Yeah, I think, I don't think I'm qualified to speak about the Malwarebytes product roadmap as far as IoT goes. >> But more philosophically, from a professional point of view, cuz every coin has two sides, there's a lot of good stuff coming from IoT and connected devices, but as we keep hearing over and over, just this massive attack surface expansion. >> Well I think, for us, the key is we're small and we're not operating, like I came from Apple where we operated on a budget of infinity, so we're not-- >> Have to build the infinity or the address infinity (Darren laughs) with an actual budget. >> We're small and we have to make sure that whatever we do creates value. 
And so what I'm seeing in the future is, as we get more into the IoT space and logs begin to proliferate and data just grows exponentially in size, it's really, how do we do the same thing, and how are we going to manage that in terms of cost? Generally, big data is very low in information density. It's not like transactional systems, where you get the data, it's effectively an Excel spreadsheet, and you can go run some pivot tables and filters and away you go. I think big data in general requires a tremendous amount of massaging to get to the point where a data scientist or an analyst can actually extract some insight and some value. And the question is, how do you massage that data in a way that's going to be cost-effective as IoT expands and proliferates? So that's the question that we're dealing with. We're, at this point, all in with cloud technologies; we're leveraging quite a few of Amazon's services, serverless technologies as well. We're just in the process of moving to Athena as an on-demand query service. And we use a lot of ephemeral clusters as well, and that allows us to actually run all of our ETL in about two hours. And so these are some of the things that we're doing to prepare for this explosion of data, and making sure that we're in a position where we're not spending a dollar to gain a penny, if that makes sense.

>> That's his business. Well, he makes fun of that business model.

>> I think you could do it, you want to drive revenue to sell dollars for 90 cents.

>> That's the dot-com model, I was there.

>> Exactly, and make it up in volume. All right, Darren Chinen, thanks for taking a few minutes out of your day and giving us the story on Malwarebytes; sounds pretty exciting and a great opportunity.

>> Thanks, I enjoyed it.

>> Absolutely. He's Darren, he's George, I'm Jeff, you're watching theCUBE. We're at Big Data SV at the Historic Pagoda Lounge. Thanks for watching, we'll be right back after this short break. (upbeat techno music)
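The cost discipline behind ephemeral clusters, paying only for the roughly two-hour ETL window instead of running a cluster around the clock, can be sketched with back-of-the-envelope arithmetic. The node count and hourly rate below are hypothetical, not actual Malwarebytes figures:

```python
def monthly_cost(hourly_rate, hours_per_day, days=30):
    """Simple linear cost model: rate x hours billed x days."""
    return hourly_rate * hours_per_day * days

# Hypothetical numbers: a 20-node cluster at $0.50/node-hour, billed only
# for the two-hour ETL window, versus the same cluster left on 24/7.
rate = 20 * 0.50
ephemeral = monthly_cost(rate, hours_per_day=2)
always_on = monthly_cost(rate, hours_per_day=24)
print(ephemeral, always_on)  # 600.0 7200.0
```

Under these assumed rates, tearing the cluster down between runs is a 12x saving, which is the "not spending a dollar to gain a penny" point in concrete terms.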
Basil Faruqui, BMC Software - BigData SV 2017 - #BigDataSV - #theCUBE
(upbeat music) >> Announcer: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017.

>> Welcome back everyone. We are here live in Silicon Valley for theCUBE's Big Data coverage. Our event, Big Data Silicon Valley, also called Big Data SV, a companion event to our Big Data NYC event, where we have our unique program in conjunction with Strata Hadoop. I'm John Furrier with George Gilbert, our Wikibon big data analyst. And we have Basil Faruqui, who is the Solutions Marketing Manager at BMC Software. Welcome to theCUBE.

>> Thank you, great to be here.

>> We've been hearing a lot on theCUBE about schedulers and automation, and machine learning is the hottest trend happening in big data. We're thinking that this is going to help move the needle on some things. Your thoughts on this, on the world we're living in right now, and what BMC is doing at the show.

>> Absolutely. So, scheduling and workflow automation is absolutely critical to the success of big data projects. This is not something new. Hadoop is only about 10 years old, but other technologies that have come before Hadoop have relied on this foundation for driving success. If we look at the Hadoop world, what gets all the press is all the real-time stuff, but what powers all of that underneath it is a very important layer of batch. If you think about some of the most common use cases for big data, if you think of a bank, they're talking about fraud detection and things like that. Let's just take the fraud detection example. Detecting an anomaly in how somebody is spending, if somebody's credit card is used in a way that doesn't match their spending habits, the bank detects that and they'll maybe close the card down or contact somebody. But everything else that has happened before that is something that has happened in batch mode: collecting the history of how that card has been used, then matching it with how all the other card members use their cards.
When the cards are stolen, what are those patterns? All that stuff is something that is being powered by what's today known as workload automation. In the past, it's been known by names such as job scheduling and batch processing.

>> In the systems business, everyone knows about schedulers, compilers, all this computer science stuff. But this is interesting. Now that the data lake has become so swampy, and people call it the data swamp, people are looking at moving data out of data lakes into real time, as you mention, but it requires management. So, there's a lot of coordination going on. This seems to be where most enterprises are now focusing their attention: making that data available.

>> Absolutely.

>> Hence the notion of scheduling and workloads. Because their use cases are different. Am I getting it right?

>> Yeah, absolutely. And if we look at what companies are doing, every CEO and every boardroom has a charter for digital transformation. And it's no longer about taking one or two use cases around big data and driving success. Data and intelligence is now at the center of everything a company does, whether it's building new customer engagement models, building new ecosystems with partners and suppliers, or back-office optimization. So, when CIOs and data architects think about having to build a system like that, they are faced with a number of challenges. It has to become enterprise ready. It has to take into account governance, security, and others. But if you peel the onion just a little bit, what architects and CIOs are faced with is, okay, you've got a web of complex technologies, legacy applications, and modern applications that hold a lot of the corporate data today. And then you have new sources of data like social media, devices, and sensors, which have a tendency to produce a lot more data.
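The fraud-detection example above, a batch layer that builds each cardholder's spending history and a real-time check against it, can be sketched as follows. This is an illustrative toy assuming a simple three-sigma rule; a real system would use far richer features:

```python
from statistics import mean, stdev

def spending_profile(history):
    """Batch step: summarize each card's past transaction amounts."""
    return {card: (mean(amts), stdev(amts)) for card, amts in history.items()}

def is_anomalous(profile, card, amount, n_sigma=3.0):
    """Real-time step: flag a charge far outside the card's usual range."""
    mu, sigma = profile[card]
    return abs(amount - mu) > n_sigma * sigma

history = {"card-1": [25.0, 30.0, 22.0, 28.0, 35.0]}
profile = spending_profile(history)            # built nightly, in batch
print(is_anomalous(profile, "card-1", 29.0))   # False: a typical charge
print(is_anomalous(profile, "card-1", 900.0))  # True: flag for review
```

The point of the example is the division of labor: the expensive history-crunching runs in batch on a schedule, while the real-time path only does a cheap lookup and comparison.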
First things first, you've got an ecosystem like Hadoop, which is supposed to be kind of the nerve center of the new digital platform. You've got to start ingesting all this data into Hadoop, and this has to happen in an automated fashion for it to be scalable.

>> But this is the combination of streaming and batch.

>> Correct.

>> Now this seems to be the management holy grail right now. Nailing those two. Did I get that?

>> Absolutely. So, people talk about, in technical terms, the speed layer and the batch layer. And both have to converge for them to be able to deliver the intelligence and insight that the business users are looking for.

>> Would it be fair to say it's not just the convergence of the speed layer and batch layer in Hadoop, but what BMC brings to town is the non-Hadoop parts of those workloads? Whether it's batch outside Hadoop, or streaming, which sort of pre-Hadoop was more nichey. But we need this overarching control, even if it's not a Hadoop-centric architecture.

>> Absolutely. So, I've said this for a long time: Hadoop is never going to live on an island on its own in the enterprise. And with the maturation of the market, Hadoop has to now play with the other technologies in the stack. So if you think about, just take data ingestion for an example, you've got ERPs, you've got CRMs, you've got middleware, you've got data warehouses, and you have to ingest a lot of that in. Where Control-M brings a lot of value and speeds up time to market is that we have out-of-the-box integrations with a lot of the systems that already exist in the enterprise, such as ERP solutions and others. Virtually any application that can expose itself through an API or a web service, Control-M has the ability to automate that ingestion piece. But this is only step one of the journey. So, you've brought all this data into Hadoop and now you've got to process it. The number of tools available for processing this is growing at an unprecedented rate.
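Automating ingestion from an API-exposed source, as described here, ultimately means running each pull as a supervised job that retries on transient failure before escalating. A minimal sketch in generic Python, not the Control-M API; the flaky source below is simulated:

```python
import time

def run_stage(name, fn, retries=3, delay=0.0):
    """Run one pipeline stage, retrying on failure before giving up --
    a stand-in for the restart-and-alert behavior an orchestrator provides."""
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == retries:
                raise RuntimeError(f"{name} failed after {retries} tries") from exc
            time.sleep(delay)

calls = {"n": 0}
def flaky_ingest():
    # Simulated source API that fails twice before responding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("source API unavailable")
    return "ingested"

print(run_stage("ingest", flaky_ingest))  # succeeds on the third attempt
```

The scripting-per-source approach criticized later in the interview amounts to hand-writing this wrapper over and over; the orchestration pitch is that retry, alerting, and restart policy live in one place instead.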
You've got... you know, MapReduce was a hot thing just two years ago, and now Spark has taken over. So with Control-M, about four years ago we started building very deep native capabilities in this new ecosystem. So, you've got ingestion that's automated, then you can seamlessly automate the actual processing of the data using things like Spark, Hive, Pig, and others. And the last mile of the journey, the most important one, is making this refined data available to systems and users that can analyze it. Often Hadoop is not the repository that analytic systems sit on top of; it's another layer that all of this has to be moved to. So, if you zoom out and take a look at it, this is a monumental task. And if you use a siloed approach to automating this, it becomes unscalable. And that's where a lot of the Hadoop projects often--

>> Crash and burn.

>> Crash and burn, yes, sustainability.

>> Let's just say it, they crash and burn.

>> So, Control-M has been around for 30 years.

>> By the way, just to add to the crash-and-burn piece, the data lake gets stalled there, that's why the swamp happens, because they're like, now how do I operationalize this and scale it out?

>> Right. If you're storing a lot of data and not making it available for processing and analysis, then it's of no use. And that's exactly our value proposition. This is not a problem we're solving for the first time. We did this as we have seen these waves of automation come through: from the mainframe time, when it was called batch processing, then as it evolved into distributed client-server, when it was known more as job scheduling. And now--

>> So BMC has seen this movie before.

>> Absolutely.

>> Alright, so let's take a step back. Zoom out, step back, go hang out in the big trees, look down on the market. Data practitioners, big data practitioners out there right now are wrestling with this issue. You've got streaming, real-time stuff, you've got batch, it's all coming together.
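The ingest-process-publish journey described above is, at bottom, a dependency graph of jobs: nothing processes until every ingest lands, and nothing publishes until processing finishes. A minimal sketch of dependency-ordered execution using only Python's standard library, with hypothetical job names rather than any vendor's job definitions:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline, expressed as job -> set of jobs it depends on:
# ingest from several sources, process once all ingests land, then publish.
jobs = {
    "ingest_erp": set(),
    "ingest_crm": set(),
    "ingest_sensors": set(),
    "process_spark": {"ingest_erp", "ingest_crm", "ingest_sensors"},
    "publish_marts": {"process_spark"},
}

order = list(TopologicalSorter(jobs).static_order())
print(order)  # all three ingests first, then process, then publish
```

A production orchestrator layers calendars, SLAs, retries, and cross-platform agents on top, but the core contract is this topological ordering of work.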
What is Control-M doing great right now with practitioners? What are you guys solving for them? Because there are a zillion tools out there, but people are human. Every hammer looks for a nail.

>> Sure.

>> So, you have a lot of change happening at the same time, and yet all these tools. What is Control-M doing to really win? Where are you guys winning?

>> Where we are adding a lot of value for our customers is helping them speed up time to market in delivering these big data projects, and delivering them at scale and quality.

>> Give me an example of a project.

>> Malwarebytes is a Silicon Valley-based company. They are using this to ingest and analyze data from thousands of endpoints from their end users.

>> That's their Lambda architecture, right?

>> It is a Lambda architecture. I won't steal their thunder; they're presenting tomorrow at eleven.

>> Okay.

>> Eleven-thirty tomorrow. Another example is a company called Navistar. Now here's a company that's been around for 200 years. They manufacture heavy-duty trucks, 18-wheelers, school buses. And they recently came up with a service called OnCommand. They have a fleet of 160,000 trucks that are fitted with sensors. They're sending telematics data back to their data centers, and in between, it stops in the cloud.

>> So they're going up to the cloud for upload and backhaul, basically, right?

>> Correct. So, it goes to the cloud, and from there it is ingested into their Hadoop systems. And they're looking for trends to make sure none of the trucks break down, because a truck that's carrying freight and breaks down hits the bottom line right away. But that's not where they're stopping. In real time they can triangulate the position of the truck, figure out where the nearest dealership is. Do they have the parts? When to schedule the service. But if you think about it, the warranty information and the parts information are not sitting in Hadoop; they're sitting in mainframes, SAP systems, and others.
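Triangulating a truck's position and finding the nearest dealership, as in the Navistar example, reduces to a great-circle distance lookup over the dealer list. A minimal sketch; the coordinates are made up for illustration:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def nearest_dealer(truck, dealers):
    return min(dealers, key=lambda d: haversine_km(*truck, d["lat"], d["lon"]))

# Made-up coordinates: a truck just south of Chicago, and two dealerships.
dealers = [
    {"name": "Columbus", "lat": 39.96, "lon": -83.00},
    {"name": "Chicago", "lat": 41.88, "lon": -87.63},
]
truck = (41.50, -87.40)
print(nearest_dealer(truck, dealers)["name"])  # Chicago
```

The real-time lookup is trivial; the interview's point is that the parts and warranty answers behind it depend on batch data married in from mainframes and ERP systems.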
And Control-M is orchestrating this across the board, from mainframe to ERP and into Hadoop, for them to be able to marry all this data together.

>> How do you get back into the legacy? Is that because you have the experience there? Is that part of the product portfolio?

>> That is absolutely a part of the product portfolio. We started our journey back in the mainframe days, and as the world has evolved, to client-server, to web, and now mobile and virtualized and software-defined infrastructures, we have kept pace with that.

>> You guys have a nice end-to-end view right now going on. And certainly that example with the trucks highlights IoT right there.

>> Exactly.

>> You have a clear line of sight on IoT?

>> Yup.

>> That would be the best measure of your maturity: the breadth of your integrations.

>> Absolutely. And we don't stop at what we provide just out of the box. We have 30 to 35 out-of-the-box integrations, but there are a lot more applications than that. We have architected Control-M in a way that it can automate data loads on any application and any database that can expose itself through an API. That is huge, because if you think about the open-source world, by the time this conference is over, there's going to be a dozen new tools and projects that come online. And that's a big challenge for companies too. How do you keep pace with all this?

>> Well, I think people are starting to squint past the fashion aspect of open source, which I love by the way, but it does create more diversity. But, you know, some things become fashionable and then get big-time trashed. Look at Spark. Spark was beautiful. That one came out of the woodwork. George, you're tracking all the fashion. What's the hottest thing right now in open source?

>> It seems to me that we've spent five-plus years building data lakes, and now we're trying to take that data and apply the insights from it to applications.
And really, Control-M's value add, my understanding is, we have to go beyond Hadoop, because Hadoop was an island, you know, an island or a data lake, but now the insights have to be enacted on applications that go outside that ecosystem. And that's where Control-M comes in.

>> Yeah, absolutely. We are that overarching layer that helps you connect your legacy systems and modern systems and bring it all into Hadoop. The story I tell when I'm explaining this to somebody is that you've installed Hadoop day one, great; guess what, it has no data in it. You've got to ingest data, and you have to be able to take a strategic approach to that, because you can use some point solutions and do scripting for the first couple of use cases, but as soon as the business gives you the green light and says, you know what, we really like what we've seen, now let's scale up, that's where you really need to take a strategic approach, and that's where Control-M comes in.

>> So, let me ask then: if the bleeding edge right now is trying to operationalize the machine learning models that people are beginning to experiment with, just the way they were experimenting with data lakes five years ago, what role can Control-M play today in helping people take a trained model and embed it in an application so it produces useful actions and recommendations, and how much custom integration does that take?

>> If you peel the onion of machine learning, you've got data that needs to be moved, that needs to be constantly evaluated, and then the algorithms have to be run against it to provide the insights. So this in itself is exactly what Control-M allows you to do: ingest the data, process the data, let the algorithms process it, and then of course move it to a layer where people and other systems -- it's not just about people anymore, it's other systems that'll analyze the data.
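That last answer describes an ML workflow as three orchestrated stages: move the data, refresh the model, and run it against new records. A toy end-to-end sketch; the "model" is just a threshold at twice the historical mean, purely for illustration of the staging, not of any real algorithm:

```python
def ingest():
    # Stand-in for moving transaction amounts out of source systems.
    return [12.0, 15.0, 14.0, 13.0, 90.0]

def train(amounts):
    # Toy "model": an alert threshold at twice the historical mean.
    return 2 * sum(amounts) / len(amounts)

def score(threshold, new_records):
    return [x for x in new_records if x > threshold]

def run_pipeline(new_records):
    # The three orchestrated stages: move data, refresh model, score.
    return score(train(ingest()), new_records)

print(run_pipeline([10.0, 70.0]))  # [70.0]
```

Operationalizing a model is mostly this choreography, rerun on a schedule with dependencies and failure handling, which is why it lands in workload-automation territory rather than in the modeling code itself.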
And the important piece here is that we're allowing you to do this from a single pane of glass, and to see this picture end to end. All of this work is being done to drive business results, generating new revenue models, like in the case of Navistar. Allowing you to capture all of this and then tie it to business SLAs -- that is one of the most highly rated capabilities of Control-M among our customers.

>> This is the cloud equation we were talking about last week at Google Next. A combination of enterprise readiness across the board. The end-to-end is the picture, and you guys are in a good position. Congratulations, and thanks for coming on theCUBE. Really appreciate it.

>> Absolutely, great to be here.

>> It's theCUBE breaking it down here at Big Data World. This is the trend: it's an operating system world in the cloud. Big data with IoT, AI, machine learning. Big themes breaking out early on at Big Data SV, in conjunction with Strata Hadoop. More right after this short break.