PUBLIC SECTOR Optimize
>> Good day, everyone. Thank you for joining me. I'm Cindy Maike, joined by Rick Taylor of Cloudera. We're here to talk about predictive maintenance for the public sector and how to increase asset service reliability. On today's agenda, we'll talk specifically about how to optimize your equipment maintenance and how to reduce costs and asset failures with data and analytics. We'll go into a little more depth on the types of data and the analytical methods we're typically seeing used, and Rick will go over a case study as well as a reference architecture. So by basic definition, predictive maintenance is about determining when an asset should be maintained and what specific maintenance activities need to be performed, based upon an asset's actual condition or state. It's also about predicting and preventing failures and performing maintenance on your time, on your schedule, to avoid costly unplanned downtime. McKinsey has analyzed maintenance costs across multiple industries and identified the opportunity to reduce overall maintenance costs by roughly 50% with different types of analytical methods. So let's look at those three types of models. First, we've got our traditional method, and that's really corrective maintenance: performing maintenance on an asset after the equipment fails. The challenge with that is we end up with unplanned downtime, disruptions in our schedules, as well as reduced quality in the performance of the asset. Then we started looking at preventive maintenance, which is really performing maintenance on a set schedule. The challenge with that is we're typically doing it regardless of the actual condition of the asset, which has resulted in unnecessary downtime and expense.
And specifically, we're really now focused on condition-based maintenance, which is about leveraging predictive maintenance techniques based upon actual conditions and real-time events and processes. With that approach, again sourced from McKinsey, we've seen organizations achieve a 50% reduction in downtime, as well as an overall 40% reduction in maintenance costs. Again, that's looking across multiple industries, but let's look at it in the context of the public sector. Based upon some work by the Department of Energy several years ago, they looked at what predictive maintenance means to the public sector: the benefit of increasing return on investment on assets, and of reducing downtime as well as overall maintenance costs. So corrective or reactive maintenance is about performing maintenance once there's been a failure, and then there's the movement towards preventive maintenance, which is based upon a set schedule. With predictive maintenance, we're monitoring real-time conditions, and most important is now actually leveraging IoT and data and analytics to further reduce that overall downtime. And there's a research report by the Department of Energy that goes into more specifics on the opportunity within the public sector. So Rick, let's talk a little bit about what are some of the challenges regarding data and predictive maintenance? >> Some of the challenges include data silos. Historically, our government organizations, and organizations in the commercial space as well, have multiple data silos that have spun up over time across multiple business units, so there's no single view of assets, and oftentimes there's redundant information stored in these silos.
Couple that with huge increases in data volume, data growing exponentially, along with new types of data that we can ingest: social media, semi-structured and unstructured data sources, and the real-time data we can now collect from the internet of things. And so the challenge is to bring all these assets together and begin to extract intelligence and additional insights from them, and that in turn fuels machine learning and what we call artificial intelligence, which enables predictive maintenance. Next slide. >> Cindy: So let's look specifically at the types of use cases. Rick and I are going to focus on those use cases where we see predictive maintenance coming into procurement, facilities, supply chain, operations and logistics. We've got various levels of maturity. So we're talking about predictive maintenance on a connected asset or vehicle that we're monitoring, all the way to leveraging predictive maintenance on data from connected warehouses, facilities and buildings. All of these bring an opportunity to both increase the quality and effectiveness of the missions within the agencies and to improve cost efficiency, as well as risk and safety. As for the types of data, Rick mentioned the new types of information. Some of the data elements we've typically seen include failure history: when has an asset, a machine, or a component within a machine failed in the past? We're also looking at bringing together maintenance history for a specific machine: are we getting error codes off of a machine or asset, and when have we replaced certain components? And then, how are we actually using the assets? What were the operating conditions? Pulling in data from a sensor on that asset.
Also looking at the features of an asset, whether it's engine size, its make and model, or where the asset is located. And who's operated the asset: their certifications, their experience, how they're using the assets. And then also bringing in some of the pattern analysis that we've seen: what are the operating limits? Are we getting service reliability? Are we getting product recall information from the actual manufacturer? So Rick, I know the data landscape has really changed. Let's go over some of those components. >> Rick: Sure. So this slide depicts some of the inputs that inform a predictive maintenance program. We've talked a little bit about the silos of information: the ERP system of record, perhaps the spares and the service history. What we want to do is combine that information with sensor data, whether it's facility and equipment sensors, or temperature and humidity, for example. All of this is then combined together and used to develop machine learning models that better inform predictive maintenance, because we do need to take into account the environmental factors that may cause additional wear and tear on the asset we're monitoring. So here are some examples of private sector maintenance use cases that also have broad applicability across the government. For example, one of the busiest airports in Europe is running Cloudera on Azure to capture, secure, and correlate sensor data collected from equipment within the airport, the people-moving equipment, more specifically the escalators, the elevators, and the baggage carousels. The objective here is to prevent breakdowns and improve airport efficiency and passenger safety. Another example is a container shipping port.
In this case, we use IoT data and machine learning to help customers recognize how their cargo handling equipment is performing in different weather conditions, to understand how usage relates to failure rates, and to detect anomalies in transport systems. These all improve port efficiency. Another example is Navistar. Navistar is a leading manufacturer of commercial trucks, buses, and military vehicles. Typically, vehicle maintenance, as Cindy mentioned, is based on miles traveled or on a schedule, the time since the last service. But these are only two of the thousands of data points that can signal the need for maintenance. And as it turns out, unscheduled maintenance and vehicle breakdowns account for a large share of the total cost for vehicle owners. So to help fleet owners move from a reactive approach to a more predictive model, Navistar built an IoT-enabled remote diagnostics platform called OnCommand. The platform brings in over 70 sensor data feeds from more than 375,000 connected vehicles. These include engine performance, truck speed, acceleration, coolant temperature and brake wear. This data is then correlated with other Navistar and third-party data sources, including weather, geolocation, vehicle usage, traffic, warranty, and parts inventory information. The platform then uses machine learning and advanced analytics to automatically detect problems early and predict maintenance requirements. So how does a fleet operator use this information? They can monitor truck health and performance from smartphones or tablets and prioritize needed repairs. They can also identify the nearest service location that has the relevant parts, the trained technicians and the available service space. So, wrapping up the benefits: Navistar has helped fleet owners reduce maintenance costs by more than 30%. This same platform has also been used to help school buses run safely and on time.
For example, one school district with 110 buses that travel over a million miles annually reduced the number of tows needed year over year, thanks to predictive insights delivered by this platform. So I'd like to take a moment and walk through the data life cycle as depicted in this diagram. Data ingested from the edge may include feeds from the factory floor or from things like connected vehicles, whether they're trucks, aircraft, heavy equipment, cargo vessels, et cetera. Next, the data lands on a secure and governed data platform, where it is combined with data from existing systems of record to provide additional insights. This platform supports multiple analytic functions working together on the same data while maintaining strict security, governance and control measures. Once processed, the data is used to train machine learning models, which are then deployed into production, monitored, and retrained as needed to maintain accuracy. The processed data is also typically placed in a data warehouse and used to support business intelligence analytics and dashboards. And in fact, this data life cycle is representative of one of our government customers doing condition-based maintenance across a variety of aircraft. The benefits they've discovered include less unscheduled maintenance and a reduction in mean man-hours to repair, increased maintenance efficiencies, improved aircraft availability, and the ability to avoid cascading component failures, which typically cost more in repair cost and downtime. They're also able to better forecast the requirements for replacement parts and consumables, and last, and certainly very importantly, this leads to enhanced safety. This chart overlays the secure open source Cloudera platform used in support of the data life cycle we've been discussing. Cloudera DataFlow provides the data ingest, data movement and real-time streaming data query capabilities.
So DataFlow gives us the capability to bring data in from the asset of interest, from the internet of things, while the data platform provides a secure, governed data lake and visibility across the full machine learning life cycle, eliminating silos and streamlining workflows across teams. The platform includes an integrated suite of secure analytic applications, and two that we're specifically calling out here are Cloudera Machine Learning, which supports a collaborative data science environment for machine learning and AI, and the Cloudera Data Warehouse, which supports the analytics and business intelligence, including those dashboards for leadership. Cindy, over to you. >> Cindy: Rick, thank you. I hope that Rick and I have provided you some insights on how predictive maintenance and condition-based maintenance are being used, and can be used, within your respective agency: bringing together data sources that maybe you're having challenges with today, bringing in more real-time information from a streaming perspective, and blending that industrial IoT data with historical information to help optimize maintenance and reduce costs within each of your agencies. To learn a little bit more about Cloudera and what we're doing around predictive maintenance, please visit us at Cloudera.com/Solutions/PublicSector, and we look forward to scheduling a meeting with you. And on that, we appreciate your time today, and thank you very much.
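To make the enrichment step in that life cycle a bit more concrete, here is a toy sketch of joining sensor readings with maintenance history to flag at-risk assets. All column names and values here are invented for illustration; they do not come from any system mentioned in the webinar.

```python
import pandas as pd

# Hypothetical condition data: one row per asset per reading date.
sensor = pd.DataFrame({
    "asset_id": [1, 1, 2, 2],
    "reading_date": pd.to_datetime(
        ["2021-03-01", "2021-03-02", "2021-03-01", "2021-03-02"]),
    "vibration_rms": [0.21, 0.48, 0.19, 0.20],
})

# Hypothetical maintenance history from a system of record.
maintenance = pd.DataFrame({
    "asset_id": [1, 2],
    "last_service_date": pd.to_datetime(["2021-01-15", "2021-02-20"]),
})

# Enrich the real-time condition data with service history.
features = sensor.merge(maintenance, on="asset_id", how="left")
features["days_since_service"] = (
    features["reading_date"] - features["last_service_date"]).dt.days

# A naive condition flag: high vibration AND a long time since service.
features["at_risk"] = (
    (features["vibration_rms"] > 0.4) & (features["days_since_service"] > 30))
```

In a real deployment this join happens at data-lake scale and feeds model training rather than a hand-written rule, but the shape of the operation is the same.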
MANUFACTURING Drive Transportation
(upbeat music) >> Welcome to our industry drill-down. This is from manufacturing. I'm here with Michael Ger, who is the managing director for automotive and manufacturing solutions at Cloudera, and in this first session, we're going to discuss how to drive transportation efficiencies and improve sustainability with data. Connected trucks are fundamental to optimizing fleet performance and costs and to delivering new services to fleet operators. What's going to happen here is Michael's going to present some data and information, and then we're going to come back and have a little conversation about what we just heard. Michael, great to see you. Over to you. >> Oh, thank you, Dave, and I appreciate having this conversation today. Hey, this is actually an area, connected trucks, where we have seen a lot of action here at Cloudera, and I think the reason is kind of important: you can see that this change is happening very, very quickly. 150% growth is forecast by 2022, and I think the reason we're seeing a lot of action and a lot of growth is that there are a lot of benefits. We're talking about a B2B type of situation here: truck makers providing benefits to fleet operators. And if you look at the top benefits that fleet operators expect, you see this in the graph over here. Almost 80% of them expect improved productivity, things like improved routing, so route efficiencies, and improved customer service, along with decreased fuel consumption. But this isn't technology for technology's sake. These connected trucks are coming onto the marketplace because they can provide tremendous value to the business, and in this case, we're talking about fleet operators and fleet efficiencies. So one of the things that's really important to enable this: trucks are becoming connected because, at the end of the day, we want to be able to provide fleet efficiencies through connected truck analytics and machine learning.
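As a rough illustration of the fleet-efficiency benefits Michael mentions, the basic operator KPIs (fuel economy, idle share) reduce to simple aggregations over trip telemetry. A minimal sketch with invented truck IDs, field names, and numbers:

```python
# Hypothetical per-trip telemetry; everything here is made up for the sketch.
trips = [
    {"truck": "T1", "miles": 420.0, "gallons": 60.0, "idle_min": 95,  "drive_min": 540},
    {"truck": "T2", "miles": 380.0, "gallons": 70.0, "idle_min": 210, "drive_min": 520},
    {"truck": "T3", "miles": 410.0, "gallons": 58.5, "idle_min": 80,  "drive_min": 530},
]

def fleet_kpis(trips):
    """Fleet-wide fuel economy and the share of engine time spent idling."""
    miles = sum(t["miles"] for t in trips)
    gallons = sum(t["gallons"] for t in trips)
    idle = sum(t["idle_min"] for t in trips)
    total = sum(t["idle_min"] + t["drive_min"] for t in trips)
    return {"mpg": miles / gallons, "idle_ratio": idle / total}

kpis = fleet_kpis(trips)

# Flag trucks whose fuel economy is well below the fleet average:
# candidates for a maintenance check or driver coaching.
laggards = [t["truck"] for t in trips
            if t["miles"] / t["gallons"] < 0.9 * kpis["mpg"]]
```

This is the "good old-fashioned monitoring" layer; the machine learning discussed below builds on exactly these kinds of aggregates.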
Let me explain a little bit about what we mean by that, because this happens by creating a connected-vehicle analytics and machine learning life cycle, and to do that, you need to do a few different things. You start off, of course, with connected trucks in the field, and you could have many of these trucks, because typically you're dealing at a truck level and at a fleet level. We want to be able to do analytics and machine learning to improve performance. So you start off with these trucks, and the first thing you need to be able to do is connect to those trucks. You have to have an intelligent edge where you can collect that information from the trucks, and by the way, once you've collected this information from the trucks, you want to be able to analyze that data in real time and take real-time actions. Now, the ability to take this real-time action is actually the result of a machine learning life cycle. Let me explain what I mean by that. So we have this truck, and we start to collect data from it. At the end of the day, what we'd like to do is pull that data into either your data center or into the cloud, where we can start to do more advanced analytics. We start with being able to ingest that data into the cloud, into the enterprise data lake. We store that data, and we want to enrich it with other data sources. So, for example, if you're doing truck predictive maintenance, you want to take the sensor data that you've collected from those trucks and augment it with your dealership service information. Now you have sensor data and the resulting repair orders, and you're equipped to do things like predict when maintenance will occur. You've got all the data sets you need to be able to do that. So what do you do here? Like I said, you ingest it, you're storing it, you're enriching it with data. You're processing that data.
You're aligning, say, the sensor data to the transactional system data from your repair and maintenance systems. You're bringing it together so that you can do two things. First, you can do self-service BI on that data, things like fleet analytics. But more importantly, as I was saying before, you now have the data sets to create machine learning models. If you have the sensor values and, for example, the corresponding dealership repair orders, you can start to correlate which sensor values predicted the need for maintenance, and you can build out those machine learning models. Then, as I mentioned, you can push those machine learning models back out to the edge, which is how you take those real-time actions I mentioned earlier. As data comes through in real time, you're running it against that model, and you can take real-time actions. This analytics and machine learning life cycle is exactly what Cloudera enables: the end-to-end ability to ingest data, store it, put a query layer over it, create machine learning models, and then run those machine learning models in real time. Now, that's what we do as a business. One such customer, and I just wanted to give you one example of a customer that we have worked with to provide these types of results, is Navistar. Navistar was an early adopter of connected-truck analytics, and they provided these capabilities to their fleet operators. They started off by connecting 475,000 trucks, up to well over a million now, and the point here is that they were centralizing data from their trucks' telematics service providers.
They're bringing in things like weather data and all those types of things, and what they started to do was build out machine learning models aimed at predictive maintenance. What's really interesting is that Navistar made tremendous strides in reducing the expense associated with maintenance. So rather than waiting for a truck to break and then fixing it, they would predict when that truck needs service, condition-based monitoring, and service it before it broke down, in a much more cost-effective manner. And if you look at the benefits, they reduced maintenance costs by 3 cents a mile, from the industry average of 15 cents a mile down to 12 cents a mile. So this was a tremendous success for Navistar, and we're seeing this across many of our truck manufacturers. We're working with many of the truck OEMs, and they are all working to achieve very similar types of benefits for their customers. So, just a little bit about Navistar. Now we're going to turn to Q&A. Dave's got some questions for me in a second, but before we do that, if you want to learn more about how we work with connected vehicles and autonomous vehicles, please go to our website, what you see up on the screen. There's the URL, cloudera.com/solutions/manufacturing, and you'll see a whole slew of collateral and information, in much more detail, on how we connect trucks to fleet operators, provide analytics, and support use cases that drive dramatically improved performance. So with that being said, I'm going to turn it over to Dave for questions. >> Thank you, Michael. That's a great example. I love the life cycle. We can visualize that very well. You've got an edge use case. You're doing real-time inference at the edge, and then you're blending that sensor data with other data sources to enrich your models, and you can push that back to the edge. That's that life cycle, so really appreciate that info.
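The correlation step in that life cycle, aligning sensor values with subsequent repair orders to see which signals anticipate maintenance, can be sketched as follows. The sensor names and readings are made up for the example; a real model would use far more data and a proper learning algorithm rather than raw correlation.

```python
# Illustrative history: each row is one truck-week of averaged sensor
# readings plus whether a repair order followed within 30 days.
history = [
    # (coolant_temp_c, brake_wear, repair_within_30d)
    (88, 0.20, 0), (90, 0.30, 0), (91, 0.25, 0),
    (104, 0.70, 1), (101, 0.80, 1), (89, 0.75, 1),
]

def pearson(xs, ys):
    """Plain Pearson correlation; no third-party libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

labels = [r[2] for r in history]
corr = {
    "coolant_temp": pearson([r[0] for r in history], labels),
    "brake_wear": pearson([r[1] for r in history], labels),
}
# Rank sensors by how strongly they track upcoming repair orders.
ranked = sorted(corr, key=lambda k: abs(corr[k]), reverse=True)
```

The ranking tells you which signals are worth feeding into a predictive model and pushing out to the edge for real-time scoring.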
Let me ask you, when you think about analytics and machine learning, what are you seeing as the most common connected-vehicle use cases, the ones you see customers really leaning into? >> Yeah, that's a great question, Dave, because everybody always thinks machine learning is the first thing you go to. Well, actually, it's not. The first thing you really want to be able to do, and many of our customers are doing, is simply connect the trucks, or vehicles, or whatever the IoT asset is, and then you can do very simple things like performance monitoring of the piece of equipment. In the truck industry, that's a lot of performance monitoring of the truck, but also performance monitoring of the driver. How is the driver performing? Is there a lot of idle time? What do route efficiencies look like? By connecting the vehicles, you get insights, as I said, into the truck and into the driver, and that's not machine learning yet, but that monitoring piece is really, really important. So the first thing that we see is monitoring types of use cases. Then you start to see companies move towards more of what I call the machine learning and AI models, where you're using inference on the edge, and there you start to see things like predictive maintenance and kind of real-time route optimization, and you start to see that evolution towards smarter, more intelligent, dynamic types of decision-making. But let's not minimize the value of good old-fashioned monitoring to give you that visibility first, then moving to smarter use cases as you go forward. >> You know, it's interesting. When you talked about the monitoring, I'm envisioning the bumper sticker, "How am I driving?" The only time somebody ever calls is probably when they get cut off, and many people might think, oh, it's about Big Brother, but it's not.
I mean, okay, fine, but it's really about improvement and training and continuous improvement, and then of course the route optimization. That's bottom-line business value. I love those examples. >> Great. >> What are the big hurdles that people should think about when they want to jump into those use cases you just talked about? What are they going to run into, the blind spots they're going to get hit with? >> There are a few different things. First of all, a lot of times your IT folks aren't familiar with the more operational IoT types of data, so just connecting to that type of data can be a new skill set. There's very specialized hardware in the car, and specialized protocols. That's number one, the classic IT-OT conundrum that many of our customers struggle with. But more fundamentally, if you look at the way these types of connected-truck or IoT solutions started, oftentimes the first generation were very custom-built, so they were brittle. They were kind of hardwired. Then, as you moved towards more commercial solutions, you had what I call the silo problem: fragmentation, with this capability from one vendor and that capability from another vendor. You get the idea. One of the things we really think needs to be brought to the table is, first of all, an end-to-end data management platform that's integrated and all tested together, with data lineage across the entire stack. But then, importantly, to be realistic, you also have to be able to integrate with industry best practices in terms of solution components in the car, the hardware, and all those types of things. So, stepping back for a second, I think there has been fragmentation and complexity in the past. We're moving towards more standards and more standard types of offerings.
Our job as a software maker is to make that easier and connect those dots, so customers don't have to do it all on their own. >> And you mentioned specialized hardware. One of the things we heard earlier in the main stage was your partnership with Nvidia. We're talking about new types of hardware coming in, and you guys are optimizing for that. We see the IT and the OT worlds blending together, no question, and then that end-to-end management piece. This is different from IT, where normally everything's controlled in the data center; this is a rethinking of how you manage metadata. So in the spirit of what we talked about earlier today, other technology partners, are you working with other partners to accelerate these solutions and move them forward faster? >> Yeah, I'm really glad you're asking that, Dave, because we actually embarked on a project called Project Fusion, which was really about integrating, across that connected-vehicle life cycle, with some core vendors out there that are providing some very important capabilities. So what we did is we joined forces with them to build an end-to-end demonstration and reference architecture to enable the complete data management life cycle. Now, Cloudera's piece of this was ingesting data and all the things I talked about, the storing and the machine learning; we provide that end to end. But what we wanted to do was partner with some key partners, and the partners we integrated with were NXP, which provides the service-oriented gateways in the cars, that's the hardware in the car, and Wind River, which provides an in-car operating system that's Linux, hardened and tested. We then ran our Apache MiNiFi, which is part of Cloudera DataFlow, in the vehicle, on that operating system, on that hardware.
We pumped the data over into the cloud, where we did all the data analytics and machine learning and built out these very specialized models, and then, once we had built those models, we used a company called Airbiquity. They specialize in automotive over-the-air updates, so they can take those models and update them back to the vehicle very rapidly. So what we said is, look, there's an established ecosystem, if you will, of leaders in this space, and what we wanted to do was make sure that Cloudera was part and parcel of this ecosystem. And by the way, you mentioned Nvidia as well. We're working closely with Nvidia now, so when we're doing the machine learning, we can leverage some of their hardware to get still further acceleration on the machine learning side of things. One of the things I always say about these types of use cases is that it takes a village, and what we've really tried to do is build out an ecosystem that provides that village, so that we can speed that analytics and machine learning life cycle along just as fast as it can go. >> This is, again, another great example of data-intensive workloads. It's not your grandfather's ERP running on traditional systems. These are really purpose-built, maybe customizable for certain edge use cases. They're low cost, low power. They can't be bloated, and you're right, it does take an ecosystem. You've got to have APIs that connect, and that takes a lot of work and a lot of thought. So that leads me to the technologies underpinning this. We've talked a lot on theCUBE about semiconductor technology, and now that's changing, and the advancements we're seeing there. What do you see as some of the key technology areas advancing this connected-vehicle machine learning? >> You know, it's interesting. I'm seeing it in a few places, just a few notable ones. First of all, we see that the vehicle itself is getting smarter.
So when you look at that NXP type of gateway we talked about, that used to be kind of a dumb gateway; all it was really doing was pushing data up and down and providing isolation from the lower-level subsystems, so it was really about security and just basic communication. That gateway is now becoming what they call a service-oriented gateway, so it can actually run workloads. It's got disks, it's got memory, it's got all this stuff. So now you can run serious compute in the car, things like machine learning inference models; you have a lot more power in the car. At the same time, 5G is making it so you can push data fast enough to make low-latency computing available even from the cloud. So now you've got incredible compute both at the edge, in the vehicle, and in the cloud. And then on the cloud, you've got partners like Nvidia accelerating it still further through better GPU-based compute. So if you look at that machine learning life cycle we talked about, Dave, there are improvements in every step along the way. We're starting to see technology optimization pervasive throughout the cycle. >> And then, real quick, it's not a quick topic, but you mentioned security. We've seen a whole new security model emerge. There is no perimeter anymore in a use case like this, is there? >> No, there isn't, and remember, we're the data management platform, and the thing we have to provide is end-to-end lineage of where that data came from, who can see it, and how it changed. That's something we have integrated in from the beginning, from when the data is ingested, through when it's stored, through when it's processed and people are doing machine learning. We provide that lineage, so that security and governance are assured throughout the data life cycle. >> And federated, in this example, across the fleet.
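The edge-inference loop described in this conversation, a model artifact pushed over the air and scored locally in the vehicle so action doesn't wait on a cloud round trip, might look like this in miniature. The "model" here is just a pair of thresholds, and every name and value is invented for the sketch:

```python
# A toy stand-in for the edge loop: a model artifact (here, learned
# thresholds) is delivered to the vehicle over the air, and each
# streaming reading is scored locally.
model = {"coolant_temp_max": 100.0, "vibration_max": 0.6}  # pushed OTA

def score(reading, model):
    """Return the list of sensors whose reading breaches its threshold."""
    return [k for k in ("coolant_temp", "vibration")
            if reading[k] > model[k + "_max"]]

stream = [
    {"coolant_temp": 92.0,  "vibration": 0.3},
    {"coolant_temp": 103.5, "vibration": 0.4},
    {"coolant_temp": 95.0,  "vibration": 0.7},
]
actions = [score(r, model) for r in stream]  # e.g. alert driver, schedule service
```

A production system would run a trained model rather than fixed thresholds, and the OTA channel would also retrain and redeploy it as the cloud-side life cycle iterates.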
So, all right, Michael, that's all the time we have right now. Thank you so much for that great information. Really appreciate it. >> Dave, thank you, and thanks to the audience for listening in today. >> Yes, thank you for watching. Keep it right there. (upbeat music)
SUMMARY :
and in this first session, and the first you need to be able to do and machine learning, the and then you can do very talked about the monitoring, and complexity in the past. So in the spirit of what we and the partners that we and the advancements we're seeing there. it so that you can push data but you mentioned security. and the thing we have that's all the time we have right now. and thanks for the audience Yes, thank you for watching.
MANUFACTURING V1b | CLOUDERA
>> Welcome to our industry drill-downs for manufacturing. I'm here with Michael Gerber, who is the managing director for automotive and manufacturing solutions at Cloudera. And in this first session, we're going to discuss how to drive transportation efficiencies and improve sustainability with data. Connected trucks are fundamental to optimizing fleet performance and costs, and to delivering new services to fleet operators. What's going to happen here is Michael's going to present some data and information, and then we're going to come back and have a little conversation about what we just heard. Michael, great to see you. Over to you. >> Thank you, Dave, and I appreciate having this conversation today. You know, connected trucks is actually an area where we have seen a lot of action here at Cloudera, and I think the reasons are important. First of all, you can see that this change is happening very, very quickly: 150% growth is forecast by 2022. And the reason we're seeing so much action and growth is that there are a lot of benefits. We're talking about a B2B type of situation here, truck makers providing benefits to fleet operators. And if you look at the top benefits that fleet operators expect, you see them in the graph over here. Almost 80% of them expect improved productivity, things like improved route efficiencies, improved customer service, and decreased fuel consumption. So this isn't technology for technology's sake; these connected trucks are coming onto the marketplace because they can provide tremendous value to the business, and in this case we're talking about fleet operators and fleet efficiencies. 
So one of the things that's really important to be able to enable this: trucks are becoming connected because, at the end of the day, we want to be able to provide fleet efficiencies through connected-truck analytics and machine learning. Let me explain a little bit about what we mean by that, because the way this happens is by creating a connected vehicle analytics machine learning life cycle, and to do that you need to do a few different things. You start off, of course, with connected trucks in the field, and you can have many of these trucks, because typically you're working at a truck level and at a fleet level; you want to be able to do analytics and machine learning to improve performance. So you start with these trucks, and the first thing you need to be able to do is connect to them. You have to have an intelligent edge where you can collect that information from the trucks. And by the way, once you've collected this information from the trucks, you want to be able to analyze that data in real time and take real-time actions. Now, the ability to take that real-time action is actually the result of your machine learning life cycle; let me explain what I mean by that. So we have these trucks, and we start to collect data from them. At the end of the day, what we'd like to be able to do is pull that data into either your data center or into the cloud, where we can start to do more advanced analytics. We start by ingesting that data into the cloud, into that enterprise data lake. We store that data, and we want to enrich it with other data sources. So, for example, if you're doing truck predictive maintenance, you want to take the sensor data you've collected from those trucks and augment it with, say, your dealership service information. 
Now you have sensor data and the resulting repair orders, and you're equipped to do things like predict when maintenance will be needed; you have all the data sets you need to be able to do that. So what do you do here? Like I said, you've ingested and stored the data, and you're enriching it. You're processing that data, aligning, say, the sensor data to the transactional data from your repair and maintenance systems, and bringing it together so that you can do two things. First of all, you can do self-service BI on that data, things like fleet analytics. But more importantly, as I mentioned before, you now have the data sets to create machine learning models. So if you have the sensor values and the need, for example, for a dealership repair order, you can start to correlate which sensor values predicted the need for maintenance, and you can build out those machine learning models. And then, as I mentioned, you can push those machine learning models back out to the edge, which is how you then take those real-time actions I mentioned earlier: as data comes through in real time, you're running it against that model, and you can take real-time actions. This analytics and machine learning life cycle is exactly what Cloudera enables: the end-to-end ability to ingest data, store it, put a query layer over it, build machine learning models, and then run those models in real time. That's what we do as a business. Now, one such customer, and I just want to give you one example of a customer we have worked with to provide these types of results, is Navistar, and Navistar was an early adopter of connected truck analytics. 
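Before turning to the Navistar example, the correlation step just described — joining time-stamped sensor readings with subsequent repair orders to label a training set — can be sketched in a few lines. This is an illustrative sketch only; the truck IDs, field names, and three-day horizon are all made up, not any manufacturer's actual schema:

```python
from datetime import datetime, timedelta

# Hypothetical telemetry from one truck and one repair order from the
# dealership service system (invented field names and values).
sensor_readings = [
    {"truck_id": "T1", "ts": datetime(2021, 4, 28, 8), "oil_temp": 92.0, "vibration": 0.11},
    {"truck_id": "T1", "ts": datetime(2021, 5, 2, 8), "oil_temp": 104.0, "vibration": 0.35},
    {"truck_id": "T1", "ts": datetime(2021, 5, 6, 8), "oil_temp": 95.0, "vibration": 0.12},
]
repair_orders = [
    {"truck_id": "T1", "ts": datetime(2021, 5, 3, 14), "code": "ENGINE_SVC"},
]

def label_readings(readings, orders, horizon=timedelta(days=3)):
    """Label each reading 1 if a repair followed within `horizon`, else 0.
    This join is what turns raw telemetry into a supervised training set."""
    labeled = []
    for r in readings:
        repaired_soon = any(
            o["truck_id"] == r["truck_id"] and r["ts"] <= o["ts"] <= r["ts"] + horizon
            for o in orders
        )
        labeled.append({**r, "needs_maintenance": int(repaired_soon)})
    return labeled

training_set = label_readings(sensor_readings, repair_orders)
# Only the elevated reading taken shortly before the repair gets label 1.
```

From a table like this, a classifier can learn which sensor values precede service events; in production the same join runs over millions of rows in the data lake rather than over in-memory lists.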
They provided these capabilities to their fleet operators, and they started off by connecting 475,000 trucks, now up to well over a million. And the point here is that they were centralizing data from their trucks and their telematics service providers, and bringing in things like weather data and all those types of things. What they started to do was build out machine learning models aimed at predictive maintenance, and what's really interesting is that Navistar made tremendous strides in reducing the expense associated with maintenance. Rather than waiting for a truck to break and then fixing it, they would predict when that truck needed service, do condition-based monitoring, and service it before it broke down, in a much more cost-effective manner. And you can see the benefits: they reduced maintenance costs by 3 cents a mile, down from the industry average of 15 cents a mile to 12 cents a mile. So this was a tremendous success for Navistar, and we're seeing this across many truck manufacturers. We're working with many of the truck OEMs, and they are all working to achieve very similar types of benefits for their customers. So that's a little bit about Navistar. Now we're going to turn to Q&A; Dave's got some questions for me in a second. But before we do that, if you want to learn more about how we work with connected vehicles and autonomous vehicles, please go to our website, which you see up on the screen: cloudera.com/solutions/manufacturing. 
And you'll see a whole slew of collateral and information, in much more detail, on how we connect trucks for fleet operators and provide analytics use cases that drive dramatically improved performance. So with that being said, I'm going to turn it over to Dave for questions. >> Thank you, Michael. That's a great example. I love the life cycle; you can visualize it very well. You've got an edge use case, you're doing real-time inference at the edge, and then you're blending that sensor data with other data sources to enrich your models, and you can push that back to the edge. That's the life cycle. So I really appreciate that info. Let me ask you: when you think about analytics and machine learning, what are you seeing as the most common connected vehicle use cases that customers really lean into? >> Yeah, that's a great question, because everybody always thinks machine learning is the first thing you do. Actually, it's not where you start. Many of our customers start slow: let's simply connect our trucks, or our vehicles, or whatever our IoT asset is. And then you can do very simple things, like just performance monitoring of the piece of equipment. In the truck industry, that's a lot of performance monitoring of the truck, but also performance monitoring of the driver. How is the driver performing? Is there a lot of idle time? What do route efficiencies look like? By connecting the vehicles, you get insights, as I said, into the truck and into the driver, and that's not machine learning. But that monitoring piece is really, really important. The first thing that we see is monitoring types of use cases. 
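That kind of no-ML monitoring is straightforward to sketch: compare each incoming reading against a control band and flag excursions. The signal names and limits below are hypothetical, purely for illustration:

```python
# Hypothetical control limits per signal; a real fleet would load these
# from configuration or a control plan rather than hard-coding them.
CONTROL_LIMITS = {
    "coolant_temp": (70.0, 110.0),   # degrees C
    "oil_pressure": (25.0, 65.0),    # psi
}

def check_reading(reading):
    """Return (signal, value) pairs that fall outside their control band."""
    alerts = []
    for signal, (lo, hi) in CONTROL_LIMITS.items():
        value = reading.get(signal)
        if value is not None and not (lo <= value <= hi):
            alerts.append((signal, value))
    return alerts

# An overheating reading trips one alert; the in-band oil pressure does not.
alerts = check_reading({"coolant_temp": 118.5, "oil_pressure": 40.0})
```

The same check can run at the edge on each truck, which is why monitoring is a sensible first step before any model is trained.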
Then you start to see companies move toward what I call the machine learning and AI models, where you're using inference on the edge, and then you start to see things like predictive maintenance and real-time route optimization. You see that evolution toward smarter, more intelligent, dynamic types of decision-making. But let's not minimize the value of good old-fashioned monitoring; that starts to give you that kind of visibility first, then you move to smarter use cases as you go forward. >> You know, it's interesting. When you talked about the monitoring, I'm envisioning the bumper sticker, "How am I driving?" Somebody probably only calls when they get cut off. And many people might think, oh, it's about big brother, but it's not. I mean, okay, fine, but it's really about improvement, training, and continuous improvement. And then of course the route optimization; that's bottom-line business value. So I love those examples. I wonder, what are some of the big hurdles people should think about when they want to jump into those use cases you just talked about? What are they going to run into, the blind spots they're going to get hit with? >> There are a few different things. First of all, a lot of times your IT folks aren't familiar with the more operational, IoT types of data, so just connecting to that type of data can be a new skill set; that's very specialized hardware in the car, and things like that. 
And the protocols; that's number one, the classic IT/OT conundrum that many of our customers struggle with. But more fundamentally, if you look at the way these types of connected-truck or IoT solutions started, oftentimes the first generation were very custom built, so they were brittle, kind of hardwired. And as you moved toward more commercial solutions, you had what I call the silos: fragmentation, with this capability from one vendor and that capability from another vendor; you get the idea. One of the things that we really think needs to be brought to the table is, first of all, an end-to-end data management platform that's integrated and all tested together, with data lineage across the entire stack. But then, also, to be realistic, we have to be able to integrate with industry best practices as well, in terms of solution components in the car, the hardware, and all those types of things. So, stepping back for a second, I think there has been fragmentation and complexity in the past; we're moving toward more standards and more standard types of offerings. Our job as a software maker is to make that easier and connect those dots, so customers don't have to do it all on their own. >> And you mentioned specialized hardware. One of the things we heard earlier in the main stage was your partnership with Nvidia. We're talking about new types of hardware coming in, and you guys are optimizing for that. We see the IT and the OT worlds blending together, no question. 
And then that end-to-end management piece. This is different from IT, right? Normally everything's controlled in the data center, and this is rethinking how you manage metadata. So in the spirit of what we talked about earlier today, are you working with other technology partners to accelerate these solutions and move them forward faster? >> Yeah, I'm really glad you're asking that, because we actually embarked on a project called Project Fusion, which was really about integrating across that connected vehicle life cycle with some core vendors that are providing very important capabilities. What we did is join forces with them to build an end-to-end demonstration and reference architecture to enable the complete data management life cycle. Cloudera's piece of this was ingesting the data and all the things I talked about: storing it and the machine learning; we provide that end to end. But we wanted to partner with some key partners, and the partners we integrated with were, first, NXP. NXP provides the service-oriented gateways in the car, so that's the hardware in the car. Wind River provides an in-car operating system; it's Linux that's hardened and tested. We then ran our Apache MiNiFi, which is part of Cloudera DataFlow, in the vehicle, right on that operating system, on that hardware. We pumped the data over into the cloud, where we did all the data analytics and machine learning and built out these very specialized models. And then, once we built those models, we used a company called Airbiquity; they specialize in automotive over-the-air updates, so they can take those models and push updates back to the vehicle very rapidly. 
So what we said is, look, there's an established ecosystem, if you will, of leaders in this space, and what we wanted to do was make sure that we were part and parcel of that ecosystem. And by the way, you mentioned Nvidia as well; we're working closely with Nvidia now, so when we're doing the machine learning, we can leverage some of their hardware to get further acceleration on the machine learning side of things. One of the things I always say about these types of use cases is that it takes a village, and what we've really tried to do is build out an ecosystem that provides that village, so that we can make that analytics and machine learning life cycle just as fast as it can be. >> This is again another great example of data-intensive workloads. It's not your grandfather's ERP running on traditional systems; these are really purpose-built, maybe customizable for certain edge use cases. They're low-cost, low-power; they can't be bloated. And you're right, it does take an ecosystem. You've got to have APIs that connect, and that takes a lot of work and a lot of thought. So that leads me to the technologies underpinning this. We've talked a lot on theCUBE about semiconductor technology, how that's changing, and the advancements we're seeing there. What do you see as some of the key technology areas that are advancing this connected vehicle machine learning? >> You know, it's interesting; I'm seeing it in a few places, just a few notable ones. First of all, we see that the vehicle itself is getting smarter. So when we look at that NXP type of gateway we talked about, that used to be kind of a dumb gateway. 
All it was really doing was pushing data up and down and providing isolation as a gateway from the lower-level subsystems, so it was really security and just basic communication. That gateway is now becoming what they call a service-oriented gateway, so it can run real workloads: it's got disks, it's got memory, it's got all this stuff. So now you can run serious compute in the car, things like machine learning inference models; you have a lot more power in the car. At the same time, 5G is making it so that you can push data fast enough that low-latency computing is available even on the cloud. So now you've got incredible compute both at the edge in the vehicle and on the cloud. And then on the cloud, you've got partners like Nvidia, who are accelerating it still further through better GPU-based compute. So across the whole stack, if you look at that machine learning life cycle we talked about, Dave, there are improvements in every step along the way; we're starting to see technology optimization pervasive throughout the cycle. >> And then, real quick; it's not a quick topic, but you mentioned security. We've seen a whole new security model emerge; there is no perimeter anymore in a use case like this, is there? >> No, there isn't. And remember, we're the data management platform, and the thing we have to provide is end-to-end lineage of where that data came from, who can see it, and how it changed. 
That's something we have integrated in from the moment that data is ingested, through when it's stored, through when it's processed and people are doing machine learning. We provide that lineage so that security and governance are assured throughout the data learning life cycle. >> And federated, in this example, across the fleet. So, all right, Michael, that's all the time we have right now. Thank you so much for that great information. Really appreciate it. >> Dave, thank you, and thanks to the audience for listening in today. >> Yes, thank you for watching. >> Okay, we're here in the second manufacturing drill-down session with Michael Gerber. He's the managing director for automotive and manufacturing solutions at Cloudera, and we're going to continue the discussion with a look at how to lower costs and drive quality in IoT analytics with better uptime. And look, when you do the math, it's really quite obvious: when a system is down, productivity is lost, and that hits revenue and the bottom line; improved quality drives better service levels and reduces lost opportunities. Michael, great to see you. Take it away. >> All right, thank you so much. So we're going to talk a little bit about connected manufacturing and how IoT around connected manufacturing can, as Dave said, improve quality outcomes and improve your plant uptime. First, a quick history lesson; I promise to be quick. We've all heard about Industry 4.0, the fourth industrial revolution, and that's really what we're here to talk about today. The first industrial revolution was real simple: steam power, which reduced backbreaking work. The second industrial revolution brought the massive assembly line. 
So think about Henry Ford and motorized conveyor belts: mass automation. In the third industrial revolution, things got interesting: you started to see automation, but that automation essentially meant programming a robot to do something, and it did the same thing over and over, irrespective of how your outside conditions changed. The fourth industrial revolution is a very different beast. Now we're connecting equipment and processes and getting feedback from them, and through machine learning we can make those processes adaptive. That's really what we're talking about in the fourth industrial revolution, and it is intrinsically connected to data and a data life cycle. And by the way, this is important not just for technology's sake; it's important because it drives very important business outcomes. First of all, quality: if you look at the cost of quality, even despite decades of manufacturers working to improve it, poor quality still accounts for 20% of sales. So for a fifth of what you manufacture, from a revenue perspective, you've got quality issues that are costing you a lot. And plant downtime costs companies $50 billion a year. So when we're talking about using data in these Industry 4.0, connected-data types of use cases, we're not doing it merely to implement technology; we're doing it to move those business drivers: improving quality and reducing downtime. So let's talk about what a connected manufacturing data life cycle looks like, because this is actually the business that Cloudera is in. We call this manufacturing edge-to-AI, this analytics life cycle, and it starts with your plants. 
Those plants are increasingly connected. As I said, sensor prices have come down two-thirds over the last decade, and those sensors are connected over the internet, so suddenly we can collect all this data from your manufacturing plants. What do we want to be able to do with it? We want to be able to collect it and analyze that data as it's coming across, in-stream, and take intelligent real-time actions. We might do some simple processing and filtering at the edge, but we really want to take real-time actions on that data. And this inference part of things, the ability to take these real-time actions, is actually the result of a machine learning life cycle; I want to walk you through it. It starts with ingesting this data and putting it into our enterprise data lake, and that data lake can be either within your data center or in the cloud. You're going to ingest that data, you're going to store it, and you're going to enrich it with enterprise data sources. So now you'll have, say, sensor data, and you'll have maintenance repair orders from your maintenance management systems. Now you're getting really nice data sets; you can start to ask which sensor values correlate to the need for machine maintenance, and you start to see the data sets becoming very compatible with machine learning. So you bring these data sets together, you process them, and you align your time-series data from your sensors to your time-stamped data from your enterprise systems, your maintenance management system, as I mentioned. Once you've done that, we can put a query layer on top. 
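The alignment step just described — matching each time-stamped maintenance record to the latest sensor snapshot taken before it — is an as-of join. A minimal sketch, with made-up timestamps and values:

```python
import bisect

# Hypothetical per-machine history: sensor snapshot times (sorted epoch
# seconds) with a vibration value, plus time-stamped maintenance work orders.
snapshot_times = [1000, 2000, 3000, 4000]
vibration = [0.10, 0.12, 0.41, 0.13]
work_orders = [(3100, "WO-17"), (4200, "WO-21")]

def as_of_join(orders, times, values):
    """Attach to each work order the latest snapshot at or before its time."""
    joined = []
    for order_time, order_id in orders:
        i = bisect.bisect_right(times, order_time) - 1  # rightmost time <= order_time
        if i >= 0:
            joined.append((order_id, times[i], values[i]))
    return joined

aligned = as_of_join(work_orders, snapshot_times, vibration)
# WO-17 lines up with the 0.41 vibration spike that preceded it.
```

At data-lake scale this join would be expressed in SQL or a dataframe engine rather than hand-rolled, but the matching logic is the same.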
So now we can start to do advanced analytics queries across all these different types of data sets. But as I mentioned, what's really important here is that once you've stored long histories of this data, you can build out those machine learning models I talked about earlier. Like I said, you can start to determine which sensor values correlated to the need for equipment maintenance in your maintenance management systems, and you can build out models that say: here are the sensor values, the conditions, that predict the need for maintenance. Once you understand that, you can deploy those models out to the edge, where they work in that inference mode we talked about: they continuously sniff the data as it comes in and ask, are we experiencing the conditions that predicted the need for maintenance? If so, let's take real-time action: let's schedule a maintenance work order, let's order the parts ahead of time, before that piece of equipment fails, which allows us to be very, very proactive. So this is one of the most popular use cases we're seeing in connected manufacturing, and we're working with many different manufacturers around the world. I want to highlight one that I think is really interesting: Faurecia, a supplier associated with the Peugeot Citroën line out of France. They are huge; this is a multinational automotive parts and systems supplier, and as you can see, they operate 300 sites in 35 countries. So, very global. They connected 2,000 machines, and once they were able to take data from them, they started off with learning how to ingest the data. 
They started off very sensibly, with manufacturing control towers, to be able to monitor the data coming in and monitor the process. That was the first step: 2,000 machines, 300 different variables, things like vibration, pressure, and temperature. So first, performance monitoring. Then they said, okay, let's start doing machine learning on some of these things, to build out equipment predictive maintenance models, and what they really focused on is computer vision quality inspection. So: take pictures of parts as they go through the process, classify whether each picture was associated with a good or bad quality outcome, and then teach the machine to make that decision on its own. So now the machine, the camera, is doing the inspections. They built those machine learning models; all this data was on-prem, but they pushed it up to the cloud to develop the machine learning models, then pushed the models back into the plants, where they could take real-time actions through these computer vision quality inspections. So, a great use case, and a great example of how you can start with monitoring and move to machine learning, while at the end of the day improving quality and improving equipment uptime, which is the goal of most manufacturers. With that being said, if you want to learn more, we've got a wealth of information on our website. You see the URL in front of you; please go there, and you'll find a lot more detail on the use cases we're seeing in manufacturing and a lot more about the customers we work with. If you need that information, please do find it. 
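As a toy illustration of that pass/fail classification idea (a real deployment like the one described would use a trained convolutional network on the raw images), here is a nearest-centroid classifier over two invented image summary features:

```python
# Toy stand-in for CNN-based visual inspection: summarize each part image as
# (mean brightness, count of defect-like pixels) and classify by distance to
# the centroid of labeled good vs. bad examples. All numbers are invented.
good_examples = [(0.82, 3), (0.80, 5), (0.85, 2)]
bad_examples = [(0.55, 40), (0.60, 35), (0.50, 52)]

def centroid(rows):
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))

GOOD_C, BAD_C = centroid(good_examples), centroid(bad_examples)

def classify(features):
    """Return 'pass' if the features sit closer to the good centroid."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return "pass" if dist(features, GOOD_C) <= dist(features, BAD_C) else "fail"

verdict = classify((0.58, 38))  # a dark image with many defect-like pixels
```

The "teach the machine" step in the transcript corresponds to fitting the decision boundary from labeled examples; here that fitting is just computing two centroids.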
With that, I'm going to turn it over to you, Dave; I think you had some questions you wanted to run by me. >> I do, Michael. Thank you very much for that. And before I get into the questions, I just wanted to make some observations. I was struck by what you were saying about the phases of industry. We talk about Industry 4.0, and my observation is that traditionally, machines have always replaced humans, but it's been around labor, and the difference with 4.0, and what you talked about with connected equipment, is that you're injecting machine intelligence. The camera inspection example, where the machines are then taking action; that's different, and it's a really new kind of paradigm. The second thing that struck me is the cost: 20% of sales, and plant downtime costing many tens of billions of dollars a year. That's huge; the business case for this is that I'm going to reduce my expected loss quite dramatically. And then I think the third point, which we heard in the morning sessions and on the main stage, is that the world is hybrid. Everybody's trying to figure out hybrid and get hybrid right, and it certainly applies here. This is a hybrid world: you've got to accommodate data regardless of where it is; you've got to be able to get to it, blend it, enrich it, and then act on it. So anyway, those are my big takeaways. So, first question: in thinking about implementing connected manufacturing initiatives, what are people going to run into? What are the big challenges they're going to hit? >> You know, there are a few, but I think one of the key ones is bridging what we'll call the IT and OT data divide. 
And what we mean by the IT side: your IT systems are your ERP systems, your MES systems; those are your transactional systems that run on relational databases, and your IT departments are brilliant at running them. The difficulty is that in implementing these use cases, you also have to deal with operational technology, and that's all the equipment in your manufacturing plant that runs on its own proprietary network with proprietary protocols. That information can be very, very difficult to get to, and it's much more unstructured than what comes from your IT systems. So the key challenge is being able to bring these data sets together in a single place where you can start to do advanced analytics and leverage that diverse data to do machine learning. If I boil it down, the single hardest thing in this type of environment, connected manufacturing, is that operational technology has run in its own world for a long time; the silos abound. But at the end of the day, this is incredibly valuable data that can now be tapped to move those metrics we talked about around quality and uptime. So, huge. >> Well, and again, this is a hybrid world, and you've got this world that's moving toward an equilibrium. You've got the OT side, pretty hardcore engineers, and we know IT. A lot of that data historically has been analog; now it's getting instrumented and captured. So you've got that cultural challenge, and you've got to blend those two worlds. That's critical. Okay. 
So, Michael, let's talk about some of the use cases. You touched on some, but let's peel the onion a bit. In this world of connected manufacturing and analytics, when you talk to customers, what are the most common use cases that you see? >>That's a great question, and you're right, I did allude to it a little earlier. I want people to think about a spectrum of use cases ranging from simple to complex, and you can get value even in the simple phases. The simplest use case really is monitoring: you monitor your equipment or your processes and make sure you're staying within the bounds of your control plan. This is much easier to do now, because there are more sensors, and those sensors are moving more and more toward internet types of technology. So you've got the opportunity to do monitoring with no machine learning at all. The next level up is something we call quality event forensic analysis. On this one, imagine I've got products under warranty in the field, and I'm starting to see warranty claims tick up. What I want to be able to do is the forensic analysis back to the root cause within the manufacturing process. This is about connecting the dots: what were the warranty issues, and what were the manufacturing conditions of the day that caused them? You can also identify which other products were impacted by those same conditions, and recall those proactively and selectively, rather than recalling, say, an entire model year's fleet of cars.
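The simple-monitoring level described here amounts to checking each reading against fixed control limits, with no machine learning involved. A minimal sketch, with hypothetical sensor names and limits:

```python
# Control-plan limits per sensor (illustrative values, not from any
# real control plan): (lower bound, upper bound)
CONTROL_LIMITS = {
    "spindle_temp_c": (20.0, 85.0),
    "vibration_mm_s": (0.0, 4.5),
}

def check_reading(sensor, value):
    """Return an alert dict if the reading violates its control limits,
    or None if the reading stays within the control plan."""
    lo, hi = CONTROL_LIMITS[sensor]
    if value < lo or value > hi:
        return {"sensor": sensor, "value": value, "limits": (lo, hi)}
    return None

readings = [("spindle_temp_c", 72.4), ("vibration_mm_s", 5.1)]
alerts = [a for a in (check_reading(s, v) for s, v in readings) if a]
```

Here only the vibration reading trips an alert; the temperature stays inside its bounds.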
Again, that's also not machine learning; we're simply connecting the dots from warranty claims in the field to the manufacturing conditions of the day so that you can take corrective action. But then you get into a whole slew of machine learning use cases, ranging from things like quality or yield optimization. You start to collect sensor values, along with manufacturing yield values from your ERP system, and you determine which sensor values or factors drove good or bad yield outcomes. You can identify the factors that matter most, then measure, monitor, and optimize those. That's how you optimize your yield. From there you go to the more traditional machine learning use cases around predictive maintenance. The key point here, Dave, is that depending on a customer's maturity around big data, you can start simply with monitoring and get a lot of value, then bring together more diverse data sets to do things like connect-the-dots analytics, all the way to the more advanced machine learning use cases. There's value to be had throughout. >>I remember the early days, when the IT industry really started to think about IT and OT. It reminds me of the old days of grass football fields, when the new player would come in wearing a perfectly white uniform: as an industry, IT had to get dirty and learn. So my question relates to other technology partners you might be working with, maybe newer in this space, that accelerate some of the solutions we've been talking about. >>Yeah, that's a great question, and it goes back to one of the things I alluded to earlier.
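As a toy stand-in for the yield-optimization idea, the sketch below ranks candidate factors by their correlation with batch yield. A real deployment would use learned feature importances from a trained model; the sensors and figures here are synthetic.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Batch yield (%) from the ERP system, alongside sensor values per batch
yield_pct = [92, 88, 95, 70, 85, 97]
sensors = {
    "furnace_temp": [700, 705, 698, 760, 710, 695],   # spikes with the bad batch
    "line_speed":   [1.0, 1.1, 1.0, 1.05, 1.1, 1.0],  # mostly unrelated
}

# Rank factors by absolute correlation with yield: the strongest driver first
ranked = sorted(sensors, key=lambda s: abs(pearson(sensors[s], yield_pct)),
                reverse=True)
```

With this data the furnace temperature ranks first; those top-ranked factors are the ones you would then measure, monitor, and optimize.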
We've had some great partners. One partner, for example, is Litmus Automation, whose whole world is the OT world. They've built adapters to connect to practically every industrial protocol, and they provide a single interface for that data into the Cloudera data platform. We're really good at ingesting IT data, so we can leverage a company like Litmus to open the floodgates of OT data, making it much easier to get that data into our platform. Suddenly you've got all the data you need to implement those Industry 4.0 analytics use cases. It really boils down to this: can I break down that IT/OT barrier we've always had, and bring together the data sets that let us really move the needle in terms of improving manufacturing performance? >>Okay, thank you for that. Last question, and speaking of moving the needle, I want to lead this discussion to the technology advances. I'd love to talk tech here. What are the key technology enablers and advances that are going to move connected manufacturing and machine learning forward in this space? >>In the manufacturing space, there are a few things. First of all, as we touched on, sensor prices have come down and sensors have become ubiquitous, so we can finally get to the OT data. That's number one. Number two, we now have the ability to store that data a whole lot more efficiently. We've got great capabilities to do that, and to put it in the cloud to run the machine learning types of workloads.
You've got things like GPUs, if you're doing computer vision, to make those machine learning models much more effective, and you've got 5G technology that starts to blur, at least from a latency perspective, where you do your compute, whether on the edge or in the cloud. For the truly business-critical stuff you probably don't want to rely on any type of network connection, but from a latency perspective you're starting to see the ability to do compute wherever it's most effective, and that's really important. And again, there are the machine learning capabilities themselves: the ability to build GPU-accelerated machine learning models and then deploy them via over-the-air updates to your equipment. All of those things are making the advanced analytics and machine learning data life cycle faster and better, and at the end of the day, Dave, to your point, that equipment and those processes are getting much smarter, much more quickly. >>We've got a lot of data, and we have much lower-cost processing platforms. I'll throw in NPUs as well; watch that space, neural processing units. Okay, Michael, we're going to leave it there. Thank you so much, really appreciate your time. >>Dave, I really appreciate it, and thanks to everybody who joined us. Thanks for joining today. >>Yes, thank you for watching. Keep it right there.
PUBLIC SECTOR V1 | CLOUDERA
>>Hi, this is Cindy Maike, vice president of industry solutions at Cloudera. Joining me today is Shev, our solution engineer for the public sector. Today we're going to talk about speed to insight: why the public sector is using machine learning, specifically around fraud, waste, and abuse. On today's agenda, we'll discuss machine learning and why the public sector uses it to target fraud, waste, and abuse; the challenges; how to enhance your data and analytical approaches; the data landscape; and analytical methods. Then Shev will go over a reference architecture and a case study. By definition, per the Government Accountability Office, fraud is an attempt to obtain something of value through unwelcome misrepresentation, waste is about squandering money or resources, and abuse is about behaving improperly or unreasonably to obtain something of value for your personal benefit. As we look at fraud across all industries, it's a top-of-mind area within the public sector. The types of fraud we see center on cybercrime, accounting fraud at both the individual and organizational level, financial statement fraud, and bribery and corruption. Fraud hits us from all angles, whether from external or internal perpetrators, and per research by PwC, over half of fraud comes through some combination of internal and external perpetrators. A recent report by the Association of Certified Fraud Examiners identified that within the US government in 2017, roughly $148 billion was attributable to fraud, waste, and abuse.
Specifically, about $57 billion was in reported monetary losses, with another $91 billion in areas where the monetary impact had not yet been measured. Breaking those areas down from a payment perspective: over $65 billion within the health system, over $51 billion within social services, plus procurement fraud, fraud, waste, and abuse in the grants and loan process, payroll fraud, and other areas. Quite a few different topical areas. So within those broad strokes, what are the actual use cases our agencies are pursuing, what is the data landscape, and what data and analytical methods can we use to help curtail and prevent some of this fraud, waste, and abuse? As we look at the analytical use cases in the public sector, from taxation to social services to public safety and other agency functions, we're going to focus specifically on some of the use cases around fraud within the tax area. We'll briefly look at aspects of unemployment insurance fraud, benefit fraud, and payment integrity. Fraud has its underpinnings across different government agencies, with different analytical methods and usage of different data. One of the key elements is that you can look at your data landscape for the specific data sources you need, but it's really about bringing together different data sources across different varieties and different velocities. Data has different dimensions.
So we'll look at structured data, semi-structured data, and behavioral data. With predictive models, we're typically looking at historical information, but if we're trying to prevent fraud before it happens, or while a case is in flight, which is a use case Shev will talk about later, how do we look at more of that real-time, streaming information? How do we take advantage of data such as financial transactions, asset verification, tax records, and corporate filings? We can also look at more advanced data sources from investigations, perhaps applying deep learning models to semi-structured or unstructured behavioral data such as camera analysis. So there's quite a variety of data, and the breadth of the opportunity really comes when you can integrate and analyze data across all of these sources, giving you a more extensive data landscape. I specifically want to focus on some of the methods, data sources, and analytical techniques we're seeing used in government agencies, as well as opportunities to apply new methods. When we look at audit planning, or at scoring the likelihood of non-compliance, we'll typically see data sources such as a constituent's profile and the forms they've provided. We might compare that data with, or leverage, internal data sources, possibly looking at net worth, comparing it against other financial data, and also comparing across other constituent groups.
Some of the techniques we use are basic natural language processing, perhaps text mining, and probabilistic modeling, where we compare information within the agency against, say, tax forms. Historically this has been done in batch, on both structured and semi-structured information, and the data volumes could be low, but we're seeing those volumes increase exponentially based on the types of events and the number of transactions we're dealing with. So getting the throughput matters, and Shev is going to talk about that in a moment. The other areas of opportunity build on this: how do I actually handle compliance, conduct audits, detect potential fraud, and find under-reported tax information? There you might pull in other types of data sources, whether property records, data supplied by the constituents themselves or by vendors, social media information, geographical information, or photos. Techniques we see used include sentiment analysis and link analysis, blending those data sources together with natural language processing. What's also important here is the data velocity, whether batch or near real time, again across structured, semi-structured, and unstructured data. The key value behind all of this: how do we increase the recovery of under-reported revenue, and how do we stop fraudulent payments before they actually occur?
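One classic screen in this batch-analysis family, offered purely as a generic illustration and not as any particular agency's method, is a Benford's-law check on the leading digits of reported amounts: fabricated or threshold-gaming figures often stray from the expected digit distribution.

```python
import math
from collections import Counter

def benford_deviation(amounts):
    """Mean absolute deviation between observed and Benford-expected
    first-digit frequencies (0 means a perfect Benford fit)."""
    digits = [int(str(abs(a))[0]) for a in amounts if a]
    counts = Counter(digits)
    n = len(digits)
    return sum(
        abs(counts.get(d, 0) / n - math.log10(1 + 1 / d))
        for d in range(1, 10)
    ) / 9

# Invoice amounts with a roughly Benford-like spread of leading digits
organic = [120, 13, 180, 1100, 25, 31, 14, 2200, 160, 19, 45, 110]
# Suspicious batch clustered just under a hypothetical $5,000 approval limit
suspect = [4900, 4850, 4990, 4875, 4925, 4960, 4880, 4910, 4950, 4890]
```

The suspect batch, with every amount leading with a 4, deviates far more from Benford's distribution than the organic batch; in practice this is a triage signal for auditors, not proof of fraud.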
We also look at increasing the level of compliance, and at the potential for prosecution of fraud cases. Additionally, other areas of opportunity include economic planning: performing link analysis and bringing in more of what we saw in the data landscape, such as constituent interactions, social media, police records, property records, and other tax department database information. It also includes comparing one individual to other individuals, looking at people like a specific constituent, to see whether other aspects of fraud are potentially occurring. And as we move forward, some of the more advanced techniques we're seeing around deep learning include computer vision, geospatial information, social network entity analysis, and agent-based modeling, where we take the simulation and Monte Carlo techniques we typically see in financial services and apply them to fraud, waste, and abuse within the public sector. Again, that opens up new opportunities. With that, I'm going to turn it over to Shev to talk about the reference architecture for these use cases. >>Thanks, Cindy. I'm going to walk you through an example reference architecture for fraud detection using Cloudera's underlying technology. Before I get into the technical details, I want to talk about how this would be implemented at a much higher level. With fraud detection, what we're trying to do is identify anomalies, or novel behavior, within our data sets.
Now, in order to understand which aspects of our incoming data represent anomalous behavior, we first need to understand what normal behavior is. In essence, once we understand normal behavior, anything that deviates from it can be thought of as an anomaly. To understand normal behavior, we need to be able to collect, store, and process a very large amount of historical data, and that's where Cloudera's platform and this reference architecture come in. Let's start on the left-hand side of the reference architecture with the collect phase. Fraud detection always begins with data collection. We need to collect large amounts of information from systems that could be in the cloud, in the data center, or even on edge devices, and this data is collected so we can create our normal-behavior profiles. Those profiles are then used, in turn, to create our predictive models for fraudulent activity. On the data collection side, one of the main challenges many organizations face in this phase is finding a single technology that can handle data coming in with all different formats, protocols, and standards, and with different varieties and velocities. Let me give you an example: we could be collecting data from a database that gets updated daily, and maybe that data arrives in Avro format. At the same time, we could be collecting data from an edge device that streams in every second, and that data may arrive in JSON or a binary format. This is a data collection challenge that can be solved with Cloudera DataFlow, a suite of technologies built on Apache NiFi and MiNiFi, which lets us ingest all of this data through a drag-and-drop interface. So now we're collecting all of the data that's required to map out normal behavior.
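The "learn normal, flag deviations" idea can be reduced to a tiny sketch: fit a per-feature baseline (mean and standard deviation) on historical data, then flag new values that sit far outside it. Real pipelines use much richer models over many features; the figures here are synthetic.

```python
import statistics

def fit_baseline(history):
    """Learn 'normal' from historical values: (mean, standard deviation)."""
    return statistics.fmean(history), statistics.pstdev(history)

def is_anomalous(value, baseline, threshold=4.0):
    """Flag a value that deviates more than `threshold` sigmas from normal."""
    mu, sd = baseline
    return abs(value - mu) > threshold * sd

# Historical transaction amounts for one account (synthetic)
historical_amounts = [42.0, 39.5, 45.0, 41.2, 44.1, 40.3, 43.7, 38.9]
baseline = fit_baseline(historical_amounts)

ok = is_anomalous(46.0, baseline)    # close to the learned normal
odd = is_anomalous(900.0, baseline)  # deviates sharply from normal
```

Anything within a few sigmas of the baseline passes quietly; the outlier is what gets surfaced for review.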
The next thing we need to do is enrich it, transform it, and distribute it to downstream systems for further processing. Let's walk through how that works, starting with enrichment: think of it as adding additional information to your incoming data. Take financial transactions, for example, since Cindy mentioned them earlier. You can store the known locations of an individual in an operational database; with Cloudera, that would be HBase. As an individual makes a new transaction, the geolocation in that transaction data can be enriched with the previously known locations of that same individual, and all of that enriched data can later be used downstream for predictive analysis. So the data has been enriched; now it needs to be transformed. We want the data coming in as Avro, JSON, binary, and whatever other formats to be transformed into a single common format so it can be used downstream for stream processing. Again, this is done through Cloudera DataFlow, which is backed by NiFi. The transformed, standardized data is then distributed to Kafka, which serves as a central repository of syndicated services, a buffer zone. Kafka provides extremely fast, resilient, and fault-tolerant storage, and it also gives you the consumer APIs you need to enable a wide variety of applications to leverage the enriched and transformed data in your buffer zone. You can also store that data in a distributed file system, giving you the historical context you'll need later for machine learning. The next step in the architecture is to leverage Cloudera SQL Stream Builder, which enables us to write streaming SQL jobs on top of Apache Flink.
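A sketch of the enrichment step just described, with a plain dict standing in for the operational store of known locations: each new transaction is tagged with its distance from the account's nearest previously seen location. Account ids and coordinates are invented for the example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 6371 * 2 * math.asin(math.sqrt(a))

# Stand-in for the operational store: previously seen locations per account
known_locations = {"acct-001": [(38.90, -77.04), (39.29, -76.61)]}  # DC, Baltimore

def enrich(txn):
    """Tag a transaction with the distance to its nearest known location."""
    dists = [haversine_km(txn["lat"], txn["lon"], la, lo)
             for la, lo in known_locations.get(txn["account"], [])]
    txn["km_from_nearest_known"] = min(dists) if dists else None
    return txn

# A transaction appearing in London for an account seen only on the US east coast
txn = enrich({"account": "acct-001", "lat": 51.51, "lon": -0.13})
```

The enriched field becomes one more feature the downstream models can use: a transaction thousands of kilometers from every known location is worth a closer look.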
That lets us filter, analyze, and understand the data in the Kafka buffer in real time. I'll also add that if you have time-series data, or if you need OLAP-style cubing, you can leverage Kudu, while exploratory data analysis and visualization can be enabled through Cloudera's visualization technology. So we've filtered, analyzed, and explored our incoming data, and we can now proceed to train the machine learning models that will detect anomalous behavior in our historically collected data set. To do this, we can use a combination of supervised, unsupervised, and even deep learning techniques with neural networks. These models can be tested on new incoming streaming data, and once we've obtained the accuracy and performance scores we want, we can take these models and deploy them into production. Once the models are operationalized, they can be leveraged within our streaming pipeline: as new data is ingested in real time, NiFi can query the models to detect whether the activity is anomalous or fraudulent, and if it is, alert downstream users and systems. This, in essence, is how fraudulent activity detection works. This entire pipeline is powered by Cloudera technology, and the IRS is one of the Cloudera customers leveraging our platform today and implementing a very similar architecture to detect fraud, waste, and abuse across a very large set of historical data. One of the neat things with the IRS is that they've recently leveraged the partnership between Cloudera and Nvidia to accelerate their Spark-based analytics and machine learning, and the results have been nothing short of amazing.
In fact, we have a quote here from Joe Ansaldi, the technical branch chief for the research, analytics, and statistics division within the IRS: "With zero changes to our fraud detection workflow, we were able to obtain eight times the performance simply by adding GPUs to our mainstream big data servers. This improvement translates to half the cost of ownership for the same workloads." So embedding GPUs into the reference architecture I covered earlier has enabled the IRS to improve their time to insight by as much as 8x while simultaneously reducing their underlying infrastructure costs by half. Cindy, back to you. >>Shev, thank you. I hope you found the analysis and information Shev and I have provided useful for insight into how Cloudera is helping with the fraud, waste, and abuse challenges within the public sector: looking at any and all types of data, and at how the Cloudera platform brings together and analyzes information, whether structured, semi-structured, or unstructured, in batch or in real time, performing anomaly detection, neural network analysis, and time-series analysis. As next steps, we'd love to have additional conversations with you. You can find more information on how Cloudera is working in the federal government at cloudera.com/solutions/public-sector, and we welcome scheduling a meeting with you. Again, thank you for joining Shev and me today. We greatly appreciate your time and look forward to future progress. >>Good day, everyone. Thank you for joining me. I'm Cindy Maike, joined by Rick Taylor of Cloudera. We're here to talk about predictive maintenance for the public sector and how to increase asset service reliability. On today's agenda,
we'll talk specifically about how to optimize your equipment maintenance and how to reduce costs and asset failure with data and analytics. We'll go into a little more depth on what types of data and which analytical methods we typically see used, and Rick will go over a case study as well as a reference architecture. By basic definition, predictive maintenance is about determining when an asset should be maintained and what specific maintenance activities need to be performed, based on the asset's actual condition or state. It's also about predicting and preventing failures, and performing maintenance on your time and your schedule, to avoid costly unplanned downtime. McKinsey has analyzed predictive maintenance costs across multiple industries and identified the opportunity to reduce overall maintenance costs by roughly 50% with different types of analytical methods. So let's look at the three types of models. First, we've got our traditional method, corrective maintenance: performing maintenance on an asset after the equipment fails. The challenge with that is we end up with unplanned downtime, disruptions in our schedules, and reduced quality in the performance of the asset. Then there's preventive maintenance, which is performing maintenance on a set schedule. The challenge there is that we're typically doing it regardless of the actual condition of the asset, which has resulted in unnecessary downtime and expense. We're now really focused on condition-based maintenance, which leverages predictive maintenance techniques based on actual conditions and real-time events and processes.
Within that, we've seen organizations, again per McKinsey, achieve a 50% reduction in downtime as well as an overall 40% reduction in maintenance costs. Again, this looks across multiple industries, but let's consider it in the context of the public sector, based on some work by the Department of Energy several years ago. They really looked at what predictive maintenance means to the public sector: the benefit of increasing the return on investment of assets, and of reducing downtime as well as overall maintenance costs. Corrective, or reactive, maintenance is performed once there's been a failure; the movement toward preventive maintenance put it on a set schedule; and predictive maintenance monitors real-time conditions. Most importantly, we're now leveraging IoT and data and analytics to further reduce those overall downtimes, and there's a research report by the Department of Energy that goes into more specifics on the opportunity within the public sector. So, Rick, let's talk a little bit about some of the challenges regarding data and predictive maintenance. >>Some of the challenges include data silos. Historically, our government organizations, and organizations in the commercial space as well, have had multiple data silos that have spun up over time across multiple business units, so there's no single view of assets, and oftentimes there's redundant information stored in these silos. Couple that with huge increases in data volume, with data growing exponentially, along with new types of data we can now ingest: social media, semi-structured and unstructured data sources, and the real-time data we can collect from the internet of things.
And so the challenge is to collect all these assets together and begin to extract intelligence and insights from them, which in turn fuels machine learning and what we call artificial intelligence, which enables predictive maintenance. Next slide. >> Let's look specifically at the types of use cases. Rick and I are going to focus on use cases where we see predictive maintenance coming into procurement, facilities, supply chain, operations, and logistics. We've got various levels of maturity: from predictive maintenance itself, to using information from a connected asset or vehicle for monitoring, to applying predictive maintenance to data from connected warehouses, facilities, and buildings. All of these bring an opportunity to increase the quality and effectiveness of agency missions, while also addressing cost efficiency, risk, and safety. As for the types of data, the new kinds of information Rick mentioned, some of the data elements we've typically seen start with failure history. >> So, when has an asset, a machine, or a component within a machine failed in the past? We also bring together maintenance history for a specific machine: are we getting error codes off of a machine or asset, and when have we replaced certain components? And we look at how we're actually using the assets: what were the operating conditions, as captured by sensor data from that asset?
We also look at the features of an asset (its engine size, its make and model, where the asset is located) and at who has operated the asset: their certifications, their experience, and how they're using it. Then we bring in some of the pattern analysis we've seen: what are the operating limits? Are we getting service reliability data, or product recall information from the actual manufacturer? So, Rick, I know the data landscape has really changed. Let's go over some of those components. >> Sure. This slide depicts some of the inputs that inform a predictive maintenance program. As we've talked about, there are silos of information: the ERP system of record, perhaps the spares and the service history. What we want to do is combine that information with sensor data, whether from facility and equipment sensors or temperature and humidity readings, for example. All of this is combined together and then used to develop machine learning models that better inform predictive maintenance, because we do need to take into account the environmental factors that may cause additional wear and tear on the asset we're monitoring. Here are some examples of private-sector maintenance use cases that also have broad applicability across government. For example, one of the busiest airports in Europe is running Cloudera on Azure to capture, secure, and correlate sensor data collected from equipment within the airport, the people-moving equipment more specifically: the escalators, the elevators, and the baggage carousels. >> The objective here is to prevent breakdowns and improve airport efficiency and passenger safety. Another example is a container shipping port.
In this case, IoT data and machine learning help customers recognize how their cargo-handling equipment performs in different weather conditions, understand how usage relates to failure rates, and detect anomalies in transport systems. Another example is Navistar, a leading manufacturer of commercial trucks, buses, and military vehicles. Typically, vehicle maintenance, as Cindy mentioned, is based on miles traveled or on a schedule, that is, time since the last service. But these are only two of the thousands of data points that can signal the need for maintenance. And as it turns out, unscheduled maintenance and vehicle breakdowns account for a large share of the total cost for a vehicle owner. So to help fleet owners move from a reactive approach to a more predictive model, Navistar built an IoT-enabled remote diagnostics platform called OnCommand. >> The platform brings in over 70 sensor data feeds from more than 375,000 connected vehicles, including engine performance, truck speed, acceleration, coolant temperature, and brake wear. This data is then correlated with other Navistar and third-party data sources, including weather, geolocation, vehicle usage, traffic, warranty, and parts inventory information. The platform then uses machine learning and advanced analytics to automatically detect problems early and predict maintenance requirements. So how does a fleet operator use this information? They can monitor truck health and performance from smartphones or tablets and prioritize needed repairs. They can also identify the nearest service location that has the relevant parts, the trained technicians, and available service space. So, wrapping up the benefits: Navistar has helped fleet owners reduce maintenance costs by more than 30%. The same platform is also used to help school buses run safely.
And on time. For example, one school district with 110 buses that travel over a million miles annually reduced the number of PTOs needed year over year, thanks to predictive insights delivered by this platform. >> I'd like to take a moment and walk through the data lifecycle depicted in this diagram. Data ingested from the edge may include feeds from the factory floor or from connected vehicles, whether they're trucks, aircraft, heavy equipment, cargo vessels, et cetera. Next, the data lands on a secure and governed data platform, where it's combined with data from existing systems of record to provide additional insights. This platform supports multiple analytic functions working together on the same data while maintaining strict security, governance, and control measures. Once processed, the data is used to train machine learning models, which are then deployed into production, monitored, and retrained as needed to maintain accuracy. The processed data is also typically placed in a data warehouse and used to support business intelligence, analytics, and dashboards. In fact, this data lifecycle is representative of one of our government customers doing condition-based maintenance across a variety of aircraft. >> The benefits they've discovered include less unscheduled maintenance, a reduction in mean man-hours to repair, increased maintenance efficiency, improved aircraft availability, and the ability to avoid cascading component failures, which typically cost more in repair costs and downtime. They're also able to better forecast requirements for replacement parts and consumables. Last, and very importantly, this leads to enhanced safety. This chart overlays the secure open source Cloudera platform used in support of the data lifecycle we've been discussing: Cloudera DataFlow provides the data ingest, data movement, and real-time streaming data query capabilities.
DataFlow gives us the capability to bring data in from the asset of interest, from the Internet of Things, while the data platform provides a secure, governed data lake and visibility across the full machine learning lifecycle, eliminating silos and streamlining workflows across teams. The platform includes an integrated suite of secure analytic applications. Two that we're specifically calling out here are Cloudera Machine Learning, which supports the collaborative data science and machine learning environment that facilitates machine learning and AI, and Cloudera Data Warehouse, which supports the analytics and business intelligence, including those dashboards for leadership. Cindy, over to you. >> Thank you, Rick. I hope we've provided you some insights on how predictive, condition-based maintenance is being used, and can be used, within your respective agency: bringing together data sources that may be challenging for you today, bringing in more real-time information from a streaming perspective, and blending that industrial IoT data with historical information to help optimize maintenance and reduce costs within each of your agencies. To learn a little more about Cloudera and what we're doing for predictive maintenance, please visit cloudera.com/solutions/public-sector. We look forward to scheduling a meeting with you, and with that, we appreciate your time today. Thank you very much.
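As an illustration of the data elements described in this session (failure history, maintenance history, and sensor readings), the sketch below joins them into a per-asset feature table of the kind a predictive model would be trained on. This is a simplified, hypothetical example; the field names and sample values are invented and are not taken from any Cloudera product.

```python
from collections import defaultdict

def build_feature_table(sensor_readings, repair_orders):
    """Join raw sensor data with maintenance history into per-asset features.

    sensor_readings: list of (asset_id, temperature_c, vibration_mm_s)
    repair_orders:   list of (asset_id, was_failure) from the maintenance system
    """
    agg = defaultdict(lambda: {"n": 0, "temp_sum": 0.0, "vib_max": 0.0})
    for asset_id, temp, vib in sensor_readings:
        a = agg[asset_id]
        a["n"] += 1
        a["temp_sum"] += temp
        a["vib_max"] = max(a["vib_max"], vib)

    # Count past failures per asset from the repair-order history.
    failures = defaultdict(int)
    for asset_id, was_failure in repair_orders:
        failures[asset_id] += 1 if was_failure else 0

    table = {}
    for asset_id, a in agg.items():
        table[asset_id] = {
            "mean_temp": a["temp_sum"] / a["n"],
            "max_vibration": a["vib_max"],
            "past_failures": failures[asset_id],
        }
    return table

readings = [("pump-1", 71.0, 2.1), ("pump-1", 75.0, 4.8), ("pump-2", 60.0, 1.0)]
orders = [("pump-1", True), ("pump-2", False)]
features = build_feature_table(readings, orders)
print(features["pump-1"])  # {'mean_temp': 73.0, 'max_vibration': 4.8, 'past_failures': 1}
```

From a table like this, the next step would be fitting a classifier that predicts failure risk from mean temperature, peak vibration, and failure count.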
Manufacturing - Drive Transportation Efficiency and Sustainability with Big | Cloudera
>> Welcome to our industry drill-down. This is for manufacturing. I'm here with Michael Ger, who is the managing director for automotive and manufacturing solutions at Cloudera. In this first session, we're going to discuss how to drive transportation efficiencies and improve sustainability with data. Connected trucks are fundamental to optimizing fleet performance and costs, and to delivering new services to fleet operators. Michael's going to present some data and information, and then we're going to come back and have a little conversation about what we just heard. Michael, great to see you! Over to you. >> Thank you, Dave, and I appreciate having this conversation today. Connected trucks are an area where we have seen a lot of action here at Cloudera, and I think the reason is important: this change is happening very quickly. 150% growth is forecast by 2022, and the reason we're seeing so much growth is that there are a lot of benefits. We're talking about a B2B situation here: truck makers providing benefits to fleet operators. If you look at the top benefits that fleet operators expect, you see them in the graph over here. Almost 80% of them expect improved productivity, things like improved routing (route efficiencies), improved customer service, decreased fuel consumption, and better technology. This isn't technology for technology's sake; connected trucks are coming onto the marketplace because they can provide tremendous value to the business, and in this case we're talking about fleet operators and fleet efficiencies.
So one of the things that's really important: trucks are becoming connected because, at the end of the day, we want to provide fleet efficiencies through connected-truck analytics and machine learning. Let me explain a little of what we mean by that, because the way this happens is by creating a connected-vehicle analytics and machine-learning lifecycle, and to do that you need a few different things. You start off, of course, with connected trucks in the field, and you could have many of these trucks, because typically you're dealing at both a truck level and a fleet level; you want to be able to do analytics and machine learning to improve performance. The first thing you need is to connect to those trucks: you have to have an intelligent edge where you can collect information from them. And once you collect this information from the trucks, you want to be able to analyze that data in real time and take real-time actions. Now, the ability to take this real-time action is actually the result of your machine-learning lifecycle. Let me explain what I mean by that. We have these trucks, and we start to collect data from them. At the end of the day, we'd like to pull that data into either your data center or the cloud, where we can do more advanced analytics. We start by ingesting that data into the cloud, into that enterprise data lake. We store the data, and we want to enrich it with other data sources. For example, if you're doing truck predictive maintenance, you want to take the sensor data collected from those trucks and augment it with, say, your dealership service information.
Now you have sensor data and the resulting repair orders, so you're equipped to do things like predict when maintenance will be needed; you've got all the data sets you need. So what do you do? Like I said, you're ingesting and storing the data and enriching it with other data. You're processing that data: aligning, say, the sensor data to the transactional data from your repair and maintenance systems, and bringing it together so that you can do two things. First, you can do self-service BI on that data, things like fleet analytics. More importantly, you now have the data sets to create machine learning models. If you have the sensor values and the need, for example, for a dealership repair, you can start to correlate which sensor values predicted the need for maintenance, and you can build out those machine learning models. Then, as I mentioned, you can push those models back out to the edge, which is how you take those real-time actions I mentioned earlier: as data comes through in real time, you run it against that model and can act on it immediately. This analytics and machine-learning lifecycle is exactly what Cloudera enables: the end-to-end ability to ingest data, store it, put a query layer over it, create machine learning models, and then run those models in real time. That's what we do as a business. Now, one such customer, and I just want to give you one example of a customer we have worked with to provide these types of results, is Navistar.
Navistar was an early adopter of connected-truck analytics, and they provided these capabilities to their fleet operators. They started by connecting 475,000 trucks, now up to well over a million. The point here is that they were centralizing data from their trucks' telematics service providers and bringing in things like weather data and other sources. What they started to do was build out machine learning models aimed at predictive maintenance, and what's really interesting is that Navistar made tremendous strides in reducing the expense associated with maintenance. Rather than waiting for a truck to break and then fixing it, they would predict when the truck needed service (condition-based monitoring) and service it before it broke down, in a much more cost-effective manner. And you can see the benefits: they reduced maintenance costs by 3 cents a mile, from the industry average of 15 cents a mile down to 12 cents a mile. This was a tremendous success for Navistar, and we're seeing this across many truck manufacturers. We're working with many of the truck OEMs, and they are all working to achieve very similar benefits for their customers. So that's a little bit about Navistar. Now we're going to turn to Q&A; Dave's got some questions for me in a second. But before we do that, if you want to learn more about how we work with connected vehicles and autonomous vehicles, please go to our website; you see the URL up on the screen: cloudera.com/solutions/manufacturing. There you'll find a whole slew of collateral and information, in much more detail, on how we connect trucks and provide fleet operators with analytics.
Use cases that drive dramatically improved performance. So with that being said, I'm going to turn it over to Dave for questions. >> Thank you, Michael. That's a great example: I love the lifecycle, and you can visualize it very well. You've got an edge use case, you're doing real-time inference at the edge, and then you're blending that sensor data with other data sources to enrich your models, and you can push that back to the edge. That's the lifecycle. So I really appreciate that info. Let me ask you: when you think about analytics and machine learning, what are the most common connected-vehicle use cases you see customers really leaning into? >> That's a great question, Dave, because everybody always thinks machine learning is the first thing you go to. Actually, it's not. The first thing many of our customers do is simply connect their trucks, or vehicles, or whatever the IoT asset is, and then do very simple things like performance monitoring of the piece of equipment. In the truck industry, there's a lot of performance monitoring of the truck, but also of the driver. How is the driver performing? Is there a lot of idle time? What does route efficiency look like? By connecting the vehicles, you get insights into the truck and into the driver, and that's not even machine learning, but that monitoring piece is really important. So the first thing we see is monitoring use cases. Then you start to see companies move towards what I call the machine learning and AI models, where you're using inference on the edge.
Then you start to see things like predictive maintenance and near-real-time route optimization, and you see that evolution towards smarter, more intelligent, dynamic decision-making. But let's not minimize the value of good old-fashioned monitoring; that gives you visibility first, and then you move to smarter use cases as you go forward. >> You know, it's interesting. When you talked about monitoring, I'm envisioning the bumper sticker, "How am I driving?" The only time somebody ever calls is when they get cut off. People might think, "Oh, it's about Big Brother," but it's not; it's really about training and continuous improvement, and then of course the route optimization. That's bottom-line business value, so I love those examples. >> Great! >> I wonder, what are the big hurdles people should think about when they want to jump into those use cases you just talked about? What are they going to run into, the blind spots they're going to get hit with? >> There are a few different things. First of all, a lot of times your IT folks aren't familiar with the more operational, IoT types of data, so just connecting to that data can require a new skill set. There's very specialized hardware in the car, and specialized protocols. That's number one: the classic IT/OT conundrum that many of our customers struggle with. More fundamentally, if you look at the way these connected-truck or IoT solutions started, the first generation were often custom-built, so they were brittle, kind of hardwired.
Then, as you moved towards more commercial solutions, you had what I call the silo problem: fragmentation, with this capability from one vendor and that capability from another. You get the idea. One of the things we really think needs to be brought to the table is, first of all, an end-to-end data management platform: integrated, tested together, with data lineage across the entire stack. But also, importantly, to be realistic, we have to integrate with industry best practices for the solution components in the car, the hardware and all those types of things. So, stepping back for a second, there has been fragmentation and complexity in the past; we're moving towards more standards and more standard types of offerings. Our job as a software maker is to make that easier and connect those dots, so customers don't have to do it all on their own. >> And you mentioned specialized hardware. One of the things we heard earlier on the main stage was your partnership with Nvidia: we're talking about new types of hardware coming in, and you're optimizing for that. We see the IT and OT worlds blending together, no question. And then there's that end-to-end management piece. This is different from IT, where normally everything's controlled and you're in the data center; this is rethinking how you manage metadata. So, in the spirit of what we talked about earlier today: are you working with other technology partners to accelerate these solutions and move them forward faster?
>> Yeah, I'm really glad you're asking that, Dave, because we actually embarked on a project called Project Fusion, which was about integrating across the connected-vehicle lifecycle with some core vendors providing very important capabilities. We joined forces with them to build an end-to-end demonstration and reference architecture enabling the complete data management lifecycle. Cloudera's piece was ingesting the data, plus all the things I talked about in storage and machine learning; we provide that end to end. But we wanted to partner with key vendors, and the partners we integrated with were NXP, which provides the service-oriented gateways in the car (that's the hardware in the car), and Wind River, which provides an in-car operating system: Linux that's hardened and tested. We then ran our Apache MiNiFi, which is part of Cloudera DataFlow, in the vehicle, right on that operating system, on that hardware. We pumped the data over into the cloud, where we did all the data analytics and machine learning and built out these very specialized models. Then, once we built those models, we used a company called Airbiquity, which specializes in automotive over-the-air updates, to push those models back to the vehicle very rapidly. So what we said is: look, there's an established ecosystem, if you will, of leaders in this space, and we wanted to make sure that Cloudera was part and parcel of that ecosystem. And by the way, you mentioned Nvidia as well. We're working closely with Nvidia now, so when we're doing the machine learning, we can leverage their hardware to get still further acceleration on the machine learning side of things.
One of the things I always say about these types of use cases is that it takes a village, and what we've really tried to do is build out an ecosystem that provides that village, so we can speed up that analytics and machine-learning lifecycle as much as possible. >> This is, again, another great example of data-intensive workloads. It's not your grandfather's ERP running on traditional systems; these are really purpose-built, maybe customizable for certain edge use cases. They're low-cost and low-power; they can't be bloated. And you're right, it does take an ecosystem. You've got to have APIs that connect, and that takes a lot of work and a lot of thought. So that leads me to the technologies underpinning this. We've talked a lot on theCUBE about semiconductor technology, how that's changing, and the advancements we're seeing there. What do you see as some of the key technology areas advancing this connected-vehicle machine learning? >> You know, it's interesting; I'm seeing it in a few notable places. First of all, the vehicle itself is getting smarter. Look at that NXP type of gateway we talked about: that used to be a dumb gateway, really just pushing data up and down and providing isolation from the lower-level subsystems; it was really security and basic communication. That gateway is now becoming what they call a service-oriented gateway. It's got disk, it's got memory, so now you can run serious compute in the car, things like machine-learning inference models. You have a lot more power in the car.
At the same time, 5G is making it possible to push data fast enough that low-latency computing is available even via the cloud. So now you've got incredible compute both at the edge, in the vehicle, and in the cloud. And on the cloud side, you've got partners like Nvidia accelerating it still further through GPU-based computing. So if you look at that machine-learning lifecycle we talked about, Dave, there are improvements at every step along the way; we're starting to see technology optimization pervasive throughout the cycle. >> And then, real quick (it's not a quick topic), you mentioned security. We've seen a whole new security model emerge; there is no perimeter anymore in a use case like this, is there? >> No, there isn't. And remember, we're the data management platform, and one thing we have to provide is end-to-end lineage of where that data came from, who can see it, and how it changed. That's something we have integrated in from the beginning: from when the data is ingested, through when it's stored, through when it's processed and people are doing machine learning, we provide that lineage so that security and governance are assured throughout the data lifecycle. >> And federated, in this example, across the fleet. All right, Michael, that's all the time we have right now. Thank you so much for that great information. Really appreciate it. >> Dave, thank you, and thanks to the audience for listening in today. >> Yes, thank you for watching. Keep it right there.
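The lifecycle Michael describes (train in the cloud, push the model over the air, score sensor data in real time at the edge) can be sketched in miniature. A hand-rolled threshold "model" stands in for a real machine learning model, and plain function calls stand in for the MiNiFi/DataFlow transport; all names here are illustrative assumptions.

```python
import json
import statistics

def train_model(history):
    """'Cloud side': derive per-signal alert thresholds from historical
    sensor readings (mean plus two standard deviations)."""
    model = {}
    for signal, values in history.items():
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)
        model[signal] = mu + 2 * sigma
    return json.dumps(model)  # serialized, as if for over-the-air delivery

def edge_score(serialized_model, reading):
    """'Vehicle side': run the pushed-down model against a live reading
    and return the signals that breach their thresholds."""
    thresholds = json.loads(serialized_model)
    return [s for s, v in reading.items() if v > thresholds.get(s, float("inf"))]

# Hypothetical historical readings that the cloud has accumulated.
history = {"coolant_temp": [88, 90, 92, 89], "brake_wear": [3.0, 3.2, 3.1, 2.9]}
model = train_model(history)

print(edge_score(model, {"coolant_temp": 91.0, "brake_wear": 3.0}))   # []
print(edge_score(model, {"coolant_temp": 99.0, "brake_wear": 3.05}))  # ['coolant_temp']
```

In a production system the retraining loop Michael mentions would periodically rebuild the model from fresh fleet data and redeploy it, so the thresholds track real operating conditions.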
SUMMARY :
Dave and Michael wrap up the connected-vehicle discussion: these data-intensive edge workloads take an ecosystem to deliver, vehicles are getting smarter as dumb gateways evolve into service-oriented gateways capable of running machine-learning inference models, 5G and GPU-accelerated cloud computing are improving every step of the machine learning lifecycle, and end-to-end data lineage, from ingestion through storage and processing, underpins security and governance across the fleet.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave | PERSON | 0.99+ |
Michael | PERSON | 0.99+ |
Nvidia | ORGANIZATION | 0.99+ |
Michael Ger | PERSON | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
12 cents | QUANTITY | 0.99+ |
NXP | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
Airbiquity | ORGANIZATION | 0.99+ |
Navistar | ORGANIZATION | 0.99+ |
150% | QUANTITY | 0.99+ |
475,000 trucks | QUANTITY | 0.99+ |
2022 | DATE | 0.99+ |
first session | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
two things | QUANTITY | 0.99+ |
first generation | QUANTITY | 0.98+ |
15 cents a mile | QUANTITY | 0.98+ |
Wind River | ORGANIZATION | 0.98+ |
Linux | TITLE | 0.98+ |
cloudera.com | OTHER | 0.98+ |
One | QUANTITY | 0.98+ |
3 cents a mile | QUANTITY | 0.98+ |
first thing | QUANTITY | 0.97+ |
one example | QUANTITY | 0.97+ |
both | QUANTITY | 0.96+ |
one | QUANTITY | 0.96+ |
almost 80% | QUANTITY | 0.94+ |
Apache | ORGANIZATION | 0.94+ |
cents a mile | QUANTITY | 0.82+ |
over a million | QUANTITY | 0.79+ |
earlier today | DATE | 0.79+ |
Project | TITLE | 0.75+ |
one such customer | QUANTITY | 0.72+ |
Cube | ORGANIZATION | 0.7+ |
a second | QUANTITY | 0.69+ |
5G | ORGANIZATION | 0.64+ |
well | QUANTITY | 0.61+ |
MiNiFi | COMMERCIAL_ITEM | 0.59+ |
second | QUANTITY | 0.56+ |
up | QUANTITY | 0.54+ |
Cloudera | TITLE | 0.53+ |
Basil Faruqui, BMC Software | BigData NYC 2017
>> Live from Midtown Manhattan, it's theCUBE. Covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (calm electronic music) >> Basil Faruqui, who's the Solutions Marketing Manager at BMC, welcome to theCUBE. >> Thank you, good to be back on theCUBE. >> So first of all, heard you guys had a tough time in Houston, so hope everything's gettin' better, and best wishes to everyone down in-- >> We're definitely in recovery mode now. >> Yeah, and so hopefully that can get straightened out quick. What's going on with BMC? Give us a quick update in context to BigData NYC. What's happening, what is BMC doing in the big data space now, the AI space now, the IOT space now, the cloud space? >> So like you said, you know, the data lake space, the IOT space, the AI space, there are four components of this entire picture that literally haven't changed since the beginning of computing. If you look at those four components of a data pipeline, it's ingestion, storage, processing, and analytics. What keeps changing around it is the infrastructure, the types of data, the volume of data, and the applications that surround it. And the rate of change has picked up immensely over the last few years, with Hadoop coming into the picture and public cloud providers pushing it. It's obviously creating a number of challenges, but one of the biggest challenges that we are seeing in the market, and we're helping customers address, is the challenge of automating this, and, obviously, the benefit of automation is in scalability as well as reliability. So when you look at this rather simple data pipeline, which is now becoming more and more complex, how do you automate all of this from a single point of control? How do you continue to absorb new technologies and not re-architect your automation strategy every time, whether it's Hadoop, whether it's bringing in machine learning from a cloud provider? And that is the issue we've been solving for customers-- >> Alright, let me jump into it. So, first of all, you mention some things that never change: ingestion, storage, and what's the third one? >> Ingestion, storage, processing, and eventually analytics. >> And analytics. >> Okay, so that's cool, totally buy that. Now if you move and say, hey, okay, if you believe that standard, but now in the modern era that we live in, which is complex, you want breadth of data, but also you want the specialization when you get down to machine limits highly bounded, that's where the automation is right now. We see the trend essentially making that automation broader as it goes into the customer environments. >> Correct. >> How do you architect that? If I'm a CXO, or I'm a CDO, what's in it for me? How do I architect this? 'Cause that's really the number one thing; I know what the building blocks are, but they've changed in their dynamics to the marketplace. >> So the way I look at it is that what defines success and failure, particularly in big data projects, is your ability to scale. If you start a pilot and you spend three months on it, and you deliver some results, but you cannot roll it out worldwide, nationwide, whatever it is, essentially the project has failed. The analogy I often give is Walmart has been testing the pick-up tower, I don't know if you've seen it. This is basically a giant ATM for you to go pick up an order that you placed online. They're testing this at about a hundred stores today. Now if that's a success, and Walmart wants to roll this out nationwide, how much time do you think their IT department's going to have? Is this a five-year project, a ten-year project? No, the management's going to want this done in six months, ten months. So essentially, this is where automation becomes extremely crucial, because it is now allowing you to deliver speed to market, and without automation you are not going to be able to get to an operational stage in a repeatable and reliable manner. >> But you're describing a very complex automation scenario. How can you automate in a hurry without sacrificing the details of what needs to be done? In other words, that would seem to call for repurposing or reusing prior automation scripts and rules, and so forth. How can the Walmarts of the world do that fast, but also do it well? >> Yeah, so we go about it in two ways. One is that out of the box we provide a lot of pre-built integrations to some of the most commonly used systems in an enterprise, all the way from the Mainframes, Oracles, SAPs, Hadoops, Tableaus of the world; they're all available out of the box for you to quickly reuse these objects and build an automated data pipeline. The other challenge we saw, particularly when we entered the big data space four years ago, was that automation was something that was considered close to the project becoming operational. Okay, and that's where a lot of rework happened, because developers had been writing their own scripts using point solutions. So we said alright, it's time to shift automation left and allow companies to build automation artifacts very early in the development life cycle. About a month ago, we released what we call Control-M Workbench; it's essentially a community edition of Control-M, targeted towards developers, so that instead of writing their own scripts, they can use Control-M in a completely offline manner, without having to connect to an enterprise system. As they build, and test, and iterate, they're using Control-M to do that. So as the application progresses through the development life cycle, all of that work can then translate easily into an enterprise edition of Control-M. >> Just want to quickly define what shift left means for the folks that might not know software methodologies; they don't think of left as political, left or right. >> Yeah, so-- >> So, we're not shifting Control-M-- >> Alt-left, alt-right, I mean, this is software development, so quickly take a minute and explain what shift left means, and the importance of it. >> Correct. So if you think of software development as a straight-line continuum, you will start with building some code, you will do some testing, then unit testing, then user acceptance testing. As it moves along this chain, there was a point right before production where all of the automation used to happen. Developers would come in and deliver the application to Ops, and Ops would say, well, hang on a second: all this Crontab and these other point solutions we've been using for automation, that's not what we use in production, and we need you to now go right in-- >> So test early and often. >> Test early and often. So the challenge was that the tools the developers used were not the tools that were being used on the production end of the site. And there was good reason for it, because developers don't need something really heavy, with all the bells and whistles, early in the development lifecycle. Now Control-M Workbench is a very light version, which is targeted at developers and focuses on the needs that they have when they're building and developing it. So as the application progresses-- >> How much are you seeing waterfall-- >> But how much can they, go ahead. >> How much are you seeing waterfall, and then people shifting left becoming more prominent now? What percentage of your customers have moved to Agile and shifting left, percentage-wise? >> So we survey our customers on a regular basis, and the last survey showed that eighty percent of the customers have either implemented a more continuous integration and delivery type of framework, or are in the process of doing it. And that's the other-- >> And getting close to 100 as possible, pretty much. >> Yeah, exactly. The tipping point is reached. >> And what is driving. >> What is driving all of it is the need from the business. The days of the five-year implementation timelines are gone. This is something that you need to deliver every week, two weeks, an iteration. >> Iteration, yeah, yeah. And we have also innovated in that space, with the approach we call jobs as code, where you can build entire complex data pipelines in code format, so that you can enable the automation in a continuous integration and delivery framework. >> I have one quick question, Jim, and I'll let you take the floor and get a word in soon, but I have one final question on this BMC methodology thing. You guys have a history; obviously BMC goes way back. Remember Max Watson, CEO, and Bob Beach, back in '97; we used to chat with him, dominated that landscape. But we're kind of going back to a systems mindset. The question for you is, how do you view the issue of this holy grail, the promised land of AI and machine learning, where end-to-end visibility is really the goal, right? At the same time, you want bounded experiences at root level so automation can kick in to enable more activity. So there's a trade-off between going for the end-to-end visibility out of the gate, but also having bounded visibility and data to automate. How do you guys look at that market? Because customers want the end-to-end promise, but they don't want to try to get there too fast. There's a diseconomies of scale potentially. How do you talk about that? >> Correct.
>> And that's exactly the approach we've taken with Control-M Workbench, the Community Edition, because earlier on you don't need capabilities like SLA management and forecasting and automated promotion between environments. Developers want to be able to quickly build and test and show value, okay, and they don't need something that is with all the bells and whistles. We're allowing you to handle that piece, in that manner, through Control-M Workbench. As things progress and the application progresses, the needs change as well. Well now I'm closer to delivering this to the business, I need to be able to manage this within an SLA, I need to be able to manage this end-to-end and connect this to other systems of record, and streaming data, and clickstream data, all of that. So that, we believe that it doesn't have to be a trade off, that you don't have to compromise speed and quality for end-to-end visibility and enterprise grade automation. >> You mentioned trade offs, so the Control-M Workbench, the developer can use it offline, so what amount of testing can they possibly do on a complex data pipeline automation when the tool's offline? I mean it seems like the more development they do offline, the greater the risk that it simply won't work when they go into production. Give us a sense for how they mitigate, the mitigation risk in using Control-M Workbench. >> Sure, so we spend a lot of time observing how developers work, right? And very early in the development stage, all they're doing is working off of their Mac or their laptop, and they're not really connected to any. And that is where they end up writing a lot of scripts, because whatever code business logic they've written, the way they're going to make it run is by writing scripts. 
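The offline flow described here, building and validating a workflow with no connection to an enterprise system and then promoting the same artifact, can be sketched as below. The schema, validation rule, and promote step are invented for illustration; this is not Control-M Workbench's actual format or API.

```python
# Sketch of the shift-left idea: a workflow definition is built and validated
# completely offline, and the same artifact is later promoted unchanged to a
# production environment. Schema and checks are invented for illustration.

def validate_workflow(workflow):
    """Offline check: every dependency must reference a defined job."""
    jobs = set(workflow["jobs"])
    return [f"{job} depends on unknown job {dep}"
            for job, deps in workflow.get("depends", {}).items()
            for dep in deps if dep not in jobs]

def promote(workflow, environment):
    """Promotion adds environment metadata but never rewrites the jobs."""
    errors = validate_workflow(workflow)
    if errors:
        raise ValueError("; ".join(errors))
    return {**workflow, "environment": environment}

dev_workflow = {
    "jobs": ["extract", "transform", "load"],
    "depends": {"transform": ["extract"], "load": ["transform"]},
}
```

The point of the sketch is that the artifact the developer iterates on offline is the same one promoted later, so nothing is rebuilt at the production boundary.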
And that, essentially, becomes the problem, because then you have scripts managing more scripts, and as the application progresses, you have this complex web of scripts and Crontabs and maybe some open-source solutions trying to simply make all of this run. And by doing this in an offline manner, that doesn't mean that they're losing all of the other Control-M capabilities. Simply, as the application progresses, whatever automation they built in Control-M can seamlessly now flow into the next stage. So when you are ready to take an application into production, there's essentially no rework required from an automation perspective. All of that, that was built, can now be translated into the enterprise-grade Control-M, and that's where operations can then go in and add the other artifacts, such as SLA management and forecasting and other things that are important from an operational perspective. >> I'd like to get both your perspectives, 'cause, so you're like an analyst here, so Jim, I want you guys to comment. My question to both of you would be, lookin' at this time in history, obviously on the BMC side we mentioned some of the history; you guys are transforming on a new journey in extending that capability of this world. Jim, you're covering state-of-the-art AI machine learning. What's your take of this space now? Strata Data, which is now Hadoop World; Cloudera went public, Hortonworks is now public, kind of the big, the Hadoop guys kind of grew up, but the world has changed around them; it's not just about Hadoop anymore. So I'd like to get your thoughts on this kind of perspective, that we're seeing a much broader picture in big data in NYC, versus the Strata Hadoop show, which seems to be losing steam, but I mean in terms of the focus. The bigger focus is much broader, horizontally scalable. And your thoughts on the ecosystem right now? >> Let Basil answer first, unless Basil wants me to go first.
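The "jobs as code" approach mentioned a little earlier contrasts directly with this web of scripts: the whole pipeline lives in one declarative, versionable definition that a single engine interprets. The JSON schema below is invented for illustration; it is not Control-M's actual job format.

```python
# "Jobs as code" in miniature: the pipeline is declared as versionable JSON
# rather than a web of scripts, and the engine derives the run order from it.
# The schema is invented for illustration; it is not Control-M's job format.
import json

pipeline_json = """
{
  "name": "customer-360",
  "jobs": [
    {"id": "ingest_crm",  "after": []},
    {"id": "build_lake",  "after": ["ingest_crm"]},
    {"id": "train_model", "after": ["build_lake"]}
  ]
}
"""

def run_order(text):
    """Parse the definition and return job ids in dependency order."""
    jobs = json.loads(text)["jobs"]
    done, order = set(), []
    while len(order) < len(jobs):
        progressed = False
        for job in jobs:
            if job["id"] not in done and all(d in done for d in job["after"]):
                done.add(job["id"])
                order.append(job["id"])
                progressed = True
        if not progressed:
            raise ValueError("cyclic or missing dependency in pipeline")
    return order
```

Because the definition is plain text, it can live in version control and move through a continuous integration and delivery flow like any other code.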
>> I think that the reason the focus is changing is because of where the projects are in their lifecycle. Now what we're seeing is most companies are grappling with, how do I take this to the next level? How do I scale? How do I go from just proving out one or two use cases to making the entire organization data-driven, and really inject data-driven decision making in all facets of decision making? So that is, I believe, what's driving the change that we're seeing, that now you've gone from Strata Hadoop to being Strata Data, and focus on that element. And, like I said earlier, the difference between success and failure is your ability to scale and operationalize. Take machine learning for an example. >> Good, that's where there's no, it's not a hype market, it's show me the meat on the bone, show me scale, I got operational concerns of security and whatnot. >> And machine learning, that's one of the hottest topics. A recent survey I read, which polled a number of data scientists, revealed that they spent less than 3% of their time in training the data models, and about 80% of their time in data manipulation, data transformation, and enrichment. That is obviously not the best use of a data scientist's time, and that is exactly one of the problems we're solving for our customers around the world. >> That needs to be automated to the hilt. To help them >> Correct. to be more productive, to deliver faster results. >> Ecosystem perspective, Jim, what's your thoughts? >> Yeah, everything that Basil said, and I'll just point out that many of the core use cases for AI are automation of the data pipeline. It's driving machine-learning-driven predictions, classifications, abstractions and so forth into the data pipeline, into the application pipeline, to drive results in a way that is contextually and environmentally aware of what's goin' on.
The history, historical data, what's goin' on in terms of current streaming data, to drive optimal outcomes, using predictive models and so forth, in line to applications. So really, fundamentally then, what's goin' on is that automation is an artifact that needs to be driven into your application architecture as a repurposable resource for a variety of-- >> Do customers even know what to automate? I mean, that's the question, what do I-- >> You're automating human judgment. You're automating effort, like the judgments that a working data engineer makes to prepare data for modeling and whatever. More and more of that can be automated, 'cause those are pattern-structured activities that have been mastered by smart people over many years. >> I mean, we just had a customer on with GlaxoSmithKline, GSK, with that scale, and his attitude is, we see the results from the users, then we double down and pay for it and automate it. So the automation question, it's an option question, it's a rhetorical question, but it just begs the question, which is: who's writing the algorithms as machines get smarter and start throwing off their own real-time data? What are you looking at? How do you determine? You're going to need machine learning for machine learning? Are you going to need AI for AI? Who writes the algorithms >> It's actually, that's. for the algorithm? >> Automated machine learning is a hot, hot, not only research focus, but we're seeing more and more solution providers, like Microsoft and Google and others, going deep, doubling down in investments in exactly that area. That's a productivity play for data scientists. >> I think the data market's going to change radically, in my opinion. I see you're startin' to see some things with blockchain and some other things that are interesting. Data sovereignty, data governance are huge issues. Basil, just give your final thoughts for this segment as we wrap this up.
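The repetitive preparation work behind that 80% figure, the "pattern structured activities" just mentioned, is exactly what lends itself to automation. A toy sketch of mechanizing two such steps, with field names and rules invented for illustration:

```python
# Toy sketch of automating repetitive data preparation: fill missing values
# and normalize a numeric field with reusable rules, instead of a data
# scientist hand-editing each data set. Fields and rules are invented.

def fill_missing(rows, field, default):
    """Replace absent or null values of `field` with a default."""
    return [{**r, field: default if r.get(field) is None else r[field]}
            for r in rows]

def min_max_scale(rows, field):
    """Rescale `field` into the range 0..1 across the data set."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    return [{**r, field: (r[field] - lo) / span} for r in rows]

def prepare(rows):
    """A reusable, automated prep pipeline: enrich, then normalize."""
    rows = fill_missing(rows, "mileage", default=0)
    return min_max_scale(rows, "mileage")

raw = [{"truck": "a", "mileage": 50_000},
       {"truck": "b"},                      # missing mileage
       {"truck": "c", "mileage": 100_000}]
```

Once encoded this way, the same prep runs identically on every refresh of the data, which is the productivity argument being made here.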
Final thoughts on data and BMC: what should people know about BMC right now? Because people might have a historical view of BMC. What's the latest, what should they know? What's the new Instagram picture of BMC? What should they know about you guys? >> So I think what people should know about BMC is that all the work that we've done over the last 25 years, in virtually every platform that came before Hadoop, we have now innovated to take into things like big data and cloud platforms. So when you are choosing Control-M as a platform for automation, you are choosing a very, very mature solution, an example of which is Navistar. Their CIO's actually speaking at the keynote tomorrow. They've had Control-M for 15, 20 years, and they've automated virtually every business function through Control-M. And then they started their predictive maintenance project, where they're ingesting data from about 300,000 vehicles today to figure out when a vehicle might break, and to predict maintenance on it. When they started their journey, they said that they always knew that they were going to use Control-M for it, because that was the enterprise standard, and they knew that they could simply extend that capability into this area. And when they started about three, four years ago, they were ingesting data from about 100,000 vehicles. That has now scaled to over 325,000 vehicles, and they have not had to re-architect their strategy as they grow and scale. So I would say that is one of the key messages that we are taking to market: that we are bringing innovation that spans over 25 years, and evolving it-- >> Modernizing it, basically. >> Modernizing it, and bringing it to newer platforms. >> Well, congratulations. I wouldn't call that a pivot; I'd call it an extensibility issue, kind of modernizing kind of the core things. >> Absolutely.
>> Thanks for coming and sharing the BMC perspective inside theCUBE here, on BigData NYC, this is the theCUBE, I'm John Furrier. Jim Kobielus here in New York city. More live coverage, for three days we'll be here, today, tomorrow and Thursday, and BigData NYC, more coverage after this short break. (calm electronic music) (vibrant electronic music)
SUMMARY :
John Furrier and Jim Kobielus interview Basil Faruqui of BMC Software at BigData NYC 2017. They discuss automating the data pipeline (ingestion, storage, processing, analytics) with Control-M: out-of-the-box integrations to enterprise systems, shifting automation left with the Control-M Workbench community edition, jobs as code for continuous integration and delivery, automating the data preparation that consumes about 80% of data scientists' time, and customer examples such as Navistar's predictive maintenance program, which scaled from about 100,000 to over 325,000 vehicles without re-architecting its automation strategy.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jim | PERSON | 0.99+ |
Jim Kobielus | PERSON | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
BMC | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
NYC | LOCATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
Basil Faruqui | PERSON | 0.99+ |
five year | QUANTITY | 0.99+ |
ten months | QUANTITY | 0.99+ |
two weeks | QUANTITY | 0.99+ |
three months | QUANTITY | 0.99+ |
six months | QUANTITY | 0.99+ |
John Furrier | PERSON | 0.99+ |
15 | QUANTITY | 0.99+ |
Basil | PERSON | 0.99+ |
Houston | LOCATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
Mac | COMMERCIAL_ITEM | 0.99+ |
BMC Software | ORGANIZATION | 0.99+ |
two ways | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
Midtown Manhattan | LOCATION | 0.99+ |
One | QUANTITY | 0.99+ |
ten year | QUANTITY | 0.99+ |
over 25 years | QUANTITY | 0.99+ |
over 325,000 vehicles | QUANTITY | 0.99+ |
about 300,000 vehicles | QUANTITY | 0.99+ |
third one | QUANTITY | 0.99+ |
three days | QUANTITY | 0.99+ |
about 100,000 vehicles | QUANTITY | 0.99+ |
about 80% | QUANTITY | 0.98+ |
BigData | ORGANIZATION | 0.98+ |
Thursday | DATE | 0.98+ |
eighty percent | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
20 years | QUANTITY | 0.98+ |
one quick question | QUANTITY | 0.98+ |
single point | QUANTITY | 0.98+ |
Bob Beach | PERSON | 0.97+ |
four years ago | DATE | 0.97+ |
two use cases | QUANTITY | 0.97+ |
one final question | QUANTITY | 0.97+ |
'97 | DATE | 0.97+ |
ORGANIZATION | 0.97+ | |
Agile | TITLE | 0.96+ |
New York city | LOCATION | 0.96+ |
About a month ago | DATE | 0.96+ |
Oracles | ORGANIZATION | 0.96+ |
Hadoop | TITLE | 0.95+ |
about a hundred stores | QUANTITY | 0.94+ |
less than 3% | QUANTITY | 0.94+ |
2017 | DATE | 0.93+ |
Glass'Gim | ORGANIZATION | 0.92+ |
about | QUANTITY | 0.92+ |
first | QUANTITY | 0.91+ |
Ops | ORGANIZATION | 0.91+ |
Hadoop | ORGANIZATION | 0.9+ |
Max Watson | PERSON | 0.88+ |
100 | QUANTITY | 0.88+ |
theCUBE | ORGANIZATION | 0.88+ |
Mainframes | ORGANIZATION | 0.88+ |
Navistar | ORGANIZATION | 0.86+ |
Basil Faruqui, BMC | theCUBE NYC 2018
(upbeat music) >> Live from New York, it's theCUBE. Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Okay, welcome back, everyone, to theCUBE NYC. This is theCUBE's live coverage covering CubeNYC, the Strata Hadoop Strata Data Conference. All things data happen here in New York this week. I'm John Furrier with Peter Burris. Our next guest is Basil Faruqui, lead solutions marketing manager for digital business automation within BMC. He returns; he was here last year with us, and also at Big Data SV, which has been renamed CubeNYC, Cube SV, because it's not just big data anymore. We're hearing words like multi-cloud, Istio, all those Kubernetes things. Data now is so important, it's now up and down the stack, impacting everyone. We talked about this last year with Control-M, how you guys are automating in a hurry. The four pillars of pipelining data. The setup days are over; welcome to theCUBE. >> Well, thank you, and it's great to be back on theCUBE. And yeah, what you said is exactly right: big data has really, I think, now been distilled down to data. Everybody understands data is big, and it's important, and it is really, you know, it's quite a cliche, but to a large degree, data is the new oil, as some people say. And I think what you said earlier is important, in that we've been very fortunate to be able to not only follow the journey of our customers but be a part of it. So about six years ago, some of the early adopters of Hadoop came to us and said that, look, we use your products for traditional data warehousing on the ERP side for orchestration workloads. We're about to take some of these projects on Hadoop into production, and we really feel that the Hadoop ecosystem is lacking enterprise-grade workflow orchestration tools.
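What "enterprise-grade workflow orchestration" means mechanically can be pictured as running jobs in dependency order, each job only after everything it depends on has finished. The sketch below is a generic illustration of that idea, not Control-M's implementation.

```python
# Generic sketch of workflow orchestration: execute jobs in dependency
# order, raising an error if the dependency graph can never complete.
# This models the idea of a scheduler only; it is not Control-M code.

def run_workflow(jobs, depends):
    """jobs: {name: callable}; depends: {name: [prerequisite names]}."""
    finished, order = set(), []
    while len(finished) < len(jobs):
        ready = [n for n in jobs
                 if n not in finished
                 and all(d in finished for d in depends.get(n, []))]
        if not ready:
            raise ValueError("cycle or missing dependency in workflow")
        for name in sorted(ready):   # deterministic order for ties
            jobs[name]()
            finished.add(name)
            order.append(name)
    return order

log = []
jobs = {
    "ingest": lambda: log.append("ingest"),
    "store":  lambda: log.append("store"),
    "model":  lambda: log.append("model"),
    "report": lambda: log.append("report"),
}
depends = {"store": ["ingest"], "model": ["store"], "report": ["model"]}
```

Here the four stages run strictly in sequence because each depends on the previous one; independent jobs would run in the same pass.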
So we partnered with them, and some of the earliest goals they wanted to achieve were to build a data lake and provide richer and wider data sets to the end users to be able to do some dashboarding, customer 360, and things of that nature. Very quickly, in about five years' time, we have seen a lot of these projects mature from how do I build a data lake to now applying cutting-edge ML and AI, and cloud is a major enabler of that. You know, it's really, as we were talking about earlier, it's really taking away excuses for not being able to scale quickly from an infrastructure perspective. Now you're talking about, is it Hadoop or is it S3 or is it Azure Blob Storage, is it Snowflake? And from a Control-M perspective, we're very platform- and technology-agnostic, so some of our customers who had started with Hadoop as a platform are now looking at other technologies like Snowflake. One of our customers describes it as kind of the spine or a power strip of orchestration: regardless of what technology you have, you can just plug and play in and not worry about how do I rewire the orchestration workflows, because Control-M is taking care of it. >> Well, you probably always will have to worry about that to some degree. But I think where you're going, and this is where I'm going to test it with you, is that as analytics, as data, is increasingly recognized as a strategic asset, as analytics is increasingly recognized as the way that you create value out of those data assets, and as a business becomes increasingly dependent upon the output of analytics to make decisions and ultimately, through AI, to act differently in markets, you are embedding these capabilities or these technologies deeper into business. They have to become capabilities. They have to become dependable. They have to become reliable, predictable: cost, performance, all these other things.
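The "power strip" idea, swapping Hadoop for Snowflake without rewiring workflows, can be sketched as workflows written against one storage interface with interchangeable backends. The interface and backends below are invented stand-ins, not real connectors.

```python
# Sketch of the "power strip" idea: the workflow talks to one storage
# interface, and backends (stand-ins for Hadoop, S3, Snowflake, ...) are
# interchangeable without touching the workflow itself. These are invented
# stand-ins for illustration, not real connectors.

class InMemoryBackend:
    """Stand-in for any concrete store behind the common interface."""
    def __init__(self, name):
        self.name = name
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data[key]

def land_and_summarize(backend, records):
    """A workflow written once, against the interface only."""
    backend.write("raw", records)
    backend.write("summary", sum(records))
    return backend.read("summary")

# Swapping the backend requires no change to the workflow itself:
hadoop_like = InMemoryBackend("hadoop-cluster")
snowflake_like = InMemoryBackend("snowflake-wh")
```

The design choice being illustrated is that the orchestration layer owns the interface, so adopting a new storage technology means adding a backend, not re-architecting workflows.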
That suggests that ultimately the historical approach of focusing on the technology, and trying to apply it to a periodic series of data science problems, has to become a little bit more mature, so it actually becomes a strategic capability. So the business can say we're operating on this, but the technologies to take that underlying data science technology and turn it into business operations, that's where a lot of the net new work has to happen. Is that what you guys are focused on? >> Yeah, absolutely, and I think one of the big differences that we're seeing in general in the industry is that this time around, the pull of how do you enable technology to drive the business is really coming from the line of business, versus starting on the technology side of the house and then coming to the business and saying, hey, we've got some cool technologies that can probably help you. It's really line of business now saying, no, I need better analytics so I can drive new business models for my company, right? So the need for speed is greater than ever, because the pull is from the line of business side. And this is another area where we are unique, in that, you know, Control-M has been designed in a way where it's not just a set of solutions or tools for the technical guys. Now the line of business is getting closer and closer; you know, it's blending into the technical side as well. They have a very, very keen interest in understanding, are the dashboards going to be refreshed on time? Are we going to be able to get all the right promotional offers out at the right time? I mean, we're here at NYC Strata; there's a lot of real-time promotion happening here. The line of business has a direct interest in the delivery and the timing of all of this, so we have always had multiple interfaces to Control-M, where a business user who has an interest in understanding, are the promotional offers going to happen at the right time and is that on schedule? They have a mobile app for them to do that.
A developer who's building a complex, multi-application platform, they have an API and a programmatic interface to do that. Operations, that has to monitor all of this, has rich dashboards to be able to do that. That's one of the areas that has been key for our success over the last couple of decades, and we're seeing that translate very well into the big data space. >> So I just want to go under the hood for a minute, because I love that answer. And I'd like to pivot off what Peter said, tying it back to the business, okay, that's awesome. And I want to learn a little bit more about this, because we talked about this last year, and I kind of am seeing it now. Kubernetes and all this orchestration is about workloads. You guys nailed the workflow issue, complex workflows. Because if you look at it, if you're adding line of business into the equation, that's just complexity in and of itself. As more workflows exist within its own line of business, whether it's recommendations and offers and workflow issues, more lines of business in there is complex for even IT to deal with, so you guys have nailed that. How does that work? Do you plug it in, and the lines of business have their own developers, so the people who work with the workflows engage how? >> So that's a good question. With sort of orchestration and automation now becoming very, very generic, it's kind of important to classify where we play. So there's a lot of tools that do release and build automation. There's a lot of tools that'll do infrastructure automation and orchestration. All of this infrastructure and release management process is done ultimately to run applications on top of it, and the workflows of the application need orchestration, and that's the layer that we play in. And if you think about it, how the end user, the business, and the consumer interact with all of this technology is through applications, okay?
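The multiple-interfaces point, the same orchestration state backing a developer API, an operations dashboard, and a business user's status view, can be sketched as different projections over one shared store. The job names, fields, and SLA rule below are invented for illustration.

```python
# Sketch of one orchestration state serving several interfaces: a
# programmatic view returns full job records for developers, while a
# business-facing view projects just "will my deliverable be on time?".
# Job names, fields, and the SLA comparison are invented for illustration.

jobs_state = [
    {"job": "refresh_dashboards", "sla": "06:00", "eta": "05:40",
     "status": "running"},
    {"job": "promo_offers", "sla": "09:00", "eta": "09:30",
     "status": "running"},
]

def api_get_jobs(state):
    """Developer/programmatic view: everything, unfiltered."""
    return state

def business_view(state):
    """Business view: one line per deliverable, on time or at risk."""
    # HH:MM strings compare correctly as plain strings.
    return {j["job"]: ("on time" if j["eta"] <= j["sla"] else "at risk")
            for j in state}
```

Both views read the same state, so the business user and the developer are never looking at different answers.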
So the orchestration of the workflows inside the applications, whether you start all the way from an ERP or a CRM and then you land into a data lake and then do an ML model, and then out come the recommendations and analytics, that's the layer we are automating today. Obviously, all of this-- >> By the way, the technical complexity for the user's in the app. >> Correct, so the line of business obviously has a lot more control. You're seeing roles like chief digital officers emerge, you're seeing CTOs that have mandates like, okay, you're going to be responsible for all applications that are customer facing, where the CIO is going to take care of everything that's inward facing. There's no settled structure or science to it yet. >> It's evolving fast. >> It's evolving fast. But what's clear is that the line of business has a lot more interest and influence in driving these technology projects, and it's important that technologies evolve in a way where the line of business can not only understand but take advantage of that. >> So I think it's a great question, John, and I want to build on that and then ask you something. So the way we look at the world is we say the first fifty years of computing were known process, unknown technology. The next fifty years are going to be unknown process, known technology. It's all going to look like a cloud. But think about what that means. Known process, unknown technology: Control-M and related types of technologies tended to focus on how you put in place predictable workflows in the technology layer. And now, unknown process, known technology, driven by the line of business; now we're talking about controlling process flows that are being created as bespoke, strategic, differentiating ways of doing business. >> Well, dynamic, too, I mean, dynamic. >> Highly dynamic, and those workflows in many respects, those technologies, piecing applications and services together, become the process that differentiates the business. 
Again, you're still focused on the infrastructure a bit, but you've moved it up. Is that right? >> Yeah, that's exactly right. We see our goal as abstracting the complexity of the underlying application, data and infrastructure. So, I mean, it's quite amazing-- >> So it could be easily reconfigured to a business's needs. >> Exactly, so whether you're on Hadoop and now you're thinking about moving to Snowflake, or tomorrow something else comes up, the orchestration or the workflow, as a business, as a product, our goal is to continue to evolve quickly, and in a manner where we continue to abstract the complexity, so from-- >> So I've got to ask you, we've been having a lot of conversations around Hadoop versus Kubernetes on multi-cloud. Cloud has certainly come in and changed the game; there's no debate on that. How it changes things is debatable, but we know that multiple clouds are going to be the modus operandi for customers. >> Correct. >> So I've got a lot of data, and now I've got pipelining complexities, and workflows are going to get even more complex, potentially. How do you see the impact of the cloud, how are you guys looking at that, and what are some customer use cases that you see for you guys? >> So, what I mentioned earlier, being platform and technology agnostic, is actually one of the unique differentiating factors for us. So whether you are on AWS or Azure or Google or on-prem, or still on a mainframe; we're in New York, and a lot of the banks and insurance companies here still do some of their most critical processing on the mainframe. The ability to abstract all of that, whether it's cloud or legacy solutions, is one of our key enablers for our customers, and I'll give you an example. So Malwarebytes is one of our customers, and they've been using Control-M for several years. 
Their entire infrastructure is built primarily on AWS, but they are now utilizing Google Cloud for some of their recommendation and sentiment analysis, because their goal is to pick the best-of-breed technology for the problem they're looking to solve. >> Service; the best-of-breed service is in the cloud. >> The best-of-breed service is in the cloud to solve the business problem. So from Control-M's perspective, transcending from AWS to Google Cloud is completely abstracted for them. If tomorrow it's Azure, or they decide to build a private cloud, they will be able to extend the same workflow orchestration. >> But you can build these workflows across whatever set of services are available. >> Correct, and you bring up an important point. It's not only being able to build the workflows across platforms but being able to define dependencies and track the dependencies across all of this, because none of this is happening in silos. If you want to use Google's API to do the recommendations, well, you've got to feed it the data, and the data pipeline, like we talked about last time: data ingestion, data storage, data processing and analytics have very, very intricate dependencies, and these solutions should be able to manage not only the building of the workflow but the dependencies as well. >> But you're defining those elements as fundamental building blocks through a control model. >> Correct. >> That allows you to treat the higher-level services as reliable, consistent capabilities. 
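[Editor's note: to make the dependency discussion above concrete, here is a minimal, hypothetical sketch of ordering a cross-platform pipeline by its declared dependencies. The job names and cloud placements are invented for illustration and this is not Control-M syntax; Python's standard-library `graphlib` does the ordering.]

```python
from graphlib import TopologicalSorter

# Invented job names; each job maps to the set of jobs it depends on.
dependencies = {
    "ingest_clickstream":   set(),                     # e.g. lands raw events on AWS
    "store_data_lake":      {"ingest_clickstream"},    # storage step
    "process_enrichment":   {"store_data_lake"},       # processing step
    "recommend_google_api": {"process_enrichment"},    # e.g. a recommendation call on Google Cloud
    "publish_dashboard":    {"recommend_google_api"},  # analytics / delivery
}

# static_order() yields every job after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
# ['ingest_clickstream', 'store_data_lake', 'process_enrichment',
#  'recommend_google_api', 'publish_dashboard']
```

Because these stages form a single chain, the topological order is unique: ingestion runs before storage, storage before processing, and so on, regardless of which platform each step happens to target.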
>> Correct, and the other thing I would like to add here is: not only can you build complex multi-platform, multi-application workflows, but you never lose focus of the business service or the business process there. You can tie all of this to a business service, and then, because these things are complex and there are problems, let's say there's an ETL job that fails somewhere upstream, Control-M will immediately be able to predict the impact and be able to tell you this means the recommendation engine will not be able to make the recommendations. Now, the staff that's going to work on remediation understands the business impact, versus looking at a screen where there are 500 jobs and one of them has failed. What does that really mean? >> Set priorities and focal points and everything else. >> Right. >> So I just want to wrap up by asking you how your talk went at the Strata Hadoop Data Conference. What were you talking about, what was the core message? Was it Control-M, was it customer presentations? What was the focus? >> So the focus of yesterday's talk was, you know, academic talk is great, but it's important to, you know, show how things work in real life. The session was focused on a real use case from a customer, Navistar. They have IoT data-driven pipelines where they are predicting failures of parts inside trucks and buses that they manufacture, you know, reducing vehicle downtime. So we wanted to simulate a demo like that, and that's exactly what we did. It was very well received. In real time, we spun up an EMR environment in AWS, automatically provisioned the infrastructure there, we applied Spark and machine learning algorithms to the data, and out came the recommendation at the end, which was, you know, here are the vehicles that are-- >> Fix their brakes. (laughing) >> Exactly, so it was very, very well received. >> I mean, there's a real-world example; there's real money to be saved: maintenance, scheduling, potential liability, accidents. 
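[Editor's note: the impact prediction described above, an upstream ETL failure implying the recommendation engine cannot run, is essentially reachability over the job graph. A toy sketch with invented job names, not any actual Control-M feature:]

```python
# Invented job graph for illustration: each job maps to the jobs that
# consume its output downstream.
downstream = {
    "upstream_etl":          ["feature_build"],
    "feature_build":         ["recommendation_engine"],
    "recommendation_engine": ["promo_dashboard"],
    "promo_dashboard":       [],
}

def impacted(failed_job):
    """Walk the graph to find every job a single failure will knock out."""
    seen, stack = set(), [failed_job]
    while stack:
        for nxt in downstream[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(seen)

print(impacted("upstream_etl"))
# ['feature_build', 'promo_dashboard', 'recommendation_engine']
```

The point of surfacing the reachable set, rather than one red job among 500, is exactly the one made above: the remediation staff sees the business services at risk, not just the failed job.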
>> Liability is a huge issue for a lot of manufacturers. >> And Navistar has been at the leading edge of how to apply technologies in that business. >> They really have been a poster child for digital transformation. >> They sure have. >> Here's a company that's been around for 100-plus years, and when we talk to them they tell us that they have every technology under the sun that has come along since the mainframe, and for them to be transforming and leading in this way, we're very fortunate to be part of their journey. >> Well, we'd love to talk more about some of these customer use cases. That's another thing people love about theCUBE; we want to do more of them and share those examples. People love to see proof in real-world examples, not just talk, so appreciate you sharing. >> Absolutely. >> Thanks for sharing, thanks for the insights. We're here with theCUBE, live in New York City, part of CubeNYC; we're getting all the data and sharing that with you. I'm John Furrier with Peter Burris. Stay with us for more day two coverage after this short break. (upbeat music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
John | PERSON | 0.99+ |
Basil Faruqui | PERSON | 0.99+ |
Peter Burris | PERSON | 0.99+ |
BMC | ORGANIZATION | 0.99+ |
Peter | PERSON | 0.99+ |
500 jobs | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
New York | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
New York City | LOCATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Hadoop | TITLE | 0.99+ |
first fifty years | QUANTITY | 0.99+ |
theCUBE | ORGANIZATION | 0.99+ |
Navistar | ORGANIZATION | 0.99+ |
tomorrow | DATE | 0.98+ |
yesterday | DATE | 0.98+ |
one | QUANTITY | 0.98+ |
this week | DATE | 0.97+ |
Malwarebytes | ORGANIZATION | 0.97+ |
Cube | ORGANIZATION | 0.95+ |
Control M | ORGANIZATION | 0.95+ |
NYC | LOCATION | 0.95+ |
Snowflake | TITLE | 0.95+ |
Strata Hadoop Data Conference | EVENT | 0.94+ |
100 plus years | QUANTITY | 0.93+ |
CubeNYC Strata Hadoop Strata Data Conference | EVENT | 0.92+ |
last couple decades | DATE | 0.91+ |
Azure | TITLE | 0.91+ |
about five years | QUANTITY | 0.91+ |
Istio | ORGANIZATION | 0.9+ |
CubeNYC | ORGANIZATION | 0.89+ |
day | QUANTITY | 0.87+ |
about six years ago | DATE | 0.85+ |
Kubernetes | TITLE | 0.85+ |
today | DATE | 0.84+ |
NYC Strata | ORGANIZATION | 0.83+ |
Hadoop | ORGANIZATION | 0.78+ |
one of them | QUANTITY | 0.77+ |
Big Data SV | ORGANIZATION | 0.75+ |
2018 | EVENT | 0.7+ |
Kubernetes | ORGANIZATION | 0.66+ |
fifty years | DATE | 0.62+ |
Control M | TITLE | 0.61+ |
four pillars | QUANTITY | 0.61+ |
two | QUANTITY | 0.6+ |
On-Prem | ORGANIZATION | 0.6+ |
Cube SV | COMMERCIAL_ITEM | 0.58+ |
a minute | QUANTITY | 0.58+ |
S3 | TITLE | 0.55+ |
Azure | ORGANIZATION | 0.49+ |
cloud | TITLE | 0.49+ |
2018 | DATE | 0.43+ |
Basil Faruqui, BMC Software | BigData NYC 2017
>> Announcer: Live from Midtown Manhattan, it's theCUBE. Covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> His name is Jim Kobielus. >> Jim: That's right, John Furrier, is actually how I pronounce his name, for the record. But he is Basil Faruqui. >> Basil Faruqui, who's the solutions marketing manager at BMC, welcome to theCUBE. >> Basil: Thank you, good to be back on theCUBE. >> So, first of all, I heard you guys had a tough time in Houston, so hope everything's getting better, and best wishes. >> Basil: Definitely in recovery mode now. >> Hopefully that can get straightened out. What's going on at BMC? Give us a quick update, and in the context of BigData NYC, what's happening? What is BMC doing in the big data space now? The AI space now, the IoT space now, the cloud space? >> Like you said, you know, the data space, the IoT space, the AI space. There are four components of this entire picture that literally haven't changed since the beginning of computing, if you look at those four components of a data pipeline: ingestion, storage, processing and analytics. What keeps changing around it is the infrastructure, the types of data, the volume of data and the applications that surround it. The rate of change has picked up immensely over the last few years, with Hadoop coming into the picture and public cloud providers pushing it. It's obviously created a number of challenges, but one of the biggest challenges that we are seeing in the market, and that we're helping customers address, is the challenge of automating this. And obviously the benefit of automation is in scalability as well as reliability. So when you look at this rather simple data pipeline, which is now becoming more and more complex: how do you automate all of this from a single point of control? How do you continue to absorb new technologies and not re-architect your automation strategy every time? 
Whether it's Hadoop, whether it's bringing in machine learning from a cloud provider, that is the issue we've been solving for customers. >> All right, let me jump into it. So first of all, you mention some things that never change: ingestion, storage, and what was the third one? >> Ingestion, storage, processing, and eventually analytics. >> So OK, that's cool, totally buy that. Now if you move and say, hey, okay, so you believe that's standard, but now in the modern era that we live in, which is complex, you want breadth of data, and you also want the specialization when you get down to machine learning. That's highly bounded; that's where the automation is right now. We see the trend essentially making that automation broader as it goes into the customer environments. >> Basil: Correct. >> How do you architect that? If I'm a CXO or a CDO, what's in it for me? How do I architect this? Because that's really the number one thing: I know what the building blocks are, but they've changed in their dynamics in the marketplace. >> So the way I look at it is that what defines success and failure, particularly in big data projects, is your ability to scale. If you start a pilot and you spend, you know, three months on it and you deliver some results, but you cannot roll it out worldwide, nationwide, whatever it is, essentially the project has failed. The analogy I often give is Walmart has been testing the pick-up tower, I don't know if you've seen it; this is basically a giant ATM for you to go pick up an order that you placed online. They're testing this at about a hundred stores today. Now that's a success, and Walmart wants to roll this out nationwide. How much time do you think their IT department can have? Is this a five-year project, a ten-year project? No, management's going to want this done in six months, ten months. 
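[Editor's note: the four unchanging components mentioned above, ingestion, storage, processing and analytics, can be sketched as a toy pipeline driven from a single point of control. The functions below are invented stand-ins, not any specific product's API; real pipelines would call ingestion tools, storage APIs, Spark jobs, BI refreshes, and so on.]

```python
def ingest():                 # ingestion: pull in raw records
    return [3, 1, 2]

def store(records):           # storage: persist (here, just order) the records
    return sorted(records)

def process(records):         # processing: transform / enrich
    return [r * 10 for r in records]

def analyze(records):         # analytics: produce a result for the business
    return sum(records) / len(records)

def run_pipeline():
    """One driver, a single point of control, runs every stage in order."""
    return analyze(process(store(ingest())))

print(run_pipeline())  # 20.0
```

Swapping out any one stage, say, a different storage layer, leaves the driver untouched, which is the point being made about not re-architecting the automation strategy every time a technology changes.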
So essentially, this is where automation becomes extremely crucial, because it is now allowing you to deliver speed to market, and without automation you are not going to be able to get to an operational stage in a repeatable and reliable manner. >> You're describing a very complex automation scenario. How can you automate in a hurry without sacrificing, you know, the details of what needs to be done? In other words, you seem to call for repurposing or reusing prior automation scripts and rules and so forth. How can the Walmarts of the world do that fast, but also do it well? >> So we go about it in two ways. One is that out of the box we provide a lot of pre-built integrations to some of the most commonly used systems in an enterprise, all the way from the mainframes to the Oracles, SAPs, Hadoops and Tableaus of the world. They're all available out of the box for you to quickly reuse these objects and build an automated data pipeline. The other challenge we saw, particularly when we entered the big data space four years ago, was that automation was something that was considered close to the project becoming operational. And that's where a lot of rework happened, because developers had been writing their own scripts, using point solutions. So we said, all right, it's time to shift automation left and allow companies to build automation as an artifact very early in the development lifecycle. About a month ago we released what we call Control-M Workbench, which is essentially a Community Edition of Control-M targeted towards developers, so that instead of writing their own scripts they can use Control-M in a completely offline manner without having to connect to an enterprise system. As they build and test and iterate, they're using Control-M to do that. So as the application progresses through the development lifecycle, all of that work can then translate easily into an Enterprise Edition of Control-M. 
>> So quickly, just explain what shift-left means for the folks that might not know software methodologies; not left politically, or alt-right, this is software development. So please take a minute to explain what shift-left means, and the importance of it. >> Correct, so if you think of software development as a straight-line continuum, you start with building some code, you do some testing, then unit testing, then user acceptance testing. As it moves along this chain, there was a point right before production where all of the automation used to happen. You know, developers would come in and deliver the application to ops, and ops would say, well, hang on a second, all these cron tabs and all these other point solutions you've been using for automation, that's not what we use in production, and we need you to now-- >> To test early and often. >> Test early and often. The challenge was that the tools developers use were not the tools that were being used on the production end of the cycle. And there was good reason for it, because developers don't need something really heavy, with all the bells and whistles, early in the development lifecycle. Control-M Workbench is a very light version which is targeted at developers and focuses on the needs that they have when they're building and developing as the application progresses through its life cycle. >> How much are you seeing Waterfall, and then people shifting left, becoming more prominent now? What percentage of your customers have moved to Agile and shifting left, percentage-wise? >> So we survey our customers on a regular basis. The last survey showed that 80% of the customers have either implemented a more continuous-integration-and-delivery type of framework, or are in the process of doing it. And that's the other. 
>> What is driving all of that is the need from the business, you know, the days of the five year implementation timelines are gone. This is something that you need to deliver every week, two weeks, and iteration. And we have also innovated in that space and the approach we call Jobs-as-Code where you can build entire, complex data pipelines in code formats so that you can enable the automation in a continuous integration and delivery framework. >> I have one quick question, Jim, and then I'll let you take the floor and got to learn to get a word in soon. But I have one final question on this BMC methodology thing. You guys have a history obviously BMC goes way back. Remember Max Watson CEO, and then in Palm Beach back in 97 we used to chat with him. Dominated that landscape, but we're kind of going back to a systems mindset, so the question for you is how do you view the issue of the this holy grail, the promised land of AI and machine learning. Where, you know, end-to-end visibility is really the goal, right. At the same time, you want bounded experiences at root level so automation can kick in to enable more activity. So it's a trade off between going for the end-to-end visibility out of the gate, but also having bounded visibility and data to automate. How do you guys look at that market because customers want the end-to-end promise, but they don't want to try to get there too fast as a dis-economies of scale potentially. How do you talk about that? >> And that's exactly the approach we've taken with Control-M Workbench the Community Edition. Because early on you don't need capabilities like SLA management and forecasting and automated promotion between environments. Developers want to be able to quickly build, and test and show value, OK. And they don't need something that, as you know, with all the bells and whistles. We're allowing you to handle that piece in that manner, through Control-M Workbench. 
As things progress, and the application progresses, the needs change as well. Now I'm closer to delivering this to the business, I need to be able to manage this within an SLA. I need to be able to manage this end-to-end and connect this other systems of record and streaming data and click stream data, all of that. So that we believe that there it doesn't have to be a trade off. That you don't have to compromise speed and quality and visibility and enterprise grade automation. >> You mention trade-offs so the Control-M Workbench the developer can use it offline, so what amount of testing can they possibly do on a complex data pipeline automation, when it's when the tool is off line? I mean it simply seems like the more development they do off line, the greater the risk that it simply won't work when they go into production. Give us a sense for how they mitigate that risk. >> Sure, we spent a lot of time observing how developers work and very early in the development stage, all they're doing is working off of their Mac or their laptop and they're not really connecting to any. And that is where they end up writing a lot of scripts because whatever code, business logic, that they've written the way they're going to make it run is by writing scripts. And that essentially becomes a problem because then you have scripts managing more scripts and as the the application progresses, you have this complex web of scripts and CRON tabs and maybe some open source solutions. trying to make, simply make, all of this run. And by doing this I don't know offline manner that doesn't mean that they're losing all of the other controlling capabilities. Simply, as the application progresses whatever automation that they've built in Control-M can seamlessly now flow into the next stage. So when you are ready take an application into production there is essentially no rework required from an automation perspective. 
All of that that was built can now be translated into the enterprise grade Control-M and that's where operations can then go in and add the other artifacts such as SLA management forecasting and other things that are important from an operational perspective. >> I'd like to get both your perspectives because you're like an analyst here. So Jim, I want you guys to comment, my question to both of you would be you know, looking at this time in history, obviously on the BMC side, mention some of the history. You guys are transforming on a new journey and extending that capability in this world. Jim, you're covering state of the art AI machine learning. What's your take of the space now? Strata Data which is now Hadoop World, which is, Cloudera went public, Hortonworks is now public. Kind of the big, the Hadoop guys kind of grew up, but the world has changed around them. It's not just about Hadoop anymore. So I want to get your thoughts on this kind of perspective. We're seeing a much broader picture in BigData NYC versus the Strata Hadoop, which seems to be losing steam. But, I mean, in terms of the focus, the bigger focus is much broader horizontally scalable your thoughts on the ecosystem right now. >> Let Basil answer first unless Basil wants me to go first. >> I think the reason the focus is changing is because of where the projects are in their life cycle. You know now what we're seeing is most companies are grappling with how do I take this to the next level. How do I scale, how do I go from just proving out one or two use cases to making the entire organization data driven and really inject data driven decision making in all facets of decision making. So that is, I believe, what's driving the change that we're seeing, that you know now you've gone from Strata Hadoop to being Strata Data, and focus on that element. Like I said earlier, these difference between success and failure is your ability to scale and operationalize. Take machine learning for example. 
>> And really it's not a hype market. Show me the meat on the bone, show me scale, I got operational concerns of security and whatnot. >> And machine learning you know that's one of the hottest topics. A recent survey I read which polled a number of data scientists, it revealed that they spent about less than 3% of their time in training the data models and about 80% of their time in data manipulation, data transformation and enrichment. That is obviously not the best use of the data scientists time, and that is exactly one of the problems we're solving for our customers around the world. >> And it needs to be automated to the hilt to help them to be more productive delivering fast results. >> Ecosystem perspective, Jim whats you thoughts? >> Yes everything that Basil said, and I'll just point out that many of the core use cases for AI are automation of the data pipeline. You know it's driving machine learning driven predictions, classifications, you know abstractions and so forth, into the data pipeline, into the application pipeline to drive results in a way that is contextually and environmentally aware of what's going on. The path, the history historical data, what's going on in terms of current streaming data to drive optimal outcomes, you know, using predictive models and so forth, in line to applications. So really, fundamentally then, what's going on is that automation is an artifact that needs to be driven into your application architecture as a re-purposeful resource for a variety of jobs. >> How would you even know what to automate? I mean that's the question. >> You're automating human judgment, your automating effort. Like the judgments that a working data engineer makes to prepare data for modeling and whatever. More and more that need can be automated because those are patterned, structured activities that have been mastered by smart people over many years. 
>> I mean we just had a customer on his with a glass company, GSK, with that scale, and his attitude is we see the results from the users then we double down and pay for it and automate it. So the automation question, it's a rhetorical question but this begs the question, which is you know who's writing the algorithms as machines get smarter and start throwing off their own real time data. What are you looking at, how do you determine you're going to need you machine learning for machine learning? You're going to need AI for AI? Who writes the algorithms for the algorithms? >> Automated machine learning is a hot hot, not only research focus, but we're seeing it more and more solution providers like Microsoft and Google and others, are going deep down doubling down and investments in exactly that area. That's a productivity play for data scientists. >> I think the data markets going to change radically in my opinion, so you're starting to see some things with blockchain some other things that are interesting. Data sovereignty, data governance are huge issues. Basil, just give your final thoughts for this segment as we wrap this up. Final thoughts on data and BMC, what should people know about BMC right now, because people might have a historical view of BMC. What's the latest, what should they know, what's the new Instagram picture of BMC? What should they know about you guys? >> I think what I would say people should know about BMC is that you know all the work that we've done over the last 25 years, in virtually every platform that came before Hadoop, we have now innovated to take this into things like big data and cloud platforms. So when you are choosing Control-M as a platform for automation, you are choosing a very very mature solution. An example of which is Navistar and their CIO is actually speaking at the keynote tomorrow. They've had Control-M for 15, 20 years and have automated virtually every business function through Control-M. 
And when they started their predictive maintenance project where there ingesting data from about 300 thousand vehicles today, to figure out when this vehicle might break and do predictive maintenance on it. When they started their journey they said that they always knew that they were going to use Control-M for it because that was the enterprise standard. And they knew that they could simply now extend that capability into this area. And when they started about three four years ago there were ingesting data from about a hundred thousand vehicles, that has now scaled over 325 thousand vehicles and they have not had to re-architect their strategy as they grow and scale. So, I would say that is one of the key messages that we are are taking to market, is that we are bringing innovation that has spanned over 25 years and evolving it. >> Modernizing it. >> Modernizing it and bringing it to newer platforms. >> Congratulations, I wouldn't call that a pivot, I'd call it an extensibility issue, kind of modernizing the core things. >> Absolutely. >> Thanks for coming and sharing the BMC perspective inside theCUBE here. On BigData NYC this is theCUBE. I'm John Furrier, Jim Kobielus here in New York City, more live coverage the three days we will be here, today, tomorrow and Thursday at BigData NYC. More coverage after this short break.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jim | PERSON | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
Jim Kobielus | PERSON | 0.99+ |
Basil Faruqui | PERSON | 0.99+ |
John Furrier | PERSON | 0.99+ |
BMC | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Basil | PERSON | 0.99+ |
Houston | LOCATION | 0.99+ |
New York City | LOCATION | 0.99+ |
15 | QUANTITY | 0.99+ |
80% | QUANTITY | 0.99+ |
Palm Beach | LOCATION | 0.99+ |
one | QUANTITY | 0.99+ |
ten months | QUANTITY | 0.99+ |
five year | QUANTITY | 0.99+ |
ten year | QUANTITY | 0.99+ |
two weeks | QUANTITY | 0.99+ |
six months | QUANTITY | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
three days | QUANTITY | 0.99+ |
over 325 thousand vehicles | QUANTITY | 0.99+ |
Mac | COMMERCIAL_ITEM | 0.99+ |
both | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
three months | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
two ways | QUANTITY | 0.99+ |
Thursday | DATE | 0.99+ |
GSK | ORGANIZATION | 0.99+ |
about 300 thousand vehicles | QUANTITY | 0.99+ |
about 80% | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Midtown Manhattan | LOCATION | 0.99+ |
SAP | ORGANIZATION | 0.98+ |
one quick question | QUANTITY | 0.98+ |
third one | QUANTITY | 0.98+ |
Strata Hadoop | TITLE | 0.98+ |
four years ago | DATE | 0.98+ |
over 25 years | QUANTITY | 0.98+ |
single point | QUANTITY | 0.98+ |
about a hundred thousand vehicles | QUANTITY | 0.97+ |
one final question | QUANTITY | 0.97+ |
About a month ago | DATE | 0.96+ |
Max Watson | PERSON | 0.96+ |
BigData | ORGANIZATION | 0.95+ |
four components | QUANTITY | 0.95+ |
about hundred stores | QUANTITY | 0.95+ |
first | QUANTITY | 0.95+ |
two use cases | QUANTITY | 0.95+ |
NYC | LOCATION | 0.94+ |
Navistar | ORGANIZATION | 0.94+ |
BMC Software | ORGANIZATION | 0.93+ |
97 | DATE | 0.93+ |
Agile | TITLE | 0.89+ |
Basil Faruqui, BMC Software - BigData SV 2017 - #BigDataSV - #theCUBE
(upbeat music) >> Announcer: Live from San Jose, California, it's theCUBE covering Big Data Silicon Valley 2017. >> Welcome back everyone. We are here live in Silicon Valley for theCUBE's Big Data coverage. Our event, Big Data Silicon Valley, also called Big Data SV, is a companion event to our Big Data NYC event, where we have our unique program in conjunction with Strata Hadoop. I'm John Furrier with George Gilbert, our Wikibon big data analyst. And we have Basil Faruqui, who is the Solutions Marketing Manager at BMC Software. Welcome to theCUBE. >> Thank you, great to be here. >> We've been hearing a lot on theCUBE about schedulers and automation, and machine learning is the hottest trend happening in big data. We're thinking that this is going to help move the needle on some things. Your thoughts on this, on the world we're living in right now, and what BMC is doing at the show. >> Absolutely. So, scheduling and workflow automation is absolutely critical to the success of big data projects. This is not something new. Hadoop is only about 10 years old, but other technologies that came before Hadoop relied on this foundation for driving success. If we look at the Hadoop world, what gets all the press is the real-time stuff, but what powers all of that underneath is a very important layer of batch. If you think about some of the most common use cases for big data, think of a bank: they're talking about fraud detection and things like that. Let's just take the fraud detection example. Detecting an anomaly in how somebody is spending, if somebody's credit card is used in a way that doesn't match their spending habits, the bank detects that and they'll maybe close the card down or contact somebody. But everything else that has to happen before that is something that has happened in batch mode: collecting the history of how that card has been used, then matching it with how all the other card members use their cards.
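The batch-plus-real-time fraud pattern just described can be sketched in a few lines: a batch job summarizes each card's spending history into a profile, and a fast check compares each new transaction against that profile. This is a toy illustration, not a real fraud system; the field names, threshold, and z-score test are all assumptions made for the sketch.

```python
# Hypothetical sketch of the batch-plus-real-time fraud pattern: a nightly
# batch job builds per-card spending profiles, and a lightweight check
# compares each new transaction against the card's profile.
from statistics import mean, stdev

def build_profiles(history):
    """Batch step: summarize each card's past spending (mean and spread)."""
    by_card = {}
    for card, amount in history:
        by_card.setdefault(card, []).append(amount)
    return {card: (mean(a), stdev(a) if len(a) > 1 else 0.0)
            for card, a in by_card.items()}

def is_anomalous(profiles, card, amount, z_threshold=3.0):
    """Speed-layer step: flag transactions far outside the card's history."""
    if card not in profiles:
        return True  # no history at all is itself suspicious
    mu, sigma = profiles[card]
    if sigma == 0.0:
        return amount != mu
    return abs(amount - mu) / sigma > z_threshold

history = [("card1", 40.0), ("card1", 55.0), ("card1", 48.0), ("card1", 52.0)]
profiles = build_profiles(history)
print(is_anomalous(profiles, "card1", 50.0))    # in line with history
print(is_anomalous(profiles, "card1", 5000.0))  # wildly out of pattern
```

In a production setup, the profile-building step is exactly the kind of scheduled batch work the discussion attributes to workload automation, while the per-transaction check runs in the speed layer.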
When the cards are stolen, what are those patterns? All that stuff is something that is being powered by what's today known as workload automation. In the past, it's been known by names such as job scheduling and batch processing. >> In the systems business, everyone knows what schedulers, compilers, all this computer science stuff is. But this is interesting. Now that the data lake has become so swampy, and people call it the data swamp, people are looking at moving data out of data lakes into real time, as you mention, but it requires management. So, there's a lot of coordination going on. This seems to be where most enterprises are now focusing their attention: making that data available. >> Absolutely. >> Hence the notion of scheduling and workloads. Because their use cases are different. Am I getting it right? >> Yeah, absolutely. And if we look at what companies are doing, in every boardroom there's a CEO charter for digital transformation. It's no longer about taking one or two use cases around big data and driving success. Data and intelligence are now at the center of everything a company does, whether it's building new customer engagement models, building new ecosystems with partners and suppliers, or back-office optimization. So, when CIOs and data architects think about having to build a system like that, they are faced with a number of challenges. It has to become enterprise ready. It has to take into account governance, security, and others. But, if you peel the onion just a little bit, what architects and CIOs are faced with is: okay, you've got a web of complex technologies, legacy applications and modern applications that hold a lot of the corporate data today. And then you have new sources of data like social media, devices, and sensors, which have a tendency to produce a lot more data.
First things first, you've got an ecosystem like Hadoop, which is supposed to be the nerve center of the new digital platform. You've got to start ingesting all this data into Hadoop, and this has to happen in an automated fashion for it to be scalable. >> But this is the combination of streaming and batch. >> Correct. >> Now this seems to be the management holy grail right now. Nailing those two. Did I get that? >> Absolutely. So, people talk about, in technical terms, the speed layer and the batch layer. And both have to converge for them to be able to deliver the intelligence and insight that the business users are looking for. >> Would it be fair to say it's not just the convergence of the speed layer and batch layer in Hadoop, but what BMC brings to town is the non-Hadoop parts of those workloads? Whether it's batch outside Hadoop, or streaming, which pre-Hadoop was more nichey. But we need this over-arching control, even if it's not a Hadoop-centric architecture. >> Absolutely. So, I've said this for a long time: Hadoop is never going to live on an island on its own in the enterprise. And with the maturation of the market, Hadoop has to now play with the other technologies in the stack. Just take data ingestion as an example: you've got ERPs, you've got CRMs, you've got middleware, you've got data warehouses, and you have to ingest a lot of that in. Where Control-M brings a lot of value and speeds up time to market is that we have out-of-the-box integrations with a lot of the systems that already exist in the enterprise, such as ERP solutions and others. Virtually any application that can expose itself through an API or a web service, Control-M has the ability to automate that ingestion piece. But this is only step one of the journey. So, you've brought all this data into Hadoop and now you've got to process it. The number of tools available for processing this is growing at an unprecedented rate.
You've got, you know, MapReduce, which was a hot thing just two years ago, and now Spark has taken over. So with Control-M, about four years ago we started building very deep native capabilities in this new ecosystem. So, you've got ingestion that's automated, then you can seamlessly automate the actual processing of the data using things like Spark, Hive, Pig, and others. And the last mile of the journey, the most important one, is making this refined data available to systems and users that can analyze it. Often Hadoop is not the repository that analytic systems sit on top of; it's another layer that all of this has to be moved to. So, if you zoom out and take a look at it, this is a monumental task. And if you use a siloed approach to automating this, it becomes unscalable. And that's where a lot of the Hadoop projects often >> Crash and burn. >> Crash and burn, yes, sustainability. >> Let's just say it, they crash and burn. >> So, Control-M has been around for 30 years. >> By the way, just to add to the crash-and-burn piece, the data lake gets stalled there, that's why the swamp happens, because they're like, now how do I operationalize this and scale it out? >> Right, if you're storing a lot of data and not making it available for processing and analysis, then it's of no use. And that's exactly our value proposition. This is not a problem we're solving for the first time. We've done this as we've seen these waves of automation come through: from the mainframe time, when it was called batch processing, to distributed client-server, when it was known more as job scheduling. And now. >> So BMC has seen this movie before. >> Absolutely. >> Alright, so let's take a step back. Zoom out, step back, go hang out in the big trees, look down on the market. Data practitioners, big data practitioners out there right now are wrestling with this issue. You've got streaming, real-time stuff, you've got batch, it's all coming together.
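The three-stage journey described above — automated ingestion, processing with engines like Spark and Hive, then moving refined data out to the analytics layer — is at heart a dependency graph of jobs. The sketch below is generic Python, not Control-M syntax; the job names and the toy runner are invented purely for illustration.

```python
# A generic sketch (not Control-M syntax) of the three-stage flow: ingest
# from several sources, process the data, then publish the refined output
# downstream. Each job runs only after all of its dependencies finish.
def run_workflow(jobs, deps):
    """Run jobs in dependency order (a tiny topological traversal)."""
    done, order = set(), []
    def run(name):
        if name in done:
            return
        for upstream in deps.get(name, []):
            run(upstream)
        jobs[name]()          # in real life: submit a Spark/Hive job, call an API...
        done.add(name)
        order.append(name)
    for name in jobs:
        run(name)
    return order

log = []
jobs = {
    "ingest_erp": lambda: log.append("ingest ERP extract"),
    "ingest_crm": lambda: log.append("ingest CRM extract"),
    "process":    lambda: log.append("run Spark/Hive transforms"),
    "publish":    lambda: log.append("move refined data to analytics layer"),
}
deps = {
    "process": ["ingest_erp", "ingest_crm"],
    "publish": ["process"],
}
order = run_workflow(jobs, deps)
print(order)  # ingestion first, then processing, then publish
```

A real workload automation tool adds what this toy omits: scheduling, retries, cross-platform agents, and monitoring, which is the "strategic approach" the conversation argues for.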
What is Control-M doing great right now with practitioners that you guys are solving? Because there are a zillion tools out there, but people are human. Every hammer looks for a nail. >> Sure. So, you have a lot of change happening at the same time, and yet all these tools. What is Control-M doing to really win? Where are you guys winning? >> Where we are adding a lot of value for our customers is helping them speed up time to market in delivering these big data projects, and delivering them at scale and with quality. >> Give me an example of a project. >> Malwarebytes is a Silicon Valley-based company. They are using this to ingest and analyze data from thousands of end-points from their end users. >> That's their Lambda architecture, right? >> In Lambda architecture, I won't steal their thunder; they're presenting tomorrow at eleven. >> Okay. >> Eleven-thirty tomorrow. Another example is a company called Navistar. Now here's a company that's been around for 200 years. They manufacture heavy-duty trucks, 18-wheelers, school buses. And they recently came up with a service called OnCommand. They have a fleet of 160,000 trucks that are fitted with sensors. They're sending telematics data back to their data centers, and in between it stops in the cloud. >> So they're going up to the cloud for upload and backhaul, basically, right? >> Correct. So, it goes to the cloud. From there it is ingested inside their Hadoop systems. And they're looking for trends to make sure none of the trucks break down, because a truck that's carrying freight breaking down hits the bottom line right away. But that's not where they're stopping. In real time they can triangulate the position of the truck, figure out where the nearest dealership is. Do they have the parts? When to schedule the service. But, if you think about it, the warranty information, the parts information is not sitting in Hadoop. That's sitting in their mainframes, SAP systems, and others.
And Control-M is orchestrating this across the board, from mainframe to ERP and into Hadoop, for them to be able to marry all this data together. >> How do you get back into the legacy? Is that because you have the experience there? Is that part of the product portfolio? >> That is absolutely a part of the product portfolio. We started our journey back in the mainframe days, and as the world has evolved, to client-server, to web, and now to mobile and virtualized and software-defined infrastructures, we have kept pace with that. >> You guys have a nice end-to-end view right now going on. And certainly that example with the trucks highlights IoT right there. >> Exactly. >> You have a clear line of sight on IoT? >> Yup. >> The best measure of your maturity would be the breadth of your integrations. >> Absolutely. And we don't stop at what we provide just out of the box. We realize that we have 30 to 35 out-of-the-box integrations, but there are a lot more applications than that. We have architected Control-M in a way that it can automate data loads on any application and any database that can expose itself through an API. That is huge, because if you think about the open-source world, by the time this conference is over, there are going to be a dozen new tools and projects that come online. And that's a big challenge for companies too. How do you keep pace with this and how do you (drowned out) all this? >> Well, I think people are starting to squint past the fashion aspect of open source, which I love by the way, but it does create more diversity. But, you know, some things become fashionable and then get big-time trashed. Look at Spark. Spark was beautiful. That one came out of the woodwork. George, you're tracking all the fashion. What's the hottest thing right now in open source? >> It seems to me that we've spent five-plus years building data lakes and now we're trying to take that data and apply the insights from it to applications.
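The Navistar story above is essentially a join between a real-time telematics event and reference data held in other systems — parts inventory, warranty records, dealership locations. A hypothetical sketch of the "nearest dealership with the part" step, with all coordinates, dealer names, and part lists made up for illustration:

```python
# Hypothetical sketch of the truck-service flow: given a fault that needs a
# specific part, pick the closest dealership that has it in stock. The dealer
# records here stand in for lookups against ERP/inventory systems.
import math

def nearest_capable_dealer(truck, fault_part, dealers):
    """Return the closest dealership stocking the needed part, or None."""
    candidates = [d for d in dealers if fault_part in d["parts"]]
    if not candidates:
        return None
    return min(candidates, key=lambda d: math.dist(truck["pos"], d["pos"]))

truck = {"id": "T-42", "pos": (40.7, -74.0)}   # made-up telematics position fix
dealers = [
    {"name": "Newark",  "pos": (40.73, -74.17), "parts": {"turbo", "filter"}},
    {"name": "Albany",  "pos": (42.65, -73.75), "parts": {"turbo"}},
    {"name": "Trenton", "pos": (40.22, -74.76), "parts": {"filter"}},
]
print(nearest_capable_dealer(truck, "turbo", dealers)["name"])  # closest with the part
```

The point of the example is the data marriage: the truck's position arrives in real time, while the parts sets would come from systems of record outside Hadoop, which is exactly the cross-platform orchestration being described.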
And, really, Control-M's value add, my understanding is, we have to go beyond Hadoop, because Hadoop was an island, you know, an island or a data lake, but now the insights have to be enacted on applications that go outside that ecosystem. And that's where Control-M comes in. >> Yeah, absolutely. We are that overarching layer that helps you connect your legacy systems and modern systems and bring it all into Hadoop. The story I tell when I'm explaining this to somebody is: you've installed Hadoop day one, great, but guess what, it has no data in it. You've got to ingest data, and you have to be able to take a strategic approach to that. You can use some point solutions and do scripting for the first couple of use cases, but as soon as the business gives you the green light and says, you know what, we really like what we've seen, now let's scale up, that's where you really need to take a strategic approach, and that's where Control-M comes in. >> So, let me ask then: if the bleeding edge right now is trying to operationalize the machine learning models that people are beginning to experiment with, just the way they were experimenting with data lakes five years ago, what role can Control-M play today in helping people take a trained model and embed it in an application so it produces useful actions and recommendations, and how much custom integration does that take? >> If you peel the onion of machine learning, you've got data that needs to be moved, that needs to be constantly evaluated, and then the algorithms have to be run against it to provide the insights. So, this is exactly what Control-M allows you to do: ingest the data, process the data, let the algorithms process it, and then of course move it to a layer where people and other systems can analyze it. It's not just about people anymore; it's other systems that'll analyze the data.
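The machine-learning loop just outlined — move the data in, run the algorithms against it, move the results out to people and systems — can be caricatured in a few lines. The "model" below is a trivial running average standing in for a real algorithm; everything here is illustrative, not any vendor's implementation.

```python
# A toy version of the operationalized ML loop: ingest new data, refresh the
# model against the accumulated store, then score fresh points and hand the
# results to a downstream layer. A running average stands in for a real model.
def ingest(batch, store):
    """Move new data into the accumulated store."""
    store.extend(batch)

def train(store):
    """Stand-in training step: learn a baseline from the data."""
    return sum(store) / len(store)

def score(model, new_points):
    """Scoring step: deviation of each new point from the learned baseline."""
    return [x - model for x in new_points]

store = []
ingest([10, 12, 11, 13], store)       # data moved in from source systems
model = train(store)                  # algorithms run against the data
deviations = score(model, [12, 30])   # results moved out for analysis
print(deviations)
```

In a scheduled pipeline, each of these three steps would be a separate job with its own dependencies and monitoring, which is why the conversation frames ML operationalization as a workflow automation problem.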
And the important piece here is that we're allowing you to do this from a single pane of glass, and being able to see this picture end to end. All of this work is being done to drive business results, generating new revenue models, like in the case of Navistar. Allowing you to capture all of this and then tie it to business SLAs, that is one of the most highly-rated capabilities of Control-M from our customers. >> This is the cloud equation we were talking about last week at Google Next. A combination of enterprise readiness across the board. The end-to-end is the picture, and you guys are in a good position. Congratulations, and thanks for coming on theCUBE. Really appreciate it. >> Absolutely, great to be here. >> It's theCUBE breaking it down here at Big Data World. This is the trend. It's an operating-system world in the cloud. Big data with IoT, AI, machine learning. Big themes breaking out early on at Big Data SV in conjunction with Strata Hadoop. More right after this short break.
SUMMARY :
At Big Data SV 2017 in San Jose, John Furrier and George Gilbert talk with Basil Faruqui of BMC Software about why scheduling and workload automation are foundational to big data projects. Faruqui argues that the real-time use cases that get the press are powered underneath by a critical batch layer, and walks through Control-M's role across the journey: automated ingestion from ERP, CRM, and other enterprise systems into Hadoop, processing with tools like Spark and Hive, and moving refined data out to analytic systems. Customer examples include Malwarebytes, which ingests and analyzes data from thousands of endpoints, and Navistar, whose fleet of 160,000 connected trucks sends telematics data that Control-M orchestrates across mainframe, ERP, and Hadoop systems for predictive maintenance. The conversation closes on operationalizing machine learning pipelines from a single pane of glass and tying workflows to business SLAs.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
George Gilbert | PERSON | 0.99+ |
Basil Faruqui | PERSON | 0.99+ |
BMC | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Navistar | ORGANIZATION | 0.99+ |
George | PERSON | 0.99+ |
five-plus years | QUANTITY | 0.99+ |
30 | QUANTITY | 0.99+ |
John Furrier | PERSON | 0.99+ |
160,000 trucks | QUANTITY | 0.99+ |
San Jose, California | LOCATION | 0.99+ |
two | QUANTITY | 0.99+ |
Hadoop | TITLE | 0.99+ |
Malwarebytes | ORGANIZATION | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
last week | DATE | 0.99+ |
Lambda | TITLE | 0.99+ |
both | QUANTITY | 0.99+ |
OnCommand | ORGANIZATION | 0.99+ |
five years ago | DATE | 0.99+ |
tomorrow | DATE | 0.98+ |
two years ago | DATE | 0.98+ |
35 | QUANTITY | 0.98+ |
first time | QUANTITY | 0.98+ |
Big Data SV | EVENT | 0.98+ |
18-wheelers | QUANTITY | 0.98+ |
first couple | QUANTITY | 0.98+ |
Big Data | EVENT | 0.98+ |
BMC Software | ORGANIZATION | 0.97+ |
today | DATE | 0.97+ |
First | QUANTITY | 0.97+ |
about 10 years old | QUANTITY | 0.97+ |
Control-M | ORGANIZATION | 0.96+ |
two use cases | QUANTITY | 0.96+ |
Big Data Silicon Valley 2017 | EVENT | 0.95+ |
Hadoop | ORGANIZATION | 0.95+ |
30 years | QUANTITY | 0.94+ |
first | QUANTITY | 0.94+ |
NYC | LOCATION | 0.94+ |
Big Data Silicon Valley | EVENT | 0.93+ |
single pane | QUANTITY | 0.92+ |
Eleven-thirty | DATE | 0.9+ |
step one | QUANTITY | 0.88+ |
Strata Hadoop | TITLE | 0.88+ |
200 years | QUANTITY | 0.87+ |
theCUBE | ORGANIZATION | 0.87+ |
a dozen new tools | QUANTITY | 0.83+ |
about four years ago | DATE | 0.83+ |
Wikibon | ORGANIZATION | 0.83+ |
-M | ORGANIZATION | 0.82+ |
Big Data SV | ORGANIZATION | 0.82+ |
Control-M | PERSON | 0.81+ |
a zillion tools | QUANTITY | 0.8+ |
thousands of end-points | QUANTITY | 0.76+ |
eleven | DATE | 0.76+ |
Spark | TITLE | 0.76+ |
BMCs | ORGANIZATION | 0.74+ |
Strata Hadoop | PERSON | 0.67+ |
BigData SV 2017 | EVENT | 0.66+ |
#BigDataSV | EVENT | 0.62+ |
Big | ORGANIZATION | 0.62+ |
SAP | ORGANIZATION | 0.6+ |
MapReduce | ORGANIZATION | 0.58+ |
Hive | TITLE | 0.52+ |