
Search Results for caldera:

PUBLIC SECTOR | Speed to Insight


 

>>Hi, this is Cindy Maike, vice president of industry solutions at Cloudera. Joining me today is Shev Molly, our solution engineer for the public sector. Today we're going to talk about speed to insight: why machine learning is used in the public sector, specifically around fraud, waste and abuse. For today's topics, we'll discuss machine learning and why the public sector uses it to target fraud, waste and abuse; the challenges; how to enhance your data and analytical approaches; the data landscape and analytical methods; and then Shev will go over a reference architecture and a case study. By definition, per the Government Accountability Office, fraud is an attempt to obtain something of value through unwelcome misrepresentation, waste is about squandering money or resources, and abuse is about behaving improperly or unreasonably to obtain something of value for your personal benefit. As we look at fraud across all industries, it's a top-of-mind area within the public sector.

>>The types of fraud that we see are specifically around cybercrime, accounting fraud, whether from an individual perspective or within organizations, financial statement fraud, and bribery and corruption. As we look at fraud, it really hits us from all angles, whether from external perpetrators or internal perpetrators, and per the research by PwC, we see over half of fraud coming through some form of internal or external perpetrator. Looking at a recent report by the Association of Certified Fraud Examiners, within the public sector it was identified that in 2017 roughly $148 billion in US government spending was attributable to fraud, waste and abuse. Of that, $57 billion was reported monetary losses, and another $91 billion was in areas where the monetary impact had not yet been measured.

>>Breaking those areas down by improper payments: over $65 billion within the health system and over $51 billion within social services, plus procurement fraud, fraud, waste and abuse in the grants and loan process, payroll fraud, and other aspects; quite a few different topical areas. So as we look at those broad-stroke areas, what are the actual use cases agencies are pursuing, what does the data landscape look like, and what data and analytical methods can we use to actually help curtail and prevent some of the fraud, waste and abuse? Looking at analytical use cases in the public sector, from taxation to social services to public safety and other agency missions, we're going to focus specifically on some of the use cases around fraud within the tax area. We'll briefly look at some aspects of unemployment insurance fraud and benefit fraud, as well as payment integrity. So fraud has its underpinnings in quite a few different government agencies, different analytical methods, and the usage of different data.
>>So I think one of the key elements is that you can look at your data landscape in terms of the specific data sources you need, but it's really about bringing together different data sources across a different variety and a different velocity; data has different dimensions. We'll look at structured types of data, semi-structured data, and behavioral data. With predictive models we're typically looking at historical information, but if we're actually trying to prevent fraud before it happens, or while a case may be in flight, which is specifically a use case Shev is going to talk about later, it's about how I look at more of that real-time, streaming information.

>>How do I take advantage of data, whether it be financial transactions, asset verification, tax records, or corporate filings? We can also look at more advanced data sources, such as investigation-type information, where we may go out and apply deep learning models to semi-structured or unstructured behavioral data, whether it be camera analysis and so forth. So it's quite a variety of data, and the breadth and the opportunity really come about when you can integrate and look at data across all the different sources; in essence, a more extensive data landscape. Specifically, I want to focus on some of the methods, some of the data sources, and some of the analytical techniques we're seeing used in government agencies, as well as opportunities to look at new methods.

>>From an audit-planning perspective, or when looking at the likelihood of non-compliance, we'll see data sources where we're maybe looking at a constituent's profile, investigating the forms they've provided, and comparing that data with or leveraging internal data sources: possibly looking at net worth, comparing it against other financial data, and comparing across other constituent groups. Some of the techniques we use are basic natural language processing, maybe some text mining, and probabilistic modeling, where we're looking at information within the agency and comparing it against, possibly, tax forms. Historically, a lot of that information has been processed in batch, both structured and semi-structured, and typically the data volumes have been low, but we're also seeing those data volumes increase exponentially based on the types of events we're dealing with and the number of transactions.

>>So it's about getting the throughput, and Shev is going to talk about that specifically in a moment. The other area of opportunity to build on is: how do I actually do compliance, how do I conduct audits or investigate potential fraud, and how do I look at areas of under-reported tax information?
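To make the probabilistic-modeling idea above a bit more concrete, here is a minimal, hypothetical sketch of scoring filings for audit review. The library choice, field names, data and threshold are illustrative assumptions, not any agency's actual method.

```python
# Hypothetical sketch: rank constituent filings for audit review by anomaly score.
# Field names and data are illustrative assumptions only.
import pandas as pd
from sklearn.ensemble import IsolationForest

filings = pd.DataFrame({
    "reported_income":    [52000, 48000, 51000, 300000, 49500],
    "claimed_deductions": [4000, 3500, 4200, 95000, 3900],
    "net_worth_estimate": [180000, 150000, 175000, 220000, 160000],
})

# Learn what "normal" filings look like from historical data.
model = IsolationForest(contamination=0.05, random_state=42).fit(filings)

# Lower scores are more anomalous, so they rise to the top of the audit queue.
filings["anomaly_score"] = model.score_samples(filings)
print(filings.sort_values("anomaly_score").head(2))
```

In practice the features would come from the constituent profiles, forms and internal comparisons described above rather than a hand-built table.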
>>There you might be pulling in some other types of data sources, whether that's property records, data supplied by the actual constituents or by vendors, social media information, geographical information, or even photos. Techniques we're seeing used include sentiment analysis and link analysis: how do we actually blend those data sources together with natural language processing? What's important here is also the method and the data velocity, whether it be batch or near real time, and again looking at all types of data, whether structured, semi-structured or unstructured. The key and the value behind this is how we actually increase the potential revenue, or recover the under-reported revenue.

>>How do we actually stop fraudulent payments before they occur? How do we increase the level of compliance, and the potential for prosecution of fraud cases? Additional areas of opportunity could be economic planning: how do we perform link analysis, how do we bring in more of those things we saw in the data landscape, such as constituent interaction, social media, potentially police records, property records, and other tax department database information, and how do we compare one individual to other individuals, looking at people like a specific constituent, to see whether other aspects of fraud are potentially occurring? And as we move forward, some of the more advanced techniques we're seeing around deep learning include computer vision, leveraging geospatial information, social network entity analysis, and agent-based modeling techniques, with simulation and Monte Carlo-type techniques that we typically see in the financial services industry actually applied to fraud, waste and abuse within the public sector.

>>And again, that really lends itself to new opportunities. And on that, I'm going to turn it over to Shev to talk about the reference architecture for these use cases.

>>Thanks, Cindy. So I'm going to walk you through an example reference architecture for fraud detection using Cloudera's underlying technology, and before I get into the technical details, I want to talk about how this would be implemented at a much higher level. With fraud detection, what we're trying to do is identify anomalies, or novel behavior, within our data sets. Now, in order to understand what aspects of our incoming data represent anomalous behavior, we first need to understand what normal behavior is. In essence, once we understand normal behavior, anything that deviates from it can be thought of as an anomaly, right? And in order to understand what normal behavior is, we're going to need to be able to collect, store and process a very large amount of historical data. That's where Cloudera's platform and this reference architecture come in. So let's start on the left-hand side of this reference architecture with the collect phase.

>>Fraud detection will always begin with data collection.
We need to collect large amounts of information from systems that could be in the cloud, in the data center, or even on edge devices, and this data needs to be collected so we can create our normal-behavior profiles. Those normal-behavior profiles will then, in turn, be used to create our predictive models for fraudulent activity. Now, on the data collection side, one of the main challenges that many organizations face in this phase involves using a single technology that can handle data coming in all different types of formats, protocols and standards, with different varieties and velocities. Let me give you an example: we could be collecting data from a database that gets updated daily, and maybe that data is being collected in Avro format.

>>At the same time, we could be collecting data from an edge device that's streaming in every second, and that data may be coming in JSON or a binary format, right? So this is a data collection challenge that can be solved with Cloudera DataFlow, which is a suite of technologies built on Apache NiFi and MiNiFi, allowing us to ingest all of this data through a drag-and-drop interface. So now we're collecting all of this data that's required to map out normal behavior. The next thing we need to do is enrich it, transform it and distribute it to downstream systems for further processing. Let's walk through how that would work. First, let's take enrichment: think of it as adding additional information to your incoming data. Take financial transactions, for example, because Cindy mentioned them earlier.

>>You can store known locations of an individual in an operational database, which with Cloudera would be HBase, and as an individual makes a new transaction, the geolocation in that transaction data can be enriched with previously known locations of that very same individual, and all of that enriched data can later be used downstream for predictive analysis. So the data has been enriched; now it needs to be transformed. We want the data that's coming in, whether Avro, JSON, binary or any other format, to be transformed into a single common format so it can be used downstream for stream processing. Again, this is done through Cloudera DataFlow, which is backed by NiFi. The transformed data is then streamed to Kafka, and Kafka serves as that central repository of syndicated services, or a buffer zone.

>>Kafka pretty much provides you with extremely fast, resilient and fault-tolerant storage, and it also gives you the consumer APIs you need to enable a wide variety of applications to leverage that enriched and transformed data within your buffer zone. I'll add that you can also store that data in a distributed file system, giving you the historical context you're going to need later on for machine learning. The next step in the architecture is to leverage Cloudera SQL Stream Builder, which enables us to write streaming SQL jobs on top of Apache Flink, so we can filter, analyze and understand the data that's in the Kafka buffer zone in real time.
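As a rough sketch of the enrichment step described above, consuming a transaction from Kafka and joining it against previously known locations held in HBase, the topic, table and field names below are assumptions for illustration only; in the architecture described here this logic would live in Cloudera DataFlow/NiFi rather than a hand-written consumer.

```python
# Hypothetical sketch of the enrichment step: join a streaming transaction
# against previously known locations held in HBase. Names are illustrative.
import json
import happybase                      # HBase client
from kafka import KafkaConsumer       # kafka-python

hbase = happybase.Connection("hbase-host")
known_locations = hbase.table("known_locations")

consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="kafka-host:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    txn = message.value               # e.g. {"account_id": "a-123", "geo": "US-VA", "amount": 250.0}
    row = known_locations.row(txn["account_id"].encode("utf-8"))
    previously_seen = {v.decode("utf-8") for v in row.values()}
    # Enrichment flag used downstream by the predictive models.
    txn["known_location"] = txn["geo"] in previously_seen
    print(txn)
```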
I'll also add that if you have time-series data, or if you need OLAP-type cubing, you can leverage Kudu, while EDA, or exploratory data analysis, and visualization can all be enabled through Cloudera's visualization technology.

>>All right, so we've filtered, we've analyzed and we've explored our incoming data. We can now proceed to train our machine learning models, which will detect anomalous behavior in our historically collected data set. To do this, we can use a combination of supervised, unsupervised and even deep learning techniques with neural networks, and these models can be tested on new incoming streaming data. Once we've obtained the accuracy and performance scores we want, we can take these models and deploy them into production, and once the models are productionalized, or operationalized, they can be leveraged within our streaming pipeline. So as new data is ingested in real time, NiFi can query these models to detect whether the activity is anomalous or fraudulent, and if it is, it can alert downstream users and systems. This, in essence, is how fraudulent activity detection works.

>>And this entire pipeline is powered by Cloudera's technology. The IRS is one of Cloudera's customers that's leveraging our platform today and implementing a very similar architecture to detect fraud, waste and abuse across a very large set of historical data. One of the neat things with the IRS is that they've recently leveraged the partnership between Cloudera and Nvidia to accelerate their Spark-based analytics and their machine learning, and the results have been nothing short of amazing. In fact, we have a quote here from Joe Ansaldi, the technical branch chief for the Research, Analytics and Statistics division within the IRS: "With zero changes to our fraud detection workflow, we were able to obtain eight times the performance simply by adding GPUs to our mainstream big data servers. This improvement translates to half the cost of ownership for the same workloads." So embedding GPUs into the reference architecture I covered earlier has enabled the IRS to improve their time to insight by as much as eight times while simultaneously reducing their underlying infrastructure costs by half. Cindy, back to you.

>>Shev, thank you. I hope the analysis and information that Shev and I have provided gives you some insight into how Cloudera is actually helping with the fraud, waste and abuse challenges within the public sector: looking at any and all types of data, and how the Cloudera platform brings together and analyzes information, whether structured, semi-structured or unstructured, in both batch and real time, looking at anomalies, and using detection methods such as neural network analysis and time-series analysis. As a next step, we'd love to have an additional conversation with you. You can also find additional information on how Cloudera is working in the federal government by going to cloudera.com, under solutions, public sector. We welcome scheduling a meeting with you. Again, thank you for joining Shev and me today; we greatly appreciate your time and look forward to a future conversation.
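To make the train-then-score loop Shev describes a little more concrete, here is a minimal, hypothetical sketch. The features, labels and alert threshold are invented, and a production pipeline would train and serve these models through Cloudera Machine Learning with NiFi doing the querying, rather than a standalone script.

```python
# Hypothetical sketch of the train-then-score loop; features, labels and the
# alert threshold are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-ins for curated historical features and fraud labels (~2% positive).
X = np.random.rand(5000, 3)           # [amount, distance_from_known_location, hour_of_day]
y = (np.random.rand(5000) < 0.02).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("holdout AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

def score(record):
    """Score one newly ingested record; alert downstream systems if it looks fraudulent."""
    prob = model.predict_proba([record])[0][1]
    return {"fraud_probability": float(prob), "alert": prob > 0.9}

print(score([0.5, 0.9, 0.1]))
```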

Published Date : Aug 5 2021


PUBLIC SECTOR V1 | CLOUDERA


 

>>Good day, everyone. Thank you for joining me. I'm Cindy Maike, joined by Rick Taylor of Cloudera. We're here to talk about predictive maintenance for the public sector and how to increase asset service reliability. On today's agenda, we'll talk specifically about how to optimize your equipment maintenance and how to reduce the cost of asset failure with data and analytics. We'll go into a little more depth on what types of data and analytical methods we typically see used, and Rick will go over a case study as well as a reference architecture. By basic definition, predictive maintenance is about determining when an asset should be maintained and what specific maintenance activities need to be performed, based on an asset's actual condition or state. It's also about predicting and preventing failures and performing maintenance on your time, on your schedule, to avoid costly unplanned downtime.

>>McKinsey has analyzed maintenance costs across multiple industries and identified an opportunity to reduce overall maintenance costs by roughly 50% with different types of analytical methods. So let's look at the three types of maintenance models. First, we've got our traditional method, corrective maintenance, where we perform maintenance on an asset after the equipment fails. The challenge with that is we end up with unplanned downtime, disruptions in our schedules, and reduced quality in the performance of the asset. Then there's preventive maintenance, which is performing maintenance on a set schedule. The challenge there is that we're typically doing it regardless of the actual condition of the asset, which results in unnecessary downtime and expense. So we're really now focused on predictive, condition-based maintenance, which leverages predictive maintenance techniques based on actual conditions and real-time events and processes. With that, organizations, again per McKinsey, have seen a 50% reduction in downtime as well as an overall 40% reduction in maintenance costs. That's looking across multiple industries, but let's look at it in the context of the public sector, based on work by the Department of Energy several years ago.

>>The department looked at what predictive maintenance means to the public sector and what the benefits are: increasing return on investment in assets, reducing downtime, and lowering overall maintenance costs. Corrective, or reactive, maintenance is performed once there's been a failure; the movement is toward preventive maintenance based on a set schedule, then toward predictive maintenance where we're monitoring real-time conditions, and most importantly, toward leveraging IoT and data and analytics to further reduce overall downtime. There's a research report by the Department of Energy that goes into more specifics on the opportunity within the public sector. So, Rick, let's talk a little bit about some of the challenges regarding data and predictive maintenance.
>>Some of the challenges include data silos. Historically, our government organizations, and organizations in the commercial space as well, have multiple data silos that have spun up over time across multiple business units, and there's no single view of assets. Oftentimes there's redundant information stored in these silos. Couple that with huge increases in data volume, with data growing exponentially, along with new types of data that we can now ingest: social media, semi-structured and unstructured data sources, and the real-time data we can now collect from the Internet of Things. So the challenge is to bring all these assets together and begin to extract intelligence and insights from them, and that in turn fuels machine learning and what we call artificial intelligence, which enables predictive maintenance. Next slide.

>>Let's look specifically at the types of use cases. Rick and I are going to focus on those use cases where we see predictive maintenance coming into procurement, facilities, supply chain, operations and logistics. There are various levels of maturity: we're talking about predictive maintenance, but also about monitoring a connected asset or vehicle, and about bringing in data from connected warehouses, facilities and buildings. All of these bring an opportunity to increase the quality and effectiveness of the missions within the agencies while also improving cost efficiency, risk and safety. As for the types of data, beyond the new types of information Rick mentioned, some of the data elements we typically see start with failure history.

>>When has an asset, a machine, or a component within a machine failed in the past? We also bring together maintenance history for a specific machine: are we getting error codes off a machine or asset, and when have we replaced certain components? We look at how we're actually using the assets: what were the operating conditions, and what data are we pulling off a sensor on that asset? We also look at the features of an asset, such as its engine size, its make and model, and where the asset is located, as well as who has operated the asset, their certifications, their experience, and how they're using it. And then we bring in the pattern analysis we've seen: what are the operating limits, are we seeing service reliability issues, and are we getting product recall information from the actual manufacturer? So, Rick, I know the data landscape has really changed; let's go over some of those components. >>Sure.
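As a hypothetical illustration of how the data elements Cindy lists here, failure history, error codes, usage and asset features, might come together in a failure-risk model, here is a minimal sketch; the column names, values and model choice are invented for illustration.

```python
# Hypothetical sketch: combine failure history, error codes and usage data
# into a simple failure-risk model. Column names and values are invented.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

assets = pd.DataFrame({
    "engine_hours":       [1200, 300, 4500, 800, 5200, 150],
    "error_codes_30d":    [3, 0, 9, 1, 12, 0],
    "days_since_service": [90, 10, 240, 45, 300, 5],
    "avg_operating_temp": [88.0, 72.5, 95.0, 80.1, 97.3, 70.0],
    "failed_within_30d":  [0, 0, 1, 0, 1, 0],   # label drawn from failure history
})

X = assets.drop(columns="failed_within_30d")
y = assets["failed_within_30d"]
model = GradientBoostingClassifier().fit(X, y)

# Score a hypothetical in-service asset: the predicted risk, not a fixed
# calendar interval, drives when maintenance is scheduled.
new_asset = pd.DataFrame([{
    "engine_hours": 4800, "error_codes_30d": 7,
    "days_since_service": 210, "avg_operating_temp": 93.5,
}])
print("failure probability:", model.predict_proba(new_asset)[0][1])
```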
So we want, what we want to do is combine that information with sensor data, whether it's a facility and equipment sensors, um, uh, or temperature and humidity, for example, all this stuff is then combined together, uh, and then use to develop machine learning models that better inform, uh, predictive maintenance, because we'll do need to keep, uh, to take into account the environmental factors that may cause additional wear and tear on the asset that we're monitoring. So here's some examples of private sector, uh, maintenance use cases that also have broad applicability across the government. For example, one of the busiest airports in Europe is running cloud era on Azure to capture secure and correlate sensor data collected from equipment within the airport, the people moving equipment more specifically, the escalators, the elevators, and the baggage carousels. >>The objective here is to prevent breakdowns and improve airport efficiency and passenger safety. Another example is a container shipping port. In this case, we use IOT data and machine learning, help customers recognize how their cargo handling equipment is performing in different weather conditions to understand how usage relates to failure rates and to detect anomalies and transport systems. These all improve for another example is Navistar Navistar, leading manufacturer of commercial trucks, buses, and military vehicles. Typically vehicle maintenance, as Cindy mentioned, is based on miles traveled or based on a schedule or a time since the last service. But these are only two of the thousands of data points that can signal the need for maintenance. And as it turns out, unscheduled maintenance and vehicle breakdowns account for a large share of the total cost for vehicle owner. So to help fleet owners move from a reactive approach to a more predictive model, Navistar built an IOT enabled remote diagnostics platform called on command. >>The platform brings in over 70 sensor data feeds for more than 375,000 connected vehicles. These include engine performance, trucks, speed, acceleration, cooling temperature, and break where this data is then correlated with other Navistar and third-party data sources, including weather geo location, vehicle usage, traffic warranty, and parts inventory information. So the platform then uses machine learning and advanced analytics to automatically detect problems early and predict maintenance requirements. So how does the fleet operator use this information? They can monitor truck health and performance from smartphones or tablets and prioritize needed repairs. Also, they can identify that the nearest service location that has the relevant parts, the train technicians and the available service space. So sort of wrapping up the, the benefits Navistar's helped fleet owners reduce maintenance by more than 30%. The same platform is also used to help school buses run safely. And on time, for example, one school district with 110 buses that travel over a million miles annually reduce the number of PTOs needed year over year, thanks to predictive insights delivered by this platform. >>So I'd like to take a moment and walk through the data. Life cycle is depicted in this diagram. So data ingest from the edge may include feeds from the factory floor or things like connected vehicles, whether they're trucks, aircraft, heavy equipment, cargo vessels, et cetera. Next, the data lands on a secure and governed data platform. 
Whereas combined with data from existing systems of record to provide additional insights, and this platform supports multiple analytic functions working together on the same data while maintaining strict security governance and control measures once processed the data is used to train machine learning models, which are then deployed into production, monitored, and retrained as needed to maintain accuracy. The process data is also typically placed in a data warehouse and use to support business intelligence, analytics, and dashboards. And in fact, this data lifecycle is representative of one of our government customers doing condition-based maintenance across a variety of aircraft. >>And the benefits they've discovered include less unscheduled maintenance and a reduction in mean man hours to repair increased maintenance efficiencies, improved aircraft availability, and the ability to avoid cascading component failures, which typically cost more in repair cost and downtime. Also, they're able to better forecast the requirements for replacement parts and consumables and last, and certainly very importantly, this leads to enhanced safety. This chart overlays the secure open source Cloudera platform used in support of the data life cycle. We've been discussing Cloudera data flow, the data ingest data movement and real time streaming data query capabilities. So data flow gives us the capability to bring data in from the asset of interest from the internet of things. While the data platform provides a secure governed data lake and visibility across the full machine learning life cycle eliminates silos and streamlines workflows across teams. The platform includes an integrated suite of secure analytic applications. And two that we're specifically calling out here are Cloudera machine learning, which supports the collaborative data science and machine learning environment, which facilitates machine learning and AI and the cloud era data warehouse, which supports the analytics and business intelligence, including those dashboards for leadership Cindy, over to you, Rick, >>Thank you. And I hope that, uh, Rick and I provided you some insights on how predictive maintenance condition-based maintenance is being used and can be used within your respective agency, bringing together, um, data sources that maybe you're having challenges with today. Uh, bringing that, uh, more real-time information in from a streaming perspective, blending that industrial IOT, as well as historical information together to help actually, uh, optimize maintenance and reduce costs within the, uh, each of your agencies, uh, to learn a little bit more about Cloudera, um, and our, what we're doing from a predictive maintenance please, uh, business@cloudera.com solutions slash public sector. And we look forward to scheduling a meeting with you, and on that, we appreciate your time today and thank you very much.
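As a very small, standard-library-only sketch of the "monitor and keep the model current" step in the lifecycle above, a real deployment would use the Cloudera tooling Rick describes, and the readings and threshold here are invented:

```python
# Hypothetical, standard-library-only sketch of the monitor step: learn a
# baseline from normal sensor readings, flag deviations, and keep the
# baseline current. Values and the threshold are invented.
from collections import deque
import statistics

baseline = deque(maxlen=500)           # recent readings considered "normal"

def score_reading(value, z_threshold=4.0):
    """Return True if a sensor reading deviates strongly from the learned baseline."""
    if len(baseline) < 30:             # still learning normal behavior
        baseline.append(value)
        return False
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline) or 1e-9
    is_anomaly = abs(value - mean) / stdev > z_threshold
    if not is_anomaly:
        baseline.append(value)         # keep the notion of "normal" up to date
    return is_anomaly

readings = [0.42, 0.45, 0.40] * 20 + [1.9]   # last value simulates abnormal vibration
print("anomalous readings:", [r for r in readings if score_reading(r)])
```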

Published Date : Aug 4 2021


Greg Tinker, SereneIT | CUBEConversation, November 2019


 

(upbeat music) >> Hi, and welcome to another CUBEConversation where we go in-depth into the topics that are most important to the technology industry with the thought leaders who are actually getting the work done. I'm Peter Burris, and we've got a great conversation today, and it all starts with the idea of how do you get smart people outside of your organization, in-service organizations to help you achieve your outcomes? It's a challenge because as we become more dependent upon services, we discover that service companies are often trying to sell us bills of goods or visions that aren't solving our exact problem. There's a new breed of service company that's really fascinated by your problem, and wants to sell it. Starts with engineering, starts with value add, and then leads to other types of potential relationships and activities. So what do those service companies look like? Well, to have that conversation, we've got Greg Tinker, who is the CTO and founder of Serene IT. Greg, welcome back to theCUBE. >> Thank you very much Peter, glad to be here. >> So tell us a little bit about Serene IT. >> So Serene IT is a, well we call it a next generation bar. So what do I mean by that? We mean that we are an engineering-first firm, so our staff is big, we're across the U.S., we have multiple branches and we just went international into Canada, with Serene IT Canada. We have other international branches that we coming online next year. So with that being said though, the key to our growth, the key to our success is the fact that we're an engineering firm first. We have very few sales staff. Our sales staff are more of an account management style, more of a nurturer or a farmer, we would call it, versus a hunter that means someone going out, because the customers are coming to us with their problems because they need a smart engineering bench to help them. They're not looking for somebody else's to bring them askew, or resell them a product. That can be easily done by some of the large conglomerates that are already out there, not to mention, spend 30 seconds on Google, you can pretty much buy anything you want. >> Yeah, and you know Fred Brookes said a million years ago, when I was, even before I got into computer science, wrote "The Mythical Man Month", and made the observation that the solution to a hard problem typically, is not more people, >> Right. >> It's working smarter, and working more with the right people. So tell a little about how you're able to find the right people from the industry, and bring them together to turn them into the right team. >> It's a great question, Peter, so I've been very fortunate. I loved my career at Hewlett Packard. I left on good terms because I saw a problem in the industry that I wanted to go and tackle head-on. It's easy for people to sit back and talk about it, it's more difficult to actually go and try to solve the problem, and I'm trying to solve the problem. The problem is, there's a lot of orders out there that bring very low value today, they bring a lot of resale. And that's great for those clients that just know what they want. The vast majority of customers don't know what they want today because the technologies are so advanced, they need help to get from where they were, a legacy model, to a more modern software-defined ecosystem. >> And the business problems are so complex. >> Yes. 
>> It's that combination of complex business problems, 'cause your competition and your customers are pushing you, and now advanced technologies that have to be marshaled to solve those problems. >> That's exactly right, so with that being said, I set out to build an engineering firm, and resale would be something later, but we sell through engineering consulting to solve those business problems for our clients. And so our engineering bench is comprised of engineers from Cisco, from Dell, from HPE, from a lot of big conglomerates that everybody knows. But when you work in this industry, in the labs of these big conglomerates, me coming from HPE, when you do that, you get a lot of friends across the pillars. >> Sure. >> You build networks. >> You build networks. And quite frankly, it's the Marvell lab guys, who today own QLogic. We all know each other, and with that being said, some of these guys want to go out and try to solve these big problems with companies like myself, and so with that being said, that's how we're building Serene IT, is engineering-first, and we have a very large technical bench today. Just think about it, the company came online in 2017 with just two, so today, we are significantly bigger than that. We're approaching a 50-plus headcount, and we continue to expand with multiple branches, and our growth rate almost doubles every six months. And it's something I'm having a great deal of fun doing. The key thing here though is solving business problems and helping customers. >> Well let's talk about that, because every IT organization faces the challenge that they've been so focused on the hardware assets for so long, or the application assets. Now they're trying to focus on the data assets, but they find themselves often in conflict with the business. They're not doing a particularly good job of translating a business opportunity into a technology solution still. >> True. >> You've got these great engineers. How are you getting them to also speak business, so that you facilitate that domain expertise about the business so it can be turned into a reliable technology solution? >> Like any good engineering firm, you have to have levels, right? So we have a NOC all the way to level four, and our level four engineers are our master technologists that are usually patent-published or some variation thereof, with usually a multitude of Master ASE certifications to be able to state the fact that they are level four. We also have some college kids that are coming up that are wanting to learn with us, which is good. But I want to tell you, on that same point, we only allow those elite, the level three, the level four guys, to be in front of our clients, because they've been in this industry a long time. Like myself, we can understand the business problems, as well as the technology problems, and help a client go from zero to hero. That's what we do well. >> So you're bringing in people who have been business people, but have strong engineering backgrounds >> Correct. >> In product domains, in service domains, in the industry, and you're bringing them together and saying, let's go back to being engineers, that can still talk business. >> That's exactly it, that's the key differentiator with us, is the fact that we're not talking just SEs; a lot of VARs, in our minds, have SEs they call engineers. We don't hire anyone that can't put fingers on a keyboard.
If they can't make magic happen on a keyboard, they're of no value to us, they're of no value to our clients, which is what they need help with. So if we're not able to sit down and have a conversation and pull out a laptop and make some magic happen with, name it, Ansible, Puppet, shell, SaltStack, that's just on the automation side; CodeLogics, C code, we've got all the cool stuff in that space. But if we can't sit down and write Python, Ruby on Rails and whatnot, and make something tangible to a client in very short order, we didn't do our job. >> So a lot of companies that I've experienced, a lot of customers I've talked to, have what I would call the "goldilocks" problem with their service providers. By that I mean, some of their service providers don't have the technical chops to just throw numbers at it, so they're too cold. Some of their service providers are too smart, or pushing too hard, and they get suspicious of them. How do you be that just right, stay focused on the problem, bringing the other team, the engineers or the IT folks that you're working with, along with you, so you get that natural technology transfer, so the business gets the capability that it can run and you can go do something else? >> So that's a good point, Peter. I mean, we're still working out some of those details, I'll tell you, to be honest with you on that stuff. >> Everybody is. >> Yeah. We're getting better at it, you know, customers. If we get too aggressive, and tell the customers this is what's wrong with your problem, this is where you need to go, we call their baby ugly, and it puts a lot of contention right at the onset, so it causes problems. So we have to be very cognizant of what they have, and where they want to go, and show them where we're going and why we're doing it, and not just focus on "You did it the wrong way". We don't want to focus on that. That's already done, that ship's already sailed, why bash it? I tell my engineers, don't talk negative, there's no good going to come of it. Focus on what you have, and where you need to go with it, and how we're going to get there. Keep it a positive message, and you'll find they'll be more receptive, and it's working for our team. >> Well I'll tell you, one of the things I've heard about Serene IT is that you guys have especially developed competencies in technologies that have worked in the past. >> You can say that. >> It seems as though one of the things you're able to do is you're able not to make something so new and so distinct that the client can't see how they can possibly operate it without you. You're taking a lot of open-source, a lot of established tried-and-true technologies and using your smarts to put them together in new and interesting ways so the customer says, "Oh that was smart, that was smart. "I can do that, oh yes, now I get it". Is that, am I mischaracterizing you guys? >> No, you're not, you're actually spot-on. We actually have one of the largest ZFS file systems on the planet right now with 142 million users hitting it and-- >> ZFS? >> Yeah, it's old school. >> With 142 million, okay. >> Yeah, it's old-school. But what's old is new again; we're just putting a new wrapper around it. It worked great in its day; that old technology, the file system itself, has been around for a long time, one of the biggest file systems at 128-bit. You take that file system and you put that on today's Red Hat, Caldera, SUSE, name your favorite.
You put that on a big machine, a Linux machine today, at large scale, like an HP DL380 with NVMe drives, with a back-end data store like a 3PAR or Primera, or name whatever you want on the back end, with a big Fibre Channel fabric, and you'd be surprised what we can do with that thing. So we're able to keep customers' costs down by showing them we can take an old-school technology and make it far bigger than you ever imagined, and give you more horsepower at less cost, and customers are really receptive to that. Now is that perfect for every footprint? No, that was a unique situation. Not everybody's got 142 million users. (chuckles) >> Well, that's true. And so let me build on that, because the other thing I hear from the CIOs I talk to, and senior IT people, and also business people, increasingly, is they want to make sure that the solution works now, but that it's not going to end-of-life their options. >> Yeah. >> How do you do this using tried-and-true technologies combined into new and interesting ways, in a way that still nonetheless gives customers future growth options or future application options? >> I'm not a fan of vendor lock-in, I'm not a fan of Franken-monsters. Our team of engineers, we have a mandate that they do not build anything like that, I won't approve it. Because I don't want to have a customer locked in to Serene IT. That was never the intent. We want them to choose us, we want them to come to our team and get our value, so we can show them how to grow their business, and do it in a nice, sustainable way, so we can show their staff how to support it. That takes us into our managed services component. Most of the big things we design and do, we're what we call an adaptive managed services, an AMS model. What do I mean by that statement? We're not a WITO. What's a WITO, you ask? It's a "Walk In, Take Over". That's the big boys, that's the DXCs of the world, that's the Accentures, that's what they do. And they do that well. We're not here to compete with that. But what we're here to do is say, to a company or business, whoever they might be, you probably don't need us to take over everything in your IT shop, and really, we're not going to be the best at that, nor are they in some cases, the other vendors. I'll tell you, you know your business the best. We know infrastructure the best, and we can show you where you can build your skillsets up and get better at it. We can automate a lot of it and show you how to manage the automation, and there'll be certain key points that maybe you guys don't want to own for various reasons, and we will manage just that key component, and we do that today with a lot of our big clients. >> Greg Tinker, CTO and founder of Serene IT, thanks very much for being on theCUBE. >> Thank you, Peter. >> And once again, I want to thank you for participating in this CUBEConversation. Until next time. (upbeat music)
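As a rough illustration of the "fingers on a keyboard" point above, here is a minimal Python sketch of the kind of quick, tangible automation an engineer might demo: it checks the health and fill level of a ZFS pool on a Linux host by calling the standard zpool CLI. The pool name, the 80 percent threshold, and the warning action are hypothetical placeholders, not anything Serene IT is known to ship.

```python
import subprocess

def zpool_health(pool: str = "tank") -> dict:
    """Collect basic health and capacity figures for one ZFS pool.

    Assumes ZFS on Linux is installed and the pool exists; uses the
    standard `zpool list` properties (health, size, allocated, capacity).
    """
    out = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health,size,allocated,capacity", pool],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    name, health, size, alloc, cap = out.split("\t")
    return {"pool": name, "health": health, "size": size,
            "allocated": alloc, "capacity_pct": int(cap.rstrip("%"))}

if __name__ == "__main__":
    status = zpool_health("tank")          # "tank" is a placeholder pool name
    print(status)
    if status["health"] != "ONLINE" or status["capacity_pct"] > 80:
        print("WARNING: pool needs attention")   # e.g. page the on-call engineer
```

The same few lines could be dropped into Ansible or a cron job; the point is only how little code a working, demonstrable check requires.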

Published Date : Nov 6 2019

SUMMARY :

Peter Burris talks with Greg Tinker, CTO and founder of Serene IT, about building an engineering-first, next-generation VAR. Tinker explains why he left Hewlett Packard to start the firm, how its bench is built from senior engineers out of Cisco, Dell, HPE and other vendors, and how it keeps customer costs down by combining tried-and-true technologies, such as a very large ZFS file system on modern Linux hardware, in new ways, backed by an adaptive managed services model that avoids vendor lock-in.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Greg Tinker | PERSON | 0.99+
Peter | PERSON | 0.99+
Greg | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
Cisco | ORGANIZATION | 0.99+
Canada | LOCATION | 0.99+
Hewlett Packard | ORGANIZATION | 0.99+
2017 | DATE | 0.99+
Dell | ORGANIZATION | 0.99+
Serene IT | ORGANIZATION | 0.99+
Fred Brookes | PERSON | 0.99+
HPE | ORGANIZATION | 0.99+
next year | DATE | 0.99+
Marvel | ORGANIZATION | 0.99+
today | DATE | 0.99+
128 bit | QUANTITY | 0.99+
Python | TITLE | 0.99+
30 seconds | QUANTITY | 0.99+
Q-Logic | ORGANIZATION | 0.99+
November 2019 | DATE | 0.99+
The Mythical Man Month | TITLE | 0.99+
142 million | QUANTITY | 0.99+
Ruby on Rails | TITLE | 0.98+
U.S. | LOCATION | 0.98+
two | QUANTITY | 0.98+
zero | QUANTITY | 0.98+
142 million users | QUANTITY | 0.97+
Google | ORGANIZATION | 0.97+
Linux | TITLE | 0.97+
one | QUANTITY | 0.96+
Ansible | ORGANIZATION | 0.95+
CUBEConversation | EVENT | 0.94+
first firm | QUANTITY | 0.91+
Saltstack | ORGANIZATION | 0.9+
level three | QUANTITY | 0.9+
a million years ago | DATE | 0.9+
50-plus headcount | QUANTITY | 0.87+
level four | QUANTITY | 0.86+
every six months | QUANTITY | 0.84+
Shell | ORGANIZATION | 0.82+
first | QUANTITY | 0.8+
Puppet | ORGANIZATION | 0.71+
Red Hat | TITLE | 0.7+
HPDL380 | COMMERCIAL_ITEM | 0.69+
Serene | ORGANIZATION | 0.68+
3PAR | OTHER | 0.68+
double | QUANTITY | 0.67+
SUSE | TITLE | 0.66+
WITO | ORGANIZATION | 0.63+
CodeLogics | TITLE | 0.55+
SereneIT | ORGANIZATION | 0.54+
Caldera | ORGANIZATION | 0.52+
CTO | PERSON | 0.49+
Primäre | TITLE | 0.49+
NVME | TITLE | 0.43+
Franken | ORGANIZATION | 0.39+

DD, Cisco + Han Yang, Cisco | theCUBE NYC 2018


 

>> Live from New York, it's theCUBE! Covering theCUBE, New York City 2018. Brought to you by SiliconANGLE Media and its Ecosystem partners. >> Welcome back to the live CUBE coverage here in New York City for CUBE NYC, #CubeNYC. This coverage of all things data, all things cloud, all things machine learning here in the big data realm. I'm John Furrier and Dave Vellante. We've got two great guests from Cisco. We've got DD, who is the Vice President of Data Center Marketing at Cisco, and Han Yang, who is the Senior Product Manager at Cisco. Guys, welcome to the Cube. Thanks for coming on again. >> Good to see ya. >> Thanks for having us. >> So obviously one of the things that has come up this year at the Big Data Show, which used to be called Hadoop World, then Strata Data, and now the latest name. And obviously CUBE NYC, we changed from Big Data NYC to CUBE NYC, because there's a lot more going on. I heard hallway conversations around blockchain, cryptocurrency, Kubernetes has been said on theCUBE already at least a dozen times here today, multicloud. So you're seeing the analytical world trying to be, in a way, brought into the dynamics around IT infrastructure operations, both cloud and on premises. So interesting dynamics this year, almost a dev ops kind of culture to analytics. This is a new kind of sign from this community. Your thoughts? >> Absolutely, I think data and analytics is one of those things that's pervasive. Every industry, it doesn't matter. Even at Cisco, I know we're going to talk a little more about the new AI and ML workload, but for the last few years, we've been using AI and ML techniques to improve networking, to improve security, to improve collaboration. So it's everywhere. >> You mean internally, in your own IT? >> Internally, yeah. Not just in IT, in the way we're designing our network equipment. We're storing data that's flowing through the data center, flowing in and out of clouds, and using that data to make better predictions for better networking application performance, security, what have you. >> The first topic I want to talk to you guys about is around the data center. Obviously, you do data center marketing, that's where all the action is. The cloud, obviously, has been all the buzz, people going to the cloud, but Andy Jassy's announcement at VMworld really is a validation that we're seeing, for the first time, hybrid multicloud validated. Amazon announced RDS on VMware on-premises. >> That's right. This is the first time Amazon's ever done anything of this magnitude on-premises. So this is a signal from the customers voting with their wallet that on-premises is a dynamic. The data center is where the data is, that's where the main footprint of IT is. This is important. What's the impact of that dynamic, of data center, where the data is, with the option of a cloud? How does that impact data, machine learning, and the things that you guys see as relevant? >> I'll start and Han, feel free to chime in here. So I think those boundaries between this is a data center, and this is a cloud, and this is campus, and this is the edge, I think those boundaries are going away. Like you said, data center is where the data is. And it's the ability of our customers to be able to capture that data, process it, curate it, and use it for insight to make decisions locally. A drone is a data center that flies, and a boat is a data center that floats, right? >> And a cloud is a data center that no one sees. >> That's right. So those boundaries are going away.
We at Cisco see this as a continuum. It's the edge cloud continuum. The edge is exploding, right? There's just more and more devices, and those devices are cranking out more data than ever before. Like I said, it's the ability of our customers to harness the data to make more meaningful decisions. So Cisco's take on this is the new architectural approach. It starts with the network, because the network is the one piece that connects everything: every device, every edge, every individual, every cloud. There's a lot of data within the network which we're using to make better decisions. >> I've been pretty close with Cisco over the years, since the '95 timeframe. I've had hundreds of meetings, some technical, some kind of business. But I've heard that term, the edge of the network, many times over the years. This is not a new concept at Cisco. Edge of the network actually means something in Cisco parlance. The edge of the network... >> Yeah. >> ...that the packets are moving around. So again, this is not a new idea at Cisco. It's just materialized itself in a new way. >> It's not, but what's happening is the edge is just now generating so much data, and if you can use that data, convert it into insight and make decisions, that's the exciting thing. And that's why this whole thing about machine learning and artificial intelligence, it's the data that's being generated by these cameras, these sensors. So that's what is really, really interesting. >> Go ahead, please. >> One of our own studies pointed out that by 2021, there will be 847 zettabytes of information out there, but only 1.3 zettabytes will actually ever make it back to the data center. That just means an opportunity for analytics at the edge to make sense of that information before it ever makes it home. >> What were those numbers again? >> I think it was like 847 zettabytes of information. >> And how much makes it back? >> About 1.3. >> Yeah, there you go. So- >> So a huge compression- >> That confirms your research, Dave. >> We've been saying for a while now that most of the data is going to stay at the edge. There's no reason to move it back. The economics don't support it, the latency doesn't make sense. >> The network cost alone is going to kill you. >> That's right. >> I think you really want to collect it, you want to clean it, and you want to correlate it before ever sending it back. Otherwise, you're sending a lot of useless information: "the status is wonderful." Well, that's not very valuable. And 99.9 percent of the time, "things are going well." >> Temperature hasn't changed. (laughs) >> If it really goes wrong, that's when you want to alert or send more information. How did it go bad? Why did it go bad? Those are the more insightful things that you want to send back. >> This is not just for IoT. I mean, cat pictures moving between campuses cost money too, so why not just keep them local, right? But it's the basic concepts of networking. This is what I want to get to in my point, too. You guys have some new announcements around UCS and some of the hardware and the gear and the software. What are some of the new announcements that you're announcing here in New York, and what does it mean for customers? Because they want to know not only speeds and feeds. It's a software-driven world. How does the software relate? How does the gear work? What's the management look like? Where's the control plane? Where's the management plane? Give us all the data. >> I think the biggest issue starts from this.
Data scientists, their task is to explore different data sources, find out the value. But at the same time, IT is somewhat lagging behind. Because as the data scientists go from data source A to data source B, it could be 3 petabytes of difference. IT is like, 3 petabytes? That's only from Monday through Wednesday? That's a huge infrastructure requirement change. So Cisco's way to help the customer is to make sure that we're able to come out with blueprints. Blueprints enabling the IT team to scale, so that the data scientists can work beyond their own laptop. As they work through the petabytes of data that's coming in from all these different sources, they're able to collaborate well together and make sense of that information. Only by scaling, with IT helping the data scientists work at scale, can they succeed. So that's why we announced a new server. It's called the C480 ML. It happens to have 8 GPUs from Nvidia inside, helping customers that want those deep learning kinds of capabilities. >> What are some of the use cases on these products? It's got some new data capabilities. What are some of the impacts? >> Some of the things that Han just mentioned. For me, I think the biggest differentiation in our solution is the things that we put around the box. So the management layer, right? I mean, this is not going to be one server in one data center. It's going to be multiple of them. You're never going to have one data center. You're going to have multiple data centers. And we've got a really cool management tool called Intersight, and this is supported in Intersight, day one. And Intersight also uses machine learning techniques to look at data from multiple data centers. And that's really where the innovation is. Honestly, I think every vendor is bending sheet metal around the latest chipset, and we've done the same. But the real differentiation is how we manage it, how we use the data for more meaningful insight. I think that's where some of our magic is. >> Can you add some color to that, in terms of infrastructure for AI and ML, how is it different than traditional infrastructures? So is the management different? The sheet metal is not different, you're saying. But what are some of those nuances that we should understand? >> I think especially for deep learning, multiple scientists around the world have pointed out that if you're able to use GPUs, they're able to run the deep learning frameworks faster by roughly two orders of magnitude. So that's part of the reason why, from an infrastructure perspective, we want to bring in the GPUs. But for the IT teams, we didn't want them to just add yet another infrastructure silo just to support AI or ML. Therefore, we wanted to make sure it fits in with a UCS-managed unified architecture, enabling the IT team to scale but without adding more infrastructure and silos just for that new workload. Having that unified architecture helps IT be more efficient and, at the same time, better support the data scientists. >> The other thing I would add is, again, the things around the box. Look, this industry is still pretty nascent. There are lots of start-ups, there are lots of different solutions, and when we build a server like this, we don't just build a server and toss it over the fence to the customer and say "figure it out." No, we've done validated design guides. With Google, with some of the leading vendors in the space, to make sure that everything works as we say it would.
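As an aside on the GPU point above: the sketch below, in Python, shows the kind of sanity check a data scientist might run on a GPU server like the one described, assuming TensorFlow 2.x is installed. It simply lists the visible GPUs and runs a tiny training step on random data; the model and data are placeholders, and nothing here is a Cisco-specific tool.

```python
import tensorflow as tf

# List the GPUs the framework can see; on an 8-GPU box this should report eight devices.
gpus = tf.config.list_physical_devices("GPU")
print(f"Visible GPUs: {len(gpus)}")

# Tiny sanity-check training run: a one-layer classifier on random data.
x = tf.random.normal((1024, 32))
y = tf.random.uniform((1024,), maxval=2, dtype=tf.int32)

model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="softmax")])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=1, batch_size=128, verbose=2)
```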
And so it's all of those integrations, those partnerships, all the way through our systems integrators, to really understand a customer's AI and ML environment and fine-tune it for the environment. >> So is that really where a lot of the innovation comes from? Doing that hard work to say, "yes, it's going to be a solution that's going to work in this environment. Here's what you have to do to ensure best practice," etc.? Is that right? >> So I think some of our blueprints or validated designs are basically enabling the IT team to scale. Scale their storage, scale their CPU, scale their GPU, and scale their network. But do it in a way so that we work with partners like Hortonworks or Cloudera, so that they're able to take advantage of the data lake. And adding in the GPU so they're able to do the deep learning with TensorFlow, with PyTorch, or whatever curated deep learning framework the data scientists need to be able to get value out of those multiple data sources. These are the kinds of solutions that we're putting together, making sure our customers are able to get to that business outcome sooner and faster, not just a-- >> Right, so there's innovation at all altitudes. There's the hardware, there's the integrations, there's the management. So it's innovation. >> So not to go too much into the weeds, but I'm curious. As you introduce these alternate processing units, what is the relationship between traditional CPUs and these GPUs? Are you managing them differently, kind of communicating somehow, or are they sort of fenced off architecturally? I wonder if you could describe that. >> We actually want it to be integrated, because by having it separated and fenced off, well, that's an IT infrastructure silo. You're not going to have the same security policy or the storage mechanisms. We want it to be unified so it's easier on IT teams to support the data scientists. So therefore, the latest software is able to manage both CPUs and GPUs, as well as having a new file system. Those are the solutions that we're putting forth, so that our IT folks can scale and our data scientists can succeed. >> So IT's managing a logical block. >> That's right. And even for things like inventory management, or going back and adding patches in the event of some security event, it's so much better to have one integrated system rather than silos of management, which we see in the industry. >> So the hard news is basically UCS for AI and ML workloads? >> That's right. This is our first server custom-built from the ground up to support these deep learning, machine learning workloads. We partnered with Nvidia, with Google. We announced it earlier this week, and the phone is ringing constantly. >> I don't want to say godbot. I just said it. (laughs) This is basically the power tool for deep learning. >> Absolutely. >> That's how you guys see it. Well, great. Thanks for coming out. Appreciate it, good to see you guys at Cisco. Again, deep learning dedicated technology around the box, not just the box itself. Ecosystem, Nvidia, good call. Those guys really get the hot GPUs out there. Saw those guys last night, great success they're having. They're a key partner with you guys. >> Absolutely. >> Who else is partnering, real quick before we end the segment? >> We've been partnering on the software side; we partner with folks like Anaconda, with their Anaconda Enterprise, which data scientists love to use as their Python data science framework.
We're working with Google, with their Kubeflow, which is an open source project integrating TensorFlow on top of Kubernetes. And of course we've been working with folks like Caldera as well as Hortonworks to access the data lake from a big data perspective. >> Yeah, I know you guys didn't get a lot of credit. Google Cloud, we were certainly amplifying it. You guys were co-developing the Google Cloud servers with Google. I know they were announcing it, and you guys had Chuck on stage there with Diane Greene, so it was pretty positive. Good integration with Google can make a >> Absolutely. >> Thanks for coming on theCUBE, thanks, we appreciate the commentary. Cisco here on theCUBE. We're in New York City for theCUBE NYC. This is where the world of data is converging with IT infrastructure, developers, operators, all running analytics for future business. We'll be back with more coverage after this short break. (upbeat digital music)
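Going back to the edge discussion earlier in this interview, the "collect it, clean it, correlate it before ever sending it back" idea can be sketched in a few lines of Python. The window size, threshold, and sensor values below are hypothetical; the point is only that routine readings stay local and only the anomalous, insightful ones cross the network.

```python
from collections import deque
from statistics import mean, stdev

def forward_anomalies(readings, window=50, z_threshold=3.0):
    """Yield only readings that deviate sharply from the recent baseline;
    routine values stay at the edge instead of crossing the network."""
    recent = deque(maxlen=window)
    for value in readings:
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            deviates = (value != mu) if sigma == 0 else abs(value - mu) / sigma > z_threshold
            if deviates:
                yield value  # the insightful event worth sending back
        recent.append(value)

# A flat temperature signal with one spike: only the spike is forwarded.
signal = [21.0] * 100 + [85.0] + [21.0] * 100
print(list(forward_anomalies(signal)))   # -> [85.0]
```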

Published Date : Sep 12 2018

SUMMARY :

John Furrier and Dave Vellante talk with DD, Vice President of Data Center Marketing at Cisco, and Han Yang, Senior Product Manager at Cisco, at theCUBE NYC 2018. They discuss analytics at the edge, why most data generated there will never make it back to the data center, and Cisco's new UCS C480 ML server for AI and ML workloads, with eight Nvidia GPUs, managed through Intersight and validated with partners such as Google, Hortonworks, Cloudera, and Anaconda.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Vellante | PERSON | 0.99+
Nvidia | ORGANIZATION | 0.99+
Cisco | ORGANIZATION | 0.99+
Han Yang | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
New York | LOCATION | 0.99+
Diane Greene | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
2021 | DATE | 0.99+
New York City | LOCATION | 0.99+
Andy Jassy | PERSON | 0.99+
8 GPUs | QUANTITY | 0.99+
847 zettabytes | QUANTITY | 0.99+
John Furrier | PERSON | 0.99+
99.9 percent | QUANTITY | 0.99+
Monday | DATE | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
3 petabytes | QUANTITY | 0.99+
Anaconda | ORGANIZATION | 0.99+
Wednesday | DATE | 0.99+
DD | PERSON | 0.99+
first time | QUANTITY | 0.99+
one server | QUANTITY | 0.99+
Cloudera | ORGANIZATION | 0.99+
Python | TITLE | 0.99+
first topic | QUANTITY | 0.99+
one piece | QUANTITY | 0.99+
VMworld | ORGANIZATION | 0.99+
'95 | DATE | 0.98+
1.3 zettabytes | QUANTITY | 0.98+
NYC | LOCATION | 0.98+
both | QUANTITY | 0.98+
one | QUANTITY | 0.98+
this year | DATE | 0.98+
Big Data Show | EVENT | 0.98+
Caldera | ORGANIZATION | 0.98+
two waters | QUANTITY | 0.97+
today | DATE | 0.97+
Chuck | PERSON | 0.97+
One | QUANTITY | 0.97+
Big Data | ORGANIZATION | 0.97+
earlier this week | DATE | 0.97+
Intersight | ORGANIZATION | 0.97+
hundreds of meetings | QUANTITY | 0.97+
CUBE | ORGANIZATION | 0.97+
first server | QUANTITY | 0.97+
last night | DATE | 0.95+
one data center | QUANTITY | 0.94+
UCS | ORGANIZATION | 0.92+
petabytes | QUANTITY | 0.92+
two great guests | QUANTITY | 0.9+
Tensorflow | TITLE | 0.86+
CUBE NYC | ORGANIZATION | 0.86+
Han | PERSON | 0.85+
#CubeNYC | LOCATION | 0.83+
Strata Data | ORGANIZATION | 0.83+
Kubeflow | TITLE | 0.82+
Hadoop World | ORGANIZATION | 0.81+
2018 | DATE | 0.8+