David C King, FogHorn Systems | CUBEConversation, November 2018
(uplifting orchestral music) >> Hey, welcome back, everybody. Jeff Frick here with theCUBE. We're at the Palo Alto studios, having theCUBE Conversation, a little break in the action of the conference season before things heat up, before we kind of come to the close of 2018. It's been quite a year. But it's nice to be back in the studio. Things are a little bit less crazy, and we're excited to talk about one of the really hot topics right now, which is edge computing, fog computing, cloud computing. What do all these things mean, how do they all intersect, and we've got with us today David King. He's the CEO of FogHorn Systems. David, first off, welcome. >> Thank you, Jeff. >> So, FogHorn Systems, I guess by the fog, you guys are all about the fog, and for those that don't know, fog is kind of this intersection between cloud, and on prem, and... So first off, give us a little bit of the background of the company and then let's jump into what this fog thing is all about. >> Sure, actually, it all dovetails together. So yeah, you're right, FogHorn, the name itself, came from Cisco's invented term, called fog computing, from almost a decade ago, and it connoted this idea of computing at the edge, but didn't really have a lot of definition early on. And so, FogHorn was started actually by a Palo Alto incubator, just nearby here, that had the idea that hey, we've got to put some real meaning and some real meat on the bones here, with fog computing. And what we think FogHorn has become over the last three and a half years, since we took it out of the incubator, since I joined, was to put some real purpose, meaning, and value in that term. And so, it's more than just edge computing. Edge computing is a related term. In the industrial world, people would say, hey, I've had edge computing for 30, 40, 50 years with my production line control and also my distributed control systems. I've got hard-wired compute. I run, they call them, industrial PCs in the factory. That's edge compute. The IT folks came along and said, no, no, no, fog compute is a more advanced form of it. Well, the real purpose of fog computing and edge computing, in our view, in the modern world, is to apply what has traditionally been thought of as cloud computing functions, big, big data, but running in an industrial environment, or running on a machine. And so, we call it really big data operating in the world's smallest footprint, okay, and the real point of this for industrial customers, which is our primary focus, industrial IoT, is to deliver as much analytic, machine learning, deep learning, AI capability on live-streaming sensor data, okay, and what that means is rather than persisting a lot of data either on prem, and then sending it to the cloud, or trying to stream all this to the cloud to make sense of terabytes or petabytes a day, per machine sometimes, right, think about a jet engine, a petabyte every flight. You want to do the compute as close to the source as possible, and if possible, on the live streaming data, not after you've persisted it on a big storage system. So that's the idea. >> So you touch on all kinds of stuff there. So we'll break it down. >> Unpack it, yeah. >> Unpack it. So first off, just kind of the OT/IT thing, and I think that's really important, and we talked before turning the cameras on about Dr. Tom from HP, he loves to make a big symbolic handshake of the operations technology, >> One of our partners.
>> Right, and IT, and the marriage of these two things, where before, as you said, the OT guys, the guys that have been running factories, you know, they've been doing this for a long time, and now suddenly, the IT folks are butting in and want to get access to that data to provide more control. So, you know, as you see the marriage of those two things coming together, what are the biggest points of friction, and really, what's the biggest opportunity? >> Great set of questions. So, quite right, the OT folks are inherently suspicious of IT, right? I mean, if you don't know the history, 40 plus years ago, there was a fork in the road, where in factory operations, were they going to embrace things like ethernet, the internet, connected systems? In fact, they purposely air-gapped an island of those systems, because it was all about machine control, real-time, for safety, productivity, and uptime of the machine. They don't want any of that; you can't use kind of standard ethernet, it has to be industrial ethernet, right? It has to be time-bound and deterministic. It can't be a retry kind of a system, right? So, a different MAC layer for a reason, for example. What did the physical wiring look like? It's also different cabling, because you can't have cuts, jumps in the cable, right? So it's a different environment entirely that OT grew up in, and so, FogHorn is trying to really bring the value of what people are delivering for AI, essentially, into that environment in a way that's non-threatening to, is supplemental to, and adds value in the OT world. So Dr. Tom is right, this idea of bringing IT and OT together is inherently challenging, because these were kind of a fork in the road, islanded networks, if you will, different systems, different nomenclature, different protocols, and so, there's a real education curve that IT companies are going through, and the idea of taking all this OT data that's already been produced in tremendous volumes, already, before you add new kinds of sensing, and sending it across a LAN, which it's never talked to before, then across a WAN to go to a cloud, to get some insight, doesn't make any sense, right? So you want to leverage the cloud, you want to leverage data centers, you want to leverage the LAN, you want to leverage 5G, you want to leverage all the new IT technologies, but you have to do it in a way that makes sense for, and adds value in, the OT context. >> I'm just curious, you talked about the air gapping, the two systems, which means they are not connected, right? >> No, they're connected, but to themselves, in the industrial-- >> Right, right, but before, the OT system was air gapped from the IT system, so thinking about security and those types of threats, now, if those things are connected, that security measure has gone away, so what is the excitement, adoption scare when now, suddenly, these things that were separate, especially in the age of breaches that we know happen all the time, as you bring those things together? >> Well, in fact, there have been cyber breaches in the OT context. Think about Stuxnet, think about things that have happened, think about the utility hacks where malware was found implanted. And so, this idea of industrial IoT is very exciting, the ability to get real-time, kind of game-changing insights about your production. A huge amount of economic activity in the world could be dramatically improved.
You can talk about trillions of dollars of value, which McKinsey, and BCG, and Bain talk about, right, by bringing kind of AI, ML into the plant environment. But the inherent problem is that by connecting the systems, you introduce security problems. You're talking about a huge amount of cost to move this data around, persist it, and then add value, and it's not real-time, right? So, it's not that cloud is not relevant, it's not that it's not used, it's that you want to do the compute where it makes sense, and for industrial, the more industrialized the environment, the more high frequency, high volume data, the closer to the system that you can do the compute, the better, and again, it's multi-layered compute. You probably have something on the machine, something in the plant, and something in the cloud, right? But rather than send raw OT data to the cloud, you're going to send processed intelligent metadata insights that have already been derived at the edge, and update what they call the fleet-wide digital twin, right? The digital twin for that whole fleet of assets should sit in the cloud, but the digital twin of the specific asset should probably be on the asset. >> So let's break that down a little bit. There's so much good stuff here. So, we talked about OT/IT and that marriage. Next, I just want to touch on cloud, 'cause a lot of people know cloud, it's very hot right now, and the ultimate promise of cloud, right, is you have infinite capacity >> Right, infinite compute. >> Available on demand, and you have infinite compute, and hopefully you have some big fat pipes to get your stuff in and out. But the OT challenge is, and as you said, the device challenge is very, very different. They've got proprietary operating systems, they've been running for a very, very long time. As you said, they put off boatloads, and boatloads, and boatloads of data that was never really designed to feed necessarily a machine learning algorithm, or an artificial intelligence algorithm when these things were designed. It wasn't really part of the equation. And we talk all the time about, you know, do you move the compute to the data, do you move the data to the compute, and really, what you're talking about in this fog computing world is kind of a hybrid, if you will, of trying to figure out which data you want to process locally, and then which data you have time, relevance, and other factors that just go ahead and pump it upstream. >> Right, that's a great way to describe it. Actually, we're trying to move as much of the compute as possible to the data. That's really the point of, that's why we say fog computing is a nebulous term about edge compute. It doesn't have any value until you actually decide what you're trying to do with it, and what we're trying to do is to take as much of the harder compute challenges, like analytics, machine learning, deep learning, AI, and bring it down to the source, as close to the source as you can, because you can essentially streamline or make more efficient every layer of the stack. Your models will get much better, right? You might have built them in the cloud initially, think about a deep learning model, but it may only be 60, 70% accurate. How do you do the improvement of the model to get it closer to perfect? I can't go send all the data up to keep trying to improve it. Well, typically, what happens is I downsample the data, I average it and I send it up, and I don't see any changes in the average data. Guess what? What we should do is inference all the time, on all the data: run it in our stack, and then send the metadata up, and then have the cloud look across all the assets of a similar type, and say, oh, the global fleet-wide model needs to be updated, and then push it down.
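To make that pattern concrete, inference on every raw reading at the edge with only derived metadata sent upstream, here is a minimal Python sketch. It is an illustration only, not FogHorn's stack; the sensor name, threshold, and event format are hypothetical.

```python
from collections import deque
from statistics import mean

WINDOW = 1000          # raw readings kept locally, never sent upstream
VIBRATION_LIMIT = 4.2  # hypothetical alarm threshold (mm/s RMS)

window = deque(maxlen=WINDOW)  # bounded buffer: constant memory at the edge

def on_reading(timestamp, vibration_mm_s):
    """Called for every raw sensor reading; runs on the edge device."""
    window.append(vibration_mm_s)
    if vibration_mm_s > VIBRATION_LIMIT:
        # Only this small metadata event crosses the network,
        # not the thousands of raw readings behind it.
        return {
            "event": "vibration_exceedance",
            "ts": timestamp,
            "value": vibration_mm_s,
            "window_mean": mean(window),
        }
    return None  # nothing worth sending upstream

# Example: 10,000 raw readings produce only a handful of small events.
events = [e for t in range(10_000)
          if (e := on_reading(t, 4.0 + (0.5 if t % 997 == 0 else 0.0)))]
print(f"{len(events)} events sent upstream out of 10,000 raw readings")
```

The design choice is the same one described above: the raw stream stays (and is eventually discarded) at the source, while the insights update the fleet-wide view.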
So, with Google just about a month ago, in Barcelona, at the IoT show, what we demonstrated was the world's first instance of AI for industrial, which is closed-loop machine learning. We were taking a model, a TensorFlow model, trained in the cloud in the data center, bringing it into our stack, performing 100% inferencing on all the live data, pushing the insights back up into Google Cloud, and then automatically updating the model without a human or data scientist having to look at it. Because essentially, it's ML on ML. And that to us, ML on ML, is the foundation of AI for industrial.
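The closed-loop demo described here used a TensorFlow model and Google Cloud; the sketch below substitutes a toy model so the loop itself is visible. Everything in it (the drift threshold, the stand-in model, the retrain function) is a labeled assumption showing the shape of "ML on ML", not the production system.

```python
import random

class EdgeModel:
    """Toy stand-in for a model trained in the cloud (the real demo used TensorFlow)."""
    def __init__(self, center):
        self.center = center

    def predict(self, x):
        return self.center  # predicts the expected operating value

def drift_score(model, readings):
    """Mean absolute error between predictions and live data."""
    return sum(abs(model.predict(x) - x) for x in readings) / len(readings)

def cloud_retrain(readings):
    """Stand-in for fleet-wide retraining in the cloud; returns an updated model."""
    return EdgeModel(center=sum(readings) / len(readings))

model = EdgeModel(center=50.0)
DRIFT_LIMIT = 2.0  # hypothetical

for batch in range(5):
    # 1. Inference on ALL live data at the edge (no raw data leaves the site).
    live = [52.0 + batch + random.random() for _ in range(100)]
    score = drift_score(model, live)
    print(f"batch {batch}: drift={score:.2f}")
    # 2. Only the drift metric (metadata) goes up; if the fleet-wide model
    #    looks stale, the cloud pushes a refreshed model back down,
    #    with no human or data scientist in the loop ("ML on ML").
    if score > DRIFT_LIMIT:
        model = cloud_retrain(live)
        print(f"  model updated, new center={model.center:.1f}")
```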
>> I just love that; something that comes up all the time, right? We used to make decisions based on a sampling of historical data, after the fact. >> That's right, that's how we've all been doing it. >> Now, right, right now, the promise of streaming is you can make it based on all the data, >> All the time. >> All the time, in real time. >> Permanently. >> This is a very different thing. So, but as you talked about, you know, running some complex models, and running ML, and retraining these things. You know, when you think of edge, you think of some little hockey puck that's out on the edge of a field, with limited power, limited connectivity, so you know, what's the reality of, how much power do you have at some of these more remote edges, or, we always talk about the field of turbines, oil platforms, how much power do you need, and how much compute, before it actually starts to be meaningful in terms of the platform for the software? >> Right, there's definitely use cases, like you think about the smart meters, right, in the home. The older generation of those meters may have had very limited compute, right, like you know, talking about a single megabyte of memory maybe, or less, right, kilobytes of memory. Very hard to run a stack on that kind of footprint. The latest generation of smart meters have about 250 megabytes of memory. A Raspberry Pi today is anywhere from half a gig to a gig of memory, and we're fundamentally memory-bound, and obviously CPU-bound if it's trying to do really fast compute, like vibration analysis, or acoustic, or video. But if you're just trying to take digital sensing data, like temperature, pressure, velocity, torque, we can take humidity, we can take all of that, believe it or not, run literally dozens and dozens of models, even train the models, in something as small as a Raspberry Pi, or a low-end x86. So our stack can run on any hardware, we're completely OS-independent. It's a pure software layer. But the whole stack is about 100 megabytes of memory, with all the components, including Docker containerization, right, which compares to about 10 gigs for running a stream-processing stack like Spark in the cloud. So it's that order of magnitude of footprint reduction and speed of execution improvement. So as I said, the world's smallest, fastest compute engine. You need to do that if you're going to talk about, like, a wind turbine; it's generating data, right, every millisecond, right. So you have high frequency data, like turbine pitch, and you have other contextual data you're trying to bring in, like wind conditions, reference information about how the turbine is supposed to operate. You're bringing in a torrential amount of data to do this computation on the fly. And so, the challenge for a lot of the companies that have really started to move into the space, the cloud companies, like our partners, Google, and Amazon, and Microsoft, is they have great cloud capabilities for AI, ML. They're trying to move down to the edge by just transporting the whole stack down there. So in a plant environment, okay, that might work if you have massive data centers that can run it. Now I've still got to stream all the data from all of my assets to that central point. What we're trying to do is come at it the opposite way, which is, by having the world's smallest, fastest engine, we can run it in a small compute, very limited compute, on the asset or near the asset, or you can run it in a big compute, and we can take on lots and lots of use cases for models simultaneously. >> I'm just curious on the small compute case, and again, you want all the data-- >> You want to inference on everything, right? >> Does it eventually go back, or is there a lot of cases where you can get the information you need off the stream and you don't necessarily have to save or send that upstream? >> So fundamentally today, in the OT world, the PLC, the programmable logic controller, has simple KPIs: if temperature goes to X or pressure goes to Y, do this. If nothing is executed off those simple KPIs, the data gets dumped into a local protocol server, and then about every 30, 60, 90 days, it gets written over. Nobody ever looks at it, right? That's why I say, 99% of the brownfield data in OT has never really been-- >> Almost like a security-- >> Has never been mined for insight. Right, it just gets-- >> It runs, and runs, and runs, and every so often-- >> Exactly, and so, if you're doing inferencing, and doing real-time decision making, real-time actuation with our stack, what you would then persist is metadata insights, right? Here is an event, or here is an outcome, and oh, by the way, if you're doing deep learning or machine learning, and you're seeing deviation or drift from the model's prediction, you probably want to keep that, and some of the raw data packets from that moment in time, and send that to the cloud or data center to say, oh, our fleet-wide model may not be accurate, or may be drifting, right? And so, what you want to do, again, is different horses for different courses. Use our stack to do the lion's share of the heavy-duty real-time compute, and produce metadata that you can send to either a data center or a cloud environment for further learning. >> Right, so your piece is really the gathering and the ML, and then if it needs to go back out for more heavy lifting, you'll send it back up, or do you have the cloud application as well that connects if you need? >> Yeah, so we build connectors to, you know, Google Cloud Platform, Google IoT Core, to AWS S3, to Microsoft Azure, virtually anything: Kafka, Hadoop. We can send the data wherever you want, either on plant, right back into the existing control systems, or we can send it to OSIsoft PI, which is a great time series database that a lot of process industries use. You could of course send it to any public cloud or a Hadoop data lake private cloud. You can send the data wherever you want. Now, we also have, one of our components is a time series database. You can also persist it in memory in our stack, just for buffering, or if you have high-value data, you can take a measurement, a value from a previous calculation, and bring it into another calculation later, right, so it's a very flexible system.
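The in-memory buffering just mentioned, holding a value from a previous calculation so a later calculation can reuse it, can be sketched as a small per-stream ring buffer. This is illustrative only; the class and its API are invented for the example and are not FogHorn's time series component.

```python
import time
from collections import deque

class StreamBuffer:
    """Hypothetical in-memory time-series buffer: bounded per-stream history
    so one calculation can reuse a value a previous calculation produced."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.streams = {}

    def append(self, stream, value, ts=None):
        buf = self.streams.setdefault(stream, deque(maxlen=self.capacity))
        buf.append((ts if ts is not None else time.time(), value))

    def last(self, stream):
        buf = self.streams.get(stream)
        return buf[-1][1] if buf else None

buf = StreamBuffer()

# First calculation: derive and buffer a baseline from raw temperature.
for t, reading in enumerate([70.1, 70.3, 70.2, 70.4]):
    buf.append("temp_raw", reading, ts=t)
buf.append("temp_baseline", 70.25, ts=4)  # result of an earlier computation

# Later calculation: reuse the buffered baseline instead of recomputing it.
deviation = buf.last("temp_raw") - buf.last("temp_baseline")
print(f"deviation from buffered baseline: {deviation:+.2f}")
```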
>> Yeah, we were at OSIsoft PI World earlier this year. Some fascinating stories that came out of-- >> A 30-year company. >> The building maintenance, and all kinds of stuff. So I'm just curious, some of the easy-to-understand applications that you've seen in the field, and maybe some of the ones that were a surprise on the OT side. I mean, obviously, preventative maintenance is always towards the top of the list. >> Yeah, I call it the layer cake, right? Especially when you get to remote assets that are either not monitored or lightly monitored. They call it drive-by monitoring. Somebody shows up and listens or looks at a valve or gauge and leaves. The first layer is condition-based monitoring, right? That is actually a big breakthrough for some, you know, think about fracking sites, or remote oil fields, or mining sites. The second layer is predictive maintenance, and the next generation of that is kind of predictive, prescriptive, even preventive maintenance, right? You're making predictions or you're helping to avoid downtime. The third layer, which is really where our stack is sort of unique today in delivering, is asset performance optimization. How do I increase throughput, how do I reduce scrap, how do I improve worker safety, how do I get better processing of the data that my PLC can't give me, so I can actually improve the performance of the machine? Now, ultimately, what we're finding is a couple of things. One is, you can look at individual asset optimization, process optimization, but there's another layer. So often, we're deployed at two layers on premise. There's also the plant-wide optimization. We talked about wind farms before, off camera. So you've got the wind turbine. You can do a lot of things about turbine health, the blade pitch and condition of the blade, you can do things on the battery, all the systems on the turbine, but you also need a stack running, like ours, at that concentration point where 200-plus turbines come together, because for the optimization of the whole farm, every turbine affects the other turbines, so a single turbine can't tell you the speed, rotation, things that need to change, if you want to adjust the speed of one turbine versus the one next to it. So there's also kind of a plant-wide optimization. Talking about autonomous driving, there are going to be five layers of compute, right? You're going to have, almost, what I call the ECU level, the individual sub-system in the car, the engine, how it's performing. You're going to have the gateway in the car to talk about things that are happening across systems in the car. You're going to have the peer-to-peer connection over 5G to talk about optimization right between vehicles. You're going to have the base station algorithms looking at a microcell or macrocell within a geographic area, and of course, you'll have the ultimate cloud, 'cause you want to have the data on all the assets, right, but you don't want to send all that data to the cloud, you want to send the right metadata to the cloud. >> That's why there are big trucks full of compute now.
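The plant-wide layer described above, per-turbine compute feeding a concentration point for the whole farm, reduces to a two-stage pipeline. The sketch below is hypothetical (the summary fields and the adjustment rule are invented); it only illustrates why the farm-level decision needs all 200 turbines' summaries rather than their raw streams.

```python
def turbine_summary(turbine_id, rpm_readings):
    """Runs on or near each turbine: reduce high-frequency data to metadata."""
    return {"id": turbine_id, "mean_rpm": sum(rpm_readings) / len(rpm_readings)}

def farm_optimizer(summaries):
    """Runs at the farm's concentration point: every turbine affects its
    neighbors, so speed adjustments come from looking across all of them."""
    farm_mean = sum(s["mean_rpm"] for s in summaries) / len(summaries)
    # Illustrative rule: nudge each turbine toward the farm-wide operating point.
    return {s["id"]: round(farm_mean - s["mean_rpm"], 2) for s in summaries}

# 200-plus turbines, each summarizing ~1,000 high-frequency readings locally.
summaries = [turbine_summary(i, [14.0 + (i % 5) * 0.3] * 1000) for i in range(200)]
adjustments = farm_optimizer(summaries)
print(f"turbine 0 adjustment: {adjustments[0]:+.2f} rpm")
```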
>> By the way, you mentioned one thing that I should really touch on, which is, we've talked a lot about what I call traditional brownfield automation and control-type analytics and machine learning, and that's kind of where we started, in discrete manufacturing a few years ago. What we found is that in that domain, and in oil and gas, and in mining, and in agriculture, and transportation, in all those places, the most exciting new development this year is the movement towards video, 3D imaging, and audio sensing, 'cause those sensors are now becoming very economical, and people have never thought about, well, if I put a camera and apply it to a certain application, what can I learn, what can I do that I never did before? And often, they even have cameras today and haven't made use of any of the data. So there's a very large customer of ours who has literally video inspection data for every product they produce, every day, around the world, and this is in hundreds of plants. And that data never gets looked at, right, other than training operators, like, hey, you missed the defects this day. The system, as you said, they just write over that data after 30 days. Well, guess what, you can apply deep learning TensorFlow algorithms to build a convolutional neural network model and essentially do the human visioning, so rather than an operator staring at a camera, or trying to look at training tapes 30 days later, I'm doing inferencing of the video image on the fly. >> So, do your systems close the loop back to the control systems now, or is it more of a tuning mechanism for someone to go back and do it later? >> Great question, I just got asked that this morning by a large oil and gas supermajor that Intel just introduced us to. The short answer is, our stack can absolutely go right back into the control loop. In fact, one of our investors and partners, I should mention, our Series A investors were GE, Bosch, Yokogawa, Dell EMC, and our Series B, a year ago, was Intel, Saudi Aramco, and Honeywell. So we have one foot in tech, one foot in industrial, and really, what we're trying to do, as you said, is bring IT and OT together. The short answer is, you can do that, but typically in the industrial environment, there's a conservatism about, hey, I don't want to touch, you know, affect the machine until I've proven it out. So initially, people tend to start with alerting, so we send an automatic alert back into the control system to say, hey, the machine needs to be re-tuned. Very quickly, though, certainly for things that are not so time-sensitive, they will just have us act. Now, Yokogawa, one of our investors, as I pointed out, is actually putting us in PLCs. So rather than sending the data off the PLC to another gateway running our stack, like an x86 or ARM gateway, those PLCs now have Raspberry Pi-plus capabilities. A lot of them are-- >> Through what type of mechanism? >> Well, right now, they're doing the IO and the control of the machine, but they have enough compute now that you can run us in a separate module, like a little brain sitting right next to the control room, and then do the AI on the fly, and there, you actually don't even need to send the data off the PLC. We just re-program the actuator. So that's where it's heading. Eventually, and it could take years before people get comfortable doing this automatically, but what you'll see is that what AI represents in industrial is the self-healing machine, the self-improving process, and this is where it starts.
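Here is a sketch of the video-inspection idea above, with the trained CNN replaced by a stub scoring function so the example runs anywhere. In the deployment described, the scoring function would be a convolutional model built offline (TensorFlow is mentioned in the transcript) and imported to the edge; the threshold and frame handling are hypothetical.

```python
import random

def defect_score(frame):
    """Stub standing in for CNN inference; returns probability of a defect.
    In a real system this would be a trained model imported to the device."""
    return random.random()

def inspect_stream(frames, threshold=0.98):
    """Replace the operator staring at a camera: score every frame on the fly,
    and keep only the frames worth a human's attention."""
    flagged = []
    for i, frame in enumerate(frames):
        score = defect_score(frame)
        if score >= threshold:
            flagged.append((i, score))  # metadata; the frame itself stays local
    return flagged

frames = [object() for _ in range(10_000)]  # stand-ins for video frames
hits = inspect_stream(frames)
print(f"{len(hits)} of {len(frames)} frames flagged for review")
```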
>> Well, the other thing I think is so interesting is what are you optimizing for, and there is no right answer, right? It could be you're optimizing for, like you said, a machine. You could be optimizing for the field. You could be optimizing for maintenance, but if there is a spike in pricing, you may say, eh, we're not optimizing now for maintenance, we're actually optimizing for output, because we have this temporary condition and it's worth the trade-off. So I mean, there's so many ways that you can skin the cat when you have a lot more information and a lot more data. >> No, that's right, and I think what we typically like to do is start out with what's the business value, right? We don't want to go do a science project. Oh, I can make that machine work 50% better, but if it doesn't make any difference to your business operations, so what? So we always start the investigation with what is a high value business problem where you have sufficient data where applying this kind of AI and the edge concept will actually make a difference? And that's the kind of proof of concept we like to start with. >> So again, just to come full circle, what's the craziest thing an OT guy said, oh my goodness, you IT guys actually brought some value here that I didn't know. >> Well, I touched on video, right, so without going into the whole details of the story, one of our big investors, a very large oil and gas company, we said, look, you guys have done some great work with I call it software defined SCADA, which is a term, SCADA is the network environment for OT, right, and so, SCADA is what the PLCs and DCSes connect over these SCADA networks. That's the control automation role. And this investor said, look, you can come in, you've already shown us, that's why they invested, that you've gone into brown field SCADA environments, done deep mining of the existing data and shown value by reducing scrap and improving output, improving worker safety, all the great business outcomes for industrial. If you come into our operation, our plant people are going to say, no, you're not touching my PLC. You're not touching my SCADA network. So come in and do something that's non-invasive to that world, and so that's where we actually got started with video about 18 months ago. They said, hey, we've got all these video cameras, and we're not doing anything. We just have human operators writing down, oh, I had a bad event. It's a totally non-automated system. So we went in and did a video use case around, we call it, flare monitoring. You know, hundreds of stacks of burning of oil and gas in a production plant. 24 by seven team of operators just staring at it, writing down, oh, I think I had a bad flare. I mean, it's a very interesting old world process. So by automating that and giving them an AI dashboard essentially. Oh, I've got a permanent record of exactly how high the flare was, how smoky was it, what was the angle, and then you can then fuse that data back into plant data, what caused that, and also OSIsoft data, what was the gas composition? Was it in fact a safety violation? Was it in fact an environmental violation? So, by starting with video, and doing that use case, we've now got dozens of use cases all around video. Oh, I could put a camera on this. I could put a camera on a rig. I could've put a camera down the hole. I could put the camera on the pipeline, on a drone. There's just a million places that video can show up, or audio sensing, right, acoustic. 
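The flare-monitoring story just told fuses a video-derived event with historian data. The sketch below invents all of its numbers (the H2S readings, thresholds, and event fields are hypothetical) purely to show the fusion step: a video score alone says "flare"; joined with gas composition, it can say "environmental violation".

```python
import bisect

# Time-aligned gas composition samples from the historian (hypothetical values).
composition_ts = [0, 60, 120, 180, 240]    # seconds
h2s_ppm       = [2.0, 2.1, 9.5, 9.8, 2.2]  # hypothetical H2S readings

def h2s_at(ts):
    """Nearest-earlier historian sample for a given event time."""
    i = max(bisect.bisect_right(composition_ts, ts) - 1, 0)
    return h2s_ppm[i]

def classify_flare(event):
    """Fuse the video-derived flare event with process data to decide
    whether it was a safety or environmental violation (illustrative rule)."""
    h2s = h2s_at(event["ts"])
    if event["smoke_index"] > 0.7 and h2s > 5.0:
        return "environmental_violation"
    if event["height_m"] > 30:
        return "safety_violation"
    return "normal_flare"

# An event as a video model might emit it: height, smokiness, angle, timestamp.
event = {"ts": 130, "height_m": 22, "smoke_index": 0.85, "angle_deg": 12}
print(classify_flare(event))  # -> environmental_violation
```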
So, video is great if you can see the event, like I'm flying over the pipe, I can see corrosion, right, but sometimes, like you know, a burner or an oven, I can't look inside the oven with a camera. There's no camera that could survive 600 degrees. So what do you do? Well, that's probably, you can do something like either vibration or acoustic. Like, inside the pipe, you got to go with sound. Outside the pipe, you go video. But these are the kind of things that people, traditionally, how did they inspect pipe? Drive by. >> Yes, fascinating story. Even again, I think at the end of the day, it's again, you can make real decisions based on all the data in real time, versus some of the data after the fact. All right, well, great conversation, and look forward to watching the continued success of FogHorn. >> Thank you very much. >> All right. >> Appreciate it. >> He's David King, I'm Jeff Frick, you're watching theCUBE. We're having a CUBE conversation at our Palo Alto studio. Thanks for watching, we'll see you next time. (uplifting symphonic music)
Sastry Malladi, FogHorn | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partner. (upbeat electronic music) >> Welcome back to theCUBE. I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV, in downtown San Jose, down the street from the Strata Data Conference. We're joined by a new guest to theCUBE, Sastry Malladi, the CTO of FogHorn. Sastry, welcome to theCUBE. >> Thank you, thank you, Lisa. >> So FogHorn, cool name, what do you guys do, who are you? Tell us all that good stuff. >> Sure. We are a startup based in Silicon Valley, right here in Mountain View. We started about three years ago, three plus years ago. We provide edge computing intelligence software for edge computing or fog computing; that's where our company name, FogHorn, comes from. It's particularly for the industrial IoT sector. All of the industrial guys, whether it's transportation, manufacturing, oil and gas, smart cities, smart buildings, any of those different sectors, they use our software to predict failure conditions in real time, or do condition monitoring, or predictive maintenance, any of those use cases, and successfully save a lot of money. Obviously in the process, you know, we get paid for what we do. >> So Sastry... GE popularized this concept of IIoT and the analytics and, sort of, the new business outcomes you could build on it, like Power by the Hour instead of selling a jet engine. >> Sastry: That's right. >> But there's... Actually, we at Wikibon, David Floyer, did some pioneering research on how we're going to have to do a lot of analytics on the edge, for latency and bandwidth. What's the FogHorn secret sauce that others would have difficulty with on the edge analytics? >> Okay, that's a great question. Before I directly answer the question, if you don't mind, I'll actually even describe why it's even important to do that, right? So a lot of these industrial customers, if you look at, because we work with a lot of them, the amount of data that's produced from all of these different machines is terabytes to petabytes of data, it's real. And it's not just the traditional digital sensors, but there are video, audio, acoustic sensors out there. The amount of data is humongous, right? It's not even practical to send all of that to a Cloud environment and do data processing, for many reasons. One is obviously the connectivity, bandwidth issues, and all of that. But the two most important things are cyber security. None of these customers actually want to connect these highly expensive machines to the internet. That's one. The second is the lack of real-time decision making. What they want to know, when there is a problem, is to know before it's too late. We want to notify them of a problem that is occurring, so that they have a chance to go fix it and optimize their asset that is in question. Now, existing solutions do not work in this constrained environment. That's why FogHorn had to invent that solution. >> And tell us, actually, just to be specific, how constrained an environment you can operate in. >> We can run in less than about 100 to 150 megabytes of memory, single-core to dual-core CPU, whether it's an ARM processor or an x86 Intel-based processor, with almost literally no storage, because we're a real-time processing engine. Optionally, you could have some storage if you wanted to store some of the results locally, but that's the kind of environment we're talking about.
Now, when I say 100 megabytes of memory, it's like a quarter of a Raspberry Pi, right? And even in that environment we have customers that run dozens of machine learning models, right? And we're not talking-- >> George: Like an ensemble. >> Like an anomaly detection, a regression, a random forest, or a clustering, a whole gamut of those. Now, if we get into more deep learning models, like image processing and neural nets and all of that, you obviously need a little bit more memory. But what we have shown, we could still run: one of our largest smart city buildings customers, an elevator company, runs in a Raspberry Pi on millions of elevators, right? Dozens of machine learning algorithms on top of that, right? So that's the kind of size we're talking about. >> Let me just follow up with one question on the other thing you said, which is, besides having to do the low latency locally, you said a lot of customers don't want to connect these brownfield, I guess, operations technology machines to the internet, and physically, I mean, there was physical separation for security. So it's like security, Bill Joy used to say "security by obscurity." Here it's security by-- >> Physical separation, absolutely. Tell me about it. I was actually coming from, if you don't mind, last week I was in Saudi Arabia. One of the oil and gas plants where we deployed our software, you have to go through five levels of security even to get in there. It's a multibillion-dollar plant, refining the gas and all of that. Completely offline, no connectivity to the internet, and we installed, in their existing small box, our software, connected to their live video cameras that are actually measuring the stuff, doing the processing and detecting the specific conditions that we're looking for. >> That's my question, which was, if they want to be monitoring... So there's like one low level, really low hardware level, the sensor feeds. But you could actually have a richer feed, which is video and audio, but how much of that, then, are you doing the, sort of, inferencing on locally? Or even retraining, and I assume that since it's not the OT device, and it's something that's looking at it, you might be more able to send it back up to the Cloud if you needed to do retraining? >> That's exactly right. So the way the model works, particularly for image processing, because it's a more complex process to train and create a model: you could create a model offline, like in a GPU box, an FPGA box, and whatnot, then import and bring the model back into this small little device that's running in the plant, and now the live video data is coming in, and the model is inferencing the specific thing. Now there are two ways to update and revise the model: incremental revision of the model, you could do that if you want, or you can send the results to a central location. Not the internet; they do have local, in this example, a PI DB, an OSIsoft PI DB, or some other local service out there, where you have an opportunity to gather the results from each of these different locations and then consolidate and retrain the model, and put the model back again. >> Okay, the one part that I didn't follow completely is... if the model is running ultimately on the device, again, and perhaps not even on a CPU, but a programmable logic controller. >> It could, even though a programmable controller also typically has some form of CPU there as well. These days, most of the PLCs, programmable controllers, have either an ARM-based processor or an x86-based processor. We can run on either one of those too.
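As a sense of what fits in that footprint, the sketch below runs one constant-memory anomaly model per digital sensor channel using Welford's online mean/variance algorithm. It is an illustration of the scale being discussed, with hypothetical thresholds rather than FogHorn's actual algorithms; dozens of these detectors fit comfortably in a fraction of a Raspberry Pi's memory.

```python
import math

class OnlineAnomalyDetector:
    """Welford's algorithm: running mean and variance without storing the stream."""
    def __init__(self, z_limit=4.0):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.z_limit = z_limit

    def update(self, x):
        """O(1) memory per sensor channel: train and score as data streams by."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        if self.n < 30:          # warm-up before scoring
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        return std > 0 and abs(x - self.mean) / std > self.z_limit

# One detector per digital sensor: temperature, pressure, torque, humidity...
channels = {name: OnlineAnomalyDetector() for name in
            ("temperature", "pressure", "torque", "humidity")}
alarms = sum(channels["pressure"].update(v)
             for v in [101.3] * 500 + [140.0])
print(f"pressure alarms: {alarms}")
```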
>> So, okay, assume you've got the model deployed down there, for the, you know, local inferencing. Now, some retraining is going to go on in the Cloud, where you have, you're pulling in the richer perspective from many different devices. How does that model get back out to the device if it doesn't have the connectivity between the device and the Cloud? >> Right, so if there's strictly no connectivity, what happens is, once the model is regenerated or retrained, they put the model on a USB stick; it's low-tech: a USB stick, bring it to the PLC device, and upload the model. >> George: Oh, so this is sort of how we destroyed the Iranian centrifuges. >> That's exactly right, exactly right. But you know, in some other environments, even though there's no connectivity from the Cloud environment, per se, the devices have the ability to connect to the Cloud. Optionally, they say, "Look, I'm the device that's coming up, do you have an upgraded model for me?" Then it can pull the model. So in some of the environments it's super strict, where there is absolutely no way to connect this device, and you put it on a USB stick and bring the model back. In other environments, the device can query the Cloud, but the Cloud cannot connect to the device. This is a very popular model these days, because, in other words, imagine this: an elevator sitting in a building, somebody from the Cloud cannot reach the elevator, but the elevator can reach the Cloud when it wants to. >> George: Sort of like a jet engine, you don't want the Cloud to reach the jet engine. >> That's exactly right. The jet engine can reach the Cloud if it wants to, when it wants to, but the Cloud cannot reach the jet engine. That's how we can pull the model.
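The device-initiated pull just described ("the elevator can reach the Cloud, the Cloud cannot reach the elevator") is sketched below. The endpoint, manifest format, and file paths are all hypothetical assumptions; the point is that every connection is outbound, with the USB stick as the strict-air-gap fallback.

```python
import json
import urllib.request

MODEL_DIR = "/opt/edge/models"  # hypothetical local path
CHECK_URL = "https://models.example.com/elevator/latest.json"  # hypothetical

def current_version():
    try:
        with open(f"{MODEL_DIR}/version.txt") as f:
            return f.read().strip()
    except FileNotFoundError:
        return "none"

def check_for_update():
    """Outbound-only poll: 'I'm the device that's coming up -- do you
    have an upgraded model for me?'"""
    with urllib.request.urlopen(CHECK_URL, timeout=10) as resp:
        manifest = json.load(resp)
    if manifest["version"] != current_version():
        with urllib.request.urlopen(manifest["url"], timeout=60) as resp:
            blob = resp.read()
        with open(f"{MODEL_DIR}/model.bin", "wb") as f:
            f.write(blob)
        with open(f"{MODEL_DIR}/version.txt", "w") as f:
            f.write(manifest["version"])
        return True
    return False  # already current; in a strict air gap, a USB stick
                  # carries the same files instead of this poll
```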
>> So Sastry, as a CTO you meet with customers often. You mentioned you were in Saudi Arabia last week. I'd love to understand how you're leveraging and engaging with customers to really help drive the development of FogHorn, in terms of being differentiated in the market. What are those kind of bi-directional, symbiotic customer relationships like? And how are they helping FogHorn? >> Right, that's actually a great question. We learn a lot from customers, because we started a long time ago. We did an initial version of the product. As we began to talk to the customers, and particularly, that's part of my job, where I go talk to many of these customers, they give us feedback. Well, my problem is really that I can't even give you connectivity to the Cloud, to upgrade the model. I can't even give you sample data. How do you do that modeling, right? And sometimes they say, "You know what, we are not technical people, help us express the problem, the outcome, give me tools that help me express that outcome." So we created a bunch of what we call OT tools, operational technology tools. How we distinguish ourselves in this process from the traditional Cloud-based vendors, the traditional data science and data analytics companies, is that they think in terms of computer scientists, computer programmers, and expressions. We think in terms of industrial operators: what can they express, what do they know? They don't really necessarily care when you tell them, "I've got an anomaly detection data science machine algorithm"; they're going to look at you like, "What are you talking about? I don't understand what you're talking about," right? You need to tell them, "Look, this machine is failing." What are the conditions in which the machine is failing? How do you express that? And then we translate that requirement into the underlying models, the underlying Vel expressions; Vel is our CEP expression language. So we learned a ton from customers: user interface capabilities, latency issues, connectivity issues, different protocols, a number of things. >> So I'm curious with... More of the big data vendors are recognizing data in motion and data coming from devices. And some, like Hortonworks DataFlow, NiFi, has a MiNiFi component written in C++, with a really low resource footprint. But I assume that that's really just a transport. It's almost like a collector, and it doesn't have the analytics built in-- >> That's exactly right, NiFi has the transport, it has the real-time transport capability, for sure. What it does not have is this notion of the CEP concept. How do you combine all of the streams? Everything is time series data for us, right, from the devices, whether it's coming from a device or whether it's coming from another static source out there. How do you express a pattern, a recognition pattern definition, across these streams? That's where our CEP comes into the picture. A lot of these seemingly similar software capabilities that people talk about don't quite exactly have either the streaming capability, or the CEP capability, or the real-time, or the low footprint. What we have is a combination of all of that. >> And you talked about how everything's time series to you. Is there a need to have, sort of, an equivalent time series database up in some central location? So that when you subset, when you determine what relevant subset of data to move up to the Cloud, or, you know, an on-prem central location, does it need to be the same database? >> No, it doesn't need to be the same database. It's optional. In fact, we do ship a local time series database at the edge itself. If you have a little bit of local storage, you can downsample, take the results, and store it locally, and many customers actually do that. Some others, because they have their existing environment, they have some Cloud storage, whether it's Microsoft, it doesn't matter what they use, we have connectors from our software to send these results into their existing environments.
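Vel's syntax is FogHorn's own and is not reproduced here, but the CEP idea Sastry describes, a pattern expressed across multiple time-aligned streams, can be approximated in plain Python. The rule, window size, and sensor values below are hypothetical.

```python
def rising(xs):
    return xs[-1] > xs[0]

def falling(xs):
    return xs[-1] < xs[0]

def machine_failing(temp_window, pressure_window):
    """The translated OT requirement: 'the machine is failing when
    temperature rises while pressure falls within the same window'."""
    return rising(temp_window) and falling(pressure_window)

# Two synchronized sensor streams, evaluated over a sliding window of 10.
temps     = [70, 70, 71, 72, 74, 77, 80, 84, 88, 93, 99]
pressures = [30, 30, 30, 29, 28, 27, 25, 23, 20, 17, 13]

W = 10
for i in range(len(temps) - W + 1):
    if machine_failing(temps[i:i+W], pressures[i:i+W]):
        print(f"pattern matched at sample {i}: machine trending toward failure")
```

The design point is the same one made above: the operator states an outcome ("this machine is failing when..."), and the tooling, not the operator, turns that into stream expressions.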
>> So, you had also said something interesting about your, sort of, tool set as being optimized for operations technology. So this is really important, because back when we had the Net-Heads and the Bell-Heads, you know, it was a cultural clash and they had different technologies. >> Sastry: They sure did, yeah. >> Tell us more about how selling to operations, not just selling, but supporting operations technology, is different from IT technology, and where does that boundary live? >> Right, so in a typical IT environment, right, you start with the boss who is the decision maker, you work with them, they approve the project, and you go and execute that. In an industrial, in an OT environment, it doesn't quite work like that. Even if the boss says, "Go ahead and go do this project," if the operator on the floor doesn't understand what you're talking about, because that person is in charge of operating that machine, it doesn't quite work like that. So you need to work bottom-up as well, convincing them that you are indeed actually solving their pain point. So the way we start is, rather than trying to tell them what capabilities we have as a product, or what we're trying to do, the first thing we ask is, what is their pain point? "What's your problem? What is the problem you're trying to solve?" Some customers say, "Well, I've got yield problems, a lot of scrap. Help me reduce my scrap. Help me operate my equipment better. Help me predict these failure conditions before it's too late." That's how the problem starts. Then we start inquiring of them, "Okay, what kind of data do you have, what kind of sensors do you have? Typically, do you have information about under what circumstances you have seen failures versus not seen failures out there?" So in the process of that inquiry, we begin to understand how they might actually use our software, and then we tell them, "Well, here, use our software to predict that." And, sorry, I want 30 more seconds on that. The other thing is that, typically in an IT environment, because I came from that too, I've been in this position for 30 plus years, IT, OT, and all of that, we don't right away talk about CEP, or expressions, or analytics. We talk about, look, you have this bunch of sensors, we have OT tools here, drag and drop your sensors, express the outcome that you're trying to look for: what is the outcome you're trying to look for? And then we derive behind the scenes what it means. Is it analytics, is it machine learning, is it something else, and what is it? So that's kind of how we approach the problem. Of course, occasionally you do, surprisingly, run into very technical people. With those people we can right away talk about, "Hey, you need these analytics, you need to use machine learning, you need to use expressions," and all of that. That's kind of how we operate. >> One thing, you know, that's becoming clearer is, I think, this widespread recognition that there's data-intensive and low-latency work to be done near the edge. But what goes on in the Cloud is actually closer to simulation and high-performance compute, if you want to optimize a model. So not just train it, but maybe have something that's prescriptive, that says, you know, here's the actionable information. As more of your data is video and audio, how do you turn that into something where you can simulate a model that tells you the optimal answer? >> Right, so this is actually a good question. From our experience, there are models that require a lot of data, for example, video and audio. There are some other models that do not require a lot of data for training. I'll give you an example of a customer use case that we have. There's one customer in a manufacturing domain, where they've been seeing a lot of finished-goods failures; there's a lot of scrap, and the problem then was, "Hey, predict the failures, reduce my scrap, save the money," right? Because they've been seeing a lot of failures every single day, we did not need a lot of data to train and create a model for that. In fact, we just needed one hour's worth of data. We created a model, deployed it, and they have completely eliminated their scrap. There are other kinds of models, like video, where we can't do that at the edge, so we're required, for example, to take some video files or simulated audio files to an offline model, create the model, and see whether it's accurately predicting based on the real-time video coming in or not. So it's a mix of what we're seeing between those two.
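The "one hour of data was enough" case can be sketched under stated assumptions: synthetic readings stand in for the customer's sensor logs, and scikit-learn is one convenient library choice, not necessarily what was used in production. The point is how small both the training set and the exported model can be.

```python
import pickle
import random

from sklearn.ensemble import RandomForestClassifier

random.seed(0)

# One hour at 1 Hz: 3,600 rows of (temperature, pressure) with a scrap label.
X, y = [], []
for _ in range(3600):
    temp = random.uniform(60, 100)
    pressure = random.uniform(20, 40)
    scrap = int(temp > 90 and pressure < 25)  # hypothetical failure condition
    X.append([temp, pressure])
    y.append(scrap)

model = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=0)
model.fit(X, y)

# Serialize for the edge device; a small forest fits easily in an
# edge footprint of a few hundred megabytes.
blob = pickle.dumps(model)
print(f"model size: {len(blob) / 1024:.1f} KiB")
print("predict scrap for (95, 22):", model.predict([[95.0, 22.0]])[0])
```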
>> Well Sastry, thank you so much for stopping by theCUBE and sharing what it is that you guys at FogHorn are doing, what you're hearing from customers, how you're working together with them to solve some of these pretty significant challenges. >> Absolutely, it's been a pleasure. Hopefully this was helpful, and yeah. >> Definitely, very educational. We want to thank you for watching theCUBE, I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV in downtown San Jose. Come stop by Forager Tasting Room, hang out with us, learn as much as we are about all the layers of big data digital transformation and the opportunities. Stick around, we will be back after a short break. (upbeat electronic music)
Western Digital Taking the Cloud to the Edge, Panel 2 | DataMakesPossible
>> They are disruptive technologies. And if you think about the disruption that's happening in business, with IoT, with OT, and with big data, you can't get anything more disruptive to the whole of the business chain than this particular area. It's an area that I focused on myself, asking the question, should everything go to the cloud? Is that the new future? Is 90% of the computing going to go to the cloud, with just little mobile devices right on the edge? It felt wrong when I did the math on it. I did some examples of real-world environments, wind farms, et cetera, and it clearly was not the right answer; things need to be near the edge. And I think one of the areas to me that solidified it was when you looked at an area like video. Huge amounts of data, real important decisions being made on the content of that video, for example, recognizing a face, a white hat or a black hat. If you look at the technology, sending that data somewhere to do that recognition just does not make sense. Where is it going? It's going actually into the camera itself, right next to the data, because that's where you have the raw data, that's where you have the maximum granularity of data, that's where you need to do the processing of which faces are which, right close to the edge itself, and then you can send the other data back up to the cloud, for example, to improve those algorithms within that camera, to do all that sort of work on a batch basis over time. That's what I was looking at, and looking at the cost justification for doing that sort of work. So today, we've got a set of people here on the panel, and we want to talk about coming down one level to where IoT and IT are going to have to connect together. So on the panel I've got, I'm going to get these names really wrong, Sanjeev Kumar? >> Yes, that's right. >> From FogHorn, could you introduce yourself and what you're doing where the data is meeting the people and the machines? >> Sure, sure, so my name is Sanjeev Kumar, I actually run engineering for a company called FogHorn Systems; we are bringing analytics and machine learning to the edge, so our goal and motto is to take computing to where the data is, rather than the other way around. It's a two-year-old company that was incubated in The Hive, and we are in the process of getting our second release of the product out shortly. >> Excellent, so let me start at the other end, Rohan, can you talk about your company and what contribution you're focusing on? >> Sure, I'm head of product marketing for Maana. Maana is a startup, about three years old. What we're doing is we're offering an enterprise platform for large enterprises; we're helping the likes of Shell and Maersk and Chevron digitally transform, and that simply means putting the focus on subject matter experts, putting the focus on the people, and data's definitely an important part of it, but allowing them to bring their expertise into the decision flows, so that ultimately the key decisions that are driving the revenue for these behemoths are made at a higher quality and faster. >> Excellent. Well, two software companies; we have a practitioner here who is actually doing fog computing, doing it for real, has been doing it for some time, so could you, Janet George from Western Digital, introduce yourself, and say something from the trenches, of what's really going on? >> Okay, very good, thank you.
I actually build infrastructure for the edge to deal with fog computing, and for Western Digital, we're very lucky, because we are the largest storage manufacturer, and we have what we call the Internet of Things and the Internet of Test Equipment. I process petabytes of data that come out of the Internet of Things, which is basically our factories, and then I take these petabytes of data and process them both on the cloud and on the edge, but primarily to be able to consume that data. And the way we consume that data is by building very high-profile models through artificial intelligence and machine learning, and I'll talk a lot more about that. But at the end of the day, it's all about consuming the data that you collect from anywhere, Internet of Things, test equipment, data that's being produced through products; you have to figure out a way to compute on that data, and the cloud has many advantages and many trade-offs, so we're going to talk about the trade-offs; that's where fog computing comes into play. >> Excellent, thanks very much. And last but not least, we have Val, and I can never pronounce your surname. >> Bercovici. >> Thank you. (chuckling) You are in the midst of a transition yourself, so talk about where you have been and where you're going. >> For the better part of this century, I've been with NetApp, working in various functions, obviously enterprise storage, and around 2008, my developer instinct kind of fired up, and this thing called cloud became very interesting to me. So I became a self-anointed cloud czar at NetApp, and I ended up initiating a lot of the projects which we know today as the NetApp Data Fabric. That culminated about 18 months ago in the acquisition of SolidFire, and I'm now the acting CTO of SolidFire, but I plan to retire from the storage industry at the end of our fiscal year, at the end of April. I'm spending a lot of time with the Cloud Native Computing Foundation in particular, that is, the open-source home of Google's Kubernetes technology and about seven other related projects, we keep adding some almost every month, and I'm starting to lose track, and I'm spending a lot of time on the data gravity challenge. It's a challenge in the cloud, it's a particularly new and interesting challenge at the edge, and I look forward to talking about that. >> Okay, and data gravity is absolutely key, isn't it? It's extremely expensive and extremely heavy to move around. >> And the best analogy is that workloads are like electricity, they move fairly easily and lightly, and data's like water, it's really hard to move, particularly large bodies of it. >> Great. I want to start with one question, though, just on the problem, the core problem, particularly in established industries, of how do we get change to work? In an IT shop, we have enough problems dealing with operations and development. In the industrial world, we have the IT and the OT, who look at each other with less than pleasure, and mainly disdain. How do we solve the people problem in trying to put together solutions? You must be right in the middle of it, would you like to start with that question? >> Absolutely, so we are 26 years old, probably more than that, and we have a mix of very old and new manufacturing equipment; it's the storage industry, and in our industry, we are used to doing things a certain way. We have existing data, we have historical data, we have trend data, and you can't get rid of what you already have.
The goal is to make connectors such that you can move from where you're at to where you're going, and so you have to be able to take care of the shift that is happening in the market. At the end of the day, if you look at five years from now, it's all going to be machine learning and AI, right? Agent technology's already here, it's proven; we can see Siri out there, we can see Alexa, we can see these agent technologies out there, so machine learning is getting a lot of momentum, along with deep learning and neural networks, things like that. So we've got to be able to look at that data and tap into our data in near real time, which is very different, and the way to do that is really making these connections happen, tapping into old versus new. For example, if you look at storage, you have file storage, you have block storage, and then you have object storage, right? We've not really tapped into the field of object storage, and the reason is because if you are going to process one trillion objects, like Amazon is doing right now with S3, you can't do it with file-system-level storage or with block-level storage, you have to go to objects. Think Internet of Things. How many trillions of objects are going to come out of these Internet of Things? So one, you have to be positioned from an infrastructure standpoint. Two, you have to be positioned from a use-case prototyping perspective. And three, you've got to be able to scale that very rapidly, very quickly, and that's how change happens. Change does not happen because you ask somebody to change their behavior; change happens when you show value, and people are so eager to get that value out of what you've shown them in real life that they are quick to adopt. >> That's an excellent-- >> If I could comment on that as well: we just got through training a bunch of OT guys on our software, and two analogies actually work very well. One is that operational people are very familiar with circuit diagrams, with the flow of things through essentially black boxes; you can think of these as something that has a bunch of inputs and a bunch of outputs. So that's one thing that worked very well. The second thing that works very well is the PLC model, and there are direct analogies between PLCs and analytics, which people on the floor can actually relate to. So if you have software that's based on data streams, with time as a first-class citizen, the PLC model again works very well in terms of explaining the new software to the OT people.
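The PLC analogy maps naturally onto streaming code. Below is a minimal sketch, assuming nothing about FogHorn's actual engine, of a "function block" that consumes a timestamped sensor stream and emits alerts, with time carried as a first-class field on every record:

from collections import deque
from dataclasses import dataclass

@dataclass
class Reading:
    ts: float        # seconds since epoch; time travels with the data
    value: float     # e.g., vibration amplitude from a motor

def overheat_block(stream, window_s=10.0, threshold=80.0):
    """Input stream in, alert stream out: a 'black box' with one input
    and one output, like a function block in a PLC program."""
    window = deque()
    for r in stream:
        window.append(r)
        # Evict readings that have aged out of the time window.
        while window and r.ts - window[0].ts > window_s:
            window.popleft()
        avg = sum(x.value for x in window) / len(window)
        if avg > threshold:
            yield (r.ts, avg)   # downstream blocks can chain onto this

# Usage with a synthetic stream of rising values:
readings = (Reading(ts=float(t), value=70.0 + t) for t in range(20))
for ts, avg in overheat_block(readings):
    print(f"t={ts:.0f}s: rolling average {avg:.1f} exceeds threshold")

Blocks like this compose the way PLC function blocks do: the alert stream emitted here could feed the input of the next block, which is exactly the wiring-diagram mental model operators already have.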
A second point to that: among the more recent techniques available to solve certain challenges, there are deep learning neural nets and all sorts of sophisticated AI and machine learning algorithms out there. A lot of these are very sophisticated in their ability to deliver results, but not necessarily transparent about how they got there, and I think that's another thing that Maana has learned: yes, we have this arsenal of fantastic algorithms to throw at problems, but we try to start with the simplest approach first, and we don't unnecessarily try to brute-force it, because an enterprise is more than willing to have that transparency in how they're solving something. If they're able to see how the software was able to get to a certain conclusion, they are a lot happier with that approach. >> Could you maybe just give one example, a real-world example, make it a little bit real? >> Right, absolutely. We did a project for a very large organization on collections; they have a lot of outstanding capital locked up in customers not paying. It's a standard problem, you're going to find it in pretty much any industry. So for those outstanding invoices, we went ahead and worked with the subject matter experts, we looked at all the historical accounts-receivable data, we took data from a lot of other sources, and we were able to come up with models to predict when certain customers are likely to pay, and when they should be contacted. Ultimately, what we wanted to give the collection agent was a list of customers to call. It was fairly straightforward; of course, the solution was not very easy, but at least on a holistic level, it made a lot of sense to us. When we went to the collection agents, many of them actually refused to use that approach, and this is part of change management in some sense: they were so used to doing things their way, so used to targeting the customers with the largest outstanding invoice, or the ones that hadn't paid for the longest amount of time, that it actually took us a while, because initially the feedback we got was, your approach is not working, we're not seeing the results. And when we dug into it, it was because it wasn't being used, so that would be one example. >> So again, proof points that you will actually get results from this. >> Absolutely, and the transparency. I think we actually sent some of our engineers to work with the collections agents to help them understand what approach we're taking, and we showed them that this is not magic. Instead of looking at the final dollar value, we're calculating time value lost, so we are coming up with a metric that allows us to incorporate not just the outstanding amount, or the time that they haven't paid, but a lot of other factors as well. >> Excellent, Val.
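The "time value lost" metric itself is not spelled out in the conversation, so the scoring formula in the sketch below is purely an illustrative assumption, not Maana's method; the only point it demonstrates is that a composite score orders the call list differently than sorting on balance or age alone.

from dataclasses import dataclass

@dataclass
class Invoice:
    customer: str
    amount: float           # outstanding dollars
    days_overdue: int
    pay_probability: float  # model-estimated chance a call leads to payment

def time_value_lost(inv, daily_rate=0.0003):
    """Dollars of value eroding per day, weighted by how actionable the
    account is. An assumed formula combining amount, age, and likelihood
    to pay, rather than ranking on any single column."""
    erosion = inv.amount * daily_rate * inv.days_overdue
    return erosion * inv.pay_probability

invoices = [
    Invoice("Acme",    amount=900_000, days_overdue=30,  pay_probability=0.10),
    Invoice("Globex",  amount=120_000, days_overdue=200, pay_probability=0.70),
    Invoice("Initech", amount=450_000, days_overdue=90,  pay_probability=0.40),
]

# The call list the agents receive: highest recoverable erosion first.
for inv in sorted(invoices, key=time_value_lost, reverse=True):
    print(f"{inv.customer:8s} score={time_value_lost(inv):8.1f}")

Note that the customer with the largest balance (Acme) ranks last here, which is exactly the kind of counterintuitive output that made the transparency work with the agents necessary.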
>> When you asked that question, I immediately went to the more nontechnical, business side of my brain to answer it. My experience over the years, particularly during major industry transitions, and I'm old enough to remember the mainframe to client-server transition, and now client-server to virtualization and cloud, has been that sales reps have a well-earned reputation for being coin-operated, and it's remarkable how much you can adjust behavior with compensation plans, for pretty much anyone, in a capitalist environment. The IT/OT divide, if you will, is pretty easy to solve from a business perspective when you take someone with an IT-supporting-the-business mentality and you compensate them on new revenue streams, new business; all of a sudden, the world perspective changes, sometimes overnight, or certainly when that contract is signed. That's probably the number one thing you can do from a people perspective: incent them and motivate them to focus on these new things. The technology, particularly nowadays, is evolving to support them in these new initiatives, but nothing motivates like the right compensation plan. >> Excellent, a great series of different viewpoints. So the second question I have, again coming down a bit to this level, is how do we architect a solution? We heard you've got to architect it, and it seems to me that that's pretty difficult to do ahead of where you're going; in general, you take smaller steps, one step at a time, you solve one problem, you go on to the next. Am I right in that? If I am, how would you suggest people go about this decision-making of putting architectures together? And if you think I'm wrong and you have a great new way of doing it, I'd love to hear about it. >> I can take a shorter route. So we have a number of customers that are going through a phased way of adopting our technology and products. It begins with first gathering the data and replaying it back, to build the first level of confidence that the product is actually doing what you're expecting it to do; that's more from a monitoring and administration standpoint. The second stage is that you begin to capture analytical logic in the product, where it can start doing prediction for you, so from operational, you move to a predictive maintenance, predictive models standpoint. The third part is prescriptive, where you actually help create a machine learning model. Now, it's still in flux in terms of where the model gets created, whether it's on the cloud in a central fashion, or at the right place, in the right context, in a multi-level hierarchical fog layer, and then you operationalize that as close to the data again as possible. So you go through this operational-to-predictive-to-prescriptive adoption of the technology, and that's how people actually build confidence in adopting something new into, let's say, a manufacturing environment, or around things that are pretty expensive. I'll give you another example, the case of capacitors being built on a manufacturing assembly line: can you look at data across the different stations on the assembly line, and can you predict at the second station that a part is going to fail at the eighth one? By that, what you're doing is actually reducing the scrap that's coming off of the assembly line.
So that's the kind of usage you get to in the second and third stages. >> Host: Excellent. Janet, do you want to go on? >> Yeah, I agree, and I have a slightly different point of view also. I think architecture's very difficult; it's like Thomas Edison, who spent a lot of time creating negative knowledge to get to that positive knowledge, and that's kind of the way it is in the trenches. We spend a lot of time trying to think things through, and the keyword that comes to mind is abstraction layers, because where we came from, everything was tightly coupled: compute and storage are tightly coupled, structured and unstructured data are tightly coupled with the database, the schema is tightly coupled. Now we are going into this world of everything being decoupled. In that world, multiple operating systems should be able to use your storage, and multiple models should be able to use your data. You cannot structure your data in any way that is customized to one particular model; many models have to run on that data on the fly, retrain themselves, and then run again. So when you think about that, you think about what is best suited to stay in the cloud: maybe large amounts of training data, and schema that's already processed, can stay on the cloud. Schema that is very dynamic, schema that is created on the fly that you need to read, and data that's coming at you from the Internet of Things that's constantly changing, I call it heteroscedastic data, which is very statistical in nature and highly variable in nature, you don't have time to sit there and create rows and columns and structure this data and put it into some sort of structured set. You need to have a data lake, and you need to have a stack on top of that data lake that can then adapt, create metadata, process that data, and make it available for your models. And then over time, I totally believe that we're now running into a near-real-time compute bottleneck, doing all this pattern processing for the different models and training sets, so we need a stack that we can quickly move to GPUs, which is where the future is going with pattern processing and machine learning. So your architecture has to be extremely flexible: high layers of abstraction, and the ability to train and grow and iterate. >> Excellent. Do you want to go next? >> So I'll be a broken record, back to data gravity. I think in an edge context, you've really got to look at the fact that the cost of processing data is orders of magnitude less than the cost of moving it, or even storing it. And the real urgency is that, I don't know, maybe 90% of the data at the edge is kind of wasted; you can filter through it and find that signal in the noise. So processing data to make sure that you're dealing with really good data at the edge first, and figuring out what's worth retaining for future steps, is key. I love the manufacturing example; we have lots of customer examples ourselves where, for quality control on a fast-moving assembly line, you want to take thousands if not millions of images and compare, frame by frame, exactly according to the schematics, where the device is compared to where it should be, or where the components on the device are compared to where they should be. Processing all of that data locally, and making sure you extract the maximum value before you move data to a central data lake to correlate it against other anomalies or other similarities, that's really key. So really focus on that cost of moving and storing data, yeah.
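A minimal sketch of that quality-control loop follows, with synthetic images standing in for the camera feed and a hypothetical uploader for the data lake. The scoring is a bare mean-pixel difference against a golden reference; a real system would first register and normalize the images, but the data-gravity point survives: only the rare anomalous frame ever leaves the edge.

import numpy as np

rng = np.random.default_rng(0)
GOLDEN = rng.integers(0, 255, size=(64, 64), dtype=np.uint8)  # stand-in reference image

def send_to_data_lake(idx, frame, score):
    """Hypothetical uploader; in practice only anomalies travel upstream."""
    print(f"frame {idx}: score {score:.1f}, flagged for central correlation")

def misalignment_score(frame):
    # Mean absolute pixel difference vs. the golden image; a real system
    # would align and normalize first, this is just the bare idea.
    return float(np.mean(np.abs(frame.astype(np.float32) -
                                GOLDEN.astype(np.float32))))

# Simulate a line: 999 good boards with sensor noise, one with a misplaced part.
frames = [GOLDEN + rng.integers(-2, 3, size=GOLDEN.shape) for _ in range(999)]
bad = GOLDEN.copy()
bad[16:48, 16:48] = 255                    # a component far from its schematic position
frames.append(bad)

for i, frame in enumerate(frames):
    score = misalignment_score(frame)
    if score > 5.0:                        # threshold tuned on known-good boards
        send_to_data_lake(i, frame, score)

Of the thousand frames inspected locally, exactly one crosses the threshold and moves upstream, which is the cost profile Val is arguing for.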
>> Yes, do you want the last word? >> Sure. Maana takes an interesting approach; I'm going to up-level a little bit. Whenever we are faced with a customer or a particular problem for a customer, we go with a question-answer approach. We start with taking a very specific business question; we don't look at what data sources are available, we don't ask them whether they have a data lake. We literally get their business leaders, their subject matter experts, we literally lock them up in a room, and we say, "You have to define a very specific problem statement, from which we start working backwards." Each problem statement can then be broken down into questions, and what we believe is that any question can be answered by a series of models. You talked about models; we go beyond just data models. We believe anything in the real world, in the case of, let's say, manufacturing, since we're talking about it, even the smallest component of a machine, should be represented in the form of a concept; relationships between the people operating that machinery should be represented in the form of models; and even the physics equations that go into predicting behavior should be able to be represented in the form of a model. Ultimately, what that allows us is that granularity, that abstraction you were talking about: it shouldn't matter what the data source is; any model should be able to plug into any data source, or into any more sophisticated, bigger model. I'll give you an example of that. We started solving a problem of predictive maintenance for a very large customer, and while we were solving that predictive maintenance problem, we came up with a number of models to go ahead and solve it. We soon realized that within that enterprise, there are several related problems, for example, replacement-part inventory management: now that you've figured out which machine is going to fail at roughly what instant in time, we can also figure out what parts are likely to fail, so you don't have to go ahead and order a ton of replacement parts, because you know which parts are likely to fail. And then you can take that a step further by figuring out which equipment engineer has the skillset to go and solve that particular issue. Now, all of that, in today's world, is somewhat happening in some companies, but as a series of point solutions that are not talking to each other. That's where our graph technology comes into play, where each and every model is actually a node on the graph, including computational models, so once you build 10 models to solve that first problem, you can reuse some of them to solve the second and third, so it's a time-to-value advantage. >> Well, you've been a fantastic panel. I think these guys would like to get to a drink at the bar, and there's an opportunity to talk to you all; I think this conversation could go on for a long, long time, there's so much to learn and so much to share on this particular topic. So with that, over to you! >> I'll just wrap it up real quick, thanks everyone, give the panel a hand, great job.
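Maana's platform itself is proprietary, but the reuse idea is easy to illustrate: treat each model as a node on a graph whose outputs feed downstream questions, so the failure-prediction model built for the first problem is evaluated once and then reused by the parts and staffing models. A toy sketch, with hypothetical model functions standing in for learned or physics-based ones:

class ModelNode:
    """A model as a graph node: a computation plus the upstream nodes
    whose outputs it consumes."""
    def __init__(self, name, fn, deps=()):
        self.name, self.fn, self.deps = name, fn, deps

    def run(self, cache):
        if self.name not in cache:   # each model evaluated once, then reused
            inputs = [d.run(cache) for d in self.deps]
            cache[self.name] = self.fn(*inputs)
        return cache[self.name]

# Toy "models"; real ones would be learned from data or physics equations.
predict_failure = ModelNode(
    "failure", lambda: {"machine": "press-7", "days_to_failure": 12})
parts_needed = ModelNode(
    "parts", lambda f: ["bearing-A"] if f["days_to_failure"] < 30 else [],
    deps=(predict_failure,))
assign_engineer = ModelNode(
    "engineer", lambda f, p: "J. Ortiz" if "bearing-A" in p else "unassigned",
    deps=(predict_failure, parts_needed))

cache = {}
print(assign_engineer.run(cache))   # 'J. Ortiz'
print(cache)                        # the failure model ran once, was reused twice

The design point is the cache: once the failure model exists on the graph, the inventory and staffing questions get answered without rebuilding it, which is the time-to-value advantage being described.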
Thanks for coming out. We have drinks for the next hour or two here, so feel free to network and mingle; these are great questions to ask the panelists privately, one-on-one, or just have a great conversation. And thanks for coming to our Big Data SV event, livestreamed out; we really appreciate it. It'll be on demand at YouTube.com/siliconangle, so if you want to go back and look at all the video and the presentations, go to YouTube.com/siliconangle, and of course siliconangle.com, and Wikibon.com for the research and content coverage. So thanks for coming, one more time, a big round of applause for the panel, enjoy your evening, thanks so much.