Sastry Malladi, FogHorn | Big Data SV 2018

>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partner. (upbeat electronic music)
>> Welcome back to theCUBE. I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV, in downtown San Jose, down the street from the Strata Data Conference. We're joined by a new guest to theCUBE, Sastry Malladi, the CTO of FogHorn. Sastry, welcome to theCUBE.
>> Thank you, thank you, Lisa.
>> So FogHorn, cool name. What do you guys do, who are you? Tell us all that good stuff.
>> Sure. We are a startup based in Silicon Valley, right here in Mountain View. We started about three years ago, three plus years ago. We provide edge intelligence software for edge computing, or fog computing. That's how our company name, FogHorn, got started. It's particularly for the industrial IoT sector. All of the industrial guys, whether it's transportation, manufacturing, oil and gas, smart cities, smart buildings, any of those different sectors, they use our software to predict failure conditions in real time, or do condition monitoring, or predictive maintenance, any of those use cases, and successfully save a lot of money. Obviously in the process, you know, we get paid for what we do.
>> So Sastry... GE popularized this concept of IIoT and the analytics and, sort of, the new business outcomes you could build on it, like Power by the Hour instead of selling a jet engine.
>> Sastry: That's right.
>> But there's... Actually we keep on, and David Floyer did some pioneering research on how we're going to have to do a lot of analytics on the edge for latency and bandwidth. What's the FogHorn secret sauce that others would have difficulty with on the edge analytics?
>> Okay, that's a great question. Before I directly answer the question, if you don't mind, I'll actually even describe why that's even important to do that, right?
So a lot of these industrial customers, if you look at them, because we work with a lot of them, the amount of data that's produced from all of these different machines is terabytes to petabytes of data, it's real. And it's not just the traditional digital sensors; there are video, audio, and acoustic sensors out there. The amount of data is humongous, right? It's not even practical to send all of that to a Cloud environment and do data processing, for many reasons. One is obviously the connectivity, bandwidth issues, and all of that. But there are two most important things. One is cyber security: none of these customers actually want to connect these highly expensive machines to the internet. That's one. The second is the lack of real-time decision making. When there is a problem, they want to know before it's too late. We want to notify them that a problem is occurring so that they have a chance to go fix it and optimize the asset that is in question. Now, existing solutions do not work in this constrained environment. That's why FogHorn had to invent that solution.
>> And tell us, actually, just to be specific, how constrained an environment you can operate in.
>> We can run in less than about 100 to 150 megabytes of memory, on a single-core to dual-core CPU, whether it's an ARM processor or an x86 Intel-based processor, with almost literally no storage, because we're a real-time processing engine. Optionally, you could have some storage if you wanted to store some of the results locally there, but that's the kind of environment we're talking about. Now, when I say 100 megabytes of memory, it's like a quarter of a Raspberry Pi, right? And even in that environment we have customers that run dozens of machine learning models, right? And we're not talking --
>> George: Like an ensemble.
>> Like an anomaly detection, a regression, a random forest, or a clustering, or a gamut of those.
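To make the memory constraint concrete, here is a minimal Python sketch of one way an anomaly detector can run on a live stream with constant memory and no local storage. It is an illustration only, not FogHorn's implementation: the class name `StreamingAnomalyDetector`, the z-score rule, and the `threshold` and `warmup` parameters are all assumptions chosen for the example. It uses Welford's online algorithm to keep a running mean and variance, so no history of readings is ever stored.

```python
import math


class StreamingAnomalyDetector:
    """Hypothetical example: flags readings far from the running mean.

    Memory use is constant (three numbers), regardless of stream length.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # z-score above which a reading is flagged
        self.warmup = warmup        # readings to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations (Welford)

    def update(self, value):
        """Score the reading against past readings, then fold it into the stats."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of mean and squared deviations.
        # Note: flagged readings are also folded in, a simplification.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


if __name__ == "__main__":
    detector = StreamingAnomalyDetector()
    # Simulated sensor: steady readings near 20, then one spike.
    readings = [20.0 + 0.2 * (i % 2) for i in range(20)] + [35.0]
    for reading in readings:
        if detector.update(reading):
            print("anomaly:", reading)
```

Because each reading is scored and then discarded, the same loop works whether the stream runs for a minute or a year, which is the property that matters on a device with almost no storage.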
Now, if we get into more deep learning models, like image processing and neural nets and all of that, you obviously need a little bit more memory. But what we have shown is that we could still run. One of our largest smart city, smart buildings customers, an elevator company, runs on a Raspberry Pi on millions of elevators, right? Dozens of machine learning algorithms on top of that, right? So that's the kind of size we're talking about.
>> Let me just follow up with one question on the other thing you said, besides that we have to do the low-latency work locally. You said a lot of customers don't want to connect these brownfield, I guess, operations technology machines to the internet, and physically, I mean, there was physical separation for security. So it's like security, Bill Joy used to say "Security by obscurity." Here it's security by --
>> Physical separation, absolutely. Tell me about it. I was actually coming from, if you don't mind, last week I was in Saudi Arabia. At one of the oil and gas plants where we deployed our software, you have to go through five levels of security even to get in there. It's a multibillion dollar plant, refining the gas and all of that. Completely offline, no connectivity to the internet, and we installed our software in their existing small box, connected to their live video cameras that are actually measuring the stuff, doing the processing and detecting the specific conditions that we're looking for.
>> That's my question, which was if they want to be monitoring. So there's like one low level, really low hardware level, the sensor feeds. But you could actually have a richer feed, which is video and audio, but how much of that, then, are you doing the, sort of, inferencing locally? Or even retraining, and I assume that since it's not the OT device, and it's something that's looking at it, you might be more able to send it back up to the Cloud if you needed to do retraining?
>> That's exactly right.
So the way the model works, particularly for image processing, because it's a more complex process to train and create a model: you could create the model offline, like in a GPU box, an FPGA box and whatnot. Import and bring the model back into this small little device that's running in the plant, and now the live video data is coming in and the model is inferencing the specific thing. Now there are two ways to update and revise the model: incremental revision of the model, you could do that if you want, or you can send the results to a central location. Not the internet; they do have something local, in this example a PI DB, an OSIsoft PI DB, or some other local service out there, where you have an opportunity to gather the results from each of these different locations and then consolidate and retrain the model, and put the model back again.
>> Okay, the one part that I didn't follow completely is... if the model is running ultimately on the device, again, and perhaps not even on a CPU, but on a programmable logic controller.
>> It could, even though a programmable controller also typically has some sort of CPU there as well. These days, most of the PLCs, programmable controllers, have either an ARM-based processor or an x86-based processor. We can run on either one of those too.
>> So, okay, assume you've got the model deployed down there, for the, you know, local inferencing. Now, some retraining is going to go on in the Cloud, where you're pulling in the richer perspective from many different devices. How does that model get back out to the device if it doesn't have the connectivity between the device and the Cloud?
>> Right, so if there's strictly no connectivity, what happens is once the model is regenerated or retrained, they put the model on a USB stick, it's low-tech. A USB stick, bring it to the PLC device and upload the model.
>> George: Oh, so this is sort of how we destroyed the Iranian centrifuges.
>> That's exactly right, exactly right.
But you know, in some other environments, even though there's no connectivity to the Cloud environment per se, the devices have the ability to connect to the Cloud. Optionally, they say, "Look, I'm the device that's coming up, do you have an upgraded model for me?" Then it can pull the model. So in some of the environments it's super strict, where there is absolutely no way to connect this device, and you put the model on a USB stick and bring it back here. In other environments, the device can query the Cloud but the Cloud cannot connect to the device. This is a very popular model these days because, in other words, imagine this: an elevator sitting in a building, somebody from the Cloud cannot reach the elevator, but the elevator can reach the Cloud when it wants to.
>> George: Sort of like a jet engine, you don't want the Cloud to reach the jet engine.
>> That's exactly right. The jet engine can reach the Cloud if it wants to, when it wants to, but the Cloud cannot reach the jet engine. That's how we can pull the model.
>> So Sastry, as a CTO you meet with customers often. You mentioned you were in Saudi Arabia last week. I'd love to understand how you're leveraging and engaging with customers to really help drive the development of FogHorn, in terms of being differentiated in the market. What are those, kind of, bi-directional, symbiotic customer relationships like? And how are they helping FogHorn?
>> Right, that's actually a great question. We learn a lot from customers because we started a long time ago. We did an initial version of the product. As we began to talk to the customers, and that's particularly part of my job, where I go talk to many of these customers, they gave us feedback. "Well, my problem is really that I can't even give you connectivity to the Cloud to upgrade the model. I can't even give you sample data. How do you do that modeling?" Right?
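The pull-only update pattern described above, where the device can ask the Cloud for a newer model but the Cloud can never reach the device, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FogHorn's protocol: `ModelPackage`, `ModelRegistry`, `EdgeDevice`, and `check_for_update` are hypothetical names, and the registry is an in-memory stand-in for what would be an outbound request from the device in a real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPackage:
    """Hypothetical container for a trained model and its version number."""
    version: int
    payload: bytes


class ModelRegistry:
    """Stand-in for the central service.

    It only answers requests. It holds no device addresses and has no
    method that initiates a connection to a device.
    """

    def __init__(self):
        self._latest = None

    def publish(self, package):
        self._latest = package

    def latest_newer_than(self, version):
        if self._latest is not None and self._latest.version > version:
            return self._latest
        return None


class EdgeDevice:
    """The device decides when to ask; nothing can push to it."""

    def __init__(self, registry, model):
        self.registry = registry
        self.model = model

    def check_for_update(self):
        """Device-initiated poll: 'do you have an upgraded model for me?'"""
        newer = self.registry.latest_newer_than(self.model.version)
        if newer is None:
            return False
        self.model = newer  # swap in the retrained model
        return True


if __name__ == "__main__":
    registry = ModelRegistry()
    device = EdgeDevice(registry, ModelPackage(version=1, payload=b"v1"))
    registry.publish(ModelPackage(version=2, payload=b"v2"))
    if device.check_for_update():
        print("now running model version", device.model.version)
```

The design point is the direction of the call: every interaction starts on the device side, so a firewall can block all inbound traffic to the device and updates still work.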
And sometimes they say, "You know what, we are not technical people, help us express the problem, the outcome, give me tools that help me express that outcome." So we created a bunch of what we call OT tools, operational technology tools. How we distinguish ourselves in this process from the traditional Cloud-based vendors, the traditional data science and data analytics companies, is that they think in terms of computer scientists, computer programmers, and expressions. We think in terms of industrial operators: what can they express, what do they know? They don't really necessarily care. When you tell them, "I've got an anomaly detection data science machine learning algorithm", they're going to look at you like, "What are you talking about? I don't understand what you're talking about", right? You need to tell them, "Look, this machine is failing." What are the conditions in which the machine is failing? How do you express that? And then we translate that requirement into the underlying models, the underlying VEL expressions, VEL, our CEP expression language. So we learned a ton about user interface, capabilities, latency issues, connectivity issues, different protocols, a number of things that we learned from customers.
>> So I'm curious with... More of the big data vendors are recognizing data in motion and data coming from devices. And some, like Hortonworks DataFlow, NiFi, have a MiNiFi component written in C++, with a really low resource footprint. But I assume that that's really just a transport. It's almost like a collector, and it doesn't have the analytics built in --
>> That's exactly right, NiFi has the transport, it has the real-time transport capability for sure. What it does not have is this notion of the CEP concept. How do you combine all of the streams? Everything is time series data for us, right, from the devices. Whether it's coming from a device or whether it's coming from another static source out there.
How do you express a pattern, a recognition pattern definition, across these streams? That's where our CEP comes into the picture. A lot of these seemingly similar software capabilities that people talk about don't quite have either the streaming capability, or the CEP capability, or the real-time, or the low footprint. What we have is a combination of all of that.
>> And you talked about how everything's time series to you. Is there a need to have, sort of, an equivalent time series database up in some central location? So that when you subset, when you determine what relevant subset of data to move up to the Cloud, or, you know, an on-prem central location, does it need to be the same database?
>> No, it doesn't need to be the same database. It's optional. In fact, we do ship a local time series database at the edge itself. If you have a little bit of local storage, you can downsample, take the results, and store them locally, and many customers actually do that. Some others, because they have their existing environment, they have some Cloud storage, whether it's Microsoft, it doesn't matter what they use, we have connectors from our software to send these results into their existing environments.
>> So, you had also said something interesting about your, sort of, tool set, as being optimized for operations technology. So this is really important because back when we had the Net-Heads and the Bell-Heads, you know, it was a cultural clash and they had different technologies.
>> Sastry: They sure did, yeah.
>> Tell us more about how selling to operations, not just selling, but supporting operations technology, is different from IT, and where does that boundary live?
>> Right, so in a typical IT environment, right, you start with the boss who is the decision maker, you work with them and they approve the project and you go and execute that. In an industrial, in an OT environment, it doesn't quite work like that.
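The idea of a recognition pattern defined across several time-series streams, raised in the CEP discussion above, can be illustrated with a small Python sketch. This is not VEL and does not reflect FogHorn's engine; the `Event` tuple, the function `detect_joint_exceedance`, and the stream names and limits in the usage example are all assumptions made for illustration. The pattern it matches is: every monitored stream has exceeded its limit within the same short time window.

```python
from collections import namedtuple

# Hypothetical event shape: one reading from one named stream.
Event = namedtuple("Event", ["timestamp", "stream", "value"])


def detect_joint_exceedance(events, limits, window):
    """Yield timestamps at which every stream in `limits` has exceeded its
    limit within the last `window` seconds.

    `events` must be ordered by timestamp. Only one timestamp per stream
    is kept, so memory does not grow with the length of the input.
    """
    last_exceeded = {}  # stream name -> timestamp of most recent exceedance
    for event in events:
        limit = limits.get(event.stream)
        if limit is None or event.value <= limit:
            continue
        last_exceeded[event.stream] = event.timestamp
        all_streams_seen = len(last_exceeded) == len(limits)
        if all_streams_seen and all(
            event.timestamp - t <= window for t in last_exceeded.values()
        ):
            yield event.timestamp


if __name__ == "__main__":
    events = [
        Event(0, "temperature", 70.0),
        Event(1, "vibration", 0.2),
        Event(10, "temperature", 95.0),  # over the temperature limit
        Event(12, "vibration", 0.9),     # over the vibration limit, 2 s later
        Event(30, "vibration", 0.95),    # temperature exceedance is now stale
    ]
    limits = {"temperature": 90.0, "vibration": 0.8}
    for ts in detect_joint_exceedance(events, limits, window=5):
        print("pattern matched at t =", ts)
```

An operator-facing tool could generate a rule like this from a plain statement such as "alert me when temperature and vibration are both high at about the same time", which is the kind of translation from outcome to expression that the interview describes.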
Even if the boss says, "Go ahead and go do this project", if the operator on the floor doesn't understand what you're talking about, because that person is in charge of operating that machine, it doesn't quite work like that. So you need to work bottom up as well, convincing them that you are indeed actually solving their pain point. So the way we start, rather than trying to tell them what capabilities we have as a product, or what we're trying to do, the first thing we ask is what is their pain point? "What's your problem? What is the problem you're trying to solve?" Some customers say, "Well, I've got yield, a lot of scrap. Help me reduce my scrap. Help me to operate my equipment better. Help me predict these failure conditions before it's too late." That's how the problem starts. Then we start asking them, "Okay, what kind of data do you have, what kind of sensors do you have? Typically, do you have information about under what circumstances you have seen failures versus not seeing failures out there?" So in the process of that inquiry we begin to understand how they might actually use our software, and then we tell them, "Well, here, use your software, our software, to predict that." And, sorry, I want 30 more seconds on that. The other thing is that, typically in an IT environment, because I came from that too, I've been in this profession for 30 plus years, IT, OT and all of that, where we don't right away talk about CEP, or expressions, or analytics, we don't talk about that. We talk about, look, you have this bunch of sensors, we have OT tools here, drag and drop your sensors, express the outcome that you're trying to look for, and then we derive behind the scenes what it means. Is it analytics, is it machine learning, is it something else, and what is it? So that's kind of how we approach the problem.
Of course, sometimes, surprisingly, you do occasionally run into very technical people. With those people we can right away talk about, "Hey, you need these analytics, you need to use machine learning, you need to use expressions" and all of that. That's kind of how we operate.
>> One thing, you know, that's becoming clearer is I think this widespread recognition that there's data-intensive and low-latency work to be done near the edge. But what goes on in the Cloud is actually closer to simulation and high-performance compute, if you want to optimize a model. So not just train it, but maybe have something that's prescriptive that says, you know, here's the actionable information. As more of your data is video and audio, how do you turn that into something where you can simulate a model that tells you the optimal answer?
>> Right, so this is actually a good question. From our experience, there are models that require a lot of data, for example, video and audio. There are some other models that do not require a lot of data for training. I'll give you an example of one of the customer use cases that we have. There's one customer in the manufacturing domain where they'd been seeing a lot of finished goods failures, there was a lot of scrap, and the problem then was, "Hey, predict the failures, reduce my scrap, save the money", right? Because they'd been seeing a lot of failures every single day, we did not need a lot of data to train and create a model for that. So, in fact, we just needed one hour's worth of data. We created a model, put the thing in, and we have reduced, completely eliminated their scrap. There are other kinds of models, like models on video, where we can't do that at the edge, so we require, for example, some video files or simulated audio files, take them offline, create the model, and see whether it's accurately predicting based on the real-time video coming in or not. So it's a mix of what we're seeing between those two.
>> Well Sastry, thank you so much for stopping by theCUBE and sharing what it is that you guys at FogHorn are doing, what you're hearing from customers, how you're working together with them to solve some of these pretty significant challenges.
>> Absolutely, it's been a pleasure. Hopefully this was helpful, and yeah.
>> Definitely, very educational. We want to thank you for watching theCUBE, I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV, in downtown San Jose. Come stop by Forager Tasting Room, hang out with us, learn as much as we are about all the layers of big data, digital transformation, and the opportunities. Stick around, we will be back after a short break. (upbeat electronic music)

Published Date : Mar 8 2018
