
Search Results for C-Store:

Breaking Analysis: New Data Signals C-Suite Taps the Brakes on Tech Spending


 

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> New data from ETR's soon-to-be-released April survey shows a clear deceleration in spending and a more cautious posture from technology buyers. Just this week, we saw sell-side downgrades in hardware companies like Dell and HP and revised guidance from high flyer UiPath, citing exposure to Russia and Europe and certain sales execution challenges. But these headlines, we think, are a canary in the coal mine. According to ETR analysis and channel checks from theCUBE, the real story is these issues are not isolated. Rather, we're seeing signs of caution from buyers across the board in enterprise tech.

Hello and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis, we are the bearers of bad news. Don't shoot the messenger. We'll share a first look at fresh data that suggests a tightening in tech spending, calling for 6% growth this year, which is below our January prediction of 8% for 2022. Now, unfortunately the party may be coming to an end, at least for a while. You know, it's really not surprising, right? We've had a two year record run in tech spending and meteoric rises in high flying technology stocks. Hybrid work, equipping and securing remote workers, the forced march to digital that we talk about sometimes: these were all significant tailwinds for tech companies. The NASDAQ peaked late last year and then, as you can see in this chart, bottomed in mid-March of 2022, and it made a nice run up through the 29th of last month, but the mini rally appears to be in jeopardy with Fed rate hikes, Russia, and supply chain challenges. There's a lot of uncertainty, so we should expect the C-suite to be saying, hey, wait, slow down. Now, we don't think the concerns are confined to companies with exposure to Russia and Europe. We think it's more broad-based than that, and we're seeing caution from technology companies and tech buyers that we think is prudent, given the conditions. You know, it looks like the two year party has ended, and as my ETR colleague Erik Bradley said, a little hangover shouldn't be a surprise to anybody.

So let's get right to the new spending data. I'm limited in what I can share with you today because ETR is in its quiet period and hasn't released full results yet outside of its client base. But they did put out an alert today, and I can share this slide. It shows the expectation on spending growth from more than a thousand CIOs and IT buyers who responded in the most recent survey. It measures their expectations for spending. The key focus areas that I want you to pay attention to in this data are the yellow bars. The most recent survey is the yellow, compared to the blue and the gray bars, which are the December and September '21 surveys respectively. And you can see a steep drop from last year in Q1, and lowered expectations for Q2 on the far right, a drop from nearly 9% last September to around 6% today. Now, you may think a 200 basis point downgrade from our January prediction of 8% seems somewhat benign, but in a $4 trillion IT market, that's $80 billion coming off the income statements of some tech companies. Now, the good news is that 6% growth is still very healthy and higher than pre-pandemic spending levels. And the buyers we've talked to this week are saying, look, we're still spending money, we just have to be more circumspect about where and how fast.
Now, there were a few other callouts in the ETR data and in my discussions today with Erik Bradley on this. First, it looks like in response to expected supply chain constraints, buyers pulled forward their orders late last year and earlier this year. You remember when we couldn't buy toilet paper, people started to stockpile, and it created this rubber-banding effect. So we see clear signs of receding momentum in the PC and laptop market. But as we said, this is not isolated to PCs. UiPath's earnings guidance confirms this, but the story doesn't end there. This isn't isolated to UiPath in our view; rather, it's a more broad-based slowdown.

The other big sign is spending on outsourced IT, which is showing a meaningful deceleration in the last survey, with a net score drop from 13% in January to 6% today. Net score, remember, is a measure of the net percentage of customers in the survey that, on balance, are spending more than in the last survey. It's derived by subtracting the percent of customers spending less from the percent spending more. That's a 700 basis point drop in three months. This is a market where you can't hire enough people. The percent of companies hiring has gone from 10% during the pandemic to 50% today, according to recent data from ETR, and we know there's still an acute skills shortage. So you would expect more IT outsourcing, but you don't see that in the data; it's down. And as this quote from Erik Bradley explains, historically, when outsourced IT drops like this, especially in a tight labor market, it's not good news for IT spending.

All right, now, the other interesting callout from ETR were some specific company names that appear to be seeing the biggest change in spending momentum. Here's the list of those companies, which all have meaningful exposure to Europe; that's really where the focus was. SAP has big exposure to on-premises installations and, of course, Europe as well. ServiceNow has European exposure and also broad-based IT exposure across the globe, especially in the US. Zoom didn't go to the moon, no surprise there given the quasi return to work and Zoom fatigue. McAfee is a bit of a concern, because security seemed to be one of those areas, when you look at some of the other data, that is perhaps actually insulated from all the spending caution. Of course, we saw the Okta hack, and we're going to cover that next week with hopefully some new data from ETR, but generally security's been holding up pretty well; you look at CrowdStrike, you look at Zscaler in particular. Adobe's another company that's had a nice bounce in the last couple of weeks. Accenture, again, speaks to the outsourcing headwinds that we mentioned earlier. And now Google Cloud Platform is a bit of a concern. It's still elevated overall, but down, and well down in Europe, under that magic 40% dotted line we often show, that red dotted line of net score; anything above that we cite as elevated. So some important callouts here: you see companies that have Euro exposure, and again, we see this as not confined to just Europe, and this is something we're going to pay close attention to and continue to report on in the next several weeks and months.

All right, so what should we expect from here? The ARK Invest stocks of Cathie Wood fame have been tracking in a downward trend since last November, meaning these high PE stocks have been making lower highs and lower lows since then, right? The trend is not their friend.
Investors I talk to are being much more cautious about buying the dip. They're raising cash and being a little bit more patient. You know, traders can trade in this environment, but unless you can pay attention minute by minute, you're going to get whipsawed. Investors tell me that they're still eyeing big tech. Even though Apple has been on a recent tear and has some exposure to supply chain challenges, they're looking for entry points within that chop for Apple, Amazon, Microsoft, and Alphabet. And look, as I've been stressing, 6% spending growth is still very solid. It's a case of resetting the outlook relative to previous expectations. So when you zoom out and look at the growth in data, getting digital right, security investments, automation, cloud, AI, containers, all the fundamentals are really strong and they have not changed. They're all powering this new digital economy, and we believe it's just prudence versus a shift in the importance of IT.

Now, one point of caution is there's a lot of discussion around a shift in global economies: supply chain uncertainty, persistent semiconductor shortages, especially in areas like driver ICs and boring things like parts for displays, analog, microcontrollers, and power regulators. Stuff that's, you know, just not playing nice these days and wreaking havoc. And this creates uncertainty, which sometimes can pick up momentum in a snowballing effect. That's something that we're watching closely, and we're going to be vigilant about reporting to you when we see changes in the data and in our forecast, even when we think our forecasts are wrong.

Okay, that's it for today. Thanks to Alex Myerson, who does the production and podcasts for Breaking Analysis, and Stephanie Chan, who provides background research. Kristen Martin, Cheryl Knight, and all theCUBE writers help get the word out, and thanks to Rob Hof, our EIC over at SiliconANGLE. Remember, I publish weekly on wikibon.com and siliconangle.com. These episodes are all available as podcasts wherever you listen; all you've got to do is search Breaking Analysis podcast. And check out etr.ai; that's where you can get access to all this survey data and make your own cuts. It's awesome, check that out. Keep in touch with me. You can email me at dave.vellante@siliconangle.com, or hit me up on LinkedIn. This is Dave Vellante for theCUBE Insights powered by ETR. Be safe, stay well, and we'll see you next time. (gentle music)
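A quick illustration of the arithmetic in this episode: net score subtracts the percent of customers spending less from the percent spending more, and a 200 basis point forecast cut on a $4 trillion market is how you get to $80 billion. A minimal Python sketch, using made-up survey counts rather than ETR's actual data:

```python
# Net score: percent of respondents spending more, minus percent spending less.
# The counts below are hypothetical placeholders, not ETR survey data.

def net_score(spending_more: int, spending_less: int, total: int) -> float:
    """Net percentage of customers spending more vs. less, per the definition above."""
    return 100.0 * (spending_more - spending_less) / total

# Example: 260 of 1,000 respondents spending more, 200 spending less -> 6% net score.
print(f"Net score: {net_score(260, 200, 1000):.1f}%")

# The 200 basis point downgrade applied to a $4 trillion IT market:
it_market = 4e12                 # $4 trillion
downgrade_bps = 200              # 8% forecast cut to 6%
dollars_off = it_market * downgrade_bps / 10_000
print(f"Revenue impact: ${dollars_off / 1e9:.0f} billion")  # -> $80 billion
```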

Published Date : Apr 2 2022


David C King, FogHorn Systems | CUBEConversation, November 2018


 

(uplifting orchestral music) >> Hey, welcome back, everybody. Jeff Frick here with theCUBE. We're at the Palo Alto studios, having theCUBE Conversation, a little break in the action of the conference season before things heat up, before we kind of come to the close of 2018. It's been quite a year. But it's nice to be back in the studio. Things are a little bit less crazy, and we're excited to talk about one of the really hot topics right now, which is edge computing, fog computing, cloud computing. What do all these things mean, how do they all intersect, and we've got with us today David King. He's the CEO of FogHorn Systems. David, first off, welcome. >> Thank you, Jeff. >> So, FogHorn Systems, I guess by the fog, you guys are all about the fog, and for those that don't know, fog is kind of this intersection between cloud, and on prem, and... So first off, give us a little bit of the background of the company and then let's jump into what this fog thing is all about. >> Sure, actually, it all dovetails together. So yeah, you're right, FogHorn, the name itself, came from Cisco's invented term, called fog computing, from almost a decade ago, and it connoted this idea of computing at the edge, but didn't really have a lot of definition early on. And so, FogHorn was started actually by a Palo Alto incubator, just nearby here, that had the idea that hey, we've got to put some real meaning and some real meat on the bones here with fog computing. And what we think FogHorn has become over the last three and a half years, since we took it out of the incubator, since I joined, was to put some real purpose, meaning, and value in that term. And so, it's more than just edge computing. Edge computing is a related term. In the industrial world, people would say, hey, I've had edge computing for 30, 40, 50 years with my production line control and also my distributed control systems. I've got hard-wired compute. I run, they call them, industrial PCs in the factory. That's edge compute. The IT world comes along and says, no, no, no, fog compute is a more advanced form of it. Well, the real purpose of fog computing and edge computing, in our view, in the modern world, is to apply what has traditionally been thought of as cloud computing functions, big, big data, but running in an industrial environment, or running on a machine. And so, we call it really big data operating in the world's smallest footprint, okay, and the real point of this for industrial customers, which is our primary focus, industrial IoT, is to deliver as much analytic, machine learning, deep learning AI capability on live-streaming sensor data, okay, and what that means is rather than persisting a lot of data either on prem, and then sending it to the cloud, or trying to stream all this to the cloud to make sense of terabytes or petabytes a day, per machine sometimes, right, think about a jet engine, a petabyte every flight, you want to do the compute as close to the source as possible, and if possible, on the live streaming data, not after you've persisted it on a big storage system. So that's the idea. >> So you touch on all kinds of stuff there. So we'll break it down. >> Unpack it, yeah. >> Unpack it. So first off, just kind of the OT/IT thing, and I think that's really important, and we talked before turning the cameras on about Dr. Tom from HP, he loves to make a big symbolic handshake of the operations technology, >> One of our partners.
Right, and IT, and the marriage of these two things, where before, as you said, the OT guys, the guys that have been running factories, you know, they've been doing this for a long time, and now suddenly, the IT folks are butting in and want to get access to that data to provide more control. So, you know, as you see the marriage of those two things coming together, what are the biggest points of friction, and really, what's the biggest opportunity? >> Great set of questions. So, quite right, the OT folks are inherently suspicious of IT, right? I mean, if you don't know the history, 40 plus years ago, there was a fork in the road, where in factory operations, were they going to embrace things like ethernet, the internet, connected systems? In fact, they purposely air gapped an island of those systems 'cause it was all about machine control, real-time, for safety, productivity, and uptime of the machine. They don't want any, you can't use kind of standard ethernet, it has to be industrial ethernet, right? It has to be time-bound and deterministic. It can't be a retry kind of a system, right? So, different MAC layer for a reason, for example. What did the physical wiring look like? It's also different cabling, because you can't have cuts, jumps in the cable, right? So it's a different environment entirely that OT grew up in, and so, FogHorn is trying to really bring the value of what people are delivering for AI, essentially, into that environment in a way that's non-threatening to, is supplemental to, and adds value in the OT world. So Dr. Tom is right, this idea of bringing IT and OT together is inherently challenging, because these were kind of fork-in-the-road, islanded networks, if you will, different systems, different nomenclature, different protocols, and so, there's a real education curve that IT companies are going through, and the idea of taking all this OT data that's already been produced in tremendous volumes, before you even add new kinds of sensing, and sending it across a LAN which it's never talked to before, then across a WAN to go to a cloud, to get some insight, doesn't make any sense, right? So you want to leverage the cloud, you want to leverage data centers, you want to leverage the LAN, you want to leverage 5G, you want to leverage all the new IT technologies, but you have to do it in a way that makes sense for IT and adds value in the OT context. >> I'm just curious, you talked about the air gapping, the two systems, which means they are not connected, right? >> No, they're connected with a duct, they're connected to themselves, in the industrial-- >> Right, right, but before, the OT system was air gapped from the IT system, so thinking about security and those types of threats, now, if those things are connected, that security measure has gone away, so what is the excitement, adoption scare when now, suddenly, these things that were separate, especially in the age of breaches that we know happen all the time, as you bring those things together? >> Well, in fact, there have been cyber breaches in the OT context. Think about Stuxnet, think about things that have happened, think about the utilities back East that were found to have malware implanted in them. And so, this idea of industrial IoT is very exciting, the ability to get real-time, kind of game-changing insights about your production. A huge amount of economic activity in the world could be dramatically improved.
You can talk about trillions of dollars of value, which McKinsey, and BCG, and Bain talk about, right, by bringing kind of AI, ML into the plant environment. But the inherent problem is that by connecting the systems, you introduce security problems. You're talking about a huge amount of cost to move this data around, persist it, then add value, and it's not real-time, right? So, it's not that cloud is not relevant, it's not that it's not used, it's that you want to do the compute where it makes sense, and for industrial, the more industrialized the environment, the more high frequency, high volume data, the closer to the system that you can do the compute, the better. And again, it's multiple layers of compute. You probably have something on the machine, something in the plant, and something in the cloud, right? But rather than send raw OT data to the cloud, you're going to send processed, intelligent metadata insights that have already been derived at the edge, and update what they call the fleet-wide digital twin, right? The digital twin for that whole fleet of assets should sit in the cloud, but the digital twin of the specific asset should probably be on the asset. >> So let's break that down a little bit. There's so much good stuff here. So, we talked about OT/IT and that marriage. Next, I just want to touch on cloud, 'cause a lot of people know cloud, it's very hot right now, and the ultimate promise of cloud, right, is you have infinite capacity >> Right, infinite compute. >> Available on demand, and you have infinite compute, and hopefully you have some big fat pipes to get your stuff in and out. But the OT challenge is, and as you said, the device challenge is very, very different. They've got proprietary operating systems, they've been running for a very, very long time. As you said, they put off boatloads, and boatloads, and boatloads of data that was never really designed to feed necessarily a machine learning algorithm, or an artificial intelligence algorithm, when these things were designed. It wasn't really part of the equation. And we talk all the time about, you know, do you move the compute to the data, do you move the data to the compute, and really, what you're talking about in this fog computing world is kind of a hybrid, if you will, of trying to figure out which data you want to process locally, and then which data you have the time, relevance, and other factors to just go ahead and pump upstream. >> Right, that's a great way to describe it. Actually, we're trying to move as much of the compute as possible to the data. That's really the point of, that's why we say fog computing is a nebulous term about edge compute. It doesn't have any value until you actually decide what you're trying to do with it, and what we're trying to do is to take as much of the harder compute challenges, like analytics, machine learning, deep learning, AI, and bring it down to the source, as close to the source as you can, because you can essentially streamline or make more efficient every layer of the stack. Your models will get much better, right? You might have built them in the cloud initially, think about a deep learning model, but it may only be 60, 70% accurate. How do you do the improvement of the model to get it closer to perfect? I can't go send all the data up to keep trying to improve it. Well, typically, what happens is I downsample the data, I average it and I send it up, and I don't see any changes in the average data. Guess what?
What we should do is inferencing all the time on all the data, run it in our stack, and then send the metadata up, and then have the cloud look across all the assets of a similar type, and say, oh, the global fleet-wide model needs to be updated, and then push it down. So, with Google just about a month ago, in Barcelona, at the IoT show, what we demonstrated was the world's first instance of AI for industrial, which is closed-loop machine learning. We were taking a model, a TensorFlow model, trained in the cloud in the data center, brought into our stack, and performing 100% inferencing on all the live data, pushing the insights back up into Google Cloud, and then automatically updating the model without a human or data scientist having to look at it. Because essentially, it's ML on ML. And that to us, ML on ML, is the foundation of AI for industrial. >> I just love that. Something that comes up all the time, right? We used to make decisions based on a sampling of historical data after the fact. >> That's right, that's how we've all been doing it. >> Now, right now, the promise of streaming is you can make it based on all the data, >> All the time. >> All the time, in real time. >> Permanently. >> This is a very different thing. So, but as you talked about, you know, running some complex models, and running ML, and retraining these things. You know, when you think of edge, you think of some little hockey puck that's out on the edge of a field, with limited power, limited connectivity. So, you know, what's the reality of how much power you have at some of these more remote edges, or, we always talk about the field of turbines, oil platforms, and how much power do you need, and how much compute, for it to actually start to be meaningful in terms of the platform for the software? >> Right, there's definitely use cases, like you think about the smart meters, right, in the home. The older generation of those meters may have had very limited compute, right, like, you know, talking about a single megabyte of memory maybe, or less, right, kilobytes of memory. Very hard to run a stack on that kind of footprint. The latest generation of smart meters have about 250 megabytes of memory. A Raspberry Pi today is anywhere from a half a gig to a gig of memory, and we're fundamentally memory-bound, and obviously CPU-bound if it's trying to do really fast compute, like vibration analysis, or acoustic, or video. But if you're just trying to take digital sensing data, like temperature, pressure, velocity, torque, we can take humidity, we can take all of that, believe it or not, run literally dozens and dozens of models, even train the models, in something as small as a Raspberry Pi, or a low-end x86. So our stack can run on any hardware, we're completely OS independent. It's a full-up software layer. But the whole stack is about 100 megabytes of memory, with all the components, including Docker containerization, right, which compares to about 10 gigs of running a stream processing stack like Spark in the cloud. So it's that order of magnitude of footprint reduction and speed of execution improvement. So as I said, world's smallest, fastest compute engine. You need to do that if you're going to talk about, like, a wind turbine, it's generating data, right, every millisecond, right. So you have high frequency data, like turbine pitch, and you have other contextual data you're trying to bring in, like wind conditions, reference information about how the turbine is supposed to operate.
You're bringing in a torrential amount of data to do this computation on the fly. And so, the challenge for a lot of the companies that have really started to move into the space, the cloud companies, like our partners, Google, and Amazon, and Microsoft, is they have great cloud capabilities for AI, ML. They're trying to move down to the edge by just transporting the whole stack there. So in a plant environment, okay, that might work if you have massive data centers that can run it, but now I've still got to stream all the data from all of my assets to that central point. What we're trying to do is come at it the opposite way, which is, by having the world's smallest, fastest engine, we can run it in a small compute, very limited compute, on the asset, or near the asset, or you can run this in a big compute and we can take on lots and lots of use cases for models simultaneously. >> I'm just curious on the small compute case, and again, you want all the data-- >> You want to inference everything, right? >> Does it eventually go back, or are there a lot of cases where you can get the information you need off the stream and you don't necessarily have to save or send that upstream? >> So fundamentally today, in the OT world, the data usually gets, if the PLC, the production line controller, that has simple KPIs, if temperature goes to X or pressure goes to Y, do this. Those simple KPIs, if nothing is executed, it gets dumped into a local protocol server, and then about every 30, 60, 90 days, it gets written over. Nobody ever looks at it, right? That's why I say, 99% of the brown field data in OT has never really been-- >> Almost like a security-- >> Has never been mined for insight. Right, it just gets-- >> It runs, and runs, and runs, and every so often-- >> Exactly, and so, if you're doing inferencing, and doing real-time decision making, real-time actuation with our stack, what you would then persist is metadata insights, right? Here is an event, or here is an outcome, and oh, by the way, if you're doing deep learning or machine learning, and you're seeing deviation or drift from the model's prediction, you probably want to keep that and some of the raw data packets from that moment in time, and send that to the cloud or data center to say, oh, our fleet-wide model may not be accurate, or may be drifting, right? And so, what you want to do, again, is different horses for different courses. Use our stack to do the lion's share of the heavy-duty real-time compute, and produce metadata that you can send to either a data center or a cloud environment for further learning. >> Right, so your piece is really the gathering and the ML, and then if it needs to go back out for more heavy lifting, you'll send it back up, or do you have the cloud application as well that connects if you need? >> Yeah, so we build connectors to, you know, Google Cloud Platform, Google IoT Core, to AWS S3, to Microsoft Azure, virtually anything, Kafka, Hadoop. We can send the data wherever you want: either on plant, right back into the existing control systems, or we can send it to OSIsoft PI, which is a great time series database that a lot of process industries use. You could of course send it to any public cloud or a Hadoop data lake private cloud. You can send the data wherever you want. Now, we also have, one of our components is a time series database.
You can also persist it in memory in our stack, just for buffering, or if you have high value data where you want to take a measurement, a value from a previous calculation, and bring it into another calculation later, right? So, it's a very flexible system. >> Yeah, we were at OSIsoft PI World earlier this year. Some fascinating stories that came out of-- >> 30 year company. >> The building maintenance, and all kinds of stuff. So I'm just curious, some of the easy to understand applications that you've seen in the field, and maybe some of the ones that were a surprise on the OT side. I mean, obviously, preventative maintenance is always towards the top of the list. >> Yeah, I call it the layer cake, right? Especially when you get to remote assets that are either not monitored or lightly monitored. They call it drive-by monitoring: somebody shows up and listens or looks at a valve or gauge and leaves. Condition-based monitoring, right? That is actually a big breakthrough for some, you know, think about fracking sites, or remote oil fields, or mining sites. The second layer is predictive maintenance, and the next generation of that is kind of predictive, prescriptive, even preventive maintenance, right? You're making predictions or you're helping to avoid downtime. The third layer, which is really where our stack is sort of unique today in delivering, is asset performance optimization. How do I increase throughput, how do I reduce scrap, how do I improve worker safety, how do I get better processing of the data that my PLC can't give me, so I can actually improve the performance of the machine? Now, ultimately, what we're finding is a couple of things. One is, you can look at individual asset optimization, process optimization, but there's another layer. So often, we're deployed at two layers on premise. There's also the plant-wide optimization. We talked about wind farms before, off camera. So you've got the wind turbine. You can do a lot of things about turbine health, the blade pitch and condition of the blade, you can do things on the battery, all the systems on the turbine, but you also need a stack running, like ours, at that concentration point where there's 200 plus turbines that come together, 'cause there's the optimization of the whole farm. Every turbine affects the other turbine, so a single turbine can't tell you the speed, rotation, things that need to change, if you want to adjust the speed of one turbine versus the one next to it. So there's also kind of a plant-wide optimization. Talking about autonomous driving, there's going to be five layers of compute, right? You're going to have the, almost what I call the ECU level, the individual sub-system in the car, the engine, how it's performing. You're going to have the gateway in the car to talk about things that are happening across systems in the car. You're going to have the peer-to-peer connection over 5G to talk about optimization right between vehicles. You're going to have the base station algorithms looking at a microcell or macrocell within a geographic area, and of course, you'll have the ultimate cloud, 'cause you want to have the data on all the assets, right, but you don't want to send all that data to the cloud, you want to send the right metadata to the cloud.
>> By the way, you mentioned one thing that I should really touch on, which is, we've talked a lot about what I call traditional brown field automation and control type analytics and machine learning, and that's kind of where we started in discrete manufacturing a few years ago. What we found is that in that domain, and in oil and gas, and in mining, and in agriculture, transportation, in all those places, the most exciting new development this year is the movement towards video, 3D imaging and audio sensing, 'cause those sensors are now becoming very economical, and people have never thought about, well, if I put a camera and apply it to a certain application, what can I learn, what can I do that I never did before? And often, they even have cameras today, they haven't made use of any of the data. So there's a very large customer of ours who has literally video inspection data every product they produce everyday around the world, and this is in hundreds of plants. And that data never gets looked at, right, other than training operators like, hey, you missed the defects this day. The system, as you said, they just write over that data after 30 days. Well, guess what, you can apply deep learning tensor flow algorithms to build a convolutional neural network model and essentially do the human visioning, rather than an operator staring at a camera, or trying to look at training tapes. 30 days later, I'm doing inference-ing of the video image on the fly. >> So, do your systems close loop back to the control systems now, or is it more of a tuning mechanism for someone to go back and do it later? >> Great question, I just got asked that this morning by a large oil and gas super major that Intel just introduced us to. The short answer is, our stack can absolutely go right back into the control loop. In fact, one of our investors and partners, I should mention, our investors for series A was GE, Bosch, Yokogawa, Dell EMC, and our series debuted a year ago was Intel, Saudi Aramco, and Honeywell. So we have one foot in tech, one foot in industrial, and really, what we're really trying to bring is, you said, IT, OT together. The short answer is, you can do that, but typically in the industrial environment, there's a conservatism about, hey, I don't want to touch, you know, affect the machine until I've proven it out. So initially, people tend to start with alerting, so we send an automatic alert back into the control system to say, hey, the machine needs to be re-tuned. Very quickly, though, certainly for things that are not so time-sensitive, they will just have us, now, Yokogawa, one of our investors, I pointed out our investors, actually is putting us in PLCs. So rather than sending the data off the PLC to another gateway running our stack, like an x86 or ARM gateway, we're actually, those PLCs now have Raspberry Pi plus capabilities. A lot of them are-- >> To what types of mechanism? >> Well, right now, they're doing the IO and the control of the machine, but they have enough compute now that you can run us in a separate module, like the little brain sitting right next to the control room, and then do the AI on the fly, and there, you actually don't even need to send the data off the PLC. We just re-program the actuator. So that's where it's heading. It's eventually, and it could take years before people get comfortable doing this automatically, but what you'll see is that what AI represents in industrial is the self-healing machine, the self-improving process, and this is where it starts. 
>> Well, the other thing I think is so interesting is what are you optimizing for, and there is no right answer, right? It could be you're optimizing for, like you said, a machine. You could be optimizing for the field. You could be optimizing for maintenance, but if there is a spike in pricing, you may say, eh, we're not optimizing now for maintenance, we're actually optimizing for output, because we have this temporary condition and it's worth the trade-off. So I mean, there's so many ways that you can skin the cat when you have a lot more information and a lot more data. >> No, that's right, and I think what we typically like to do is start out with what's the business value, right? We don't want to go do a science project. Oh, I can make that machine work 50% better, but if it doesn't make any difference to your business operations, so what? So we always start the investigation with what is a high value business problem where you have sufficient data where applying this kind of AI and the edge concept will actually make a difference? And that's the kind of proof of concept we like to start with. >> So again, just to come full circle, what's the craziest thing an OT guy said, oh my goodness, you IT guys actually brought some value here that I didn't know. >> Well, I touched on video, right, so without going into the whole details of the story, one of our big investors, a very large oil and gas company, we said, look, you guys have done some great work with I call it software defined SCADA, which is a term, SCADA is the network environment for OT, right, and so, SCADA is what the PLCs and DCSes connect over these SCADA networks. That's the control automation role. And this investor said, look, you can come in, you've already shown us, that's why they invested, that you've gone into brown field SCADA environments, done deep mining of the existing data and shown value by reducing scrap and improving output, improving worker safety, all the great business outcomes for industrial. If you come into our operation, our plant people are going to say, no, you're not touching my PLC. You're not touching my SCADA network. So come in and do something that's non-invasive to that world, and so that's where we actually got started with video about 18 months ago. They said, hey, we've got all these video cameras, and we're not doing anything. We just have human operators writing down, oh, I had a bad event. It's a totally non-automated system. So we went in and did a video use case around, we call it, flare monitoring. You know, hundreds of stacks of burning of oil and gas in a production plant. 24 by seven team of operators just staring at it, writing down, oh, I think I had a bad flare. I mean, it's a very interesting old world process. So by automating that and giving them an AI dashboard essentially. Oh, I've got a permanent record of exactly how high the flare was, how smoky was it, what was the angle, and then you can then fuse that data back into plant data, what caused that, and also OSIsoft data, what was the gas composition? Was it in fact a safety violation? Was it in fact an environmental violation? So, by starting with video, and doing that use case, we've now got dozens of use cases all around video. Oh, I could put a camera on this. I could put a camera on a rig. I could've put a camera down the hole. I could put the camera on the pipeline, on a drone. There's just a million places that video can show up, or audio sensing, right, acoustic. 
So, video is great if you can see the event, like I'm flying over the pipe, I can see corrosion, right, but sometimes, like you know, a burner or an oven, I can't look inside the oven with a camera. There's no camera that could survive 600 degrees. So what do you do? Well, that's probably, you can do something like either vibration or acoustic. Like, inside the pipe, you got to go with sound. Outside the pipe, you go video. But these are the kind of things that people, traditionally, how did they inspect pipe? Drive by. >> Yes, fascinating story. Even again, I think at the end of the day, it's again, you can make real decisions based on all the data in real time, versus some of the data after the fact. All right, well, great conversation, and look forward to watching the continued success of FogHorn. >> Thank you very much. >> All right. >> Appreciate it. >> He's David King, I'm Jeff Frick, you're watching theCUBE. We're having a CUBE conversation at our Palo Alto studio. Thanks for watching, we'll see you next time. (uplifting symphonic music)
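A note on the closed-loop pattern King describes (cloud-trained model, 100% inferencing on live data at the edge, metadata shipped upstream, refreshed models pulled back down): in outline it can be sketched as below. Every name here is hypothetical; this is an illustration of the pattern, not FogHorn's or Google's actual API.

```python
# Schematic of the closed-loop edge ML pattern from the interview: a model
# trained in the cloud runs 100% inferencing on live sensor data at the edge;
# only metadata goes upstream, and refreshed models come back down ("ML on ML").
# All names are hypothetical stubs, not any vendor's real interface.

import random
import time


def fetch_model(registry_url):
    """Stub: pull the latest fleet-wide model from a cloud registry."""
    return lambda reading: reading * 0.98  # placeholder "model"


def read_sensor():
    """Stub: one live reading from the machine (e.g., turbine pitch)."""
    return random.gauss(100.0, 5.0)


def publish_upstream(metadata):
    """Stub: send a small metadata record (never the raw stream) to the cloud."""
    print(metadata)


def edge_loop(registry_url, drift_threshold=10.0, cycles=1000):
    model = fetch_model(registry_url)
    for _ in range(cycles):
        reading = read_sensor()
        prediction = model(reading)        # inference on every sample, on-box
        drift = abs(prediction - reading)
        if drift > drift_threshold:
            # On drift, keep the raw packet and ship it up so the cloud can
            # judge whether the fleet-wide model needs retraining, then pull
            # the refreshed model back down, closing the loop.
            publish_upstream({"event": "drift", "raw": reading, "drift": drift})
            model = fetch_model(registry_url)
        time.sleep(0.001)                  # roughly millisecond cadence
```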

Published Date : Nov 16 2018


Wake Gregg, The eBike Store | InterBike 2018


 

>> Hey, welcome back everybody. Jeff Frick here with theCUBE. We're in Reno, Nevada. Interbike is happening this week in Reno. It's a big, huge bike show. They have stuff up at the mountains, they've got stuff at the convention center. This is a small side of it, put on by Royal Dutch Gazelle, a 125 year old bike company that is all in on electric bikes. We came because this e-bike story, this little motors, big battery, kind of last mile thing, has really taken off. So we wanted to come get a better feel for what's going on, and we're excited to have a dealer from Portland, one of the most bike friendly towns in all the US. He's Wake Gregg, and he runs The eBike Store. Wake, great to see you. >> Thank you very much. It's great to be here. Thanks for having me.
So it is a lot of different things going on with bikes. So one of them, right, is the speed, um, and then, and how it's classified. So yes, you know, there's the kind of the 20 mile an hour limit and we see that in the scooters and all these electric vehicles that keep it not a motor vehicle. And then they've got one here. I think it was 27 or 28 miles across three class three. So the laws seem to be kind of trying to catch up, like how do we classify these things? Are they bikes? Are they allowed on the bike path? Are they not allowed on the bike? Pass the hop. It's funny you bring that up tonight. Evolve. Well, it's funny you bring that up today because just today by Portland, which is one of the biggest bike blogs in the nation, um, came out with an article saying they were relatively in the fine print of or Portland code, my city's code and found out you can't ride your bike on the city paths and the city parks, and I didn't know this, I've been in business 10 years, but the very fine print and under dissertation you can't do it. >>Um, so it is, it's a gray space. Um, the 20th mile an hour bikes. Well it seems crazy fast when you and I are standing here. When you're on a road and there is a backup of cars behind you, where's the 20 mile an hour speed limit and they're driving 25 right. You know, it feels kind of safer to be able to go 25 with them and not hold them up and be able to get away from the door and, and zone. We're in a car doesn't go over to the store and you by taking the lane it feels much safer. So I actually, you know, I ride a class one most of the time but I, I do like riding class three bikes. Right. Just curious in terms of of the change of experience on an E bike versus a regular bike, some of the customers that you have, how is it fundamentally different? >>Cause I, you know I came to here today thing and this was really a last mile play. It's not a last mile play at all. For us, about 35% of our customers, their e-bike is their main mode of transportation. It is their car. It is how they get around and about 20% historically from our shop having people with physical disabilities or limitations in some way, shape or form 20%, 20%. So it's people who can no longer make it up the Hill to their house. It's people who can't arrive at work sweaty. It's people with ms, people who are missing along, people who have CLPD, um, you name it. These are people who now can ride again and getting them active again. And so it's a whole different mindset. Um, historically the bike industry has really gone after kinda the elite athlete, right? And this is something different. >>It's people who have, may haven't written a bike for oftentimes 20 plus years, right. Are now able to get out and go on a Hill. And the most interesting thing, they did a study in Australia where they put on, they worked with psycho stupid, been injured and they hooked him up to exercise bikes in front of a video screen showing them as they're paddling down the road essentially. And they change the video to climb a Hill, but they didn't change the settings on the exercise bike. They're sitting on the cyclist reported a higher level of pain when the visual show them climbing the Hill. So e-bikes do the exact reverse of that. And you're actually rewiring your brain so that bikes don't add pain and you can get where you need to go easily and efficiently. Right. So it's their primary, their primary methods. 
So you talked about the connectivity, um, you know, an app, integrated experience with all these devices we see over and over. >>So how has that changed your experience? Are you, is it, is it app for the consumer in terms of they're keeping track of their miles? Is it just for you and the maintenance or how's the integration of an app working through different ways for the app? So there's a mechanics app, we can plug it in and see the error codes. And that's important because being back in the day, someone will come in and say, I wrote this thing at mile 25 it cut out and stopped working. So after work, you know, or we go out and ride 25 miles and try to see if we could recreate the issue. And it was a pain. Now wait, you just told me it wasn't a pain to ride 25 buses. This is back in the day. It was a pain to try falls off. Intermittent issues are the bane of our existence. >>Yes, yes. But the uh, having a log file, we just plug it in and says, Oh, it cut out because of this error code, you know, and boom. Okay. Replaced the speed sensor. Good. You're back up and rolling. Right. Especially with people who commute. They don't want to leave their bike in the shop. They want ready within 24 hours or less. And so it's gotta be turned right. And so it's a whole different form of mechanics and a whole different level of support from the bike dealer. And that's why we choose the bike lines we choose like gazelle. Right. Who support their products very well. So it's pretty interesting that you said, you know, we talked about the scooter space and one wheels and all that fun stuff. So many deals, companies were started with Kickstarter. It's amazing to me how many kind of Kickstarter projects actually turned into real companies. >>Boosted future motion being a couple of my favorites. Future motion. Actually the design behind it was the guy who first invented the cell and unicycle unicycles Daniel Wood, he's actually from clock, I remember from Clackamas right across the river from Portland. And so I tried as original version of the self-balancing unicycle, which they made their first one wheels from and that, you know, it's come a long way and there's the one wheel, but it's been fascinating progression to watch him write and bring that out too. But that's very different than 125 year old Dutch company that's been making Mike making these bikes for a hundred plus years. Really? It's funny, we have, I think there's seven models here that they're showing today. I asked the exact guys how many regular bikes models they have and they're like one. Yeah. So, so they're all in. I mean this is significant. >>You think about some of the biggest companies in the world market cap. Bosch has always worn the top five or 10 market companies in the world. They make the largest set of best selling system in the United States and in Europe. Right? And they're behind it. They have millions of lithium batteries and people's homes already through their power tool division. They're the kind of engineering they're bringing is staggering and it's been really fun to be part of an industry that has been so nascent and yet just boom. Right. You just comes up with fright before you write for your eyes. Okay, so I got to ask you about the, whether you're from Portland, Portland rains a lot in, in, in Holland. How does the rain impact these things? Obviously you just send us their primary vehicle. Is it, is it more dangerous? Is there more spray? >>Is it, is it a factor? Not a factor. 
This is where the lines you carry make a huge difference. So when you, if you carry it, if you buy one off the internet that hasn't been product tested, you are the product tester. If you buy one like this, they literally have like a saltwater steam bath. They put the bikes in for weeks to stimulate Marine corrosion. They have hydraulic machines that the tar out of them. And so when you get a product, it just works. Um, and so we've had a, we had a Bosch system go completely underwater. Now, I'm not saying this is going to happen for everybody's experience. We had a guy literally put the bike in a river. He went one way. The bike went another, not on purpose, not on purpose. It was underwater for a few minutes. Right? Right. At work and rode home. >>And about a week later it made some noises and we told Bosch what happened, it was not a warranty issue with it was a collision. And Bosch said, you know, we haven't had enough warranty claims. You have some extra motors, we're going to send you a new one. And the guy said, it uses daily commuter. Right? Um, and it works great. Right? So, so w rain does not affect them, but it really depends on the model you have and how much product testing and how much engineering has gone in behind it to make sure you have the experience. Cause lithium and water are not generally friends. No. So, so just, I'll give you the last word. When you talk to people that are new to the space, maybe they just stumbled into the store, they heard about these e-bike things. What's kind of the biggest surprise that you see time and time again when people get one of these things and bring it home. >>Number one is that it rides like a bike. You can just go further. Um, th how well integrated they are. Um, on average the Baker's written 75% more than a traditional bike, 75% more, 75% more. Um, on average you can go about, well, the average speed wise on it. Um, I just study on this today. You know, you can increase your time by an average cycles average 11 miles an hour average e-bike average is about 13 to 1415 around there. And I forget the exact number. So I'm giving a bit of a gray area there. A little bit faster. Yeah. And so it gets you where you're going faster with less sweat. Right. We'll wake. Thanks for, uh, for taking a minute. What a, it's a, it's a cool story. And you know, Portland obviously is leading the charge in this, in this whole transformation. It's been a fun place to be and our customers are just awesome and no two ways about it. Super. Well, thanks again. He's waking. Jeff, you're watching the cube. We're at the Royal Dutch gazelle bike event at Interbike. Thanks for watching. Thank you.

Published Date : Sep 21 2018

SUMMARY :

one of the most bike friendly towns in all the U S he's wait, Greg. So it's just, you know, you're the second retailer we've had on and they were also exclusive e-bikes And now it's at the point where And so the Rick older bike shops in particular avoided them because you So the laws seem to be kind of trying to catch up, like how do we classify these things? some of the customers that you have, how is it fundamentally different? And so it's a whole different mindset. So you talked about the connectivity, um, you know, an app, integrated experience So after work, you know, or we go out and ride 25 miles and try So it's pretty interesting that you said, you know, we talked about the scooter space and one wheels and all that fun I asked the exact guys how many regular bikes models they have and they're like Okay, so I got to ask you about the, whether you're from Portland, Portland rains a lot in, in, in Holland. And so when you get a product, it just works. has gone in behind it to make sure you have the experience. And so it gets you where you're going

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
27 | QUANTITY | 0.99+
Reno | LOCATION | 0.99+
Jeff | PERSON | 0.99+
Jeff Frick | PERSON | 0.99+
Bosch | ORGANIZATION | 0.99+
Holland | LOCATION | 0.99+
Portland | LOCATION | 0.99+
25,000 | QUANTITY | 0.99+
Australia | LOCATION | 0.99+
Europe | LOCATION | 0.99+
20 mile | QUANTITY | 0.99+
Daniel Wood | PERSON | 0.99+
11 miles | QUANTITY | 0.99+
25 miles | QUANTITY | 0.99+
United States | LOCATION | 0.99+
China | LOCATION | 0.99+
75% | QUANTITY | 0.99+
Greg | PERSON | 0.99+
20th mile | QUANTITY | 0.99+
Mike | PERSON | 0.99+
seven models | QUANTITY | 0.99+
10 years | QUANTITY | 0.99+
today | DATE | 0.99+
20% | QUANTITY | 0.99+
one wheel | QUANTITY | 0.99+
28 miles | QUANTITY | 0.99+
10 years ago | DATE | 0.99+
25 | QUANTITY | 0.99+
25 buses | QUANTITY | 0.99+
10 market companies | QUANTITY | 0.99+
20 plus years | QUANTITY | 0.99+
10 years later | DATE | 0.99+
10,000 | QUANTITY | 0.98+
one wheels | QUANTITY | 0.98+
tonight | DATE | 0.98+
first time | QUANTITY | 0.98+
one | QUANTITY | 0.98+
1415 | QUANTITY | 0.98+
about 35% | QUANTITY | 0.98+
about 20% | QUANTITY | 0.98+
125 year old | QUANTITY | 0.98+
one way | QUANTITY | 0.97+
Wake Gregg | PERSON | 0.97+
Reno, Nevada | LOCATION | 0.97+
InterBike | EVENT | 0.96+
24 hours | QUANTITY | 0.96+
second retailer | QUANTITY | 0.95+
about a week later | DATE | 0.95+
a hundred plus years | QUANTITY | 0.93+
mile 25 | QUANTITY | 0.93+
Royal Dutch gazelle | ORGANIZATION | 0.93+
first | QUANTITY | 0.93+
Kickstarter | ORGANIZATION | 0.92+
this week | DATE | 0.91+
seven electrical | QUANTITY | 0.9+
Royal Dutch gazelle | EVENT | 0.9+
20 mile an hour | QUANTITY | 0.9+
The eBike Store | ORGANIZATION | 0.89+
Interbike | ORGANIZATION | 0.88+
two ways | QUANTITY | 0.88+
Dutch | OTHER | 0.87+
five | QUANTITY | 0.85+
first one wheels | QUANTITY | 0.82+
half | QUANTITY | 0.79+
millions of lithium batteries | QUANTITY | 0.78+
Portland, | LOCATION | 0.78+
Clackamas | LOCATION | 0.78+
about 13 | QUANTITY | 0.76+
one thing | QUANTITY | 0.76+
U | LOCATION | 0.75+
minutes | QUANTITY | 0.74+
an hour | QUANTITY | 0.73+

Mayor A C Wharton, Jr. & Jen Crozier - IBM Edge 2015 - theCUBE


 

>>Live from Las Vegas, Nevada. Extracting the signal from the noise. It's theCUBE, covering IBM Edge 2015, brought to you by IBM. >>Hello everyone. Welcome to theCUBE. I'm John Furrier. We are here in Las Vegas for a special presentation inside theCUBE, a special announcement. We have Mayor A.C. Wharton, mayor of Memphis, and Jen Crozier, a vice president at IBM. Welcome to theCUBE. So, Mayor, Memphis: a renowned city, great culture, and smarter cities is a big thing right now. So talk about why Memphis, why IBM, why are you here? What's the big announcement? What's happening in Memphis? >>Well, it's a great day for Memphis, in addition to the Grizzlies, had to slip that in there. We're one of just a handful of cities that are receiving what are known as IBM Smarter Cities Challenge grants. We pick a challenge, and they help us come up with a solution to it. And it's not some abstract idea. In our case, it's how do we weed out the non-emergency calls from the true emergency calls in our EMS service: over 120,000 calls a year, and about 25,000 of them are not truly emergency calls. So what that does is it takes valuable time and resources away from the true emergency calls, which should be attended to on a priority basis. >>So I know that you have a Twitter handle and you've got a lot of followers. Is the tech culture in Memphis emerging? Describe it for the folks: what's it like in Memphis from a tech perspective? Talk about the tech community. >>Even in my generation, I'm on there, just to do a little quality checking and a little analysis. We're also one of the three cities that will receive the Twitter grants, which will allow us to access that data and use it as we make decisions. So that's really going to be unique for Memphis. So yes, Memphis is up to date. >>Jen, I've got to ask you, because one of the things that's near and dear to our heart in theCUBE is technology for the advancement of better signal, not noise, whether that's society or education. The Twitter data, and we've talked about this, is the signal of the humans. And this notion of smarter cities is bringing technology to impact human lives, not just getting people an iWatch; there are some real benefits. Talk about the grants, talk about what IBM is doing, because this is real important stuff. I mean, Smarter Planet is a marketing slogan, but at the end of the day technology can help people. Talk about how that's part of the grant, and why Memphis, and what these guys are doing that's unique, that could be a great case study for others. >>When we started building a Smarter Planet, one of the things we had to think about was what were the acupressure points that would have the biggest ripple effects. And it's cities, right? More than half of the world's population lives in cities, and that's growing by a multiplier every day. And so that's where we wanted to start, and we've been really gratified. When we started the Smarter Cities Challenge, which is a pro bono program, the pitch was: give us your toughest problem, and we will send you a team of six IBM executives for three weeks to help you solve it for free. We've had over 600 mayors apply and we've delivered more than 115 teams, and now in Memphis.
I've got to ask the question about how you look at the governing process now. With mobile computing, you can hear everything. People are talking back in real time, and it might not be as organized; they're certainly tweeting all over the place, and kind of getting that data is really key. What's your vision? >>That's the key. We know Memphis; we know what information we have. But what in the world do you do with it? So what better partner than IBM? We know Memphis, but IBM knows the world. We're not the only ones who've faced this challenge. So with this team of experts, the IBM professionals who will be on the ground there, they will then say: here's what you have, here's the best way to use it. Here's what they did in Rome. Here's what they did in Berlin, London, New York or wherever. So the key is not how much information you have, but what in the world you can do with it, in real day-to-day solutions to those everyday problems. And let me point this out: this is much more than just technology. Take the process we're going to employ in Memphis, using nurses perhaps as dispatchers so that they can ask a few more questions when the call comes in, or perhaps helping us set up a system in which nurses will go to the homes of the individuals we call frequent flyers, who often call when it's not truly an emergency. Because life is on the line here, you really have to have the ability to analyze in real time and apply the right solution. >>And this is why IBM's expertise on a worldwide basis is so critical. We always talk about two aspects of real time. There's near real time, which is what people get today, and it's close enough; but when you're in a self-driving car, maybe, or an emergency situation, you want true real time. So that's really the key here. >>Yeah, that's the key: real-time information being employed in a real-life situation. And that's what an emergency call is. >>So I've got to get under the hood a little bit, because we like to go a little bit into the engine of the local environment. People today have their cell phones and think it's easy to call nine one one. It's not that easy. You have these old systems, the cell towers are connected to the municipal networks, and you've got a lot of volume of calls coming in. That's a challenge for the local technology team, and this new system is going to clear it up. So talk about how you guys go from this clog of traffic and calls to really segmenting the emergencies from the non-emergencies. >>Again, that's another critical point. We're confident this is going to work, and it will somewhat declog, if that's a word, unclog, because our experience even without the grant shows us that we could weed out so many of the other calls. They will not be coming in to your nine one one. So that's a big, big help right there. If we could weed out the 25,000 calls we had last year that were not truly emergency calls, you wouldn't speak in terms of a clogged nine one one system. >>I was talking to a friend who said, give me an example of some of these clogged networks, and I said, well, imagine your phone going off a million times a night with notifications, because we're in a notification economy and you have to kind of weed through that. So how are you guys using the data? What's the technology? Can you give some specifics on what's being implemented, the team, and how the local resources interact with IBM?
>>Well I think, you know, the mayor's called out this one source of data that he's getting, and mayors, we know, are getting multiple streams. So we have our Intelligent Operations Center that IBM uses to create dashboards for mayors to see real-time data about several different industries or sources or areas that are important to them. But I think that your point about the humans talking is a really critical one, and I want to come back to that, because it's easy for us to fixate on the technology. And I think one of the things we've seen in this program is the technology enabling city leaders to hear their constituents in new ways, what they're saying and what they're not saying, and also for them to communicate back with them and close the loop on feedback as policies and programs are enacted. >>And the thing about the presence of IBM is that it's kind of like a Good Housekeeping seal. It will open up Memphis to resources from other national groups. As a matter of fact, we're already using funds from another entity to set up our dashboards for performance in all areas, including the nine one one calls. So IBM is like this huge magnet. Once folks see, hey, IBM is in there, others will come in and say, we're going to help Memphis as it develops this system. >>So Mayor, I have to ask you a question. As automation and technology help abstract away a lot of the manual, clogged data and help you understand the signal from the noise, what's relevant, what's real time, you have a lot more contextual visibility into your environment and the people. How would you envision the future organization of the government and education and police, fire, et cetera, working together? What's the preferred future in your mind's eye as technology rolls out? >>The preferred future will be that when we come up with an innovation like this, it will be a non-event. It ought to be the order of the day. Government sometimes kind of lags behind; we want to get to the point where we're leading. Quite frankly, my vision is that this soon will become a non-event. It will become the order of the day. Our citizens will not be afraid of, oh, I'd better not call, I'm going to get a computer on the end of the line, or they've got some gadget down there that's just going to try to screen me out. It will be the order of the day. That's what we're working toward, and what we are emphasizing here is not what we are taking away but what we are bringing in. Additionally, with this technology we will actually be able to have a good diagnosis, a good case record, built on what we call the frequent flyers. We know the people who call every two weeks, but they will feel so much better when, two days before they usually call, a nurse shows up and says, I came to check on you. That's what's coming out of this. This will be the new norm. >>Because it works. That means happy people, happy customers, happy voters. >>Hey, you nailed it. >>Barack Obama put in place for the first time a data scientist in the White House, DJ Patil, a former entrepreneur and former venture capitalist. Data science is a big deal now. Are you guys seeing that role coming into local government as well? >>Yes, and it's so critical. In the private sector, if you come up with an item that's hurting the profit margin, you just shut it down. We can't do that in government. Every service we provide, we're locked into that.
I cannot say, well, the police department, we're not breaking even on that, let's just shut that down; or we won't run three shifts, we'll cut out that third shift. So we have a mandate. It's an imperative. What we're doing here is not an option; it is absolutely essential. >>So you're excited for the grant. What's next after the announcement? What will you guys be doing together? >>We've got 16 cities around the world that will be getting these teams, so it's time to schedule them and get started. >>And on the grant, how many mayors applied? What were the numbers again? >>Over the life of the program, over 600 mayors have applied for this. This year it was just over a hundred, and we are sending teams to 16 cities this year. >>Well, you guys should get that technology going, and get some more music pumping through the world; Memphis is a great place. I'd love to see the technology help its citizens. Thanks for sharing the great story. Congratulations, Mr. Mayor. Thanks for joining us on theCUBE. We're right back here in Las Vegas. You're watching theCUBE. I'm John Furrier. We'll be right back.

Published Date : May 11 2015

SUMMARY :

At IBM Edge 2015 in Las Vegas, John Furrier talks with Memphis Mayor A.C. Wharton, Jr. and IBM's Jen Crozier about Memphis receiving an IBM Smarter Cities Challenge grant. The grant sends a team of six IBM executives for three weeks, for free, to help Memphis weed the roughly 25,000 non-emergency calls out of the more than 120,000 calls a year coming into its EMS service. The plan draws on what cities like Rome, Berlin, London and New York have done, and includes using nurses as dispatchers and sending nurses to check on "frequent flyers" before they call. Over the life of the program, more than 600 mayors have applied and IBM has delivered more than 115 teams; this year, 16 cities are receiving them.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Berlin | LOCATION | 0.99+
IBM | ORGANIZATION | 0.99+
Jen Crozier | PERSON | 0.99+
Jen Crozer | PERSON | 0.99+
Memphis | LOCATION | 0.99+
25,000 calls | QUANTITY | 0.99+
three weeks | QUANTITY | 0.99+
London | LOCATION | 0.99+
Las Vegas | LOCATION | 0.99+
John | PERSON | 0.99+
Rome | LOCATION | 0.99+
New York | LOCATION | 0.99+
Jen | PERSON | 0.99+
Barack Obama | PERSON | 0.99+
DJ Patel | PERSON | 0.99+
AC Borden | PERSON | 0.99+
16 cities | QUANTITY | 0.99+
last year | DATE | 0.99+
this year | DATE | 0.99+
This year | DATE | 0.99+
Twitter | ORGANIZATION | 0.99+
third shift | QUANTITY | 0.99+
iWatch | COMMERCIAL_ITEM | 0.99+
120,000 | QUANTITY | 0.99+
Las Vegas, Nevada | LOCATION | 0.99+
2015 | DATE | 0.99+
one | QUANTITY | 0.99+
nine | QUANTITY | 0.98+
over 600 mayors | QUANTITY | 0.98+
six | QUANTITY | 0.98+
two aspects | QUANTITY | 0.98+
three cities | QUANTITY | 0.98+
over a hundred | QUANTITY | 0.98+
today | DATE | 0.98+
more than 115 teens | QUANTITY | 0.98+
about 25,000 | QUANTITY | 0.97+
Zinn | PERSON | 0.97+
Mayor | PERSON | 0.97+
first time | QUANTITY | 0.96+
John furrier | PERSON | 0.96+
three shifts | QUANTITY | 0.96+
one source | QUANTITY | 0.96+
A C Wharton, Jr. | PERSON | 0.91+
over 120,000 calls a year | QUANTITY | 0.9+
More than half of the world's population | QUANTITY | 0.87+
a million times a night | QUANTITY | 0.86+
IBM alliances | ORGANIZATION | 0.85+
mayor | PERSON | 0.84+
one calls | QUANTITY | 0.83+
double analysis | QUANTITY | 0.83+
two days | DATE | 0.81+
house | ORGANIZATION | 0.81+
nine one | QUANTITY | 0.79+
that week | DATE | 0.77+
two weeks | QUANTITY | 0.71+
one system | QUANTITY | 0.62+
IBM Edge | ORGANIZATION | 0.61+
edge | COMMERCIAL_ITEM | 0.6+

Larry Lancaster, Zebrium | Virtual Vertica BDC 2020


 

>> Announcer: It's theCUBE! Covering the Virtual Vertica Big Data Conference 2020, brought to you by Vertica. >> Hi, everybody. Welcome back. You're watching theCUBE's coverage of the Vertica Virtual Big Data Conference. It was, of course, going to be in Boston at the Encore Hotel, win big with big data at the new casino, but obviously Coronavirus has changed all that. Our hearts go out, and we have empathy for those people who are struggling. We are going to continue our wall-to-wall coverage of this conference, and we're here with Larry Lancaster, who's the founder and CTO of Zebrium. Larry, welcome to theCUBE. Thanks for coming on. >> Hi, thanks for having me. >> You're welcome. So first question: why did you start Zebrium? >> You know, I've been dealing with machine data a long time. So for those of you who don't know what that is, imagine the servers, or whatever goes on in a data center or in a SaaS shop. There's data coming out of those servers and out of those applications, and basically you can build a lot of cool stuff on that. There are a lot of metrics that come out, and there are a lot of log files that come out. And so I've basically spent my career building that sort of thing: tools on top of that, or products on top of that. The problem is that since log files, at least, are completely unstructured, you're always doing the same thing over and over again, which is going in and understanding the data and extracting the data and all that stuff. It's very time consuming. If you've done it like five times, you don't want to do it again. So really, my idea was that at this point, with where machine learning is at, there's got to be a better way. So Zebrium was founded on the notion that we can just do all that automatically. We can take a pile of machine data, we can turn it into a database, and we can build stuff on top of that. And so the company is really all about bringing that value to the market. >> That's cool. I want to get into that, just to better understand who you're disrupting and understand that opportunity better. But before I do, tell us a little bit about your background. You've got kind of an interesting background, a lot of tech jobs. Give us some color there. >> Yeah, so I started in the Valley, I guess, 20 years ago, and when my son was born I left grad school. I was in grad school over at Berkeley, in biophysics. And I realized I needed to go get a job, so I ended up starting in software and I've been there ever since. I guess I cut my teeth at NetApp, the storage company. And then I co-founded a business called Glassbeam, which was kind of an ETL database company. After that I ended up at Nimble Storage; another company, EMC, ended up buying Glassbeam, so I went over there, and then after Nimble, which is where I built the InfoSight platform, I was able to step back and take a year and a half and just go into my basement, actually, this is kind of my workspace here, and come up with the technology and actually build it, so that I could go raise money and get a team together to build Zebrium. So that's really my career in a nutshell. >> And you've got Hello Kitty over your right shoulder, which is kind of cool. >> That's right. >> And then up to the left you've got your monitor, right? >> Well, I had it. It's over here, yeah. >> But it was great! Pull it out, pull it out, let me see it. So, okay, so you got that. So what do you do? You just sit there and code all night or what?
Yeah, that's right. So Hello Kitty's over here. I have a daughter and she set up my workspace here on this side with Hello Kitty and so on. And over on this side I've got my recliner, where I basically lay it all the way back, and then I pivot this thing down over my face and put my keyboard on my lap, and I can just sit there for like 20 hours. It's great. Completely comfortable. >> That's cool. All right, better put that monitor back or our guys will yell at me. But so, obviously, we're talking to somebody with serious coding chops, and I'll also add that Nimble InfoSight, I think, was one of the best pickups that HP, HPE, has had in a while. And the thing that interested me about that, Larry, is the ability the company had to take that InfoSight and port it very quickly across its product lines. So that says to me it was a modern architecture, I'm sure APIs, microservices, and all those cool buzzwords, but the proof is in their ability to bring that IP to other parts of the portfolio. So, well done. >> Yeah, well thanks. I appreciate that. I mean, they've got a fantastic team there. And the other thing that helps is when you have the notion that you don't just build on top of the data: you extract the data, you structure it, you put that in a database, we used Vertica there for that, and then you build on top of that. Taking the time to build that layer is what lets you build a scalable platform. >> Yeah, so, why Vertica? I mean, Vertica's been around for a while. You remember you had the old RDBMSs, the Oracles, Db2s, SQL Server, and then the database was kind of a boring market. And then, all of a sudden, you had all of these MPP companies come out, a spate of them. They all got acquired, including Vertica. And they've all sort of disappeared and morphed into different brands, and Micro Focus has preserved the Vertica brand. But it seems like Vertica has been able to survive the transitions. Why Vertica? What was it about that platform that was unique and interested you? >> Well, I mean, they were the first ones to build what I would call a real column store that's kind of market capable, right? So there was the C-Store project, which Stonebraker was involved in, and that became sort of the seed from which Vertica was spawned. So you had this idea of, let's lay things out in a columnar way. And when I say columnar, I don't just mean that the data for every column is in a different set of files. What I mean by that is it takes full advantage of things like run-length encoding and other compression encodings, and block compression, and so you end up with these massive orders-of-magnitude savings in terms of the data that's being pulled off of storage, as well as when it's moving through the pipeline internally in Vertica's query processing. So why am I saying all this? Because it was a fundamentally disruptive technology. I think column stores are ubiquitous now in analytics. And I think you could name maybe a couple of projects, which are mostly open source, that do something like Vertica does, but name me another one that's actually capable of serving an enterprise as a relational database. I still think Vertica is unique in being that one. >> Well, it's interesting because you're a startup. And so a lot of startups would say, okay, we're going with a born-in-the-cloud database. Now Vertica touts that, well look, we've embraced cloud. We run in the cloud, we run on-prem, all different optionality.
And you hear a lot of vendors say that, but a lot of times they're just taking their stack and stuffing it into the cloud. So why didn't you go with a cloud-native database? I mean, obviously that's why you chose Vertica, but I'm interested, from a technologist's standpoint, as to why you made that choice given all these other choices out there. >> Right, I mean, again, as I explained, a column store, which I think is the appropriate definition, I'm not aware of another cloud-native-- >> Hm, okay. >> I'm aware of other cloud-native transactional databases; I'm not aware of one that has the analytics performance, and I've tried some of them. So it was not like I didn't look. What I was actually impressed with, and I think what let me move forward using Vertica in our stack, is the fact that Eon really is built from the ground up to be cloud-native. And so we've been using Eon almost ever since we started the work that we're doing. I've been really happy with the performance and with the reliability of Eon. >> It's interesting. I've been saying for years that Vertica's a diamond in the rough, and its previous owner didn't know what to do with it because it got distracted, and now Micro Focus seems to really see the value and is obviously putting some investments in there. >> Yeah. >> Tell me more about your business. Who are you disrupting? Are you kind of disrupting the do-it-yourselfers? Or is there sort of a big whale out there that you're going to go after? Add some color to that. >> Yeah, so our broader market is monitoring software; that's kind of the high-level category. So you have a lot of people in that market right now. Some of them are entrenched, large players; Datadog would be a great example. Some of them are smaller upstarts. It's a pretty saturated market. But what's happened over the last, I'd say, two years is that there's been sort of a push towards what's called observability, in terms of at least how some of the products are architected, like Honeycomb, and how some of them are messaged. Most of them are messaged that way these days. And what that really means is there's been an understanding that's developed that MTTR is really what people need to focus on to keep their customers happy. If you're a SaaS company, MTTR is going to be your bread and butter. And it's still measured in hours and days. And the biggest reason for that is what's called unknown unknowns. Because of complexity. Nowadays, applications are ten times as complex as they used to be. And what you end up with is a situation where, if something is a known issue with a known symptom and a known root cause, then you can set up an automation for it. But the ones that really cost a lot of time in terms of service disruption are the unknown unknowns. And now you've got to go dig into this massive mass of data. So observability is about making tools to help you do that, but it's still going to take you hours. And so our contention is: you need to automate the eyeball. The bottleneck is now the eyeball. And so you have to get away from this notion that a person is going to be able to do it infinitely more efficiently, and recognize that you need automated help. When you get an alert, it shouldn't be, "Hey, something weird's happening. Now go dig in." It should be, "Here's a root cause and a symptom." And that should be proposed to you by a system that actually does the observing. That actually does the watching.
And that's what Zebrium does. >> Yeah, that's awesome. I mean, you're right. The last thing you want is just another alert that says, "Go figure something out, because there's a problem." So how does it work, Larry, in terms of what you built there? Can you take us inside the covers? >> Yeah, sure. So right now there are two kinds of data that we're ingesting: there are metrics and there are log files. For metrics, there's actually a framework that's really popular in DevOps circles especially, but it's becoming popular everywhere, called Prometheus. It's a way of exporting metrics so that scrapers can collect them. And so if you go look at a typical stack, you'll find that most of the open source components, and many of the closed source components, are going to have exporters that export all their stats to Prometheus. So by supporting that stack we can bring in all of those metrics. And then there are also the log files. You've got host log files; in a containerized environment, you've got container logs; and you've got application-specific logs, perhaps living on a host mount. And you want to pull all those back, and you want to be able to associate them: this log that I've collected here is associated with the same container on the same host that this metric is associated with. But now what? Once you've got that, you've got a pile of unstructured logs. So what we do is take a look at those logs and say: let's structure those into tables, right? So where I used to have a log message: if I look in my log file and I see it says something like, "X happened five times," right? Well, that event type is going to occur again, and it'll say "X happened six times" or "X happened three times." If I see that as a human being, I can say, "Oh clearly, that's the same thing." And what's interesting here is the time that X happened and the number it reports. I may want to see, as a time series, when it happened and the values of that column. And so you can imagine it as a table. So now I have a table for that event type, and every time it happens, I get a row. And I have a column with that number in it. And so now I can do almost any kind of analytics I want, almost instantly. If I have all my event types structured that way, everything changes. You can do real anomaly detection and incident detection on top of that data. So that's really how we go about doing it; how we go about being able to do autonomous monitoring in a way that's effective. >> How do you handle doing that for, like, a bespoke app? Does somebody have to build a connector to those apps? How do you handle that? >> Yeah, that's a really good question. So you're right: if I go and install a typical log manager, there'll be connectors for different apps, and usually what that means is pulling in the stuff on the left, if you were to be looking at that log line. That will be things like a timestamp, or a severity, or a function name, or various other things. And so the connector will know how to pull those apart, and then the stuff to the right will be considered the message, and that'll get indexed for search. Our approach is that we actually go in with machine learning and structure that whole thing. So there's a table, and it's going to have a column called severity, and timestamp, and function name. And then it's going to have columns that correspond to the parameters that are in that event, and it'll have a name associated with the constant parts of that event.
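To make that structuring idea concrete, here is a small, hypothetical sketch. The table and column names (x_happened, ts, host, n) are assumptions for illustration; the interview doesn't give Zebrium's actual schema. The point is the kind of SQL analytics that opens up once an event type becomes a table:

    -- Hypothetical table derived from log lines like "X happened 5 times":
    --   x_happened(ts TIMESTAMP, host VARCHAR, n INT)
    -- Trend the embedded number hour by hour, as a time series.
    SELECT DATE_TRUNC('hour', ts) AS hour,
           AVG(n) AS avg_n,
           MAX(n) AS max_n
    FROM x_happened
    GROUP BY DATE_TRUNC('hour', ts)
    ORDER BY hour;

A spike in avg_n for one hour is exactly the sort of anomaly that is hard to see in raw text but easy to detect once the value is a column.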
And so you end up with a situation where you've structured all of it automatically, so we don't need collectors. It'll work just as well on your home-grown app that has no collectors, or no parsers to find, or anything. It'll work immediately, just as well as it would work on anything else. And that's important, because you can't be asking people for connectors to their own applications. It just becomes, now they've got to stop what they're doing and go write code for you, for your platform, and they have to maintain it. It's just untenable. So you can be up and running with our service in three minutes. It'll just be monitoring those for you. >> That's awesome! I mean, that is really a breakthrough innovation. So, nice. Love to see that hittin' the market. Who do you sell to? Both types of companies, and what role within the company? >> Well, definitely there are two main sorts of pushes that we've seen, or I should say pulls. One is from DevOps folks, SRE folks. So these are people who are tasked with monitoring an environment, basically. And then you've got people who are in engineering and they have a staging environment. And what they actually find valuable is... Because when we find an incident in a staging environment, yeah, half the time it's because they're tearing everything up and it's not release ready, whatever's in stage. That's fine, they know that. But the other half of the time it's new bugs, it's issues, and they're finding issues. So it's kind of diverged. You have engineering users, and they don't have titles like QA; they're Dev engineers or Dev managers that are really interested. And then you've got DevOps and SRE people there (mumbles). >> And how do I consume your product? Is it SaaS? I sign up and you say within three minutes I'm up and running, I'm paying by the drink? >> Well, (laughs) right. So there are a couple of ways. The easiest way is if you use Kubernetes. Kubernetes is what's called a container orchestrator. So these days, you know, Docker and containers and all that, and container orchestrators have become, I wouldn't say ubiquitous, but they're very popular now. So it's kind of on that inflection curve. I'm not exactly sure of the penetration, but I'm going to say 30-40% probably of the shops that we're interested in are using container orchestrators. So if you're using Kubernetes, basically you can install our Kubernetes chart, which basically means copying and pasting a URL and so on into your little admin panel there. And then it'll just start collecting all the logs and metrics, and then you just log in on the website. And the way you do that is just go to our website, and it'll show you how to sign up for the service, and you'll get your little API key and a link to the chart, and you're off and running. You don't have to do anything else. You can add rules, you can add stuff, but you don't have to. You shouldn't have to, right? You should never have to do any more work. >> That's great. So it's a SaaS capability and I just pay for... How do you price it? >> Oh, right. So it's priced on volume, data volume. I don't want to go too much into it because I'm not the pricing guy. But what I'll say is that, as far as I know, it's as cheap or cheaper than any other log manager or metrics product. It's in that same neighborhood as the very low-priced ones. Because right now, we're not trying to optimize for take. We're trying to make a healthy margin and get the value of autonomous monitoring out there. Right now, that's our priority.
>> And it's running in the cloud, is that right? AWS West-- >> Yeah, that's right. Oh, I should've also pointed out that you can have a free account: if it's less than some number of gigabytes a day, we're not going to charge. Yeah, so we run in AWS. We have a multi-tenant instance in AWS, and we have a Vertica Eon cluster behind that. And it's been working out really well. >> And on your freemium, have you used the Vertica Community Edition? Because they don't charge you for that, right? So is that how you do it, or... >> No, no. So, I don't want to go into that because I'm not the bizdev guy. But what I'll say is that if you're doing something that winds up being OEM-ish, you can work out the particulars with Vertica. It's not like you're going to just go pay retail, and they won't let you distinguish between test, and prod, and paid, and all that. They'll work with you. Just call 'em up. >> Yeah, and that's why I brought it up, because Vertica has a community edition, which is not neutered. It runs Eon; it's just that there are limits on clusters and storage. >> There's limits. >> But it's still fully functional, though. >> So to your point, we want it multi-tenant. So it's big just because it's multi-tenant. We have hundreds of users on that (audio cuts out). >> And then, what's your partnership with Vertica like? Can we close on that and just describe that a little bit? >> What's it like? I mean, it's pleasant. >> Yeah, I mean (mumbles). >> You know what, here's what's important. What's important is that I don't have to worry about that layer of our stack. When it comes to being able to get the performance I need, being able to get the economy of scale that I need, being able to get the absolute scale that I need, I've not been disappointed ever with Vertica. And frankly, being able to have ACID guarantees and everything else, like a normal, mature database that can join lots of tables and still be fast, that's also necessary at scale. And so I feel like it was definitely the right choice to start with. >> Yeah, it's interesting. I remember in the early days of big data a lot of people said, "Who's going to need these ACID properties and all this complexity of databases?" And of course, ACID properties and SQL became the killer features and functions of these databases. >> Who didn't see that one coming, right? >> Yeah, right. And then, so you guys have done a big seed round. You've raised a little over $6 million and you've got the product-market fit down. You're ready to rock, right? >> Yeah, that's right. So we're doing a launch; well, when this airs, it'll probably be the day before this airs. Basically, yeah. Literally in the last, I'd say, six to eight weeks, it's just been this sort of peak of interest. All of a sudden, everyone kind of gets what we're doing, realizes they need it, and we've got a solution that seems to meet expectations. So it's been an amazing... Let me just say this: it's been an amazing start to the year. I mean, at the same time, it's been really difficult for us, but more difficult for some other people who haven't been able to go to work over the last couple of weeks and so on. But it's been a good start to the year, at least for our business. So... >> Well, Larry, congratulations on getting the company off the ground, and thank you so much for coming on theCUBE and being part of the Virtual Vertica Big Data Conference. >> Thank you very much.
>> All right, and thank you everybody for watching. This is Dave Vellante for theCUBE. Keep it right there. We're covering wall-to-wall Virtual Vertica BDC. You're watching theCUBE. (upbeat music)

Published Date : Mar 31 2020

SUMMARY :

As part of theCUBE's coverage of the Virtual Vertica Big Data Conference, Dave Vellante talks with Larry Lancaster, founder and CTO of Zebrium. After building the InfoSight platform at Nimble Storage, Lancaster founded Zebrium on the idea that machine learning can automatically structure unstructured machine data. The service ingests Prometheus metrics and log files, structures each log event type into a database table, and does anomaly and incident detection on top, proposing a root cause and symptom instead of just another alert. It runs as a multi-tenant SaaS in AWS with a Vertica Eon cluster behind it, installs via a Kubernetes chart in about three minutes, offers a free tier, and is priced on data volume. Zebrium has raised a little over $6 million in seed funding and is launching now.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Larry Lancaster | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Larry | PERSON | 0.99+
Boston | LOCATION | 0.99+
five times | QUANTITY | 0.99+
three times | QUANTITY | 0.99+
six times | QUANTITY | 0.99+
EMC | ORGANIZATION | 0.99+
six | QUANTITY | 0.99+
Zebrium | ORGANIZATION | 0.99+
20 hours | QUANTITY | 0.99+
Glassbeam | ORGANIZATION | 0.99+
Nedap | ORGANIZATION | 0.99+
Vertica | ORGANIZATION | 0.99+
Nimble | ORGANIZATION | 0.99+
Nimble Storage | ORGANIZATION | 0.99+
HP | ORGANIZATION | 0.99+
HPE | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
a year and a half | QUANTITY | 0.99+
Micro Focus | ORGANIZATION | 0.99+
ten times | QUANTITY | 0.99+
two kinds | QUANTITY | 0.99+
two years | QUANTITY | 0.99+
three minutes | QUANTITY | 0.99+
first question | QUANTITY | 0.99+
eight weeks | QUANTITY | 0.98+
Stonebreaker | ORGANIZATION | 0.98+
Prometheus | TITLE | 0.98+
30-40% | QUANTITY | 0.98+
Eon | ORGANIZATION | 0.98+
hundred of users | QUANTITY | 0.98+
One | QUANTITY | 0.98+
Vertica Virtual Big Data Conference | EVENT | 0.98+
Kubernetes | TITLE | 0.97+
first fund | QUANTITY | 0.97+
Virtual Vertica Big Data Conference 2020 | EVENT | 0.97+
AWB West | ORGANIZATION | 0.97+
Virtual Vertica Big Data Conference | EVENT | 0.97+
Honeycomb | ORGANIZATION | 0.96+
SAS | ORGANIZATION | 0.96+
20 years ago | DATE | 0.96+
Both types | QUANTITY | 0.95+
theCUBE | ORGANIZATION | 0.95+
Datadog | ORGANIZATION | 0.95+
two main | QUANTITY | 0.94+
over $6 million dollars | QUANTITY | 0.93+
Hello Kitty | ORGANIZATION | 0.93+
SQL | TITLE | 0.93+
Zebrium | PERSON | 0.91+
Spoke | TITLE | 0.89+
Encore Hotel | LOCATION | 0.88+
InfoSight | ORGANIZATION | 0.88+
Coronavirus | OTHER | 0.88+
one | QUANTITY | 0.86+
less | QUANTITY | 0.85+
Oracles | ORGANIZATION | 0.85+
2020 | DATE | 0.85+
CTO | PERSON | 0.84+
Vertica | TITLE | 0.82+
Nimble InfoSight | ORGANIZATION | 0.81+

A Technical Overview of Vertica Architecture


 

>> Paige: Hello, everybody and thank you for joining us today on the Virtual Vertica BDC 2020. Today's breakout session is entitled A Technical Overview of the Vertica Architecture. I'm Paige Roberts, Open Source Relations Manager at Vertica and I'll be your host for this webinar. Now joining me is Ryan Role-kuh? Did I say that right? (laughs) He's a Vertica Senior Software Engineer. >> Ryan: So it's Roelke. (laughs) >> Paige: Roelke, okay, I got it, all right. Ryan Roelke. And before we begin, I want to be sure and encourage you guys to submit your questions or your comments during the virtual session while Ryan is talking as you think of them as you go along. You don't have to wait to the end, just type in your question or your comment in the question box below the slides and click submit. There'll be a Q and A at the end of the presentation and we'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to get back to you offline. Now, alternatively, you can visit the Vertica forums to post your question there after the session as well. Our engineering team is planning to join the forums to keep the conversation going, so you can have a chat afterwards with the engineer, just like any other conference. Now also, you can maximize your screen by clicking the double arrow button in the lower right corner of the slides and before you ask, yes, this virtual session is being recorded and it will be available to view on demand this week. We'll send you a notification as soon as it's ready. Now, let's get started. Over to you, Ryan. >> Ryan: Thanks, Paige. Good afternoon, everybody. My name is Ryan and I'm a Senior Software Engineer on Vertica's Development Team. I primarily work on improving Vertica's query execution engine, so usually in the space of making things faster. Today, I'm here to talk about something that's more general than that, so we're going to go through a technical overview of the Vertica architecture. So the intent of this talk, essentially, is to just explain some of the basic aspects of how Vertica works and what makes it such a great database software and to explain what makes a query execute so fast in Vertica, we'll provide some background to explain why other databases don't keep up. And we'll use that as a starting point to discuss an academic database that paved the way for Vertica. And then we'll explain how Vertica design builds upon that academic database to be the great software that it is today. I want to start by sharing somebody's approximation of an internet minute at some point in 2019. All of the data on this slide is generated by thousands or even millions of users and that's a huge amount of activity. Most of the applications depicted here are backed by one or more databases. Most of this activity will eventually result in changes to those databases. For the most part, we can categorize the way these databases are used into one of two paradigms. First up, we have online transaction processing or OLTP. OLTP workloads usually operate on single entries in a database, so an update to a retail inventory or a change in a bank account balance are both great examples of OLTP operations. Updates to these data sets must be visible immediately and there could be many transactions occurring concurrently from many different users. OLTP queries are usually key value queries. The key uniquely identifies the single entry in a database for reading or writing. 
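As a hedged illustration of that kind of key-value lookup (the table, column, and key value here are assumptions for illustration, not from the slides):

    -- OLTP point lookup: the key identifies exactly one row.
    SELECT balance
    FROM accounts
    WHERE account_id = 12345;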
Early databases and applications were probably designed for OLTP workloads. This example on the slide is typical of an OLTP workload. We have a table, accounts, such as for a bank, which tracks information for each of the bank's clients. An update query, like the one depicted here, might be run whenever a user deposits $10 into their bank account. Our second category is online analytical processing or OLAP which is more about using your data for decision making. If you have a hardware device which periodically records how it's doing, you could analyze trends of all your devices over time to observe what data patterns are likely to lead to failure or if you're Google, you might log user search activity to identify which links helped your users find the answer. Analytical processing has always been around but with the advent of the internet, it happened at scales that were unimaginable, even just 20 years ago. This SQL example is something you might see in an OLAP workload. We have a table, searches, logging user activity. We will eventually see one row in this table for each query submitted by users. If we want to find out what time of day our users are most active, then we could write a query like this one on the slide which counts the number of unique users running searches for each hour of the day. So now let's rewind to 2005. We don't have a picture of an internet minute in 2005, we don't have the data for that. We also don't have the data for a lot of other things. The term Big Data is not quite yet on anyone's radar and The Cloud is also not quite there or it's just starting to be. So if you have a database serving your application, it's probably optimized for OLTP workloads. OLAP workloads just aren't mainstream yet and database engineers probably don't have them in mind. So let's innovate. It's still 2005 and we want to try something new with our database. Let's take a look at what happens when we do run an analytic workload in 2005. Let's use as a motivating example a table of stock prices over time. In our table, the symbol column identifies the stock that was traded, the price column identifies the new price and the timestamp column indicates when the price changed. We have several other columns which, we should know that they're there, but we're not going to use them in any example queries. This table is designed for analytic queries. We're probably not going to make any updates or look at individual rows since we're logging historical data and want to analyze changes in stock price over time. Our database system is built to serve OLTP use cases, so it's probably going to store the table on disk in a single file like this one. Notice that each row contains all of the columns of our data in row major order. There's probably an index somewhere in the memory of the system which will help us to point lookups. Maybe our system expects that we will use the stock symbol and the trade time as lookup keys. So an index will provide quick lookups for those columns to the position of the whole row in the file. If we did have an update to a single row, then this representation would work great. We would seek to the row that we're interested in, finding it would probably be very fast using the in-memory index. And then we would update the file in place with our new value. On the other hand, if we ran an analytic query like we want to, the data access pattern is very different. The index is not helpful because we're looking up a whole range of rows, not just a single row. 
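The slide queries themselves aren't reproduced in this transcript, so here are plausible renderings with assumed column names: the OLTP deposit, the OLAP hour-of-day aggregate, and the analytic stocks query whose access pattern is under discussion:

    -- OLTP: update a single row in place.
    UPDATE accounts
    SET balance = balance + 10
    WHERE account_id = 12345;

    -- OLAP: what time of day are users most active?
    SELECT EXTRACT(HOUR FROM query_time) AS hour_of_day,
           COUNT(DISTINCT user_id)       AS unique_users
    FROM searches
    GROUP BY EXTRACT(HOUR FROM query_time)
    ORDER BY hour_of_day;

    -- Analytic stocks query: a whole range of rows, not a point lookup.
    SELECT AVG(price)
    FROM stocks
    WHERE symbol = 'AAPL'
      AND ts BETWEEN '2005-01-01' AND '2005-06-30';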
As a result, the only way to find the rows that we actually need for this query is to scan the entire file. We're going to end up scanning a lot of data that we don't need, and that won't just be the rows that we don't need; there are many other columns in this table, much information about who made the transaction, and we'll also be scanning through those columns for every single row in this table. That could be a very serious problem once we consider the scale of this file. Stocks change a lot, we probably have thousands or millions or maybe even billions of rows that are going to be stored in this file, and we're going to scan all of these extra columns for every single row. If we tried out our stocks use case behind the desk of a Fortune 500 company, then we're probably going to be pretty disappointed. Our queries will eventually finish, but it might take so long that we don't even care about the answer anymore by the time that they do. Our database is not built for the task we want to use it for. Around the same time, a team of researchers in the North East had become aware of this problem and they decided to dedicate their time and research to it. These researchers weren't just anybody. The fruits of their labor, which we now like to call the C-Store Paper, were published by eventual Turing Award winner, Mike Stonebraker, along with several other researchers from elite universities. This paper presents the design of a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. That sounds exactly like what we want for our stocks use case. Reasoning about what makes our query executions so slow brought our researchers to the Memory Hierarchy, which essentially is a visualization of the relative speeds of different parts of a computer. At the top of the hierarchy, we have the fastest data units, which are, of course, also the most expensive to produce. As we move down the hierarchy, components get slower but also much cheaper and thus you can have more of them. Our OLTP database's data is stored in a file on the hard disk. We scanned the entirety of this file, even though we didn't need most of the data, and now it turns out that is just about the slowest thing that our query could possibly be doing, by over two orders of magnitude. It should be clear, based on that, that the best thing we can do to optimize our query's execution is to avoid reading unnecessary data from the disk, and that's what the C-Store researchers decided to look at. The key innovation of the C-Store paper does exactly that. Instead of storing data in row major order, in a large file on disk, they transposed the data and stored each column in its own file. Now, if we run the same select query, we read only the relevant columns. The unnamed columns don't factor into the table scan at all since we don't even open the files. Zooming out to an internet-scale data set, we can appreciate the savings here a lot more. But we still have to read a lot of data that we don't need to answer this particular query. Remember, we had two predicates, one on the symbol column and one on the timestamp column. Our query is only interested in AAPL stock, but we're still reading rows for all of the other stocks. So what can we do to optimize our disk read even more? Let's first partition our data set into different files based on the timestamp date. This means that we will keep separate files for each date.
When we query the stocks table, the database knows all of the files we have to open. If we have a simple predicate on the timestamp column, as our sample query does, then the database can use it to figure out which files we don't have to look at at all. So now all of the disk reads that we have to do to answer our query will produce rows that pass the timestamp predicate. This eliminates a lot of wasteful disk reads. But not all of them. We do have another predicate on the symbol column, where symbol equals AAPL. We'd like to avoid disk reads of rows that don't satisfy that predicate either. And we can avoid those disk reads by clustering all the rows that match the symbol predicate together. If all of the AAPL rows are adjacent, then as soon as we see something different, we can stop reading the file. We won't see any more rows that can pass the predicate. Then we can use the positions of the rows we did find to identify which pieces of the other columns we need to read. One technique that we can use to cluster the rows is sorting. So we'll use the symbol column as a sort key for all of the columns. And that way we can reconstruct a whole row by seeking to the same row position in each file. It turns out, having sorted all of the rows, we can do a bit more. We don't have any more wasted disk reads, but we can still be more efficient with how we're using the disk. We've clustered all of the rows with the same symbol together, so we don't really need to bother repeating the symbol so many times in the same file. Let's just write the value once and say how many rows we have. This run-length encoding technique can compress large numbers of rows into a small amount of space. In this example, we de-duplicate just a few rows, but you can imagine de-duplicating many thousands of rows instead. This encoding is great for reducing the amount of disk we need to read at query time, but it also has the additional benefit of reducing the total size of our stored data. Now our query requires substantially fewer disk reads than it did when we started. Let's recap what the C-Store paper did to achieve that. First, we transposed our data to store each column in its own file. Now, queries only have to read the columns used in the query. Second, we partitioned the data into multiple file sets so that all rows in a file have the same value for the partition column. Now, a predicate on the partition column can skip non-matching file sets entirely. Third, we selected a column of our data to use as a sort key. Now rows with the same value for that column are clustered together, which allows our query to stop reading data once it finds non-matching rows. Finally, sorting the data this way enables high compression ratios, using run-length encoding, which minimizes the size of the data stored on the disk. The C-Store system combined each of these innovative ideas to produce an academically significant result. And if you used it behind the desk of a Fortune 500 company in 2005, you probably would've been pretty pleased. But it's not 2005 anymore, and the requirements of a modern database system are much stricter. So let's take a look at how C-Store fares in 2020. First of all, we have designed the storage layer of our database to optimize a single query in a single application. Our design optimizes the heck out of that query, and probably some similar ones, but if we want to do anything else with our data, we might be in a bit of trouble. What if we just decide we want to ask a different question?
For example, in our stock example, what if we want to plot all the trades made by a single user over a large window of time? How do our optimizations for the previous query measure up here? Well, our data's partitioned on the trade date; that could still be useful, depending on our new query. If we want to look at a trader's activity over a long period of time, we would have to open a lot of files. But if we're still interested in just a day's worth of data, then this optimization is still an optimization. Within each file, our data is ordered on the stock symbol. That's probably not too useful anymore; the rows for a single trader aren't going to be clustered together, so we will have to scan all of the rows in order to figure out which ones match. You could imagine a worse design, but as it becomes crucial to optimize this new type of query, we might have to go as far as reconfiguring the whole database. The next problem is one of scale. One server is probably not good enough to serve a database in 2020. C-Store, as described, runs on a single server and stores lots of files. What if the data overwhelms this small system? We could imagine exhausting the file system's inode limit with lots of small files due to our partitioning scheme. Or we could imagine something simpler, just filling up the disk with huge volumes of data. But there's an even simpler problem than that. What if something goes wrong and C-Store crashes? Then our data is no longer available to us until the single server is brought back up. A third concern, another one of scalability, is that one deployment does not really suit all possible use cases we could imagine. We haven't really said anything about being flexible. A contemporary database system has to integrate with many other applications, which might themselves have pretty restricted deployment options. Or the demands imposed by our workloads have changed, and the setup you had before doesn't suit what you need now. C-Store doesn't do anything to address these concerns. What the C-Store paper did do was lead very quickly to the founding of Vertica. Vertica's architecture and design are essentially all about bringing the C-Store designs into an enterprise software system. The C-Store paper was just an academic exercise, so it didn't really need to address any of the hard problems that we just talked about. But Vertica, the first commercial database built upon the ideas of the C-Store paper, would definitely have to. This brings us back to the present, to look at how an analytic query runs in 2020 on the Vertica Analytic Database. Vertica takes the key idea from the paper (can we significantly improve query performance by changing the way our data is stored?) and gives its users the tools to customize their storage layer in order to heavily optimize really important or commonly run queries. On top of that, Vertica is a distributed system, which allows it to scale up to internet-sized data sets, as well as have better reliability and uptime. We'll now take a brief look at what Vertica does to address the three inadequacies of the C-Store system that we mentioned. To avoid locking into a single database design, Vertica provides tools for the database user to customize the way their data is stored. To address the shortcomings of a single node system, Vertica coordinates processing among multiple nodes.
The next problem is one of scale. One server is probably not good enough to serve a database in 2020. C-Store, as described, runs on a single server and stores lots of files. What if the data overwhelms this small system? We could imagine exhausting the file system's inode limit with lots of small files due to our partitioning scheme. Or we could imagine something simpler: just filling up the disk with huge volumes of data. But there's an even simpler problem than that. What if something goes wrong and C-Store crashes? Then our data is no longer available to us until the single server is brought back up. A third concern is one of flexibility: one deployment does not really suit all of the use cases we could imagine. We haven't really said anything about being flexible. A contemporary database system has to integrate with many other applications, which might themselves have pretty restricted deployment options. Or the demands imposed by your workloads change, and the setup you had before doesn't suit what you need now. C-Store doesn't do anything to address these concerns. What the C-Store paper did do was lead very quickly to the founding of Vertica. Vertica's architecture and design are essentially all about bringing the C-Store designs into an enterprise software system. The C-Store paper was just an academic exercise, so it didn't really need to address any of the hard problems that we just talked about. But Vertica, the first commercial database built upon the ideas of the C-Store paper, would definitely have to. This brings us back to the present, to look at how an analytic query runs in 2020 on the Vertica Analytic Database. Vertica takes the key idea from the paper, that we can significantly improve query performance by changing the way our data is stored, and gives its users the tools to customize their storage layer in order to heavily optimize really important or commonly run queries. On top of that, Vertica is a distributed system, which allows it to scale up to internet-sized data sets, as well as have better reliability and uptime. We'll now take a brief look at what Vertica does to address the three inadequacies of the C-Store system that we mentioned. To avoid locking into a single database design, Vertica provides tools for the database user to customize the way their data is stored. To address the shortcomings of a single-node system, Vertica coordinates processing among multiple nodes. To acknowledge the large variety of desirable deployments, Vertica does not require any specialized hardware and has many features which smoothly integrate it with a Cloud computing environment. First, we'll look at the database design problem. We're a SQL database, so our users are writing SQL and describing their data the SQL way, with the Create Table statement. Create Table is a logical description of what your data looks like, but it doesn't specify the way that it has to be stored. For a single Create Table, we could imagine a lot of different storage layouts. Vertica adds some extensions to SQL so that users can go even further than Create Table and describe the way that they want the data to be stored. Using terminology from the C-Store paper, we provide the Create Projection statement. Create Projection specifies how table data should be laid out, including column encoding and sort order. A table can have multiple projections, each of which could be ordered on different columns. When you query a table, Vertica will answer the query using the projection which it determines to be the best match. Referring back to our stock example, here's a sample Create Table and Create Projection statement. Let's focus on our heavily optimized example query, which had predicates on the stock symbol and date. We specify that the table data is to be partitioned by date. The Create Projection statement here is excellent for this query. We specify, using the order by clause, that the data should be ordered according to our predicates, and we'll use the timestamp as a secondary sort key. Each projection stores a copy of the table data. If you don't expect to need a particular column in a projection, then you can leave it out. Our average price query didn't care about who did the trading, so maybe our projection design for this query can leave the trader column out entirely. If the question we want to ask ever does change, maybe we already have a suitable projection, but if we don't, then we can create another one, for example a projection which would be much better at identifying trends of traders, rather than identifying trends for a particular stock.
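Here's a rough sketch of what those statements could look like in Vertica-style SQL. The schema, encodings and projection names are illustrative, not the actual slide content.

CREATE TABLE trades (
    symbol   VARCHAR(16),
    trade_ts TIMESTAMP,
    price    NUMERIC(18,4),
    trader   VARCHAR(32)
)
PARTITION BY trade_ts::DATE;

-- Optimized for the average-price query: ordered on the predicate columns,
-- with run-length encoding on the sort key; the trader column is left out.
CREATE PROJECTION trades_by_symbol (
    symbol ENCODING RLE,
    trade_ts,
    price
) AS
SELECT symbol, trade_ts, price
FROM trades
ORDER BY symbol, trade_ts;

-- A second projection that clusters each trader's activity together instead.
CREATE PROJECTION trades_by_trader (
    trader ENCODING RLE,
    trade_ts,
    symbol,
    price
) AS
SELECT trader, trade_ts, symbol, price
FROM trades
ORDER BY trader, trade_ts;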
So how should you decide what design is best for your queries? Well, you could spend a lot of time figuring it out on your own, or you could use Vertica's Database Designer tool, which will help you by automatically analyzing your queries and spitting out a design which it thinks is going to work really well. If you want to learn more about the Database Designer tool, then you should attend the session Vertica Database Designer: Today and Tomorrow, which will tell you a lot about what the Database Designer does and some recent improvements that we have made. Okay, now we'll move to our next problem. (laughs) The challenge that one server does not fit all. In 2020, we have several orders of magnitude more data than we had in 2005, and you need a lot more hardware to crunch it. It's not tractable to keep multiple petabytes of data in a system with a single server. So Vertica doesn't try. Vertica is a distributed system, so we'll deploy multiple servers which work together to maintain such a high data volume. In a traditional Vertica deployment, each node keeps some of the data in its own locally-attached storage. Data is replicated so that there is a redundant copy somewhere else in the system. If any one node goes down, then the data that it served is still available on a different node. We'll also set up the system so that there's no special node with extra duties. All nodes are created equal. This ensures that there is no single point of failure. Rather than replicate all of your data, Vertica divvies it up amongst all of the nodes in your system. We call this segmentation. The way data is segmented is another parameter of storage customization, and it can definitely have an impact upon query performance. A common way to segment data is by using a hash expression, which essentially randomizes the node that a row of data belongs to, but with a guarantee that the same data will always end up in the same place. Describing the way data is segmented is another part of the Create Projection statement, as seen in the example sketched below. Here we segment on the hash of the symbol column, so all rows with the same symbol will end up on the same node. For each row that we load into the system, we'll apply our segmentation expression. The result determines which segment the row belongs to, and then we'll send the row to each node which holds a copy of that segment. In this example, our projection is marked KSAFE 1, so we will keep one redundant copy of each segment. When we load a row, we might find that its segment has copies on Node One and Node Three, so we'll send a copy of the row to each of those nodes. If Node One is temporarily disconnected from the network, then Node Three can serve the other copy of the segment so that the whole system remains available.
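Extending the same illustrative sketch, segmentation and the K-safety level might be declared like this.

-- Rows hash on symbol, so one symbol's rows land on one node; KSAFE 1
-- keeps a redundant copy of each segment on another node.
CREATE PROJECTION trades_segmented (
    symbol ENCODING RLE,
    trade_ts,
    price
) AS
SELECT symbol, trade_ts, price
FROM trades
ORDER BY symbol, trade_ts
SEGMENTED BY HASH(symbol) ALL NODES KSAFE 1;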
The last challenge we brought up from the C-Store design was that one deployment does not fit all. Vertica's cluster design neatly addresses many of our concerns here. Our use of segmentation to distribute data means that a Vertica system can scale to any size of deployment. And since we lack any special hardware or nodes with special purposes, Vertica servers can run anywhere, on premise or in the Cloud. But let's suppose you need to scale out your cluster to rise to the demands of a higher workload. Suppose you want to add another node. This changes the division of the segmentation space. We'll have to re-segment every row in the database to find its new home, and then we'll have to move around any data that belongs to a different segment. This is a very expensive operation, not something you want to be doing all that often. Traditional Vertica doesn't solve that problem especially well, but Vertica Eon Mode definitely does. Vertica's Eon Mode is a large set of features which are designed with a Cloud computing environment in mind. One feature of this design is elastic throughput scaling, which is the idea that you can smoothly change your cluster size without having to pay the expense of shuffling your entire database. Vertica Eon Mode had an entire session dedicated to it this morning. I won't say any more about it here, but maybe you already attended that session, or if you haven't, then I definitely encourage you to listen to the recording. If you'd like to learn more about the Vertica architecture, then you'll find on this slide links to several of the academic conference publications: these four papers here, as well as the Vertica Seven Years Later paper, which describes some of the Vertica designs seven years after the founding, and also a paper about the innovations of Eon Mode. And of course, the Vertica documentation is an excellent resource for learning more about what's going on in a Vertica system. I hope you enjoyed learning about the Vertica architecture. I would be very happy to take all of your questions now. Thank you for attending this session.

Vertica Big Data Conference Keynote


 

>> Joy: Welcome to the Virtual Big Data Conference. Vertica is so excited to host this event. I'm Joy King, and I'll be your host for today's Big Data Conference Keynote Session. It's my honor and my genuine pleasure to lead Vertica's product and go-to-market strategy. And I'm so lucky to have a passionate and committed team who turned our Vertica BDC event into a virtual event in a very short amount of time. I want to thank the thousands of people, and yes, that's our true number, who have registered to attend this virtual event. We were determined to balance your health, safety and your peace of mind with the excitement of the Vertica BDC. This is a very unique event, because, as I hope you all know, we focus on engineering and architecture, best practice sharing and customer stories that will educate and inspire everyone. I also want to thank our top sponsors for the virtual BDC, Arrow and Pure Storage. Our partnerships are so important to us and to everyone in the audience, because together, we get things done faster and better. Now for today's keynote, you'll hear from three very important and energizing speakers. First, Colin Mahony, our SVP and General Manager for Vertica, will talk about the market trends that Vertica is betting on to win for our customers. And he'll share the exciting news about our Vertica 10 announcement and how this will benefit our customers. Then you'll hear from Amy Fowler, VP of Strategy and Solutions for FlashBlade at Pure Storage. Our partnership with Pure Storage is truly unique in the industry, because together, modern infrastructure from Pure powers modern analytics from Vertica. And then you'll hear from John Yovanovich, Director of IT at AT&T, who will tell you about the Pure Vertica Symphony that plays live every day at AT&T. Here we go, Colin, over to you. >> Colin: Well, thanks a lot, Joy. And I want to echo Joy's thanks to our sponsors, and to so many of you who have helped make this happen. This is not an easy time for anyone. We were certainly looking forward to getting together in person in Boston during the Vertica Big Data Conference and Winning with Data. But I think all of you and our team have done a great job scrambling and putting together a terrific virtual event. So I really appreciate your time. I also want to remind people that we will make both the slides and the full recording available after this. So for any of those who weren't able to join live, that will still be available. Well, things have been pretty exciting here. In the analytics space in general, and certainly for Vertica, there's a lot happening. There are a lot of problems to solve, a lot of opportunities to make things better, and a lot of data that can really make every business stronger, more efficient and, frankly, more differentiated. For Vertica, though, we know that focusing on the challenges that we can directly address with our platform and our people, where we can actually make the biggest difference, is where we ought to be putting our energy and our resources. I think one of the things that has made Vertica so strong over the years is our ability to focus on those areas where we can make a great difference. So for us, as we look at the market and where we play, there are really three market trends, some recent and some not so recent but certainly picking up, that have become critical for every industry that wants to Win Big With Data. We've heard this loud and clear from our customers and from the analysts that cover the market.
If I were to summarize these three areas, this really is the core focus for us right now. We know that there's massive data growth, and if we can unify the data silos so that people can really take advantage of that data, we can make a huge difference. We know that public clouds offer tremendous advantages, but we also know that balance and flexibility are critical. And we all need the benefits that machine learning, and everything up through full data science, can bring to every single use case, but only if it can really be operationalized at scale, accurately and in real time. And the power of Vertica is, of course, how we're able to bring so many of these things together. Let me talk a little bit more about some of these trends. So one of the first industry trends that we've all been following, probably now for over the last decade, is Hadoop, and specifically HDFS. So many companies have invested time, money and, more importantly, people in leveraging the opportunity that HDFS brought to the market. HDFS is really part of a much broader storage disruption that we'll talk a little bit more about, more broadly than HDFS. But HDFS itself was really designed for petabytes of data, leveraging low-cost commodity hardware and the ability to capture a wide variety of data formats from a wide variety of data sources and applications. And I think what people really wanted was to store that data before having to define exactly what structures it should go into. So over the last decade or so, the focus for most organizations has been figuring out how to capture, store and, frankly, manage that data. And as a platform to do that, I think Hadoop was pretty good. It certainly changed the way that a lot of enterprises think about their data and where it's locked up. In parallel with Hadoop, particularly over the last five years, Cloud Object Storage has also given every organization another option for collecting, storing and managing even more data. That has led to a huge growth in data storage, obviously, up on public clouds like Amazon with their S3, Google Cloud Storage and Azure Blob Storage, just to name a few. And when you consider regional and local object storage offered by cloud vendors all over the world, the explosion of data leveraging this type of object storage is very real. And, as I mentioned, it's just part of this broader storage disruption that's been going on. But with all this growth in the data, and all these new places to put it, every organization we talk to is facing even more challenges now around the data silos. Sure, the data silos are certainly getting bigger, and hopefully they're getting cheaper per bit. But as I said, the focus has really been on collecting, storing and managing the data. Between the new data lakes and the many different cloud object stores, combined with all sorts of data types and the complexity of managing all this, the business value people have gotten out has been very limited. This actually takes me to big bet number one for Team Vertica, which is to unify the data. Our goal, backed by some of the announcements we have made today plus roadmap announcements I'll share with you throughout this presentation, is to ensure that all the time, money and effort that has gone into storing that data turns into business value. So how are we going to do that?
With a unified analytics platform that analyzes the data wherever it is: HDFS, Cloud Object Storage, External tables in any format, ORC, Parquet, JSON and, of course, our own native ROS Vertica format. Analyze the data in the right place, in the right format, using a single unified tool. This is something that Vertica has always been committed to, and as you'll see in some of our announcements today, we're just doubling down on that commitment. Let's talk a little bit more about the public cloud. This is certainly the second trend, maybe the second wave of data disruption, along with object storage. And there are a lot of advantages when it comes to public cloud. There's no question that the public clouds give rapid access to compute and storage, with the added benefit of eliminating the data center maintenance that so many companies want to get out of themselves. But maybe the biggest advantage that I see is the architectural innovation. The public clouds have introduced so many methodologies around how to provision quickly, separating compute and storage and really dialing in the exact needs on demand as you change workloads. When public clouds began, it made a lot of sense for the cloud providers and their customers to charge and pay for compute and storage in the ratio that each use case demanded. And I think you're seeing that trend proliferate all over the place, not just up in the public cloud. That architecture itself is really becoming the next generation architecture for on-premise data centers as well. But there are a lot of concerns. I think we're all aware of them. Many times, for different workloads, there are higher costs, especially for some of the workloads that are being run through analytics, which tend to run all the time. Just like some of the silo challenges that companies are facing with HDFS, data lakes and cloud storage, the public clouds have similar types of siloed challenges as well. Initially, there was a belief that they were cheaper than data centers, and when you added in all the costs, it looked that way. And again, for certain elastic workloads, that is the case. I don't think that's true across the board overall, even to the point where a lot of the cloud vendors aren't just charging lower costs anymore. We hear from a lot of customers that they don't really want to tether themselves to any one cloud because of some of those uncertainties. Of course, security and privacy are a concern. We hear a lot of concerns with regard to the cloud, and even some SaaS vendors, around shared data catalogs across all the customers and not enough separation. But security concerns are out there; you can read about them. I'm not going to jump on that bandwagon. But we hear about them. And then, of course, I think one of the things we hear the most from our customers is that each cloud stack is starting to feel a lot more locked in than the traditional data warehouse appliance. And as everybody knows, the industry has been running away from appliances as fast as it can. And so they're not eager to get locked into another, quote, unquote, virtual appliance, if you will, up in the cloud. They really want to make sure they have flexibility in which clouds they're going to today, tomorrow and in the future. And frankly, we hear from a lot of our customers that they're very interested in eventually mixing and matching compute from one cloud with, say, storage from another cloud, which I think is something that we'll hear a lot more about.
And so for us, that's why we've got our big bet number two. We love the cloud. We love the public cloud. We love the private clouds, on-premise and other hosting providers. But our passion and commitment is for Vertica to be able to run in any of the clouds that our customers choose, and to make it portable across those clouds. We have supported on-premises and all public clouds for years. And today, we have announced even more support for Vertica in Eon Mode, the deployment option that leverages the separation of compute from storage, with even more deployment choices, which I'm going to also touch more on as we go. So super excited about our big bet number two. And finally, as I mentioned, for all the hype that there is around machine learning, I actually think that, most importantly, this third trend that Team Vertica is determined to address is the need to bring business critical analytics, machine learning and data science projects into production. For so many years, there just wasn't enough data available to justify the investment in machine learning. Also, processing power was expensive, and storage was prohibitively expensive. To train and score and evaluate all the different models to unlock the full power of predictive analytics was tough. Today, you have those massive data volumes. You have the relatively cheap processing power and storage to make that dream a reality. And if you think about this, I mean, with all the data that's available to every company, the real need is to operationalize the speed and the scale of machine learning so that these organizations can actually take advantage of it where they need to. I mean, we've seen this for years with Vertica, going back to some of the most advanced gaming companies in the early days; they were incorporating this with live data directly into their gaming experiences. Well, every organization wants to do that now. And accuracy, predictability and real-time actions are all key to separating the leaders from the rest of the pack in every industry when it comes to machine learning. But if you look at a lot of these projects, the reality is that there's a ton of buzz, there's a ton of hype spanning every acronym that you can imagine, but most companies are struggling, due to separate teams, different tools, silos and the limitations that many platforms are facing, driving down-sampling to get a small subset of the data to try to create a model that then doesn't apply, or compromising accuracy and making it virtually impossible to replicate models and understand decisions. And if there's one thing that we've learned when it comes to data, it's the value of prescriptive data at the atomic level, being able to show "N of 1," as we refer to it, meaning individually tailored data. No matter what it is, healthcare, entertainment experiences like gaming, or others, being able to get at the granular data and make these decisions, to do that scoring, applies to machine learning just as much as it applies to giving somebody a next-best-offer. And the opportunity has never been greater. The need is to integrate this end-to-end workflow and support the right tools without compromising on that accuracy. Think about it as no down-sampling, using all the data; that really is key to machine learning success. It should be no surprise, then, that the third big bet from Vertica is one that we've actually been working on for years. And we're so proud to be where we are today, helping the data disruptors across the world operationalize machine learning.
This big bet truly has the potential to unlock the power of machine learning. And today, we're announcing some very important new capabilities specifically focused on unifying the work being done by the data science community, with their preferred tools and platforms, and the volume of data and performance at scale available in Vertica. Our strategy has been very consistent over the last several years. As I said in the beginning, we haven't deviated from our strategy. Of course, there are always things that we add. Most of the time, it's customer driven; it's based on what our customers are asking us to do. But I think we've also done a great job not trying to be all things to all people. Especially as these hype cycles flare up around us, we absolutely love participating in these different areas without getting completely distracted. I mean, there's a variety of query tools and data warehouses and analytics platforms in the market. We all know that. There are tools and platforms that are offered by the public cloud vendors, and by other vendors that support one or two specific clouds. There are appliance vendors, who I was referring to earlier, who can deliver packaged data warehouse offerings for private data centers. And there's a ton of popular machine learning tools, languages and other kits. But Vertica is the only advanced analytics platform that can do all this, that can bring it together. We can analyze the data wherever it is, in HDFS, S3 Object Storage, or Vertica itself. Natively, we support multiple clouds and on-premise deployments. And maybe most importantly, we offer that choice of deployment modes to allow our customers to choose the architecture that works for them right now, while still giving them the option to change, move and evolve over time. And Vertica is the only analytics database with end-to-end machine learning that can truly operationalize ML at scale. And I know it's a mouthful. But it is not easy to do all these things. It is one of the things that highly differentiates Vertica from the rest of the pack. It is also why our customers, all of you, continue to bet on us and see the value that we are delivering and will continue to deliver. Here are a couple of examples of some of our customers who are powered by Vertica. It's the scale of data. It's the millisecond response times. Performance and scale have always been a huge part of what we are about, though not the only thing; so are the functionality, all the capabilities that we add to the platform, the ease of use and the flexibility, obviously with the deployment. But if you look at some of the numbers under these customers on this slide, and I've shared a lot of different stories about these customers, which, by the way, still amaze me every time I talk to one and get the updates, you can see the power and the difference that Vertica is making. Equally important, if you look at a lot of these customers, they are the epitome of being able to deploy Vertica in a lot of different environments. Many of the customers on this slide are not using Vertica just on-premise or just in the cloud. They're using it in a hybrid way. They're using it in multiple different clouds. And again, we've been with them on that journey throughout, which is what has made this product and, frankly, our roadmap and our vision exactly what it is. It's been quite a journey. And that journey continues now with the Vertica 10 release. The Vertica 10 release is obviously a massive release for us.
But if you look back, you can see that we are building on that native columnar architecture that started a long time ago, obviously, with the C-Store paper. We built it to leverage commodity hardware, because it was an architecture that was never tightly integrated with any specific underlying infrastructure. I still remember hearing the initial pitch from Mike Stonebraker about the vision of Vertica as a software-only solution and the importance of separating the company from hardware innovation. And at the time, Mike basically said to me, "There's so much R&D and innovation that's going to happen in hardware, we shouldn't bake hardware into our solution. We should do it in software, and we'll be able to take advantage of that hardware." And that is exactly what has happened. But one of the most recent innovations that we embraced with hardware is certainly that separation of compute and storage. As I said previously, the public cloud providers offered this next generation architecture, really to ensure that they can provide the customers exactly what they needed, more compute or more storage, and charge for each, respectively. The separation of compute from storage is a major milestone in data center architectures. If you think about it, it's really not only a public cloud innovation, though. It fundamentally redefines the next generation data architecture for on-premise and for pretty much every way people are thinking about computing today. And that goes for software too. Object storage is an example of a cost effective means for storing data. And even more importantly, separating compute from storage for analytic workloads has a lot of advantages, including the opportunity to manage much more dynamic, flexible workloads and, more importantly, to truly isolate those workloads from others. And by the way, once you start having something that can truly isolate workloads, then you can have the conversations around autonomic computing, around setting up some nodes, some compute resources, on the data that won't affect any of the other workloads, to do some things on their own, maybe some self-analytics by the system, etc. A lot of things that many of you know we've already been exploring in terms of our own system data in the product. But it was May 2018, believe it or not, it seems like a long time ago, when we first announced Eon Mode. And I want to make something very clear about Eon Mode. It's a mode, it's a deployment option for Vertica customers. And I think this is another huge benefit that we don't talk about enough. Unlike a lot of vendors in the market who will ding you and charge you for every single add-on, you name it, you get this with the Vertica product. If you continue to pay support and maintenance, this comes with the upgrade. This comes as part of the new release. So any customer who owns or buys Vertica has the ability to set up either Enterprise Mode or Eon Mode, which is a question I know comes up sometimes. Our first announcement of Eon was obviously for AWS customers, including The Trade Desk and AT&T, most of whom will be speaking here later at the Virtual Big Data Conference. They saw a huge opportunity. Eon Mode not only allowed Vertica to scale elastically with the specific compute and storage that was needed, but it really dramatically simplified database operations, including things like workload balancing, node recovery, compute provisioning, etc.
So one of the most popular functions is that ability to isolate the workloads and really allocate those resources without negatively affecting others. And even though traditional data warehouses, including Vertica Enterprise Mode, have been able to do lots of different kinds of workload isolation, it's never been as strong as in Eon Mode. Well, it certainly didn't take long for our customers to see that value across the board with Eon Mode, and not just up in the cloud. In partnership with one of our most valued partners and a platinum sponsor here, whom Joy mentioned at the beginning, we announced Vertica Eon Mode for Pure Storage FlashBlade in September 2019. And again, just to be clear, this is not a new product; it's one Vertica, with yet more deployment options. With Pure Storage, Vertica in Eon Mode is not limited in any way by variable cloud network latency. The performance is actually amazing when you take the benefits of separating compute from storage and you run it with a Pure environment on-premise. Vertica in Eon Mode has a super smart cache layer that we call the depot. It's a big part of our secret sauce around Eon Mode. And combined with the power and performance of Pure's FlashBlade, Vertica became the industry's first advanced analytics platform that actually separates compute and storage for on-premises data centers. Something that a lot of our customers are already benefiting from, and we're super excited about it. But as I said, this is a journey. We don't stop; we're not going to stop. Our customers need the flexibility of multiple public clouds. So today, with Vertica 10, we're super proud and excited to announce support for Vertica in Eon Mode on Google Cloud. This gives our customers the ability to use their Vertica licenses on Amazon AWS, on-premise with Pure Storage, and on Google Cloud. Now, we were talking about HDFS, and a lot of our customers who have invested quite a bit in HDFS, especially as a place to store data, have been pushing us to support Eon Mode with HDFS. So as part of Vertica 10, we are also announcing support for Vertica in Eon Mode using HDFS as the communal storage. Vertica's own ROS format data can be stored in HDFS, and the full functionality of Vertica, its complete analytics, geospatial, pattern matching, time series, machine learning, everything that we have in there, can be applied to this data. And on the same HDFS nodes, Vertica can also analyze data in ORC or Parquet format, using External tables. We can even execute joins between the ROS data and the External tables, which powers a much more comprehensive view.
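As a sketch of that combination, with hypothetical paths and column names, an external table over Parquet files in HDFS can be queried alongside native ROS data roughly like this.

-- External table over Parquet files already sitting in HDFS.
CREATE EXTERNAL TABLE trades_archive (
    symbol   VARCHAR(16),
    trade_ts TIMESTAMP,
    price    NUMERIC(18,4)
) AS COPY FROM 'hdfs:///data/trades/*.parquet' PARQUET;

-- Join the external data against a native table for a combined view.
SELECT cur.symbol, cur.avg_price AS avg_2020, hist.avg_price AS avg_2019
FROM (SELECT symbol, AVG(price) AS avg_price FROM trades GROUP BY symbol) cur
JOIN (SELECT symbol, AVG(price) AS avg_price FROM trades_archive GROUP BY symbol) hist
  ON cur.symbol = hist.symbol;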
So again, it's that flexibility to be able to support our customers wherever they need us to support them, on whatever platform they have. Vertica 10 gives us a lot more ways that we can deploy Eon Mode in various environments for our customers. It allows them to take advantage of Vertica in Eon Mode, and the power that it brings with that separation and that workload isolation, on whichever platform they are most comfortable with. Now, there's a lot that has come in Vertica 10, and I'm definitely not going to be able to cover everything. But we also introduced complex types, as an example. And complex data types fit very well into Eon as well, in this separation. They significantly reduce the data pipeline and the cost of moving data, they bring much better support for unstructured data, which a lot of our customers mix with structured data, of course, and they leverage a lot of the columnar execution that Vertica provides. So you get complex data types in Vertica now, a lot more data, stronger performance. It goes great with the announcement that we made with the broader Eon Mode. Let's talk a little bit more about machine learning. We've actually been doing work in and around machine learning, with various regressions and a whole bunch of other algorithms, for several years. We saw the huge advantage that MPP offered, not just as a SQL engine and a database, but for ML as well. It didn't take long to realize that there's a lot more to operationalizing machine learning than just those algorithms. It's data preparation, it's the model training, it's the scoring, the shaping, the evaluation. That is so much of what machine learning and, frankly, data science is about. You know, everybody always wants to jump to the sexy algorithm, but we handle those other tasks very, very well, and that makes Vertica a terrific platform to do this. A lot of work in data science and machine learning is done in other tools. I had mentioned that there are just so many tools out there, and we want people to be able to take advantage of all of them. We never believed we were going to be the best algorithm company or come up with the best models for people to use. So with Vertica 10, we support PMML. We can now import and export PMML models. It's a huge step for us around operationalizing machine learning projects for our customers, allowing the models to get built outside of Vertica, yet be imported in and then applied to that full scale of data with all the performance that you would expect from Vertica. We are also more tightly integrating with Python. As many of you know, we've been doing a lot of open source projects with the community, driven by many of our customers, like Uber. And so now, with Python, we've integrated with TensorFlow, allowing data scientists to build models in their preferred language, to take advantage of TensorFlow, but again, to store and deploy those models at scale with Vertica.
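To give a feel for that workflow, here's a sketch. IMPORT_MODELS and PREDICT_PMML are the Vertica 10 entry points as I understand them, but treat the exact signatures, paths and column names as illustrative.

-- Bring in a model that was trained outside of Vertica.
SELECT IMPORT_MODELS('/models/churn_model.pmml' USING PARAMETERS category = 'PMML');

-- Score it at full scale, inside the database, with no down-sampling.
SELECT customer_id,
       PREDICT_PMML(age, tenure, monthly_spend
                    USING PARAMETERS model_name = 'churn_model') AS churn_score
FROM customers;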
I think both these announcements are proof of our big bet number three, and really our commitment to supporting innovation throughout the community by operationalizing ML with the accuracy, performance and scale of Vertica for our customers. Again, there are a lot of steps when it comes to the workflow of machine learning. These are some of them that you can see on the slide, and it's definitely not linear either. We see this as a circle. And companies that do it well just continue to learn, they continue to re-score, they continue to redeploy, and they want to operationalize all of that within a single platform that can take advantage of all those capabilities. And that is the platform, with a very robust ecosystem, that Vertica has always been committed to as an organization and will continue to be. This graphic, many of you have seen it evolve over the years. Frankly, if we put everything and everyone on here, it wouldn't fit on a slide. But it will absolutely continue to evolve and grow as we support our customers where they need the support most. So, again, being able to deploy everywhere, being able to take advantage of Vertica, not just as a business analyst or a business user, but as a data scientist or as an operational or BI person. We want Vertica to be leveraged and used by the broader organization. So I think it's fair to say, and I encourage everybody to learn more about Vertica 10, because I'm just highlighting some of the bigger aspects of it, but we talked about those three market trends: the need to unify the silos, the need for hybrid multiple cloud deployment options, and the need to operationalize business critical machine learning projects. Vertica 10 has absolutely delivered on those. But again, we are not going to stop. It is our job not to, and this is how Team Vertica thrives. I always joke that the next release is the best release. And, of course, even after Vertica 10, that is also true, although Vertica 10 is pretty awesome. But, you know, from the first line of code, we've always been focused on performance and scale, right. And like any really strong data platform, the optimizer and the execution engine are the two core pieces of that. Beyond Vertica 10, one of the big things that we're already working on is a next generation execution engine. We're already actually seeing incredible early performance from this. And this is just one example of how important it is for an organization like Vertica to constantly go back and re-innovate. Every single release, we do the sit-ups and crunches on our performance and scale. How do we improve? And there are so many parts of the core server, so many parts of our broader ecosystem. We are constantly looking at how we can go back to all the code lines that we have and make them better in the current environment. And it's not an easy thing to do when you're doing that while also expanding into new environments to take advantage of the different deployments, which is a great segue to this slide. Because if you think about today, we're obviously already available with Eon Mode on Amazon AWS, with Pure and actually with MinIO as well. As I talked about, in Vertica 10 we're adding Google and HDFS. And coming next, obviously, Microsoft Azure and Alibaba cloud. So being able to expand into more of these environments is really important for the Vertica team and how we go forward. And it's not just about running in these clouds; for us, we want it to be a SaaS-like experience in all these clouds. We want you to be able to deploy Vertica in 15 minutes or less on these clouds. You can also consume Vertica in a lot of different ways on these clouds; as an example, in Amazon, Vertica by the Hour. So for us, it's not just about running, it's about taking advantage of the ecosystems that all these cloud providers offer, and really optimizing the Vertica experience as part of them. Optimization around automation, around self-service capabilities, extending our management console. We now have products like the Vertica Advisor Tool that our Customer Success Team has created, to actually use our own smarts in Vertica to take data that customers give to us and help them automatically tune their environment. You can imagine that we're taking that to the next level, in a lot of different endeavors that we're doing, around how Vertica as a product can actually be smarter, because we all know that simplicity is key. There just aren't enough people in the world who are good at managing data and taking it to the next level. And of course, there are other things that we all hear about, whether it's Kubernetes or containerization. You can imagine that those probably work very well with Eon Mode and separating compute and storage. But innovation happens everywhere. We innovate around our community documentation. Many of you have taken advantage of the Vertica Academy. The numbers there are through the roof in terms of the number of people coming in and certifying on it.
So there are a lot of things within the core products, and a lot of activity and action beyond the core products that we're taking advantage of. And let's not forget why we're here, right? It's easy to talk about a platform, a data platform; it's easy to jump into all the functionality, the analytics, the flexibility, how we can offer it. But at the end of the day, somebody, a person, has got to take advantage of this data; she's got to be able to take this data and use this information to make a critical business decision. And that doesn't happen unless we explore lots of different and, frankly, new ways to get that predictive analytics UI and interface, beyond just the standard BI tools, in front of her at the right time. And so there's a lot of activity, I'll tease you with that, going on in this organization right now about how we can do that and deliver that for our customers. We're in a great position to be able to see exactly how this data is consumed and used, and to start with this core platform that we have and go out. Look, I know the plan wasn't to do this as a virtual BDC. But I really appreciate you tuning in. Really appreciate your support. I think if there's any silver lining for us in maybe not being able to do this in person, it's the fact that the reach has actually gone significantly higher than what we would have been able to do in person in Boston. We're certainly looking forward to doing a Big Data Conference in the future. But if I could leave you with anything, know this: since that first release of Vertica, and our very first customers, we have been very consistent. We respect all the innovation around us, whether it's open source or not. We understand the market trends. We embrace those new ideas and technologies, and for us, true north, and the most important thing, is: what does our customer need to do? What problem are they trying to solve? And how do we use the advantages that we have without disrupting our customers? We know that you depend on us to deliver that unified analytics strategy, and we will deliver that performance and scale, not only today, but tomorrow and for years to come. We've added a lot of great features to Vertica. I think we've said no to a lot of things, frankly, that we just knew we wouldn't be the best company to deliver. When we say we're going to do things, we do them. Vertica 10 is a perfect example of so many of those things that we have heard loud and clear from you, our customers, and we have delivered. I am incredibly proud of this team across the board. I think the culture of Vertica, a customer-first culture, jumping in to help our customers win no matter what, is also something that sets us massively apart. I hear horror stories about support experiences with other organizations. And people always seem to be amazed at Team Vertica's willingness to jump in, or their aptitude for certain technical capabilities, or their understanding of the business. And I think sometimes we take that for granted. But that is the team that we have as Team Vertica. We are incredibly excited about Vertica 10. I think you're going to love the Virtual Big Data Conference this year. I encourage you to tune in. Maybe one other benefit is, I know some people were worried about not being able to see different sessions because they were going to overlap with each other; well now, even if you can't do it live, you'll be able to do those sessions on demand. Please enjoy the Vertica Big Data Conference here in 2020.
Please, you and your families and your co-workers, be safe during these times. I know we will get through it. And analytics is probably going to help with a lot of that; we already know it is helping in many different ways. So believe in the data, believe in data's ability to change the world for the better. And thank you for your time. And with that, I am delighted to now introduce Micro Focus CEO Stephen Murdoch to the Vertica Big Data Virtual Conference. Thank you, Stephen. >> Stephen: Hi, everyone, my name is Stephen Murdoch. I have the pleasure and privilege of being the Chief Executive Officer here at Micro Focus. Please let me add my welcome to the Big Data Conference, and also my thanks for your support as we've had to pivot to this being a virtual rather than a physical conference. It's amazing how quickly we all reset to a new normal. I certainly didn't expect to be addressing you from my study. Vertica is an incredibly important part of the Micro Focus family. It is key to our goal of trying to enable and help customers become much more data-driven across all of their IT operations. Vertica 10 is a huge step forward, we believe. It allows for multi-cloud innovation and genuinely hybrid deployments, it begins to leverage machine learning properly in the enterprise, and it also offers the opportunity to unify currently siloed lakes of information. We operate in a very noisy, very competitive market, and there are people in that market who can do some of those things. The reason we are so excited about Vertica is that we genuinely believe we are the best at doing all of those things. And that's why we've announced publicly, and are executing internally, incremental investment into Vertica. That investment is targeted at accelerating the roadmaps that already exist, and getting that innovation into your hands faster. This idea of speed is key. It's not a question of if companies have to become data-driven organizations, it's a question of when. So that speed now is really important. And that's why we believe that the Big Data Conference gives a great opportunity for you to accelerate your own plans. You will have the opportunity to talk to some of our best architects, some of the best development brains that we have. But more importantly, you'll also get to hear from some of our phenomenal Vertica customers. You'll hear from Uber, from The Trade Desk, from Philips, and from AT&T, as well as many, many others. And just hearing how those customers are using the power of Vertica to accelerate their own businesses, I think, is the highlight. And I encourage you to use this opportunity to the full. Let me close by again saying thank you. We genuinely hope that you get as much from this virtual conference as you could have from a physical conference. And we look forward to your engagement, and we look forward to hearing your feedback. With that, thank you very much. >> Joy: Thank you so much, Stephen, for joining us for the Vertica Big Data Conference. Your support and enthusiasm for Vertica are so clear, and they make a big difference. Now, I'm delighted to introduce Amy Fowler, the VP of Strategy and Solutions for FlashBlade at Pure Storage, which is one of our BDC Platinum Sponsors and one of our most valued partners. It was a proud moment for me when we announced Vertica in Eon Mode for Pure Storage FlashBlade and we became the first analytics data warehouse that separates compute from storage for on-premise data centers. Thank you so much, Amy, for joining us. Let's get started.
>> Amy: Well, thank you, Joy, so much for having us. And thank you all for joining us today, virtually, as we may all be. So, as we just heard from Colin Mahony, there are some really interesting trends happening right now in the big data analytics market: the end of the Hadoop hype cycle, the new cloud reality, and even the opportunity to help the many data science and machine learning projects move from labs to production. So let's talk about these trends in the context of infrastructure, and in particular, look at why a modern storage platform is relevant as organizations take on the challenges and opportunities associated with these trends. The answer is that the Hadoop hype cycle left a lot of data in HDFS data lakes, or reservoirs, or swamps, depending upon the level of data hygiene, but without the ability to get the value that was promised from Hadoop as a platform rather than a distributed file store. And when we combine that data with the massive volume of data in Cloud Object Storage, we find ourselves with a lot of data and a lot of silos, but without a way to unify that data and find value in it. Now, when you look at the infrastructure data lakes are traditionally built on, it is often direct-attached storage, or DAS. The approach that Hadoop took when it entered the market was primarily bound by the limits of networking and storage technologies: one-gig Ethernet and slower spinning disk. But today, those barriers do not exist. All-flash storage has fundamentally transformed how data is accessed, managed and leveraged. The need for local data storage for significant volumes of data has been largely mitigated by the performance increases afforded by all-flash. At the same time, organizations can achieve superior economies of scale with the segregation of compute and storage. Compute and storage don't always scale in lockstep. Would you want to add an engine to the train every time you add another boxcar? Probably not. From a Pure Storage perspective, FlashBlade is uniquely architected to allow customers to achieve better resource utilization for compute and storage, while at the same time reducing the complexity that has arisen from the siloed nature of the original big data solutions. The second and equally important recent trend we see is something I'll call cloud reality. The public clouds made a lot of promises, and some of those promises were delivered. But cloud economics, especially usage-based and elastic scaling without the control that many companies need to manage the financial impact, is causing a lot of issues. In addition, the risk of vendor lock-in, from data egress charges to integrated software stacks that can't be moved or deployed on-premise, is causing a lot of organizations to back off the all-the-way-cloud strategy and move toward hybrid deployments. Which is kind of funny, in a way, because it wasn't that long ago that there was a lot of talk about no more data centers. For example, one large retailer, I won't name them, but I'll admit they are my favorite, told us several years ago that they were completely done with on-prem storage infrastructure, because they were going 100% to the cloud. But they just deployed FlashBlade for their data pipelines, because they need predictable performance at scale, and the all-cloud TCO just didn't add up. Now, that being said, while there are certainly challenges with the public cloud, it has also brought some things to the table that we see most organizations wanting.
First of all, in a lot of cases, applications have been built to leverage object storage platforms like S3. So they need that object protocol, but they may also need it to be fast. "Fast object" may have been an oxymoron only a few years ago, and this is an area of the market where Pure and FlashBlade have really taken a leadership position. Second, regardless of where the data is physically stored, organizations want the best elements of a cloud experience. And for us, that means two main things. Number one is simplicity and ease of use. If you need a bunch of storage experts to run the system, that should be considered a bug. The other big one is the consumption model: the ability to pay for what you need when you need it, and to seamlessly grow your environment over time, totally non-destructively. This is actually pretty huge, and something that a lot of vendors try to solve for with finance programs. But no finance program can address the pain of a forklift upgrade when you need to move to next-gen hardware. To scale non-destructively over long periods of time, five to 10 years plus, crucial architectural decisions need to be made at the outset. Plus, you need the ability to pay as you use it. And we offer something for FlashBlade called Pure as a Service, which delivers exactly that. The third cloud characteristic that many organizations want is the option for hybrid, even if that is just a DR site in the cloud. In our case, that means supporting replication to S3 at AWS. And the final trend, which to me represents the biggest opportunity for all of us, is the need to help the many data science and machine learning projects move from labs to production. This means bringing all the machine learning functions and model training to the data, rather than moving samples or segments of data to separate platforms. As we all know, machine learning needs a ton of data for accuracy, and there is just too much data to retrieve from the cloud for every training job. At the same time, predictive analytics without accuracy is not going to deliver the business advantage that everyone is seeking. You can visualize data analytics, as it is traditionally deployed, as being on a continuum, with the thing we've been doing the longest, data warehousing, on one end, and AI on the other end. But the way this manifests in most environments is a series of silos that get built up, so data is duplicated across all kinds of bespoke analytics and AI environments and infrastructure. This creates an expensive and complex environment. Historically, there was no other way to do it, because some level of performance is always table stakes, and each of these parts of the data pipeline has a different workload profile. A single platform to deliver the multi-dimensional performance that this diverse set of applications requires didn't exist three years ago. And that's why the application vendors pointed you towards bespoke things like the DAS environments that we talked about earlier. The fact that better options exist today is why we're seeing them move towards supporting this disaggregation of compute and storage. And when it comes to a platform that is a better option, one with a modern architecture that can address the diverse performance requirements of this continuum and allow organizations to bring a model to the data instead of creating separate silos, that's exactly what FlashBlade is built for: small files, large files, high throughput, low latency, and scale to petabytes in a single namespace.
And, importantly, delivering all of this in a single rack space is what we're focused on for our customers. At Pure, we talk about it in the context of the modern data experience, because at the end of the day, that's what it's really all about: the experience for your teams and your organization. And together, Pure Storage and Vertica have delivered that experience to a wide range of customers, from a SaaS analytics company, which uses Vertica on FlashBlade to authenticate the quality of digital media in real time, to a multinational car company, which uses Vertica on FlashBlade to make thousands of decisions per second for autonomous cars, to a healthcare organization, which uses Vertica on FlashBlade to enable healthcare providers to make real-time decisions that impact lives. And I'm sure you're all looking forward to hearing from John Yovanovich from AT&T, to hear how he's been doing this with Vertica and FlashBlade as well. He's coming up soon. We have been really excited to build this partnership with Vertica, and we're proud to provide the only on-premise storage platform validated with Vertica Eon Mode, and to deliver this modern data experience to our customers together. Thank you all so much for joining us today. >> Joy: Amy, thank you so much for your time and your insights. Modern infrastructure is key to modern analytics, especially as organizations leverage next generation data center architectures and object storage for their on-premise data centers. Now, I'm delighted to introduce our last speaker in our Vertica Big Data Conference Keynote, John Yovanovich, Director of IT for AT&T. Vertica is so proud to serve AT&T, and especially proud of the harmonious impact we are having in partnership with Pure Storage. John, welcome to the Virtual Vertica BDC. >> John: Thank you, Joy. It's a pleasure to be here, and I'm excited to go through this presentation today, and in a unique fashion, because as I was thinking through how I wanted to present the partnership that we have formed together between Pure Storage, Vertica and AT&T, I wanted to emphasize how well we all work together and how these three components have really driven home my desire for a harmonious, to use your word, relationship. So, I'm going to move forward here. The theme of today's presentation is the Pure Vertica Symphony, live at AT&T. And if anybody is a Westworld fan, you can appreciate the sheet music on the right-hand side. What I'm going to highlight here, in a musical fashion, is how we at AT&T leverage these technologies to save money, to deliver a more efficient platform, and to actually just make our customers happier overall. So as we look back, as early as just maybe a few years ago here at AT&T, I realized that we had many musicians to help the company. Or maybe you might want to call them data scientists or data analysts. For the theme, we'll stay with musicians. None of them were singing or playing from the same hymn book or sheet music. And so what we had was many organizations chasing a similar dream, but not exactly the same dream. And the best way to describe that is, and I think this might resonate with a lot of people in your organizations: how many organizations are chasing a customer 360 view in your company? Well, I can tell you that I have at least four in my company, and I'm sure there are many that I don't know of. That is our problem, because what we see is a repetitive sourcing of data. We see a repetitive copying of data.
And there's just so much money being spent. This is where I asked Pure Storage and Vertica to help me solve that problem with their technologies. What I also noticed was that there was no coordination between these departments. In fact, if you look here, nobody really wants to play with finance. Sales, marketing and care, sure, they all copied each other's data, but they didn't actually communicate with each other as they were copying the data. So the data became replicated and out of sync. This is a challenge throughout, not just my company, but all companies across the world. And that is, the more we replicate the data, the more problems we have chasing, or conquering, the goal of a single version of truth. In fact, I kid that at AT&T we have actually adopted the multiple versions of truth theory, which is not where we want to be, but it is where we are. We are conquering that with the synergies between Pure Storage and Vertica. This is what it leaves us with, and this is where we are challenged: each one of our siloed business units had their own dedicated storage, and some of them had more money than others, so they bought more storage. Some of them anticipated storing more data than they really did. Others are running out of space but can't add any more, because their budgets haven't been replenished. So if you look at it from this side view here, we have a limited amount of compute, a fixed compute, dedicated to each one of these silos. And that's because of the wanting to own your own. The other part is that you are either limited or wasting space, depending on where you are in the organization. So the synergies aren't just about the data, but also about the compute and the storage, and I wanted to tackle that challenge as well. I was tackling the data, I was tackling the storage, and I was tackling the compute, all at the same time. So my ask across the company was: can we all just please play together, okay? And to do that, I knew that I wasn't going to tackle this by getting everybody in the same room and getting them to agree that we needed one account table, because they would argue about whose account table is the best account table. But I knew that if I brought the account tables together, they would soon see that they had so much redundancy that I could start retiring data sources. I also knew that if I brought all the compute together, they would all be happy, but I didn't want them stepping on each other. In fact, that is one of the things that all business units really enjoy: the silo of having their own compute, and more or less being able to control their own destiny. Well, Vertica's subclustering allows just that. This is exactly what I was hoping for, and I'm glad they brought it through. And finally, how did I solve the problem of the single account table? Well, you can when you don't have dedicated storage, and you can separate compute and storage, as Vertica in Eon Mode does, and we store the data on FlashBlades, which you see on the left and right hand sides of our container, which I'll describe in a moment. So what we have here is a container full of compute, with all the Vertica nodes sitting in the middle, and two loader subclusters, as we'll call them, sitting on the sides, which are dedicated to just putting data onto the FlashBlades sitting on both ends of the container.
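In Eon Mode terms, each business unit's silo of compute becomes a subcluster over shared communal storage. A rough sketch of inspecting that layout with the vertica-python client, assuming Eon Mode's subclusters system table; the connection details and database name here are hypothetical placeholders, not AT&T's actual environment:

```python
import vertica_python

# Hypothetical connection details for an Eon Mode database whose
# communal storage lives on FlashBlade S3.
conn_info = {
    "host": "vertica.example.internal",
    "port": 5433,
    "user": "dbadmin",
    "password": "********",
    "database": "att_analytics",  # hypothetical database name
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    # Eon Mode exposes subcluster membership in a system table, so each
    # business unit's dedicated compute shows up as its own subcluster
    # while all of them read the same communal data.
    cur.execute(
        "SELECT subcluster_name, node_name, is_primary "
        "FROM v_catalog.subclusters ORDER BY subcluster_name"
    )
    for subcluster, node, is_primary in cur.fetchall():
        print(subcluster, node, "primary" if is_primary else "secondary")
```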
Now, today I have two dedicated, or, dedicated might not be the right word, two storage racks, one on the left and one on the right, and I treat them as separate storage racks. They could be one, but I created them separately for disaster recovery purposes, in case one rack were to go down. That being said, there's no reason why I won't add a couple more of them here in the future, so I can have, say, a five to 10 petabyte storage setup, and I'll have my DR in another container, 'cause the DR shouldn't be in the same container. Okay, but I'll DR outside of this container. So I got them all together, I leveraged subclustering, and I leveraged the separation of storage and compute. I was able to convince many of my clients that they didn't need their own account table, that they were better off having one. I reduced latency, and I reduced our data quality issues, AKA ticketing, okay? And I was able to leverage elasticity within this cluster. As you can see, there are racks and racks of compute. We set up what we'll call the fixed capacity that each of the business units needed, and then I'm able to ramp up and release the compute that's necessary for each one of my clients based on their workloads throughout the day. And while some of the instruments have already, more or less, dedicated themselves, all the others are free for anybody to use. So in essence, what I have is a concert hall with a lot of seats available. If I want to run a 10-chair symphony or an 80-chair symphony, I'm able to do that. And all the while, I can do the same with my loader nodes: I can expand my loader nodes to actually have their own symphony all to themselves, and not compete with any of the other workloads of the other clusters. What does that change for our organization? Well, it really changes the way our database administrators do their jobs. This has been a big transformation for them. They have actually become data conductors. Maybe you might even call them composers, which is interesting, because what I've asked them to do is morph into less technology and more workload analysis. In doing so, we're able to write auto-detect scripts that watch the queues and the workloads, so that we can ramp up and trim down the cluster and subclusters as necessary (a rough sketch of such a loop follows this section). It has been an exciting transformation for our DBAs, who I may need to reclassify as something like DCAs. I don't know, I'll have to work with HR on that, but I think it's an exciting future for their careers. And if we bring it all together, our clusters start looking like this, where everything is moving in harmony, we have lots of seats open for extra musicians, and we are able to emulate a cloud experience on-prem. And so, I want you to sit back and enjoy the Pure Vertica Symphony, live at AT&T. (soft music) >> Joy: Thank you so much, John, for an informative and very creative look at the benefits that AT&T is getting from its Pure Vertica symphony. I do really like the idea of engaging HR to change the title to Data Conductor. That's fantastic. I've always believed that music brings people together, and now it's clear that analytics at AT&T is part of that musical advantage. So, now it's time for a short break, and we'll be back for our breakout sessions, beginning at 12 pm Eastern Daylight Time.
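The auto-detect scripts John describes are not published, but the control loop he outlines, watching queue depth and resizing a subcluster, might look roughly like the sketch below. Every function name and threshold here is a hypothetical illustration, not AT&T's actual tooling.

```python
import time

# All names and thresholds below are hypothetical illustrations of the
# "watch the queues, ramp up and trim down" loop described in the talk.

SCALE_UP_DEPTH = 50    # queued queries that trigger adding compute
SCALE_DOWN_DEPTH = 5   # queue depth low enough to release compute

def queued_queries(subcluster: str) -> int:
    """Return the number of queries waiting on this subcluster (stub)."""
    raise NotImplementedError  # e.g. poll a resource-queue system table

def resize(subcluster: str, delta_nodes: int) -> None:
    """Add or remove compute nodes from the subcluster (stub)."""
    raise NotImplementedError  # e.g. drive admin tooling or an internal API

def conduct(subcluster: str, poll_seconds: int = 60) -> None:
    """Watch one subcluster's queue and keep its size matched to demand."""
    while True:
        depth = queued_queries(subcluster)
        if depth > SCALE_UP_DEPTH:
            resize(subcluster, +2)   # ramp up for the busy movement
        elif depth < SCALE_DOWN_DEPTH:
            resize(subcluster, -1)   # trim down and free the seats
        time.sleep(poll_seconds)
```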
We have some really exciting sessions planned later today, and then again, as you can see, on Wednesday. Now, because all of you are already logged in and listening to this keynote, you already know the steps to continue to participate in the sessions listed here and on the previous slide. In addition, everyone received an email yesterday and today, and you'll get another one tomorrow, outlining the simple steps to register, log in and choose your sessions. If you have any questions, check out the emails or go to www.vertica.com/bdc2020 for the logistics information. There are a lot of choices, and that's always a good thing. Don't worry if you want to attend more than one session, or can't listen to the live sessions due to your timezone. All the sessions, including the Q&A sections, will be available on demand, and everyone will have access to the recordings, as well as even more pre-recorded sessions that we'll post to the BDC website. Now, I do want to leave you with two other important sites. First, our Vertica Academy. Vertica Academy is available to everyone, and there's a variety of very technical, self-paced, on-demand training, virtual instructor-led workshops, and Vertica Essentials Certification. And it's all free, because we believe that Vertica expertise helps everyone accelerate their Vertica projects and the advantage that those projects deliver. And if you have questions or want to engage with our Vertica engineering team, we're waiting for you on the Vertica forum. We'll answer any questions or discuss any ideas that you might have. Thank you again for joining the Vertica Big Data Conference keynote session. Enjoy the rest of the BDC, because there's a lot more to come.

Published Date : Mar 30 2020
