
Mike Cohen, Splunk | Leading with Observability


 

(upbeat music playing) >> Narrator: From theCUBE's studios in Palo Alto and Boston, connecting with thought leaders all around the world. This is a CUBE Conversation. >> Hello, everyone, welcome to this CUBE Conversation. I'm John Furrier, host of theCUBE. We're doing a content series called Leading with Observability, and this segment is network observability for distributed services. And we have CUBE alumni Mike Cohen, head of product management for network monitoring at Splunk. Mike, great to see you. It's been a while, going back to the OpenStack days, Red Hat Summit. Now here talking about observability with Splunk. Great to see you. >> Thanks a lot for having me. >> So right now observability is at the center of all the conversations, from monitoring to investing in infrastructure, on-premises, cloud, and also cybersecurity. A lot of conversations, a lot of broad-reaching implications for observability. You're the head of product management for network observability at Splunk. This is where the conversation's going: getting down to the network layer, getting down into it as the packets move around. This is becoming important. Why is this the trend? What's the situation? >> Yeah, so we're seeing a couple of different trends that are really driving how people think about observability, right? One of them is this huge migration towards public cloud architecture, where you're running on infrastructure that you don't own yourself. The other one is around how people are rebuilding and refactoring applications around service-based architectures, scale-out models, cloud native paradigms. And both of these things are really introducing a lot of new complexity into the applications and really increasing the surface area where problems can occur. And what this means is when you actually have gaps in visibility, or places where you have a separate tool analyzing parts of your system, it really makes it very hard to debug when things go wrong and to figure out where problems occur. And really what we've seen is that people need an integrated solution to observability, one that can really span from what your user is seeing all the way to the deepest backend services, and to the core of the infrastructure that you're operating, so that you can really figure out where problems occur. And really network observability is playing a critical role in filling in one of those critical gaps. >> You know, you think about the past decade we've been on this wave. It feels like now more than ever it's an inflection point because of how valuable cloud native has become: value creation, time to market, all those things that are why people are investing in modern applications. But then as you build out your architecture and your infrastructure to make that happen, there's more things happening. Everything as a service creates new dependencies, new things to document. This is an opportunity on one hand; on the other hand, it's a technical challenge. So, you know, balancing out technical debt and deploying new stuff, you've got to monitor it all. Right, monitoring has turned into observability, which is just a code word for cloud-scale monitoring, I guess. I mean, is that how you see it? How do you talk about this? Because it's certainly a major shift happening right now and this transition is pretty obvious. >> Yeah. Yeah, no, absolutely.
And we've seen a lot of new interest in the network visibility, network monitoring space. And really, again, the driver of that is that network infrastructure is actually becoming increasingly opaque as you move towards public cloud environments. And it's been sort of a fun thing to blame the network and say, look, it's the network, we don't know what's going on. But it's not always the network. Sometimes it is, sometimes it isn't. You actually need to understand where these problems are really occurring to have the right level of visibility into your systems. But the other way we've started talking to people about this is that the network is an empowering capability, an untapped resource, where you can actually get new data about your distributed systems. SREs are struggling to understand these complex environments, but with the capabilities we've seen, taking advantage of things like eBPF and monitoring from the OS, we can actually get visibility into how processes and containers communicate, and that can give us insights into our system. It's a new source of data that has not existed in the past and that is now available to help us with the broader observability problem. >> You mentioned SREs, Site Reliability Engineers, as it's known; Google kind of pioneered this. It's become kind of a standard persona in large-scale infrastructure, cloud environments, massive scale. Are you seeing SREs, that role, become more mainstream in enterprises? I mean, because some enterprises might not call them an SRE, they might call them a cloud architect. Can you tie that together, because it is certainly happening. Is it proliferating? >> For sure, absolutely. Yeah, I think SREs, you know, the title may vary across organizations as you point out, and sometimes the exact layout of the organizational breakdown varies. But this role of someone who really cares about keeping the system up, caring for it and scaling it out and thinking about its architecture, is now a really critical role. And sometimes that role sits alongside developers who are writing the code. And this is really happening in almost every organization that we're dealing with today. It is becoming a mainstream occurrence. >> Yeah, it's interesting. I'm going to ask you a question about what businesses are missing when they think about observability, but since you brought up that piece: it's almost as if Kubernetes created this kind of demarcation line between the top half and the bottom half of the stack, where you can do a lot of engineering underneath the bottom of the stack up to, say, Kubernetes, and then above that you could just be an infrastructure-as-code application developer. So it's almost kind of leveled out with nice lanes there. I'm oversimplifying it, but how do you react to that? Do you see that evolving too? Because it all seems cleaner now. It's like you're engineering below Kubernetes or above it. >> Oh, absolutely. It's definitely one of the areas where you see some of the deepest engagement. As folks go towards Kubernetes, they start embracing containers, they start building microservices.
You'll see development teams really accelerate the pace of innovation that they have in their environment. And that's really the driver behind this. So we do see that sort of rebuilding and refactoring as some of the biggest drivers behind these initiatives. >> What are businesses missing around observability? Because it seems to be, first of all, a very overfunded segment, a lot of new startups coming in, a lot of security vendors over here, and you're seeing network folks moving in. It's almost becoming a fabric feature. What does that mean to businesses? What are businesses missing or getting? How are people evaluating observability? How do you see that? >> Yeah, for sure. I'll start by talking generically about it, but then I'll talk a little bit about the network areas specifically, right? I think one of the things people are realizing they need in observability is this approach as an integrated suite. Having a disparate set of tools can make it very hard for SREs to actually take advantage of all those tools and use the data within them to solve meaningful problems. And what we're seeing as we've been talking to more people in the industry is that they really want something that can bring all that data together and build it into an insight that can help them solve a problem more quickly. So I think that's the broader context of what's going on, and I think that's driving some of the work we're doing on the network side, because the network is a powerful new data set that we can combine with other aspects of what people have already been doing in observability. >> What do you think about programmability? That's been a big topic when you start to get into that kind of mindset; you're almost making the software-defined aspect come in here heavily. How does that play in? What's your vision around making the network adaptable, programmable, measurable, fully surveilled? >> Yeah, so again, what we're focused on is the capabilities you can have in using the network as a means of visibility and observability for your systems. Networks are becoming highly flexible. A lot of people, once they get into a cloud environment, have a very rich set of networking capabilities. But what they want to be able to do is use that as a way of getting visibility into the system. So I can talk for a minute or two about some of the capabilities we're exposing in network observability. One of them is just being able to visualize and optimize a service architecture, so really seeing what's connecting to what, automatically. We've been using a technology called eBPF, the extended Berkeley Packet Filter. It's part of everyone's Linux operating system; if you're running Linux, you basically have this already. And it gives you an interesting touch point to observe the behavior of every process and container automatically. You can actually see, with very little overhead, what they're doing and correlate that with data from systems like Kubernetes to understand how distributed systems behave, to see how things connect to other things. We can use this to build a complete service map of the system in seconds, automatically, without developers having to do any additional work, without forcing anyone to change their code. They can get visibility across an entire system automatically.
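To make the eBPF idea concrete, here is a minimal sketch of kernel-level connection tracing using the BCC toolkit's Python bindings. It is an illustration only, not Splunk's agent: it assumes a Linux host with BCC installed and root privileges, and it simply counts outbound TCP connection attempts per process, the kind of raw signal an observability agent would enrich with Kubernetes metadata.

```python
# Hypothetical illustration: count outbound TCP connect() calls per process
# using eBPF via the BCC toolkit (assumes Linux, bcc installed, root access).
from time import sleep
from bcc import BPF

BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(connect_count, u32, u64);   // pid -> number of connection attempts

int trace_connect(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *count = connect_count.lookup_or_try_init(&pid, &zero);
    if (count) {
        (*count)++;
    }
    return 0;
}
"""

b = BPF(text=BPF_PROGRAM)
# Attach to the kernel function that initiates IPv4 TCP connections.
b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_connect")

print("Tracing tcp_v4_connect()... Ctrl-C to stop.")
try:
    while True:
        sleep(5)
        for pid, count in b["connect_count"].items():
            print(f"pid={pid.value} outbound_connects={count.value}")
except KeyboardInterrupt:
    pass
```

A production agent would also capture destination addresses and stream the events off-host; the point is simply that the kernel already sees every connection, so no application changes are needed.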
>> That's like the original value proposition of Splunk when it came out: it was just a great tool for taking in the data from logs. Now, as data becomes more complex, you're still instrumenting, and those are critical services, and they're now microservices; the trends are at the top of the stack and at the network layer. The network layer has always been a hard nut to crack. I've got to ask you, why now? You mentioned earlier that everyone used to blame the network: oh, it's not my problem. You really can't finger-point when you start getting into full instrumentation of the traffic patterns and the underlying processes. So there seems to be good magic going on here. What's the core issue? What's going on here? Why is it now? >> Mike: Yeah. >> Why is the time now? >> Yeah, well, unreliable networks, slow networks, DNS problems: these have always been present in systems. The problem is they're actually becoming exacerbated because people have less visibility into them. But also, as you have these distributed systems, the failure modes are getting more complex. Some of the longest, most challenging troubleshooting problems are these network issues, which tend to be transient, which tend to bounce around the systems. They tend to cause other unrelated alerts to happen inside your application stack, with multiple teams troubleshooting the wrong problems that don't really exist. So the network has actually caused some of the most painful outages that teams see. And when these outages happen, what you really need to be able to know is: is it truly a network problem, or is it something in another part of my system? If I'm running a distributed service, which services are affected? Because that's the language my team thinks in now. As you mentioned, they're in Kubernetes; they're trying to think, which Kubernetes services are actually affected by the potential network outage that I'm worried about? The other aspect is figuring out the scope of the impact. Are there a couple of instances in my cloud provider that aren't doing well? Is an entire availability zone having problems? Is there a region of the world that's an issue? Understanding the scope of the problem will actually help me as an SRE decide what the right mitigation is. And by limiting it as much as possible, it can actually help me better hit my SLA, because I won't have to hit something with a huge hammer when a really small one might solve the problem.
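As a rough illustration of that scope-of-impact point, the sketch below groups failing health checks by availability zone and region to suggest the narrowest mitigation. It is a toy example with made-up field names, not a Splunk feature.

```python
# Toy sketch: size the blast radius of failing checks by zone and region.
# The check records and field names here are illustrative, not a real schema.
from collections import Counter

failing_checks = [
    {"instance": "i-0a1", "zone": "us-east-1a", "region": "us-east-1"},
    {"instance": "i-0b2", "zone": "us-east-1a", "region": "us-east-1"},
    {"instance": "i-0c3", "zone": "us-east-1b", "region": "us-east-1"},
]

by_zone = Counter(c["zone"] for c in failing_checks)
by_region = Counter(c["region"] for c in failing_checks)

# Crude heuristic: if one zone holds most of the failures, drain that zone;
# otherwise treat it as a wider (regional) problem.
worst_zone, worst_zone_count = by_zone.most_common(1)[0]
if worst_zone_count / len(failing_checks) > 0.8:
    print(f"Impact concentrated in {worst_zone}: consider draining that zone")
else:
    print(f"Impact spread across zones in {by_region.most_common(1)[0][0]}: "
          "treat as a regional issue")
```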
>> Yeah, this is one of the things that comes up. Almost just hearing you talk, I'm seeing how it could be complex for the customer just documenting the dependencies. I mean, as services come online, some of them are going to be very dynamic, not just at the network layer but at the application level; we mentioned Kubernetes, and you've got service meshes and microservices. You're going to start to see the need to be tracking all this stuff, and that's a big part of what's going on with your suite right now, the ability to help there. How are you guys helping people do that? >> Yeah, absolutely. So just understanding dependencies is one of the key aspects of these distributed systems. This began as a simple problem: you have a monolithic application, it kind of runs on one machine, you understand its behavior. Once you start moving towards microservices, it's very easy for that to change from, look, we have a handful of microservices, to we have hundreds, to we have thousands, and they can be running across thousands or tens of thousands of machines as you get bigger. Understanding that environment can become a major challenge, and teams will end up with a handwritten diagram that has the behavior of their services broken out, or they'll find out that there's an interaction they didn't expect to happen, and that may be the source of an issue. So one of the capabilities we have, using network monitoring out of the operating system with eBPF, is that we can actually automatically discover every connection that's made. If you're able to watch the sockets that are created in Linux, you can actually see how containers interact with each other, and then you can use that to build automatic service dependency diagrams. So without the user having to change the code or change anything about their system, you can automatically discover those dependencies, and you'll find things you didn't expect, things that change over time, things that weren't well documented. And that's the critical level of understanding you need to get to and use in the environment. >> Yeah. You know, it's interesting, you mentioned that you might've missed them in the past. People have that kind of feeling about the network, either because they weren't tracking it well or they used a different network tool. I mean, just packet loss by itself is one, service and host health is another. And if you could track everything, then you've got to build it. So I love this direction. My question really is more of, okay, how do you operationalize it? Okay, I'm an operator: am I getting alerts? Does it just auto-discover? How does this all work from a usability standpoint? >> Yeah. >> What are the key features, what gets unlocked from that kind of instrumentation? >> Yeah, well again, when you do this instrumentation correctly, it can be automatic, right? You can actually put an agent on your instances, collecting data based on the traffic and the interactions that occur, without you having to take any action; that's really the Holy Grail. And that's where some of the best value of these systems emerges: it just works out of the box. And then it'll pull data from other systems, like your cloud provider and your Kubernetes environment, and use that to build a picture of what's going on. And that's really where these systems get super valuable: they just work without you having to do a ton of work behind the scenes. >> So Mike, I've got to ask you a final question. Explain the distributed services aspect of observability. What should people walk away with from a main concept standpoint, and how does it apply to their environment? What should they be thinking about? What is it and what's the real story there? >> Yeah, so I think the way we're thinking about this is: how can you turn the network from a liability to a strength in these distributed environments, right? By observing data at the network level, out of the operating system, you can actually use it to automatically construct service maps, learn about your system, and improve the insight and understanding you have of your complex systems.
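A minimal sketch of the service-dependency idea described here: given connection observations (source service, destination service) from an agent, fold them into an adjacency map. The observation records below are made up for illustration; a real agent would derive service names from eBPF socket events plus Kubernetes metadata.

```python
# Toy sketch: build a service dependency map from observed connections.
# The observations below are illustrative stand-ins for agent-collected data.
from collections import defaultdict

observed_connections = [
    {"src": "frontend", "dst": "cart", "dst_port": 8080},
    {"src": "frontend", "dst": "catalog", "dst_port": 8080},
    {"src": "cart", "dst": "redis", "dst_port": 6379},
    {"src": "frontend", "dst": "cart", "dst_port": 8080},
]

dependency_map = defaultdict(lambda: defaultdict(int))  # src -> dst -> count
for conn in observed_connections:
    dependency_map[conn["src"]][conn["dst"]] += 1

for src, dsts in dependency_map.items():
    for dst, count in dsts.items():
        print(f"{src} -> {dst} ({count} connections observed)")
```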
You can identify network problems that are occurring. You can understand how you're utilizing aspects of the network, which can drive things like cost optimization in your environment. So you can actually get better insights, be able to troubleshoot problems better, and handle the blame game of: is the network really the problem that I'm seeing, or is it occurring somewhere else in my application? That's really critical in these complex distributed environments. And critically, you can do it in a way that doesn't add overhead to your development team. You don't have to change the code, you don't have to take on a complex engineering task; you can actually deploy agents that'll be able to collect this data automatically. >> Awesome, take that complexity away, automate it, and help people get the job done. Great stuff. Mike, thanks for coming on theCUBE. Leading with Observability, I'm John Furrier with theCUBE. Thanks for watching. >> Mike: Yeah, thanks a lot. (gentle music playing)

Published Date: Feb 22, 2021


Arijit Mukherji, Splunk | Leading with Observability


 

>> Announcer: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation. >> Hello and welcome to this special CUBE Conversation here in the Palo Alto studios, I'm John Furrier, host of theCUBE, for this Leading with Observability series with Under the Hood with Splunk Observability, I'm John Furrier with Arijit Mukherji with Splunk, he's a distinguished engineer, great to have you on. These are my favorite talks. Under the Hood means we're going to get all the details, what's powering observability, thanks for coming on. >> It's my pleasure, John, it's always nice to talk to you. >> Leading with Observability is the series, want to take a deep dive look across the spectrum of the product, the problems that it's solving, but Under the Hood is a challenge, because, people are really looking at coming out of COVID with a growth strategy, looking at cloud-native, Kubernetes, you're starting to see microservices really be a big part of that, in real deployments, in real scale. This has been a theme that's been growing, we've been covering it. But now, architectural decisions start to emerge. Could you share your thoughts on this, because this becomes a big conversation. Do you buy a tool here, how do you think it through, what's the approach? >> Exactly, John. So it's very exciting times in some sense, with observability right now. So as you mentioned and discussed a few times, there's a bunch of trends that are happening in the industry which is causing a renewed interest in observability, and also an appreciation of the importance of it, and observability now as a topic, it's like a huge umbrella topic, it covers many many different things like APM, your infrastructure monitoring, your logging, your real user monitoring, your digital experience management, and so on. So it's quite a set of things that all fall under observability, and so the challenge right now, as you mentioned, is how do we look at this holistically? Because, I think at this point, it is so many different parts to this edifice, to this building, that I think having a non-integrated strategy where you just maybe go buy or build individual pieces, I don't think that's going to get you very far, given the complexity of what we're dealing with. And frankly, that's one of the big challenges that we, as architects within Splunk, we are scratching our heads with, is how do we sort of build all of this in a more coherent fashion? >> You know, one of the things, Arijit, I want to get your thoughts on is because, I've been seeing this trend and, we've been talking about it on theCUBE a lot around systems thinking, and if you look at the distributed computing wave, from just go back 20 years and look at the history of how we got here, a lot of those similar concepts are happening again, with the cloud, but not as simple. You're seeing a lot more network, I won't say network management, but observability is essentially instrumentation of the traffic and looking at all the data, to make sure things like breaches and cybersecurity, and also making systems run effectively, but it's distributed computing at the end of it, so there's a lot of science that's been there, and now new science emerging around, how do you do this all? 
What's your thoughts on this, because this becomes a key part of the architectural choices that some companies have to make, if they want to be in position to take advantage of cloud-native growth, which is multifold benefits, and your product people talk about faster time to market and all that good stuff, but these technical decisions matter, can you explain? >> Yes, it absolutely does. I think the main thing that I would recommend that everybody do, is understand why observability, what do you want to get out of it? So it is not just a set of parts, as I mentioned earlier, but it brings direct product benefits, as we mentioned, like faster mean time to resolution, understanding what's going on in your environment, having maybe fewer outages at the same time, understanding your causes, so many different benefits. So the point is not that one has the ability to do maybe (indistinct) or ability to do infrastructure (indistinct), the main question is aspirationally, what are my goals that are aligned to what my business wants? So what do I want to achieve, do I want to innovate faster? In that case, how is observability going to help me? And this is sort of how you need to define your strategy in terms of what kind of tools you get and how they work together. And so, if you look at what we're doing at Splunk, you'll notice it's extremely exciting right now, there's a lot of acquisitions happening, a lot of products that we're building, and the question we're asking as architects is, suppose we want to use, that will help us achieve all of this, and at the same time be somewhat future-proofed. And I think any organization that's either investing in it, or building it, or buying it, they all would probably want to think along those lines. Like what are my foundational principles, what are the basic qualities I want to have out of this system? Because technologies and infrastructures will keep on changing, that's sort of the rule of nature right now. The question is how do we best address it in a more future-proofed system? At Splunk, we have come up with a few guiding principles, and I'm sure others will have done the same. >> You know, one of the dynamics I want to get your reaction to is kind of two perspectives, one is, the growth of more teams that are involved in the work, so whether it's from cyber to monitoring, there's more teams with tools out there that are working on the network. And then you have just the impact of the diversity of use cases, not so much data volume, 'cause that's been talked about, lot of, we're having a tsunami of data, that's clear. But different kinds of dynamics, whether it's real-time, bursting, and so when you have this kind of environment, you can have gaps. And solar winds have taught us anything, it's that you have to identify problems and resolve them, this comes up a lot in observability conversations, MTTI, mean time to identify, and then to resolve. These are concepts. If you don't see the data, you can't understand what's going on if you can't measure it. This is like huge. >> Yes, absolutely right, absolutely right. So what we really need now is, as you mentioned, we need an integrated tool set, right? What we mean by that, is the tools must be able to work together, the data must be able to be used across the board. So like by use case it should not be siloed or fragmented, that they should work as one system that users are able to learn, and then sort of be able to use effectively without context switching. 
Another concept that's quite important is: how flexible are you? Are you digging yourself into a fixed solution, or are you depending on open standards that will let you change out implementations, or vendors, or what have you, (static crackles) down the line relatively easily? So understanding how you're collecting the data, and how good the open standards and open source you're using are, is important. But to your point about missing data and gaps, I think full fidelity, understanding every single transaction if you can pull it off, is a fascinating superpower, because that's where you don't get the gaps, and if you are able to go back and track any bad transaction at any time, that is hugely liberating, right? Because without that, if you're going to do a lot of sampling, you're going to miss a huge percentage of the user interactions, and that's probably a recipe for some kind of trouble down the line, as you mentioned. And actually, these are some of the principles that we are using to build the Splunk Observability Suite: no-sample, or full fidelity, is a core foundational principle, and for us it's not just isolated to, let's say, application performance management, where a user hits your API and you're able to track what happened. We are actually taking this upstream, up to the user, so when the user is taking actions in the browser, how do we capture and correlate what's happening in the browser? Because (indistinct) as you know, there's a huge move towards single-page applications, where half of the logic that my users are using is actually running in the browser, right? And so understanding the whole thing end to end, without any gaps, without any sampling, is extremely powerful. And so yes, those are some of the things that we're investing in, and that I think, again, one should keep in mind when considering observability.
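To make the sampling argument concrete, here is a small back-of-the-envelope sketch (not Splunk code; the rates are made up) of how often a head-sampled tracer would capture a rare failing transaction, which is why a no-sample stance matters for debugging an individual user's bad request.

```python
# Back-of-the-envelope: chance that head-based sampling captures rare failures.
# Purely illustrative arithmetic; the rates below are made up.
sample_rate = 0.01          # keep 1% of traces
failing_requests = 50       # rare failures during an incident window

# Probability that at least one failing request ends up in the sampled set.
p_capture_at_least_one = 1 - (1 - sample_rate) ** failing_requests
expected_captured = sample_rate * failing_requests

print(f"Expected failing traces captured: {expected_captured:.1f}")
print(f"P(capture at least one): {p_capture_at_least_one:.2%}")
# With 1% sampling, a specific user's single bad transaction is captured
# only 1% of the time, which is the gap full-fidelity tracing closes.
```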
>> You know, we were talking the other day and having a debate around technical debt and how that applies to observability, and one of the things you brought up earlier was tools, and tool sprawl, which causes problems; you have operational friction, and we've heard people say, "Yeah, I've got too many tools," and it's just too much to replatform or refactor, it's just too much pain for me to do that, so at some point things break, I take on too much technical debt. When is that point of no return where someone feels the pain of tool sprawl? What are some of the signals where it's like, "You better move now (indistinct) too late," because this integrated platform seems to be the way people go, as you mentioned. But this tool sprawl is a big problem. >> It is, and I think it starts hitting you relatively early on nowadays, if you ask my opinion. Tool sprawl is, I think, if you find yourself using three or four different tools which are all part of some critical workload together, that's a sign that something could be optimized. For example, let's say I'm observing whether my website works fine, and my alerting tool is different from my data-gathering or infrastructure monitoring metrics tool, which is different from my incident management tool, which is different from my logs tool. Then if you put on the hat of an engineer, a poor engineer who's dealing with a crisis, the number of times they have to context switch, and the amount of friction and delay that adds to the process, is very, very painful. So my thinking is that at some point, especially if we find that core critical workloads are being fragmented and that's adding a bunch of friction, it's probably not good to let that keep going for a while, and it would be time to address that problem. And frankly, having these tools integrated actually brings a lot of benefit, which is far bigger than the sum of the parts, because think about it: if I'm looking at, say, an incident, and I'm able to get cross-tool data all presented in one screen, one UI, that is hugely powerful, because it gives me all the information that I need without having to, again, dig into five different tools, and it allows me to make quicker, faster decisions. So I think this is almost an inevitable wave that everybody must and will adopt, and the question is, I think it's important to get on a good program early, because unless you build a lot of practices within an organization, that becomes very, very hard to change later; it is just going to be more costly down the line. >> So from an (indistinct) standpoint, under the hood, an integrated platform takes that tool sprawl problem away, helps there. You have open source technology so there's no lock-in, you mentioned full fidelity, not just sampling, full end-to-end tracing, which is critical, wants to avoid those gaps. And then the other area I want to get your thoughts on, that you didn't bring up yet, that people are talking about, is real-time streaming of analytics. What role does that play, is that part of the architecture, what function does that do? >> Right, so to me, it's a question of how quickly do I find a problem? If you think about it, we are moving to more and more software services, right? So everybody's a software service now, and we all talk to each other as different services. Now, any time you use a dependency, you want to know how available it is, what my SLAs and SLOs are, and so on, and three nines is almost a given; you must provide three nines or better, ideally four nines of availability, because your overall system stability is going to be less than that of any single part. And if you look at four nines, you have about four or five minutes of total downtime in one whole month. That's a hard thing to be able to control. And if your alerting is going to be on the order of five or 10 minutes, there's no chance you're going to be able to promise the kind of high availability that you need to be able to provide, and so the fundamental question is: you need to understand problems quickly, like fast, within seconds, ideally. Now streaming is one way to do it, but that really is the problem definition: how do I find problems early enough so that I can give my automation or my engineers time to figure out what happened and take corrective action? Because if I can't even know that there's something amiss, then there's no chance I'm going to be able to provide the availability that my solution needs. So in that context, real time is very important; it is much more important now, because we have all these software and service dependencies, than it maybe used to be in the past. And so that's why, again, at Splunk, we invested in real-time streaming analytics, with the idea again being: how can we address this, how can we provide customers with quick, high-level, important alerts in seconds? And that sort of real-time streaming is probably the best way to achieve that.
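The downtime figures quoted here follow directly from the availability targets; a quick worked calculation (illustrative only, assuming a 30-day month) is below.

```python
# Downtime budget per 30-day month for common availability targets.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes

for label, availability in [("three nines", 0.999), ("four nines", 0.9999)]:
    budget = MINUTES_PER_MONTH * (1 - availability)
    print(f"{label} ({availability:.2%}): ~{budget:.1f} minutes of downtime/month")

# three nines -> ~43.2 minutes/month
# four nines  -> ~4.3 minutes/month, which is why alerting that takes
# 5-10 minutes to fire can consume the entire budget before anyone reacts.
```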
And then, if I were to, sorry, go ahead. >> No, go on, finish. >> Yeah, I was going to say that it's one thing to get an alert, but the question then is, now what do I do with it? And there's obviously a lot of alert noise that's going out, and people are fatigued, and I have all these alerts, I have this complex environment, understanding what to do, which is sort of reducing the MTTR part of it, is also important, I think environments are so complex now, that without a little bit of help from the tool, you are not going to be able to be very effective, it's going to take you longer, and this is also another reason why integrated tools are better, because they can provide you hints, looking at all the data, not just one type, not just necessarily logs, or not just necessarily traces, but they have access to the whole data set, and they can give you far better hints, and that's again one of the foundational principles, because this is in the emergent field of AIOps, where the idea is that we want to bring the power of data science, the power of machine learning, and to aid the operator in figuring out where a problem might be, so that they can at least take corrective action faster, not necessarily fix it, but at least bypass the problem, or take some kind of corrective action, and that's a theme that sort of goes across our suite of tools is, the question we ask ourselves is, "In every situation, what information could I have provided them, what kind of hints could we have provided them, to short circuit their resolution process?" >> It's funny you mention suite of tools, you have an Observability Suite, which Splunk leads with, as part of the series, it's funny, suite of tools, it's kind of like, you kind of don't want to say it, but it is kind of what's being discussed, it's kind of a platform and tool working together, and I think the trend seems to be, it used to be in the old days, you were a platform player or a tool player, really kind of couldn't do both, but now with cloud-native, as it's distributed computing, with all this importance around observability, you got to start thinking, suite has platform features, could you react to that, and how would you talk about that, because what does it mean to be a platform? Platforms have benefits, tools have benefits, working together implies it's a combination, could you share your thoughts on that reaction to that? >> That's a very interesting question you asked, John, so this is actually, if you asked me how I look at the solution set that we have, I will explain it thus. We are a platform, we are a set of products and tools, and we are an enterprise solution. And let me explain what I mean by that, because I think all of these matter, to somebody or the other. As a platform, you're like "How good am I in dealing with data?" Like ingesting data, analyzing data, alerting you, so those are the core foundational features that everybody has, these are the database-centric aspects of it, right? And if you look at a lot of organizations who have mature practices, they are looking for a platform, maybe it scales better than what they have, or whatnot, and they're looking for a platform, they know what to do, build out on top of that, right? 
But at the same time, a platform is not a product, 99% of our users, they're not going to make database calls to fetch and query data, they want an end to end, like a thing that they can use to say, "Monitor my Kubernetes," "Monitor my Elasticsearch," "Monitor my," you know, whatever other solution I may have. So then we build a bunch of products that are built on top of the platform, which provide sort of the usability, so where, it's very easy to get on, send the data, have built-in content, dashboard (indistinct), what have you, so that my day to day work is fast, because I'm not a observability engineer, I'm a software engineer working on something, and I want to use observability, make it easy for me, right? So that's sort of the product aspect of it. But then if you look at organizations that a little bit scale up, just a product is also not good enough. Now we're looking at a observability solution that's deployed in an enterprise, and there are many many products, many many teams, many many users, and then how can one be effective there? And if you look at what's important at that level, it's not the database aspect or the platform aspect, it's about how well can I manage it, do I have visibility into what I am sending, what my bill is, can I control against incorrect usage, do I have permissions to sort of control who can mess with my (indistinct) and so on, and so there's a bunch of layer of what we call enterprise capabilities that are important in an organizational setting. So I think in order to build something that's successful in this space, we have to think at all these three levels, right? And all of these are important, because in the end, it's how much value am I getting out of it, it's not just what's theoretically possible, what's really happening, and all of these are important in that context. >> And I think, Arijit, that's amazing masterclass right there, soundbite right there, and I think it's because the data also is important, if you're going to be busting down data silos, you need to have a horizontally scalable data observability space. You have to have access to the data, so I think the trend will be more integrated, clearly, and more versatile from a platform perspective, it has to be. >> Absolutely, absolutely. >> Well, we're certainly going to bring you back on our conversations when we have our events and/or our groups around digital transformation Under the Hood series that we're going to do, but great voice, great commentary, Arijit, thank you for sharing that knowledge with us, appreciate it. >> My pleasure, thank you very much. >> Okay, I'm John Furrier with theCUBE, here, Leading with Observability content series with Splunk, I'm John Furrier with theCUBE, thanks for watching. (calm music)

Published Date: Feb 22, 2021


Craig Hyde, Splunk | Leading with Observability | January 2021


 

>> Narrator: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation. >> Hello and welcome to this special CUBE Conversation. I'm John Furrier, your host. We're here for a special series, Leading with Observability, and this segment is: End-to-end observability drives great digital experiences. We've got a great guest here, Craig Hyde, senior director of product management for Splunk. Craig, great to see you. Thanks for coming on. >> And thanks for having me. This is great. >> So this series, Leading with Observability, is a super hot topic, obviously, with cloud native. The pandemic, COVID-19, has really shown that the cloud native trend has been a tailwind for people who invested in it, who have been architecting for cloud on premises where data is a key part of that value proposition, and then there's people who haven't been doing it. And out of this trend, the word observability has become a hot segment. And for us insiders in the industry, we know observability is just kind of network management on steroids in the cloud, so it's about data and all this. But at the end of the day, there's value that's enabled from observability. So I want to talk to you about the value that's enabled in the experience of the end user, whether it's in a modern application or a user inside the enterprise. Tell us what you think about this end user perspective. >> Sure, yeah, thanks a lot for that intro. And I would actually argue that observability isn't even just machine data or network data; it's a broader context where you can see everything that's going on inside the application and the digital user experience. From a user experience or digital experience management perspective, I believe the metrics that you pull from such a thing are the most useful and ubiquitous metrics and visibility that you have in all of technology. And when done right, it can tell you the actual end result of all this technology that you're piecing together, the end result of what's getting delivered to the user, both quantitatively and qualitatively. So, my background: I actually started a company in this domain. It was called Rigor, and we focused purely on looking at user experience and digital experience. And the idea was, this was 10 years ago, we were just thinking, look, 10 years from now more and more people are going to do business digitally, they're going to work more digitally, and at the same time we saw the legacy data centers being shut down and things moving to the cloud. So we said, look, the future is in the users, and where it all comes together is on the user's desktop or on their phone, and so we set out to focus specifically on that space. Fast forward 10 years, we're now a part of Splunk and we're really excited to bolt this onto an overall observability strategy. I believe that it's becoming more and more popular; like you said, with the pandemic and COVID-19, it was already on a tear from a digital perspective, the adoption was going through the roof, and people were doing more and more remotely, they were buying more and more online, but the pandemic has just pushed it through the roof. And I mean, wow, the digital business genie's out of the bottle and there's no putting it back now. But there are also other things that are driving the need for this and the importance of it, and part of it comes with the way technology is growing.
It's becoming much more complex in terms of moving parts. Where an app used to be run off three different tiers in a data center, now it could be across hundreds of machines and opaque networks, opaque data centers all over the world, and the only time you often see things, how they come together, is on the user's desktop. And so that's where we really think you got to start from the user experience and work back. And, you know, all the drive in computing is all about making things better, faster and cheaper, but without this context of the user, often the customer and the experience gets left out from reaping the rewards from all these gains. So that's sort of like encapsulates my overall view of the space and why we got into it and why I'm so excited about it. >> Well Craig, I got to ask on a personal level. I mean, you look at what happened with the pandemic, I mean, you're a pioneer, you had a vision. Folks that are on the entrepreneurial side say, hey digital businesses is coming and they get it and it's slowly gets known in the real world, becomes more certain, but with the pandemic, it just happened all of a sudden so fast for everybody because everyone's impacted. Teachers, students, families, work, everyone's at home. So the entire user experience was impacted in the entire world. What was going through your mind when you saw all this happening and you see the winners obviously were people had invested in cloud native and data-driven technologies, what was your take on all this when you saw this coming? >> Well, the overall trend has been going on for decades, right? And so the direction of it isn't that surprising, but the magnitude and the acceleration, there's some stats out there from Forbes where the e-commerce adoption doubled within the first six months of the pandemic. So we're talking, you know, 10, 12 years of things ticking up and then within six months, a doubling of the adoption of e-commerce. And so like anybody else, you first freeze and say, what does this mean? But when people start working remote and people start ordering things from Amazon and all the other websites, it's quick to see like, aha! It no longer matters what chairs somebody is sitting in when they're doing work or that they're close to a store and you have a physical storefront when you're trying to buy something, it's all about that digital experience and it needs to be ubiquitous. So it's been interesting to see the change over the past few months for sure. But again, it doesn't change the trend, it just magnified it and I don't see it going back anytime soon. >> Yeah I mean, digital transformation has always been a buzz word that everyone kind of uses as a way to kind of talk about the big picture. >> Right. >> It's actually transforming and there's also share shifts that happen in every transformation, in any market shift. Obviously that's happening with cloud. Cloud native edge is becoming super important. In all of these, and by the way, in all the applications that sit on that infrastructure which is now infrastructure as code, has a data requirement that observability piece becomes super critical, not just from identifying and resolving, but also for training machine learning and AI, right? So, again, you have this new flywheel observability that's really at the heart of digital transformation. 
What should companies think about when they associate observability to digital transformation as they're sitting around whether they're CXOs or CSOs or solution architects going, okay, how does observability plug into my plans? >> Yeah, absolutely. I mean, my recommendation and the approach that I would take is that you want to start with the end in mind and it's all about how you set your goals when you're setting out in getting into digital transformation. And, you know, the late Steve Jobs, to borrow one of his quotes, he said that you have to start with the customer experience in mind and work backwards to the technology. And so I think that applies when you get into an observability strategy. So without understanding what the actual user experience is, you don't have a good enough yardstick to go out there and start working towards. So availability on a server or CPU time or transaction time in a database, like, those are all great, but without the context of what is the goal you're actually going after, it's kind of useless. So, like I said, it's not uptime, it's not server time, it's not any of that stuff, and it's user experience and these things are different. So they're like visual metrics, right? So what a user sees, because all kinds of things are going on in the background, but if it can see that the person can see and their experience is that they're getting some kind of response from the machine, then that's how you measure where the end point is and what the overall goal is. And so like to keep kind of going on with that, it's like you start with the end in mind, you use that end to set your goals, you use that domain and that visibility to troubleshoot faster. So when the calls start rolling in then they say, hey, I'm stuck at home and I'm on a slow internet connection, I can't get on the app and core IT is taking a phone call, You can quickly look and instrument that user and see exactly what they're seeing. So when you're troubleshooting, you're looking at the data from their perspective and then working backwards to the technology. >> That's super exciting. I want to get your thoughts on that. So just to double down on that because I think this highlights the trend that we were just talking about. But I'll break it down into three areas that I see happening in the marketplace. Number one, availability and performance. That's on everyone's mind. You just hit that, right? Number two, integrations. There's more integrations going on within platforms or tools or systems, whether it's an API over here, people are working together digitally, right? And you're seeing e-commerce. And third is the user patterns and the expectations are changing. So when you unpack those kinds of like trends, there's features of observability underneath each. Could you talk about that because I think that seems to be the common pattern that I'm seeing? Okay, high availability, okay, check. Everyone has to have that. Almost table stakes. But it's hard when you're scaling, right? And then integrations, all kinds of API is being slinged around. You've got microservices, you've got Kubernetes, people are integrating data flows, control planes, whatever, and then finally users. They want new things. New patterns emerge, which is new data. >> Yeah, absolutely. And to just kind of talk about that, it reminds me of like a Maslow's hierarchy of needs of visibility, right? Like, okay, the machine is on, check. Like you said, it's table stakes, make sure it's up and running. That's great. 
Then you want to see the applications that are running on the machine, how they're talking to each other, and the other components that you're making API calls to: are they timing out or are they breaking things? So you get that visibility of, okay, they're on, and here's what's going on on top of those machines or inside of them, in the containers or the virtual machines or whatever segment of computing you're looking at. And then the cherry on top, the highest point, is: how is that stack of technology serving your customer? How's it serving the user and what's the experience? So those are the three levels that we look at when we're thinking of user experience. It's a different way to look at it, but that's the way we see the world, that three-tier, three-layer cake. >> It's interesting. >> And you need all the layers. >> It's super relevant. And again, it's better together, but you can mix and match and have product in there. So I want to get into the Splunk solution. You guys have the digital experience monitoring solution. Can you explain what that is, how it fits into all this, and what's in it for the customers, what's the benefit? >> Right, sure. So with digital experience monitoring and the platform that we have, we're giving people the ability to do what I was talking about: it enables you to take a look at what the user's experience is, pull metrics, and then correlate them from the user all the way through the technical journey to the back end, through the different tiers of the application and so on. That technology is called real user monitoring, where we instrument the users. And then we also layer in synthetic monitoring, which is the sort of robot users that are always on, for when you're in lower-level environments and you want to see what the experience is going to look like when you push out new software, or, when nobody's on the application, did something break? So we couple those two together and then we feed that into our overall observability platform that's fed with machine data, so we have all the metrics from all the components that you're looking at in that single pane of glass. And the idea is that we're bringing you not just the metrics and the events from logs and all the happenings, but we're also trying to help tease out some of these problems for you. So many problems that happen in technology have happened before, and we've got a catalog in our optimization platform of 300-plus things that go wrong when webpages or web applications or API calls start acting funky. And so we can provide, based on the intelligence that's built into the platform, basically runbooks for people to fix things faster, and build those playbooks into the release process so you don't break the applications to begin with, and you can set flags so people understand what performance is before it's delivered to the customer, and if there are problems, let's fix them before we break the experience and lose the trust of the user. So again, it's the metrics from the stats that are coming across the wire all the way to the users, it's the events from the logs that are coming in so you can see context, and then it's that user experience, the trace-level data, where you can double-click into each of the tiers and say, what's going on in here? What's going on in the browser? What's going on in the application? What's going on in the backend? And so you can pull all that together in a single pane of glass and find problems faster, fix them faster, and prevent users from having problems to begin with. And to do this properly, you really need it all under one roof, and so that's why we're so excited to bring this all together.
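As a loose illustration of the "robot user" idea behind synthetic monitoring (not Splunk's product, and far simpler than a real browser-based check, which would also capture visual and user-centric metrics), the sketch below probes a URL on a schedule and records availability and response time. The URL and threshold are placeholders.

```python
# Toy synthetic check: probe an endpoint and record latency and status.
# Placeholder URL and threshold; a real synthetic monitor would drive a
# browser and capture visual/user-centric metrics as well.
import time
import requests

URL = "https://example.com/health"     # placeholder endpoint
SLOW_THRESHOLD_SECONDS = 2.0

def run_check(url: str) -> dict:
    started = time.monotonic()
    try:
        response = requests.get(url, timeout=10)
        elapsed = time.monotonic() - started
        return {
            "url": url,
            "ok": response.status_code < 400 and elapsed < SLOW_THRESHOLD_SECONDS,
            "status": response.status_code,
            "seconds": round(elapsed, 3),
        }
    except requests.RequestException as exc:
        return {"url": url, "ok": False, "error": str(exc)}

if __name__ == "__main__":
    print(run_check(URL))  # run this from a scheduler (cron, CI, etc.)
```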
>> Yeah, I've been sitting on theCUBE for 10 years now; we're in our 11th year doing theCUBE. With digital you can measure everything, so why not? There should be no debate if it's done properly. So that brings up this whole concept that you guys are talking about: full fidelity. Can you take a minute to explain what that is? What does full fidelity mean? >> Sure. Full fidelity really comes down a lot to these traces. When we talk about metrics, logs, and traces, it's all about getting all the activity that goes on in an application and looking at it. So when you or I interact with our company's app online and there are problems, the person who's going to fix the problem can actually see specifically me. They can look at my experience and what it looked like in my browser, all the services that I was interacting with, what was going on in the application, what code was being called, what services were being called, and look at specifically me as opposed to an aggregate of all the domains put together. And it really is important from a troubleshooting standpoint. It's really important for understanding the actuals, because without full fidelity and capturing all of the data, you're taking guesses, and it eliminates a lot of the guesswork. And so that's something that's special about our platform, that ability to have full fidelity. >> When does a client, a customer, not have full fidelity? I might think I have it, someone sold me a product. What's the tell-tale sign that I don't have full fidelity? >> Oh yeah, well, with observability there are a lot of tricks in the game. You see a lot of summary data that looks like, hey, this is that one call, but usually it's knitted together from a bunch of different calls, because this stuff takes up a lot of storage and there are a lot of problems with scale. So you might see something that looks like it's this call, but it's actually, in general, when a call like this happens, this is what it looks like. And so you've got to ask: is this the exact call? It makes a big difference from a troubleshooting perspective, and it's really hard to implement, and that's something that Splunk's very good at, right? It's data at scale. It's the 800-pound gorilla in collecting and slicing apart machine data. You have to have something of that scale in order to ingest all this information. It's a hard problem for sure. >> Yeah, totally. And I appreciate that. While I've got you here, you're an expert, I've got to ask you about OpenTelemetry. We've heard that term kicked around. What does that mean? Is it an open source thing, is it an open framework? What is OpenTelemetry and what does it mean for your customers or the marketplace? >> Yeah, I think of OpenTelemetry as finally creating a standard for how we're collecting data from applications across AP- In the past, it's been onesie-twosie, here and there, each company coming up with it themselves, and there were never any standards for how to look at transactions across data, across applications, and across tiers.
And so OpenTelemetry is the attempt, and it's a consortium, so there are many people involved in pushing this together. Think of it like the W3C, which creates the standards for how websites operate; without it, the web wouldn't be what it is today. Now OpenTelemetry is coming along and doing that same thing from an observability standpoint, so you're not totally locked into one vendor and the way that they do it, held hostage to only that view of visibility. We're trying to set the standards to lower the barrier of entry to application performance monitoring, network performance monitoring, and just getting that telemetry where there are standards across the board. And so it's an open source project. We commit to it, and it's a really important project for observability in general. >> So does that speak to the idea that the more data you have, the fewer blind spots you might have? Is that the same concept? Is that some of the thinking behind this? >> It enables you to get more data faster. If there are no standards and no rules of the road, and everybody can get on the road and decide whether they want to drive in the left lane or the right lane today, it makes getting places a lot harder. The same is true with OpenTelemetry: without standards for the naming conventions, where you instrument, how you instrument, it becomes very hard to put things in a single pane of glass, because they just look different everywhere. And so that's the idea behind it. >> Well, Craig, great to have you on. You're super smart on this, and Leading with Observability is a hot topic. It's super cool and relevant right now with digital transformation, as companies are looking to rearchitect and change how they're going to flip the script on software development, modern applications, modern infrastructure, edge. All of this is top of mind in everyone's plans. And we certainly want to have you back in some of the conversations we have around this on our editorial side as well, when we have these clubhouses we're going to start doing a lot of. We definitely want to bring you in. I'll give you a final word here. Tell us what you're most excited about. Put in the commercial for Splunk. Why Splunk? Why are you guys excited? Take a minute to get the plug in. >> It's so easy. Splunk has the base to make this possible. Splunk is, like I said, the 800-pound gorilla in machine data and taking in data at scale. And when you start going off into the observability abyss, the really, really hard part about it is having the scale to not only go broad in the levels of technology that you can collect from, but also go deep. And that depth, when we talked about that full fidelity, is really important when you get down to brass tacks and you start implementing changes and troubleshooting things and turning the data that you have into action, understanding what you can do with it. And Splunk is fully committed to going not only broad, to get everything under one roof, but also deep, so that you can make all of the information you collect actionable and useful. And it's something that I haven't seen anybody even attempt, and I'm really excited to be a part of building towards that vision. >> Well, I've been covering Splunk for, man, many, many years. 10 years plus, I think, since it's been founded, and really the growth and the vision and the mission still is the same.
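Picking up the naming-conventions point from the OpenTelemetry discussion above, here is a small sketch of what standardized instrumentation can look like with the OpenTelemetry Python API. The attribute names follow the project's HTTP semantic conventions (the exact names have evolved across versions of the conventions), while the handler and the do_checkout stub are invented so the example stands on its own.

```python
from opentelemetry import trace

tracer = trace.get_tracer("shop.frontend")  # illustrative instrumentation scope

def do_checkout(request) -> int:
    """Hypothetical business logic stub so the sketch runs as-is."""
    return 200

def handle_checkout(request) -> int:
    # Using shared attribute names is the "rules of the road" idea: any
    # backend that understands the conventions can correlate and display
    # these spans the same way, regardless of vendor.
    with tracer.start_as_current_span("POST /checkout") as span:
        span.set_attribute("http.method", "POST")
        span.set_attribute("http.route", "/checkout")
        status = do_checkout(request)
        span.set_attribute("http.status_code", status)
        return status

print(handle_checkout(request={}))  # spans are no-ops until an SDK/exporter is configured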
Leveraging data, making use of it, unlocking the power of data as it evolves and as there's more of it. And it gets more complicated when data is involved in the user experience end to end, from cybersecurity to user flows and new expectations. So congratulations. Great product. Thanks for coming on and sharing. >> Thanks again for having us. >> Okay, this is John Furrier in theCUBE. Leading with Observability is the theme of this series, and this topic was end-to-end observability to enable great digital experiences. Thanks for watching. (lighthearted music)

Published Date : Feb 22 2021


Patrick Lin, Splunk | Leading with Observability | January 2021


 

(upbeat music) >> Announcer: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world. This is a CUBE conversation. >> Welcome to this CUBE conversation here in Palo Alto, California. I'm John Furrier, host of theCUBE, with a special content series called Leading with Observability. This topic is keeping watch over microservices and containers, with a great guest, Patrick Lin, VP of Product Management for the observability product at Splunk. Patrick, great to see you. Thanks for coming on remotely. We're still in the pandemic, but thanks for coming on. >> Yeah, John, great to see you as well. Thanks for having me. >> So, leading with observability is a big theme of our content series. Managing the end-to-end user experience is a great topic around how data can be used for user experience. But now, underneath that layer, you have this whole craziness of the rise of the container generation, where containers are actually going mainstream. Gartner forecasts that anywhere from 30 to 40 percent of enterprises still haven't adopted them at full scale, and you've got to keep watch over these. So what is the topic of keeping watch over microservices and containers about? Because, yeah, we know they're being deployed. Is it just watching them for watching's sake, or is there a specific reason? What's the theme here? Why this topic? >> Yeah, well, I think containers are part of the entire stack of technology that's being deployed in order to develop and ship software more quickly. And the fundamental reasons for that haven't changed, but they've been greatly accelerated by the impact of the pandemic. For the past few years we've been talking about how software's eating the world, how it's become more and more important that companies go through the transformation to be more digital. And I think now that is so patently obvious to everybody, when the only way of accessing your customer, and for the customer to access your services, is through a digital medium. The ability for your IT and DevOps teams to deliver against those requirements, to deliver that flawless customer experience, to keep pace with the digital transformation and the cloud initiatives: all of that is coming as one big wave. And so we see a lot of organizations migrating workloads to the cloud, refactoring applications, building new applications natively. And when they do that, oftentimes the infrastructure of choice is containers, because it's the thing that keeps up with the pace of development, and it's a much more efficient use of the underlying resources. So it's all part of the overall movement that we see. >> What is the main driver for this use case, microservices, and where's the progress bar, in your mind, on the adoption and deployment of microservices? And what are the critical things you guys are looking at that are important to monitor, observe and keep track of? Is it the status of the microservices? Is it the fact that they're being turned on and off, the state, non-state? Take us through some of the main drivers for why you guys are keeping an eye on the microservices component. >> Sure, well, I think if we take a step back, the reason that people have moved towards microservices and containers fundamentally has to do with the desire to be able to, number one, develop and ship more quickly.
And so if you can parallelize the development, with APIs as the interface between these services rather than one monolithic code base, you can evolve more quickly. On top of that, the goal is to be able to deliver software that can scale as needed, so that's part of the equation as well. So when you look at this, the desire to iterate on your software and services more quickly, to scale infinitely, to stay up, and so on, that's all a great reason to do it. But what comes with it is a few additional layers of complexity, because now, rather than having, let's say, an n-tier app that you're watching over on some hosts that you could reboot when there's a problem, you have tens, maybe hundreds, of services running on top of maybe tens of thousands, maybe hundreds of thousands, of containers. And so the complexity of that environment has grown quite quickly, and the fact that those containers may go away as you scale the service up and down to meet demand also adds to that complexity. And so, from an observability perspective, what you need to be able to do is a few things. One is you need to actually be tracking this in enough detail, and at a high enough resolution, in realtime, so that you know when things are coming in and out. And that's been one of the more critical things we've built towards at Splunk, that ability to watch over it in realtime. But just as important is understanding the dependencies and the relationships between these different services. And so that's one of the main things we've worked on here, making sure that you can understand the dependencies so that when there's an issue, you have a shot at actually figuring out where the problem is coming from, because there are so many different services and so many things that could be affecting the overall user experience when something goes wrong. >> I think that's one of the most exciting areas right now in observability, this whole microservices and container equation, because there's a lot of action being done there and a lot of complexity, but the upside, if you do it right, is significant. I think people generally are bought into that concept, Patrick, but I want to get your thoughts. I get this question a lot from executives and leaders, whether it's a cloud architect or a CXO. And the question is, what should I consider? What do I need to consider when deploying an observability solution? >> Yeah, that's a great question, 'cause there are obviously a lot of considerations here. I think one of the main ones, and this is a pattern that we are pretty familiar with in this sort of monitoring and management tool world, is that over time most enterprises have gotten themselves a very large number of tools, one for each part of their infrastructure or their application stack and so on. And so what you end up with is sprawl in the monitoring toolset that you have, which creates not just a certain amount of overhead in terms of cost, but also complexity that gets in the way of actually figuring out where the problem is. I've been looking at some of the toolsets that some of our customers have pulled together, and they have the ability to get information about everything, but it's not woven together in a useful way.
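As a toy illustration of the "understanding the dependencies and relationships between services" point above, the sketch below derives a who-calls-whom map from parent/child relationships in trace spans. The span records, field names and service names are invented; a real service map would be built from ingested distributed-trace data rather than a hard-coded list.

```python
from collections import defaultdict

# Toy span records; in practice these would come from distributed traces.
spans = [
    {"span_id": "a1", "parent_id": None, "service": "frontend"},
    {"span_id": "b2", "parent_id": "a1", "service": "checkout"},
    {"span_id": "c3", "parent_id": "b2", "service": "payments"},
    {"span_id": "d4", "parent_id": "b2", "service": "inventory"},
]

def build_dependency_map(spans: list) -> dict:
    """Derive caller -> callees edges from parent/child span relationships."""
    by_id = {s["span_id"]: s for s in spans}
    deps = defaultdict(set)
    for span in spans:
        parent = by_id.get(span["parent_id"])
        if parent and parent["service"] != span["service"]:
            deps[parent["service"]].add(span["service"])
    return deps

for caller, callees in build_dependency_map(spans).items():
    print(f"{caller} -> {sorted(callees)}")
# frontend -> ['checkout']
# checkout -> ['inventory', 'payments']
```

A map like this is what lets you ask, when the user-facing service degrades, which downstream dependency is the likely culprit.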
And it sort of gets in the way, actually, having so many tools when you're in the heat of the moment trying to figure something out. It hearkens back to the time when you have an outage and you have a conference call with a cast of thousands on it trying to figure out what's going on, and each person comes to that with their own tool, with their own view, without anything that ties it to what the others are seeing. And so there's that need to provide an integrated toolset, with a consistent interface across the infrastructure, across the application, across the user experience, and across the different data types, the metrics, the traces, the logs. Fundamentally, that ability to easily correlate the data across it all and get to the right insight, we think that's a super important thing. >> Yeah, and I think what that points out, I mean, I always say, don't be a fool with a tool. And if you have too many tools, you have a tool shed, and there are too many tools everywhere. That's kind of a trend, and tools are great when you need tools to do things. But when you have too many, when you have a data model, essentially what you're saying is that a platform is the trend, because to weave stuff together you need a data control plane, you need data visualization, you need these things to understand whether you're succeeding. So really it's a platform, but platforms have tools as well. So tools are features of a platform, if I get what you're saying, right? Is that correct? >> Yeah, so I think there's one part of this, which is, if I start from the user point of view, what you want is a consistent and coherent set of workflows for the people who are trying to actually do the work. You don't want them to have to deal with the impedance mismatches across different tools that exist based on, whatever, even the language that they use, or how they bring the data in and how it's being processed. You go down one layer from that, and you want to make sure that what they're working with is actually consistent as well. And that's the sort of capability you're looking at, whether you're trying to chart something to look at the details, or go from a view of logs to the related traces. You want to make sure that the information being served up there is consistent. And that, in turn, relies on data coming in in a way that is processed to be correlated well, so that if you say, hey, I'm looking at a particular service and I want to understand what infrastructure it's sitting on, or I'm looking at a log and I see that it relates to a particular service and I want to look at traces for that service, those things are related from the data on in, and that's exposed to the user so that they can navigate it properly and make use of it, whether that's during wartime, in an incident, or in peacetime. >> Yeah, I love that, wartime consigliere versus peacetime. I saw a blog post from a VC that said, don't be a Tom Hagen, which is the guy in The Godfather in the famous line, "you're not a wartime consigliere." Which means things are uncertain in these times and you've got to make them certain. This is a mindset; this is part of the pandemic we're living in. Great point, I love that. Maybe we can follow up on that at the end, but I want to get to some of these topics I want your reactions to.
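To ground the "go from a view of logs to the related traces" idea, here is a minimal sketch that stamps the active OpenTelemetry trace id onto a Python log line so the two can be joined later. The logger name, log format and helper are illustrative assumptions; in practice a logging instrumentation package or an agent would usually inject these fields automatically.

```python
import logging

from opentelemetry import trace

# The custom trace_id field must be supplied via `extra` on every log call
# in this simple setup; real instrumentations handle that for you.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s trace_id=%(trace_id)s %(message)s",
)
log = logging.getLogger("checkout")  # illustrative logger name
tracer = trace.get_tracer(__name__)

def current_trace_id() -> str:
    """Hex trace id of the active span, or '-' when there is no recorded trace."""
    ctx = trace.get_current_span().get_span_context()
    return format(ctx.trace_id, "032x") if ctx.trace_id else "-"

with tracer.start_as_current_span("charge-card"):
    # Stamping the trace id onto the log line is what lets a person jump
    # from a log view straight to the related trace, and back again.
    log.info("payment declined", extra={"trace_id": current_trace_id()})
```

With a configured SDK and exporter (as in the earlier sketch), the printed id matches the exported trace; with only the API installed, the span is non-recording and the helper falls back to "-".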
So, I want you to react to the following, Patrick. It's an issue and a topic, and here it is: missing data results in limited analytics and misguided troubleshooting. What's your reaction to that? What's your take on that? What's Splunk's take on that? >> Yeah, I think Splunk has been a proponent of that view for a very long time, whether that's for the log data, or, let's say, the metric data that we capture at high resolution, or for tracing. The goal here is to have the data that you need in order to properly diagnose what's going on. And I think that older approaches, especially on the application side, tend to sample data right at the source and provide hopefully useful samples of it for when you have that problem. That doesn't work very well in the microservices world, because you need to be able to see the entirety of a transaction, a full trace across many services, before you could possibly make a decision as to what's useful to keep. And so the approach that we believe is the right one is to capture all of those bits of information at full fidelity, partly because of what I just said, you want to be able to find the right sample, but also because it's important to be able to tie it to something that may be being pulled in by a different system. An example of that might be a case where you're trying to do real user monitoring alongside APM, and you want to see the end-to-end trace from what the user sees all the way through to all the backend services. What's typical in this world today is that that information is captured by two different systems making independent sampling decisions, and therefore the ability to draw a straight line from what the end user sees all the way to what is affecting it on the backend is pretty hard, or it gets really expensive. And I think the approach we've taken is to make that easy and cost-effective. And it's tremendously helpful then to tie it back to what we were talking about at the outset, where you're trying to provide services that make sense and are easy to access for your end user; to be able to have that end-to-end view, because you're not missing data, is tremendously valuable. >> You know what I love about Splunk, 'cause I'm a data geek going back to when it wasn't fashionable, back in the 80's, is that Splunk has always been about ingesting all the data. Bring all the data, we'll take it all. Now, at the beginning it was pretty straightforward, complex, but it still had great utility. But even now, today, it's the same thing you just mentioned: ingest all the data, because there are new benefits. And I want to ask you a quick question on this distributed computing trend, because everyone in computer science, in the industry and in technology pretty much agrees that cloud, with the edge, is essentially distributed computing in a new way, a new architecture with great new benefits, new things, but there's still science to apply there. You mentioned distributed tracing; at the end of the day that's also a major new thing that you guys are focused on, and it's not just "get me all the data": distributed tracing is a lot harder to understand because of the environment, and it's changing so fast. What's your take on it?
>> Yeah, well, fundamentally I think this goes back to, ironically, one of the principles of observability, which is that oftentimes you need participation from the developers in making sure that you have the right visibility. And it has to do with the fact that there are many services being strung together, as it were, to deliver on some end-user transaction or some experience. The fact that you have many services that are part of this means you need to make sure that each of those components is actually providing some view into what it's doing. And distributed tracing is about taking that and weaving it together so that you get that coherent view of the business workflow within the overall web of services that make up your application. >> So, the next topic I want to get into, we've got limited time, but I'm going to squeeze it in, and I'm going to read it to you real quick: slow alerts and insights are difficult to scale, and if they're difficult to scale, it holds back the mean time to resolution. And so it's difficult to detect in the cloud. It was maybe easier on premises, but with cloud this is another complexity. How are you seeing the inability to scale quickly across these environments when it comes to managing the performance issues and delays that come from slow insights? What's your reaction to that? >> Yeah, well, I think there are a lot of tools out there that will take in events or issues from cloud environments, but they're not designed from the very beginning to handle the sort of scale that you're looking at. As I mentioned, it's not uncommon for a company to have tens or maybe even hundreds of services and thousands of containers or hosts. So there's the sheer amount of data you have to be looking at on an ongoing basis, and the fact that things can change very quickly: containers can pop in and go away within seconds. The ability to track that in realtime implies that you need an architectural approach that is built for that from the very beginning. It's hard to retrofit a system to handle orders of magnitude more complexity and a faster pace of change; you need to start from the very beginning. And the belief we have is that you need some form of realtime streaming architecture, something capable of providing that realtime detection and alerting across a very wide range of things, in order to handle the scale and the ephemeral nature of cloud environments. >> Let me ask a question then, because I've heard some people say, well, it doesn't matter, 10, 15 minutes to log an event is good enough. How would you react to that? (chuckles) What's a great example of where it's not good enough? I mean, is it minutes, is it seconds, what are we talking about here? What's the good-enough bar right now? >> Yeah, I think anybody who has tried to deliver an experience digitally to an end user, if you think you can wait minutes to solve a problem, you clearly haven't been paying enough attention. It almost goes without saying that the faster you know you have a problem, the better off you are. And so, when you think about the objectives you have for your service levels, your performance or availability,
I think you run out of minutes pretty quickly if you get to anything like, say, three nines. So waiting 15 minutes maybe would have been acceptable before people were really trying to use your service at scale, but definitely not anymore. >> And the latest apps require it. It's super important. I brought that up tongue in cheek to kind of tee that up for you, because these streaming analytics, streaming engines, are super valuable, and knowing when to use realtime and when not to also matters. This is where the platforms come in. >> Yes, absolutely. The platform is the thing that enables that. And I think you have to build it from the very beginning with that streaming approach, with the ability to do analytics against the streams coming in, in order to deliver on this sort of promise of alerts and insights at scale and in realtime. >> All right, final point. I'll give you the last word here. Give a plug for the Splunk observability suite. What is it? Why is it important? Why should people buy it? Why should they adopt it? Why should they upgrade to it? Give the perspective, give the plug. >> Yeah, sure. I appreciate the opportunity. As we've been out there speaking to customers over the last year, as part of Splunk and before that, they've spoken to us a lot about the need for better visibility into their environments, which are increasingly complex and where they're trying to deliver the best possible user experience. And to add to that, they're trying to consolidate their tools; we spoke about the sprawl at the beginning. And so, with what we're putting together here with the Splunk observability suite, I'd say we have the industry's most comprehensive and powerful combination of solutions to help both IT and DevOps teams tackle these new challenges for monitoring and observability that other tools simply can't address. You're able to eliminate management complexity by having a single, consistent user experience across the metrics and logs and traces, so that you can have seamless monitoring, troubleshooting and investigation. You can create better user experiences by having that true end-to-end visibility, all the way from the front end to the backend services, so that you can actually see what kind of impact you're having on users and figure things out within seconds. I think we're also able to help increase developer productivity, as these high-performance tools help the DevOps teams get to better-quality code faster, because they can get immediate feedback on how their code changes are doing with each release, and they're able to operate more efficiently. So I think there are a very large number of benefits to this approach of providing a single unified toolset that relies on a source of data that's consistent across it, but then has the particular tools that different users need for what they care about: whether you're the front-end developer needing to understand the user experience, the backend service owner wanting to see how your service relates to others, or the owner of the infrastructure needing to see whether it's actually providing what the services running on it need. >> Well, Patrick, great to see you. And I just want to say congratulations. I've been following your work, going back in the industry, specifically with SignalFx; you guys were really early in seeing the value of observability before it was a category.
And it's all the more relevant now, as you guys saw it coming. So congratulations, and keep up the great work. We'll keep the conversation open. Thanks for coming on. >> Great, thanks so much, John. Great talking to you. >> All right, this is theCUBE, Leading with Observability, it's a series, check it out. We have multiple talk tracks. Check out the Splunk series, Leading with Observability. I'm John Furrier with theCUBE. Thanks for watching. (upbeat music)
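As a back-of-the-envelope check on the "three nines" point made in the interview above, the short Python snippet below converts availability targets into a monthly downtime budget, assuming a 30-day month; the figures are plain arithmetic for illustration, not a Splunk-specific SLO recommendation.

```python
# Downtime budget per month for a few availability targets (30-day month).
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes

for label, target in [("two nines", 0.99), ("three nines", 0.999), ("four nines", 0.9999)]:
    budget_minutes = MINUTES_PER_MONTH * (1 - target)
    print(f"{label} ({target:.2%}): ~{budget_minutes:.1f} minutes of downtime allowed per month")

# Three nines works out to about 43.2 minutes per month, so a couple of
# 15-minute detection delays would consume most of the budget -- hence
# "you run out of minutes pretty quickly."
```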

Published Date : Feb 22 2021
