
Arijit Mukherji, Splunk | Leading with Observability


 

>> Announcer: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

>> John Furrier: Hello and welcome to this special CUBE Conversation here in the Palo Alto studios. I'm John Furrier, host of theCUBE, for the Leading with Observability series and this segment, Under the Hood with Splunk Observability. I'm joined by Arijit Mukherji, a distinguished engineer at Splunk. Great to have you on. These are my favorite talks, because Under the Hood means we're going to get all the details about what's powering observability. Thanks for coming on.

>> Arijit Mukherji: It's my pleasure, John, it's always nice to talk to you.

>> John: Leading with Observability is the series; we want to take a deep-dive look across the spectrum of the product and the problems it's solving. But Under the Hood is a challenge, because people are really looking at coming out of COVID with a growth strategy, looking at cloud-native and Kubernetes, and you're starting to see microservices become a big part of that, in real deployments, at real scale. This has been a growing theme that we've been covering. But now architectural decisions start to emerge. Could you share your thoughts on this? Because this becomes a big conversation: do you buy a tool here, how do you think it through, what's the approach?

>> Arijit: Exactly, John. These are very exciting times for observability. As you mentioned, there are a bunch of trends happening in the industry that are causing a renewed interest in observability, and an appreciation of its importance. Observability is now a huge umbrella topic: it covers many different things like APM, infrastructure monitoring, logging, real user monitoring, digital experience management, and so on.
So it's quite a set of things that all fall under observability, and the challenge right now, as you mentioned, is how do we look at this holistically? There are so many different parts to this edifice, to this building, that a non-integrated strategy, where you just buy or build individual pieces, isn't going to get you very far, given the complexity of what we're dealing with. And frankly, that's one of the big challenges that we as architects within Splunk are scratching our heads over: how do we build all of this in a more coherent fashion?

>> John: You know, Arijit, one of the things I want to get your thoughts on is a trend we've been talking about on theCUBE a lot around systems thinking. If you look at the distributed computing wave, go back 20 years and look at the history of how we got here, a lot of those same concepts are happening again with the cloud, but not as simply. You're seeing a lot more network, I won't say network management, but observability is essentially instrumentation of the traffic, looking at all the data to catch things like breaches and cybersecurity issues, and also to make systems run effectively. It's distributed computing at the end of the day, so there's a lot of science that's been there, and new science emerging around how you do all of this. What are your thoughts? Because this becomes a key part of the architectural choices some companies have to make if they want to be in position to take advantage of cloud-native growth, which has multifold benefits; your product people talk about faster time to market and all that good stuff. But these technical decisions matter, can you explain?

>> Arijit: Yes, it absolutely does. The main thing I would recommend everybody do is understand why observability, what do you want to get out of it?
So it is not just a set of parts, as I mentioned earlier, but it brings direct product benefits: faster mean time to resolution, understanding what's going on in your environment, having fewer outages, understanding root causes, so many different benefits. The point is not that one has the ability to do maybe (indistinct) or infrastructure (indistinct); the main question is, aspirationally, what are my goals, aligned to what my business wants? What do I want to achieve? Do I want to innovate faster? In that case, how is observability going to help me? That's how you need to define your strategy, in terms of what kind of tools you get and how they work together. If you look at what we're doing at Splunk, it's extremely exciting right now, there are a lot of acquisitions happening, a lot of products we're building, and the question we're asking as architects is, what do we want to use that will help us achieve all of this and at the same time be somewhat future-proofed? Any organization that's investing in this, building it, or buying it would probably want to think along those lines: what are my foundational principles, what are the basic qualities I want out of this system? Because technologies and infrastructures will keep on changing, that's the rule of nature right now. The question is how to best address that with a more future-proofed system. At Splunk, we have come up with a few guiding principles, and I'm sure others have done the same.

>> John: You know, one of the dynamics I want to get your reaction to involves two perspectives. One is the growth in the number of teams involved in the work, so whether it's cyber or monitoring, there are more teams with tools out there working on the network.
And then you have the impact of the diversity of use cases, not so much data volume, because that's been talked about; we're having a tsunami of data, that's clear. But there are different kinds of dynamics, whether it's real-time or bursting, and in this kind of environment you can have gaps. And if SolarWinds has taught us anything, it's that you have to identify problems and resolve them. This comes up a lot in observability conversations: MTTI, mean time to identify, and then mean time to resolve. These are key concepts. If you don't see the data, you can't understand what's going on, and you can't fix what you can't measure. This is huge.

>> Arijit: Yes, absolutely right. So what we really need now, as you mentioned, is an integrated tool set. What we mean by that is the tools must be able to work together, and the data must be able to be used across the board. It should not be siloed or fragmented by use case; it should work as one system that users are able to learn, and then use effectively without context switching. Another concept that's quite important is flexibility: are you digging yourself into a fixed solution, or are you depending on open standards that will let you swap out implementations, or vendors, or what have you, down the line, relatively easily? So understanding how you're collecting the data, and how well you're using open standards and open source, is important. But to your point about missing data and gaps, I think full fidelity, understanding every single transaction if you can pull it off, is a fascinating superpower, because that's where you don't get the gaps, and if you are able to go back and track any bad transaction, any time, that is hugely liberating.
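The gap risk Arijit describes is easy to quantify. As a quick illustration (the 1% sample rate and failure count below are invented, not Splunk's figures): under uniform sampling, even a sizable burst of failing transactions can leave no trace at all in the stored data.

```python
# Illustrative only: probability that uniform head-based sampling
# captured none of a batch of bad transactions.
def p_all_missed(sample_rate: float, bad_transactions: int) -> float:
    """Chance that every one of `bad_transactions` went unsampled."""
    return (1.0 - sample_rate) ** bad_transactions

# With a 1% sample and 50 failing requests, odds are better than even
# that the trace store contains *none* of them:
print(round(p_all_missed(0.01, 50), 3))   # 0.605
```

Which is the scenario full-fidelity ingestion removes: at a 100% retention rate the miss probability is zero by construction.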
Because without that, if you're doing a lot of sampling, you're going to miss a huge percentage of the user interactions, and that's probably a recipe for some kind of trouble down the line, as you mentioned. These are some of the principles we used to build the Splunk Observability Suite: no-sample, full-fidelity ingestion is a core foundational principle. And for us it's not just isolated to, let's say, application performance management, where a user hits your API and you're able to track what happened. We're actually taking this upstream, up to the user: the user is taking actions in the browser, so how do we capture and correlate what's happening in the browser? Because, as you know, there's a huge move toward single-page applications, where half of the logic my users are exercising is actually running in the browser. Understanding the whole thing end to end, without any gaps, without any sampling, is extremely powerful. So those are some of the things we're investing in, and things one should keep in mind when considering observability.

>> John: You know, we were talking the other day and having a debate around technical debt and how it applies to observability. One of the things you brought up earlier was tools, and tool sprawl, which causes problems: you get operational friction, and we've heard people say, "Yeah, I've got too many tools, and it's just too much to replatform or refactor, it's too much pain for me to do that." So at some point things break, and they've taken on too much technical debt. When is the point of no return, where someone feels the pain of tool sprawl? What are some of the signals that say, "You better move now, (indistinct) too late"? Because the integrated platform seems to be the way people go, as you mentioned. But this tool sprawl is a big problem.
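A brief side note on the end-to-end, browser-to-backend correlation Arijit described above: a common mechanism for it, assumed here since the interview doesn't name one, is the W3C Trace Context `traceparent` header, where the browser mints a trace ID that every downstream service reuses. A stdlib-only sketch of the header's shape:

```python
# Sketch of W3C Trace Context propagation (the `traceparent` header).
# Real deployments use an instrumentation library; this shows the shape.
import secrets

def make_traceparent() -> str:
    """version-traceid-spanid-flags, as the browser would mint it."""
    trace_id = secrets.token_hex(16)  # 32 hex chars, shared end to end
    span_id = secrets.token_hex(8)    # 16 hex chars, this hop only
    return f"00-{trace_id}-{span_id}-01"

def parse_traceparent(header: str) -> dict:
    """What a backend service extracts before starting its own span."""
    version, trace_id, span_id, flags = header.split("-")
    return {"version": version, "trace_id": trace_id,
            "span_id": span_id, "sampled": flags == "01"}

# The backend reuses trace_id for its own spans, so browser clicks and
# server work stitch together into one end-to-end trace.
ctx = parse_traceparent(make_traceparent())
print(len(ctx["trace_id"]), ctx["sampled"])   # 32 True
```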
>> Arijit: It is, and in my opinion it starts hitting you relatively early these days. If you find yourself using three or four different tools that are all part of one critical workflow together, that's a sign that something could be optimized. For example, say I'm observing whether my website works fine. If my alerting tool is different from my data-gathering or infrastructure-monitoring metrics tool, which is different from my incident-management tool, which is different from my logs tool, then put on the hat of a poor engineer dealing with a crisis: the number of times they have to context switch, and the amount of friction and delay that adds to the process, is very, very painful. So my thinking is that at some point, especially if you find core critical workflows being fragmented and adding a bunch of friction, it's not good to let that keep going, and it's time to address the problem. And frankly, integrating these tools brings a benefit that's far bigger than the sum of the parts. Think about it: if I'm looking at, say, an incident, and I can get cross-tool data all presented in one screen, one UI, that is hugely powerful, because it gives me all the information I need without having to dig into five different tools, and it allows me to make quicker, faster decisions. So I think this is an almost inevitable wave that everybody must and will adopt, and it's important to get on the right program early, because once an organization builds up a lot of practices, they become very hard to change later; it's just going to be more costly down the line.
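The "one screen" payoff Arijit describes comes from the different signals sharing join keys. A toy sketch (the field names and data are invented for illustration) of assembling a single incident view by joining metrics, traces, and logs on a common trace ID:

```python
# Toy cross-signal join: one incident view from three siloed data sets,
# keyed on a shared trace ID. Field names are illustrative only.
metrics = [{"trace_id": "t1", "cpu": 0.97}, {"trace_id": "t2", "cpu": 0.41}]
traces  = [{"trace_id": "t1", "span": "checkout", "ms": 5200}]
logs    = [{"trace_id": "t1", "msg": "timeout talking to payments"}]

def incident_view(trace_id: str) -> dict:
    """Gather every signal tied to one transaction into a single record."""
    return {
        "trace_id": trace_id,
        "metrics": [m for m in metrics if m["trace_id"] == trace_id],
        "traces":  [t for t in traces if t["trace_id"] == trace_id],
        "logs":    [l for l in logs if l["trace_id"] == trace_id],
    }

view = incident_view("t1")
print(view["logs"][0]["msg"])   # timeout talking to payments
```

The engineer sees the slow span, the CPU spike, and the log line together, instead of re-running the same lookup in three tools.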
>> John: So from an (indistinct) standpoint, under the hood: an integrated platform takes the tool sprawl problem away, that helps there. You have open source technology, so there's no lock-in. You mentioned full fidelity, not just sampling, and full end-to-end tracing, which is critical if you want to avoid those gaps. And then the other area I want to get your thoughts on, which you haven't brought up yet but people are talking about, is real-time streaming analytics. What role does that play? Is that part of the architecture, and what function does it serve?

>> Arijit: Right. To me, it's a question of how quickly I find a problem. We are moving to more and more software services, so everybody's a software service now, and we all talk to each other through different services. Any time you take a dependency, you want to know how available it is, what my SLAs and SLOs are, and so on. Three nines is almost a given; you must provide three nines or better, ideally four nines of availability, because your overall system's availability is going to be less than that of any single part. And if you look at four nines, you have about four or five minutes of total downtime in one whole month. That's a hard thing to control. If your alerting takes on the order of five or 10 minutes, there's no chance you'll be able to promise the kind of high availability you need. So the fundamental question is, you need to detect problems fast, within seconds, ideally. Now, streaming is one way to do it, but that really is the problem definition: how do I find problems early enough to give my automation or my engineers time to figure out what happened and take corrective action? Because if I can't even know that something is amiss, there's no chance I'm going to be able to provide the availability my solution needs.
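Arijit's arithmetic checks out, and the two halves of his argument can be sketched together: the downtime budget implied by an availability target, and a toy per-datapoint alert showing why evaluating "within seconds" matters. (The thresholds and window below are invented for illustration, not Splunk's implementation.)

```python
from collections import deque

# 1) Downtime budget implied by an availability target over a 30-day month.
def downtime_minutes(availability: float, days: int = 30) -> float:
    return days * 24 * 60 * (1.0 - availability)

print(round(downtime_minutes(0.999), 1))    # three nines: 43.2 min/month
print(round(downtime_minutes(0.9999), 2))   # four nines:   4.32 min/month

# 2) Toy streaming alert: each metric point is evaluated on arrival, so
#    detection delay is bounded by the data interval, not a poll cycle.
class StreamingAlert:
    def __init__(self, threshold: float, window: int = 3):
        self.threshold = threshold
        self.recent = deque(maxlen=window)   # sliding window of latest points

    def observe(self, value: float) -> bool:
        """Feed one data point; return True if the alert should fire."""
        self.recent.append(value)
        # Fire only when the whole window breaches, damping one-off spikes.
        return (len(self.recent) == self.recent.maxlen
                and all(v > self.threshold for v in self.recent))

alert = StreamingAlert(threshold=0.95)
fired = [alert.observe(v) for v in [0.50, 0.97, 0.98, 0.99, 0.99]]
print(fired)   # [False, False, False, True, True]
```

With only about 4.3 minutes of monthly budget at four nines, an alert pipeline that itself takes five or ten minutes has spent the budget before anyone reacts, which is the case Arijit makes for streaming evaluation.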
So in that context, real time is very important, much more than it used to be, because of all these software and service dependencies. That's why, at Splunk, we invested in real-time streaming analytics, the idea being: how can we provide customers with quick, high-level, important alerts in seconds? Real-time streaming is probably the best way to achieve that. And then, if I were to, sorry, go ahead.

>> John: No, go on, finish.

>> Arijit: Yeah, I was going to say that it's one thing to get an alert, but the question then is, now what do I do with it? There's obviously a lot of alert noise going out, and people are fatigued: I have all these alerts, I have this complex environment, what do I do? So reducing the MTTR part of it is also important. Environments are so complex now that without a little help from the tool, you're not going to be very effective; it's going to take you longer. This is another reason integrated tools are better: they can provide hints by looking at all the data, not just one type, not just logs or traces, and because they have access to the whole data set, their hints are far better. That's again one of our foundational principles. This is the emergent field of AIOps, where the idea is to bring the power of data science and machine learning to aid the operator in figuring out where a problem might be, so they can take corrective action faster: not necessarily fix it, but at least bypass the problem. And that's a theme that runs across our suite of tools; the question we ask ourselves is, "In every situation, what information could we have provided them, what kind of hints could we have
provided them, to short-circuit their resolution process?"

>> John: It's funny you mention a suite of tools; you have an Observability Suite, which Splunk leads with as part of this series. "Suite of tools" is kind of what's being discussed: a platform and tools working together. The trend seems to be that in the old days you were either a platform player or a tool player, and you really couldn't do both, but now with cloud-native, with distributed computing, and with all this importance around observability, you've got to start thinking that a suite has platform features. Could you react to that? What does it mean to be a platform? Platforms have benefits, tools have benefits, and working together implies a combination. Could you share your thoughts on that?

>> Arijit: That's a very interesting question, John. If you ask me how I look at the solution set we have, I explain it this way: we are a platform, we are a set of products and tools, and we are an enterprise solution. Let me explain what I mean, because all of these matter to somebody or other. As a platform, the question is: how good am I at dealing with data? Ingesting data, analyzing data, alerting you; those are the core foundational, database-centric aspects. And if you look at organizations with mature practices, they are looking for a platform, maybe one that scales better than what they have; they know what to do and will build on top of that.
But at the same time, a platform is not a product. 99% of our users are never going to make database calls to fetch and query data; they want something end to end that they can use to say, "Monitor my Kubernetes," "Monitor my Elasticsearch," monitor whatever other solution I may have. So we build a set of products on top of the platform that provide the usability: it's very easy to get on, send the data, use the built-in content and dashboards (indistinct), so that my day-to-day work is fast. Because I'm not an observability engineer, I'm a software engineer working on something, and I want observability to be easy for me. That's the product aspect of it. But for organizations at somewhat larger scale, a product alone is also not enough. Now you're looking at an observability solution deployed across an enterprise, with many products, many teams, many users, and then how can one be effective there? What's important at that level is not the database or platform aspect; it's how well I can manage it. Do I have visibility into what I am sending and what my bill is? Can I guard against incorrect usage? Do I have permissions to control who can mess with my (indistinct), and so on? So there's a layer of what we call enterprise capabilities that are important in an organizational setting. I think in order to build something successful in this space, we have to think at all three of these levels, and all of them are important, because in the end it's about how much value I'm getting out of it. It's not just what's theoretically possible but what's really happening, and all of these matter in that context.
>> John: And I think, Arijit, that's an amazing masterclass right there, a soundbite right there. And the data is important too: if you're going to be busting down data silos, you need a horizontally scalable data observability layer. You have to have access to the data. So I think the trend will clearly be more integrated, and more versatile from a platform perspective; it has to be.

>> Arijit: Absolutely, absolutely.

>> John: Well, we're certainly going to bring you back for our conversations when we have our events, and for our groups around the digital transformation Under the Hood series we're going to do. Great voice, great commentary. Arijit, thank you for sharing that knowledge with us, appreciate it.

>> Arijit: My pleasure, thank you very much.

>> John: Okay, I'm John Furrier with theCUBE, here with the Leading with Observability content series with Splunk. Thanks for watching.

(calm music)

Published Date: Feb 22, 2021

