Try our new research platform with insights from 80,000+ expert users
Ken Fillinger - PeerSpot reviewer
Assistant Vice President & Software Engineer & Infraoperations- & Operations Developer at a financial services firm with 201-500 employees
Real User
Top 10
Sep 11, 2025
RUM data has improved visibility into user paths and strengthened operational performance
Pros and Cons
  • "Splunk Observability Cloud has helped improve our operational performance and company's resilience, which was their initial offering."
  • "I've only experienced downtime, crashes, or performance issues when it isn't configured correctly."

What is our primary use case?

Our main use cases include synthetic monitoring, APM, RUM, alerting, detectors, dashboards, and all related functionality.

What is most valuable?

My favorite feature of Splunk Observability Cloud is the RUM because I like the RUM data. Splunk Observability Cloud has helped improve our operational performance and company's resilience, which was their initial offering. Without it, we wouldn't have any exposure and would be looking at raw data.

What needs improvement?

It can be improved through the integration of AI, which is either coming or already available.

For how long have I used the solution?

I've been using Splunk Observability Cloud for two and a half years.

Buyer's Guide
Splunk Observability Cloud
February 2026
Learn what your peers think about Splunk Observability Cloud. Get advice and tips from experienced pros sharing their opinions. Updated: February 2026.
884,933 professionals have used our research since 2012.

What do I think about the stability of the solution?

I've only experienced downtime, crashes, or performance issues when it isn't configured correctly.

What do I think about the scalability of the solution?

Splunk Observability Cloud scales effectively with the growing needs of our organization; we simply need to pay more and ingest more data.

Which solution did I use previously and why did I switch?

Prior to this, I wasn't using another solution to address similar needs.

What was our ROI?

I've seen ROI with Splunk Observability Cloud, though I cannot specify the exact amount. We would be unable to track user access issues without it, which would result in significant losses.

What other advice do I have?

I use the cloud almost exclusively and am still learning some features. I handle synthetic monitoring, but don't manage all integrations or usage aspects. I need to explore the AI-powered analytics and guidance, as we haven't implemented it yet. The out-of-the-box customizable dashboards are effective because they contain all the necessary base components.

On a scale of one to 10, I would rate Splunk Observability Cloud as an eight. I appreciate the cloud because it provides more visibility into the user's path. It's quite good, though the observability aspect is somewhat complicated, primarily due to my limited experience with it.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Last updated: Sep 11, 2025
Flag as inappropriate
PeerSpot user
Lakshmi Padaga - PeerSpot reviewer
Java Developer at U.S. Bank
Real User
Top 5
Sep 11, 2024
Collaborates performance metrics with log data to pinpoint the exact cause of issues and offers error detection
Pros and Cons
  • "It supports proactive management, enhances security, and improves operational efficiency."
  • "There are always areas for potential improvement to enhance its functionality and user experience."

What is our primary use case?

We use Splunk in APM to monitor our applications. So, we integrated it into our systems to enhance our monitoring and observing capabilities, especially for our microservices. 

So, we have used Splunk APM for this.

How has it helped my organization?

APM integrates well with Splunk’s other observability solutions. These logs with application performance monitoring can significantly impact our business in several positive ways, like troubleshooting and root cause analysis. 

Using these logs with APM, we can collaborate performance metrics with log data. It allows us to pinpoint the exact cause of issues, such as identifying specific errors in the logs. Because of this, we have access to faster resolution and detailed logs alongside performance metrics, enabling quicker diagnosis and resolution of problems. It also helps us minimize downtime and improve system reliability. 

Additionally, it improves our performance optimization with detailed insights and analyzing historical log data along with APM metrics. This allows us to understand long-term trends and make informed decisions about performance improvements and better user experience, like error reduction and proactive monitoring.

Splunk has reduced our mean time to resolution by 30%. 

If there is any issue in Splunk; we’ll identify the issue first and look for the error messages, like alerting with the Splunk user interface or in the logs that might indicate what the issue is and then determine which part of Splunk is affected. 

Then, we’ll refer to the Splunk official documentation and check the system's health. We’ll review the logs. By following these steps, I can resolve the issues with Splunk, ensuring that our monitoring and analytics capabilities remain effective.

What is most valuable?

Mainly, I like Splunk APM because it will show the errors compared to other tools. We use the dashboards to monitor our applications. It will tell us the errors, and we can solve them quickly.

I have used APM but haven’t used Trace Analyzer, though I have some knowledge of it. We are able to implement it. We have some Trace Log Points in Splunk APM to catch the errors. We have a special graph for it where we can see the red points.

We use OpenTelemetry. OpenTelemetry and Splunk APM are similar in terms of observability and monitoring. We use it for observability standardization, which allows us to collect traces and metrics, making it easier to work with different monitoring tools, including Splunk APM. It is more flexible because it allows us to instrument our applications without being locked into a specific monitoring vendor. 

It supports collecting traces, metrics, and logs from our applications, providing a comprehensive view of our performance and health endpoints. This data can be fed into Splunk APM, giving us in-depth analysis and insights about our application.

What needs improvement?

Splunk APM is a robust tool with many capabilities. There are always areas for potential improvement to enhance its functionality and user experience. 

For Splunk APM, there could be simplified navigation, like streamlining the user interface to make navigation more intuitive for our users, especially those new to APM, which can enhance usability. We can provide more customization options for dashboards and visualizations to help users tailor the platform to their specific needs. 

There could be more integration capabilities with a wider range of third-party tools and platforms would also be beneficial. By focusing on these areas, Splunk APM can enhance its value proposition, improve user satisfaction, and better meet the evolving needs of organizations monitoring their application performance.

For how long have I used the solution?

I have been using it for a year. 

What do I think about the stability of the solution?

I never had an issue with the stability. It worked fine. 

Which solution did I use previously and why did I switch?

My team has used alternatives to Splunk APM, like Datadog and New Relic.

How was the initial setup?

The initial setup was easy. To fully deploy it, we had to add some signal effects into our applications and just deploy it. It took like 20 minutes. That’s it.

What about the implementation team?

We took some help from our teams and my senior manager and also from other teams across our company. We connected and did all this together.

For deployment, one person can do it, actually, but as we are junior developers, we took help from our senior manager, like three to four people. 

Splunk is good like this now. I don’t think any updates would be required, but there are some regular updates and upgrades of Splunk APM, like software updates, version upgrades, and all. 

These provide more powerful monitoring capabilities and help ensure the system remains reliable, secure, and aligned with organizational needs. Regular updates, performance tuning, and proactive management help in maximizing these benefits of the Splunk solution.

What was our ROI?

We’ll see the results after the deployment. It’s not that late, and that’s the reason we are using Splunk APM.

 Splunk made our job easier in a way. It will give the points when we use any dashboards, and there are no delays in everything, like performance. It will give the error issues very clearly, and it will monitor 24/7. It will show the issues, and it is very effective. It will pinpoint the exact cause of the issues, and it will help us troubleshoot the issues very fast.

It benefits the IT staff in other teams, like operations, improves efficiency, and manages the IT environments more effectively. When it centralizes the logs and search analytics, the powerful capabilities allow IT teams to perform in-depth troubleshooting, identify root causes, and analyze complex issues with ease. 

Splunk also provides real-time visibility into IT infrastructure, and we have connected with cross-functional teams around our team to work with Splunk APM. It supports proactive management, enhances security, and improves operational efficiency. It facilitates better collaboration across the team.

What's my experience with pricing, setup cost, and licensing?

The pricing is  based on several factors, including the scale of deployment. The pricing model typically includes considerations like the number of hosts, features, and capabilities. 

What other advice do I have?

Overall, I would rate the solution a nine out of ten. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Buyer's Guide
Splunk Observability Cloud
February 2026
Learn what your peers think about Splunk Observability Cloud. Get advice and tips from experienced pros sharing their opinions. Updated: February 2026.
884,933 professionals have used our research since 2012.
Principal software architect at Verizon
Real User
Top 10
Jul 9, 2024
Useful to find statistical similarities between different traces
Pros and Cons
  • "The product's deployment phase is good and very easy because it is done with OpenTelemetry for most of the parts."
  • "Once you see the issues related to the scalability part, you need to understand that it is a warning triangle. After seeing the warning triangle, you need to realize that you cannot trust any of the numbers you see in the chart because it is not a complete, full data set."

What is our primary use case?

I use the solution in my company primarily for distributed tracing and metrics troubleshooting. I use the tool to troubleshoot incidents and find the root cause of errors when something goes wrong. I also personally use it to have a developer's understanding of what is going on in my application. Sometimes, there is a case where you might put your application in a library or a new library, and that library also makes calls somewhere. Splunk APM's monitoring can show you that there is a call you are making now that you never used to make in the prior version of the library. In these cases, which you may not know just by looking at the external view of the application code, the tracing part traces everything, including the lowest types of supports.

How has it helped my organization?

The main benefit of the tool I have noticed in the solution is reduced time for the resolution of incidents. The meantime to resolve can help pinpoint the root causes of the issues because you see the connections on the graph in Tag Spotlight. It is easier to pinpoint who is responsible for the incident, especially when you have a larger organization. You have teams that ride services where they need to talk to different services from different teams rather than having to hand off instant resolutions from one team to another. You can often find it much more quickly from the first instance of the problem occurring with the product in place. The tool specifically helps your sites move up more frequently, and then when it does go down, the solution finds the root cause and gets it back up as fast as possible.

What is most valuable?

The most valuable feature of the solution, and my favorite, is always Tag Spotlight, especially considering the way they slice and dice all of Splunk APM's traces by span attributes. 

I like the tool because it looks at a whole set of traces in aggregate, which means that it can find statistical similarities between different traces. Often, the cases are such that you will find some traces that show an error and have some other common attribute, which is much more apparent when you look at the feature known as Tag Spotlight rather than just looking at an overall metric. I like Tag Spotlight as it is one of the most simple to use features.

The meantime to resolve, or MTTR, can help pinpoint the root causes of the issues because you see the connections on the graph in Tag Spotlight. I don't personally have metrics associated with MTTR. I am more of the implementer of making certain that all the data is going in and looking at the debugging part. I am not a part of the set of people who keep track of the tool's MTTR.

In our company's case, we have reasonably good metrics related to the meantime to detect. I can't get a rough number when it comes to the meantime to detect, so I don't know for sure. My guess is that we often detect problems reasonably well. Our company figures out that there is some problem, but we just don't know where it is, so I feel that if there is an improvement, then it is mostly in the area of meantime to resolve. When it comes to the meantime to detect, I think our existing metrics are probably sufficient, and adding Splunk APM makes it much easier to detect the resolution time.

The tool has improved our organization's business resilience. In terms of resilience, in the tool, it is possible not to have downtime and make certain things up and running. The faster you get to web pages working again, the more people can actually do things that they want to do, such as trade players on their NFL Fantasy teams. In general, it gives out a better business result.

What needs improvement?

In our company's case, we have some very high throughput services, so they might be getting 10,000 requests per second. Currently, Splunk APM and Splunk Observability want to do things in a way that wants you to send every single span for every single request that is a part of the 10,000 requests per second. The process may give you all the data in the back end, but a lot of data, including CPU memory and network costs, is involved in sending data to Splunk. My feeling is that it would be nice if there were an easier way to send only a sample of my traces, which means that I send 10 percent or 5 percent, and then Splunk would extrapolate on the back end. It is obvious that with 10 percent of traces, the real metrics are something like ten times with a plus or minus margin of error. I am okay with the plus or minus margin of error because I think when you have a high enough request rate, you will see such problems appear even in a lower sample population. The process is political polling. You don't call all 150,000,000 people in the US and ask them who they are going to vote for, and I feel it is better if you choose to take a sample of maybe 10,000 and then extrapolate your findings to the rest. I feel the same should be applicable to trace something in Splunk APM.

For how long have I used the solution?

I have been using Splunk APM for two years.

What do I think about the stability of the solution?

I really haven't noticed anything going wrong with the tool's stability, and I haven't seen any downtime. I don't know if my company is necessarily measuring the stability part by ourselves, but at least for me, it is a pretty growing and solid solution.

What do I think about the scalability of the solution?

There is one issue with the tool's scalability. In our company, we are fairly big in terms of the number of containers we have, especially since we can run very large clusters. When you look at some of the charts, it will say 30,000 time series, reached the limit, and cannot show anymore, or it states that a particular data may not be complete. For me, it is a problem that I would like to see fixed. I have spoken to Spunk's team about it, and they have told me that they do recognize the issue and that other people have also mentioned the same problem. Once you see the issues related to the scalability part, you need to understand that it is a warning triangle. After seeing the warning triangle, you need to realize that you cannot trust any of the numbers you see in the chart because it is not a complete, full data set. I want the tool to either tell me that it can't show me the numbers or that I need to find some way to show all the numbers in a more summarized view. The tool asks you to filter things down more, but it would be nice to offer specific suggestions as to what you could filter down to get it into a more specific or reasonable number. In some cases, my company just has to have a number, considering that we have 1,00,000 containers. If I want to know how many containers are running, currently, the way the backend works in a way where it requires to know how many different time series there are, and then it just says that the 30,000 limit has been reached, but when it happens, I don't know whether it is for 1,00,000 containers, 1,20,000 containers or 80,000 containers.

How are customer service and support?

The technical support team for the solution is good for our company. My company has a weekly meeting with Splunk's sales support team, and if there are any issues, we bring them up for discussion. I have seen that the technical support team is super responsive.

Which solution did I use previously and why did I switch?

My company has its own internal solution, which was built ten to fifteen years ago, and it has progressed over time, but it is only ever used to support metrics and events, not for tracing. In short, it is not used for Splunk APM-related stuff, which is a big change that makes a difference for us.

How was the initial setup?

The product's deployment phase is good and very easy because it is done with OpenTelemetry for most of the parts. The product's deployment is not some custom thing where you have to deploy a particular agent that belongs to a particular company and put it on every single host. It is very easy to follow OpenTelemetry's models for the most part. Splunk is a very big contributor to OpenTelemetry, and I value it. It consists of the reasons I recommend using Splunk as a backend provider. In my company, we are more open to being more of an OpenTelemetry-compliant organization instead of going for other vendors.

What was our ROI?

I can't speak about the tool's ROI since I get paid, but I don't have to spend money on the product.

What's my experience with pricing, setup cost, and licensing?

I don't have much insight into the costs and licensing area attached to the tool. I am the engineer and developer, not the person who writes the checks in the company. I know that my company has a Splunk Enterprise Security license which is used for logging and even for Splunk Observability.

What other advice do I have?

I think the tool has the best trace aggregation features compared to what I have seen in different products, and I feel Tag Spotlight is a good example of it. A lot of the other products support tracing, but when you look at them, you see that they show one trace at a time. I can deep dive into one trace at a time, but what I want to find is commonality across the traces. I think it will give the tool a high grade for all its features. I rate the tool highly since it offers a very good Kubernetes integration. With a lot of data, you can see which part the Kubernetes host is running on, switch between them, and see the application metrics and the actual infrastructure metrics. Seeing it all together can be very useful.

I rate the tool a nine out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Sathis-Kumar - PeerSpot reviewer
Senior Manager at Bank of America
Real User
Top 5Leaderboard
Dec 30, 2024
Seamless issue detection with user time tracking and application load analysis
Pros and Cons
  • "The most valuable features include user time tracking and the ability to analyze application load times."
  • "The most valuable features include user time tracking and the ability to analyze application load times."
  • "It would be beneficial to have more enhanced features with capabilities to adapt more integrated applications. Improvements in dashboard configuration, customization, and artificial intelligence functionalities are desired."
  • "There is room for improvement in customer support due to delays and standard feedback responses."

What is our primary use case?

We primarily use Splunk Real User Monitoring to analyze performance bottlenecks and application transactions. It allows us to see how applications are experienced on the user side, making it easy to capture any bottlenecks or performance issues.

What is most valuable?

The most valuable features include user time tracking and the ability to analyze application load times. Splunk provides advanced notifications of roadblocks in the application, which helps us to improve and avoid impacts during high-volume days. It is very useful for identifying performance bottlenecks.

What needs improvement?

It would be beneficial to have more enhanced features with capabilities to adapt more integrated applications. Improvements in dashboard configuration, customization, and artificial intelligence functionalities are desired. There is room for improvement in customer support due to delays and standard feedback responses.

For how long have I used the solution?

I have been working with Splunk Real User Monitoring for almost two years.

What do I think about the stability of the solution?

In terms of stability, I would rate it a nine out of ten. It is a very stable solution.

What do I think about the scalability of the solution?

Splunk Real User Monitoring is definitely scalable. I would rate its scalability a nine out of ten.

How are customer service and support?

Technical support is rated an eight. There is some delay in their in-depth responses and standard answers to questions.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I worked with Splunk alongside Dynatrace. Before Splunk, I did not use any other services.

How was the initial setup?

It takes about an hour to set up the client for real-time monitoring.

What about the implementation team?

We have a separate team for deployment, consisting of about three to four people.

What was our ROI?

We have achieved a return on investment between 10% to 20% as it helped in removing roadblocks, which could lead to more savings with wider usage.

What's my experience with pricing, setup cost, and licensing?

Splunk is a little expensive, however, it is in line with the current market pricing. I would rate the pricing an eight on a scale of one to ten, as it reflects the going rate in the market.

What other advice do I have?

I would recommend this product to other users because of its capabilities in monitoring and analytics. 

I rate the overall solution eight out of ten, considering the comparison with other products like Dynatrace.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
MoatazElsayed - PeerSpot reviewer
Vice President, Consultancy Services at MTS
Real User
Top 20
Sep 24, 2025
Improves network visibility through real-time telemetry but pricing continues to be a challenge
Pros and Cons
  • "The best feature of this product is the latency and processing of all the telemetry that is being received, which gives full visibility at the right time."
  • "The pricing would be one area for improvement."

What is our primary use case?

The main use case with Splunk Observability Cloud is to capture the logs from the SD-WAN in order to check the health of the network and the flow of data from different sources to the central place.

What is most valuable?

The best feature of this product is the latency and processing of all the telemetry that is being received, which gives full visibility at the right time. 

One cannot protect and operate what they don't know. When there is this observability, it helps to see exactly what is present, the problems that may exist, and hence, it increases digital resilience by having proactive actions ahead, which increases the availability of the service.

The teams have utilized the ability to enrich data with custom metrics, as this enrichment is one of the key aspects used to have a clear understanding of which assets are being attacked, enabling necessary actions to be taken. The data has been enriched by adding customized information from customers' databases from different sources.

What needs improvement?

The pricing would be one area for improvement.

For how long have I used the solution?

I have used the SIEM solution since 2019 and have had experience with Splunk Observability Cloud for the last year.

How are customer service and support?

I would rate their customer service and technical support an eight out of ten.

How would you rate customer service and support?

Positive

What about the implementation team?

I work for SI, and we deliver to different organizations based on their requirements. We are responsible for implementation, so we implement and they see the value out of it.

What was our ROI?

Splunk Observability Cloud has improved the operational performance of our clients.

What's my experience with pricing, setup cost, and licensing?

It is expensive.

What other advice do I have?

The AI component is one of their strengths; currently, most competitors are moving in the same direction. As SI professionals, we are seeing different improvements in the AI domain for different products, and they are at the leading edge with many vendors following them.

My overall rating for Splunk Observability Cloud would be a seven out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer. Partner
Last updated: Sep 24, 2025
Flag as inappropriate
PeerSpot user
Munish Jain - PeerSpot reviewer
Head Security Operations at Health Care Authority
Real User
Top 20
Jul 9, 2024
Provides good optimization, performance, and visibility
Pros and Cons
  • "Splunk's GUI and dashboard capacity are the most valuable features of Splunk Infrastructure Monitoring."
  • "There's no standard use case available where you can see the utilization of the number of use cases I have."

What is our primary use case?

We use the solution to monitor and calculate the number of systems, applications, and DR sites we have. Then, if there is any problem, we can detect the information on which server belongs to which application. This really helps us.

How has it helped my organization?

We have seen 28% to 29% optimization and performance with Splunk Infrastructure Monitoring. You will know the moment you see any anomaly in the system, the server, or the infrastructure. The solution has given us more visibility not only from the infrastructure or server point of view but also from the network perspective.

What is most valuable?

Splunk's GUI and dashboard capacity are the most valuable features of Splunk Infrastructure Monitoring.

Compared to Microsoft Azure, Splunk Infrastructure Monitoring can ingest all the log sources. You can ingest all the data in one single source. Then, it accumulates the data, calculates internally, and gives you the right information you're looking for. Splunk Infrastructure Monitoring is the optimal solution, where you can see everything on one screen.

Our organization monitors multiple cloud environments, including GCP (Google Cloud Platform) and AWS (Amazon Web Services).

We're all completely dependent on Splunk's end-to-end visibility into our cloud-native environment to see everything, including any incident that comes.

Splunk Infrastructure Monitoring has helped drastically improve our meantime to resolve, detect, and investigate.

The solution has helped reduce our mean time to resolve by 28%, which is a huge number. We aim to reduce it by 30% to 37%, but that would definitely require some AI concept and new enterprise security. That's our plan for next year.

Splunk Infrastructure Monitoring has helped improve our organization's business resilience. The moment you receive an incident, you have full visibility. You can go deep into the investigation, do threat hunting, and find the root cause analysis. That's the visibility and performance we look for in enterprise security solutions like Splunk.

Splunk's unified platform helps consolidate networking, security, and IT observability tools. When you have multidimensional solutions and a multi-cloud environment, you have specific applications for finance and patient care. You can see everything consolidated in one solution.

DevOps and GRC compliance solutions come into one solution, and visibility extends. That gives you confidence, and we build trust with the business. Businesses are confident when they're going outside. Because we have full visibility, we provide that trust to the patient and my health care entities that we are safe.

What needs improvement?

The utilization of the use cases is not available. You need to write custom out-of-the-box use cases. There's no standard use case available where you can see the utilization of the number of use cases I have. For example, if you have 200 use cases, do you know if you are utilizing all 200 and if they are actually clicking at the right time?

If I can work 20 use cases out of 200, it is 20% utilization for the use cases. So, I'll focus more on 20% and try to optimize them based on my business requirements rather than focusing on 200.

For how long have I used the solution?

I have been using Splunk Infrastructure Monitoring for six years.

What do I think about the scalability of the solution?

The solution's scalability is marvelous because we can just add on. We are currently using two TB, and the solution gives us the flexibility to add an extra 500 GB next month.

How are customer service and support?

Sometimes, we face technical difficulties because of the limitations of the connectors. Integrating Splunk with post-relational databases like InterSystems is challenging because such applications or databases are not very much publicly exposed. The technical team faces a lot of challenges when integrating because they need to write some custom connectors to integrate the data.

We have some clinical applications specific to a particular specialty, and you have different applications and databases for that. For that, you need to write custom connectors. Sometimes, the technical team lingers on and passes the time because they're also exploring.

I rate the solution's technical support seven and a half out of ten.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We previously used a different solution called RSA. We switched to Splunk because RSA was not providing the latest changes and many of the upgrades we were expecting. Also, a lot of functionality we were expecting, like XDR, optimization processes, and connectors, was not available. We used RSA for four and a half years. RSA had performance issues, and a lot of use cases were not met because it was an old solution.

What about the implementation team?

We had a system integrator who initially helped us integrate and deploy the solution. They helped us to deploy the solution, and we take their help to develop any new use cases.

What was our ROI?

We have seen a return on investment with the solution. Our KPIs have become smooth. When we have more visibility, our KPIs definitely increase. We can easily measure meantime to detect and meantime to resolve. You will definitely be up to the mark when your incident response capability increases. Our performance has increased. Our IT environment and DevOps team have more visibility and are more transparent now.

What's my experience with pricing, setup cost, and licensing?

The solution's pricing is costly. We're now looking for a cloud version that would have a completely different pricing calculation.

What other advice do I have?

Splunk Infrastructure Monitoring has use case capability, visibility capability, and performance. It also has a vast dashboard capability that no other solution currently provides. There are many solutions in the market, but Splunk stands out separately. With Splunk Infrastructure Monitoring, you can correlate data and ingest any kind of data with your connectors. Flexibility is another important functionality of Splunk.

Overall, I rate the solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer. Home & Health Partner
PeerSpot user
reviewer2383053 - PeerSpot reviewer
Platform leader at a retailer with 10,001+ employees
Real User
Top 20
Apr 16, 2024
Used for troubleshooting purposes and to understand the bottlenecks of applications
Pros and Cons
  • "The solution's service map feature allows us to have a holistic overview and to see quickly where the issues are."
  • "Splunk APM should include a better correlation between resources and infrastructure monitoring."

What is our primary use case?

We use Splunk APM to understand and know the inner workings of our cloud-based and on-premises applications. We use the solution mainly for troubleshooting purposes and to understand where the bottlenecks and limits are. It's not used for monitoring purposes or sending an alert when the number of calls goes above or below some threshold.

The solution is used more for understanding and knowing where your bottlenecks are. So, it's used more for observability rather than for pure monitoring.

What is most valuable?

The solution's service map feature allows us to have a holistic overview and to see quickly where the issues are. It also allows us to look at every session without considering the sampling policy and see if a transaction contains any errors. It's also been used when we instrument real use amounts from the front end and then follow the sessions back into the back-end systems.

What needs improvement?

Splunk APM should include a better correlation between resources and infrastructure monitoring. The solution should define better service level indicators and service level objectives. The solution should also define workloads where you can say an environment is divided up by this area of back end and this area of integration. The solution should define workloads more to be able to see what is the service impact of a problem.

For how long have I used the solution?

I've been using Splunk APM in my current organization for the last 2 years, and I've used it for 4-5 years in total.

What do I think about the stability of the solution?

Splunk APM is a remarkably stable solution. We have only once encountered an outage of the ingestion, which was very nicely explained and taken care of by the Splunk team.

I rate the solution a 9 out of 10 for stability.

What do I think about the scalability of the solution?

Around 50 to 80 users use the solution in our organization. The solution's scalability fits what we are paying for. On the level of what we pay for, we have discovered both the soft limit and the hard limit of our environment. I would say we are abusing the system in terms of how scalable it is. Considering what we are paying for, we are able to use the landscape very well.

We have plans to increase the usage of Splunk APM.

How are customer service and support?

Splunk support itself leaves room for improvement. We have excellent support from the sales team, the sales engineers, the sales contact person, and our customer success manager. They are our contact when we need to escalate any support tickets. Since Splunk support is bound not to touch the consumer's environment, they cannot fix issues for us. It's pretty straightforward to place a support ticket.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We have previously used AppDynamics, Dynatrace, and New Relic. We see more and more that Splunk APM is the platform for collaboration. New Relic is more isolated, and each account or team has its own part of New Relic. It's very easy to correlate and find the data within an account. Collaborating across teams, their data, and their different accounts is very troublesome.

With Splunk APM, there is no sensitivity in the data. We can share the data and find a way to agree on how to collaborate. If two environments are named differently, we can still work together without infecting each other's operations.

How was the initial setup?

If you're using the more common languages, the initial deployment of Splunk APM is pretty straightforward.

What about the implementation team?

The solution's deployment time depends on the environment. If the team uses the cloud-native techniques of TerraForm and Ansible, it's pretty straightforward. The normal engagement is within a couple of weeks. When you assess the tool they need and look at the architecture and so on, the deployment time is very, very minimal. Most of the time spent internally is caused by our own overhead.

What's my experience with pricing, setup cost, and licensing?

We have a very good conversation with our vendor for Splunk APM. We have full transparency regarding the different license and cost models. We have found a way to handle both the normal average load and the high peak that some of our tests can cause. Splunk APM is a very cost-efficient solution. We have also changed the license model from a host-based license model to a more granular way to measure it, such as the number of metric time series or the traces analyzed per minute.

We have quite a firm statement that for every cost caused within Splunk, you need to be able to correlate it to an IT project or a team to see who the biggest cost driver is. As per our current model, we are buying a capacity, and we eventually want to have a pay-as-you-go model. We cannot use that currently because we have renewed our license for only one year.

What other advice do I have?

We are using Splunk Observability Cloud as a SaaS solution, but we have implemented Splunk APM on-premises, hybrid, and in the cloud. We are using it for Azure, AWS, and Google. Initially, the solution's implementation took a couple of months. Now, we are engaging more and more internal consumers on a weekly basis.

We implement the code and services and send the data into the Splunk Observability Cloud. This helps us understand who is talking to whom, where you have any latencies, and where you have the most error types of transactions between the services.

Most of the time, we do verification tests in production to see if we can scale up the number of transactions to a system and handle the number of transactions a business wants us to handle at a certain service level. It's both for verification and to understand where the slowness occurs and how it is replicated throughout the different services.

We can have full fidelity and totality of the information in the tool, and we don't need to think about the big variations of values. We can assess and see all the data. Without the solution's trace search and analytics feature, you will be completely blind. It's critical as it is about visibility and understanding your service.

Splunk APM offers end-to-end visibility across our environment because we use it to coexist with both synthetic monitoring and real user monitoring. What we miss today is the correlation to logs. We can connect to Splunk Cloud, but we are missing the role-based access control to the logs so that each user can see their related logs.

Visualizing and troubleshooting our cloud-native environment with Splunk APM is easy. A lot of out-of-the-box knowledge is available that is preset for looking at certain standard data sets. That's not only for APM but also for the available pre-built dashboards.

We are able to use distributed tracing with Splunk APM, and it is for the totality of our landscape. A lot of different teams can coexist and work with the same type of data and easily correlate with other systems' data. So, it's a platform for us to collaborate and explore together.

We use Splunk APM Trace Analyzer to better understand where the errors originate and the root cause of the errors. We use it to understand whether we are looking at the symptom or the real root cause. We identify which services have the problem and understand what is caused by code errors.

The Splunk Observability Cloud as a platform has improved over time. It allows us to use profiling together with Splunk Distribution of OpenTelemetry Collector, which provides a lot of insights into our applications and metadata. The tool is now a part of our natural workbench of different tools, and it's being used within the organization as part of the process. It is the tool that we use to troubleshoot and understand.

Our organization's telemetry data is interesting, not only from an IT operational perspective but also to understand how the tools are being used and how they have been providing value for the business. It is a multifaceted view of the data we have, and it is being generated and collected by the solution.

Splunk APM has helped reduce our mean time to resolve. Something that used to take 2-3 weeks to troubleshoot is now done within hours. Splunk APM has freed up some resources if we are going to troubleshoot. If you spend a lot of time troubleshooting something and can't find a problem, we cannot close the ticket saying there's no resolution. With Splunk APM, we can now know for sure where we have the problem rather than just ignoring it.

Splunk APM has saved our organization around 25% to 30% time. It's a little bit about moving away from firefighting to be preventive and estimate more for the future. That's why we are using it for performance. The solution allows us to help and support the organization during peak hours and be preventative with the bottlenecks rather than identify them afterward.

Around 5-10 people were involved in the solution's initial deployment. Integrating the solution with our existing DevOps tools is not part of the developer's IDE environment, and it's not tightly connected. We have both subdomains and teams structured. Normally, they also compartmentalize the environment, and we use the solution in different environments.

Splunk APM requires some life cycle management, which is natural. In general, once you have set it up, you don't need to put much effort into it. I would recommend Splunk APM to other users. That is mainly due to how you collaborate with the data and do not isolate it. There is a huge advantage with Splunk. We are currently using Splunk, Sentry, and New Relic, and part of our tool strategy is to move to Splunk.

As a consumer, you need to consider whether you are going to rely on OpenTelemetry as part of your standard observability framework. If that is the case, you should go for Splunk because Splunk is built on OpenTelemetry principles.

Compared to other tools using proprietary agents and proprietary techniques, you may have more insights into some implementations. However, you will have a tighter vendor lock-in, and you won't have the portability of the back end. If you rely on OpenTelemetry, then Splunk is the tool for you.

Overall, I rate the solution a 9 out of 10.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
GauravGupta6 - PeerSpot reviewer
Project Team Lead - Devops at a financial services firm with 10,001+ employees
Real User
Top 20
Mar 28, 2024
Provides great visibility, analysis, and data telemetry
Pros and Cons
  • "Detectors are a powerful feature."
  • "We currently lack log analysis capabilities in Splunk APM."

What is our primary use case?

We use Splunk APM to monitor the performance of our applications.

How has it helped my organization?

Splunk APM offers end-to-end visibility across our entire environment. We need to control how many types of metrics are ingested by Splunk APM from all incoming requests. While we allow some metrics to be collected, Splunk APM provides the ability to track each request from its starting point to its endpoint at every stage.

Splunk APM trace analyzer allows us to analyze a request by providing its trace ID. This trace ID gives us a detailed breakdown of how the request entered the system, how many services it interacted with along the way, and its overall path within the system. We can also identify any errors that occurred during the request's processing and track any slowness or latency issues. This information is very helpful for troubleshooting performance problems in our application.

Splunk APM telemetry data has been incredibly valuable. While we faced challenges with Splunk Enterprise, such as the lack of a trace analyzer, Splunk APM's user interface is modern and highly flexible. The wide range of data it provides has significantly improved our incident response times, allowing us to quickly create alerts and adhere to the infrastructure as code principle. Splunk APM also proves beneficial during load testing, contributing to a positive impact on our overall infrastructure performance analysis.

Splunk APM helps us reduce our mean time to resolution. With its fast and accurate alerting system, we can quickly identify the exact location of issues. This pinpoint accuracy streamlines the investigation process, leading to faster root-cause analysis.

Splunk APM has helped us save significant time. We're now spending less time resolving production incidents and analyzing performance data. This focus on Splunk APM allows us to dedicate more time to other areas.

What is most valuable?

Detectors are a powerful feature. They create signal flow code in a format similar to Splunk APM language. For example, if we select five conditions, the detector can automatically generate the code for that signal flow. This code can then be directly integrated into our Terraform modules, streamlining the creation of detectors using Terraform. This is particularly helpful because our infrastructure adheres to a well-defined practice, and detectors help automate this process.

APM dashboards are another valuable tool. They provide more comprehensive information than traditional spotlights. One particularly useful feature is the breakdown of a trace ID. This breakdown allows us to see the entire journey of a request, including where it originated, any slowdowns it encountered, and any issues it faced. This level of detail enables us to track down the root cause of performance problems for every request.

What needs improvement?

We currently lack log analysis capabilities in Splunk APM. Implementing this functionality would be very beneficial. With log analysis, we could eliminate our dependence on Splunk Enterprise and rely solely on APM. The user interface design of APM seems intuitive, which would likely simplify setting up log-level alerts. Currently, all log-level alerting is done through Splunk Enterprise, while infrastructure-level alerting has already transitioned to Splunk APM.

The Splunk APM documentation on the official Splunk website could benefit from additional resources. Specifically, including more examples of adapter creation and management using real-world use cases would be helpful. During our setup process, we found the documentation lacked specific implementation details. While some general information was available on public platforms like Google and YouTube, it wasn't comprehensive. This suggests that others using Splunk APM in the future might face similar challenges due to the limited information available on social media. It's important to remember that many users rely on social media for setup guidance these days.

For how long have I used the solution?

I have been using Splunk APM for 1.5 years.

What do I think about the stability of the solution?

While Splunk APM occasionally experiences slowdowns, it recovers on its own. Fortunately, these haven't resulted in major incidents because most maintenance is scheduled for weekends, with ample notice provided in advance. We have never experienced any data loss that occurred during previous slowdowns.

How are customer service and support?

Splunk APM customer support is helpful. They promptly acknowledge requests and provide regular updates. They've been able to fulfill all our information requests so far. However, Splunk APM is a constantly evolving product. This means there are some limitations due to ongoing industry advancements. They are actively working on incorporating customer feedback, such as the CV request. Overall, the customer support is excellent, but the desired features may not all be available yet.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

Previously, we used Grafana, but we faced challenges that led us to switch to Splunk APM. Since then, Splunk has become our primary tool for data analysis. In our experience, Splunk offers several advantages over Grafana. Setting up and using Splunk is significantly easier than Grafana. Splunk provides a user-friendly interface that allows anyone to start working immediately, while Grafana's setup can be more complex. Splunk also boasts superior reliability. Its architecture utilizes a master-slave node structure, with the ability to cluster for redundancy. This ensures that if a node goes down, another available node automatically takes over, minimizing downtime. Ultimately, our decision to switch to Splunk was driven by several factors: user-friendliness, a wider range of features, cost-effectiveness, and its established reputation. Splunk is a globally recognized and widely used tool, which suggests a higher level of trust and support from the industry.

We use Splunk Enterprise and Splunk APM. Splunk APM offers a comprehensive view of various application elements. We primarily migrated to APM to gain application-level metrics. This includes latency issues, which are delays in processing user requests. Splunk APM generates a unique trace ID for each user request. This allows us to track the request from the user to our servers and identify any delays or errors that occur along the way. 
Additionally, Splunk APM utilizes detectors to create alerts based on specific metrics. We've implemented alerts for CPU and memory usage, common issues in our Kubernetes infrastructure. We can also track container restarts within the cluster and pinpoint the causes. Another crucial area for us is subscription latency. Splunk APM allows us to monitor this metric and identify any performance bottlenecks. This capability was absent in Splunk Enterprise, necessitating the switch to APM. Furthermore, Splunk APM enables us to track application status codes, such as 404 errors.

Splunk APM facilitates the creation of informative dashboards using collected metrics. Additionally, the Metrics Explorer tool allows us to investigate specific metrics of interest and generate alerts or customized spotlights. 
Spotlights are tailored visualizations that track metrics for critical application areas. They can trigger alerts based on unexpected changes, such as a sudden increase in error codes over a set timeframe. This provides a more proactive approach to identifying potential issues compared to traditional detector-based alerts.

Splunk APM empowers us to effectively monitor various metrics during load testing. This includes analyzing memory usage across ten to eleven metrics, tracking container restarts during flow testing, and verifying the functionality of auto scaling mechanisms. The comprehensive visualization capabilities of Splunk APM surpass those of Splunk Enterprise, making it ideal for analyzing large sets of metrics and graphs.

We're currently exploring the integration of an OpenTelemetry agent with Splunk APM. This will enable us to collect and transmit a wider range of data, including application metrics, latency metrics, and basic infrastructure metrics such as CPU, memory, etc.

How was the initial setup?

During the initial Splunk deployment, I found that most information available on social media platforms catered to enterprise deployments. Fortunately, many of our new hires had prior Splunk experience, which eased the initial learning curve. Splunk's widespread adoption across industries also meant there was a general familiarity with the tool among the team. Additionally, the comprehensive documentation proved helpful. Overall, the initial rollout went smoothly, though there were some challenges that we were able to resolve.

The Splunk deployment was done on multiple environments. We started with development and then deployed to a staging environment, which sits between development and production. As expected, the development deployment took the longest. The total time for the entire deployment, including my cloud setup, was 2 to 3 weeks. It's important to note that this timeframe isn't solely dependent on Splunk implementation. Other factors can influence the timeline, such as network requests, firewall changes, and coordination with IT teams for license purchases. While the development deployment took longer, promoting Splunk to the staging and production environments was significantly faster. It only took 1 week for each environment.

What about the implementation team?

Our cloud deployment didn't require a consultant, but we used one for our on-premise enterprise deployment, which was a bit more complex.

What other advice do I have?

I would rate Splunk APM 9 out of 10.

The maintenance required is minimal because the cluster deployment helps ensure there is always 1 node working.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Buyer's Guide
Download our free Splunk Observability Cloud Report and get advice and tips from experienced pros sharing their opinions.
Updated: February 2026
Buyer's Guide
Download our free Splunk Observability Cloud Report and get advice and tips from experienced pros sharing their opinions.