reviewer2383053 - PeerSpot reviewer
Platform leader at a retailer with 10,001+ employees
Real User
Used for troubleshooting purposes and to understand the bottlenecks of applications
Pros and Cons
  • "The solution's service map feature allows us to have a holistic overview and to see quickly where the issues are."
  • "Splunk APM should include a better correlation between resources and infrastructure monitoring."

What is our primary use case?

We use Splunk APM to understand and know the inner workings of our cloud-based and on-premises applications. We use the solution mainly for troubleshooting purposes and to understand where the bottlenecks and limits are. It's not used for monitoring purposes or sending an alert when the number of calls goes above or below some threshold.

The solution is used more for understanding and knowing where your bottlenecks are. So, it's used more for observability rather than for pure monitoring.

What is most valuable?

The solution's service map feature allows us to have a holistic overview and to see quickly where the issues are. It also allows us to look at every session without considering the sampling policy and see if a transaction contains any errors. We have also used it when we instrument real user monitoring on the front end and then follow the sessions into the back-end systems.

What needs improvement?

Splunk APM should include a better correlation between resources and infrastructure monitoring. The solution should define better service level indicators and service level objectives. The solution should also define workloads where you can say an environment is divided up by this area of back end and this area of integration. The solution should define workloads more to be able to see what is the service impact of a problem.

For how long have I used the solution?

I've been using Splunk APM in my current organization for the last 2 years, and I've used it for 4-5 years in total.

Buyer's Guide
Splunk Observability Cloud
May 2025
Learn what your peers think about Splunk Observability Cloud. Get advice and tips from experienced pros sharing their opinions. Updated: May 2025.
856,873 professionals have used our research since 2012.

What do I think about the stability of the solution?

Splunk APM is a remarkably stable solution. We have only once encountered an outage of the ingestion, which was very nicely explained and taken care of by the Splunk team.

I rate the solution a 9 out of 10 for stability.

What do I think about the scalability of the solution?

Around 50 to 80 users use the solution in our organization. The solution's scalability fits what we are paying for. On the level of what we pay for, we have discovered both the soft limit and the hard limit of our environment. I would say we are abusing the system in terms of how scalable it is. Considering what we are paying for, we are able to use the landscape very well.

We have plans to increase the usage of Splunk APM.

How are customer service and support?

Splunk support itself leaves room for improvement. We have excellent support from the sales team, the sales engineers, the sales contact person, and our customer success manager. They are our contact when we need to escalate any support tickets. Since Splunk support is bound not to touch the consumer's environment, they cannot fix issues for us. It's pretty straightforward to place a support ticket.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We have previously used AppDynamics, Dynatrace, and New Relic. We see more and more that Splunk APM is the platform for collaboration. New Relic is more isolated, and each account or team has its own part of New Relic. It's very easy to correlate and find the data within an account. Collaborating across teams, their data, and their different accounts is very troublesome.

With Splunk APM, there is no sensitivity in the data. We can share the data and find a way to agree on how to collaborate. If two environments are named differently, we can still work together without affecting each other's operations.

How was the initial setup?

If you're using the more common languages, the initial deployment of Splunk APM is pretty straightforward.

What about the implementation team?

The solution's deployment time depends on the environment. If the team uses cloud-native tooling such as Terraform and Ansible, it's pretty straightforward. The normal engagement is within a couple of weeks. When you assess the tool they need and look at the architecture and so on, the deployment time is very, very minimal. Most of the time spent internally is caused by our own overhead.

What's my experience with pricing, setup cost, and licensing?

We have a very good conversation with our vendor for Splunk APM. We have full transparency regarding the different license and cost models. We have found a way to handle both the normal average load and the high peak that some of our tests can cause. Splunk APM is a very cost-efficient solution. We have also changed the license model from a host-based license model to a more granular way to measure it, such as the number of metric time series or the traces analyzed per minute.

We have quite a firm statement that for every cost caused within Splunk, you need to be able to correlate it to an IT project or a team to see who the biggest cost driver is. As per our current model, we are buying a capacity, and we eventually want to have a pay-as-you-go model. We cannot use that currently because we have renewed our license for only one year.

What other advice do I have?

We are using Splunk Observability Cloud as a SaaS solution, but we have implemented Splunk APM on-premises, hybrid, and in the cloud. We are using it for Azure, AWS, and Google. Initially, the solution's implementation took a couple of months. Now, we are engaging more and more internal consumers on a weekly basis.

We implement the code and services and send the data into the Splunk Observability Cloud. This helps us understand who is talking to whom, where you have any latencies, and where you have the most error types of transactions between the services.
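As an illustration of how "who is talking to whom" can be derived from trace data, here is a minimal Python sketch. It is not Splunk's implementation or API. The `Span` fields, the `service_edges` helper, and the sample data are all hypothetical; the sketch only assumes that each span records its parent span, its service, its duration, and an error flag.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical, simplified span record (not a Splunk or OpenTelemetry type).
    span_id: str
    parent_id: Optional[str]
    service: str
    duration_ms: float
    error: bool = False


def service_edges(spans):
    """Aggregate caller -> callee edges with call count, error count, and mean latency."""
    by_id = {s.span_id: s for s in spans}
    stats = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})
    for s in spans:
        parent = by_id.get(s.parent_id) if s.parent_id else None
        # Only spans that cross a service boundary form an edge in the map.
        if parent is None or parent.service == s.service:
            continue
        edge = stats[(parent.service, s.service)]
        edge["calls"] += 1
        edge["errors"] += int(s.error)
        edge["total_ms"] += s.duration_ms
    return {
        key: {
            "calls": v["calls"],
            "errors": v["errors"],
            "mean_ms": v["total_ms"] / v["calls"],
        }
        for key, v in stats.items()
    }


# Made-up sample data for illustration only.
SAMPLE_SPANS = [
    Span("a", None, "web", 120.0),
    Span("b", "a", "checkout", 90.0),
    Span("c", "b", "payments", 60.0, error=True),
    Span("d", "b", "payments", 40.0),
]

if __name__ == "__main__":
    for (caller, callee), info in service_edges(SAMPLE_SPANS).items():
        print(caller, "->", callee, info)
```

Each edge then carries the three things mentioned above: which services talk to each other, how long the calls take, and how many of them fail.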

Most of the time, we do verification tests in production to see if we can scale up the number of transactions to a system and handle the number of transactions a business wants us to handle at a certain service level. It's both for verification and to understand where the slowness occurs and how it is replicated throughout the different services.

We can have full fidelity and totality of the information in the tool, and we don't need to think about the big variations of values. We can assess and see all the data. Without the solution's trace search and analytics feature, you will be completely blind. It's critical as it is about visibility and understanding your service.

Splunk APM offers end-to-end visibility across our environment because we use it to coexist with both synthetic monitoring and real user monitoring. What we miss today is the correlation to logs. We can connect to Splunk Cloud, but we are missing the role-based access control to the logs so that each user can see their related logs.

Visualizing and troubleshooting our cloud-native environment with Splunk APM is easy. A lot of out-of-the-box knowledge is available that is preset for looking at certain standard data sets. That's not only for APM but also for the available pre-built dashboards.

We are able to use distributed tracing with Splunk APM, and it is for the totality of our landscape. A lot of different teams can coexist and work with the same type of data and easily correlate with other systems' data. So, it's a platform for us to collaborate and explore together.

We use Splunk APM Trace Analyzer to better understand where the errors originate and the root cause of the errors. We use it to understand whether we are looking at the symptom or the real root cause. We identify which services have the problem and understand what is caused by code errors.

The Splunk Observability Cloud as a platform has improved over time. It allows us to use profiling together with Splunk Distribution of OpenTelemetry Collector, which provides a lot of insights into our applications and metadata. The tool is now a part of our natural workbench of different tools, and it's being used within the organization as part of the process. It is the tool that we use to troubleshoot and understand.

Our organization's telemetry data is interesting, not only from an IT operational perspective but also to understand how the tools are being used and how they have been providing value for the business. It is a multifaceted view of the data we have, and it is being generated and collected by the solution.

Splunk APM has helped reduce our mean time to resolve. Something that used to take 2-3 weeks to troubleshoot is now done within hours. Splunk APM has also freed up some resources when we troubleshoot. If we spend a lot of time troubleshooting something and can't find the problem, we cannot close the ticket saying there's no resolution. With Splunk APM, we can now know for sure where we have the problem rather than just ignoring it.

Splunk APM has saved our organization around 25% to 30% time. It's a little bit about moving away from firefighting to be preventive and estimate more for the future. That's why we are using it for performance. The solution allows us to help and support the organization during peak hours and be preventative with the bottlenecks rather than identify them afterward.

Around 5-10 people were involved in the solution's initial deployment. The solution is not part of the developers' IDE environment, and it's not tightly connected to our existing DevOps tools. We have both subdomains and teams structured. Normally, they also compartmentalize the environment, and we use the solution in different environments.

Splunk APM requires some life cycle management, which is natural. In general, once you have set it up, you don't need to put much effort into it. I would recommend Splunk APM to other users. That is mainly due to how you collaborate with the data and do not isolate it. There is a huge advantage with Splunk. We are currently using Splunk, Sentry, and New Relic, and part of our tool strategy is to move to Splunk.

As a consumer, you need to consider whether you are going to rely on OpenTelemetry as part of your standard observability framework. If that is the case, you should go for Splunk because Splunk is built on OpenTelemetry principles.

Compared to other tools using proprietary agents and proprietary techniques, you may have more insights into some implementations. However, you will have a tighter vendor lock-in, and you won't have the portability of the back end. If you rely on OpenTelemetry, then Splunk is the tool for you.

Overall, I rate the solution a 9 out of 10.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
GauravGupta6 - PeerSpot reviewer
Project Team Lead - Devops at a financial services firm with 10,001+ employees
Real User
Top 10
Provides great visibility, analysis, and data telemetry
Pros and Cons
  • "Detectors are a powerful feature."
  • "We currently lack log analysis capabilities in Splunk APM."

What is our primary use case?

We use Splunk APM to monitor the performance of our applications.

How has it helped my organization?

Splunk APM offers end-to-end visibility across our entire environment. We need to control how many types of metrics are ingested by Splunk APM from all incoming requests. While we allow some metrics to be collected, Splunk APM provides the ability to track each request from its starting point to its endpoint at every stage.

Splunk APM trace analyzer allows us to analyze a request by providing its trace ID. This trace ID gives us a detailed breakdown of how the request entered the system, how many services it interacted with along the way, and its overall path within the system. We can also identify any errors that occurred during the request's processing and track any slowness or latency issues. This information is very helpful for troubleshooting performance problems in our application.
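The kind of breakdown described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the Trace Analyzer itself: the span dictionaries, their field names, and the `summarize_trace` helper are assumptions made for the example.

```python
def summarize_trace(spans, trace_id):
    """Return the service path, failed operations, and slowest span of one trace."""
    selected = sorted(
        (s for s in spans if s["trace_id"] == trace_id),
        key=lambda s: s["start_ms"],
    )
    if not selected:
        raise KeyError(f"no spans found for trace {trace_id!r}")

    # Services in the order the request first reached them.
    path = []
    for s in selected:
        if s["service"] not in path:
            path.append(s["service"])

    # Self time = own duration minus time spent in direct child spans,
    # so the root span does not always look like the slowest one.
    child_ms = {}
    for s in selected:
        if s["parent_id"] is not None:
            child_ms[s["parent_id"]] = child_ms.get(s["parent_id"], 0.0) + s["duration_ms"]
    slowest = max(selected, key=lambda s: s["duration_ms"] - child_ms.get(s["span_id"], 0.0))

    return {
        "path": path,
        "errors": [s["operation"] for s in selected if s["error"]],
        "slowest": (slowest["service"], slowest["operation"]),
    }


# Made-up spans for illustration only.
SAMPLE_TRACE_SPANS = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "service": "frontend",
     "operation": "GET /cart", "start_ms": 0, "duration_ms": 120.0, "error": False},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "service": "cart",
     "operation": "load_cart", "start_ms": 5, "duration_ms": 100.0, "error": False},
    {"trace_id": "t1", "span_id": "c", "parent_id": "b", "service": "db",
     "operation": "SELECT", "start_ms": 10, "duration_ms": 80.0, "error": True},
    {"trace_id": "t2", "span_id": "d", "parent_id": None, "service": "frontend",
     "operation": "GET /", "start_ms": 0, "duration_ms": 10.0, "error": False},
]

if __name__ == "__main__":
    print(summarize_trace(SAMPLE_TRACE_SPANS, "t1"))
```

Using self time rather than raw duration is a design choice in this sketch: a parent span always lasts at least as long as its children, so raw duration would usually point at the entry point rather than at the step where the time was actually spent.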

Splunk APM telemetry data has been incredibly valuable. While we faced challenges with Splunk Enterprise, such as the lack of a trace analyzer, Splunk APM's user interface is modern and highly flexible. The wide range of data it provides has significantly improved our incident response times, allowing us to quickly create alerts and adhere to the infrastructure as code principle. Splunk APM also proves beneficial during load testing, contributing to a positive impact on our overall infrastructure performance analysis.

Splunk APM helps us reduce our mean time to resolution. With its fast and accurate alerting system, we can quickly identify the exact location of issues. This pinpoint accuracy streamlines the investigation process, leading to faster root-cause analysis.

Splunk APM has helped us save significant time. We're now spending less time resolving production incidents and analyzing performance data. This focus on Splunk APM allows us to dedicate more time to other areas.

What is most valuable?

Detectors are a powerful feature. They generate SignalFlow code, Splunk's own analytics language. For example, if we select five conditions, the detector can automatically generate the SignalFlow code for them. This code can then be directly integrated into our Terraform modules, streamlining the creation of detectors using Terraform. This is particularly helpful because our infrastructure adheres to a well-defined practice, and detectors help automate this process.
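To give a rough idea of what such generated detector code looks like, the Python sketch below assembles a small SignalFlow-style program as text. It is only an illustration under stated assumptions: the `render_detector_program` helper is hypothetical, the metric name and thresholds are made up, and the exact program the Splunk UI generates for a given set of conditions may differ.

```python
def render_detector_program(metric, rules):
    """Build SignalFlow-style program text for simple threshold rules.

    `rules` is a list of (label, threshold, lasting) tuples, for example
    ("CPU high", 90, "5m"). Hypothetical helper for illustration only.
    """
    lines = [f"signal = data('{metric}').publish(label='signal')"]
    for label, threshold, lasting in rules:
        # One detect() statement per condition; the published label names the rule.
        lines.append(
            f"detect(when(signal > {threshold}, lasting='{lasting}')).publish('{label}')"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed metric name and thresholds, chosen only for the example.
    print(render_detector_program(
        "cpu.utilization",
        [("CPU warning", 80, "10m"), ("CPU critical", 95, "5m")],
    ))
```

Keeping the program as plain text is what makes it easy to store alongside infrastructure code, which matches the workflow the reviewer describes.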

APM dashboards are another valuable tool. They provide more comprehensive information than traditional spotlights. One particularly useful feature is the breakdown of a trace ID. This breakdown allows us to see the entire journey of a request, including where it originated, any slowdowns it encountered, and any issues it faced. This level of detail enables us to track down the root cause of performance problems for every request.

What needs improvement?

We currently lack log analysis capabilities in Splunk APM. Implementing this functionality would be very beneficial. With log analysis, we could eliminate our dependence on Splunk Enterprise and rely solely on APM. The user interface design of APM seems intuitive, which would likely simplify setting up log-level alerts. Currently, all log-level alerting is done through Splunk Enterprise, while infrastructure-level alerting has already transitioned to Splunk APM.

The Splunk APM documentation on the official Splunk website could benefit from additional resources. Specifically, including more examples of adapter creation and management using real-world use cases would be helpful. During our setup process, we found the documentation lacked specific implementation details. While some general information was available on public platforms like Google and YouTube, it wasn't comprehensive. This suggests that others using Splunk APM in the future might face similar challenges due to the limited information available on social media. It's important to remember that many users rely on social media for setup guidance these days.

For how long have I used the solution?

I have been using Splunk APM for 1.5 years.

What do I think about the stability of the solution?

While Splunk APM occasionally experiences slowdowns, it recovers on its own. Fortunately, these haven't resulted in major incidents because most maintenance is scheduled for weekends, with ample notice provided in advance. We have never experienced any data loss during these slowdowns.

How are customer service and support?

Splunk APM customer support is helpful. They promptly acknowledge requests and provide regular updates. They've been able to fulfill all our information requests so far. However, Splunk APM is a constantly evolving product. This means there are some limitations due to ongoing industry advancements. They are actively working on incorporating customer feedback, such as the CV request. Overall, the customer support is excellent, but the desired features may not all be available yet.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

Previously, we used Grafana, but we faced challenges that led us to switch to Splunk APM. Since then, Splunk has become our primary tool for data analysis. In our experience, Splunk offers several advantages over Grafana. Setting up and using Splunk is significantly easier than Grafana. Splunk provides a user-friendly interface that allows anyone to start working immediately, while Grafana's setup can be more complex. Splunk also boasts superior reliability. Its architecture utilizes a master-slave node structure, with the ability to cluster for redundancy. This ensures that if a node goes down, another available node automatically takes over, minimizing downtime. Ultimately, our decision to switch to Splunk was driven by several factors: user-friendliness, a wider range of features, cost-effectiveness, and its established reputation. Splunk is a globally recognized and widely used tool, which suggests a higher level of trust and support from the industry.

We use Splunk Enterprise and Splunk APM. Splunk APM offers a comprehensive view of various application elements. We primarily migrated to APM to gain application-level metrics. This includes latency issues, which are delays in processing user requests. Splunk APM generates a unique trace ID for each user request. This allows us to track the request from the user to our servers and identify any delays or errors that occur along the way. 
Additionally, Splunk APM utilizes detectors to create alerts based on specific metrics. We've implemented alerts for CPU and memory usage, common issues in our Kubernetes infrastructure. We can also track container restarts within the cluster and pinpoint the causes. Another crucial area for us is subscription latency. Splunk APM allows us to monitor this metric and identify any performance bottlenecks. This capability was absent in Splunk Enterprise, necessitating the switch to APM. Furthermore, Splunk APM enables us to track application status codes, such as 404 errors.

Splunk APM facilitates the creation of informative dashboards using collected metrics. Additionally, the Metrics Explorer tool allows us to investigate specific metrics of interest and generate alerts or customized spotlights. 
Spotlights are tailored visualizations that track metrics for critical application areas. They can trigger alerts based on unexpected changes, such as a sudden increase in error codes over a set timeframe. This provides a more proactive approach to identifying potential issues compared to traditional detector-based alerts.
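The idea of alerting on "a sudden increase in error codes over a set timeframe" can be illustrated with a generic sliding-window check. The Python sketch below is not how Splunk evaluates such alerts; the `error_spike` function, its parameters, and its default values are assumptions chosen for the example.

```python
def error_spike(timestamps, now, window_s=300, factor=2.0, min_errors=5):
    """Return True when errors in the latest window jump versus the window before it.

    `timestamps` are error times in seconds. A spike means at least `min_errors`
    errors in the latest window and more than `factor` times the previous count.
    """
    recent = sum(1 for t in timestamps if now - window_s < t <= now)
    previous = sum(1 for t in timestamps if now - 2 * window_s < t <= now - window_s)
    return recent >= min_errors and recent > factor * previous


if __name__ == "__main__":
    # Two errors in the earlier window, ten in the latest one.
    burst = [100, 200] + [310 + 10 * i for i in range(10)]
    print(error_spike(burst, now=600))
```

The `min_errors` floor keeps a jump from one error to three from firing an alert, which is a common way to reduce noise on low-traffic services.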

Splunk APM empowers us to effectively monitor various metrics during load testing. This includes analyzing memory usage across ten to eleven metrics, tracking container restarts during flow testing, and verifying the functionality of auto scaling mechanisms. The comprehensive visualization capabilities of Splunk APM surpass those of Splunk Enterprise, making it ideal for analyzing large sets of metrics and graphs.

We're currently exploring the integration of an OpenTelemetry agent with Splunk APM. This will enable us to collect and transmit a wider range of data, including application metrics, latency metrics, and basic infrastructure metrics such as CPU, memory, etc.

How was the initial setup?

During the initial Splunk deployment, I found that most information available on social media platforms catered to enterprise deployments. Fortunately, many of our new hires had prior Splunk experience, which eased the initial learning curve. Splunk's widespread adoption across industries also meant there was a general familiarity with the tool among the team. Additionally, the comprehensive documentation proved helpful. Overall, the initial rollout went smoothly, though there were some challenges that we were able to resolve.

The Splunk deployment was done on multiple environments. We started with development and then deployed to a staging environment, which sits between development and production. As expected, the development deployment took the longest. The total time for the entire deployment, including my cloud setup, was 2 to 3 weeks. It's important to note that this timeframe isn't solely dependent on Splunk implementation. Other factors can influence the timeline, such as network requests, firewall changes, and coordination with IT teams for license purchases. While the development deployment took longer, promoting Splunk to the staging and production environments was significantly faster. It only took 1 week for each environment.

What about the implementation team?

Our cloud deployment didn't require a consultant, but we used one for our on-premise enterprise deployment, which was a bit more complex.

What other advice do I have?

I would rate Splunk APM 9 out of 10.

The maintenance required is minimal because the cluster deployment helps ensure there is always 1 node working.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Nagendra Nekkala. - PeerSpot reviewer
Senior Manager ICT & at Bangalore International Airport Limited
Real User
Top 5
Leaderboard
Is easy to use, provides great visibility, and reduces our resolution time
Pros and Cons
  • "The data collection from our VMs, containers, databases, and backend components is valuable."
  • "Splunk Infrastructure Monitoring's data analytics can be improved by including suggestions for various types of continuous monitoring."

What is our primary use case?

We use Splunk Infrastructure Monitoring to monitor our hybrid infrastructure.

We implemented Splunk Infrastructure Monitoring to help us monitor our infrastructure as we scale.

How has it helped my organization?

Splunk Infrastructure Monitoring is easy to use. It helps us quickly analyze how our infrastructure is performing across various services.

It helps with proper log management, allowing us to monitor our systems and analyze log data regularly. It also provides security operations capabilities for monitoring system health and ensuring uptime. We noticed these benefits immediately.

Our operational efficiency has been increased. It has improved our system health by monitoring the performance of data on servers, virtual machines, and containers, along with overall background processes.

Splunk Infrastructure Monitoring provides end-to-end visibility into our cloud-native environment. This is crucial because any data corruption can impact all the information we've deployed. It also aids in log management, offering parameters that extend its functionality as a comprehensive monitoring tool for CPU, memory usage, and network traffic.

It has helped reduce our mean time to detect by four hours. Our mean time to resolution has been reduced by two hours. By providing access to all our network parameters, it simplifies log ingestion through streamlined calculations.

Splunk Infrastructure Monitoring provides us with faster and more comprehensive insights into our infrastructure, allowing us to focus on critical business initiatives.

We saw the time to value immediately after deploying Splunk Infrastructure Monitoring.

What is most valuable?

The data collection from our VMs, containers, databases, and backend components is valuable.

What needs improvement?

Splunk Infrastructure Monitoring's data analytics can be improved by including suggestions for various types of continuous monitoring.

For how long have I used the solution?

I have been using Splunk Infrastructure Monitoring for three years.

What do I think about the stability of the solution?

The network uptime and monitoring are great.

What do I think about the scalability of the solution?

The scalability of Splunk Infrastructure Monitoring is excellent.

How are customer service and support?

The technical support is good.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We previously used Datadog, but it doesn't offer network monitoring features like CPU utilization or overall server performance, which Splunk Infrastructure Monitoring does, so we switched.

Splunk Infrastructure Monitoring offers more functionality and visibility, making it a better choice for handling cloud architecture compared to Datadog.

How was the initial setup?

The initial setup was straightforward. One person was required for the deployment.

What other advice do I have?

I would rate Splunk Infrastructure Monitoring 9 out of 10.

Splunk Infrastructure Monitoring offers automated, continuous monitoring and diagnostics, delivering real-time reports for all your data with enhanced functionality compared to other solutions.

We have 200 users of Splunk Infrastructure Monitoring.

Splunk Infrastructure Monitoring is the best solution for monitoring networks, parameters, CPU, memory usage, and network traffic cases. 

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Nagendra Nekkala. - PeerSpot reviewer
Senior Manager ICT & at Bangalore International Airport Limited
Real User
Top 5
Leaderboard
Offers end-to-end visibility, real-time monitoring, and distributed tracing, enabling organizations to optimize application performance and troubleshoot issues efficiently
Pros and Cons
  • "The most valuable features are troubleshooting and optimizing application performance."
  • "It is essential for the monitoring tool to deliver quick response times when generating analytical reports, instead of prolonged delays."

What is our primary use case?

I use it for monitoring and troubleshooting the performance of cloud-native applications.

How has it helped my organization?

Providing comprehensive visibility throughout the environment, it monitors my system, enhances career performance, and offers insights into the user experience.

Troubleshooting and visualizing a cloud-native environment is made easy with Splunk APM. It provides complete visibility into software tools, swiftly monitoring business performance and applications.

It possesses the capability to conduct distributed tracing within our environment. This includes monitoring the speed of tracked access, extending from end users to the Internet, system, and network services, and supporting my software application. Consequently, it offers an end-to-end overview of potential bottlenecks.

Splunk APM has significantly enhanced our organizational efficiency. Initially, my responsibilities included tracking website application performance, managing applications, and handling license releases. Now, it provides real-time user monitoring, transforming the way I handle these tasks.

It significantly impacts our organization's telemetry data, improving operational performance and user experience. The platform provides insights into application performance and effective log management. Ensuring accurate tracking of all performance-related logs contributes to building up the application performance percentage with comprehensive data.

It contributed to a daily reduction of six hours in our mean time to resolve.

What is most valuable?

The most valuable features are troubleshooting and optimizing application performance. 

Another value lies in the resilience and quick recovery capabilities offered by the SIEM. It enables thorough monitoring across our landscape, providing insights into the number of running software applications. The tool furnishes comprehensive information across microservices, significantly enhancing our proficiency.

What needs improvement?

Enhancing system availability and optimizing service performance are crucial. It is essential for the monitoring tool to deliver quick response times when generating analytical reports, instead of prolonged delays.

For how long have I used the solution?

I have been using it for two years.

What do I think about the stability of the solution?

It provides good stability capabilities.

What do I think about the scalability of the solution?

It has the capacity to scale. There are approximately two hundred users and one administrator that use it.

How are customer service and support?

I would rate its customer service and support eight out of ten.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup was straightforward.

What about the implementation team?

The deployment process took six hours. During this time, a clear understanding was established regarding which technical applications—whether cloud-based, native, or others—needed monitoring and improved performance. These categories were identified in-house, with two individuals overseeing the process.

What was our ROI?

It allowed our IT staff to focus on other projects by freeing up their time. In total, it saved around four hours.

Which other solutions did I evaluate?

We evaluated Grafana.

What other advice do I have?

It can serve as an analytical application for enhancing performance, ensuring all dependencies are effectively addressed. Overall, I would rate it eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
reviewer2500098 - PeerSpot reviewer
Senior Cybersecurity Engineer at an energy/utilities company with 5,001-10,000 employees
Real User
Saves time and enables our teams to look at and troubleshoot issues themselves
Pros and Cons
  • "Dashboards help the application support teams to have a quick look at how their systems are running. It helps other teams as well."
  • "They can get more integration with a few more products. They can also update some of the dashboards that are in there now."

What is our primary use case?

We have a lot of applications that we monitor. We have a lot of hardware that runs on VMware. We monitor all of that as well.

How has it helped my organization?

Dashboards have been helpful because people can go and look for themselves how their systems are running. The requests for us to go look at something have gone down because people can go and do it themselves.

It is important for us that Splunk Infrastructure Monitoring has end-to-end visibility. Developers and those types of teams can look at and troubleshoot any kind of issues quickly.

Splunk Infrastructure Monitoring has helped reduce our mean time to resolve, but I do not know how much. We just help as needed, but for the most part, it is just the teams going in there and looking at things themselves.

Splunk Infrastructure Monitoring has helped improve our organization’s business resilience.

Different teams can see a lot of different aspects of what is going on. They can see network traffic. They can see applications, and they can see hardware peaks and performances. They can see everything they need.

We could see the value of Splunk Infrastructure Monitoring within a couple of weeks of implementing it.

What is most valuable?

Dashboards help the application support teams to have a quick look at how their systems are running. It helps other teams as well.

What needs improvement?

They can get more integration with a few more products.

They can also update some of the dashboards that are in there now.

It is pretty good in terms of the ability to predict, identify, and solve problems in real-time, but there is always room for improvement.

For how long have I used the solution?

I am in a new role. I have been there for two months. That is as long as I have been using it.

What do I think about the stability of the solution?

It is very stable. It is good.

What do I think about the scalability of the solution?

Its scalability is great.

How are customer service and support?

It is very good. I would rate them a nine out of ten. They are usually pretty helpful and knowledgeable.

How would you rate customer service and support?

Positive

How was the initial setup?

We have it on-prem, and we also have a cloud instance. Our cloud provider is AWS. We do not monitor multiple cloud environments.

Deploying it was pretty straightforward. We just had to make sure that we were getting the logs right and setting the apps right. That was pretty much it.

What was our ROI?

We have seen an ROI in terms of man-hours and less work for everyone.

What's my experience with pricing, setup cost, and licensing?

I have always used Splunk.

What other advice do I have?

I would rate Splunk Infrastructure Monitoring a ten out of ten. It is great. It is much better than a lot of other products, so it is definitely up there.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Computer Engineer at Fuse engineering
Real User
Top 20
Provides good metrics, scales well, and has good support
Pros and Cons
  • "I have primarily used it to go back into the past and understand why something happened. It provides enough information to do research and figure things out."
  • "One thing I recently ran into was that the logs on the server most often get Gzipped after they have been rotated. We found that we were not monitoring some of the things, so we had to go back and pull them in. Right now, it pulls one at a time, untars it, or unzips it, so I cannot look at the entire history. There can be an improvement in that area."

What is our primary use case?

We are monitoring our servers and their health. We are monitoring their functionality and supporting the Kubernetes platform.

How has it helped my organization?

Our team supports multiple different projects. They all have their own clusters and ways of operating, but we just use one Splunk Infrastructure Monitoring system.

Splunk Infrastructure Monitoring has helped improve our organization’s business resilience.

What is most valuable?

I have primarily used it to go back into the past and understand why something happened. It provides enough information to do research and figure things out.

What needs improvement?

One thing I recently ran into was that the logs on the server most often get Gzipped after they have been rotated. We found that we were not monitoring some of the things, so we had to go back and pull them in. Right now, it pulls one at a time, untars it, or unzips it, so I cannot look at the entire history. There can be an improvement in that area.
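To illustrate the kind of whole-history read described above, here is a minimal Python sketch that works outside of Splunk. It does not show how Splunk ingests files. It assumes logrotate-style names (`app.log`, `app.log.1`, `app.log.2.gz`, where a higher number means an older file); the function names and the file layout are hypothetical.

```python
import gzip
import re
from pathlib import Path
from typing import Iterator


def rotation_index(path: Path, base_name: str) -> int:
    """Return 0 for the live log, N for base_name.N or base_name.N.gz, -1 otherwise."""
    pattern = re.escape(base_name) + r"(?:\.(\d+))?(?:\.gz)?"
    match = re.fullmatch(pattern, path.name)
    if match is None:
        return -1
    return int(match.group(1) or 0)


def read_log_history(log_dir: Path, base_name: str) -> Iterator[str]:
    """Yield every line from oldest rotation to the live log, gzipped or not."""
    candidates = [p for p in log_dir.iterdir() if rotation_index(p, base_name) >= 0]
    # Assumption: a higher rotation number means an older file, so read it first.
    ordered = sorted(candidates, key=lambda p: rotation_index(p, base_name), reverse=True)
    for path in ordered:
        # gzip.open in text mode decompresses on the fly; no temporary files needed.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")
```

Because the generator streams one line at a time, the full history can be searched without unpacking each archive to disk first.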

For how long have I used the solution?

I have been using Splunk Infrastructure Monitoring for four years.

What do I think about the stability of the solution?

It is stable.

What do I think about the scalability of the solution?

About a year ago, we added another 600 servers and scaled up. We are getting more later this year or next year. It works smoothly.

How are customer service and support?

They are good. I have a ticket open now. I told them to go ahead and close it because we thought it was a hardware issue, but they said that they would keep the case open till the hardware replacement to see if the issue goes away. That was pretty nice.

Which solution did I use previously and why did I switch?

All of our hardware is HPE-based. We rely mostly on OneView, but it does not give us the service aggregation and other things that Splunk Infrastructure Monitoring is giving us.

How was the initial setup?

A gentleman from another team joined ours. He is very knowledgeable about Splunk, so he helped with the implementation.

All of our servers are RHEL-based.

Which other solutions did I evaluate?

A different group within our organization had Splunk, and they liked it, so we just went with Splunk.

What other advice do I have?

I would rate Splunk Infrastructure Monitoring a ten out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Rodney Riettie - PeerSpot reviewer
Software Engineer at a healthcare company with 10,001+ employees
Real User
Top 20
Helps to ingest a massive amount of raw data and use it effectively
Pros and Cons
  • "The most valuable thing that we have seen within our group is the ability to ingest all this raw data and have it organized in a certain way so that different groups can get effective alerting from this massive amount of raw data that is out there."
  • "A lot of customers had a hard time effectively searching within the data in Splunk. There is a learning curve from searches to indexes and using all the macros that we have created. It is a little difficult for somebody who has not used it quite a bit and does not have a lot of practice with it, but the AI features that we have been hearing about through Splunk will make it a lot easier for us to use human language to search this data. That is big. That is pretty powerful, and that will help a lot with our customers."

What is our primary use case?

We mainly use it for different divisions and departments within our company to keep track of our systems' health. We also ingest log files to get data and alerts for different groups.

How has it helped my organization?

We used to use a number of different tools before we were introduced to Splunk. We used to have a very hard time getting this data in and being able to effectively use it because we had such a massive amount of data. We also could not find a way to organize it effectively. Splunk helped us to effectively use all the data that we collect in a valuable way for different customers and groups that we have in our company.

It has definitely helped reduce our mean time to resolve (MTTR). A lot of our customers have difficulty getting to root cause analysis of different problems and situations. They also do not have the data to perform analytical responses for different problems that there could be within our industry. They are now able to use this data effectively, not just for alerting, but also for preventative maintenance.

It has definitely improved our organization’s business resiliency by a lot. I do not have the actual data to share at this time, but there has been a marked improvement in the organization. We are now able to keep track of all the raw data that we pull in and then use it effectively. This helps our organization run more efficiently.

It has improved our organization's ability to predict, identify, and solve problems in real time. We are able to use data and search for it effectively. We have different analytical forms and data that we can use to improve in different ways. 

What is most valuable?

The most valuable thing that we have seen within our group is the ability to ingest all this raw data and have it organized in a certain way so that different groups can get effective alerting from this massive amount of raw data that is out there.

What needs improvement?

A lot of customers had a hard time effectively searching within the data in Splunk. There is a learning curve from searches to indexes and using all the macros that we have created. It is a little difficult for somebody who has not used it quite a bit and does not have a lot of practice with it, but the AI features that we have been hearing about through Splunk will make it a lot easier for us to use human language to search this data. That is big. That is pretty powerful, and that will help a lot with our customers. At the Splunk conference, some of the talks have been about the AI platform and more effective and easier ways to search within Splunk through indexes and other things. These features will help address some of the difficulties we are having with some of our customers.

For how long have I used the solution?

We have been using this solution for about four years.

What do I think about the stability of the solution?

We are not on the cloud. We are all on-prem. We have had certain issues with space on the servers and things like that while scaling up to what we need, but we have not had any issues on the Splunk side.

How are customer service and support?

It is great. We have not had any major issues with getting support from Splunk. With our monthly license, there is a certain number of hours that we have with Splunk support. We are able to use them when we are getting close to the end of the month. In our meetings, we make a list of different topics that we would like to explore and discuss with Splunk. We create meetings for that, and they are always very helpful. We never had any issues in getting support from Splunk. I would rate their support a ten out of ten.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used to use Tivoli. We also use AppDynamics in addition to Splunk for different parts, but we are starting to learn that Splunk has a lot of similar toolsets. Splunk does the same things that AppDynamics does, and in some cases, it has more powerful toolsets that would help us. We are thinking of paring down our different tools to get to one tool, possibly Splunk. We already got rid of Tivoli, and we are using Splunk fully in place of Tivoli. We have seen a positive response to it.

We have seen cost efficiencies by switching to this solution. Because of the wider range of tools that Splunk offers, we were able to get rid of Tivoli and get rid of that licensing obligation on an annual basis. We are able to save a good amount of money on that and move that budget over to our Splunk budget to keep everything under one umbrella.

How was the initial setup?

I was not involved in its deployment. I came on the year after.

We are currently on-prem, but we are working on developing and moving everything over to a Google Cloud platform. The announcement that Splunk is partnering with Google Cloud, in addition to AWS, is pretty good for us because we are working on moving over to the cloud in the next couple of years.

What was our ROI?

We have definitely seen an ROI. Our team is able to spend more time learning one tool as opposed to having to learn multiple different toolsets. Therefore, we are able to get more work done in a more efficient manner.

We have seen time to value using this solution. Our company has a very heavy push toward work-life management. Since we have been able to, especially in our group, switch to this tool, we could cut down on our on-call time and have our groups run on different patterns where people who are off are actually off. They do not have to be called in because essentially, everybody is able to access the tool and use it effectively because it is the one tool that we use as opposed to having different tool sets. Everybody knows how to use it, so it definitely has helped us in that way.

Which other solutions did I evaluate?

I know there was a panel and a team that was going through different tools. I was not a part of that process, but I know there were quite a few tools that they were looking at. Splunk must have worked out better than everything else.

What other advice do I have?

I would rate Splunk Infrastructure Monitoring a ten out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Manish Arora - PeerSpot reviewer
Senior Client Partner at a tech consulting company with 1-10 employees
Real User
Top 10
Is easy to use, and improves performance, but does not monitor network devices
Pros and Cons
  • "The vibrant dashboards are valuable."
  • "The end-to-end visibility is lacking because Splunk cannot directly monitor network devices."

What is our primary use case?

Splunk Infrastructure Monitoring helps identify bottlenecks within the network domain, including issues related to server databases, application response times, and code. These problems can be resolved by our customers promptly.

How has it helped my organization?

It is easy to use. It offers a unique dashboard reporting tool referred to as "o11y" (pronounced "Ollie"), the common shorthand for observability. It's important to note that this product is agent-based only.

Splunk Infrastructure Monitoring helps improve the efficiency and performance of applications by up to 70 percent.

It has helped reduce our mean time to detect. It has helped to reduce our mean time to resolve by around 50 percent.

Splunk helps us focus on business-critical initiatives.

It integrates well with multiple sets of products.

What is most valuable?

The vibrant dashboards are valuable.

What needs improvement?

The main drawback of Splunk for network monitoring is its limited agent deployment. Splunk excels at collecting data from servers and databases where agents can be installed. However, it cannot directly monitor network devices, unlike Broadcom.

Broadcom offers Spectrum and Performance Management tools that primarily work on SNMP to collect data from network devices. Splunk doesn't have a directly comparable functionality for network devices.

While Splunk offers a wider range of data collection, including metrics, logs, and more, it can be more expensive. Splunk's licensing model is based on data volume (terabytes) rather than the number of devices. This can be costlier compared to Broadcom or similar tools, which often use device-based licensing.
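The difference between the two licensing models can be sketched as simple arithmetic. The Python below is only an illustration: the function names are hypothetical, and any prices passed in are placeholders chosen by the caller, not actual Splunk or Broadcom rates.

```python
def volume_based_cost(ingest_tb: float, price_per_tb: float) -> float:
    """Cost when billing scales with data volume (terabytes ingested)."""
    return ingest_tb * price_per_tb


def device_based_cost(device_count: int, price_per_device: float) -> float:
    """Cost when billing scales with the number of monitored devices."""
    return device_count * price_per_device


def cheaper_model(ingest_tb: float, price_per_tb: float,
                  device_count: int, price_per_device: float) -> str:
    """Name the cheaper model for one environment; all inputs are hypothetical."""
    volume = volume_based_cost(ingest_tb, price_per_tb)
    device = device_based_cost(device_count, price_per_device)
    if volume == device:
        return "equal"
    return "volume-based" if volume < device else "device-based"
```

Under volume-based pricing, a small number of devices that emit a lot of data can cost more than a large fleet of quiet ones, which is why filtering data before ingestion matters for license usage.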

The end-to-end visibility is lacking because Splunk cannot directly monitor network devices.

Broadcom provides a topology-based root cause analysis that is not available with Splunk.

For how long have I used the solution?

I have been using Splunk Infrastructure Monitoring for 10 years. 

What do I think about the stability of the solution?

Splunk Infrastructure Monitoring is stable. 

How was the initial setup?

Splunk deployment is simplified because it is cloud-based. The deployment takes no more than 15 days to complete.

What's my experience with pricing, setup cost, and licensing?

Splunk's infrastructure monitoring costs can be high because our billing is based on data volume measured in terabytes, rather than the number of devices being monitored.

Replacing legacy systems with Splunk could cost up to $200,000.

What other advice do I have?

I would rate Splunk Infrastructure Monitoring 7 out of 10.

The decision to move from another infrastructure monitoring solution to Splunk should be based on a customer's specific needs. While Splunk offers visually appealing dashboards and access to a wider range of data compared to Broadcom products, pricing can be a significant factor, especially in the Indian market.

Deploying Splunk for a customer can involve higher upfront infrastructure costs. This is because implementing Splunk effectively often requires writing custom queries to filter data and optimize license usage. While this approach minimizes licensing costs, it can be labor-intensive.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: My company has a business relationship with this vendor other than being a customer: partner
Buyer's Guide
Download our free Splunk Observability Cloud Report and get advice and tips from experienced pros sharing their opinions.
Updated: May 2025