What is our primary use case?
In our organization, we use Splunk Observability Cloud for real-time monitoring and troubleshooting of application and infrastructure performance. We track metrics such as CPU usage, memory, and latency, along with the health of the microservices that run our applications and products.
What is most valuable?
The best features of Splunk Observability Cloud are the high-level dashboards, which give clear visibility into our infrastructure and product, and the detailed traces, which show the request flow of our APIs and the communication between applications. From the detailed traces, we can see where our application fails, which lets us resolve incidents much more easily and has drastically reduced the MTTR of our application.
I find the out-of-the-box dashboards very helpful. The out-of-the-box content includes pre-built dashboards and detectors for common services and infrastructure components. We have not used them extensively or customized them much yet, but we do adjust them to our organization's needs, and we also adapt the detectors for alerting purposes.
I find the AI-powered analytics very helpful. We have also used other observability platforms such as SignalFx, where AI-powered analytics is not built into the application. Here, the AI provides intelligent insights, early anomaly detection, and pattern recognition, automatically alerting us to unusual application behavior before it turns into an incident or outage in production.
What needs improvement?
One area that has room for improvement is the pricing, which can be expensive at large data volumes. The pricing can also be unpredictable; if it were more predictable, the organization would be more comfortable with it. Additionally, I found the learning curve quite steep when I started using Splunk Observability Cloud, and it took me some time to learn it. I also think that while our team is large enough to make use of it, smaller teams might not prefer this solution.
We have not yet started customizing Splunk Observability Cloud in depth for our needs, but we plan to in the coming weeks. So far we have used only the basic customization features, and I believe the platform is customizable.
For how long have I used the solution?
I have been using Splunk Observability Cloud for the past year, including the last three to four months at my current organization, which I joined recently.
What do I think about the stability of the solution?
The stability and reliability of Splunk Observability Cloud are top-notch, as we have faced very little downtime, so I would rate it a nine.
What do I think about the scalability of the solution?
The scalability of Splunk Observability Cloud is also very good; we can ingest any data we desire, so I would rate that nine as well.
How are customer service and support?
The technical support is very proactive, and our questions are resolved properly, so I would give it a rating of five.
Which solution did I use previously and why did I switch?
Before using Splunk Observability Cloud, we used SignalFx. We chose Splunk Observability Cloud because of its broad feature set, the visibility we gain from the dashboards, the AI integrated into the platform, the detailed traces, and the logging capabilities.
How was the initial setup?
I was not directly involved in the deployment, which was handled by other developers and ops engineers in my organization, but from what I know the initial setup for Splunk Observability Cloud is simple and very easy.
What about the implementation team?
The deployment was handled by other developers and ops engineers in my organization.
What was our ROI?
From an ROI perspective, Splunk Observability Cloud offers high value: our MTTR has dropped by more than 50%, which decreases the overall downtime of our application. When there is an outage, the time to resolve it is shorter, and application uptime has increased as a result. Decreasing application downtime was our main reason for adopting Splunk Observability Cloud. Additionally, the visibility provided by the dashboards helps us understand where our application has failed.
Which other solutions did I evaluate?
We considered vendors such as Datadog and New Relic. While both are also good, we found Splunk Observability Cloud better in certain areas, and we chose it for its broad feature set, the visibility we gain from the dashboards, the AI integrated into the platform, the detailed traces, and the logging capabilities.
What other advice do I have?
I have not used the no-sample tracing feature yet, so I am not sure about that.
I would say it takes around one month to learn Splunk Observability Cloud; it varies from person to person, but that was my experience in learning all the features and use cases our organization employs.
Our company is not deeply involved in LLMs and GPUs for AI applications; our applications mainly run as normal Java processes on standard servers, not on GPUs or with LLMs yet. We plan to develop our AI capabilities later on.
We run on standard servers in a cloud-based setup. The solution still has some drawbacks: the pricing model is complex and may not suit smaller teams, and the learning curve is steep, particularly for the SignalFlow query language.
My advice for anyone considering this solution is to opt for Splunk Observability Cloud without any hesitation, as it can drastically decrease the mean time to resolution and mean time to detect any issues in their applications. The overall visibility of the organization, including application usage and memory metrics, is clearly presented on the dashboard, allowing insights into what went wrong and when. Although the learning curve can be challenging initially, users will adapt and find it very beneficial for their organization.
I would describe the pricing as neither too high nor too low. However, a lower price would benefit us: because the product charges for data volume, tracking large datasets can be expensive for the organization, and it is especially costly when the data we are receiving is irrelevant.
Our organization has between 200 and 500 people, and I believe that more than 100 of them use Splunk Observability Cloud, including developers, ops engineers, security engineers, and others. I am not certain of the exact number, but it is definitely more than 50.
I would rate this product overall at a nine.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Disclosure: My company does not have a business relationship with this vendor other than being a customer.