
Datadog vs Gremlin Reliability Management Platform comparison

 

Comparison Buyer's Guide

Executive Summary

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

ROI

Datadog sentiment score: 6.5
Datadog boosts efficiency, reduces incident detection time, and enhances system reliability, improving resource utilization and operational performance.
Gremlin Reliability Management Platform sentiment score: 6.7
Gremlin Platform cut testing staff, reduced errors, improved uptime, and increased efficiency with 30% fewer production issues.
Previously we had thirteen contractors doing the monitoring for us, which is now reduced to only five.
IT Manager at Liberty Mutual Insurance
Datadog has delivered more than its value through reduced downtime, faster recovery, and infrastructure optimization.
Sr. Cloud Infrastructure Engineer at a tech vendor with 51-200 employees
We have also seen fewer escalations for minor issues because alerts help us catch problems earlier, which indirectly reduces downtime and improves overall efficiency.
Network Security Consultant at NTT DATA
We are seeing a return on investment from using Gremlin Reliability Management Platform because we are getting less production issues by thirty percent, as I mentioned earlier, making it a great investment.
DEVOPS specialist at a media company with 10,001+ employees
We do not need to look at all the day's metrics on Grafana dashboards; we run our chaos experiments in a production environment to see how reliable our product or service is.
DevOps & Mlops Engineer at a printing company with 1-10 employees
If we needed ten people to do tests once upon a time, now, using Gremlin Reliability Management Platform, we can do it with a fifty percent reduction in employees.
Senior Software Engineer at a sports company with 10,001+ employees
 

Customer Service

Datadog sentiment score: 6.7
Datadog support is generally reliable with strong documentation, but feedback varies regarding response time and communication effectiveness.
Gremlin Reliability Management Platform sentiment score: 8.4
Gremlin's customer support is highly praised for responsiveness, effective solutions, and valuable subscription models, enhancing overall customer satisfaction.
When I have additional questions, the ticket is updated with actual recommendations or suggestions pointing me in the correct direction.
Applications Web Services Technical Engineer at Ace Hardware
Overall, the entire Datadog comprehensive experience of support, onboarding, getting everything in there, and having a good line of feedback has been exceptional.
Systems Administrator at Townsquare Interactive
I've had a couple instances where I reached out to Datadog's support team, and they have been really super helpful and very kind, even reaching back out after resolving my issues to check if everything's going well.
Security Engineer at Invitation Homes
When I have questions or run into issues with Gremlin Reliability Management Platform, their support team is helpful and responsive.
DevOps & Mlops Engineer at a printing company with 1-10 employees
The expert partnership model is a significant strength I can suggest for Gremlin Reliability Management Platform.
VP Global at a tech vendor with 10,001+ employees
The customer support for Gremlin Reliability Management Platform is good overall.
DEVOPS specialist at a media company with 10,001+ employees
 

Scalability Issues

Datadog sentiment score: 7.6
Datadog excels in scalability and efficient workload handling, though managing log volumes is vital to control costs.
Gremlin Reliability Management Platform sentiment score: 7.7
Gremlin's platform scales well on AWS and GCP, smoothly supporting chaos experiments and larger teams with positive user experiences.
Datadog's scalability has been great as it has been able to grow with our needs.
IT Manager at Liberty Mutual Insurance
Since it is a SaaS platform, we did not have to worry about backend scaling.
Network Security Consultant at NTT DATA
We have not faced any major performance issues from the platform side; it handles increased metrics and monitoring loads smoothly.
Cyber Security Consultant at ProTechmanize
Gremlin Reliability Management Platform scales smoothly for running more chaos experiments, adding more services, or supporting a larger team.
DevOps & Mlops Engineer at a printing company with 1-10 employees
More than scalability, I thought about availability because it is a really important thing of the architecture tools.
Dev Ops To Development (IT) at a non-tech company with self employed
The scalability of Gremlin Reliability Management Platform depends on the scalability of the underlying infrastructure that we are hosting it on.
Senior Software Engineer at a sports company with 10,001+ employees
 

Stability Issues

Datadog sentiment score: 8.0
Datadog is stable and reliable with minimal downtime, efficiently resolving minor issues and ensuring high availability for users.
Gremlin Reliability Management Platform sentiment score: 9.2
The Gremlin Reliability Management Platform is praised for its stability and reliability, with users highlighting its dependable performance.
Metrics collection and alerting have been consistent in day-to-day use.
Cyber Security Consultant at ProTechmanize
Datadog is very stable, as there hasn't been any downtime or issues since I've been here, and it's always on time.
Security Engineer at Invitation Homes
Datadog seems stable in my experience without any downtime or reliability issues.
Full Stack Developer at Townsquare Interactive
I have not seen any downtime or issues with its behavior or performance.
Senior Software Engineer at a sports company with 10,001+ employees
 

Room For Improvement

Datadog users seek improved alert quality, cost management, integration, mobile support, and enhanced UI, UX, and training resources.
The Gremlin Reliability Management Platform requires AI enhancements, better integration, user-friendly features, and more educational resources to improve usability and value.
It would be great to see stronger AI-driven anomaly detection and predictive analytics to help identify potential issues before they impact performance.
Operations Manager at a financial services firm with 1,001-5,000 employees
We want to be able to customize the cost part, and we would appreciate more granular access control.
Service Manager at PwC
Having more transparent and granular cost control features would make it easier to manage usage.
Network Security Consultant at NTT DATA
I think it would be useful to have some integration with Splunk or other log collectors, or maybe in the future, the ability to link Dynatrace or any other observability platform.
Dev Ops To Development (IT) at a non-tech company with self employed
If we can integrate it with natural language, could we talk to Gremlin Reliability Management Platform and have it configure some of the basic settings so that non-technical persons can also work on Gremlin Reliability Management Platform-like tools?
DEVOPS specialist at a media company with 10,001+ employees
The user interface is great, the integration is smooth, and Gremlin Reliability Management Platform has a fantastic support team that helps us a lot in many cases.
DevOps & Mlops Engineer at a printing company with 1-10 employees
 

Setup Cost

Enterprise buyers should monitor data usage and understand Datadog's pricing to manage costs effectively and avoid unexpected expenses.
Enterprise buyers find Gremlin costly yet valuable for large-scale systems, though pricing and dashboard clarity vary by company.
The setup cost for Datadog is more than $100.
Senior Performance and Architecture Analyst at a manufacturing company with 10,001+ employees
Pricing is mainly based on data ingestion, such as logs, metrics, and traces, and it can increase quickly if everything is enabled by default.
Cyber Security Consultant at ProTechmanize
Everybody wants the agent installed, but we only have so many dollars to spread across, so it's been difficult for me to prioritize who will benefit from Datadog at this time.
Applications Web Services Technical Engineer at Ace Hardware
It is not so cheap, but it has very powerful features.
Dev Ops To Development (IT) at a non-tech company with self employed
From a pricing standpoint of view regarding Gremlin Reliability Management Platform, I would say it is a bit expensive, but that expense is worth it given the kind of benefits it offers.
VP Global at a tech vendor with 10,001+ employees
My role does not incur costs for us since we have an NFR for Gremlin Reliability Management Platform that we can use in our case.
DevOps & Mlops Engineer at a printing company with 1-10 employees
 

Valuable Features

Datadog excels in seamless cloud integration, customizable dashboards, and real-time monitoring, enhancing operational efficiency and troubleshooting capabilities.
Gremlin's platform improves reliability with automated tests, failure simulations, risk detection, flexibility, and measurable infrastructure resilience.
Our architecture is written in several languages, and one area where Datadog particularly shines is in providing first-class support for a multitude of programming languages.
Senior Software Engineer at Los Angeles Times Communications, LLC
Having all that associated analytics helps me in troubleshooting by not having to bounce around to other tools, which saves me a lot of time.
Senior Site Reliability Engineer at a wholesaler/distributor with 5,001-10,000 employees
Datadog was able to find the alerts and trigger to notify our team in a very prompt manner before it got worse, allowing us to promptly adjust and remediate the situation in time.
Security Engineer at Invitation Homes
There are really two pathways along: fewer incidents because with Gremlin Reliability Management Platform, we can make every part of the infrastructure more solid, and less downtime because we can test more architectures and then things like how to put in high availability clusters.
Dev Ops To Development (IT) at a non-tech company with self employed
We fix failures even before they occur, which is basically proactive risk detection and risk mitigation.
VP Global at a tech vendor with 10,001+ employees
The best feature that Gremlin Reliability Management Platform offers for me is the prebuilt reliability test; I think that is the best feature along with the automated scheduling.
DEVOPS specialist at a media company with 10,001+ employees
 

Categories and Ranking

Datadog
Ranking in Application Performance Monitoring (APM) and Observability: 1st
Ranking in IT Infrastructure Monitoring: 2nd
Average Rating: 8.6
Reviews Sentiment: 7.0
Number of Reviews: 210
Ranking in other categories: Network Monitoring Software (4th), Log Management (4th), Container Monitoring (3rd), Cloud Monitoring Software (1st), AIOps (1st), Cloud Security Posture Management (CSPM) (5th), AI Observability (1st)

Gremlin Reliability Managem...
Ranking in Application Performance Monitoring (APM) and Observability: 23rd
Ranking in IT Infrastructure Monitoring: 25th
Average Rating: 8.8
Reviews Sentiment: 7.0
Number of Reviews: 7
Ranking in other categories: DevSecOps (7th)
 

Mindshare comparison

As of May 2026, in the Application Performance Monitoring (APM) and Observability category, the mindshare of Datadog is 4.7%, down from 9.3% the previous year. The mindshare of Gremlin Reliability Management Platform is 0.1%. Mindshare is calculated from PeerSpot user engagement data.
Application Performance Monitoring (APM) and Observability Mindshare Distribution
Product | Mindshare (%)
Datadog | 4.7%
Gremlin Reliability Management Platform | 0.1%
Other | 95.2%
 

Featured Reviews

Dhroov Patel - PeerSpot reviewer
Site Reliability Engineer at Grainger
Has improved incident response with better root cause visibility and supports flexible on-call scheduling
Datadog needs to introduce more hard limits to cost. If we see a huge log spike, administrators should have more control over what happens to save costs. If a service starts logging extensively, I want the ability to automatically direct that log into the cheapest log bucket. This should be the case with many offerings. If we're seeing too much APM, we need to be aware of it and able to stop it rather than having administrators reach out to specific teams. Datadog has become significantly slower over the last year. They could improve performance at the risk of slowing down feature work. More resources need to go into Fleet Automation because we face many problems with things such as the Ansible role to install Datadog in non-containerized hosts. We mainly want to see performance improvements, less time spent looking at costs, the ability to trust that costs will stay reasonable, and an easier way to manage our agents. It is such a powerful tool with much potential on the horizon, but cost control, performance, and agent management need improvement. The main issues are with the administrative side rather than the actual application.
VL
Senior Software Engineer at a sports company with 10,001+ employees
Chaos experiments have revealed weak points and now provide controlled cost-saving tests
The best features of Gremlin Reliability Management Platform are the safe failure injection, which is crucial as we can simulate the failures in a manner that we know these are just dumping tests and not the actual issues. Whether it is the CPU spike or the memory exhaustion, or the network latency, or the server shutdown, server shutdown is one of the most favorite features that I have in Gremlin Reliability Management Platform.
The controlled blast radius is another standout feature. The controlled blast radius feature has helped my team in that we actually wanted to target only one specific container, our Docker containers that we deployed. It helped us to conduct tests in a very specific, isolated manner instead of launching a larger test or focusing on hundreds of servers at a time, resulting in very limited impact. Since ours is a very small team, we do not want to impact other servers. This controlled blast radius helped us to only focus on our servers and not impact any other team.
Gremlin Reliability Management Platform has positively impacted my organization because before Gremlin Reliability Management Platform, we did not even know how to conduct these chaos engineering tests. We heard about it, but we had no idea of how to do something of that nature. If there are ten servers, ten systems in our architecture and if suddenly something goes down, nobody knew what would happen next. We did not even know how to simulate these types of tests. This lack of confidence has been mitigated by using Gremlin Reliability Management Platform. Now we can confidently test and see which system is the most critical. If this goes down, what happens? How much business valuation are we going to impact? How much loss are we going to incur? All of this is now clearly visible and transparent. Since using Gremlin Reliability Management Platform, we were able to reduce the incidents by six percent after conducting our limited experiments. We were also able to increase the uptime from ninety-eight to ninety-nine, which represents a one percent increase in uptime.
894,738 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Datadog
Financial Services Firm: 14%
Computer Software Company: 9%
Manufacturing Company: 8%
Healthcare Company: 6%
Gremlin Reliability Management Platform
Printing Company: 11%
Construction Company: 11%
Financial Services Firm: 10%
Sports Company: 10%
 

Company Size

By reviewers
Datadog
Company Size | Count
Small Business | 82
Midsize Enterprise | 47
Large Enterprise | 100
Gremlin Reliability Management Platform
Company Size | Count
Small Business | 3
Large Enterprise | 6
 

Questions from the Community

Any advice about APM solutions?
There are many factors and we know little about your requirements (size of org, technology stack, management systems, the scope of implementation). Our goal was to consolidate APM and infra monitor...
Datadog vs ELK: which one is good in terms of performance, cost and efficiency?
With Datadog, we have near-live visibility across our entire platform. We have seen APM metrics impacted several times lately using the dashboards we have created with Datadog; they are very good c...
Which would you choose - Datadog or Dynatrace?
Our organization ran comparison tests to determine whether the Datadog or Dynatrace network monitoring software was the better fit for us. We decided to go with Dynatrace. Dynatrace offers network ...
What needs improvement with Gremlin Reliability Management Platform?
There are certain areas where I think Gremlin Reliability Management Platform can improve. I would certainly add features related to AI and GenAI for recommendations. While dependency identificatio...
What is your primary use case for Gremlin Reliability Management Platform?
The primary reason I am using Gremlin Reliability Management Platform is to proactively test failures, identify weaknesses in my system, and fix them before real incidents actually occur. From a pr...
What advice do you have for others considering Gremlin Reliability Management Platform?
I would certainly suggest others venture into Gremlin Reliability Management Platform, as there is no second thought about it. However, I would not recommend jumping straight into production chaos....
 

Sample Customers

Datadog: Adobe, Samsung, Facebook, HP Cloud Services, Electronic Arts, Salesforce, Stanford University, Citrix, Chef, Zendesk, Hearst Magazines, Spotify, Mercado Libre, Slashdot, Ziff Davis, PBS, MLS, The Motley Fool, Politico, Barneby's
Gremlin Reliability Management Platform: Information not available
Find out what your peers are saying about Datadog vs. Gremlin Reliability Management Platform and other solutions. Updated: April 2026.