Documentation Engineer at a tech vendor with 1,001-5,000 employees
Real User
Top 20
May 16, 2026
My main use case for Gremlin Reliability Management Platform is to see how our applications behave under extreme stress, and how resilient they are when we simulate a server crash combined with increased network latency. Recently, I used Gremlin Reliability Management Platform to create a simulation in which we significantly increased network latency and killed one or two containers or pods in our Kubernetes cluster. The results were impressive because they gave us insight into weaknesses such as poor failure-handling mechanisms and scaling issues. In addition to reliability testing, I also use Gremlin Reliability Management Platform for automated test scenarios, which let us run dependency loss tests, latency tests, and scalability checks for our applications. It also provides great disaster recovery validation checks and security certificate expiration checks.
The primary reason I use Gremlin Reliability Management Platform is to proactively test failures, identify weaknesses in my system, and fix them before real incidents occur. It helped me break through the fear of breaking things in production. With the safety nets we now have, my teams do not hesitate to push code into production. Additionally, our test program tightly controls the blast radius, which minimizes the impact on users, and it has increased the observability of our platform. Of the features, the one I rely on most often in my day-to-day work is dependency mapping and dependency testing. Dependency mapping automatically discovers service dependencies and tests scenarios such as dependency failures and increased latency. In a microservices architecture like mine, these failures cascade very easily. Using Gremlin Reliability Management Platform, I was able to ensure that cascading failures across those microservices are contained.
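To make the cascading-failure point concrete, here is a minimal sketch in Python. It is not Gremlin's dependency mapping and uses no Gremlin API; the service names and the hand-written graph are hypothetical, and the function simply walks the graph to list which services are directly or transitively affected when one dependency fails.

```python
from collections import deque

# Hypothetical service graph (invented names): each service maps to
# the services it calls.
DEPENDS_ON = {
    "web": ["cart", "search"],
    "cart": ["payments", "inventory"],
    "search": ["inventory"],
    "payments": [],
    "inventory": [],
}


def affected_by(failed, depends_on):
    """Return the set of services that directly or transitively depend on `failed`."""
    # Invert the edges so we can look up "who calls this service?".
    callers = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(service)

    # Breadth-first walk upward from the failed service.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


if __name__ == "__main__":
    print(sorted(affected_by("inventory", DEPENDS_ON)))
```

The size of the returned set is a rough stand-in for the blast radius of a single dependency failure in this toy graph; a real system would also need to account for retries, timeouts, and fallbacks.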
Performance Test Engineer at an educational organization with 51-200 employees
Real User
Top 10
Apr 14, 2026
My main use case for Gremlin Reliability Management Platform is to analyze failures in AWS. Company policy does not allow me to share details about how I do this, but I have used Gremlin Reliability Management Platform to find and test the Kubernetes deployment of the application we run on AWS. I have nothing else to add about my main use case, including how often I run these tests or what kind of results I typically look for. Gremlin Reliability Management Platform is deployed in our organization on the public cloud.
Senior Software Engineer at a sports company with 10,001+ employees
Real User
Top 20
Mar 11, 2026
My main use case for Gremlin Reliability Management Platform is chaos engineering, and Gremlin helped us a lot in orchestrating the tests. One experiment that really helped us was simulating a CPU spike on one of our servers, because it was hard for us to produce such a spike in production on demand. Many other experiments have helped us as well. Auto-scaling was one thing we wanted to see working. It was difficult for us to experiment with different auto-scaling strategies based on CPU utilization and to see whether they would automatically scale down. We wanted to watch it happen live because it directly affects the cost of our services in the cloud. Using Gremlin Reliability Management Platform, when we launched CPU spikes and then intentionally reduced the utilization of an API, we were able to see auto-scaling up and down. It helped us save a lot of costs and select the right instances.
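As a rough illustration of what a CPU-based scaling rule does during such an experiment, here is a minimal Python sketch. The review does not say which autoscaler was used, so this assumes a proportional rule of the kind documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric)); the function name, the bounds, and the sample numbers are hypothetical, and the sketch omits details such as tolerance bands and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas=1, max_replicas=10):
    """Proportional scaling rule, clamped to [min_replicas, max_replicas].

    Assumption: mirrors the basic HPA-style formula only; real autoscalers
    add tolerances, cooldowns, and multi-metric handling.
    """
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Injected CPU spike: utilization well above the target, so scale up.
    print(desired_replicas(4, 90, 60))
    # Load removed: utilization well below the target, so scale down.
    print(desired_replicas(4, 30, 60))
```

Comparing the replica count such a rule predicts with what the cluster actually does during a CPU experiment is one way to check whether scale-down really happens, which is the cost question raised above.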
Dev Ops To Development (IT) at a non-tech company with self employed
Real User
Top 10
Mar 2, 2026
My main use case for Gremlin Reliability Management Platform is chaos testing. I take my infrastructure and sabotage parts of it to see whether it still reaches its goal. I mainly try network or infrastructure attacks, and I run every one of them on Gremlin Reliability Management Platform. As a memorable incident, I found a lot of vulnerabilities in some SMTP servers and fixed them with the help of Gremlin Reliability Management Platform. It is interesting because Gremlin Reliability Management Platform is not a penetration testing tool, but by disrupting other parts of the infrastructure and then running other tests, it serves this purpose effectively.
DevOps & Mlops Engineer at a printing company with 1-10 employees
Real User
Top 20
Mar 2, 2026
My main use case for Gremlin Reliability Management Platform is chaos engineering for software. For example, we have a web service for which we need to know the reliability score. We ran chaos experiments against it, including network, black hole, CPU, and memory experiments, and then received a reliability score reflecting the service's reliability, especially in a production environment. Gremlin Reliability Management Platform is amazing with the reliability score. There is a built-in set of chaos engineering experiments that produces this score for your service. You run it on your service, and then you receive the reliability score from Gremlin Reliability Management Platform, along with insights on the issues and risks present in your service that you can examine and work on.
DEVOPS specialist at a media company with 10,001+ employees
Real User
Top 20
Feb 27, 2026
My main use case for Gremlin Reliability Management Platform is testing. We run a Kubernetes cluster on GCP, and we want to check our clusters, especially node reliability for the high-availability (HA) use case. Our cluster has multiple nodes with multiple tags, and we deploy different applications on different nodes to ensure that all the nodes are up. We use Gremlin Reliability Management Platform for chaos engineering to check those nodes in a pre-prod environment. Sometimes, we also check EC2 instances on Amazon.
Site Reliability Engineer at a tech services company with 10,001+ employees
Real User
Top 20
Dec 3, 2025
My main use case is the Enterprise Reliability Platform. As a quick example of how I use it to maintain reliability and efficiency, we have our own internal system to track and maintain both.
Gremlin Reliability Management Platform empowers organizations to proactively identify and mitigate potential failures. It enhances system resilience through controlled chaos engineering, aiding tech teams in delivering reliable services. Designed for tech-savvy users, Gremlin enables teams to implement chaos engineering effectively to ensure system reliability. It offers precise control over variables, allowing teams to simulate real-world scenarios and fortify system operations. Gremlin...