Datadog has flexibility.
System Engineer at Raymond James
A stable and scalable infrastructure monitoring solution
Pros and Cons
- "Datadog has flexibility."
- "The product needs to have a more enterprise approach to configuration."
What is most valuable?
What needs improvement?
The product needs to have a more enterprise approach to configuration.
For how long have I used the solution?
We use the tool to monitor our whole infrastructure. CPU, memory, and disk space are the types of things we use it for.
What do I think about the stability of the solution?
It is a stable solution.
Buyer's Guide
Datadog
June 2025

Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: June 2025.
856,873 professionals have used our research since 2012.
What do I think about the scalability of the solution?
It is a scalable solution.
How are customer service and support?
The technical support team is good and responsive.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup is not very easy, and the deployment took eight months. It took quite a few teams to get it all accomplished. I rate it a six out of ten.
What other advice do I have?
I rate the solution eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Software Engineer at a financial services firm with 10,001+ employees
Helpful support, good RUM monitoring, and nice dashboards
Pros and Cons
- "I really enjoy the RUM monitoring features of Datadog. It allows us to monitor user behavior in a way we couldn't before."
- "At times, it can be hard to generate metrics out of logs."
What is our primary use case?
We use it to monitor and alert our ECS instances as well as other AWS services, including DynamoDB, API Gateway, etc.
We have it connected to PagerDuty for alerting on all our cloud applications.
We also use custom RUM monitoring and synthetic tests for both our internal and public-facing websites.
For our cloud applications, we can use Datadog to define our SLOs and SLIs and to generate dashboards that are used to monitor SLOs and report them to our senior leadership.
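As a rough illustration of the SLI and SLO arithmetic behind dashboards like these, the sketch below computes a request-based SLI and the share of error budget still remaining. The function names, the 99.9% target, and the request counts are hypothetical; they are not taken from the reviewer's setup or from Datadog's API.

```python
def sli(good_events: int, total_events: int) -> float:
    """Fraction of events that met the objective (1.0 when there is no traffic)."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int, slo_target: float) -> float:
    """Share of the error budget still unspent, as a fraction (negative when overspent)."""
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        # A 100% target or zero traffic leaves no budget to spend.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


# Hypothetical numbers: 10,000 requests, 5 failures, 99.9% SLO target.
current_sli = sli(9_995, 10_000)
budget_left = error_budget_remaining(9_995, 10_000, slo_target=0.999)
```

With a 99.9% target, 10,000 requests allow roughly 10 failures, so 5 failures leave about half of the budget unspent.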
How has it helped my organization?
Datadog has significantly improved our cloud-native monitoring. CloudWatch doesn't have enough features to create robust, sustainable dashboards that present all the information in one aggregated place for a combination of applications, databases, and other services, including our UI applications.
RUM monitoring is also something we didn't have before Datadog. We had Splunk, which was a lot harder to set up than Datadog's custom RUM metrics and its dashboards.
What is most valuable?
I really enjoy the RUM monitoring features of Datadog. It allows us to monitor user behavior in a way we couldn't before.
It's useful to be able to obfuscate sensitive information by setting up custom RUM actions and blocking the default ones with too much data.
I also like being able to generate custom metrics and monitors by adding facets to existing logging. Datadog can parse logs well for that purpose. The primary method of error detection for our external website is synthetic tests. This is extremely valuable for us as we have a large user base.
What needs improvement?
At times, it can be hard to generate metrics out of logs. I've seen some of those break over time and produce flaky data.
Creating a monitor out of the metric and using it in a dashboard to generate our SLIs and SLOs has been hard, especially in cases where the data comes from nested logging facets.
For how long have I used the solution?
I've used the solution for two years.
What do I think about the stability of the solution?
The stability is pretty good.
What do I think about the scalability of the solution?
The solution is pretty scalable! However, it's hard to set up all the infra (Terraform code) required to link private links in Datadog to all of our different AWS accounts.
How are customer service and support?
They offer good support. Solutions are provided by the team when needed. For example, we had to delete all our RUM metrics when we accidentally logged sensitive data and the CTO of Datadog stepped in to help out and prioritize it at the time.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We previously used Splunk and some internal tools. We switched due to the fact that some cloud applications don't integrate well with pre-existing solutions.
How was the initial setup?
The initial setup for connecting our different AWS accounts via Datadog private link wasn't great. There was a lot of duplicate Terraform that had to be written. The dashboard setup is way easier.
What about the implementation team?
We installed it with the help of a vendor team.
What was our ROI?
Our return on investment is great and is so much better than with CloudWatch. We can easily integrate with PagerDuty for alerting.
What's my experience with pricing, setup cost, and licensing?
Our company set up the product for us, so the engineers didn't need to be involved with pricing.
The pricing structure isn't very clear to engineers.
Which other solutions did I evaluate?
We looked into Splunk and some internal tools.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Associate at a financial services firm with 10,001+ employees
Great for debugging with good UI and helpful filtering capabilities
Pros and Cons
- "It is easy to navigate the menu and create tests."
- "This service could be less costly."
What is our primary use case?
We use the product for recording loggers on our various services across different teams. For example, we use logs to keep track of info logs for events and error logs to catch exceptions.
When users ask us to investigate a situation, we use logs to keep track of events and where the user's code traveled to. We also use synthetic testing and monitoring features to keep track of our many alerts in the production and QA environments.
How has it helped my organization?
We use Datadog mainly for debugging purposes. For example, we use it to navigate where the code trace is when an issue arises due to its ability to search through the logs.
We also use it to address user queries. When users ask us a question concerning our codebase, we use Datadog to track the code stack and use time monitoring to get an idea of the time frame in which the use case happened.
What is most valuable?
The feature I have found to be the most valuable is the filtering feature in logs. It is really easy to type plus and minus to filter out different logs. I use it to navigate the noise.
I use synthetic tests as well. It is easy to navigate the menu and create tests.
Much of the UI is very straightforward, and I do appreciate the ability to search for any documentation on the various features when I need to as well. The DASH monitoring boards are nice to give an overview of various performances and allow us to track use cases.
What needs improvement?
This service could be less costly. Right now, we only keep 15 days' worth of logs since we want to be more economical in terms of cost. It would be nice if I had the option to monitor logs beyond 15 days. For APM traces, we only keep a year's worth of traces. The UI could be a little more straightforward as well. I found it to have too many options.
For how long have I used the solution?
I've used the solution for three years.
What do I think about the stability of the solution?
The stability is good.
What do I think about the scalability of the solution?
The scalability is good.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Senior Software Engineer at LeafLink
Good log stream with a useful APM and democratizes logs
Pros and Cons
- "Datadog's log aggregation is really helpful since it lets me and every other engineer on my team log in, view, and share logs when we need to debug our application."
- "The menu on the left is pretty dense (and I know it has to be). I never knew about the cmd+k functionality until recently. It would be helpful to offer more tips/cheat sheets to see handy shortcuts like that."
What is our primary use case?
We use Datadog to view and aggregate logs and monitor all of our services. We have a lot of running infrastructure and it is very convenient to have logs and metrics all aggregated somewhere we can view and chart them.
I use Datadog to create dashboards and runbooks, and sharable graphs, which really help out my whole team. We mostly use logs and APM, yet have been starting to use other products. I would like to use more synthetic monitors.
How has it helped my organization?
It has democratized our logs and metrics, allowing all engineers to have insight into how our apps perform. It is also extremely helpful when debugging issues.
It would be very difficult to debug issues without aggregated logs and APM traces.
It has also definitely saved us some money since we can keep an eye on our running infrastructure in an easy-to-see way, rather than a less friendly CLI. It has been a very big help!
What is most valuable?
The log stream has been the most useful thing. Having so many logs on so many different running containers means it is very inconvenient to view them individually. Datadog's log aggregation is really helpful since it lets me and every other engineer on my team log in, view, and share logs when we need to debug our application.
APM has also been extremely helpful for debugging issues and profiling and optimizing our apps. Dashboards have also been really helpful for communicating needs and priorities to engineering leadership.
It is very easy to get buy-in with graphs to back things up.
What needs improvement?
I recently saw the education, and it is amazing. Events like DASH are extremely helpful in understanding the deep set of features. Anything that helps to educate users is a huge win here.
The menu on the left is pretty dense (and I know it has to be). I never knew about the cmd+k functionality until recently. It would be helpful to offer more tips/cheat sheets to see handy shortcuts like that.
For how long have I used the solution?
I've used the solution for three years.
Which solution did I use previously and why did I switch?
We previously used AWS CloudWatch logs. It was way less friendly and less fully featured.
How was the initial setup?
The solution is pretty straightforward to set up. It helps with logs and metrics, and the AWS integration is really great.
What about the implementation team?
We handled the implementation in-house.
What other advice do I have?
It is hard to educate an entire team. There is a big learning curve.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
API Developer at a tech services company with 501-1,000 employees
Good monitoring, logging, and alert features
Pros and Cons
- "Thanks to the logs, we manage to make better reports through Jira and also to trace requests more easily than we otherwise could."
- "When the logs are too big, and Datadog splits them, the JSON format breaks and it is not so useful for us."
What is our primary use case?
We use the solution for monitoring, logging, and alerts.
Thanks to Datadog, we report errors using the logger integrated into our services, which is crucial since we only do unit tests. The infrastructure team handles the monitoring part, so I can't give more insights about that. I am an API developer, so I use Datadog mainly for logging.
The alerts are connected to Microsoft Teams in a specific channel, and we pay a lot of attention to it, and we usually create tickets based on these alerts.
How has it helped my organization?
Thanks to the logs, we manage to make better reports through Jira and also to trace requests more easily than we otherwise could.
Since there are many teams in my company, being able to share the trace of an error, together with all the information from the log, saves us a lot of time in communication between everyone.
What is most valuable?
The most valuable feature for me so far is logging. We do not do integration tests, so we rely a lot on tracing all the requests and we report errors to different teams in the company together with logs that we take from Datadog.
Since I am an API developer, I do not use the other features much. Also, I have been in the company for only four months. I have only worked with monitors and alerts.
I value tracing the request and being able to tell other teams which component, service, or line of code has an issue.
What needs improvement?
Since I have only been in the organization for four months, I only worked with the log, alerts, and monitoring. I do not have so many insights to share about what can be improved.
I am not an expert user, and not even an intermediate user yet. Rather, I am a beginner.
That said, when the logs are too big and Datadog splits them, the JSON format breaks, and the logs are not so useful for us.
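One possible workaround, sketched below under assumptions, is to keep each JSON log line under a size budget by shortening its longest string fields before the line is emitted, so that it stays valid JSON. The helper name, the `truncated` marker field, and the 200,000-byte default are hypothetical; check the current Datadog documentation for the actual size at which log events are split.

```python
import json


def fit_json_log(record: dict, max_bytes: int = 200_000) -> str:
    """Serialize record as one JSON line, shortening long top-level strings to fit max_bytes."""
    record = dict(record)  # shallow copy so the caller's dict is left untouched
    line = json.dumps(record, separators=(",", ":"))
    while len(line.encode("utf-8")) > max_bytes:
        # Pick the longest top-level string field that is still worth shortening.
        key = max(
            (k for k, v in record.items() if isinstance(v, str) and len(v) > 16),
            key=lambda k: len(record[k]),
            default=None,
        )
        if key is None:
            break  # nothing left to shorten; emit the line as it is
        record[key] = record[key][: len(record[key]) // 2]
        record["truncated"] = True  # hypothetical marker so readers know data was cut
        line = json.dumps(record, separators=(",", ":"))
    return line
```

This only shortens top-level string values; nested payloads would need a recursive variant, and the full payload would have to be stored elsewhere if it is needed for debugging.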
For how long have I used the solution?
I've used the solution for four months.
Which solution did I use previously and why did I switch?
I did not previously use a different solution.
What's my experience with pricing, setup cost, and licensing?
I have no idea about costs as an API developer, but I am curious about it and will find out more.
Which other solutions did I evaluate?
I did not evaluate other options previously.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Customizable and helpful for isolating and filtering environments
Pros and Cons
- "We have way more observability than what we had before - on the application and the overall system."
- "Auto instrumentation on tracing has not been very easy to find in the documentation."
What is our primary use case?
We use Datadog for observability and system/application health, mainly for product support, triaging, debugging, and incident responses.
We use a lot of the logging and the Datadog agent to collect logs, metrics, and traces from our GKE workloads. We use APM and continuous profiling for latency and performance measurement. We use RUM to observe frontend user events, such as tracing on request and what actions they take before errors occur. We also use error tracking and source maps to debug production failures.
We are still relatively new to the product, and we are planning to use more of the notebook functionality and power packs to record run books and break knowledge silos. We also need to utilize dashboards and continuous profiling more for performance measurement and integrate Datadog alerts for incident response.
How has it helped my organization?
We have way more observability than what we had before - on the application and the overall system. That includes the GKE cluster, nodes, and pods. It's helped with our cloud-run instances, databases, and data storage.
We also started observability in the CI pipeline to measure our CI performance, as it was a pain point for us. We are aiming to do incremental deployments and releases, and the bottleneck so far has been our CI performance. The visibility on which actions or functions take the most time allows us to pinpoint and focus on improving configurations on these.
What is most valuable?
We use structured logging a lot to triage production issues. The querying, attributes and tags manipulation, and customization have been very helpful in isolating and filtering environments. The integration with Winston logger has also been a breeze.
First and foremost, structured logging, tags, and attributes have not only allowed us to narrow down a problem quickly in production; they have also let us create dashboards from these logs to better understand user behaviors, such as how many users stop and leave our application before an upload has completed. That helps us understand how important processing time is to a user.
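For readers unfamiliar with structured logging, a minimal sketch using only Python's standard library follows; it is not the reviewer's Winston setup. The formatter class, the attribute names (`env`, `upload_id`), and the logger name are hypothetical. Each record is emitted as one JSON object so that a log platform can filter on its attributes.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object with extra attributes."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Attributes passed via extra={"attrs": {...}} are merged into the payload.
        payload.update(getattr(record, "attrs", {}))
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("upload-service")  # hypothetical service name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()  # stands in for stdout or a file tailed by a log agent
log = build_logger(buffer)
log.info("upload abandoned", extra={"attrs": {"env": "production", "upload_id": "u-123"}})
```

Because attributes such as `env` travel with every line, filtering by environment or counting abandoned uploads becomes a query over fields rather than a text search.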
We also intend to use distributed tracing more to understand where the error has occurred in a particular request.
What needs improvement?
Documentation could definitely use improvement. As I navigated and tried to find instrumentation and implementation details, I discovered inconsistencies among SDKs for different languages.
There are also places where highlighting can be improved. I once created an issue on GitHub, and it was resolved right away by an engineer. He pointed out that it was actually in the documentation. I looked again and found it was not very obvious. We were stuck on the problem for days.
Auto instrumentation on tracing has not been very easy to find in the documentation. We ended up using OpenTelemetry, yet the conversion between tracing contexts has been difficult.
For how long have I used the solution?
We've used the solution between six months and a year.
How are customer service and support?
Customer service and support are generally very fast. I did experience one ticket, which involved changing the log index retention period, not being responded to. Any support tickets related to technical issues were resolved pretty fast.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We used to use GCP Stackdriver for logging and monitoring since our infrastructure is all GCP based. It was lacking a lot, particularly on tracing and structured logging. We often had a lot of trouble triaging and diagnosing a production problem. Datadog's specialty is observability. Since we started using the product, we were able to create dashboards, and utilize APM, continuous profiling, RUM, and distributed tracing for production support and user trends.
Datadog also offers labs and workshops for its products, which is very helpful.
What about the implementation team?
We implemented the product ourselves.
What was our ROI?
I'm not sure what our ROI would be.
What's my experience with pricing, setup cost, and licensing?
We started with on-demand pricing as we were re-writing our product, and we weren't sure about the total usage. After we went into production and released the product, we experienced a price surge. Fortunately, our Datadog account manager reached out to us and suggested a monthly subscription, which is what we'll be switching to.
I'd advise keeping an eye on the usage and possibly setting up some monitoring on price. We didn't have much of a setup cost; we started with a free trial and continued with on-demand after the trial ended.
Which other solutions did I evaluate?
We didn't evaluate many of the other options. However, we do also use OpenTelemetry, which is vendor agnostic and integrates with Datadog.
What other advice do I have?
We always keep the Datadog agent to the latest version.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Senior Manager at a manufacturing company with 10,001+ employees
Great network monitoring, testing, and integration tools
Pros and Cons
- "The visibility into our network has allowed for quick diagnosis of failures, identification of underutilized or over-utilized resources, and allowed for cloud cost optimization opportunities."
- "I would love to see more metrics or analytics in IoT devices."
What is our primary use case?
This solution is for physical device monitoring across breweries, including PLCs, HMI Cameras, RFID panels, scales, etc. We want to gain visibility into these devices to influence predictive maintenance and unscheduled downtime. We want to monitor physical devices across the zone from a control tower perspective for end users and support teams alike. Understanding more about the performance of the devices and mechanical components will allow us to schedule downtime to fix imminent catastrophic failures and prevent unplanned downtime and lost revenue.
How has it helped my organization?
Previously, we had no visibility into the architectural layout of our infrastructure. The UI of Datadog has allowed for increased visibility and access to broken or underperforming resources or critical pieces of infrastructure. Beyond this, it has allowed us to identify areas where we can optimize cost in our cloud infrastructure.
What is most valuable?
The most valuable features I have found are network monitoring, testing, and integration tools. The visibility into our network has allowed for quick diagnosis of failures, identification of underutilized or over-utilized resources, and allowed for cloud cost optimization opportunities. The ability to correlate metrics has proven useful in determining downstream or upstream issues influencing the device, machine, or database having issues.
What needs improvement?
I would love to see more metrics or analytics in IoT devices.
For how long have I used the solution?
I've been using the solution for approximately two years.
What do I think about the stability of the solution?
I have never experienced an issue or outage.
What do I think about the scalability of the solution?
The solution is very scalable and developed in a fashion that provides the ability to scale easily.
How are customer service and support?
Customer service has been outstanding. They have been timely and knowledgeable with all of my questions.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We used a different product for the total stack solution.
How was the initial setup?
The initial setup was straightforward.
What about the implementation team?
We handled the setup process in-house.
What was our ROI?
I'm unsure whether we've seen an ROI.
Which other solutions did I evaluate?
We did evaluate SolarWinds.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
VP, Application support at a financial services firm with 10,001+ employees
Good service catalog and dashboard but the application performance monitoring module needs more functionality
Pros and Cons
- "The service catalog helped improve our organization by giving a good view of the flow for our microservices applications."
- "The dashboard could be improved. It would be helpful to get a view of specific things that we need to monitor for our application."
What is our primary use case?
We primarily use the solution for the service catalog.
We use this type of offering for our Microservices applications, and it gives a good view of flow. It is a must when we have different developers working on different services.
Having the trace and log features is useful for helping the on-call person locate the microservice.
We would like to see some more useful applications for health monitoring where we can customize the cases based on data from the database.
It needs to have the facility to monitor data inside tables and the status of the UI.
How has it helped my organization?
The service catalog helped improve our organization by giving a good view of the flow for our microservices applications. It's important when we have different developers working on different services, and having the trace and log features helps the on-call person locate the microservice.
The application performance monitoring has also been useful. This module had a few functionalities that we needed for the application health check. This needs to have some more features to consolidate the view in one tree. We may need more of a one-stop shop on top of the dashboard, and that is missing in Datadog. We'd like to be able to scrap our existing monitoring tool.
What is most valuable?
The service catalog is very useful. We use this type of offering for our microservices applications, and it gives a good view of flow. It is a must when we have different developers working on different services. Having the trace and log features has been useful for helping the on-call person locate the microservice.
The dashboard is great. It is helpful to get a view of specific things that we need to monitor for our application. It has been a good way to watch specific things and add them together.
The application performance monitoring is an excellent aspect. This module had a few functionalities that we needed for the application health check. This needs to have some more features to consolidate the view into one tree, however.
What needs improvement?
The dashboard could be improved. It would be helpful to get a view of specific things that we need to monitor for our application. However, it was a good way to watch specific things and add them together.
The application performance monitoring module had very few functionalities that we needed for the application health check. This needs to have some more features to consolidate the view into one tree.
For how long have I used the solution?
I've used the solution for one month.
Which solution did I use previously and why did I switch?
We previously used ITRS Geneos.
What other advice do I have?
We are using the latest version of the solution.
I'd rate the solution seven out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Provider
