reviewer1486134 - PeerSpot reviewer
Infrastructure Engineer at DATACAMP, INC
Vendor
Easy to set up, supported with good documentation, and the single pane of glass improves efficiency
Pros and Cons
  • "The fact that everything is under a single pane of glass is really valuable, as developers don't have to spend their time copying correlation IDs across tools to find what they need."
  • "The incident management beta looks promising, but it is still missing the ability to automatically create incidents based on certain alerts."

What is our primary use case?

We use Datadog as a monitoring platform to achieve visibility into our container environments.

Almost all of our workloads are containerized, and with Datadog we are able to get metrics, logs, alerts, and events about all the containers that we are running. Our developers also use APM extensively to find and diagnose performance issues that might appear.

We use Terraform to automatically create all of the necessary monitors and dashboards that our developers need to make sure that our level of service is sufficient.

How has it helped my organization?

We implemented Datadog around the same time as the company was growing from 30 to 150 people. Before that, we didn't have a standard stack for monitoring. Each team used their own logging solutions, metrics were missing or non-existent, and it was impossible to correlate metrics collected by different teams. Datadog provided us with an out-of-the-box solution that allowed us to focus on putting in place practices and processes around monitoring, rather than on implementation details.

Every squad is now confident in their ability to quickly identify and diagnose issues when they arise.

What is most valuable?

The fact that everything is under a single pane of glass is really valuable, as developers don't have to spend their time copying correlation IDs across tools to find what they need.

Thanks to the unified tagging system, it's really easy to jump around the different Datadog products without losing the context. That makes debugging really easy for developers because they can go from APM to logs to metrics in a few clicks.

Watchdog is also a great feature that helped us identify overlooked issues more than once.

What needs improvement?

The incident management beta looks promising, but it is still missing the ability to automatically create incidents based on certain alerts.

SLOs are also a great way to visualize how you are doing with regard to the level of service that you are providing, but they are missing crucial components like:

  • The ability to visualize the remaining error budget and how it evolved during the month. An error budget burndown graph would be helpful.
  • The ability to display a different level of alert on an SLO based on how fast it is consuming the error budget. This is the slow burn versus fast burn.
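
To make the two SLO requests above concrete, here is a minimal sketch of how remaining error budget and burn rate can be computed. It is not Datadog's implementation. The names `SloSnapshot`, `burn_rate`, `remaining_budget`, and `classify` are hypothetical, the calculation assumes traffic is spread evenly over the SLO window, and the thresholds are illustrative defaults rather than values taken from the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloSnapshot:
    """State of a request-based SLO partway through its window (hypothetical type)."""
    target: float            # SLO target as a ratio, e.g. 0.999 for 99.9%
    total_requests: int      # requests observed so far in the window
    failed_requests: int     # requests that violated the SLO so far
    elapsed_fraction: float  # share of the SLO window already elapsed, in (0, 1]


def burn_rate(s: SloSnapshot) -> float:
    """Observed error ratio divided by the allowed error ratio.

    A value of 1.0 means the budget would be used up exactly at the end
    of the window; higher values exhaust it earlier.
    """
    allowed_error_ratio = 1.0 - s.target
    observed_error_ratio = s.failed_requests / s.total_requests
    return observed_error_ratio / allowed_error_ratio


def remaining_budget(s: SloSnapshot) -> float:
    """Fraction of the error budget still unspent (negative once overspent).

    Assumes traffic is uniform across the window, so the budget consumed
    so far equals burn rate times the elapsed fraction of the window.
    """
    return 1.0 - burn_rate(s) * s.elapsed_fraction


def classify(rate: float, fast_threshold: float = 14.4,
             slow_threshold: float = 1.0) -> str:
    """Map a burn rate to an alert level; thresholds are illustrative assumptions."""
    if rate >= fast_threshold:
        return "fast burn"
    if rate > slow_threshold:
        return "slow burn"
    return "ok"


if __name__ == "__main__":
    # One quarter of the window elapsed, 0.2% of requests failed against a 99.9% target.
    snapshot = SloSnapshot(target=0.999, total_requests=1_000_000,
                           failed_requests=2_000, elapsed_fraction=0.25)
    rate = burn_rate(snapshot)
    print(f"burn rate: {rate:.2f}")
    print(f"remaining budget: {remaining_budget(snapshot):.0%}")
    print(f"alert level: {classify(rate)}")
```

Plotting `remaining_budget` over the month would give the burndown graph described in the first bullet, and `classify` is one simple way to express the slow burn versus fast burn distinction from the second.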
Buyer's Guide
Datadog
May 2025
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: May 2025.
852,649 professionals have used our research since 2012.

For how long have I used the solution?

We've been using Datadog for a bit more than two years.

How are customer service and support?

There is extensive documentation and the support is very reactive.

Which solution did I use previously and why did I switch?

Prior to using Datadog, each team was using their own solutions. This included a mix of custom tooling, third-party tools, and AWS tools.

How was the initial setup?

The initial setup is very easy. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1480866 - PeerSpot reviewer
Director of DevOps at Digital Media Solutions Group
Real User
Provides good visibility across applications, good integration, and helpful support
Pros and Cons
  • "The most valuable features are logging, the extensive set of integrations, and easy jumpstart."
  • "In the past two years, there have been a couple of outages."

What is our primary use case?

We primarily use this product for availability and performance monitoring and for log aggregation.

How has it helped my organization?

Datadog gave us awesome visibility across all of our applications.

What is most valuable?

The most valuable features are logging, the extensive set of integrations, and easy jumpstart.

What needs improvement?

In the past two years, there have been a couple of outages.

For how long have I used the solution?

We have been using Datadog for two years.

What do I think about the stability of the solution?

The outages that we have had in the past two years were fixed in a matter of minutes.

What do I think about the scalability of the solution?

So far, we have not had any issues with scaling, and everything is working great.

How are customer service and technical support?

Support is awesome.

Which solution did I use previously and why did I switch?

We did use New Relic, but the logging feature was not as good as it is in Datadog.

How was the initial setup?

The initial setup is straightforward and everything is very well documented and easy to start using.

What about the implementation team?

We implemented it in-house.

Which other solutions did I evaluate?

We evaluated a custom ELK solution, Sumo Logic, and Logentries.

What other advice do I have?

Datadog is already covering much more than we normally need with exceptional quality. This is a great product.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1477686 - PeerSpot reviewer
Senior DevOps Engineer at DigitalOnUs
Real User
Affordably-priced and improves visibility of infrastructure, apps, and services
Pros and Cons
  • "Having a clear view, not only of our infrastructure but our apps and services as well, has brought a great added value to our customers."
  • "The pricing model could be simplified as it feels a bit outdated, especially when you look at the billing model of compute instances versus container instances."

What is our primary use case?

Our primary use of Datadog includes: 

  • Keeping a close eye on our AWS resources. Monitoring our multiple RDS and ElastiCache instances plays a big role in our indicators.
  • Kubernetes. We aren't using all of the available Kubernetes integrations, but the few of them that work out of the box add great value to our metrics.
  • Monitoring and alerting. We wired our most relevant monitoring and alerts to services like PagerDuty, and for the rest of them, we keep our engineers up to date with constant Slack updates. 

How has it helped my organization?

Observability is something that a lot of companies are trying to achieve. Having a clear view, not only of our infrastructure but our apps and services as well, has brought a great added value to our customers.

For a logging solution, we used to have Papertrail. It did the trick, but having a single point that manages and indexes all the logs is a BIG improvement. Also, having the option to generate metrics from logs is a game-changer that we're trying to include in our monitoring strategy.

I would like to say the same about APM but the support for PHP seems to be somewhat lacking. It works but I think this service could provide us more information.

What is most valuable?

With respect to logs, we used to integrate various kinds of tools to achieve very basic tasks and it always felt like a very fragile solution. I think logs are by far the most useful feature and at the same time, the one that we could improve.

APM - This is either a hit or a miss. Allow me to explain: we use various programming languages, mainly PHP and Ruby, and the traces generated don't always provide all of the information we want. For example, we get a great level of detail for the SQL queries that the app generates but not so much for the PHP side. It's hard to track exactly where all of the bottlenecks are, so some analysis tools for APM could make a good addition.

What needs improvement?

Please add PHP profiling; you already have it for other popular programming languages such as Python and Java, which is great because we have a little bit of those, but our main app is powered by PHP and we don't have profiling for this yet. I guess it's only a matter of time for this to be added, so in the meanwhile, you can consider this review as a vote for the PHP profiling support.

The pricing model could be simplified as it feels a bit outdated, especially when you look at the billing model of compute instances versus container instances.

For how long have I used the solution?

We have been using Datadog for one year.

What do I think about the stability of the solution?

It's pretty stable for the main integrations. There was only one time where Datadog was down and that was scary since all of our monitoring is handled by Datadog. There was a lot of uncertainty while the outage was in place.

What do I think about the scalability of the solution?

For everyday use, it's adequate, but for very specific tasks, not so much. There was a time where I had to do a big export and as expected, the API is somewhat limited. Since it was a one-time task, it was not a big deal but if this was a regular task, I wouldn't be happy about it.

How are customer service and technical support?

For small tasks, I think it's great. For specialized support, it feels like you're under-staffed; having to wait days or weeks for a solution is a big NO-NO.

Which solution did I use previously and why did I switch?

I've used a few other products such as New Relic and AppDynamics. The switch is usually affected by two factors: pricing and convenience.

How was the initial setup?

Getting APM metrics out of Kubernetes is always a painful task. We got support to take a look at this and we had to go through various iterations to get it right, and then AGAIN the next year. This was a bad experience.

What about the implementation team?

It was all implemented in-house. The documentation is fairly up to date, for the most part.

What's my experience with pricing, setup cost, and licensing?

Pricing is somewhat affordable compared to other solutions, but in order to really lower the costs of other products you need to plan your resource usage very carefully; otherwise, it can get expensive real quick.

Which other solutions did I evaluate?

Unfortunately, it wasn't my call to include Datadog for this company, but I sure am glad that the Lead Architect made this decision. It brought many improvements in a small span of time.

What other advice do I have?

Please add PHP profiling soon!

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2004177 - PeerSpot reviewer
Cloud Engineer at a retailer with 51-200 employees
Real User
Good logs, analytics and dashboards
Pros and Cons
  • "We can handle debugging and find out why things are breaking in our applications."
  • "The documentation leaves a lot to be desired for new users."

What is our primary use case?

I am using the solution for monitoring metrics, logs, traces, etc. It's mainly for making dashboards as well as monitoring our services. 

We also use Datadog to help centralize our incident management to show the logs, where issues spiked, and some metrics. 

We use Datadog to do troubleshooting in Kubernetes, specifically in our Azure Kubernetes Service. Beyond that, we are looking to use OpenTelemetry in tandem with Datadog to further our log-tracing efforts. In the future, this may be expanded.

How has it helped my organization?

This solution improves our organization as now we have higher visibility into our application that we otherwise would not have. 

Since the Datadog agent comes in three forms, agentless, scraping, and through the API, it is very flexible. It is this flexibility in how to report our logs that keeps our logs centralized and organized. 

One major drawback of Datadog is the cost. Sometimes we put flows in place to monitor resources that end up logging more than we thought, and the bill is too high.

What is most valuable?

Dashboards have been among the most valuable parts of Datadog. Dashboards use metrics that are very helpful for monitoring services. I recently used metrics to monitor the number of pods in Kubernetes, the spikes in requests in Kubernetes, and overall CPU and memory usage in our Kubernetes clusters. 

We can also use log analytics to further our understanding. We can handle debugging and find out why things are breaking in our applications. 

The log portion of Datadog has robust features to debug the applications we are running. I really appreciate the ability to use facets to pare down the logs.

What needs improvement?

The documentation leaves a lot to be desired for new users. It has far too much text and no concise information to help people get started. Sometimes it doesn't help to read an entire essay just to get a grasp on how the logs or metrics work.

For how long have I used the solution?

I've used the solution for two years.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2003829 - PeerSpot reviewer
Sr Platform Engineer at a pharma/biotech company with 11-50 employees
Real User
Good logging with lots of great integrations and an interesting dashboard
Pros and Cons
  • "Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate."
  • "Some of the interface is still confusing to use."

What is our primary use case?

We use it mostly for logging log messages from our Kubernetes and EC2 instances, for example, system messages and errors. Also, we want log messages from our firewalls and other network infrastructure in case of network issues. We intend to use it for application logging, et cetera, to get insight into internal problems in the applications in Kubernetes pods. We want to use it for monitoring in case of system problems and hardware failures so that it can notify us.

How has it helped my organization?

It's good to have a single location for all the logs. If you have logs coming from a whole lot of sources, it makes it hard to find where the problem lies. 

We had to spend a lot of time logging into various systems and perusing a billion different log files looking for something that stands out as a possible cause of the issue. That can take a lot of time and doesn't give much visibility into the possible interactions between systems.

What is most valuable?

Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate. 

It has a lot of ability to make fancy and deep searches using regular expressions and to graph them into useful and interesting dashboard graphs. 

The plethora of built-in and downloadable integrations makes it much easier to set up for our platforms. Otherwise, we'd have to parse the log files ourselves, which would take a great deal of effort. We had to do that before, when we used an ELK stack for logging, and it was painful.

What needs improvement?

Some of the interface is still confusing to use. It has many features, and it takes a lot of effort to figure out what they all mean. Maybe having tooltips or something would be helpful. Also, some of the integrations are better than others.

For how long have I used the solution?

I've used the solution for a month.

What do I think about the stability of the solution?

The solution seems very stable.

Which solution did I use previously and why did I switch?

I have used an ELK stack before. However, it took a lot of effort to maintain, and parsing the logs was difficult.

How was the initial setup?

We implemented the solution in-house.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2003781 - PeerSpot reviewer
Product SRE at a computer software company with 51-200 employees
Real User
Good dashboards and documentation with helpful Synthetics Tests
Pros and Cons
  • "Dashboards and their versatility are among the most valuable features."
  • "We would like to see some versioning system for the Synthetic Tests so that we could have a backup of our tests since they are time-consuming to make and very easy to damage in a moment of error."

What is our primary use case?

We use Datadog for application logs, error tracking, performance tracking, alerting, and overall production state surveillance. 

It helps us improve observability and ease of maintenance through better information for our support teams and their issue qualification. 

We also use dashboards to keep all the information at the ready and easy to access, and SLOs, notably for our uptimes but also for our feature usage. Datadog also feeds alerts for our on-call SREs into PagerDuty when specific parameters are exceeded.

How has it helped my organization?

Our usage of Datadog has allowed us to greatly improve our observability. We have been able to track pain points more easily with it and to define custom metrics to track our users' usage of the features we roll out.

Being able to generate dashboards has given higher management a better view of our teams' work and has allowed for better client information by our sales team as they have a more transparent way of dealing with our upcoming features.

What is most valuable?

Dashboards and their versatility are among the most valuable features. They allow us to have internal facing trackers of our application's issues, usages, and features. They also allow us to have a better understanding of how users react to new features, and to display more information to other teams or also clients through uptime SLOs, et cetera.

We also found the Synthetics Tests and especially the Browser Tests very helpful. It is a nicer way to create end-to-end tests in a more user-friendly way than through code. They are very valuable in saving time compared to code-based testing.

Documentation is also very clear and interesting.

What needs improvement?

We would like to see some versioning system for the Synthetic Tests so that we could have a backup of our tests since they are time-consuming to make and very easy to damage in a moment of error.

I look forward to seeing the next features that will be released.

For how long have I used the solution?

I have been using the product for a year and a half. The company has been using it for longer. I don't know the exact details.

What do I think about the stability of the solution?

We have yet to have a large-scale problem with stability using Datadog. It's very satisfying.

What do I think about the scalability of the solution?

The scalability is very good.

How are customer service and support?

I've had only a few experiences with customer support, and it went well. They were fast!

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We did not use a different solution previously.

How was the initial setup?

I wasn't there for the initial setup.

What about the implementation team?

I wasn't there for the initial setup.

What was our ROI?

I can't speak to the ROI.

What's my experience with pricing, setup cost, and licensing?

I don't give advice regarding that.

Which other solutions did I evaluate?

I wasn't part of the decision-making process.

What other advice do I have?

It would be nicer if the pricing information was easier to find in the documentation. Sometimes it helps to get an overall idea of the cost of certain options.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2003214 - PeerSpot reviewer
Sr. Director of Software Engineering at a tech consulting company with 1,001-5,000 employees
Real User
Helpful support, good incident management, and helps triage faster
Pros and Cons
  • "The RUM solution has improved our ability to triage faster and hand more capabilities to our customer support."
  • "The pricing is a bit confusing."

What is our primary use case?

The RUM is implemented for customer support session replays to quickly route, triage, and troubleshoot support issues which can be sent to our engineering teams directly. 

Customer Support will log in directly after receiving a customer request and work on the issue. Engineers will utilize the replay along with RUM to pinpoint the issue combined with APM and Infra trace to be able to look for signals to find the direct cause of the customer impact. 

Incident management will be utilized to open a Jira ticket for engineering, and it integrates with ITSM systems and on-call as needed.

How has it helped my organization?

The RUM solution has improved our ability to triage faster and hand more capabilities to our customer support.

The RUM is implemented for customer support. It can quickly route, triage, and troubleshoot support issues that are sent to our engineering teams. 

Customer support can log in and start troubleshooting after receiving a customer request. The replay and RUM help pinpoint the issue. This functionality is combined with APM and Infra trace to be able to look for the cause of the issue. Incident management is leveraged to open a Jira ticket for engineering, and it can integrate with ITSM systems and on-call as needed.

What is most valuable?

RUM with session replay combined with a future use case to support synthetics will help to identify issues earlier in our process. We have not rolled this out yet but plan for it as a future use case for our customer support process. This, combined with integrated automation for incident management, will drive down our MTTR and time spent working through tickets. Overall, we are hoping to use this to look at our data and perfection rate over time in a BI-like way to reduce our customer support headcount by saving on time spent.

What needs improvement?

I would like to see retention options greater than 30 days for session replay. I'd also like to see forwarding options for retention to custom solutions, and a greater ability to event and export data from the tooling overall to BI/DW solutions for reporting across the long term and to see trends as needed.

For how long have I used the solution?

I've used the solution for about nine months.

What do I think about the stability of the solution?

So far, stability has been great.

What do I think about the scalability of the solution?

I'd like to see more bells and whistles added over time. Widgets are coming soon to help with RUM.

How are customer service and support?

Support is very good. They are responsive and gave us the help we needed.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We have utilized New Relic, although not for RUM. We went with Datadog to potentially switch the entire platform into an all-in-one solution that makes sense for a company of our size.

How was the initial setup?

We started on the beta, and the documentation was lagging behind. We also needed direct instructions and links from the customer support/account representative that were not immediately available by searching online.

What about the implementation team?

We implemented the solution ourselves.

What was our ROI?

Ideally, this will inform our strategy to not increase our customer support headcount as significantly into 2023 and beyond.

What's my experience with pricing, setup cost, and licensing?

The pricing is a bit confusing. However, the RUM session replay, in general, is very inexpensive compared to whole solutions.

Which other solutions did I evaluate?

We looked into LogRocket and New Relic.

What other advice do I have?

I'd advise other users to try it out.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2000451 - PeerSpot reviewer
SRE at a financial services firm with 10,001+ employees
Real User
Great visibility, easy to implement, and offers the ability to set thresholds
Pros and Cons
  • "It has provided visibility with ease of implementation and allowed multiple teams to quickly onboard it."
  • "Federated views for Datadog dashboards are critical as large companies utilize multiple instances of the product and cannot link the metrics or correlate the metrics together. This stunts the usage of Datadog."

What is our primary use case?

We primarily use the solution for observability, metrics, logs, tracing, and end-to-end user flow monitoring. 

We are looking to implement this as a company-wide standard for cloud solutions.

At this time, we're in a POC, and we're interested in using either a Datadog agent or the OTel agent with a Datadog exporter. We have dashboards with panels that correlate metrics and allow you to link through to traces. Flame graphs show latency across services and the various spans. 

While we are not security minded, we still require it and are interested in more. It's used for monitoring critical systems.

How has it helped my organization?

It has provided visibility with ease of implementation and allowed multiple teams to quickly onboard it. This provided a standard way to approach observability and visibility. 

Monitoring rules and alerting thresholds can also be set and exported to other teams for use. 

There is an issue with federated dashboards, as multiple teams running on different Datadog instances cannot use features like the service catalog or easily switch between services in a long business flow.

What is most valuable?

The Kubernetes (K8s) monitoring is extremely useful in Datadog. The preset dashboards that it provides help to speed up the work. 

The metrics summary is useful. Tracing with a span breakdown is helpful for us. We like the dashboarding with power packs and logging correlation with traces and logs. 

The Flame graph for tracing helps determine where the latency is the highest. 

Dashboards are created as a standard set and then exported into other Datadog instances for other teams. 

These dashboards would be updated regularly and pushed out to the teams. Unfortunately, there is no way to automatically push or deploy code in a quicker way. Each team I work with has its own Datadog instance.

What needs improvement?

Federated views for Datadog dashboards are critical as large companies utilize multiple instances of the product and cannot link the metrics or correlate the metrics together. This stunts the usage of Datadog. Additionally, using an OTel agent would be more acceptable and allow for easier adoption of Datadog across the hundreds of teams here.

For how long have I used the solution?

I've used the solution for four months.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user