Datadog Reviews and Pricing

Jordan Lee

Site Reliability Engineer at Icario

Sep 30, 2024

Good centralization with helpful monitoring and streamlined investigation capabilities

Pros and Cons

"The two most valuable aspects are the Terraform provider for Datadog and the K8s Orchestrator."

"A big problem with Datadog is the billing. They need to make the billing more user-friendly."

What is our primary use case?

We use Datadog to monitor both some legacy products and a new PaaS solution that we are building out here at Icario, which uses a microservice architecture.

All of our infrastructure is in AWS, with a few legacy systems on Rackspace. For the PaaS, we mainly use the K8s Orchestrator, which implements the APM libraries into services deployed there and gives us infra info regarding the cluster.

For legacy systems, we mainly use the Agent or the AWS integration, with APM in specific places. We monitor mainly prod in legacy and the full scope in the PaaS for now.

How has it helped my organization?

Datadog has greatly reduced the time needed to investigate issues by putting everything into a single pane of glass. This allows us to get ahead of infra/app-based issues before they affect customer experience with our products.

Outside of that, the ease of management, deployment of agents, integrations, etc. has greatly helped the teams. There isn't much legwork needed by the devs to manage or deploy Datadog into their stacks, thanks to Terraform, pipelines, and the orchestrator. All in all, it has been an improvement.

What is most valuable?

The two most valuable aspects are the Terraform provider for Datadog and the K8s Orchestrator. People don't take that into account when buying into a tooling product like Datadog in this age where scalability, management, and ease of implementation are key. Other tools not having good IaC products or options is a ball drop. Orchestration for the tool's agent is good. Not having to use another tool to manage the agents and config files in multiple places/instances is a huge win!

What needs improvement?

A big problem with Datadog is the billing. They need to make the billing more user-friendly. I know it like the back of my hand at this point, yet explaining to the C-suite why costs went up, or are what they are, is many times more complicated than it needs to be. I can't even say "why" due to the lack of metadata tied to billing. For instance, with the AWS Integration Host ingestion, I can't say, "Well, this month THESE hosts got added, and that's what caused the cost to go up." The billing visibility really needs to be resolved!

Buyer's Guide

Datadog

July 2026

Free Report: Datadog Reviews and More

Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: July 2026.

903,257 professionals have used our research since 2012.

For how long have I used the solution?

I've used the solution for more than four years.

What do I think about the stability of the solution?

Datadog has always been extremely stable, with outages really only ever creating delays, never actual downtime of the service, which is amazing and impressive.

What do I think about the scalability of the solution?

The solution is very scalable if implemented right and not on top of complicated architecture.

How are customer service and support?

Support is excellent. They are always looking for a resolution, and a ticket is never left unresolved unless the feature just can't exist or isn't currently possible.

Which solution did I use previously and why did I switch?

We did have New Relic, Datadog, Sumo Logic, Pingdom, and some other custom or third-party tooling. We switched because we wanted everything to be in a single pane and because Datadog is a better solution than the competitors.

How was the initial setup?

For us, set-up is a mixed bag as we support legacy apps and architectures as well as a new microservice architecture. That being said, legacy is somewhat complex just due to the nature of how those apps stack and the underlying infra and configuration and setup. Microservice is a breeze and straight-forward for most of the out-of-the-box stuff.

What about the implementation team?

Our team of SRE, platform, and cloud engineers implemented the solution.

What was our ROI?

I can't really speak to ROI; however, from my perspective, we definitely get our money's worth from the product.

What's my experience with pricing, setup cost, and licensing?

Users really need to make sure they stay on top of costs and don't let all of the engineers do as they please. Billing with Datadog can get out of hand if you let them. Not everything needs to be monitored.

Which other solutions did I evaluate?

We didn't really need to evaluate other options.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Gediminas Anza

Manager, System at Visma

Sep 30, 2024

Increases efficiency, helps with customer satisfaction, and enhances collaboration

Pros and Cons

"The agents feature in Datadog stands out as a valuable asset within our organization due to its robust functionality, versatility, and role in providing comprehensive monitoring and observability capabilities."

"Presently, the billing CSV reports provide insights into billing-related information yet are somewhat limited in functionality, typically offering reports with only three columns."

What is our primary use case?

The primary use case of Datadog within our organization encompasses providing a comprehensive and sophisticated solution that caters to the diverse needs of our internal customers. We have strategically implemented Datadog to serve as a centralized platform for monitoring, analyzing, and optimizing various aspects of our operations. With a robust suite of functionalities, Datadog empowers us to meet the dynamic requirements of over 40 internal customers efficiently.

Through Datadog, we offer a wide array of services to our internal stakeholders, allowing them to access and leverage its capabilities to enhance performance, troubleshoot issues, and make data-driven decisions. The tool's versatility enables different teams within our organization to monitor and track distinct metrics, such as application performance, infrastructure health, and logs, tailored to their specific requirements.

Moreover, Datadog serves as a pivotal component in our organizational ecosystem by streamlining processes, enhancing collaboration, and fostering a culture of data-driven decision-making. By harnessing the power of Datadog, our internal customers can proactively address issues, optimize resources, and ultimately improve operational efficiency across the board.

In essence, the primary use case of Datadog in our organization revolves around empowering our internal customers with a comprehensive and feature-rich solution that enables them to monitor, analyze, and optimize various aspects of our operations seamlessly and effectively. This strategic implementation of Datadog plays a vital role in enhancing our overall performance, fostering transparency, and driving continuous improvement within our organization.

How has it helped my organization?

Datadog has significantly contributed to enhancing the overall effectiveness and efficiency of our organization through various key improvements. One of the standout benefits has been the accelerated resolution of issues. By leveraging Datadog's monitoring and alerting capabilities, we have been able to swiftly detect, diagnose, and address issues before they escalate, resulting in minimized downtime and enhanced operational continuity.

Moreover, the implementation of Datadog has had a tangible positive impact on customer satisfaction. With improved visibility into our systems and applications, coupled with proactive monitoring and performance optimization, we have been able to deliver a more reliable and seamless experience to our customers. This has translated into higher customer satisfaction scores and strengthened relationships with our stakeholders.

Another notable improvement brought about by Datadog is the streamlining of our toolset. By identifying and removing multiple unused or redundant features and tools, Datadog has helped optimize our workflows and resources. This decluttering of unnecessary functionalities has not only increased operational efficiency but also streamlined our processes, allowing us to focus on the tools and features that truly add value to our operations.

In summary, Datadog's impact on our organization has been profound, enhancing our ability to resolve issues rapidly, improving customer satisfaction levels, and streamlining our toolset for increased efficiency and focus. These improvements have led to a more robust and resilient operational environment, enabling us to better meet the needs of our internal and external stakeholders.

What is most valuable?

Within our organization, we have found the Agents feature in Datadog to be exceptionally valuable due to its rich set of functionalities and capabilities. The Agents play a crucial role in our monitoring and data collection processes, providing a comprehensive and reliable means to gather crucial performance metrics and insights across our systems and applications.

One of the key reasons why the agents feature stands out as particularly valuable is its versatility. The Agents offer a wide range of monitoring and data collection options, allowing us to capture diverse metrics and performance data with precision. This flexibility enables us to tailor our monitoring strategy to meet the specific needs of different teams and use cases within our organization.

Moreover, the agents feature in Datadog enhances the overall observability of our infrastructure and applications. By deploying Agents strategically across our environment, we can gather real-time metrics, logs, and traces, enabling us to monitor the health, performance, and behavior of our systems comprehensively. This deep level of observability empowers us to proactively identify issues, optimize performance, and make informed decisions based on accurate and timely data.

Furthermore, the agents feature in Datadog plays a pivotal role in driving actionable insights and facilitating efficient troubleshooting. With the detailed data collected by the Agents, we can perform in-depth analysis, detect anomalies, and troubleshoot issues quickly and effectively. This proactive approach to monitoring and analysis ultimately enhances our operational efficiency and resilience.

In essence, the agents feature in Datadog stands out as a valuable asset within our organization due to its robust functionality, versatility, and role in providing comprehensive monitoring and observability capabilities. By leveraging the power of the Agents feature, we can effectively monitor, analyze, and optimize our systems and applications to ensure seamless operations and performance excellence.

What needs improvement?

In assessing areas for potential improvement, one key aspect where Datadog could enhance its service is in the realm of billing CSV reports. Presently, the billing CSV reports provide insights into billing-related information yet are somewhat limited in functionality, typically offering reports with only three columns. Expanding the capabilities of the billing CSV reports to include more detailed and customizable information would greatly benefit users by allowing them to gain a deeper understanding of their usage, costs, and billing trends within Datadog.

Additionally, in considering features for inclusion in the next release of Datadog, the development of more robust and customizable billing CSV reports could be a significant enhancement. By allowing users to tailor their billing reports to specific metrics, timeframes, and parameters of interest, Datadog could provide greater transparency and control over billing data, enabling users to make informed decisions regarding resource allocation, cost optimization, and budget planning.

Moreover, the inclusion of features such as cost forecasting, budget tracking, and customizable alerts related to billing thresholds could further empower users to manage their expenses effectively and proactively monitor and control costs within Datadog. These additions would not only enhance user experience and satisfaction but also contribute to a more holistic and actionable approach to financial management within the Datadog platform.

By refining the functionality of billing CSV reports and incorporating advanced features for cost analysis, forecasting, and monitoring, Datadog can elevate its service offering and provide users with enhanced tools for optimizing their usage, expenses, and financial oversight within the platform.

For how long have I used the solution?

I've used the solution for over three years.

What do I think about the scalability of the solution?

Datadog is easy to scale. However, it's scaled for price, so be sure to measure what you need and not push all logs to the solution, or your price will skyrocket quickly.

Which solution did I use previously and why did I switch?

We use multiple APM tools to have both price and value correlations relevant to the teams using them.

What's my experience with pricing, setup cost, and licensing?

Request a test account during the POC phase to determine if the tool is the right fit; all providers do that for free.

Which other solutions did I evaluate?

We did POC with over five products. I can't name them due to the related NDA.

Which deployment model are you using for this solution?

Public Cloud

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer2507895

Software Architect at Keller Williams Realty, Inc.

Oct 2, 2024

Good RUM and APM with good observability

Pros and Cons

"We also use APM and metrics to view the status of our Pub/Sub topics and queues, especially when dealing with undelivered messages."

"The cost is pretty high."

What is our primary use case?

We use Datadog across the enterprise for observability of infrastructure, APM, RUM, SLO management, alert management and monitoring, and other features. We're also planning on using the upcoming cloud cost management features and product analytics.

For infrastructure, we integrate with our Kube systems to show all hosts and their data.

For APM, we use it with all of our API and worker services, as well as cronjobs and other Kube deployments.

We use serverless to monitor our Cloud Functions.

We use RUM for all of our user interfaces, including web and mobile.

How has it helped my organization?

It's given us the observability we need to see what's happening in our systems, end to end. We get full stack visibility from APM and RUM, through to logging and infrastructure/host visibility. It's also becoming the basis of our incident management process in conjunction with PagerDuty.

APM is probably the most prominent place where it has helped us. APM gives us detailed data on service performance, including latency and request count. This drives all of the work that we do on SLOs and SLAs.

RUM is also prominent and is becoming the basis of our product team's vision of how our software is actually used.

What is most valuable?

APM is a fundamental part of our service management, both for viewing problems and improving latency and uptime. The latency views drive our SLOs and help us identify problems.

We also use APM and metrics to view the status of our Pub/Sub topics and queues, especially when dealing with undelivered messages.

RUM has been critical in identifying what our users are actually doing, and we'll be using the new product analytics tools to research and drive new feature development.

All of this feeds into the PagerDuty integration, which we use to drive our incident management process.

What needs improvement?

Sometimes the solution changes features so quickly that the UI keeps moving around. The cost is pretty high. Outside of that, we've been relatively happy.

The APM service catalog is evolving fast. That said, it is redundant with our other tools and doesn't allow us to manage software maturity. However, we do link it with our other tools using the APIs, so that's helpful.

Product analytics is relatively new and based on RUM, so it will be interesting to see how it evolves.

Sometimes some of the graphs take a while to load, based on the window of data.

Some stock dashboards don't allow customization. You need to clone them first, but this can lead to an abundance of dashboards. Also, there are some things that stock dashboards do that can't yet be duplicated with custom dashboards, especially around widget organization.

The "top users" widget on the product analytics page only groups by user email, which is unfortunate, since user ID is the field we use to identify our users.

For how long have I used the solution?

I've used the solution for three and a half years.

What do I think about the stability of the solution?

The solution is pretty stable.

What do I think about the scalability of the solution?

The solution is very scalable.

How are customer service and support?

Support was excellent during the sales process, with a huge dropoff after we purchased the product. Only recently (within the past year) has it begun to reach acceptable levels again.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We did not have a global solution. Some teams were using New Relic.

How was the initial setup?

The instructions aren't always clear, especially when dealing with multiple products across multiple languages. The tracer works very differently from one language to another.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

We have built our own set of installation instructions for our teams, to ensure consistent tagging and APM setup.

Which other solutions did I evaluate?

We did look at Dynatrace.

What other advice do I have?

The service was great during the initial testing phase. However, once we bought the product, the quality of service dropped significantly. However, in the past year or so, it has improved and is now approaching the level we'd expect based on the cost.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Google

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer2543758

Engineering Manager at RVshare

Sep 26, 2024

Good visibility into application performance, understanding of end-user behavior, and a single pane of glass view

Pros and Cons

"The single pane of glass view with maneuvering between products has helped us to truly understand root causes after incidents."

"The wide range of products Datadog now offers can be a bit intimidating to developers."

What is our primary use case?

The primary use case for this solution is to enhance our monitoring visibility, determine the root cause of incidents, understand end-user behaviour from their point of view (RUM), and understand application performance.

Our technical environment consists of a local dev environment, where Datadog is not enabled, and deployed environments that range from UAT testing with our product org to ephemeral stacks that our developers use to test their code away from their own computers. We also have a mobile app where testing is also performed.

How has it helped my organization?

Datadog has greatly improved our organization in many ways. Some of those ways include greater visibility into application performance, understanding of end-user behavior, and a single pane of glass view into our entire infrastructure.

Regarding visibility, our organization previously used New Relic, and when incidents or regressions happened, New Relic's query language was very hard to use. End-user behavior in RUM has improved our ability to know what to focus on. Lastly, the single pane of glass view with maneuvering between products has helped us truly understand root causes after incidents.

What is most valuable?

APM has been a top feature for us. I can speak for all developers here: they use it more often than other products. Due to a standard in tracing (even though it is customizable), engineers find it easier to walk a trace than to understand what went wrong when looking at logging.

Another feature that I find valuable, though it isn't the first one that comes to mind, is Watchdog. I have found that it has been a good source for understanding anomalies and where we (as an organization) may need more monitoring coverage.

What needs improvement?

I am not 100% sure how this could be done, or if it can be, but I've had to do a lot of education to ramp developers up on the platform. This feels like a natural result of the sheer growth and number of products Datadog now offers.

When I first started using the Datadog platform, I thought a big pro was that the ramp-up time was much quicker, since there was no query language to learn. I still believe that to be true when comparing the product to something like New Relic. However, with the wide range of products Datadog now offers, it can be a bit intimidating for developers to know where to go to find what they want.

For how long have I used the solution?

I have been using the solution at my current company for almost four years, and have used it at my previous company as well.

Which solution did I use previously and why did I switch?

A while ago, we used New Relic, and we switched due to Datadog being a better product.

What about the implementation team?

We did the implementation in-house.

What's my experience with pricing, setup cost, and licensing?

The value compared to pricing is reasonable, though it can be a bit of a sticker shock to some.

Which other solutions did I evaluate?

We did not evaluate other options.

Which deployment model are you using for this solution?

Public Cloud

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Dmitri Panfilov

Software Engineer at Redfin Corp

Sep 20, 2024

Easy dashboard creation and alarm monitoring with a good ROI

Pros and Cons

"The ease of dashboard creation and alarm monitoring has helped us not only stay competitive but be industry leaders in performance."

"The product can be improved by allowing the grouping of APIs to add variables. That way, any API with a unique ID could be grouped together."

What is our primary use case?

We use the solution to monitor production service uptime/downtime, latency, and log storage.

Our entire monitoring infrastructure runs off Datadog, so all our alarms are configured with it. We also use it for tracing API performance and finding the biggest regression points.

Finally, we use it to compare performance on SEO metrics against competitors. This is a primary use case, as SEO dictates our position in Google traffic, which drives a large portion of our customer views, so it is a vital part of the business that we rely on Datadog for.

How has it helped my organization?

The product improved the organization primarily by providing consistent data with virtually zero downtime. This was a problem we had with an old provider. It also made it easy to transition an otherwise massive migration involving hundreds of alarms.

The training provided was crucial, along with having a dedicated team that can forward our requests to and from Datadog efficiently. Without that, we may have never transitioned to Datadog in the first place since it is always hard to lead a migration for an entire company.

What is most valuable?

The API tracing has been massive for debugging latency regressions and working out how to improve the performance of our least performant APIs. Through tracing, we managed to find the slowest step of an API, improve its latency, and iterate on the process until we had our desired timings. This is important for improving our SEO, as LCP and INP are directly driven by the numbers we see on Datadog for our API timings.

The ease of dashboard creation and alarm monitoring has helped us not only stay competitive but be industry leaders in performance.

What needs improvement?

The product can be improved by allowing the grouping of APIs to add variables. That way, any API with a unique ID could be grouped together.

Furthermore, SEO monitoring has been crucial for us but also a difficult part to set up, as comparing alarms between us and competitors is a tough feat. Data is not always consistent, so we have been experimenting with removing the noise in Datadog, but it's been taking a while.

Finally, Datadog should have a feature that reports stale alarms based on activity.

For how long have I used the solution?

I've used the solution for six months.

What do I think about the stability of the solution?

It's very stable, and we have not experienced an issue with downtime on Datadog.

What do I think about the scalability of the solution?

Datadog scales well; growing volume has not seemed to slow it down.

How are customer service and support?

We haven't talked to the support team.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We switched to Datadog because our previous provider had very inconsistent logging. Our alarms would often fail to fire when our services were not working, because the provider had a logging problem.

How was the initial setup?

The initial setup was somewhat complex due to the built-in monitoring with services. This is not always super comprehensive and has to be studied, as opposed to other metrics platforms that just service all your endpoints, which you can then trace with Grafana.

What about the implementation team?

We implemented the solution through an in-house team.

What was our ROI?

The ROI is good.

What's my experience with pricing, setup cost, and licensing?

Users must try to understand the way Datadog alarms work off the bat so that they can minimize the requirements for expensive features like custom metrics.

It can sometimes be tempting to use them; however, it is not always necessary as you migrate to Datadog, as they are a provider that treats alarms somewhat differently than you may be used to.

Which other solutions did I evaluate?

We have evaluated New Relic, Grafana, Splunk, and many more in our quest to find the best monitoring provider.

Which deployment model are you using for this solution?

Hybrid Cloud

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer1494894

Senior Manager, Site Reliability Engineering at Extra Space Storage

Sep 19, 2024

Improved time to discovery and resolution but needs better consumption visibility

Pros and Cons

"Several critical dashboards were created years ago and are still in use today."

"We would love to see a % consumed and alert us if we are over budget before getting an overage charge 20 days into the month."

What is our primary use case?

The product monitors multiple systems, from customer interactions on our web applications down to the database and all layers in between. RUM, APM, logging, and infrastructure monitoring are all surfaced into single dashboards.

We initially started with application logs and generated long-term business metrics out of critical logs. We have turned those metrics and logs into a collection of alerts integrated into our pager system. As we have evolved, we have also used APM and RUM data to trigger additional alerts.

How has it helped my organization?

The solution has surfaced how integrated our applications really are and helps us track calls from the top down, identifying slowness and errors all through the call stack.

The biggest improvement we have seen is our time to discovery and resolution. As Datadog has improved, and we add new features, the depth and clarity we get from top to bottom has been excellent. Our engineering teams have quickly adopted many features within Datadog, and are quick to build out their own dashboards and alerts. This has also led to a rapid sprawl when left unchecked.

What is most valuable?

We started with application logs and have expanded over the years to include infrastructure, APM, and now RUM. All of these tools have been incredibly valuable in their own sphere. The huge value is tying all of the data points together.

Logging was the first tool we started with years ago, replacing our ELK stack. It was the easiest to get in place, and our engineers quickly embraced the tools. Several critical dashboards were created years ago and are still in use today. Over time, we have shifted from verbose logs and matured into APM and RUM. That has helped us focus on fine-tuning the performance of our applications.

What needs improvement?

We need better visibility into our consumption rate, which is tied to our commit levels. We would love to see a % consumed and alert us if we are over budget before getting an overage charge 20 days into the month.

The biggest complaint we hear comes from the cost of the tool. It is pretty easy to accidentally consume a lot of extra data. Unless you watch everything come in almost daily, you could be in for a big surprise.

We utilize the Datadog estimated usage metrics to build out alerts and dashboards. The usage and cost system page still doesn't tie into our committed spending - it would be wonderful to see the monthly burn rate on any given day.
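The burn-rate view described above comes down to simple arithmetic. A minimal sketch follows, assuming a linear daily burn and that the month-to-date spend comes from your own billing records. The function name `burn_rate_summary` and all dollar figures are hypothetical, and nothing here calls a Datadog API:

```python
from calendar import monthrange
from datetime import date


def burn_rate_summary(commit_usd, spend_to_date_usd, today):
    """Project month-end spend from a linear daily burn rate (an assumption)."""
    days_in_month = monthrange(today.year, today.month)[1]
    # Average spend per elapsed day of the month so far.
    daily_burn = spend_to_date_usd / today.day
    projected = daily_burn * days_in_month
    return {
        "percent_consumed": round(100 * spend_to_date_usd / commit_usd, 1),
        "percent_of_month_elapsed": round(100 * today.day / days_in_month, 1),
        "projected_month_end_usd": round(projected, 2),
        "over_budget": projected > commit_usd,
    }


# Hypothetical figures: a 10,000 USD monthly commit with 7,500 USD spent by day 20.
summary = burn_rate_summary(10_000, 7_500, date(2024, 9, 20))
print(summary)
```

Comparing the percentage of commit consumed with the percentage of the month elapsed is what flags a likely overage before it shows up on the invoice.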

For how long have I used the solution?

I've used the solution for six years.

What do I think about the stability of the solution?

There have not been as many outages in the past year. We also haven't been jumping into the new features as quickly as they come out. We may be working on more stable products.

What do I think about the scalability of the solution?

It has scaled up to meet our needs pretty well. Over the years, we have only managed to trigger internal Datadog alerts once or twice by misconfiguring a metric and spiralling out of control with costs.

How are customer service and support?

Support has been lacking. Opening a chat with the tech support rep of the day is always a gamble. We are looking into working with third-party support because it has been so rough over the years.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We used the ELK stack for logging and monitoring and AppDynamics for APM.

How was the initial setup?

The initial setup for new teams has become easier over the years. We are increasing our adoption rate as we shift our technology to more cloud-native tools. Datadog has supported easy implementation by simply adding a package to the app.

They have really focused on a lot of out-of-the-box functionality, but the real fun happens as you dive deeper into the configuration. We have also begun adopting OpenTelemetry standards. This has kept us from going too deep into vendor-specific implementations.

What about the implementation team?

We did the initial setup via an in-house team.

What was our ROI?

As long as we stay on top of our consumption mid-month, it has been worth it. However, the few engineers we have who are dedicated to playing whack-a-mole with the growing spending could be better utilized in teaching best practices to new users. I suppose our implementation of the rapidly changing tools over the years has led to a fair amount of technical debt.

What's my experience with pricing, setup cost, and licensing?

It is quite easy to set up any specific tool, but to take advantage of the full visibility it offers, you need to instrument across the board—which can be time-consuming. Be careful about how each tool is billed, and watch your consumption like a hawk.

Which other solutions did I evaluate?

We evaluated AppDynamics and Dynatrace.

What other advice do I have?

It's a very powerful tool, with lots of new features coming, but you certainly will pay for what you get.

Which deployment model are you using for this solution?

Public Cloud

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer092526

Director of Engineering at Ordoro

Oct 15, 2024


Debugs slow performance with good support and a straightforward setup

Pros and Cons

"Datadog infrastructure monitoring has helped us identify health issues with our virtual machines, such as high load, CPU, and disk usage, as well as monitoring uptime and alerting when Kubernetes containers have a bad time staying up."

"We have found that some of the different options for filtering for logs ingestion, APM traces and span ingestion, and RUM sessions vs replay settings can be hard to discover and tough to determine how to adjust and tweak for both optimal performance and monitoring as well as for billing within the console."

What is our primary use case?

We use Datadog for monitoring the performance of our infrastructure across multiple types of hosts in multiple environments. We also use APM to monitor our applications in production.

We have some Kubernetes clusters and multi-cloud hosts with Datadog agents installed. We have recently added RUM to monitor our application from the user side, including replay sessions, and are hoping to use those to replace existing monitoring for errors and session replay for debugging issues in the application.

How has it helped my organization?

We have been using Datadog since I started working at the company ten years ago and it has been used for many reasons over the years. Datadog across our services has helped debug slow performance on specific parts of our application, which, in turn, allows us to provide a snappier and more performant application for our customers.

The monitoring and alerting system has made our team aware of issues as they come up in our production system and lets us react faster, with more tools to debug and view what is happening, to keep the system online for our customers.

What is most valuable?

Datadog infrastructure monitoring has helped us identify health issues with our virtual machines, such as high load, CPU, and disk usage, as well as monitoring uptime and alerting when Kubernetes containers have a bad time staying up. Our use of Datadog's Application Performance Monitoring (APM) over the last six years or so has been crucial to identifying performance and bottleneck issues, as well as alerting us when services are seeing high error rates, which has made it easier to debug when specific services may be going down.

What needs improvement?

We have found that some of the different options for filtering for logs ingestion, APM traces and span ingestion, and RUM sessions vs replay settings can be hard to discover and tough to determine how to adjust and tweak for both optimal performance and monitoring as well as for billing within the console.

It can sometimes be difficult to determine which information is documented, as we have found inconsistencies with deprecated information, such as environment variables within the documentation.

For how long have I used the solution?

I've been using the solution for ten years.

What do I think about the stability of the solution?

The solution seems pretty stable, as we've been using it for more than a decade.

What do I think about the scalability of the solution?

The solution seems quite scalable, especially within Kubernetes. Costs are a factor.

How are customer service and support?

Support has been very helpful whenever we need it.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We had tried some other APM tooling in the past; however, it was too expensive. We then added APM through Datadog, since we were already using Datadog and it seemed like a good value add.

How was the initial setup?

The solution is straightforward to set up. Sometimes, it is complex to find the correct documentation.

What about the implementation team?

We handled the setup in-house.

What was our ROI?

Our ROI is ease of mind with alerts and monitoring, as well as the ability to review and debug issues for our customers.

What's my experience with pricing, setup cost, and licensing?

Getting settled on pricing is something you want to keep an eye on, as things seem to change regularly.

Which other solutions did I evaluate?

We used New Relic previously.

What other advice do I have?

Datadog is a great service that is continually growing its solution for monitoring and security. It is easy to set up and turn on and off its features once you have instrumented agents and tailored solutions to your needs.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Kenneth Dozier Jr.

Works

Sep 26, 2024


Improves monitoring and observability with actionable alerts

Pros and Cons

"The selection of monitors is a big feature I have been working with."

"The PagerDuty integration could be a little bit better."

What is our primary use case?

We are using Datadog to improve our monitoring and observability so we can hopefully improve our customer experience and reliability.

I have been using Datadog to build better actionable alerts to help teams across the enterprise. By using Datadog, we are also hoping to have improved observability into our apps, and we are taking advantage of this process to improve our tagging strategy so teams can hopefully troubleshoot incidents faster and achieve a much reduced mean time to resolve.

We have a lot of different resources we use like Kubernetes, App Gateway and Cosmos DB just to name a few.

How has it helped my organization?

As soon as we started implementing Datadog into our cloud environment, people really liked how it looked and how easy it was to navigate. We could see more data in our Kubernetes environments than we ever could before.

Some people liked how the logs were color coded so it was easy to see what kind of log you were looking at. The ease of making dashboards has also been greatly received as a benefit.

People have commented that there is so much information that it takes time to digest and get used to what you are looking at and to find what you are looking for.

What is most valuable?

The selection of monitors is a big feature I have been working with. Previously with Azure Monitor we couldn't do a whole lot with their alerts. The log alerts can sometimes take a while to ingest. Also, we couldn't do any math with the metrics we received from logs to make better alerts from logs.

The metric alerts are OK but are still very limited. With Datadog, we can make a wide range of different monitors that we can tweak in real time because there is a graph of data as you are creating the alert, which is very beneficial. The ease of making dashboards has saved a lot of people a lot of time. There are no KQL queries needed to put together the information you are looking for, and the ability to pin any info you see into a dashboard is very convenient.

RUM is another feature we are looking forward to using this upcoming tax season, as we will have a front-row view into what frustrates customers or where things go wrong in their process of using our site.

What needs improvement?

The PagerDuty integration could be a little bit better. If there was a way to format the monitors to different incident management software that would be awesome. As of right now, it takes a lot of manipulating of PagerDuty to get the monitors from Datadog to populate all the fields we want in PagerDuty.

I love the fact you can query data without using something like KQL. However, it would also be helpful if there was a way to convert a complex KQL query into Datadog to be able to retrieve the same data - especially for very specific scenarios that some app teams may want to look for.

For how long have I used the solution?

I've used the solution for about two years.

Which solution did I use previously and why did I switch?

We previously used Azure Monitor, App Insights, and Log Analytics. We switched because it was a lot for developers and SREs to switch between three screens to try to troubleshoot, and when you add in the slow load times from Azure, it can take a while to get things done.

What's my experience with pricing, setup cost, and licensing?

I would advise taking a close look at logging costs, man-hours needed, and the amount of time it takes for people to get comfortable navigating Datadog because there is so much information that it can be overwhelming to narrow down what you need.

Which other solutions did I evaluate?

We did evaluate Dynatrace and looked into New Relic before settling on Datadog.

Which deployment model are you using for this solution?

Hybrid Cloud

Disclosure: My company has a business relationship with this vendor other than being a customer. H&R Block has just recently become a customer of Datadog.

reviewer9816413

Engineering Manager at Video Blocks

Sep 23, 2024


Easy, more reliable, and transparent monitoring

Pros and Cons

"Monitors have also been very valuable when setting up our on-call processes. It makes it easy to set up and adjust alerting to keep our teams aware of anything going wrong."

"One thing to improve would be making it easier to see common patterns across traces."

What is our primary use case?

We use the solution to monitor and investigate issues with production services at work. We're periodically reviewing the service catalog view for the various applications and I use it to identify any anomalies with service metrics, any changes in user behavior evident via API calls, and/or spikes in errors.

We use monitors to trigger alerts for on-call engineers to act upon. The monitors have set thresholds for request latency, error rates, and throughput.
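
To illustrate the kind of threshold check these monitors perform, here is a minimal sketch in plain Python. It does not call the Datadog API, and the `Threshold` class, metric names, and limit values are all hypothetical stand-ins for whatever a team actually configures.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    # True: alert when the value rises above the limit (latency, error rate).
    # False: alert when the value drops below the limit (throughput).
    alert_when_above: bool = True

    def breached(self, value: float) -> bool:
        if self.alert_when_above:
            return value > self.limit
        return value < self.limit


# Hypothetical thresholds for the three signals mentioned in the review.
THRESHOLDS = [
    Threshold("p95_latency_ms", 500.0),
    Threshold("error_rate", 0.02),
    Threshold("requests_per_second", 10.0, alert_when_above=False),
]


def evaluate(sample: dict[str, float]) -> list[str]:
    """Return the names of metrics whose thresholds are breached in this sample."""
    return [
        t.metric
        for t in THRESHOLDS
        if t.metric in sample and t.breached(sample[t.metric])
    ]


if __name__ == "__main__":
    print(evaluate({"p95_latency_ms": 750.0, "error_rate": 0.01, "requests_per_second": 4.0}))
```

Treating throughput as a lower bound is an assumption made here; a drop in traffic is often the more useful signal for an on-call engineer than a spike.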

We also use automated rules to block bad actors based on request volume or patterns.
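
The review does not say how these rules are implemented. As a generic illustration of volume-based blocking, the sketch below uses a sliding-window counter per client. The `VolumeBlocker` class, its parameters, and the client identifiers are hypothetical and are not Datadog's mechanism.

```python
from collections import defaultdict, deque


class VolumeBlocker:
    """Block a client once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> recent request timestamps
        self.blocked = set()

    def record(self, client_id: str, timestamp: float) -> bool:
        """Record one request; return True if the client is blocked."""
        if client_id in self.blocked:
            return True
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Discard timestamps that have fallen out of the sliding window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        if len(hits) > self.max_requests:
            self.blocked.add(client_id)
            return True
        return False


if __name__ == "__main__":
    blocker = VolumeBlocker(max_requests=3, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 3.0):
        print(t, blocker.record("client-a", t))
```

In this sketch a block is permanent once triggered; a real rule would more likely expire after a cool-down period.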

How has it helped my organization?

Datadog has made setting up monitors easier, more reliable, and more transparent. This has helped standardize our on-call process and set all of our on-call engineers up for success.

It has also standardized the way we evaluate issues with our applications by encouraging all teams to use the service catalog.

It makes it easier for our platforms and QA teams to get other engineering teams up to speed with managing their own applications' performance.

Overall, Datadog has been very helpful for us.

What is most valuable?

The service catalog view is very helpful for periodic reviews of our application. It has also standardized the way we evaluate issues with our applications. Having one page with an easy-to-scan view of app metrics, error patterns, package vulnerabilities, etc., is very helpful and reduces friction for our full-stack engineers.

Monitors have also been very valuable when setting up our on-call processes. It makes it easy to set up and adjust alerting to keep our teams aware of anything going wrong.

What needs improvement?

Datadog is great overall. One thing to improve would be making it easier to see common patterns across traces. I sometimes end up in a trace but have a hard time finding other common features about the error/requests that are similar to that trace. It may be that this is already easy to get to, in which case it is really an education issue.

Another thing that could be improved is that the service list page sometimes refreshes slowly, and I accidentally click the wrong environment since the sort changes late.

For how long have I used the solution?

I've used the solution for about a year.

What do I think about the stability of the solution?

It is very stable. I have not seen any issues with Datadog.

What do I think about the scalability of the solution?

It seems very scalable.

How are customer service and support?

I've had no specific experience with technical support.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We used Honeycomb before. We switched since Datadog offered more tooling.

How was the initial setup?

Each application has been easy to instrument.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

Engineers save an unquantifiable amount of time by having one standard view for all applications and monitors.

What's my experience with pricing, setup cost, and licensing?

I am not exposed to this aspect of Datadog.

Which other solutions did I evaluate?

We did not evaluate other options.

Which deployment model are you using for this solution?

Public Cloud

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Neil Elver

Application Development Team Lead at TCS EDUCATION SYSTEM

Sep 20, 2024


Good synthetic testing, centralized pipeline tracking and error logging

Pros and Cons

"Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users."

"I'd like to see an expansion of the Android and IOS apps to have a simplified CI/CD pipeline history view."

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting.

We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications.

Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge. Datadog agents on each web host and native integrations with GitHub, AWS, and Azure get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards. Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Legacy .NET Framework apps with Windows Event Viewer and cutting-edge .NET Core apps with streaming logs both work. The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

When it comes to Datadog, several features have proven particularly valuable.

The centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly.

Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders.

Together, these features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view. I like the idea of monitoring on the go; however, it seems the options are still a bit limited out of the box.

While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS-hosted apps - that need a lot of focus to pick up on the key details needed. In some cases, the screenshots don't match the text as updates are made. I feel I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environment variables.

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime and clean and light resource usage of the agents.

What do I think about the scalability of the solution?

The solution was very scalable and very customizable.

How are customer service and support?

Sales service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether an app is hosted on Linux, Windows, containers, cloud, or on-prem.

How was the initial setup?

The setup is generally simple. That said, .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

The solution was implemented in-house.

What was our ROI?

I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

It's a good idea to set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

We are excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Buyer's Guide

Download our free Datadog Report and get advice and tips from experienced pros sharing their opinions.

Updated: July 2026


Product Categories

Cloud Monitoring Software, Application Performance Monitoring (APM) and Observability, Network Monitoring Software, IT Infrastructure Monitoring, Log Management, Container Monitoring, AIOps, Cloud Security Posture Management (CSPM), AI Observability
